MCP is not a tooling problem. It is a governance problem.
After shipping 458 governed tools across an MCP platform, the lesson isn't about volume. It's about the four production failure modes that nobody writes blog posts about — and why most teams ship MCP the same way they shipped REST APIs in 2009.
The chart everyone uses to introduce MCP is a hub-and-spoke diagram: an agent in the middle, tools on the outside, arrows pointing inward. It is correct. It is also misleading. After eighteen months of running an MCP platform with 458 tools and twenty-three packaged Claude skills across four enterprise B2B clients, I can tell you the diagram you actually need is a permission matrix and a paper trail. The hub-and-spoke version makes MCP look like plumbing. It is not plumbing. It is a governance surface that has plumbing behind it.
This matters because most teams I have advised in 2026 are shipping MCP the same way they shipped REST APIs in 2009: write the connector, document the request shape, ship. Then a quarter later they have the same problems REST APIs had — and a few new ones REST APIs did not have — and no one is sure why their agent platform is fragile.
What the tool count actually measures
458 sounds impressive over dinner. It is the wrong number to brag about.
The tool count is a measure of exposed capability, not of governed capability. The two are different in ways that decide whether the platform is durable. A platform with 50 governed tools beats a platform with 458 ungoverned ones every quarter and every audit. What matters is the layer underneath the count: contracts, scopes, approvals, and naming. The four problems nobody writes about. Each one looks small in isolation. Each one ate at least a week of my last year.
Problem one: tool-name collisions
When you mount three MCP servers — say, an internal data warehouse, a SaaS CRM, and a marketing automation tool — and each one decides to expose a search or a query or a report tool, the agent will reach for whichever one resolves first. Sometimes that is the one you want. Often it is not. The classic failure mode is the agent calling search against the CRM when the user clearly meant the warehouse, then quietly returning a confidently wrong answer because the CRM has a row that loosely matches the question.
The fix is not technical. The fix is a naming convention enforced at the registry: <system>.<verb>.<noun> (netsuite.list.invoices, snowflake.query.audience, linkedin.post.update). It looks bureaucratic. It is not. It is the only thing standing between you and the failure mode where your agent silently picks the wrong system. MCP is not a free-for-all namespace. It is a contract registry, and the registry has rules.
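Here is a sketch of what that rule looks like enforced at mount time; the check itself and its function names are illustrative, and nothing here is part of the MCP spec. The point is that a bare search is rejected before the agent can ever resolve it:

```python
import re
from collections import Counter

# The pattern encodes the <system>.<verb>.<noun> convention. The function
# and its manifest shape are illustrative, not anything in the MCP spec.
NAME = re.compile(r"^[a-z0-9_]+\.[a-z0-9_]+\.[a-z0-9_]+$")

def check_mount(mounted: set[str], incoming: list[str]) -> list[str]:
    """Refuse to mount a server whose tool names are malformed or collide."""
    errors = []
    for name in incoming:
        if not NAME.match(name):
            errors.append(f"reject {name!r}: must be <system>.<verb>.<noun>")
        elif name in mounted:
            errors.append(f"reject {name!r}: collides with a mounted tool")
    for name, count in Counter(incoming).items():
        if count > 1:
            errors.append(f"reject {name!r}: duplicated within this manifest")
    return errors

# A bare "search" from a CRM connector never reaches the agent, so it can
# never resolve ahead of the warehouse tool the user actually meant.
print(check_mount({"snowflake.query.audience"},
                  ["search", "netsuite.list.invoices"]))
# ["reject 'search': must be <system>.<verb>.<noun>"]
```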
Problem two: scope drift
Every MCP tool has an authorization profile — what it can read, what it can write, what it can publish to a customer-visible system. The mistake teams make is treating those scopes as static. They are not. They drift, because a developer adds a feature, a vendor changes a product line, a connector starts supporting a new endpoint. Six weeks later the tool you thought was read-only has a write path you did not know about, and your agent, given enough latitude, will eventually find it.
Two practices fix this. The first is scope diffing: when an MCP server's manifest changes, fail the build until a human re-reviews the new capabilities. The second is write-path approval boundaries — every write tool requires a structured approval object before execution, and that object must come from a human action, not from the agent's own reasoning. This is the unromantic part. It is also the part that turns a demo into infrastructure.
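A sketch of the scope-diffing half as a CI gate. The manifest shape, the file-based baseline, and the function names are all assumptions of mine, not a real MCP manifest format; the approval-object half of the fix reappears in the router sketch under problem three. Any capability the current manifest has that the last reviewed baseline lacks fails the build:

```python
import json
import sys

# Assumed manifest shape (illustrative, not a real MCP format):
# {"tools": [{"name": "netsuite.list.invoices", "scopes": ["read:invoices"]}]}

def capabilities(manifest: dict) -> set[tuple[str, str]]:
    """Flatten a manifest into (tool name, scope) pairs."""
    return {(t["name"], s) for t in manifest["tools"] for s in t["scopes"]}

def scope_diff(old: dict, new: dict) -> list[str]:
    """List every capability the new manifest has that the old one did not."""
    return sorted(f"{name} gained scope {scope!r}"
                  for name, scope in capabilities(new) - capabilities(old))

def main(baseline_path: str, current_path: str) -> int:
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    added = scope_diff(baseline, current)
    for line in added:
        print(line)
    # Nonzero exit fails the CI job: the build stays red until a human
    # reviews the new capabilities and commits an updated baseline.
    return 1 if added else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```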
Problem three: approval fatigue
The opposite failure of "agent has too much authority" is "humans approve everything, every time, and stop reading after week three." Approval fatigue produces false-pass cascades: the human clicks approve on autopilot, the system records consent, and an action that should have been blocked rolls through with a perfect audit trail. The audit log says everything is fine. The customer says everything is not.
The mitigation is meaningful difference detection. Approvals should only be requested for actions that are genuinely novel or genuinely consequential. Routine, repeated, low-risk actions should be batched and approved in aggregate. Novel actions — first time this agent has ever called this tool against this tenant, or first time this tool has ever been called with these arguments — should interrupt and ask. The eval harness has to be smart enough to know the difference. Most are not.
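What meaningful difference detection might look like as a routing policy. ToolCall, ApprovalRouter, and the three outcomes are hypothetical names for the behavior described above, not an existing API, and real novelty tracking would be persistent and per client rather than in memory:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolCall:
    tool: str
    tenant: str
    arg_names: frozenset[str]  # argument names, not values
    writes: bool

class ApprovalRouter:
    """Decide per call: interrupt a human, batch for aggregate review, or pass.

    Hypothetical sketch; state lives in memory here, where a production
    router would persist it per client.
    """

    def __init__(self) -> None:
        self.seen_tenant: set[tuple[str, str]] = set()
        self.seen_shape: set[tuple[str, frozenset[str]]] = set()

    def route(self, call: ToolCall) -> str:
        novel = ((call.tool, call.tenant) not in self.seen_tenant or
                 (call.tool, call.arg_names) not in self.seen_shape)
        self.seen_tenant.add((call.tool, call.tenant))
        self.seen_shape.add((call.tool, call.arg_names))
        if novel:
            return "interrupt"  # first time for this tenant or argument shape
        if call.writes:
            return "batch"      # repeated write: approve in aggregate
        return "auto"           # routine repeated read: never shown to a human

router = ApprovalRouter()
call = ToolCall("netsuite.list.invoices", "acme", frozenset({"period"}), writes=False)
print(router.route(call))  # interrupt: this pairing has never been seen
print(router.route(call))  # auto: now routine, so no approval noise
```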
Problem four: capability versus belief
The most subtle failure: the agent's mental model of what the tool can do diverges from what the tool actually does. You added a parameter. The tool description in the registry was not regenerated. The agent's prompt template still describes the old shape. The agent confidently calls the tool with the old arguments, the tool returns success with a degraded result, and nobody notices for nine days because the eval suite was checking answer quality, not capability fidelity.
You fix this with provenance. Every tool call gets logged with the registry version it executed against. Every eval case asserts that the tool description, the prompt template, and the actual tool surface were on the same generation. Drift is a first-class metric, not an afterthought.
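A minimal sketch of that assertion. The CallRecord shape and the gen-41 style labels are stand-ins for whatever versioning scheme the registry actually uses:

```python
from dataclasses import dataclass

@dataclass
class CallRecord:
    tool: str
    registry_version: str  # the manifest generation the call executed against

def assert_one_generation(registry: str, description: str, template: str) -> None:
    """Eval-time gate: every surface the agent saw must share one generation."""
    versions = {registry, description, template}
    if len(versions) != 1:
        raise AssertionError(f"capability drift across generations: {sorted(versions)}")

record = CallRecord("snowflake.query.audience", registry_version="gen-41")

try:
    # The prompt template lagged one generation behind the registry: the
    # degraded-success scenario from the text, caught at eval time, not day nine.
    assert_one_generation(record.registry_version, "gen-41", "gen-40")
except AssertionError as drift:
    print(drift)  # capability drift across generations: ['gen-40', 'gen-41']
```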
What I would tell a team starting MCP today
Build the registry before the agent. Name everything. Diff the scopes on every change. Treat write-path approvals as inviolable, including for yourself. Measure capability drift the same way you measure latency and cost: as a release gate, not a dashboard.
The 458 tools are not the proof. The proof is that I can answer the question "what changed in the last seven days, who approved it, and what eval gated it" for any tool, on any client, in under a minute. That is the artifact. Everything else is plumbing.