Every enabled MCP server contributes its tool definitions to the context at session start, and some servers register 15+ tools with verbose descriptions. This schema bloat is a hidden token cost that most Claude Code users never audit. The Gandalf pretooluse trick (pre-filtering which tools the model considers) cuts this waste, but the simpler fix is fewer, better-scoped MCP servers.
How MCP Schema Bloat Happens
When Claude Code starts a session, it loads tool schemas from every enabled MCP server. Each schema includes the tool name, description, parameter definitions, and type information. A single MCP server registering 15 tools with detailed descriptions can add 3,000-5,000 tokens to every session context before the user types anything. With 5-6 MCP servers enabled, you can burn 15,000-25,000 tokens on tool schemas alone.
This cost repeats every time the context is rebuilt. If your session is compacted and reloaded, you pay the schema tax again. Over a day of active coding, schema bloat can account for 10-20% of total token spend.
Audit Your Current MCP Token Usage
Check how many MCP servers you have enabled and estimate the schema cost. Each tool definition averages 200-400 tokens depending on description length and parameter count.
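The per-tool average turns into a session-level range with simple multiplication. The sketch below is a minimal illustration, not a measurement: the function name, server names, and tool counts are hypothetical, and the 200-400 figures are the rough averages quoted above.

```python
def estimate_schema_tokens(tools_per_server, low=200, high=400):
    """Return a (low, high) token estimate for tool schemas loaded at session start.

    tools_per_server maps a server name to the number of tools it registers.
    low and high are the rough per-tool averages from the text, not measured values.
    """
    total_tools = sum(tools_per_server.values())
    return total_tools * low, total_tools * high


# Hypothetical setup: these server names and tool counts are made up for illustration.
servers = {"github": 15, "search": 5, "filesystem": 10}
low, high = estimate_schema_tokens(servers)
print(f"{sum(servers.values())} tools: roughly {low:,}-{high:,} schema tokens per session")
```

Replace the dictionary with the tool counts your own servers report to get a ballpark figure for your setup.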
# Count MCP servers in your config
cat ~/.claude/mcp.json | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'{len(d.get(\"mcpServers\", {}))} servers enabled')"
# Estimate schema tokens (rough: 300 tokens per tool)
# A server with 10 tools = ~3,000 tokens per session start
Three Strategies to Reduce Schema Bloat
First, disable MCP servers you do not use in every session. If you only use the GitHub MCP server once a week, disable it by default and enable it when needed. Second, prefer MCP servers with fewer, well-scoped tools. A server with 3 focused tools (search Google, search Reddit, extract URL) is more token-efficient than one with 20 granular tools. Third, use the Gandalf pretooluse pattern: a system prompt addition that instructs the model to mentally filter tools before calling them, reducing the effective consideration set.
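The first strategy can be sketched as a small filter over the config. This assumes the config is a JSON object with an "mcpServers" key, as in the snippets in this article. The helper name and the example entries are hypothetical, and the sketch only builds the trimmed config in memory rather than writing any file.

```python
import json


def keep_servers(config, allowlist):
    """Return a copy of config whose "mcpServers" holds only allowlisted server names."""
    servers = config.get("mcpServers", {})
    trimmed = dict(config)  # shallow copy so the original config is left untouched
    trimmed["mcpServers"] = {
        name: spec for name, spec in servers.items() if name in allowlist
    }
    return trimmed


# Hypothetical config: the "github" entry and its command are placeholders.
config = {
    "mcpServers": {
        "github": {"command": "github-mcp"},
        "scavio": {"url": "https://mcp.scavio.dev/mcp"},
    }
}

daily = keep_servers(config, {"scavio"})
print(json.dumps(daily, indent=2))
```

Keeping a full config and a trimmed daily one makes it easy to switch a rarely used server back on when you need it.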
Choosing Token-Efficient Search MCP Servers
Search MCP servers vary significantly in schema size. Some register a separate tool for every search platform (google_search, reddit_search, youtube_search, amazon_search, walmart_search: five tool schemas). Others consolidate into one tool with a platform parameter (search: one schema). At 200-400 tokens per tool definition, dropping four schemas saves 800-1,600 tokens per session.
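To see why consolidation shrinks the schema, compare the two shapes side by side. The sketch below builds toy tool definitions with the name, description, and inputSchema fields that MCP tool definitions carry, then compares their serialized size. The descriptions are invented and much shorter than real ones, and the 4-characters-per-token ratio is a crude assumption rather than a real tokenizer, so only the relative difference is meaningful.

```python
import json

PLATFORMS = ["google", "reddit", "youtube", "amazon", "walmart"]


def per_platform_tools():
    """One tool definition per platform, e.g. google_search, reddit_search."""
    return [
        {
            "name": f"{platform}_search",
            "description": f"Search {platform} and return matching results.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search terms"}
                },
                "required": ["query"],
            },
        }
        for platform in PLATFORMS
    ]


def consolidated_tool():
    """A single search tool that takes the platform as an enum parameter."""
    return [
        {
            "name": "search",
            "description": "Search one platform and return matching results.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search terms"},
                    "platform": {"type": "string", "enum": PLATFORMS},
                },
                "required": ["query", "platform"],
            },
        }
    ]


def rough_tokens(tools):
    """Crude size proxy: about 4 characters per token. Not a real tokenizer."""
    return len(json.dumps(tools)) // 4


print("per-platform:", rough_tokens(per_platform_tools()))
print("consolidated:", rough_tokens(consolidated_tool()))
```

The consolidated definition repeats the name, description, and query parameter once instead of five times, which is where the saving comes from.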
Response format also matters. An MCP server that returns structured JSON (title, snippet, URL) adds fewer tokens to context than one returning full-page Markdown. For a typical search returning 10 results, structured JSON uses about 300 tokens per result (roughly 3,000 tokens in total), while Markdown can use 2,000+ tokens per result (20,000 or more in total).
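The difference comes down to which fields reach the context. As a rough illustration, the sketch below projects a raw result down to the three fields named above. The field names and the page_markdown key are hypothetical, and a server that returns structured JSON does this on its side, so this is not something you need to run yourself.

```python
def compact_results(results, fields=("title", "snippet", "url")):
    """Keep only the listed fields from each search result dict."""
    return [{key: result[key] for key in fields if key in result} for result in results]


# Hypothetical raw result: "page_markdown" stands in for a full-page Markdown dump.
raw = [
    {
        "title": "MCP overview",
        "snippet": "Short summary of the page.",
        "url": "https://example.com/mcp",
        "page_markdown": "# MCP overview\n" + "Body text. " * 500,
    }
]

compact = compact_results(raw)
print(compact)
```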
{
  "mcpServers": {
    "scavio": {
      "url": "https://mcp.scavio.dev/mcp",
      "headers": { "x-api-key": "YOUR_KEY" }
    }
  }
}
Scavio's hosted MCP server registers a compact tool set with structured JSON responses, keeping both schema cost and result cost low. The hosted nature also means no local server process competing for resources during coding sessions.