Agent Routing vs. Tool Search: Two Paths to 85% Context Reduction
Anthropic just released advanced tool use features that achieve 85% context reduction through on-demand tool discovery.
GPTCode already does this—but with a fundamentally different architecture.
The Problem: Token Bloat
Anthropic’s scenario:
- 58 tools across 5 MCP servers (GitHub, Slack, Jira, Sentry, Grafana)
- 55,000 tokens consumed before any work begins
- Add more servers → 72,000+ tokens
- Real production systems: 134,000+ tokens
GPTCode’s scenario:
- Large codebase with 500+ files
- 100,000+ tokens of potential context
- Need to identify relevant files for each task
- Need to execute multi-step changes safely
Same fundamental problem: too much context drowns the signal.
Solution #1: Discovery
Anthropic: Tool Search Tool
Instead of loading all 58 tools upfront, Anthropic defers most tools and discovers them on-demand:
- Load only a search tool (~500 tokens)
- When Claude needs GitHub capabilities, search for “github”
- Load only `github.createPullRequest` and `github.listIssues` (~3K tokens)
- Leave the other 56 tools deferred
Result: 85% reduction (77K → 8.7K tokens)
Trade-off: Adds search latency to every task
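A minimal sketch of the deferred-loading idea, assuming a hypothetical in-process tool registry (this is not Anthropic's actual API; the names and token counts are illustrative):

```python
# Hypothetical sketch of on-demand tool discovery -- illustrative names,
# not Anthropic's actual API.
DEFERRED_TOOLS = {
    "github.createPullRequest": {"schema_tokens": 1500},
    "github.listIssues": {"schema_tokens": 1500},
    "slack.postMessage": {"schema_tokens": 1200},
    # ... 55 more tool schemas stay out of context
}

def tool_search(query: str) -> list[str]:
    """The only tool loaded upfront (~500 tokens): a keyword search
    over the names of deferred tools."""
    return [name for name in DEFERRED_TOOLS if query.lower() in name.lower()]

def load_tools(names: list[str]) -> int:
    """Pull just the matched schemas into context; return their token cost."""
    return sum(DEFERRED_TOOLS[n]["schema_tokens"] for n in names)

matches = tool_search("github")  # ['github.createPullRequest', 'github.listIssues']
print(load_tools(matches))       # ~3000 tokens instead of ~77,000
```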
GPTCode: Agent Routing + Semantic Filtering
GPTCode doesn’t use a tool search. Instead, it routes to specialized agents:
gptcode do "add authentication"
↓
ML Classifier (1ms) → OrchestratedMode
↓
Router Agent → Analyzer Agent → Planner Agent → Editor Agent → Validator Agent
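As a rough sketch of the routing step (the keyword classifier below is a stand-in for GPTCode's actual ML model, which the post doesn't show):

```python
# Illustrative sketch of the 1ms routing decision -- a tiny keyword
# classifier standing in for GPTCode's real ML classifier.
PIPELINES = {
    "OrchestratedMode": ["Analyzer", "Planner", "Editor", "Validator"],
    "QueryMode": ["Analyzer"],
    "QuickEditMode": ["Editor"],
}

def classify(task: str) -> str:
    """Stand-in for the ML classifier: map a task string to a pipeline."""
    if any(w in task for w in ("add", "implement", "refactor")):
        return "OrchestratedMode"
    if task.endswith("?"):
        return "QueryMode"
    return "QuickEditMode"

mode = classify("add authentication")
print(mode, "->", " -> ".join(PIPELINES[mode]))
# OrchestratedMode -> Analyzer -> Planner -> Editor -> Validator
```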
Each agent has a narrow, focused capability set:
- Analyzer: File scanning, dependency graphs, PageRank
- Planner: Implementation planning, success criteria
- Editor: Code generation, file editing
- Validator: Test running, lint checking
Context is pre-filtered via PageRank + dependency analysis (sketched below):
- 100K codebase → identify 14 relevant files
- Load only those files into Planner context
Result: 80% reduction (100K → 20K tokens)
Trade-off: Agents must be designed upfront (less flexible than dynamic tool discovery)
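A sketch of that pre-filtering step, assuming an import graph and networkx's built-in PageRank (the real Analyzer builds a richer dependency graph; the files and edges here are illustrative):

```python
# Sketch of PageRank-based context filtering over an import graph.
import networkx as nx

graph = nx.DiGraph()
# Edge A -> B means "A imports B", so heavily-imported files
# accumulate rank from their importers.
graph.add_edges_from([
    ("handlers/login.go", "auth/index.go"),
    ("handlers/login.go", "middleware/jwt.go"),
    ("middleware/jwt.go", "auth/index.go"),
    ("server.go", "handlers/login.go"),
])

scores = nx.pagerank(graph)
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)  # the handful of files worth loading into the Planner's context
```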
Solution #2: Code Orchestration
Both systems keep intermediate results out of the LLM’s context.
Anthropic: Programmatic Tool Calling
Claude writes Python code that orchestrates tools:
# Claude writes this orchestration code. It runs in the code execution
# environment, where asyncio/json and the tool bindings
# (get_team_members, get_expenses) plus budget are available.
team = await get_team_members("engineering")
expenses = await asyncio.gather(*[
    get_expenses(member["id"], "Q3") for member in team
])

# Only the final result enters Claude's context
exceeded = [
    member for member, items in zip(team, expenses)
    if sum(items) > budget[member["level"]]  # items: list of expense amounts
]
print(json.dumps(exceeded))  # Just 3 people
Impact: Process 2,000+ expense line items, but only 3 results enter context.
Token savings: 37% reduction (43,588 → 27,297 tokens on complex tasks)
GPTCode: Maestro Pipeline
GPTCode’s autonomous mode (gptcode do) implements orchestration as an agent pipeline:
Analyzer (scans 100 files)
↓ outputs: dependency graph only
Planner (creates 5-step plan)
↓ outputs: file list + success criteria only
Editor (edits 3 files)
↓ outputs: diffs only
Validator (runs 50 tests)
↓ outputs: pass/fail summary only
Impact: 100+ files analyzed, but only 3 file paths + test results in final context.
The Validation Gate ensures intermediate bloat never enters the editor:
- Planner outputs: `[auth/index.ts, middleware/jwt.ts, tests/auth.test.ts]`
- Editor receives: only those 3 files
- Validator outputs: `12/12 tests passing`
Key difference: GPTCode’s orchestration is structural (agent handoffs), while Anthropic’s is programmatic (LLM-written code).
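A minimal sketch of a structural handoff, with illustrative types: each stage does heavy work internally but passes only a compact summary forward:

```python
# Sketch of structural handoffs: intermediate bloat stays inside each
# stage; only a small, typed payload crosses the agent boundary.
from dataclasses import dataclass

@dataclass
class Handoff:
    files: list[str]  # the only payload passed forward
    summary: str      # e.g. "12/12 tests passing"

def analyzer(task: str) -> Handoff:
    # Scans 100+ files internally; emits only the relevant paths.
    return Handoff(files=["auth/index.ts", "middleware/jwt.ts",
                          "tests/auth.test.ts"],
                   summary="dependency graph built")

def validator(files: list[str]) -> Handoff:
    # Runs the full test suite internally; emits only pass/fail.
    return Handoff(files=files, summary="12/12 tests passing")

plan = analyzer("add authentication")
result = validator(plan.files)
print(result.summary)  # 12/12 tests passing
```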
Solution #3: Parameter Accuracy
This is where Anthropic’s innovation shines—and where GPTCode has room to improve.
Anthropic: Tool Use Examples
JSON Schema defines structure, but not usage patterns. Anthropic adds concrete examples:
{
"name": "create_ticket",
"input_schema": {
"properties": {
"title": {"type": "string"},
"due_date": {"type": "string"},
"reporter": {
"properties": {
"id": {"type": "string"},
"contact": {"properties": {...}}
}
}
}
},
"input_examples": [
{
"title": "Login page returns 500",
"priority": "critical",
"due_date": "2024-11-06",
"reporter": {
"id": "USR-12345",
"contact": {"email": "jane@acme.com"}
}
},
{
"title": "Add dark mode",
"labels": ["feature-request"],
"reporter": {"id": "USR-67890"}
}
]
}
From these examples, Claude learns:
- Date format: `YYYY-MM-DD`
- ID convention: `USR-XXXXX`
- When to include optional fields (contact info for critical bugs, not for features)
Result: 72% → 90% accuracy on complex parameters
GPTCode: Validation + Feedback (Current)
GPTCode currently relies on:
- File validation: Prevents unintended file creation
- Success criteria: Test-based verification
- Auto-recovery: Switches models when validation fails
This catches errors after the fact, but doesn’t prevent them upfront.
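A sketch of that validate-and-recover loop (the function names and fallback order are assumptions, not GPTCode's actual implementation):

```python
# Sketch of after-the-fact recovery: validate the edit, and if it
# fails, discard the diff and retry with a stronger model.
MODELS = ["fast-model", "stronger-model"]  # fallback order is an assumption

def run_with_recovery(task: str, edit, validate) -> bool:
    for model in MODELS:
        diff = edit(task, model=model)
        if validate(diff):  # e.g. tests pass and lints are clean
            return True
        # Validation failed: escalate to the next model.
    return False

ok = run_with_recovery(
    "add authentication",
    edit=lambda task, model: f"diff from {model}",
    validate=lambda diff: "stronger" in diff,  # stub: first model fails
)
print(ok)  # True -- recovered by switching models
```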
What We’re Adding: Concrete Examples in Prompts
We’re borrowing Anthropic’s best idea and adding explicit examples to each agent:
Analyzer agent:
Example: Analyzing Go authentication code
Input: "How does user authentication work?"
Output:
- Files found: [auth/index.go, middleware/jwt.go, handlers/login.go]
- Dependencies: jwt.go → auth.go → handlers.go
- PageRank scores: [0.82, 0.71, 0.45]
Planner agent:
Example: Plan for adding authentication
Task: "add user authentication"
Output:
Phase 1: Core authentication
- Create: auth/handler.go (login, logout, verify)
- Modify: server.go (add auth middleware)
Phase 2: JWT implementation
- Create: auth/jwt.go (sign, verify tokens)
- Modify: middleware/ (auth middleware)
Success criteria:
- Tests pass: auth_test.go
- Lints clean
Editor agent:
Example: File edit using search/replace
File: auth/handler.go
Change: Add JWT verification
<<<<<<< SEARCH
func VerifyToken(token string) bool {
// TODO: implement
return false
}
=======
func VerifyToken(token string) (*Claims, error) {
claims := &Claims{}
parsed, err := jwt.ParseWithClaims(token, claims, keyFunc)
if err != nil || !parsed.Valid {
return nil, err
}
return claims, nil
}
>>>>>>> REPLACE
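Mechanically, these examples are just few-shot blocks assembled into each agent's system prompt; a sketch of that assembly, with illustrative template wording:

```python
# Sketch: embedding concrete examples into an agent's system prompt.
# The template text is illustrative, not GPTCode's actual prompt.
EDITOR_EXAMPLES = [
    "Example: File edit using search/replace\n"
    "File: auth/handler.go\n"
    "<<<<<<< SEARCH ... >>>>>>> REPLACE",
]

def build_system_prompt(role: str, examples: list[str]) -> str:
    shots = "\n\n".join(examples)
    return (f"You are the {role} agent.\n\n{shots}\n\n"
            "Follow the formats shown above exactly.")

print(build_system_prompt("Editor", EDITOR_EXAMPLES))
```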
Expected impact: 10-20% accuracy improvement, fewer retry loops.
Conclusion: Architecture Matters
Advanced tool use isn’t just about features—it’s about system design.
| | Anthropic | GPTCode |
|---|---|---|
| Philosophy | Single mega-agent + dynamic discovery | Multi-agent specialization |
| Discovery | Tool Search (on-demand) | Agent Routing (1ms classifier) |
| Orchestration | Programmatic (Python scripts) | Structural (pipeline) |
| Context Reduction | 85% (77K → 8.7K) | 80% (100K → 20K) |
| Flexibility | High (add tools dynamically) | Medium (agents designed upfront) |
| Latency | +search overhead per task | Minimal (1ms routing) |
Both achieve roughly 80-85% context reduction through different paths:
- On-demand flexibility (Anthropic) vs. Upfront specialization (GPTCode)
- Dynamic tool discovery vs. Static agent pipeline
We’re adopting Anthropic’s Tool Use Examples pattern while keeping our fast, specialized architecture.
Try It
gptcode do "add user authentication"
# 1ms routing → Analyzer → Planner → Editor → Validator
# 100K codebase → 20K relevant context → 3 files modified
# Total cost: $0.000556 (vs. $0.01+ with full context)
# Auto-retry with model switching if tests fail