Dependency Graph: 5x Token Reduction Through Smart Context
In Context Engineering, we discussed managing context windows. Today, we’re diving into how GPTCode achieves 5x token reduction through dependency graph analysis.
The Problem: Context Overload
When you ask an AI coding assistant a question, it needs context. But how much?
Naive approach: Send the entire codebase
- 100,000+ tokens
- Expensive ($0.30-$3.00 per query with large models)
- Slow (processing time scales with context)
- Noisy (LLM sees irrelevant code)
Traditional approach: Keyword matching
- Search for files containing query terms
- Still sends too much or too little
- Misses important dependencies
The Solution: Dependency Graph + PageRank
GPTCode analyzes your codebase structure to provide only relevant context to the LLM.
How It Works
1. Build dependency graph from imports/requires
   - go: import statements
   - python: import/from statements
   - js/ts: import/require statements
   - ruby: require statements
   - rust: use statements
2. Run PageRank to identify central files
   - Files that many other files depend on = higher importance
   - Example: config.go, types.go, utils.go
3. Match query terms to relevant files
   - "authentication" → auth.go, middleware.go, session.go
4. Expand to 1-hop neighbors
   - Include direct dependencies and dependents
   - auth.go → user.go, token.go, db.go
5. Select top N most relevant files (default: 5)
   - Ranked by PageRank score + query relevance (see the sketch below)
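To make steps 3-5 concrete, here is a minimal Go sketch. It assumes the graph is an adjacency list mapping each file to the files it imports, and it simply adds the two scores together; the names and scoring details are illustrative, not the exact production code.

import "sort"

// selectContext sketches steps 3-5: match files to the query, expand to
// 1-hop neighbors, then keep the top N by PageRank + query relevance.
func selectContext(edges map[string][]string, pr, queryScore map[string]float64, n int) []string {
    // Step 3: seed with files that match at least one query term.
    seeds := map[string]bool{}
    for file, s := range queryScore {
        if s > 0 {
            seeds[file] = true
        }
    }
    // Step 4: expand to direct dependencies and dependents of each seed.
    candidates := map[string]bool{}
    for file := range seeds {
        candidates[file] = true
        for _, dep := range edges[file] { // files the seed imports
            candidates[dep] = true
        }
    }
    for file, deps := range edges { // files that import a seed
        for _, dep := range deps {
            if seeds[dep] {
                candidates[file] = true
            }
        }
    }
    // Step 5: rank by combined score and keep the top N.
    ranked := make([]string, 0, len(candidates))
    for file := range candidates {
        ranked = append(ranked, file)
    }
    sort.Slice(ranked, func(i, j int) bool {
        return pr[ranked[i]]+queryScore[ranked[i]] > pr[ranked[j]]+queryScore[ranked[j]]
    })
    if len(ranked) > n {
        ranked = ranked[:n]
    }
    return ranked
}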
Example: Authentication Bug
Your query: gptcode chat "fix bug in authentication"
Without graph (naive):
- Sends: All 142 files (100,000 tokens)
- Cost: $0.30 with GPT-4 Turbo
- Time: 8 seconds to process
With graph (smart):
- Analyzes: Graph of 142 nodes, 287 edges
- Selects: 5 files (18,000 tokens)
  - internal/auth/handler.go (score: 0.842)
  - internal/auth/middleware.go (score: 0.731)
  - internal/models/user.go (score: 0.689)
  - internal/auth/session.go (score: 0.654)
  - config/security.go (score: 0.612)
- Cost: $0.05 (83% savings)
- Time: 1.5 seconds (81% faster)
Result: Better answer, lower cost, faster response.
Real-World Impact
Token Reduction
Typical codebase (50-200 files):
- Before: 80,000-120,000 tokens per query
- After: 15,000-25,000 tokens per query
- Reduction: 5-6x fewer tokens
Cost Savings
For 1,000 queries per month:
Without graph:
- 100M tokens × $0.30/1M = $30/month (GPT-4 Turbo input)
With graph:
- 20M tokens × $0.30/1M = $6/month (GPT-4 Turbo input)
Savings: $24/month (80% reduction)
With free models (OpenRouter), both costs → $0, but the graph still provides better responses through focused context.
Better Responses
More tokens ≠ better answers.
The problem with too much context:
- LLM sees noise and signal equally
- Attention diluted across irrelevant code
- Higher chance of hallucination
Smart context wins:
- LLM focuses on relevant code only
- Better reasoning with less distraction
- More accurate suggestions
Example Comparison
Query: “Add rate limiting to login endpoint”
Without graph (entire codebase):
I see you have multiple authentication methods across
different files. I recommend...
[suggests changes to OAuth, SAML, JWT handlers]
[suggests modifying test files]
❌ Unfocused, touches too many files
With graph (auth + middleware only):
Looking at your auth/handler.go and middleware stack,
here's a focused approach:
1. Add rate limiter middleware before auth
2. Configure in security.go
3. Update handler to return 429 on limit exceeded
✓ Focused, actionable, correct scope
Language Support
Currently supported:
- Go: import statements
- Python: import, from ... import
- JavaScript/TypeScript: import, require
- Ruby: require, require_relative
- Rust: use, mod
More languages coming soon (Java, C++, etc.)
How to Use It
Automatic (Default)
Graph analysis works transparently in gptcode chat:
gptcode chat "explain how routing works"
The system:
- Builds graph (cached, ~100ms first time)
- Finds relevant files
- Sends focused context to LLM
- Returns answer
Debug Mode
See what the graph is doing:
GPTCODE_DEBUG=1 gptcode chat "your query"
Output:
[GRAPH] Building dependency graph...
[GRAPH] Built graph: 142 nodes, 287 edges
[GRAPH] Selected 5 files:
[GRAPH] 1. internal/agents/router.go (score: 0.842)
[GRAPH] 2. internal/llm/provider.go (score: 0.731)
[GRAPH] 3. cmd/gptcode/main.go (score: 0.689)
[GRAPH] 4. internal/modes/chat.go (score: 0.654)
[GRAPH] 5. internal/config/setup.go (score: 0.612)
Configuration
Control how many files to include:
# View current setting
gptcode config get defaults.graph_max_files
# Change to 3 files (more focused)
gptcode config set defaults.graph_max_files 3
# Change to 10 files (broader context)
gptcode config set defaults.graph_max_files 10
Recommendation: Start with 5 (default), adjust based on codebase size.
Disable (if needed)
GPTCODE_GRAPH=false gptcode chat "query"
Technical Details
Graph Building
The system uses Go’s AST parser to extract imports:
import (
    "go/parser"
    "go/token"
    "strings"
)

// parseGoImports extracts the import paths declared in a Go source file.
func parseGoImports(filePath string) []string {
    fset := token.NewFileSet()
    node, err := parser.ParseFile(fset, filePath, nil, parser.ImportsOnly)
    if err != nil {
        return nil // skip files that fail to parse
    }
    var imports []string
    for _, imp := range node.Imports {
        imports = append(imports, strings.Trim(imp.Path.Value, "\""))
    }
    return imports
}
Similar parsers exist for Python, JS, Ruby, and Rust.
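One way such a parser can work, shown here for Python purely as an illustration (the real parsers may differ), is a regex pass over the source:

import (
    "os"
    "regexp"
)

// pythonImportRe matches "import foo" and "from foo import bar" at the
// start of a line. Simplified: a multi-module line ("import a, b") only
// yields its first module.
var pythonImportRe = regexp.MustCompile(`(?m)^\s*(?:from\s+([\w.]+)\s+import|import\s+([\w.]+))`)

// parsePythonImports extracts module names imported by a Python file.
func parsePythonImports(filePath string) []string {
    content, err := os.ReadFile(filePath)
    if err != nil {
        return nil // skip unreadable files
    }
    var imports []string
    for _, m := range pythonImportRe.FindAllStringSubmatch(string(content), -1) {
        if m[1] != "" {
            imports = append(imports, m[1]) // "from X import ..."
        } else {
            imports = append(imports, m[2]) // "import X"
        }
    }
    return imports
}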
PageRank Algorithm
Classic PageRank with damping factor 0.85:
PR(A) = (1-d)/N + d * Σ(PR(Ti)/C(Ti))
Where:
- d = 0.85 (damping factor)
- N = total nodes
- Ti = nodes linking to A
- C(Ti) = outbound links from Ti
Runs for 20 iterations, converges in ~50ms for typical codebases.
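A minimal sketch of that iteration, assuming the graph is stored as an adjacency list of import edges (variable names are illustrative):

// pageRank scores files in a dependency graph. edges maps each file to the
// files it imports, so rank flows toward heavily-imported files such as
// config.go or types.go.
func pageRank(edges map[string][]string, damping float64, iterations int) map[string]float64 {
    n := float64(len(edges))
    pr := make(map[string]float64, len(edges))
    for file := range edges {
        pr[file] = 1.0 / n // uniform initial score
    }
    for i := 0; i < iterations; i++ {
        next := make(map[string]float64, len(edges))
        for file := range edges {
            next[file] = (1 - damping) / n // the (1-d)/N term
        }
        for file, outbound := range edges {
            if len(outbound) == 0 {
                continue // dangling file: its rank is simply not redistributed
            }
            share := damping * pr[file] / float64(len(outbound)) // d * PR(Ti)/C(Ti)
            for _, target := range outbound {
                if _, ok := next[target]; ok { // ignore imports outside the project
                    next[target] += share
                }
            }
        }
        pr = next
    }
    return pr
}

With the values above, this would be called as pageRank(edges, 0.85, 20).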
Query Matching
Simple term frequency:
// scoreFileForQuery counts how often each query term appears in the file.
func scoreFileForQuery(file string, queryTerms []string) float64 {
    content, err := os.ReadFile(file)
    if err != nil {
        return 0 // unreadable files contribute nothing
    }
    score := 0.0
    for _, term := range queryTerms {
        score += float64(strings.Count(string(content), term))
    }
    return score
}
Combined with PageRank to get final relevance score.
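The exact blend matters less than having both signals; as a simple illustration (the weights are arbitrary, and both scores are assumed normalized to 0-1):

// relevance blends structural importance with query match strength.
// The 0.6/0.4 split is illustrative, not a fixed value.
func relevance(pageRankScore, queryScore float64) float64 {
    return 0.6*pageRankScore + 0.4*queryScore
}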
Caching
The graph is cached in memory and rebuilt only when (see the staleness-check sketch below):
- Files added/removed
- Imports modified
- 5 minutes elapsed (stale check)
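A minimal sketch of such a staleness check; the type and field names are illustrative, and detecting modified imports via file modification times is a simplification:

import (
    "os"
    "time"
)

// cachedGraph wraps a built graph plus the metadata needed for invalidation.
// edges maps each file to the files it imports.
type cachedGraph struct {
    edges     map[string][]string
    builtAt   time.Time
    fileCount int
}

// stale reports whether the cache should be rebuilt: the file set changed
// size, a file was modified after the build, or the 5-minute TTL expired.
func (c *cachedGraph) stale(files []string) bool {
    if time.Since(c.builtAt) > 5*time.Minute {
        return true // periodic stale check
    }
    if len(files) != c.fileCount {
        return true // files added or removed
    }
    for _, f := range files {
        if info, err := os.Stat(f); err == nil && info.ModTime().After(c.builtAt) {
            return true // file (and possibly its imports) modified
        }
    }
    return false
}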
Performance
Benchmarks on medium codebase (150 files):
Graph build: 98ms (first time)
Graph load (cached): 2ms
File selection: 5ms
Total overhead: 7ms per query (cached)
The 5x token reduction more than compensates for the 7ms overhead.
Future Enhancements
1. Semantic Search Integration
Combine graph analysis with embeddings:
- Graph: structural relevance
- Embeddings: semantic similarity
- Hybrid score for best of both
2. Function-Level Granularity
Currently file-level. Planned:
- Extract function definitions
- Build function-level dependency graph
- Include only relevant functions, not entire files
3. Change Impact Analysis
Predict which files might break:
gptcode impact "change auth signature"
# Shows: Files that import auth + test files
4. Multi-Language Projects
Better handling of polyglot codebases:
- Go backend + TypeScript frontend
- Python ML + Go API
- Cross-language dependency tracking
Comparison: Graph vs Embeddings
| Aspect | Dependency Graph | Semantic Embeddings |
|---|---|---|
| Basis | Code structure | Content meaning |
| Speed | Fast (~5ms) | Slower (~100ms) |
| Accuracy | High for refactoring | High for exploration |
| Setup | Zero | Requires model |
| Cost | Free | Free (local models) |
| Best for | “fix bug in X” | “how does X work?” |
GPTCode’s approach: Start with graph (fast, structural), add embeddings later for semantic queries.
Getting Started
1. Update GPTCode
cd ~/gptcode
git pull origin main
make install
2. Try It Out
# Enable debug to see graph in action
GPTCODE_DEBUG=1 gptcode chat "explain authentication flow"
3. Observe the Magic
Watch as GPTCode:
- Builds the graph (first time only)
- Selects 5 relevant files
- Provides focused, accurate answer
4. Compare
Try the same query with graph disabled:
GPTCODE_GRAPH=false gptcode chat "explain authentication flow"
Notice:
- Slower response
- Less focused answer
- Higher token usage (check with gptcode stats)
Best Practices
Let the Cache Work
Don’t disable the graph unless you’re debugging. The cache makes it nearly free after the first build.
Use Descriptive Queries
Better queries → better file selection:
- ✓ “fix bug in authentication middleware”
- ✗ “fix bug”
Check Debug Output
If answers seem off, use GPTCODE_DEBUG=1 to see which files were selected. Adjust graph_max_files if needed.
Combine with Other Features
Graph works great with:
- ML intent classification: Fast routing to right agent
- Model selection: Optimal model for each query
- Feedback system: Learn from corrections
Community Examples
Real user queries where graph made a difference:
Query: “Add logging to database operations”
- Selected: db.go, logger.go, config.go (3 files, 8k tokens)
- Result: Precise changes, 2 minutes to implement
Query: “Refactor error handling”
- Selected: errors.go, handler.go, middleware.go, utils.go, types.go (5 files, 15k tokens)
- Result: Comprehensive refactoring plan, consistent across codebase
Query: “Why is the API slow?”
- Selected: handler.go, db.go, cache.go, middleware.go (4 files, 12k tokens)
- Result: Identified N+1 query issue, suggested caching
Summary
Dependency graph delivers:
- 5x token reduction (100k → 20k tokens)
- 80% cost savings on cloud models
- Better responses through focused context
- Faster processing (less context = quicker)
- Automatic (zero configuration)
Try it today:
gptcode chat "your question about the codebase"
References
- Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab.
Have questions about dependency graphs? Join our GitHub Discussions
See Also
- Context Engineering - Making AI work in real codebases
- Agent Routing vs Tool Search - Context reduction strategies
- Intelligent Model Selection - Cost optimization