Optimal Groq Configurations: Technical Analysis & Profile Guide
Optimal Groq Configurations: Technical Analysis & Profile Guide
Groq offers blazing-fast inference speeds with their LPU technology. This guide provides rigorously analyzed configurations based on real model pricing, role-specific requirements, and actual I/O patterns.
Understanding Agent Roles & I/O Patterns
Each agent has different I/O characteristics that dramatically affect cost:
| Agent | I/O Ratio | Critical Factor | Why |
|---|---|---|---|
| Router | Input » Output | Speed | Called most frequently, minimal output |
| Query | Balanced | Reasoning | Analyzes code, balanced I/O |
| Editor | Output » Input (1:5) | Output Cost | Generates lots of code, output dominates |
| Research | Variable | Tools + Quality | Complex queries with external APIs |
Key Insight: Editor Output Cost Dominates
For a typical editing session:
- Input: 1,000 tokens (prompt + context)
- Output: 5,000 tokens (generated code)
Total cost = (Input × $I) + (Output × 5 × $O)
This means output price is 5x more important than input price for editors.
Budget Profile: Cost-Effective Configuration
Use: gptcode profile use groq.budget
backend:
groq:
profiles:
budget:
agent_models:
router: llama-3.1-8b-instant
query: openai/gpt-oss-120b
editor: qwen/qwen3-32b
research: groq/compound
Estimated cost: $0.85/month for 3M tokens (~150k/day)
| Usage Level | Tokens/Month | Cost/Month |
|---|---|---|
| Light | 1M (~50k/day) | $0.28 |
| Moderate | 3M (~150k/day) | $0.85 |
| Heavy | 5M (~250k/day) | $1.42 |
Technical Rationale
Router: llama-3.1-8b-instant ($0.05/$0.08)
- 8B sufficient for routing decisions
- Fastest inference on Groq LPU
- Called most frequently → speed critical
- Cheapest option available
Query: openai/gpt-oss-120b ($0.15/$0.60)
- 120B parameters for deep reasoning
- Strong code comprehension
- Balanced I/O → reasonable cost
Editor: qwen/qwen3-32b ($0.29/$0.59) ← KEY CHOICE
- Total cost: $0.29 + ($0.59 × 5) = $3.24/1M
- Coding-specialized: Qwen series optimized for code
- 40% cheaper than llama-3.3-70b ($4.54/1M total)
- 32B sufficient for most editing tasks
- Why NOT llama-3.3-70b? Generic “versatile” model costs more and isn’t coding-focused
Research: groq/compound ($0.15/$0.60 base)
- GPT-OSS-120B + Llama 4 Scout internally
- Built-in tools: web search, code execution, browser
- Base pricing + tool costs ($5-8/1000 searches)
- NOT free - conservative estimate uses highest base price
Performance Profile: Quality-First Configuration
Use: gptcode profile use groq.performance
backend:
groq:
profiles:
performance:
agent_models:
router: llama-3.1-8b-instant
query: openai/gpt-oss-120b
editor: moonshotai/kimi-k2-instruct
research: groq/compound
Estimated cost: $2.41/month for 3M tokens (~150k/day)
| Usage Level | Tokens/Month | Cost/Month |
|---|---|---|
| Light | 1M (~50k/day) | $0.80 |
| Moderate | 3M (~150k/day) | $2.41 |
| Heavy | 5M (~250k/day) | $4.01 |
Technical Rationale
Router: llama-3.1-8b-instant ($0.05/$0.08) ← Still fast!
- Why NOT llama-3.3-70b? Speed matters more than size for routing
- 8B sufficient for intent classification
- 70B would add latency without meaningful benefit
Query: openai/gpt-oss-120b ($0.15/$0.60)
- Same as budget (already optimal)
- 120B parameters for reasoning
- No better option at this price point
Editor: moonshotai/kimi-k2-instruct ($1.00/$3.00) ← Premium choice
- Total cost: $1.00 + ($3.00 × 5) = $16/1M
- 5x more expensive than budget, but worth it for:
- Large context window (262K vs 131K)
- Superior quality for complex refactoring
- Better handling of multi-file edits
- Why NOT llama-3.3-70b? More expensive ($4.54/1M) and lower quality than Kimi K2
Research: groq/compound ($0.15/$0.60 base)
- Same as budget (best option with tools)
- Unified interface vs managing separate APIs
Available Models Analysis
Real pricing from Groq API (verified Nov 2025):
| Model | Size | Input $/1M | Output $/1M | Best For |
|---|---|---|---|---|
| llama-3.1-8b-instant | 8B | $0.05 | $0.08 | Router (speed) |
| openai/gpt-oss-safeguard-20b | 20B | $0.075 | $0.30 | Budget editing |
| meta-llama/llama-4-scout-17b | 17B | $0.11 | $0.34 | Tool use |
| openai/gpt-oss-120b | 120B | $0.15 | $0.60 | Query/reasoning |
| qwen/qwen3-32b | 32B | $0.29 | $0.59 | Coding-focused |
| llama-3.3-70b-versatile | 70B | $0.59 | $0.79 | Generic tasks |
| moonshotai/kimi-k2-instruct | Large | $1.00 | $3.00 | Premium quality |
| groq/compound | Composite | $0.15* | $0.60* | Research + tools |
*Compound: Base model + tool costs ($5-8/1000 searches, $0.18/hour code exec)
Editor Cost Comparison (1:5 I/O ratio)
| Model | Input | Output×5 | Total/1M | Notes |
|---|---|---|---|---|
| qwen/qwen3-32b | $0.29 | $2.95 | $3.24 | Coding-specialized |
| llama-3.3-70b | $0.59 | $3.95 | $4.54 | Generic, 40% more expensive |
| moonshotai/kimi-k2 | $1.00 | $15.00 | $16.00 | Premium, 5x more expensive |
Key Insights
1. Router Should Stay Fast (Even in Performance Profile)
❌ Wrong: Use llama-3.3-70b for router in performance profile ✅ Right: Use llama-3.1-8b-instant even in performance
Why?
- Router is called most frequently
- 8B is sufficient for intent classification
- 70B adds latency without meaningful benefit
- Speed > size for this role
2. Editor Output Cost Dominates
❌ Wrong: Choose editor by parameter count ✅ Right: Calculate total cost with 1:5 I/O ratio
Example:
- llama-3.3-70b: $0.59 + ($0.79 × 5) = $4.54/1M
- qwen/qwen3-32b: $0.29 + ($0.59 × 5) = $3.24/1M
Result: 32B coding-focused model is 40% cheaper AND better for code
3. Compound Has Composite Pricing
❌ Wrong: “groq/compound is free” ✅ Right: Base model ($0.15/$0.60) + tool costs
Tool costs:
- Basic web search: $5/1000 requests
- Advanced search: $8/1000 requests
- Code execution: $0.18/hour
- Browser automation: $0.08/hour
Still cost-effective: Unified interface vs managing separate APIs
4. Coding-Specialized 32B > Generic 70B
❌ Wrong: Bigger model = better for code ✅ Right: Qwen series specialized for coding
Budget profile: qwen/qwen3-32b beats llama-3.3-70b:
- 40% cheaper
- Coding-focused architecture
- Sufficient parameters for editing
Performance profile: moonshotai/kimi-k2-instruct for complex tasks:
- Large context (262K)
- Superior quality
- Worth 5x cost premium
Setting Up Profiles
Option 1: Use Pre-configured Profiles
# Budget profile ($2-5/month)
gptcode profile use groq.budget
# Performance profile ($10-20/month)
gptcode profile use groq.performance
# Verify current profile
gptcode profile
Option 2: Configure Manually
Edit ~/.gptcode/setup.yaml:
defaults:
backend: groq
profile: budget # or performance
backend:
groq:
type: openai
base_url: https://api.groq.com/openai/v1
profiles:
budget:
agent_models:
router: llama-3.1-8b-instant
query: openai/gpt-oss-120b
editor: qwen/qwen3-32b
research: groq/compound
performance:
agent_models:
router: llama-3.1-8b-instant
query: openai/gpt-oss-120b
editor: moonshotai/kimi-k2-instruct
research: groq/compound
Understanding Groq Compound
Compound is not a single model - it’s a system that dynamically routes between:
- GPT-OSS-120B: Base reasoning model
- Llama 4 Scout 17B: Tool use specialist
Built-in Tools
- Web search (basic & advanced)
- Code execution
- Browser automation
- Website visits
Pricing Structure
Base: $0.15 input / $0.60 output (GPT-OSS-120B rate)
Tools (per-use charges):
- Basic web search: $5/1000 requests
- Advanced search: $8/1000 requests
- Visit website: $1/1000 visits
- Code execution: $0.18/hour
- Browser automation: $0.08/hour
Conservative estimate: Use $0.15/$0.60 as baseline + monitor tool usage
Why Use Compound?
- Unified interface (no separate API management)
- Automatic model routing
- Integrated tools for research
- Cost-effective vs separate services
Migration Guide
From Old Configs
If you have old profile configurations, update them:
❌ Old (suboptimal):
agent_models:
router: llama-3.3-70b-versatile # too slow
editor: llama-3.3-70b-versatile # not coding-focused
✅ New (optimized):
agent_models:
router: llama-3.1-8b-instant # faster
editor: qwen/qwen3-32b # coding-specialized, cheaper
Update Model Catalog
After changing profiles, update catalog to see available models:
gptcode model update --all
gptcode model list groq
Monitoring & Optimization
Check Current Configuration
gptcode profile # Show current backend and profile
gptcode profile show groq.budget # Show specific profile details
Monitor Usage
- Visit console.groq.com for usage stats
- Track which agent is called most (usually router)
- Identify bottlenecks in your workflow
Optimization Tips
- Start with budget, upgrade specific agents as needed
- Router should always be fast (llama-3.1-8b-instant)
- Editor output cost dominates - calculate with 1:5 ratio
- Query balance - gpt-oss-120b is sweet spot
- Compound is not free - monitor tool usage
Common Mistakes
❌ Using 70B for Router
Problem: Adds latency without benefit Fix: Use llama-3.1-8b-instant even in performance profile
❌ Choosing Editor by Parameters
Problem: Ignores output cost (5x more important) Fix: Calculate total cost, choose coding-specialized models
❌ Assuming Compound is Free
Problem: Unexpected tool costs Fix: Budget for $0.15/$0.60 base + $5-8/1000 searches
❌ Generic 70B Over Specialized 32B
Problem: Higher cost, not coding-focused Fix: qwen/qwen3-32b beats llama-3.3-70b for budget editing
Cost Comparison
Groq vs Claude Pro Max
Claude Pro Max: $200/month for ~4.8M tokens (~800 prompts/session)
Groq (5M tokens/month, ~250k/day):
- Budget: $1.42/month (~99% cheaper)
- Performance: $4.01/month (~98% cheaper)
Why So Cheap?
- Groq LPU efficiency: Hardware-level optimization
- Open-source models: No model training costs passed to users
- Competitive pricing: Groq subsidizing to gain market share
- Smart agent routing: 40% of calls use cheapest model (router)
Usage Breakdown (3M tokens/month typical)
Budget Profile ($0.85/month):
- Router (40%): $0.06 - llama-3.1-8b-instant
- Query (30%): $0.34 - openai/gpt-oss-120b
- Editor (25%): $0.40 - qwen/qwen3-32b
- Research (5%): $0.06 - groq/compound
Performance Profile ($2.41/month):
- Router (40%): $0.06 - llama-3.1-8b-instant (same)
- Query (30%): $0.34 - openai/gpt-oss-120b (same)
- Editor (25%): $1.95 - moonshotai/kimi-k2-instruct (5x more)
- Research (5%): $0.06 - groq/compound (same)
Key insight: Editor accounts for 46% of budget cost but 81% of performance cost.
Questions or better configurations? Share on GitHub Discussions!