Preparing for Autonomous Execution: File Validation, Telemetry & Intelligence
Today we’re announcing a set of foundational improvements that bring gptcode closer to autonomous execution—the ability to run implementation tasks automatically when repository issues are created, with built-in validation, telemetry, and intelligent model selection.
The Vision
Imagine opening a GitHub issue with:
Title: Add user authentication
Description: Implement JWT-based authentication with refresh tokens
And having gptcode:
- Automatically detect the new issue
- Create an implementation plan
- Execute the plan with file validation
- Verify the changes work
- Track telemetry for continuous improvement
- Open a PR with the implementation
This release lays the groundwork for that future.
What’s New
1. File Validation & Tracking
The Problem: In autonomous execution, the agent could modify files outside the scope of the plan, causing unintended side effects.
The Solution: Both write_file and apply_patch now:
- Enforce allowlist validation (only modify planned files)
- Return the actual files modified
- Pass real modifications to the Validator (not just the plan’s allowlist)
// Before: No validation on apply_patch
result := tools.ApplyPatch(call, workdir)
// After: Validated and tracked
result := tools.ApplyPatch(call, workdir)
// result.ModifiedFiles = ["internal/auth/handler.go"]
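For context, the result value plausibly looks something like the following sketch; the exact struct lives in the tools package and may differ:
// Illustrative sketch only: a plausible shape for the tool result,
// inferred from the result.ModifiedFiles usage above.
type ToolResult struct {
    Output        string   // text fed back to the model
    ModifiedFiles []string // files the tool actually touched
    Err           error    // non-nil if the tool failed
}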
Why it matters:
- Prevents scope creep in autonomous execution
- Validator sees actual changes, not assumptions
- Better error messages when validation fails
2. Explicit Success Criteria in Plans
The Problem: Plans had vague validation steps like “verify it worked.”
The Solution: The Planner now requires 2-5 specific, testable success criteria:
## Success Criteria
- Tests pass: make test
- File internal/auth/handler.go contains JWT validation
- Command curl /api/auth/login returns 401 without token
- Documentation updated in docs/auth.md
Why it matters:
- Validator can check concrete conditions
- Plans are more actionable
- Autonomous execution knows when to stop
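Because each criterion names a concrete command or file, a validator can check it mechanically. Here is a minimal sketch of how command-style criteria might be executed; the CriterionResult type and checkCommandCriteria function are hypothetical, not gptcode’s actual API:
import "os/exec"

// CriterionResult is a hypothetical result record for one success criterion.
type CriterionResult struct {
    Criterion string
    Passed    bool
    Output    string
}

// checkCommandCriteria runs each command-style criterion in workdir and
// records whether it exited successfully.
func checkCommandCriteria(commands []string, workdir string) []CriterionResult {
    var results []CriterionResult
    for _, line := range commands {
        cmd := exec.Command("sh", "-c", line)
        cmd.Dir = workdir
        out, err := cmd.CombinedOutput()
        results = append(results, CriterionResult{
            Criterion: line,
            Passed:    err == nil,
            Output:    string(out),
        })
    }
    return results
}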
3. Model Capability Catalog
The Problem: Model capabilities (tool-calling, cost, speed) were hardcoded in multiple places.
The Solution: A new ModelCatalog centralizes model metadata:
catalog := intelligence.NewModelCatalog()
models := catalog.GetModelsForAgent("editor")
for _, model := range models {
    fmt.Printf("%s: $%.2f/1M, %d TPS, Functions: %v\n",
        model.Name, model.CostPer1M, model.SpeedTPS, model.SupportsFunctions)
}
Fallback support:
// Unknown model? Returns sensible defaults
info := catalog.GetModelInfo("new-backend", "new-model")
// info.CostPer1M = 1.0 (default)
// info.SpeedTPS = 300 (default)
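A minimal sketch of how that fallback could be implemented, assuming catalog keys use the backend/model format seen in the usage stats later in this post (the real method may differ):
// Sketch only: lookup with defaults matching the comment above.
func (c *ModelCatalog) GetModelInfo(backend, name string) ModelInfo {
    if info, ok := c.Models[backend+"/"+name]; ok {
        return info
    }
    // Unknown model: return sensible defaults instead of failing.
    return ModelInfo{
        Backend:           backend,
        Name:              name,
        SupportsFunctions: true, // assumption: optimistic default
        CostPer1M:         1.0,
        SpeedTPS:          300,
    }
}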
Why it matters:
- Easy to add new models (update catalog only)
- Consistent capabilities across the system
- Graceful handling of unknown models
4. OpenTelemetry-Based Telemetry
The Problem: No visibility into what the system does during autonomous execution.
The Solution: Implemented OpenTelemetry-based telemetry with:
Step tracking:
tel := telemetry.NewTelemetry()
event := telemetry.StepEvent{
    StepIndex:    0,
    StepName:     "Implement Authentication",
    FilesTouched: []string{"auth/handler.go", "auth/middleware.go"},
    Success:      true,
    DurationMs:   2500,
}
tel.RecordStep(ctx, event)
Usage tracking:
tracker := telemetry.NewUsageTracker()
tracker.RecordRequest("openai", "gpt-4", 1500)
stats := tracker.GetStats()
// stats["openai/gpt-4"] = {Requests: 1, Tokens: 1500}
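A minimal sketch of what such a tracker could look like, assuming the backend/model key format shown in the comment above; the field names and types are illustrative:
import "sync"

// UsageStats is an illustrative per-model counter.
type UsageStats struct {
    Requests int
    Tokens   int
}

// UsageTracker aggregates request counts and token usage per backend/model.
type UsageTracker struct {
    mu    sync.Mutex
    stats map[string]UsageStats
}

func NewUsageTracker() *UsageTracker {
    return &UsageTracker{stats: make(map[string]UsageStats)}
}

func (t *UsageTracker) RecordRequest(backend, model string, tokens int) {
    t.mu.Lock()
    defer t.mu.Unlock()
    key := backend + "/" + model // e.g. "openai/gpt-4"
    s := t.stats[key]
    s.Requests++
    s.Tokens += tokens
    t.stats[key] = s
}

// GetStats returns a copy so callers can't mutate internal state.
func (t *UsageTracker) GetStats() map[string]UsageStats {
    t.mu.Lock()
    defer t.mu.Unlock()
    out := make(map[string]UsageStats, len(t.stats))
    for k, v := range t.stats {
        out[k] = v
    }
    return out
}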
Why it matters:
- Observe autonomous execution in real-time
- Track API costs per backend/model
- Debug failures with structured events
- Foundation for the gptcode usage command
Technical Deep Dive
File Validation Flow
// agent/editor.go (abridged: the LLM call that produces resp, argsMap,
// and response happens earlier in the loop and is elided here)
func (e *EditorAgent) Execute(ctx context.Context, history []llm.ChatMessage,
    statusCallback StatusCallback) (string, []string, error) {
    var modifiedFiles []string
    for _, toolCall := range resp.ToolCalls {
        // Validate write_file AND apply_patch against the plan's allowlist
        if toolCall.Name == "write_file" || toolCall.Name == "apply_patch" {
            if err := e.validateFileWrite(argsMap); err != nil {
                // Reject modifications outside the allowlist
                return "", nil, err
            }
        }
        result := tools.ExecuteToolFromLLM(toolCall, e.cwd)
        modifiedFiles = append(modifiedFiles, result.ModifiedFiles...)
    }
    return response, modifiedFiles, nil
}
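The allowlist check itself is elided above. A plausible sketch, assuming the agent keeps the plan’s file list in a hypothetical allowedFiles field and that the tool arguments carry a path key:
import (
    "fmt"
    "path/filepath"
)

// Sketch only: reject any write whose target is not in the plan's allowlist.
func (e *EditorAgent) validateFileWrite(args map[string]any) error {
    path, ok := args["path"].(string) // "path" key is an assumption
    if !ok {
        return fmt.Errorf("tool call is missing a file path argument")
    }
    for _, allowed := range e.allowedFiles { // hypothetical field
        if filepath.Clean(path) == filepath.Clean(allowed) {
            return nil
        }
    }
    return fmt.Errorf("refusing to modify %q: not in the plan's allowlist", path)
}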
Model Catalog Structure
type ModelCatalog struct {
    Models map[string]ModelInfo
}

type ModelInfo struct {
    Backend           string
    Name              string
    SupportsFunctions bool
    CostPer1M         float64
    SpeedTPS          int
    Agents            []string // ["editor", "query", ...]
}
// Usage
catalog := NewModelCatalog()
editorModels := catalog.GetModelsForAgent("editor")
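Given that structure, GetModelsForAgent is essentially a filter over each model’s Agents list; a sketch (the real implementation may differ):
// Sketch only: return every model whose Agents list contains the given agent.
func (c *ModelCatalog) GetModelsForAgent(agent string) []ModelInfo {
    var out []ModelInfo
    for _, info := range c.Models {
        for _, a := range info.Agents {
            if a == agent {
                out = append(out, info)
                break
            }
        }
    }
    return out
}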
Telemetry Integration Points
In Guided Mode:
func (g *GuidedMode) Implement(ctx context.Context, plan string) error {
    tel := telemetry.NewTelemetry()
    start := time.Now()
    // editorAgent, history, and statusCallback are set up earlier (elided)
    _, modifiedFiles, err := editorAgent.Execute(ctx, history, statusCallback)
    tel.RecordStep(ctx, telemetry.StepEvent{
        StepName:     "Implementation",
        FilesTouched: modifiedFiles,
        Success:      err == nil,
        DurationMs:   time.Since(start).Milliseconds(),
    })
    return err
}
In Orchestrated Mode:
func (m *Maestro) executeStep(ctx context.Context, step PlanStep) error {
    tel := telemetry.NewTelemetry()
    _, modifiedFiles, err := editorAgent.Execute(ctx, history, statusCallback)
    tel.RecordStep(ctx, telemetry.StepEvent{
        StepIndex:    m.CurrentStepIdx,
        StepName:     step.Title,
        FilesTouched: modifiedFiles,
        Success:      err == nil,
    })
    return err
}
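Under the hood, RecordStep presumably maps a StepEvent onto an OpenTelemetry span. A minimal sketch using the standard go.opentelemetry.io/otel API; the attribute names are illustrative, not gptcode’s actual schema:
import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

// Sketch only: emit one span per plan step, with the event fields attached
// as span attributes.
func (t *Telemetry) RecordStep(ctx context.Context, e StepEvent) {
    _, span := otel.Tracer("gptcode").Start(ctx, e.StepName)
    defer span.End()
    span.SetAttributes(
        attribute.Int("step.index", e.StepIndex),
        attribute.Bool("step.success", e.Success),
        attribute.Int64("step.duration_ms", e.DurationMs),
        attribute.StringSlice("step.files_touched", e.FilesTouched),
    )
}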
Testing
All improvements come with comprehensive unit tests:
Catalog tests:
$ go test -v ./internal/intelligence/...
=== RUN TestNewModelCatalog
--- PASS: TestNewModelCatalog
=== RUN TestGetModelsForAgent
--- PASS: TestGetModelsForAgent
=== RUN TestGetModelInfo
--- PASS: TestGetModelInfo
PASS
Telemetry tests:
$ go test -v ./internal/telemetry/...
=== RUN TestRecordStep
--- PASS: TestRecordStep
=== RUN TestUsageTrackerMultipleModels
--- PASS: TestUsageTrackerMultipleModels
PASS
Impact on Existing Workflows
Guided Mode (gptcode guided)
Before:
gptcode guided "add auth"
# Validator checks the plan's allowlist (not the actual changes)
After:
gptcode guided "add auth"
# Validator checks actually modified files
# Telemetry records what was changed
Orchestrated Mode (Maestro)
Before:
gptcode auto plan.md
# No file validation on apply_patch
# No telemetry on steps
After:
gptcode auto plan.md
# Both write_file and apply_patch validated
# Every step emits telemetry events
Roadmap to Autonomous Execution
Note: The following features are part of the roadmap and not yet implemented. The improvements described above lay the foundation for these capabilities.
Still needed:
1. GitHub Actions Integration
Trigger execution on issue creation:
on:
  issues:
    types: [opened]

jobs:
  gptcode-auto:
    runs-on: ubuntu-latest
    steps:
      - run: gptcode auto --from-issue ${{ github.event.issue.number }}
2. Usage Command (Future)
$ gptcode usage
Usage Statistics
================
openai/gpt-4:
  Requests: 15
  Tokens: 45000
  Cost: $0.45
openrouter/kimi:free:
  Requests: 32
  Tokens: 120000
  Cost: $0.00
Total:
  Requests: 47
  Tokens: 165000
  Cost: $0.45
3. Graph Context Injection (Future)
Language-specific heuristics to inject relevant code:
- Extract imports/signatures in Go
- Include only referenced functions
- Limit tokens per file
4. End-to-End Verification (Future)
gptcode verify plan.md
# Runs all success criteria
# Returns structured results
What You Can Do Today
1. Try File Validation
Create a plan with specific files:
gptcode plan "add logout endpoint"
gptcode implement ~/.gptcode/plans/2025-11-27-logout.md
The editor will only modify files mentioned in the plan.
2. Check Telemetry
export GPTCODE_DEBUG=1
gptcode guided "add feature"
Look for step events in stderr.
3. Explore the Catalog
import "gptcode/internal/intelligence"
catalog := intelligence.NewModelCatalog()
for key, info := range catalog.Models {
    fmt.Printf("%s: $%.2f/1M\n", key, info.CostPer1M)
}
4. Run Tests
go test ./internal/intelligence ./internal/telemetry
Community Feedback
We’re building autonomous execution with the community. We’d love to hear:
- Which features are most important for autonomous execution?
- Should we prioritize GitHub integration or local improvements first?
- What telemetry data would be most valuable?
Join the discussion on GitHub.
Posted on November 27, 2025. All features tested with unit tests and integration workflows.