Preparing for Autonomous Execution: File Validation, Telemetry & Intelligence

Today we’re announcing a set of foundational improvements that bring gptcode closer to autonomous execution—the ability to run implementation tasks automatically when repository issues are created, with built-in validation, telemetry, and intelligent model selection.

The Vision

Imagine opening a GitHub issue with:

Title: Add user authentication
Description: Implement JWT-based authentication with refresh tokens

And having gptcode:

  1. Automatically detect the new issue
  2. Create an implementation plan
  3. Execute the plan with file validation
  4. Verify the changes work
  5. Track telemetry for continuous improvement
  6. Open a PR with the implementation

This release lays the groundwork for that future.

What’s New

1. File Validation & Tracking

The Problem: In autonomous execution, the agent could modify files outside the scope of the plan, causing unintended side effects.

The Solution: Both write_file and apply_patch now:

  • Enforce allowlist validation (only modify planned files)
  • Return the actual files modified
  • Pass real modifications to the Validator (not just the plan's allowlist)

// Before: No validation on apply_patch
result := tools.ApplyPatch(call, workdir)

// After: Validated and tracked
result := tools.ApplyPatch(call, workdir)
// result.ModifiedFiles = ["internal/auth/handler.go"]

Why it matters:

  • Prevents scope creep in autonomous execution
  • Validator sees actual changes, not assumptions
  • Better error messages when validation fails
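
Concretely, the tool result just needs to carry the touched paths alongside the usual output. A sketch of the shape; the type name and the Output field are assumptions, only ModifiedFiles is confirmed by the examples above:

// Hypothetical shape of the value returned by tools.ExecuteToolFromLLM;
// only ModifiedFiles is confirmed by the examples in this post.
type ToolResult struct {
    Output        string   // textual tool output fed back to the model
    ModifiedFiles []string // actual paths touched by write_file/apply_patch
}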

2. Explicit Success Criteria in Plans

The Problem: Plans had vague validation steps like “verify it worked.”

The Solution: The Planner now requires 2-5 specific, testable success criteria:

## Success Criteria
- Tests pass: make test
- File internal/auth/handler.go contains JWT validation
- Command curl /api/auth/login returns 401 without token
- Documentation updated in docs/auth.md

Why it matters:

  • Validator can check concrete conditions
  • Plans are more actionable
  • Autonomous execution knows when to stop
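
To give a feel for how the Validator can consume these, here is a minimal sketch; the SuccessCriterion type and runCriteria helper are illustrative, not the shipped API:

import (
    "fmt"
    "os/exec"
)

// SuccessCriterion is a hypothetical parsed form of one criteria bullet.
type SuccessCriterion struct {
    Description string // e.g. "Tests pass"
    Command     string // e.g. "make test"; empty for non-command checks
}

// runCriteria executes every command-backed criterion and stops at the
// first failure, which is the signal autonomous execution needs.
func runCriteria(criteria []SuccessCriterion) error {
    for _, c := range criteria {
        if c.Command == "" {
            continue // file and documentation checks are handled elsewhere
        }
        if err := exec.Command("sh", "-c", c.Command).Run(); err != nil {
            return fmt.Errorf("criterion %q failed: %w", c.Description, err)
        }
    }
    return nil
}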

3. Model Capability Catalog

The Problem: Model capabilities (tool-calling, cost, speed) were hardcoded in multiple places.

The Solution: Created ModelCatalog to centralize model metadata:

catalog := intelligence.NewModelCatalog()
models := catalog.GetModelsForAgent("editor")

for _, model := range models {
    fmt.Printf("%s: $%.2f/1M, %d TPS, Functions: %v\n",
        model.Name, model.CostPer1M, model.SpeedTPS, model.SupportsFunctions)
}

Fallback support:

// Unknown model? Returns sensible defaults
info := catalog.GetModelInfo("new-backend", "new-model")
// info.CostPer1M = 1.0 (default)
// info.SpeedTPS = 300 (default)

Why it matters:

  • Easy to add new models (update the catalog only; see the sketch below)
  • Consistent capabilities across the system
  • Graceful handling of unknown models
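
Because the catalog is a plain map, registering a model is a local, one-entry change. A sketch, assuming the catalog keys entries as backend/name; the model below is made up:

catalog := intelligence.NewModelCatalog()

// Illustrative entry only; the values and the key format are assumptions.
catalog.Models["examplehost/example-model"] = intelligence.ModelInfo{
    Backend:           "examplehost",
    Name:              "example-model",
    SupportsFunctions: true,
    CostPer1M:         0.50,
    SpeedTPS:          250,
    Agents:            []string{"editor", "query"},
}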

4. OpenTelemetry-Based Telemetry

The Problem: There was no visibility into what the system was doing during autonomous execution.

The Solution: Implemented OpenTelemetry-based telemetry with:

Step tracking:

tel := telemetry.NewTelemetry()
event := telemetry.StepEvent{
    StepIndex:    0,
    StepName:     "Implement Authentication",
    FilesTouched: []string{"auth/handler.go", "auth/middleware.go"},
    Success:      true,
    DurationMs:   2500,
}
tel.RecordStep(ctx, event)

Usage tracking:

tracker := telemetry.NewUsageTracker()
tracker.RecordRequest("openai", "gpt-4", 1500)

stats := tracker.GetStats()
// stats["openai/gpt-4"] = {Requests: 1, Tokens: 1500}

Why it matters:

  • Observe autonomous execution in real-time
  • Track API costs per backend/model (sketched below)
  • Debug failures with structured events
  • Foundation for gptcode usage command
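
The cost bullet composes the two pieces introduced above. A rough sketch, assuming GetStats returns a map keyed backend/model (as in the example) whose values expose a Tokens field:

import (
    "fmt"
    "strings"

    "gptcode/internal/intelligence"
    "gptcode/internal/telemetry"
)

// Rough sketch: join tracker stats with catalog pricing to estimate spend.
// The shape of the stats value (s.Tokens) is an assumption.
tracker := telemetry.NewUsageTracker()
catalog := intelligence.NewModelCatalog()

var total float64
for key, s := range tracker.GetStats() {
    parts := strings.SplitN(key, "/", 2) // "openai/gpt-4" -> backend, model
    info := catalog.GetModelInfo(parts[0], parts[1])
    cost := float64(s.Tokens) / 1_000_000 * info.CostPer1M
    total += cost
    fmt.Printf("%s: ~$%.2f\n", key, cost)
}
fmt.Printf("total: ~$%.2f\n", total)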

Technical Deep Dive

File Validation Flow

// agent/editor.go (abridged)
func (e *EditorAgent) Execute(ctx context.Context, history []llm.ChatMessage,
    statusCallback StatusCallback) (string, []string, error) {

    // resp holds the model's response for this turn and argsMap the decoded
    // tool-call arguments; both are produced earlier in the loop (elided).
    var modifiedFiles []string
    
    for _, toolCall := range resp.ToolCalls {
        // Validate write_file AND apply_patch
        if toolCall.Name == "write_file" || toolCall.Name == "apply_patch" {
            if err := e.validateFileWrite(argsMap); err != nil {
                // Reject modification outside allowlist
                return "", nil, err
            }
        }
        
        result := tools.ExecuteToolFromLLM(toolCall, e.cwd)
        modifiedFiles = append(modifiedFiles, result.ModifiedFiles...)
    }
    
    return response, modifiedFiles, nil
}
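
validateFileWrite does the actual gatekeeping, but its body is not shown above. A minimal sketch under the assumption that the agent carries an allowlist of planned paths; the field and argument names are illustrative:

// Minimal sketch of the allowlist check (uses "fmt" and "path/filepath").
// e.allowedFiles and the "path" argument key are assumptions.
func (e *EditorAgent) validateFileWrite(args map[string]any) error {
    path, _ := args["path"].(string)
    for _, allowed := range e.allowedFiles {
        if filepath.Clean(path) == filepath.Clean(allowed) {
            return nil // the file is part of the plan
        }
    }
    return fmt.Errorf("refusing to modify %q: not in the plan's allowlist", path)
}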

Model Catalog Structure

type ModelCatalog struct {
    Models map[string]ModelInfo
}

type ModelInfo struct {
    Backend           string
    Name              string
    SupportsFunctions bool
    CostPer1M         float64
    SpeedTPS          int
    Agents            []string // ["editor", "query", ...]
}

// Usage
catalog := NewModelCatalog()
editorModels := catalog.GetModelsForAgent("editor")
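
The agent filter itself can be a plain scan over the map; one plausible implementation (the shipped code may differ):

// Plausible implementation of GetModelsForAgent: return every model whose
// Agents list contains the requested agent. Map iteration order is not
// deterministic, so callers should not rely on ordering.
func (c *ModelCatalog) GetModelsForAgent(agent string) []ModelInfo {
    var out []ModelInfo
    for _, info := range c.Models {
        for _, a := range info.Agents {
            if a == agent {
                out = append(out, info)
                break
            }
        }
    }
    return out
}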

Telemetry Integration Points

In Guided Mode:

func (g *GuidedMode) Implement(ctx context.Context, plan string) error {
    // editorAgent, history, and statusCallback are created during
    // guided-mode setup (elided here).
    tel := telemetry.NewTelemetry()
    start := time.Now()
    
    result, modifiedFiles, err := editorAgent.Execute(ctx, history, statusCallback)
    
    tel.RecordStep(ctx, telemetry.StepEvent{
        StepName:     "Implementation",
        FilesTouched: modifiedFiles,
        Success:      err == nil,
        DurationMs:   time.Since(start).Milliseconds(),
    })
    
    return err
}

In Orchestrated Mode:

func (m *Maestro) executeStep(ctx context.Context, step PlanStep) error {
    tel := telemetry.NewTelemetry()
    
    _, modifiedFiles, err := editorAgent.Execute(ctx, history, statusCallback)
    
    tel.RecordStep(ctx, telemetry.StepEvent{
        StepIndex:    m.CurrentStepIdx,
        StepName:     step.Title,
        FilesTouched: modifiedFiles,
        Success:      err == nil,
    })
    
    return err
}

Testing

All improvements ship with unit tests:

Catalog tests:

$ go test -v ./internal/intelligence/...
=== RUN   TestNewModelCatalog
--- PASS: TestNewModelCatalog
=== RUN   TestGetModelsForAgent
--- PASS: TestGetModelsForAgent
=== RUN   TestGetModelInfo
--- PASS: TestGetModelInfo
PASS

Telemetry tests:

$ go test -v ./internal/telemetry/...
=== RUN   TestRecordStep
--- PASS: TestRecordStep
=== RUN   TestUsageTrackerMultipleModels
--- PASS: TestUsageTrackerMultipleModels
PASS
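
As one concrete example of what these tests cover, the fallback case could look roughly like this (the repository's assertions may differ):

// internal/intelligence/catalog_test.go (illustrative; imports "testing")
func TestGetModelInfoFallback(t *testing.T) {
    catalog := NewModelCatalog()

    info := catalog.GetModelInfo("new-backend", "new-model")
    if info.CostPer1M != 1.0 {
        t.Errorf("default CostPer1M = %.2f, want 1.0", info.CostPer1M)
    }
    if info.SpeedTPS != 300 {
        t.Errorf("default SpeedTPS = %d, want 300", info.SpeedTPS)
    }
}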

Impact on Existing Workflows

Guided Mode (gptcode guided)

Before:

gptcode guided "add auth"
# Validator checks the plan's allowlist (not actual changes)

After:

gptcode guided "add auth"
# Validator checks actually modified files
# Telemetry records what was changed

Orchestrated Mode (Maestro)

Before:

gptcode auto plan.md
# No file validation on apply_patch
# No telemetry on steps

After:

gptcode auto plan.md
# Both write_file and apply_patch validated
# Every step emits telemetry events

Roadmap to Autonomous Execution

Note: The following features are part of the roadmap and not yet implemented. The improvements described above lay the foundation for these capabilities.

Still needed:

1. GitHub Actions Integration

Trigger execution on issue creation:

on:
  issues:
    types: [opened]
    
jobs:
  gptcode-auto:
    runs-on: ubuntu-latest
    steps:
      - run: gptcode auto --from-issue ${{ github.event.issue.number }}

2. Usage Command (Future)

$ gptcode usage
Usage Statistics
================

openai/gpt-4:
  Requests: 15
  Tokens:   45000
  Cost:     $0.45

openrouter/kimi:free:
  Requests: 32
  Tokens:   120000
  Cost:     $0.00

Total:
  Requests: 47
  Tokens:   165000
  Cost:     $0.45

3. Graph Context Injection (Future)

Language-specific heuristics to inject relevant code (a Go sketch follows the list):

  • Extract imports/signatures in Go
  • Include only referenced functions
  • Limit tokens per file
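
For Go, the standard library already does the heavy lifting. A sketch with go/parser; the heuristics and token limits are still to be designed:

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
    "log"
)

// Sketch: collect imports and top-level function names for one file, the
// raw material for signature-level context injection.
fset := token.NewFileSet()
f, err := parser.ParseFile(fset, "internal/auth/handler.go", nil, parser.SkipObjectResolution)
if err != nil {
    log.Fatal(err)
}
for _, imp := range f.Imports {
    fmt.Println("import", imp.Path.Value)
}
for _, decl := range f.Decls {
    if fn, ok := decl.(*ast.FuncDecl); ok {
        fmt.Println("func", fn.Name.Name)
    }
}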

4. End-to-End Verification (Future)

gptcode verify plan.md
# Runs all success criteria
# Returns structured results

What You Can Do Today

1. Try File Validation

Create a plan with specific files:

gptcode plan "add logout endpoint"
gptcode implement ~/.gptcode/plans/2025-11-27-logout.md

The editor will modify only the files listed in the plan.

2. Check Telemetry

export GPTCODE_DEBUG=1
gptcode guided "add feature"

Look for step events in stderr.

3. Explore the Catalog

import "gptcode/internal/intelligence"

catalog := intelligence.NewModelCatalog()
for key, info := range catalog.Models {
    fmt.Printf("%s: $%.2f/1M\n", key, info.CostPer1M)
}

4. Run Tests

go test ./internal/intelligence ./internal/telemetry

Community Feedback

We’re building autonomous execution with the community. We’d love to hear:

  • Which features are most important for autonomous execution?
  • Should we prioritize GitHub integration or local improvements first?
  • What telemetry data would be most valuable?

Join the discussion on GitHub.


Posted on November 27, 2025. All features are covered by unit tests and integration workflows.