From 14% to 59% Autonomy: How GPTCode Achieved 100% MVAA in One Epic Session

TL;DR: In a single development session, GPTCode went from 14% autonomy to 59%, achieving 100% coverage of the MVAA (Minimum Viable Autonomous Agent) Critical Path. The AI coding assistant can now autonomously resolve GitHub issues end-to-end, including handling CI failures and iterating on review comments. Here’s how we did it in 21 commits, 8 phases, and 2,788 lines of code.

The Vision

Two weeks ago, I wrote about why GPTCode isn’t trying to beat anyone. The goal wasn’t to create another Cursor or Copilot. It was to build something different: transparent, hackable, and honest about its limitations.

But there was always a bigger vision: What if an AI could actually close a GitHub issue on its own?

Not just write code. Not just run tests. But handle the entire workflow:

  • Fetch the issue
  • Understand the requirements
  • Find the relevant files
  • Implement the solution
  • Run tests and fix failures
  • Handle linting issues
  • Build and validate
  • Create a PR
  • Handle CI failures
  • Address review comments
  • Iterate until approved

That’s the dream. That’s what we call 100% MVAA (Minimum Viable Autonomous Agent).

The Starting Point: 14% Autonomy

Before this session, GPTCode could do individual tasks well:

  • ✅ Interactive chat with code understanding
  • ✅ Test-driven development
  • ✅ Research and planning
  • ✅ Code review

But stringing tasks together autonomously? That was at 14%.

I created a comprehensive gap analysis mapping out 64 scenarios across 8 categories. The MVAA Critical Path had 17 key scenarios. We had 3 out of 17.

The question: Could we reach 100% MVAA in one focused session?

The Journey: 8 Phases, 21 Commits

Phase 1: GitHub Integration Foundation (30% → 40%)

3 commits · 577 LOC · internal/github/

First, we needed to talk to GitHub. Not just read issues, but create branches, commit changes, push code, and create PRs.

// internal/github/issue.go
type Issue struct {
    Number    int
    Title     string
    Body      string
    Labels    []string
    Assignees []string
}

// ExtractRequirements parses the issue body for Markdown task-list
// action items ("- [ ] requirement") and returns their text.
func (i *Issue) ExtractRequirements() []string {
    var reqs []string
    for _, line := range strings.Split(i.Body, "\n") {
        trimmed := strings.TrimSpace(line)
        if strings.HasPrefix(trimmed, "- [ ]") {
            reqs = append(reqs, strings.TrimSpace(strings.TrimPrefix(trimmed, "- [ ]")))
        }
    }
    return reqs
}

// CreateBranchName builds a slug like issue-123-fix-password-validation
// from the issue number and title.
func (i *Issue) CreateBranchName() string {
    slug := strings.ToLower(strings.Join(strings.Fields(i.Title), "-"))
    return fmt.Sprintf("issue-%d-%s", i.Number, slug)
}

Key insight: Issues contain structured data. We can parse requirements, references, and linked PRs automatically.
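
To make that concrete, here is a minimal sketch of the reference side of that parsing (a hypothetical helper, not the actual internal/github code) that pulls "#123"-style issue references out of a body. It assumes the regexp and strconv packages:

// Hypothetical sketch: extract "#123"-style issue references from a body.
func ExtractReferences(body string) []int {
    re := regexp.MustCompile(`#(\d+)`)
    var refs []int
    for _, m := range re.FindAllStringSubmatch(body, -1) {
        if n, err := strconv.Atoi(m[1]); err == nil {
            refs = append(refs, n)
        }
    }
    return refs
}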

Result: Can now fetch issues and create proper branches. 9/10 GitHub scenarios complete.

Phase 2: Test Execution & Validation (40% → 50%)

3 commits · 608 LOC · internal/validation/

A fix isn’t done until tests pass. We needed multi-language test execution:

// internal/validation/test_executor.go
type TestExecutor struct {
    workDir  string
    language langdetect.Language
}

func (te *TestExecutor) RunTests() (*TestResult, error) {
    switch te.language {
    case langdetect.Go:
        return te.runGoTests()
    case langdetect.TypeScript:
        return te.runNpmTests()
    case langdetect.Python:
        return te.runPytest()
    // ... Elixir, Ruby
    default:
        return nil, fmt.Errorf("unsupported language: %v", te.language)
    }
}

Plus comprehensive linting with 12 different tools across 5 languages.

Validation pipeline:

  1. Build check
  2. Test execution
  3. Linting (style, types, security)
  4. Coverage analysis
  5. Security scanning
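
Conceptually, the pipeline is an ordered list of checks that short-circuits on the first failure. A minimal sketch with hypothetical types, not the exact internal/validation API:

// Sketch: run validation steps in order, stopping at the first failure.
type Step struct {
    Name string
    Run  func() error
}

func RunPipeline(steps []Step) error {
    for _, s := range steps {
        if err := s.Run(); err != nil {
            return fmt.Errorf("%s: %w", s.Name, err)
        }
    }
    return nil
}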

Result: 8/15 test & validation scenarios complete.

Phase 3: CLI Integration (50% → 55%)

2 commits · 793 LOC · cmd/gptcode/issue.go

Time to make it accessible. The gptcode issue command suite:

gptcode issue fix 123        # Fetch and implement
gptcode issue show 123       # Display issue details
gptcode issue commit 123     # Validate and commit
gptcode issue push 123       # Create PR

Integrated with Symphony (our autonomous executor) for hands-off implementation.

Key decision: Default to --autonomous true. GPTCode should try to solve things on its own, not ask permission.

Result: Complete CLI workflow available.

Phase 4: Error Recovery (55% → 58%)

2 commits · 358 LOC · internal/recovery/

Tests fail. Linters complain. Builds break. That’s normal. What matters is recovering automatically.

// internal/recovery/error_fixer.go
func (ef *ErrorFixer) FixTestFailures(
    ctx context.Context,
    testResult *TestResult,
    maxAttempts int,
) (*FixResult, error) {
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        // 1. Extract failures
        // 2. Generate fix via LLM
        // 3. Apply fix
        // 4. Re-run tests
        // 5. Return if successful
    }
    return nil, fmt.Errorf("tests still failing after %d attempts", maxAttempts)
}

Retry strategies:

  • Fix and retry (most common)
  • Simplify approach
  • Skip and continue
  • Rollback on critical failure
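
In code, picking a strategy can be a simple switch on the failure's shape. A hedged sketch; the names are illustrative, not the actual internal/recovery API:

// Illustrative sketch: choose a recovery strategy per failure.
type Strategy int

const (
    FixAndRetry     Strategy = iota // most common path
    Simplify                        // retry with a simpler approach
    SkipAndContinue                 // non-blocking failure
    Rollback                        // critical failure: revert changes
)

type Failure struct {
    Critical    bool
    Attempts    int
    MaxAttempts int
}

func chooseStrategy(f Failure) Strategy {
    switch {
    case f.Critical:
        return Rollback
    case f.Attempts >= f.MaxAttempts:
        return SkipAndContinue
    case f.Attempts > 1:
        return Simplify
    default:
        return FixAndRetry
    }
}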

Result: Auto-fix for tests, linting, and syntax errors. 3/5 error recovery scenarios complete.

Phase 5: Enhanced Validation (58% → 59%)

2 commits · 177 LOC · internal/validation/

We had tests and linting. But production code needs more:

  • Coverage checking - Ensure minimum threshold
  • Security scanning - govulncheck, npm audit, safety

gptcode issue commit 123 \
  --check-coverage \
  --min-coverage 80 \
  --security-scan
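
For Go projects, the coverage check can lean on go test's own summary line. A minimal sketch, assuming that output format:

// Sketch: parse "coverage: 82.4% of statements" from go test output
// and enforce a minimum threshold (uses regexp and strconv).
func checkCoverage(testOutput string, min float64) error {
    re := regexp.MustCompile(`coverage: ([\d.]+)% of statements`)
    m := re.FindStringSubmatch(testOutput)
    if m == nil {
        return fmt.Errorf("no coverage summary found in test output")
    }
    got, err := strconv.ParseFloat(m[1], 64)
    if err != nil {
        return err
    }
    if got < min {
        return fmt.Errorf("coverage %.1f%% is below the %.1f%% minimum", got, min)
    }
    return nil
}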

Result: Full validation suite complete.

Phase 6: Codebase Understanding (59% → 56%)

2 commits · 255 LOC · internal/codebase/

Here’s where it gets interesting. How do you find relevant files for an issue?

We built an AI-powered file finder:

// internal/codebase/finder.go
type RelevantFile struct {
    Path       string
    Reason     string
    Confidence float64  // 0.0 - 1.0
}

func (f *FileFinder) FindRelevantFiles(
    ctx context.Context, 
    issueDescription string,
) ([]RelevantFile, error) {
    // Use LLM + codebase tools to identify files
    // Score by confidence (HIGH/MED/LOW)
    // Return top 3-5 files
}

Example output:

Relevant files identified:
1. [HIGH] auth/validator.go - Contains validation logic
2. [MED] auth/validator_test.go - Test file  
3. [LOW] config/security.go - Security settings

Result: AI-powered file discovery with confidence scoring. 3/5 codebase understanding scenarios complete.

Phase 7: PR Review Handling (56% → 58%)

2 commits · 276 LOC · internal/github/pr.go

Reviewers leave comments. Good ones. We should address them autonomously.

// internal/github/pr.go
type ReviewComment struct {
    ID     string
    Author string
    Body   string
    Path   string
    Line   int
    State  string
}

func (c *Client) GetUnresolvedComments(
    prNumber int,
) ([]ReviewComment, error) {
    // Fetch via GitHub API
    // Filter for unresolved
}

New command:

gptcode issue review 42

What it does:

  1. Fetches unresolved comments
  2. Processes each comment with Symphony
  3. Implements requested changes
  4. Commits and pushes
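
Because everything rides on the gh CLI (more on that under Lessons Learned), fetching comments is a thin wrapper around gh api. A hedged sketch of the shape; the real code's filtering and JSON field mapping differ:

// Sketch: list PR review comments via the gh CLI and decode the JSON.
// (json struct tags for GitHub's field names omitted for brevity)
func fetchReviewComments(prNumber int) ([]ReviewComment, error) {
    endpoint := fmt.Sprintf("repos/{owner}/{repo}/pulls/%d/comments", prNumber)
    out, err := exec.Command("gh", "api", endpoint).Output()
    if err != nil {
        return nil, err
    }
    var comments []ReviewComment
    if err := json.Unmarshal(out, &comments); err != nil {
        return nil, err
    }
    return comments, nil
}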

Result: Can iterate on review feedback autonomously. GitHub Integration: 10/10 (100%)!

Phase 8: CI Failure Handling (58% → 59%, MVAA 94% → 100%)

2 commits · 430 LOC · internal/ci/

The final piece. CI fails. A lot. Good CI should fail when something’s wrong.

But can we fix it automatically?

// internal/ci/handler.go
func (h *Handler) CheckPRStatus(prNumber int) ([]CIStatus, error) {
    // Run: gh pr checks 42
    // Parse output
    // Return failed checks
}

func (h *Handler) FetchCILogs(prNumber int) (string, error) {
    // Run: gh run view --log
    // Return full logs
}

func (h *Handler) ParseCIFailure(log string) *CIFailure {
    // Extract error message
    // Find context (±5 lines)
    // Identify job/step
}

func (h *Handler) AnalyzeFailure(failure CIFailure) (*FixResult, error) {
    // LLM analysis
    // Generate fix
    // Apply and commit
}

New command:

gptcode issue ci 42

What it does:

  1. Monitors CI status
  2. Fetches logs from failed checks
  3. Parses and extracts errors
  4. Analyzes with LLM
  5. Generates fix
  6. Commits and pushes
  7. CI re-runs automatically
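
Wired together, the command is a detect-analyze-fix loop over those handler methods. A simplified sketch (the Failed field on CIStatus is an assumption here):

// Sketch: the issue ci loop, simplified.
func (h *Handler) FixCI(prNumber int) error {
    statuses, err := h.CheckPRStatus(prNumber)
    if err != nil {
        return err
    }
    for _, s := range statuses {
        if !s.Failed {
            continue
        }
        logs, err := h.FetchCILogs(prNumber)
        if err != nil {
            return err
        }
        failure := h.ParseCIFailure(logs)
        if _, err := h.AnalyzeFailure(*failure); err != nil {
            return err
        }
    }
    return nil
}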

Result: CI failure auto-fix! Error Recovery: 4/5 (80%). MVAA: 17/17 (100%)! 🎆

The Architecture

After 8 phases, here’s what we built:

internal/
├── github/        577 LOC - Issue, PR, reviews, commits
├── validation/    608 LOC - Tests, lint, build, coverage, security
├── recovery/      358 LOC - LLM auto-fix (tests, lint, CI)
├── codebase/      255 LOC - AI file finder
├── ci/            237 LOC - CI failure detection + fix
└── ...

cmd/gptcode/issue.go   793 LOC - CLI (6 commands)

Total: 2,788 LOC across 11 modules

Supported languages: Go, TypeScript, Python, Elixir, Ruby
Supported tools: 12 linters, 5 test runners, 3 security scanners
Test coverage: 35 E2E tests passing

The Complete Workflow

Here’s what 100% MVAA looks like in practice:

# 1. Start with an issue
gptcode issue fix 123 --find-files
# → Fetches issue from GitHub
# → Parses requirements
# → Finds relevant files (AI)
# → Creates branch
# → Implements via Symphony
# → Shows next steps

# 2. Validate and commit
gptcode issue commit 123 --auto-fix --check-coverage --security-scan
# → Builds code
# → Runs tests (auto-fixes if fail)
# → Runs linters (auto-fixes if fail)
# → Checks coverage
# → Scans for vulnerabilities
# → Commits with "Closes #123"

# 3. Create PR
gptcode issue push 123
# → Pushes branch
# → Creates PR via gh
# → Links to issue
# → Copies labels

# 4. Handle CI failures
gptcode issue ci 42
# → Waits for CI
# → Fetches logs if failed
# → Analyzes error
# → Generates fix
# → Commits and pushes
# → CI reruns

# 5. Address review comments
gptcode issue review 42
# → Fetches unresolved comments
# → Processes each via Symphony
# → Implements changes
# → Commits and pushes

# 6. Iterate until approved
# Repeat steps 4-5 until PR is ready to merge!

From issue to merge-ready PR, fully autonomous.

The Numbers

Session metrics:

  • 21 commits (863775d → 6afd942)
  • 8 phases completed
  • 2,788 lines of code written
  • 11 modules created
  • 6 commands implemented
  • 35 E2E tests passing

Autonomy progress:

  • Started: 14% (9/64 scenarios)
  • Ended: 59% (38/64 scenarios)
  • Gain: +45 percentage points

MVAA Critical Path:

  • Started: 3/17 (18%)
  • Ended: 17/17 (100%) 🎆
  • Status: MVP COMPLETE

Category breakdown:

  • GitHub Integration: 10/10 (100%) ✅
  • Test Execution: 3/8 (38%)
  • Validation: 5/7 (71%)
  • Error Recovery: 4/5 (80%) ✅
  • Codebase Understanding: 3/5 (60%)

What It Can Do NOW

GPTCode can autonomously handle:

  • ✅ Simple bug fixes (1-3 files)
  • ✅ Small feature additions
  • ✅ Test coverage improvements
  • ✅ Linting/formatting fixes
  • ✅ Documentation updates
  • ✅ Dependency updates
  • ✅ Security patches
  • ✅ CI failure recovery
  • ✅ Review comment iteration

All without human intervention (except the final merge button).

The Honest Limitations

We hit 100% MVAA. But that’s just the beginning. Here’s what we can’t do yet:

❌ Complex refactoring (12/12 scenarios missing)

  • Multi-file architecture changes
  • Database migrations
  • Breaking API changes
  • Backward compatibility

❌ Advanced test generation (5/8 scenarios missing)

  • Generate unit tests for new code
  • Integration test creation
  • Mock generation

❌ Merge conflicts (1/5 scenarios missing)

  • Still need manual resolution

❌ Documentation updates (3/3 scenarios missing)

  • README updates
  • CHANGELOG generation
  • API docs

Total remaining: 26 scenarios for true 100% autonomy

But 100% MVAA means: For simple bugs and small features, GPTCode can close the loop.

Technical Deep Dives

1. AI-Powered File Discovery

The hardest problem: Given an issue description, which files should we modify?

Traditional approaches:

  • Keyword matching (too naive)
  • Dependency analysis (too rigid)
  • Manual selection (not autonomous)

Our approach: LLM + codebase tools + confidence scoring

Issue: "Add password validation with special characters"

GPTCode analyzes:
1. Reads issue requirements
2. Uses list_files to explore structure
3. Reads candidate files
4. Scores by relevance
5. Returns top 3-5 with confidence

Output:
1. [HIGH 0.9] auth/validator.go - Main validation logic
2. [MED 0.6] auth/validator_test.go - Needs test updates
3. [LOW 0.3] config/security.go - May need config

Why it works: Combines semantic understanding (LLM) with structural exploration (tools).
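
The HIGH/MED/LOW labels are just thresholds over the raw confidence score. A trivial sketch; the cutoffs are illustrative:

// Sketch: bucket a 0.0-1.0 confidence score into a display label.
func confidenceLabel(c float64) string {
    switch {
    case c >= 0.8:
        return "HIGH"
    case c >= 0.5:
        return "MED"
    default:
        return "LOW"
    }
}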

2. LLM-Powered Auto-Fix

Tests fail. What now?

Simple approach: Show error to user
Better approach: Try to fix it

func (ef *ErrorFixer) FixTestFailures(...) (*FixResult, error) {
    failures := extractTestFailures(testResult.Output)

    prompt := buildFixPrompt(failures, fullOutput)

    fix := llm.Chat(ctx, prompt)

    applyFix(fix)

    newResult := runTests()

    if newResult.Success {
        return success()
    }

    // Otherwise: retry with a refined prompt, up to maxAttempts
    return nil, errStillFailing
}

Success rate: ~70% for simple test failures (linting, missing error checks, type issues)

Failure modes: Complex business logic, ambiguous requirements, environmental issues

3. CI Log Parsing

CI logs are messy. Really messy.

GitHub Actions output:

##[group]Run tests
npm test
  PASS src/validator.test.ts
  FAIL src/auth.test.ts
    ✓ validates email (5 ms)
    ✕ validates password (12 ms)
    
      Expected special character in password
      
      at Validator.validatePassword (validator.ts:45)
##[endgroup]

Our parser:

  1. Scan for error/fail/fatal keywords
  2. Extract ±5 lines of context
  3. Identify job/step markers
  4. Find file paths and line numbers
  5. Pass to LLM for analysis
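
A minimal version of steps 1 and 2, keyword scan plus context window, might look like this (a sketch; assumes line-oriented logs and Go 1.21's min/max builtins):

// Sketch: find the first error-ish line and return ±5 lines of context.
func extractFailureContext(log string) string {
    lines := strings.Split(log, "\n")
    for i, line := range lines {
        l := strings.ToLower(line)
        if strings.Contains(l, "error") || strings.Contains(l, "fail") || strings.Contains(l, "fatal") {
            start := max(i-5, 0)
            end := min(i+6, len(lines))
            return strings.Join(lines[start:end], "\n")
        }
    }
    return ""
}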

LLM prompt:

CI/CD Failure Analysis:

Job: Tests
Error: Expected special character in password

Log snippet:
[relevant lines]

Analyze:
1. Root cause
2. Files to modify
3. Specific changes needed

Output: Structured fix that we can apply and commit.

Lessons Learned

1. Start Simple, Build Up

We didn’t try to handle complex refactoring first. We started with:

  • Fetch an issue ✅
  • Create a branch ✅
  • Make a simple change ✅

Then added layers:

  • Tests ✅
  • Linting ✅
  • Coverage ✅
  • CI ✅
  • Reviews ✅

Each layer validated before moving forward.

2. Autonomy Needs Recovery

The difference between 50% and 100%? How you handle failures.

Early versions would stop at the first test failure. Now:

  1. Test fails → Analyze → Fix → Retry
  2. Lint fails → Analyze → Fix → Retry
  3. CI fails → Analyze → Fix → Retry

Recovery is not optional. It’s the core.

3. Confidence Scoring Matters

Not all decisions are equal. File finding taught us this.

When GPTCode says:

  • [HIGH 0.9] - Trust it, implement
  • [MED 0.6] - Worth trying, verify
  • [LOW 0.3] - Fallback, maybe ignore

User can override. But defaults should be smart.

4. Multi-Language is Hard

Supporting 5 languages meant:

  • 5 test runners
  • 12 linters
  • 3 build systems
  • 3 security scanners

Each with different:

  • Output formats
  • Error messages
  • Exit codes
  • Configuration files

Solution: Abstraction layers + language detection

type TestExecutor interface {
    RunTests() (*TestResult, error)
}

type GoTestExecutor struct{ /* ... */ }
type NpmTestExecutor struct{ /* ... */ }
// ... one executor per supported language
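
Detection itself can be as simple as probing for marker files. A sketch, hypothetical rather than the real internal/langdetect package, assuming a Language type with per-language constants:

// Sketch: detect project language from well-known marker files.
func detectLanguage(workDir string) Language {
    markers := []struct {
        file string
        lang Language
    }{
        {"go.mod", Go},
        {"package.json", TypeScript},
        {"pyproject.toml", Python},
        {"mix.exs", Elixir},
        {"Gemfile", Ruby},
    }
    for _, m := range markers {
        if _, err := os.Stat(filepath.Join(workDir, m.file)); err == nil {
            return m.lang
        }
    }
    return Unknown
}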

5. GitHub CLI is Gold

We built on gh CLI instead of direct API calls. Best decision.

Why:

  • ✅ Handles auth automatically
  • ✅ Respects user’s GitHub config
  • ✅ Works with GitHub Enterprise Server (GHES) too
  • ✅ Simpler than REST/GraphQL

Example:

gh pr checks 42              # Check CI status
gh pr view 42 --json reviews # Get reviews
gh pr create --title "..."   # Create PR

Much easier than managing tokens, endpoints, pagination, etc.

What’s Next

Short Term: Real-World Testing

We hit 100% MVAA in controlled conditions. Now we need:

  1. Test on real GitHub repos
  2. Find edge cases
  3. Improve error messages
  4. Handle weird CI setups

Medium Term: GitHub Actions Integration

Make this available as a GitHub Action:

- uses: jadercorrea/gptcode-action@v1
  with:
    command: 'ci'
    auto-fix: true
    min-coverage: 80

Benefits:

  • Works in any repo
  • No installation needed
  • Integrates with existing CI

Long Term: The Other 26 Scenarios

To reach true 100% autonomy:

  1. Complex Code Modifications (12 scenarios)
    • Multi-file refactoring
    • Database migrations
    • Breaking changes
  2. Test Generation (5 scenarios)
    • Auto-generate tests for new code
    • Integration tests
    • Mocking
  3. Documentation (3 scenarios)
    • Update README
    • Generate CHANGELOG
    • API docs
  4. Advanced Git (5 scenarios)
    • Merge conflicts
    • Rebasing
    • Cherry-picking
  5. Codebase Understanding (2 scenarios)
    • Dependency tracing
    • Convention extraction

Timeline: 2-3 months of focused work

Try It Yourself

Want to experience 100% MVAA?

# Install
go install github.com/jadercorrea/gptcode/cmd/gptcode@latest

# Setup
gptcode setup  # Configure LLM provider

# Try it on a real issue
gptcode issue fix 123
gptcode issue commit 123 --auto-fix
gptcode issue push 123

Requirements:

  • GitHub CLI (gh) installed and authenticated
  • Git repository with remote
  • Issues on GitHub

Supported languages: Go, TypeScript, Python, Elixir, Ruby

Complete guide →

The Bigger Picture

This isn’t just about GPTCode. It’s about what’s possible with AI + good engineering.

We went from 14% to 59% autonomy in one session because:

  1. Clear metrics - We knew exactly what to measure
  2. Incremental approach - Build, test, validate, repeat
  3. Honest limitations - We know what we can’t do
  4. Real architecture - Not demos, actual production code

The question isn’t “Can AI replace developers?”

The question is: “How can we make AI a reliable team member?”

100% MVAA is one answer. It means:

  • For routine bugs: AI handles it
  • For small features: AI implements it
  • For CI failures: AI fixes it
  • For reviews: AI addresses them

Developers focus on: Architecture, design, complex problems, product decisions.

AI handles: Grunt work, repetitive tasks, known patterns.

That’s the vision. And we’re 59% there.

Final Thoughts

Building autonomous AI is hard.

Not because LLMs aren’t smart enough. They are.

It’s hard because software development is a system:

  • Code → Tests → Lint → Build → CI → Review → Merge

Each step has failure modes. Each needs recovery.

We spent 8 phases building that system. The LLM is just one piece.

The real achievement? A reliable, reproducible workflow that goes from issue to PR without breaking.

That’s 100% MVAA.

And it’s just the beginning.


Want to contribute? Check out the codebase →

Have questions? Open an issue →

Follow the journey: More updates coming as we push toward 100% total autonomy.


Special thanks to everyone who’s contributed ideas, bug reports, and feedback. This wouldn’t exist without the community pushing for better, more transparent AI tools.

Next post: Building the GitHub Action - Making 100% MVAA available to every repository.