From 14% to 59% Autonomy: How GPTCode Achieved 100% MVAA in One Epic Session
TL;DR: In a single development session, GPTCode went from 14% autonomy to 59%, achieving 100% coverage of the MVAA (Minimum Viable Autonomous Agent) Critical Path. The AI coding assistant can now autonomously resolve GitHub issues end-to-end, including handling CI failures and iterating on review comments. Here’s how we did it in 21 commits, 8 phases, and 2,788 lines of code.
The Vision
Two weeks ago, I wrote about why GPTCode isn’t trying to beat anyone. The goal wasn’t to create another Cursor or Copilot. It was to build something different: transparent, hackable, and honest about its limitations.
But there was always a bigger vision: What if an AI could actually close a GitHub issue on its own?
Not just write code. Not just run tests. But handle the entire workflow:
- Fetch the issue
- Understand the requirements
- Find the relevant files
- Implement the solution
- Run tests and fix failures
- Handle linting issues
- Build and validate
- Create a PR
- Handle CI failures
- Address review comments
- Iterate until approved
That’s the dream. That’s what we call 100% MVAA (Minimum Viable Autonomous Agent).
The Starting Point: 14% Autonomy
Before this session, GPTCode could do individual tasks well:
- ✅ Interactive chat with code understanding
- ✅ Test-driven development
- ✅ Research and planning
- ✅ Code review
But stringing tasks together autonomously? That was at 14%.
I created a comprehensive gap analysis mapping out 64 scenarios across 8 categories. The MVAA Critical Path had 17 key scenarios. We had 3 out of 17.
The question: Could we reach 100% MVAA in one focused session?
The Journey: 8 Phases, 21 Commits
Phase 1: GitHub Integration Foundation (30% → 40%)
| 3 commits | 577 LOC | internal/github/ |
First, we needed to talk to GitHub. Not just read issues, but create branches, commit changes, push code, and create PRs.
// internal/github/issue.go
import (
    "fmt"
    "regexp"
    "strings"
)

type Issue struct {
    Number    int
    Title     string
    Body      string
    Labels    []string
    Assignees []string
}

// ExtractRequirements parses the issue body for checklist
// action items, e.g. "- [ ] requirement 1".
func (i *Issue) ExtractRequirements() []string {
    var reqs []string
    for _, line := range strings.Split(i.Body, "\n") {
        t := strings.TrimSpace(line)
        if strings.HasPrefix(t, "- [ ]") {
            reqs = append(reqs, strings.TrimSpace(t[len("- [ ]"):]))
        }
    }
    return reqs
}

// CreateBranchName builds a slug like issue-123-fix-password-validation.
func (i *Issue) CreateBranchName() string {
    slug := regexp.MustCompile(`[^a-z0-9]+`).ReplaceAllString(strings.ToLower(i.Title), "-")
    return fmt.Sprintf("issue-%d-%s", i.Number, strings.Trim(slug, "-"))
}
Key insight: Issues contain structured data. We can parse requirements, references, and linked PRs automatically.
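To make this concrete, here's a hypothetical issue run through the helpers above (values and output are illustrative):

issue := &Issue{
    Number: 123,
    Title:  "Fix password validation",
    Body:   "- [ ] Require a special character\n- [ ] Add tests",
}
fmt.Println(issue.ExtractRequirements())
// → [Require a special character Add tests]
fmt.Println(issue.CreateBranchName())
// → issue-123-fix-password-validation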
Result: Can now fetch issues and create proper branches. 9/10 GitHub scenarios complete.
Phase 2: Test Execution & Validation (40% → 50%)
| 3 commits | 608 LOC | internal/validation/ |
A fix isn’t done until tests pass. We needed multi-language test execution:
// internal/validation/test_executor.go
type TestExecutor struct {
    workDir  string
    language langdetect.Language
}

// RunTests dispatches to the language-specific test runner.
func (te *TestExecutor) RunTests() (*TestResult, error) {
    switch te.language {
    case langdetect.Go:
        return te.runGoTests()
    case langdetect.TypeScript:
        return te.runNpmTests()
    case langdetect.Python:
        return te.runPytest()
    // ... Elixir, Ruby
    default:
        return nil, fmt.Errorf("unsupported language: %v", te.language)
    }
}
Plus comprehensive linting with 12 different tools across 5 languages.
Validation pipeline (orchestration sketched after this list):
- Build check
- Test execution
- Linting (style, types, security)
- Coverage analysis
- Security scanning
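A minimal sketch of how these stages could be chained — the Step type and RunPipeline are assumptions for illustration, not the actual internal/validation API:

package validation

import "fmt"

// A pipeline stage: a name plus the check to run.
type Step struct {
    Name string
    Run  func() error
}

// RunPipeline executes each stage in order, stopping at the first failure.
func RunPipeline(steps []Step) error {
    for _, s := range steps {
        if err := s.Run(); err != nil {
            return fmt.Errorf("%s failed: %w", s.Name, err)
        }
    }
    return nil
}

The steps slice would hold build, tests, lint, coverage, and security, in that order.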
Result: 8/15 test & validation scenarios complete.
Phase 3: CLI Integration (50% → 55%)
| 2 commits | 793 LOC | cmd/gptcode/issue.go |
Time to make it accessible. The gptcode issue command suite:
gptcode issue fix 123 # Fetch and implement
gptcode issue show 123 # Display issue details
gptcode issue commit 123 # Validate and commit
gptcode issue push 123 # Create PR
Integrated with Symphony (our autonomous executor) for hands-off implementation.
Key decision: Default to --autonomous true. GPTCode should try to solve things on its own, not ask permission.
Result: Complete CLI workflow available.
Phase 4: Error Recovery (55% → 58%)
| 2 commits | 358 LOC | internal/recovery/ |
Tests fail. Linters complain. Builds break. That’s normal. What matters is recovering automatically.
// internal/recovery/error_fixer.go
func (ef *ErrorFixer) FixTestFailures(
    ctx context.Context,
    testResult *TestResult,
    maxAttempts int,
) (*FixResult, error) {
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        // 1. Extract failures
        failures := extractTestFailures(testResult.Output)
        // 2. Generate fix via LLM
        fix, err := llm.Chat(ctx, buildFixPrompt(failures, testResult.Output))
        if err != nil {
            return nil, err
        }
        // 3. Apply fix
        applyFix(fix)
        // 4. Re-run tests
        testResult = runTests()
        // 5. Return if successful (FixResult fields illustrative)
        if testResult.Success {
            return &FixResult{Fixed: true, Attempts: attempt}, nil
        }
    }
    return nil, fmt.Errorf("still failing after %d attempts", maxAttempts)
}
Retry strategies (selection sketched after this list):
- Fix and retry (most common)
- Simplify approach
- Skip and continue
- Rollback on critical failure
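A minimal selection sketch — the Strategy enum and the heuristics here are illustrative, not the actual internal/recovery API:

type Strategy int

const (
    FixAndRetry Strategy = iota // most common
    Simplify                    // reduce the scope of the change
    SkipAndContinue             // tolerate a non-blocking failure
    Rollback                    // critical failure: revert everything
)

// chooseStrategy picks a recovery path for a failed step.
func chooseStrategy(critical bool, attempt, maxAttempts int) Strategy {
    switch {
    case critical:
        return Rollback
    case attempt < maxAttempts:
        return FixAndRetry
    default:
        return SkipAndContinue
    }
}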
Result: Auto-fix for tests, linting, and syntax errors. 3/5 error recovery scenarios complete.
Phase 5: Enhanced Validation (58% → 59%)
| 2 commits | 177 LOC | internal/validation/ |
We had tests and linting. But production code needs more:
- Coverage checking - Ensure minimum threshold
- Security scanning - govulncheck, npm audit, safety
gptcode issue commit 123 \
--check-coverage \
--min-coverage 80 \
--security-scan
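Under the hood, dispatching the right scanner per language can be a simple lookup table. A sketch using the tools named above (these are the tools' standard invocations, not necessarily GPTCode's exact arguments):

// Security scanner invocation per language.
var securityScanners = map[langdetect.Language][]string{
    langdetect.Go:         {"govulncheck", "./..."},
    langdetect.TypeScript: {"npm", "audit"},
    langdetect.Python:     {"safety", "check"},
}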
Result: Full validation suite complete.
Phase 6: Codebase Understanding (59% → 56%)
| 2 commits | 255 LOC | internal/codebase/ |
Here’s where it gets interesting. How do you find relevant files for an issue?
We built an AI-powered file finder:
// internal/codebase/finder.go
type RelevantFile struct {
    Path       string
    Reason     string
    Confidence float64 // 0.0 - 1.0
}

func (f *FileFinder) FindRelevantFiles(
    ctx context.Context,
    issueDescription string,
) ([]RelevantFile, error) {
    // Use LLM + codebase tools to identify candidate files
    // (exploreCodebase is an illustrative helper name)
    candidates, err := f.exploreCodebase(ctx, issueDescription)
    if err != nil {
        return nil, err
    }
    // Score by confidence (HIGH/MED/LOW), return top 3-5 files
    sort.Slice(candidates, func(a, b int) bool {
        return candidates[a].Confidence > candidates[b].Confidence
    })
    if len(candidates) > 5 {
        candidates = candidates[:5]
    }
    return candidates, nil
}
Example output:
Relevant files identified:
1. [HIGH] auth/validator.go - Contains validation logic
2. [MED] auth/validator_test.go - Test file
3. [LOW] config/security.go - Security settings
Result: AI-powered file discovery with confidence scoring. 3/5 codebase understanding scenarios complete.
Phase 7: PR Review Handling (56% → 58%)
| 2 commits | 276 LOC | internal/github/pr.go |
Reviewers leave comments. Good ones. We should address them autonomously.
// internal/github/pr.go
type ReviewComment struct {
    ID     string
    Author string
    Body   string
    Path   string
    Line   int
    State  string
}

func (c *Client) GetUnresolvedComments(
    prNumber int,
) ([]ReviewComment, error) {
    // Fetch all review comments via the GitHub API
    all, err := c.fetchReviewComments(prNumber) // helper name illustrative
    if err != nil {
        return nil, err
    }
    // Filter for unresolved threads
    var open []ReviewComment
    for _, rc := range all {
        if rc.State != "resolved" {
            open = append(open, rc)
        }
    }
    return open, nil
}
New command:
gptcode issue review 42
What it does (sketched after this list):
- Fetches unresolved comments
- Processes each comment with Symphony
- Implements requested changes
- Commits and pushes
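A sketch of that loop, assuming a symphony.Execute entry point and a CommitAndPush helper (both names are illustrative):

func AddressReviewComments(ctx context.Context, c *Client, prNumber int) error {
    comments, err := c.GetUnresolvedComments(prNumber)
    if err != nil {
        return err
    }
    for _, rc := range comments {
        // Hand each comment to Symphony as a small, scoped task.
        task := fmt.Sprintf("Address review comment on %s:%d: %s",
            rc.Path, rc.Line, rc.Body)
        if err := symphony.Execute(ctx, task); err != nil {
            return err
        }
    }
    // Commit and push the accumulated changes.
    return c.CommitAndPush(prNumber)
}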
Result: Can iterate on review feedback autonomously. GitHub Integration: 10/10 (100%)!
Phase 8: CI Failure Handling (58% → 59%, MVAA 94% → 100%)
| 2 commits | 430 LOC | internal/ci/ |
The final piece. CI fails. A lot. Good CI should fail when something’s wrong.
But can we fix it automatically?
// internal/ci/handler.go
// (h.run and the parse/fix helpers are illustrative names)

func (h *Handler) CheckPRStatus(prNumber int) ([]CIStatus, error) {
    // Run: gh pr checks <pr>; parse output; return failed checks
    out, err := h.run("gh", "pr", "checks", strconv.Itoa(prNumber))
    if err != nil {
        return nil, err
    }
    return parseFailedChecks(out), nil
}

func (h *Handler) FetchCILogs(prNumber int) (string, error) {
    // Run: gh run view --log for the failed run; return full logs
    return h.run("gh", "run", "view", "--log")
}

func (h *Handler) ParseCIFailure(log string) *CIFailure {
    // Extract error message, ±5 lines of context, and the job/step
    return extractFailure(log)
}

func (h *Handler) AnalyzeFailure(failure CIFailure) (*FixResult, error) {
    // LLM analysis → generate fix → apply and commit
    fix, err := h.generateFix(failure)
    if err != nil {
        return nil, err
    }
    return h.applyAndCommit(fix)
}
New command:
gptcode issue ci 42
What it does (composed in the sketch after this list):
- Monitors CI status
- Fetches logs from failed checks
- Parses and extracts errors
- Analyzes with LLM
- Generates fix
- Commits and pushes
- CI re-runs automatically
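Composing the handler methods above gives the command's end-to-end flow. A sketch (the real orchestration likely interleaves polling and retries):

// HandlePR sketches the `gptcode issue ci` flow end to end.
func (h *Handler) HandlePR(prNumber int) error {
    failed, err := h.CheckPRStatus(prNumber)
    if err != nil {
        return err
    }
    if len(failed) == 0 {
        return nil // all checks green: nothing to fix
    }
    logs, err := h.FetchCILogs(prNumber)
    if err != nil {
        return err
    }
    failure := h.ParseCIFailure(logs)
    // AnalyzeFailure generates, applies, and commits the fix;
    // pushing triggers a CI re-run automatically.
    _, err = h.AnalyzeFailure(*failure)
    return err
}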
Result: CI failure auto-fix! Error Recovery: 4/5 (80%). MVAA: 17/17 (100%)! 🎆
The Architecture
After 8 phases, here’s what we built:
internal/
├── github/      577 LOC - Issue, PR, reviews, commits
├── validation/  608 LOC - Tests, lint, build, coverage, security
├── recovery/    358 LOC - LLM auto-fix (tests, lint, CI)
├── codebase/    255 LOC - AI file finder
├── ci/          237 LOC - CI failure detection + fix
└── ...

cmd/gptcode/issue.go  793 LOC - CLI (6 commands)
Total: 2,788 LOC across 11 modules
Supported languages: Go, TypeScript, Python, Elixir, Ruby
Supported tools: 12 linters, 5 test runners, 3 security scanners
Test coverage: 35 E2E tests passing
The Complete Workflow
Here’s what 100% MVAA looks like in practice:
# 1. Start with an issue
gptcode issue fix 123 --find-files
# → Fetches issue from GitHub
# → Parses requirements
# → Finds relevant files (AI)
# → Creates branch
# → Implements via Symphony
# → Shows next steps
# 2. Validate and commit
gptcode issue commit 123 --auto-fix --check-coverage --security-scan
# → Builds code
# → Runs tests (auto-fixes if fail)
# → Runs linters (auto-fixes if fail)
# → Checks coverage
# → Scans for vulnerabilities
# → Commits with "Closes #123"
# 3. Create PR
gptcode issue push 123
# → Pushes branch
# → Creates PR via gh
# → Links to issue
# → Copies labels
# 4. Handle CI failures
gptcode issue ci 42
# → Waits for CI
# → Fetches logs if failed
# → Analyzes error
# → Generates fix
# → Commits and pushes
# → CI reruns
# 5. Address review comments
gptcode issue review 42
# → Fetches unresolved comments
# → Processes each via Symphony
# → Implements changes
# → Commits and pushes
# 6. Iterate until approved
# Repeat steps 4-5 until PR is ready to merge!
From issue to merge-ready PR, fully autonomous.
The Numbers
Session metrics:
- 21 commits (863775d → 6afd942)
- 8 phases completed
- 2,788 lines of code written
- 11 modules created
- 6 commands implemented
- 35 E2E tests passing
Autonomy progress:
- Started: 14% (9/64 scenarios)
- Ended: 59% (38/64 scenarios)
- Gain: +45 percentage points
MVAA Critical Path:
- Started: 3/17 (18%)
- Ended: 17/17 (100%) 🎆
- Status: MVP COMPLETE
Category breakdown:
- GitHub Integration: 10/10 (100%) ✅
- Test Execution: 3/8 (38%)
- Validation: 5/7 (71%)
- Error Recovery: 4/5 (80%) ✅
- Codebase Understanding: 3/5 (60%)
What It Can Do NOW
GPTCode can autonomously handle:
- ✅ Simple bug fixes (1-3 files)
- ✅ Small feature additions
- ✅ Test coverage improvements
- ✅ Linting/formatting fixes
- ✅ Documentation updates
- ✅ Dependency updates
- ✅ Security patches
- ✅ CI failure recovery
- ✅ Review comment iteration
All without human intervention (except the final merge button).
The Honest Limitations
We hit 100% MVAA. But that’s just the beginning. Here’s what we can’t do yet:
❌ Complex refactoring (12/12 scenarios missing)
- Multi-file architecture changes
- Database migrations
- Breaking API changes
- Backward compatibility
❌ Advanced test generation (5/8 scenarios missing)
- Generate unit tests for new code
- Integration test creation
- Mock generation
❌ Merge conflicts (1/5 scenarios missing)
- Still need manual resolution
❌ Documentation updates (3/3 scenarios missing)
- README updates
- CHANGELOG generation
- API docs
Total remaining: 26 scenarios for true 100% autonomy
But 100% MVAA means: For simple bugs and small features, GPTCode can close the loop.
Technical Deep Dives
1. AI-Powered File Discovery
The hardest problem: Given an issue description, which files should we modify?
Traditional approaches:
- Keyword matching (too naive)
- Dependency analysis (too rigid)
- Manual selection (not autonomous)
Our approach: LLM + codebase tools + confidence scoring
Issue: "Add password validation with special characters"
GPTCode analyzes:
1. Reads issue requirements
2. Uses list_files to explore structure
3. Reads candidate files
4. Scores by relevance
5. Returns top 3-5 with confidence
Output:
1. [HIGH 0.9] auth/validator.go - Main validation logic
2. [MED 0.6] auth/validator_test.go - Needs test updates
3. [LOW 0.3] config/security.go - May need config
Why it works: Combines semantic understanding (LLM) with structural exploration (tools).
2. LLM-Powered Auto-Fix
Tests fail. What now?
Simple approach: Show error to user
Better approach: Try to fix it
func (ef *ErrorFixer) FixTestFailures(...) (*FixResult, error) {
    // One iteration of the retry loop from Phase 4:
    failures := extractTestFailures(testResult.Output)
    prompt := buildFixPrompt(failures, testResult.Output)
    fix, err := llm.Chat(ctx, prompt)
    if err != nil {
        return nil, err
    }
    applyFix(fix)
    newResult := runTests()
    if newResult.Success {
        return success()
    }
    // Retry with a refined prompt
}
Success rate: ~70% for simple test failures (linting, missing error checks, type issues)
Failure modes: Complex business logic, ambiguous requirements, environmental issues
3. CI Log Parsing
CI logs are messy. Really messy.
GitHub Actions output:
##[group]Run tests
npm test
PASS src/validator.test.ts
FAIL src/auth.test.ts
✓ validates email (5 ms)
✕ validates password (12 ms)
Expected special character in password
at Validator.validatePassword (validator.ts:45)
##[endgroup]
Our parser (core loop sketched after this list):
- Scan for error/fail/fatal keywords
- Extract ±5 lines of context
- Identify job/step markers
- Find file paths and line numbers
- Pass to LLM for analysis
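The scan-and-window step might look like this minimal sketch — the keyword list and the ±5 window come from the description above; the helper itself is illustrative:

import "strings"

// extractContext returns a ±5-line window around each error keyword hit.
func extractContext(log string) []string {
    lines := strings.Split(log, "\n")
    keywords := []string{"error", "fail", "fatal"}
    var snippets []string
    for i, line := range lines {
        lower := strings.ToLower(line)
        for _, kw := range keywords {
            if strings.Contains(lower, kw) {
                start := max(0, i-5) // Go 1.21+ built-in min/max
                end := min(len(lines), i+6)
                snippets = append(snippets, strings.Join(lines[start:end], "\n"))
                break
            }
        }
    }
    return snippets
}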
LLM prompt:
CI/CD Failure Analysis:
Job: Tests
Error: Expected special character in password
Log snippet:
[relevant lines]
Analyze:
1. Root cause
2. Files to modify
3. Specific changes needed
Output: Structured fix that we can apply and commit.
Lessons Learned
1. Start Simple, Build Up
We didn’t try to handle complex refactoring first. We started with:
- Fetch an issue ✅
- Create a branch ✅
- Make a simple change ✅
Then added layers:
- Tests ✅
- Linting ✅
- Coverage ✅
- CI ✅
- Reviews ✅
Each layer validated before moving forward.
2. Autonomy Needs Recovery
The difference between 50% and 100%? How you handle failures.
Early versions would stop at the first test failure. Now:
- Test fails → Analyze → Fix → Retry
- Lint fails → Analyze → Fix → Retry
- CI fails → Analyze → Fix → Retry
Recovery is not optional. It’s the core.
3. Confidence Scoring Matters
Not all decisions are equal. File finding taught us this.
When GPTCode says:
- [HIGH 0.9] - Trust it, implement
- [MED 0.6] - Worth trying, verify
- [LOW 0.3] - Fallback, maybe ignore
User can override. But defaults should be smart.
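Roughly, the labels map to score cutoffs like this (thresholds inferred from the examples above, not taken from the actual code):

func confidenceLabel(c float64) string {
    switch {
    case c >= 0.8:
        return "HIGH" // trust it, implement
    case c >= 0.5:
        return "MED" // worth trying, verify
    default:
        return "LOW" // fallback, maybe ignore
    }
}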
4. Multi-Language is Hard
Supporting 5 languages meant:
- 5 test runners
- 12 linters
- 3 build systems
- 3 security scanners
Each with different:
- Output formats
- Error messages
- Exit codes
- Configuration files
Solution: Abstraction layers + language detection
type TestExecutor interface {
    RunTests() (*TestResult, error)
}

type GoTestExecutor struct { ... }
type NpmTestExecutor struct { ... }
// ...
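Language detection itself can be as simple as probing for marker files. A minimal sketch, assuming a langdetect package along these lines (the real detection is likely richer):

package langdetect

import (
    "os"
    "path/filepath"
)

type Language int

const (
    Unknown Language = iota
    Go
    TypeScript
    Python
    Elixir
    Ruby
)

// Detect probes the working directory for well-known project markers.
func Detect(workDir string) Language {
    markers := []struct {
        file string
        lang Language
    }{
        {"go.mod", Go},
        {"package.json", TypeScript},
        {"pyproject.toml", Python},
        {"mix.exs", Elixir},
        {"Gemfile", Ruby},
    }
    for _, m := range markers {
        if _, err := os.Stat(filepath.Join(workDir, m.file)); err == nil {
            return m.lang
        }
    }
    return Unknown
}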
5. GitHub CLI is Gold
We built on gh CLI instead of direct API calls. Best decision.
Why:
- ✅ Handles auth automatically
- ✅ Respects user’s GitHub config
- ✅ Works with GitHub Enterprise Server (GHES) too
- ✅ Simpler than REST/GraphQL
Example:
gh pr checks 42 # Check CI status
gh pr view 42 --json reviews # Get reviews
gh pr create --title "..." # Create PR
Much easier than managing tokens, endpoints, pagination, etc.
What’s Next
Short Term: Real-World Testing
We hit 100% MVAA in controlled conditions. Now we need:
- Test on real GitHub repos
- Find edge cases
- Improve error messages
- Handle weird CI setups
Medium Term: GitHub Actions Integration
Make this available as a GitHub Action:
- uses: jadercorrea/gptcode-action@v1
  with:
    command: 'ci'
    auto-fix: true
    min-coverage: 80
Benefits:
- Works in any repo
- No installation needed
- Integrates with existing CI
Long Term: The Other 26 Scenarios
To reach true 100% autonomy:
- Complex Code Modifications (12 scenarios)
  - Multi-file refactoring
  - Database migrations
  - Breaking changes
- Test Generation (5 scenarios)
  - Auto-generate tests for new code
  - Integration tests
  - Mocking
- Documentation (3 scenarios)
  - Update README
  - Generate CHANGELOG
  - API docs
- Advanced Git (5 scenarios)
  - Merge conflicts
  - Rebasing
  - Cherry-picking
- Codebase Understanding (2 scenarios)
  - Dependency tracing
  - Convention extraction
Timeline: 2-3 months focused work
Try It Yourself
Want to experience 100% MVAA?
# Install
go install github.com/jadercorrea/gptcode/cmd/gptcode@latest
# Setup
gptcode setup # Configure LLM provider
# Try it on a real issue
gptcode issue fix 123
gptcode issue commit 123 --auto-fix
gptcode issue push 123
Requirements:
- GitHub CLI (gh) installed and authenticated
- Git repository with a remote
- Issues on GitHub
Supported languages: Go, TypeScript, Python, Elixir, Ruby
The Bigger Picture
This isn’t just about GPTCode. It’s about what’s possible with AI + good engineering.
We went from 14% to 59% autonomy in one session because:
- Clear metrics - We knew exactly what to measure
- Incremental approach - Build, test, validate, repeat
- Honest limitations - We know what we can’t do
- Real architecture - Not demos, actual production code
The question isn’t “Can AI replace developers?”
The question is: “How can we make AI a reliable team member?”
100% MVAA is one answer. It means:
- For routine bugs: AI handles it
- For small features: AI implements it
- For CI failures: AI fixes it
- For reviews: AI addresses them
Developers focus on: Architecture, design, complex problems, product decisions.
AI handles: Grunt work, repetitive tasks, known patterns.
That’s the vision. And we’re 59% there.
Final Thoughts
Building autonomous AI is hard.
Not because LLMs aren’t smart enough. They are.
It’s hard because software development is a system:
- Code → Tests → Lint → Build → CI → Review → Merge
Each step has failure modes. Each needs recovery.
We spent 8 phases building that system. The LLM is just one piece.
The real achievement? A reliable, reproducible workflow that goes from issue to PR without breaking.
That’s 100% MVAA.
And it’s just the beginning.
Want to contribute? Check out the codebase →
Have questions? Open an issue →
Follow the journey: More updates coming as we push toward 100% total autonomy.
Special thanks to everyone who’s contributed ideas, bug reports, and feedback. This wouldn’t exist without the community pushing for better, more transparent AI tools.
Next post: Building the GitHub Action - Making 100% MVAA available to every repository.