From 14% to 59% Autonomy: How GPTCode Achieved 100% MVAA in One Epic Session
TL;DR: In a single development session, GPTCode went from 14% autonomy to 59%, achieving 100% coverage of the MVAA (Minimum Viable Autonomous Agent) Critical Path. The AI coding assistant can now autonomously resolve GitHub issues end-to-end, including handling CI failures and iterating on review comments. Here’s how we did it in 21 commits, 8 phases, and 2,788 lines of code.
The Vision
Two weeks ago, I wrote about why GPTCode isn’t trying to beat anyone. The goal wasn’t to create another Cursor or Copilot. It was to build something different: transparent, hackable, and honest about its limitations.
But there was always a bigger vision: What if an AI could actually close a GitHub issue on its own?
Not just write code. Not just run tests. But handle the entire workflow:
- Fetch the issue
- Understand the requirements
- Find the relevant files
- Implement the solution
- Run tests and fix failures
- Handle linting issues
- Build and validate
- Create a PR
- Handle CI failures
- Address review comments
- Iterate until approved
That’s the dream. That’s what we call 100% MVAA (Minimum Viable Autonomous Agent).
The Starting Point: 14% Autonomy
Before this session, GPTCode could do individual tasks well:
- ✅ Interactive chat with code understanding
- ✅ Test-driven development
- ✅ Research and planning
- ✅ Code review
But stringing tasks together autonomously? That was at 14%.
I created a comprehensive gap analysis mapping out 64 scenarios across 8 categories. The MVAA Critical Path had 17 key scenarios. We had 3 out of 17.
The question: Could we reach 100% MVAA in one focused session?
The Journey: 8 Phases, 21 Commits
Phase 1: GitHub Integration Foundation (30% → 40%)
| 3 commits | 577 LOC | internal/github/ |
First, we needed to talk to GitHub. Not just read issues, but create branches, commit changes, push code, and create PRs.
// internal/github/issue.go
import (
    "fmt"
    "regexp"
    "strings"
)

type Issue struct {
    Number    int
    Title     string
    Body      string
    Labels    []string
    Assignees []string
}

// ExtractRequirements parses the issue body for checklist
// action items, e.g. "- [ ] requirement 1".
func (i *Issue) ExtractRequirements() []string {
    var reqs []string
    for _, line := range strings.Split(i.Body, "\n") {
        t := strings.TrimSpace(line)
        if strings.HasPrefix(t, "- [ ]") {
            reqs = append(reqs, strings.TrimSpace(t[len("- [ ]"):]))
        }
    }
    return reqs
}

// CreateBranchName builds a slug like issue-123-fix-password-validation.
func (i *Issue) CreateBranchName() string {
    slug := regexp.MustCompile(`[^a-z0-9]+`).ReplaceAllString(strings.ToLower(i.Title), "-")
    return fmt.Sprintf("issue-%d-%s", i.Number, strings.Trim(slug, "-"))
}
Key insight: Issues contain structured data. We can parse requirements, references, and linked PRs automatically.
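To make this concrete, here's a hypothetical issue run through the helpers above (values and output are illustrative):

issue := &Issue{
    Number: 123,
    Title:  "Fix password validation",
    Body:   "- [ ] Require a special character\n- [ ] Add tests",
}
fmt.Println(issue.ExtractRequirements())
// → [Require a special character Add tests]
fmt.Println(issue.CreateBranchName())
// → issue-123-fix-password-validation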
Result: Can now fetch issues and create proper branches. 9/10 GitHub scenarios complete.
Phase 2: Test Execution & Validation (40% → 50%)
| 3 commits | 608 LOC | internal/validation/ |
A fix isn’t done until tests pass. We needed multi-language test execution:
// internal/validation/test_executor.go
type TestExecutor struct {
    workDir  string
    language langdetect.Language
}

// RunTests dispatches to the language-specific test runner.
func (te *TestExecutor) RunTests() (*TestResult, error) {
    switch te.language {
    case langdetect.Go:
        return te.runGoTests()
    case langdetect.TypeScript:
        return te.runNpmTests()
    case langdetect.Python:
        return te.runPytest()
    // ... Elixir, Ruby
    default:
        return nil, fmt.Errorf("unsupported language: %v", te.language)
    }
}
Plus comprehensive linting with 12 different tools across 5 languages.
Validation pipeline (orchestration sketched after this list):
- Build check
- Test execution
- Linting (style, types, security)
- Coverage analysis
- Security scanning
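A minimal sketch of how these stages could be chained — the Step type and RunPipeline are assumptions for illustration, not the actual internal/validation API:

package validation

import "fmt"

// A pipeline stage: a name plus the check to run.
type Step struct {
    Name string
    Run  func() error
}

// RunPipeline executes each stage in order, stopping at the first failure.
func RunPipeline(steps []Step) error {
    for _, s := range steps {
        if err := s.Run(); err != nil {
            return fmt.Errorf("%s failed: %w", s.Name, err)
        }
    }
    return nil
}

The steps slice would hold build, tests, lint, coverage, and security, in that order.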
Result: 8/15 test & validation scenarios complete.
Phase 3: CLI Integration (50% → 55%)
| 2 commits | 793 LOC | cmd/gptcode/issue.go |
Time to make it accessible. The gptcode issue command suite:
gptcode issue fix 123 # Fetch and implement
gptcode issue show 123 # Display issue details
gptcode issue commit 123 # Validate and commit
gptcode issue push 123 # Create PR
Integrated with Symphony (our autonomous executor) for hands-off implementation.
Key decision: Default to --autonomous true. GPTCode should try to solve things on its own, not ask permission.
Result: Complete CLI workflow available.
Phase 4: Error Recovery (55% → 58%)
| 2 commits | 358 LOC | internal/recovery/ |
Tests fail. Linters complain. Builds break. That’s normal. What matters is recovering automatically.
// internal/recovery/error_fixer.go
func (ef *ErrorFixer) FixTestFailures(
    ctx context.Context,
    testResult *TestResult,
    maxAttempts int,
) (*FixResult, error) {
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        // 1. Extract failures
        failures := extractTestFailures(testResult.Output)
        // 2. Generate fix via LLM
        fix, err := llm.Chat(ctx, buildFixPrompt(failures, testResult.Output))
        if err != nil {
            return nil, err
        }
        // 3. Apply fix
        applyFix(fix)
        // 4. Re-run tests
        testResult = runTests()
        // 5. Return if successful (FixResult fields illustrative)
        if testResult.Success {
            return &FixResult{Fixed: true, Attempts: attempt}, nil
        }
    }
    return nil, fmt.Errorf("still failing after %d attempts", maxAttempts)
}
Retry strategies (selection sketched after this list):
- Fix and retry (most common)
- Simplify approach
- Skip and continue
- Rollback on critical failure
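A minimal selection sketch — the Strategy enum and the heuristics here are illustrative, not the actual internal/recovery API:

type Strategy int

const (
    FixAndRetry Strategy = iota // most common
    Simplify                    // reduce the scope of the change
    SkipAndContinue             // tolerate a non-blocking failure
    Rollback                    // critical failure: revert everything
)

// chooseStrategy picks a recovery path for a failed step.
func chooseStrategy(critical bool, attempt, maxAttempts int) Strategy {
    switch {
    case critical:
        return Rollback
    case attempt < maxAttempts:
        return FixAndRetry
    default:
        return SkipAndContinue
    }
}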
Result: Auto-fix for tests, linting, and syntax errors. 3/5 error recovery scenarios complete.
Phase 5: Enhanced Validation (58% → 59%)
| 2 commits | 177 LOC | internal/validation/ |
We had tests and linting. But production code needs more:
- Coverage checking - Ensure minimum threshold
- Security scanning - govulncheck, npm audit, safety
gptcode issue commit 123 \
--check-coverage \
--min-coverage 80 \
--security-scan
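Under the hood, dispatching the right scanner per language can be a simple lookup table. A sketch using the tools named above (these are the tools' standard invocations, not necessarily GPTCode's exact arguments):

// Security scanner invocation per language.
var securityScanners = map[langdetect.Language][]string{
    langdetect.Go:         {"govulncheck", "./..."},
    langdetect.TypeScript: {"npm", "audit"},
    langdetect.Python:     {"safety", "check"},
}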
Result: Full validation suite complete.
Phase 6: Codebase Understanding (59% → 56%)
| 2 commits | 255 LOC | internal/codebase/ |
Here’s where it gets interesting. How do you find relevant files for an issue?
We built an AI-powered file finder:
// internal/codebase/finder.go
type RelevantFile struct {
    Path       string
    Reason     string
    Confidence float64 // 0.0 - 1.0
}

func (f *FileFinder) FindRelevantFiles(
    ctx context.Context,
    issueDescription string,
) ([]RelevantFile, error) {
    // Use LLM + codebase tools to identify candidate files
    // (exploreCodebase is an illustrative helper name)
    candidates, err := f.exploreCodebase(ctx, issueDescription)
    if err != nil {
        return nil, err
    }
    // Score by confidence (HIGH/MED/LOW), return top 3-5 files
    sort.Slice(candidates, func(a, b int) bool {
        return candidates[a].Confidence > candidates[b].Confidence
    })
    if len(candidates) > 5 {
        candidates = candidates[:5]
    }
    return candidates, nil
}
Example output:
Relevant files identified:
1. [HIGH] auth/validator.go - Contains validation logic
2. [MED] auth/validator_test.go - Test file
3. [LOW] config/security.go - Security settings
Result: AI-powered file discovery with confidence scoring. 3/5 codebase understanding scenarios complete.
Phase 7: PR Review Handling (56% → 58%)
| 2 commits | 276 LOC | internal/github/pr.go |
Reviewers leave comments. Good ones. We should address them autonomously.
// internal/github/pr.go
type ReviewComment struct {
    ID     string
    Author string
    Body   string
    Path   string
    Line   int
    State  string
}

func (c *Client) GetUnresolvedComments(
    prNumber int,
) ([]ReviewComment, error) {
    // Fetch all review comments via the GitHub API
    all, err := c.fetchReviewComments(prNumber) // helper name illustrative
    if err != nil {
        return nil, err
    }
    // Filter for unresolved threads
    var open []ReviewComment
    for _, rc := range all {
        if rc.State != "resolved" {
            open = append(open, rc)
        }
    }
    return open, nil
}
New command:
gptcode issue review 42
What it does (sketched after this list):
- Fetches unresolved comments
- Processes each comment with Symphony
- Implements requested changes
- Commits and pushes
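A sketch of that loop, assuming a symphony.Execute entry point and a CommitAndPush helper (both names are illustrative):

func AddressReviewComments(ctx context.Context, c *Client, prNumber int) error {
    comments, err := c.GetUnresolvedComments(prNumber)
    if err != nil {
        return err
    }
    for _, rc := range comments {
        // Hand each comment to Symphony as a small, scoped task.
        task := fmt.Sprintf("Address review comment on %s:%d: %s",
            rc.Path, rc.Line, rc.Body)
        if err := symphony.Execute(ctx, task); err != nil {
            return err
        }
    }
    // Commit and push the accumulated changes.
    return c.CommitAndPush(prNumber)
}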
Result: Can iterate on review feedback autonomously. GitHub Integration: 10/10 (100%)!
Phase 8: CI Failure Handling (58% → 59%, MVAA 94% → 100%)
| 2 commits | 430 LOC | internal/ci/ |
The final piece. CI fails. A lot. Good CI should fail when something’s wrong.
But can we fix it automatically?
// internal/ci/handler.go
// (h.run and the parse/fix helpers are illustrative names)

func (h *Handler) CheckPRStatus(prNumber int) ([]CIStatus, error) {
    // Run: gh pr checks <pr>; parse output; return failed checks
    out, err := h.run("gh", "pr", "checks", strconv.Itoa(prNumber))
    if err != nil {
        return nil, err
    }
    return parseFailedChecks(out), nil
}

func (h *Handler) FetchCILogs(prNumber int) (string, error) {
    // Run: gh run view --log for the failed run; return full logs
    return h.run("gh", "run", "view", "--log")
}

func (h *Handler) ParseCIFailure(log string) *CIFailure {
    // Extract error message, ±5 lines of context, and the job/step
    return extractFailure(log)
}

func (h *Handler) AnalyzeFailure(failure CIFailure) (*FixResult, error) {
    // LLM analysis → generate fix → apply and commit
    fix, err := h.generateFix(failure)
    if err != nil {
        return nil, err
    }
    return h.applyAndCommit(fix)
}
New command:
gptcode issue ci 42
What it does (composed in the sketch after this list):
- Monitors CI status
- Fetches logs from failed checks
- Parses and extracts errors
- Analyzes with LLM
- Generates fix
- Commits and pushes
- CI re-runs automatically
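Composing the handler methods above gives the command's end-to-end flow. A sketch (the real orchestration likely interleaves polling and retries):

// HandlePR sketches the `gptcode issue ci` flow end to end.
func (h *Handler) HandlePR(prNumber int) error {
    failed, err := h.CheckPRStatus(prNumber)
    if err != nil {
        return err
    }
    if len(failed) == 0 {
        return nil // all checks green: nothing to fix
    }
    logs, err := h.FetchCILogs(prNumber)
    if err != nil {
        return err
    }
    failure := h.ParseCIFailure(logs)
    // AnalyzeFailure generates, applies, and commits the fix;
    // pushing triggers a CI re-run automatically.
    _, err = h.AnalyzeFailure(*failure)
    return err
}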
Result: CI failure auto-fix! Error Recovery: 4/5 (80%). MVAA: 17/17 (100%)! 🎆
The Architecture
After 8 phases, here’s what we built:
internal/
├── github/      577 LOC - Issue, PR, reviews, commits
├── validation/  608 LOC - Tests, lint, build, coverage, security
├── recovery/    358 LOC - LLM auto-fix (tests, lint, CI)
├── codebase/    255 LOC - AI file finder
├── ci/          237 LOC - CI failure detection + fix
└── ...

cmd/gptcode/issue.go  793 LOC - CLI (6 commands)
Total: 2,788 LOC across 11 modules
Supported languages: Go, TypeScript, Python, Elixir, Ruby
Supported tools: 12 linters, 5 test runners, 3 security scanners
Test coverage: 35 E2E tests passing
The Complete Workflow
Here’s what 100% MVAA looks like in practice:
# 1. Start with an issue
gptcode issue fix 123 --find-files
# → Fetches issue from GitHub
# → Parses requirements
# → Finds relevant files (AI)
# → Creates branch
# → Implements via Symphony
# → Shows next steps
# 2. Validate and commit
gptcode issue commit 123 --auto-fix --check-coverage --security-scan
# → Builds code
# → Runs tests (auto-fixes if fail)
# → Runs linters (auto-fixes if fail)
# → Checks coverage
# → Scans for vulnerabilities
# → Commits with "Closes #123"
# 3. Create PR
gptcode issue push 123
# → Pushes branch
# → Creates PR via gh
# → Links to issue
# → Copies labels
# 4. Handle CI failures
gptcode issue ci 42
# → Waits for CI
# → Fetches logs if failed
# → Analyzes error
# → Generates fix
# → Commits and pushes
# → CI reruns
# 5. Address review comments
gptcode issue review 42
# → Fetches unresolved comments
# → Processes each via Symphony
# → Implements changes
# → Commits and pushes
# 6. Iterate until approved
# Repeat steps 4-5 until PR is ready to merge!
From issue to merge-ready PR, fully autonomous.
The Numbers
Session metrics:
- 21 commits (863775d → 6afd942)
- 8 phases completed
- 2,788 lines of code written
- 11 modules created
- 6 commands implemented
- 35 E2E tests passing
Autonomy progress:
- Started: 14% (9/64 scenarios)
- Ended: 59% (38/64 scenarios)
- Gain: +45 percentage points
MVAA Critical Path:
- Started: 3/17 (18%)
- Ended: 17/17 (100%) 🎆
- Status: MVP COMPLETE
Category breakdown:
- GitHub Integration: 10/10 (100%) ✅
- Test Execution: 3/8 (38%)
- Validation: 5/7 (71%)
- Error Recovery: 4/5 (80%) ✅
- Codebase Understanding: 3/5 (60%)
What It Can Do NOW
GPTCode can autonomously handle:
- ✅ Simple bug fixes (1-3 files)
- ✅ Small feature additions
- ✅ Test coverage improvements
- ✅ Linting/formatting fixes
- ✅ Documentation updates
- ✅ Dependency updates
- ✅ Security patches
- ✅ CI failure recovery
- ✅ Review comment iteration
All without human intervention (except the final merge button).
The Honest Limitations
We hit 100% MVAA. But that’s just the beginning. Here’s what we can’t do yet:
❌ Complex refactoring (12/12 scenarios missing)
- Multi-file architecture changes
- Database migrations
- Breaking API changes
- Backward compatibility
❌ Advanced test generation (5/8 scenarios missing)
- Generate unit tests for new code
- Integration test creation
- Mock generation
❌ Merge conflicts (1/5 scenarios missing)
- Still need manual resolution
❌ Documentation updates (3/3 scenarios missing)
- README updates
- CHANGELOG generation
- API docs
Total remaining: 26 scenarios for true 100% autonomy
But 100% MVAA means: For simple bugs and small features, GPTCode can close the loop.
Technical Deep Dives
1. AI-Powered File Discovery
The hardest problem: Given an issue description, which files should we modify?
Traditional approaches:
- Keyword matching (too naive)
- Dependency analysis (too rigid)
- Manual selection (not autonomous)
Our approach: LLM + codebase tools + confidence scoring
Issue: "Add password validation with special characters"
GPTCode analyzes:
1. Reads issue requirements
2. Uses list_files to explore structure
3. Reads candidate files
4. Scores by relevance
5. Returns top 3-5 with confidence
Output:
1. [HIGH 0.9] auth/validator.go - Main validation logic
2. [MED 0.6] auth/validator_test.go - Needs test updates
3. [LOW 0.3] config/security.go - May need config
Why it works: Combines semantic understanding (LLM) with structural exploration (tools).
2. LLM-Powered Auto-Fix
Tests fail. What now?
Simple approach: Show error to user
Better approach: Try to fix it
func (ef *ErrorFixer) FixTestFailures(...) (*FixResult, error) {
    // One iteration of the retry loop from Phase 4:
    failures := extractTestFailures(testResult.Output)
    prompt := buildFixPrompt(failures, testResult.Output)
    fix, err := llm.Chat(ctx, prompt)
    if err != nil {
        return nil, err
    }
    applyFix(fix)
    newResult := runTests()
    if newResult.Success {
        return success()
    }
    // Retry with a refined prompt
}
Success rate: ~70% for simple test failures (linting, missing error checks, type issues)
Failure modes: Complex business logic, ambiguous requirements, environmental issues
3. CI Log Parsing
CI logs are messy. Really messy.
GitHub Actions output:
##[group]Run tests
npm test
PASS src/validator.test.ts
FAIL src/auth.test.ts
✓ validates email (5 ms)
✕ validates password (12 ms)
Expected special character in password
at Validator.validatePassword (validator.ts:45)
##[endgroup]
Our parser (core loop sketched after this list):
- Scan for error/fail/fatal keywords
- Extract ±5 lines of context
- Identify job/step markers
- Find file paths and line numbers
- Pass to LLM for analysis
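The scan-and-window step might look like this minimal sketch — the keyword list and the ±5 window come from the description above; the helper itself is illustrative:

import "strings"

// extractContext returns a ±5-line window around each error keyword hit.
func extractContext(log string) []string {
    lines := strings.Split(log, "\n")
    keywords := []string{"error", "fail", "fatal"}
    var snippets []string
    for i, line := range lines {
        lower := strings.ToLower(line)
        for _, kw := range keywords {
            if strings.Contains(lower, kw) {
                start := max(0, i-5) // Go 1.21+ built-in min/max
                end := min(len(lines), i+6)
                snippets = append(snippets, strings.Join(lines[start:end], "\n"))
                break
            }
        }
    }
    return snippets
}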
LLM prompt:
CI/CD Failure Analysis:
Job: Tests
Error: Expected special character in password
Log snippet:
[relevant lines]
Analyze:
1. Root cause
2. Files to modify
3. Specific changes needed
Output: Structured fix that we can apply and commit.
Lessons Learned
1. Start Simple, Build Up
We didn’t try to handle complex refactoring first. We started with:
- Fetch an issue ✅
- Create a branch ✅
- Make a simple change ✅
Then added layers:
- Tests ✅
- Linting ✅
- Coverage ✅
- CI ✅
- Reviews ✅
Each layer validated before moving forward.
2. Autonomy Needs Recovery
The difference between 50% and 100%? How you handle failures.
Early versions would stop at the first test failure. Now:
- Test fails → Analyze → Fix → Retry
- Lint fails → Analyze → Fix → Retry
- CI fails → Analyze → Fix → Retry
Recovery is not optional. It’s the core.
3. Confidence Scoring Matters
Not all decisions are equal. File finding taught us this.
When GPTCode says:
- [HIGH 0.9] - Trust it, implement
- [MED 0.6] - Worth trying, verify
- [LOW 0.3] - Fallback, maybe ignore
User can override. But defaults should be smart.
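Roughly, the labels map to score cutoffs like this (thresholds inferred from the examples above, not taken from the actual code):

func confidenceLabel(c float64) string {
    switch {
    case c >= 0.8:
        return "HIGH" // trust it, implement
    case c >= 0.5:
        return "MED" // worth trying, verify
    default:
        return "LOW" // fallback, maybe ignore
    }
}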
4. Multi-Language is Hard
Supporting 5 languages meant:
- 5 test runners
- 12 linters
- 3 build systems
- 3 security scanners
Each with different:
- Output formats
- Error messages
- Exit codes
- Configuration files
Solution: Abstraction layers + language detection
type TestExecutor interface {
    RunTests() (*TestResult, error)
}

type GoTestExecutor struct { ... }
type NpmTestExecutor struct { ... }
// ...
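Language detection itself can be as simple as probing for marker files. A minimal sketch, assuming a langdetect package along these lines (the real detection is likely richer):

package langdetect

import (
    "os"
    "path/filepath"
)

type Language int

const (
    Unknown Language = iota
    Go
    TypeScript
    Python
    Elixir
    Ruby
)

// Detect probes the working directory for well-known project markers.
func Detect(workDir string) Language {
    markers := []struct {
        file string
        lang Language
    }{
        {"go.mod", Go},
        {"package.json", TypeScript},
        {"pyproject.toml", Python},
        {"mix.exs", Elixir},
        {"Gemfile", Ruby},
    }
    for _, m := range markers {
        if _, err := os.Stat(filepath.Join(workDir, m.file)); err == nil {
            return m.lang
        }
    }
    return Unknown
}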
5. GitHub CLI is Gold
We built on gh CLI instead of direct API calls. Best decision.
Why:
- ✅ Handles auth automatically
- ✅ Respects user’s GitHub config
- ✅ Works with GitHub Enterprise Server (GHES) too
- ✅ Simpler than REST/GraphQL
Example:
gh pr checks 42 # Check CI status
gh pr view 42 --json reviews # Get reviews
gh pr create --title "..." # Create PR
Much easier than managing tokens, endpoints, pagination, etc.
What’s Next
Short Term: Real-World Testing
We hit 100% MVAA in controlled conditions. Now we need:
- Test on real GitHub repos
- Find edge cases
- Improve error messages
- Handle weird CI setups
Medium Term: GitHub Actions Integration
Make this available as a GitHub Action:
- uses: jadercorrea/gptcode-action@v1
  with:
    command: 'ci'
    auto-fix: true
    min-coverage: 80
Benefits:
- Works in any repo
- No installation needed
- Integrates with existing CI
Long Term: The Other 26 Scenarios
To reach true 100% autonomy:
- Complex Code Modifications (12 scenarios)
  - Multi-file refactoring
  - Database migrations
  - Breaking changes
- Test Generation (5 scenarios)
  - Auto-generate tests for new code
  - Integration tests
  - Mocking
- Documentation (3 scenarios)
  - Update README
  - Generate CHANGELOG
  - API docs
- Advanced Git (5 scenarios)
  - Merge conflicts
  - Rebasing
  - Cherry-picking
- Codebase Understanding (2 scenarios)
  - Dependency tracing
  - Convention extraction
Timeline: 2-3 months focused work
Try It Yourself
Want to experience 100% MVAA?
# Install
go install github.com/jadercorrea/gptcode/cmd/gptcode@latest
# Setup
gptcode setup # Configure LLM provider
# Try it on a real issue
gptcode issue fix 123
gptcode issue commit 123 --auto-fix
gptcode issue push 123
Requirements:
- GitHub CLI (gh) installed and authenticated
- Git repository with a remote
- Issues on GitHub
Supported languages: Go, TypeScript, Python, Elixir, Ruby
The Bigger Picture
This isn’t just about GPTCode. It’s about what’s possible with AI + good engineering.
We went from 14% to 59% autonomy in one session because:
- Clear metrics - We knew exactly what to measure
- Incremental approach - Build, test, validate, repeat
- Honest limitations - We know what we can’t do
- Real architecture - Not demos, actual production code
The question isn’t “Can AI replace developers?”
The question is: “How can we make AI a reliable team member?”
100% MVAA is one answer. It means:
- For routine bugs: AI handles it
- For small features: AI implements it
- For CI failures: AI fixes it
- For reviews: AI addresses them
Developers focus on: Architecture, design, complex problems, product decisions.
AI handles: Grunt work, repetitive tasks, known patterns.
That’s the vision. And we’re 59% there.
Final Thoughts
Building autonomous AI is hard.
Not because LLMs aren’t smart enough. They are.
It’s hard because software development is a system:
- Code → Tests → Lint → Build → CI → Review → Merge
Each step has failure modes. Each needs recovery.
We spent 8 phases building that system. The LLM is just one piece.
The real achievement? A reliable, reproducible workflow that goes from issue to PR without breaking.
That’s 100% MVAA.
And it’s just the beginning.
Want to contribute? Check out the codebase →
Have questions? Open an issue →
Follow the journey: More updates coming as we push toward 100% total autonomy.
Special thanks to everyone who’s contributed ideas, bug reports, and feedback. This wouldn’t exist without the community pushing for better, more transparent AI tools.
Next post: Building the GitHub Action - Making 100% MVAA available to every repository.