A mini Devin / Cognition-style autonomous software engineering agent built with LangGraph. Give it a repository and a GitHub issue number and it will inspect the codebase, plan a fix, write the code, run tests, self-debug on failures, and open a pull request, all without human intervention.
$ python main.py octocat/Hello-World 42
[SWE Agent] Starting on octocat/Hello-World#42
[fetch_issue] done — updated: issue
[inspect_repo] done — updated: local_repo_path, repo_tree, relevant_files
[plan_fix] done — updated: fix_plan
[write_code] done — updated: patches
[run_tests] done — updated: test_result
[open_pr] PR opened: https://github.com/octocat/Hello-World/pull/99
[SWE Agent] Complete.
fetch_issue → inspect_repo → plan_fix → write_code → run_tests
│
pass ──┤── open_pr ──→ DONE
│
fail (retry<3)─┤── debug ──→ write_code
│
fail (retry≥3)─┴── ABORT
Each step is a LangGraph node. State flows through a single AgentState TypedDict, making every transition inspectable and resumable.
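As a rough illustration of the single-state idea, here is a minimal sketch of what such a TypedDict could look like. The field names `issue`, `local_repo_path`, `repo_tree`, `relevant_files`, `fix_plan`, `patches`, and `test_result` come from the log output above; the field types, the `repo`, `issue_number`, and `retry_count` fields, and the `apply_update` and `fake_fetch_issue` helpers are assumptions made for this example, not the project's actual definitions.

```python
from typing import TypedDict


class AgentState(TypedDict, total=False):
    """Hypothetical shape of the shared state; all field types are assumed."""
    repo: str                  # assumed field, e.g. "owner/name"
    issue_number: int          # assumed field
    issue: dict                # issue title and body
    local_repo_path: str
    repo_tree: list[str]
    relevant_files: list[str]
    fix_plan: dict
    patches: list[dict]
    test_result: dict
    retry_count: int           # assumed name for the retry counter


def apply_update(state: AgentState, update: AgentState) -> AgentState:
    """Return a new state with a node's partial update merged in."""
    merged: AgentState = {**state, **update}
    return merged


def fake_fetch_issue(state: AgentState) -> AgentState:
    """Stand-in node: returns only the keys it changed, makes no API call."""
    return {"issue": {"title": "Example title", "body": "Example body"}}


state: AgentState = {"repo": "octocat/Hello-World", "issue_number": 42}
update = fake_fetch_issue(state)
state = apply_update(state, update)
print("updated:", ", ".join(sorted(update)))
```

Because each node returns only the keys it touched, a log line such as `updated: issue` can be derived directly from the partial update.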
| Node | What it does |
|---|---|
| `fetch_issue` | Downloads the GitHub issue title + body via the GitHub API |
| `inspect_repo` | Clones the repo, walks all source files, uses keyword heuristics to select the ≤20 most relevant files |
| `plan_fix` | Sends issue + relevant files to Claude and gets back a structured JSON fix plan (summary, files to change, ordered steps) |
| `write_code` | Sends the fix plan + current file contents to Claude and gets back complete patched file content |
| `run_tests` | Applies patches to disk and runs `pytest --tb=short` in a sandboxed subprocess |
| `debug` | On test failure, sends the failure output + patch to Claude for root-cause analysis; the note is fed back into the next `write_code` call |
| `open_pr` | Commits the patch on a new branch, pushes it, and opens a pull request with an LLM-generated title and description |
swe_agent/
├── main.py # CLI entry point
├── config.py # Env-var loader with startup validation
├── conftest.py # pytest env setup
├── requirements.txt
│
├── graph/
│ ├── state.py # AgentState TypedDict — single source of truth
│ ├── nodes.py # All 7 LangGraph node functions
│ ├── edges.py # Conditional routing (pass / fail / retry)
│ └── __init__.py # build_graph() factory
│
├── tools/ # Pure I/O functions — no LLM calls
│ ├── github_client.py # Read issues, create PRs (PyGithub)
│ ├── repo_inspector.py # Clone, walk, AST-parse, select relevant files
│ ├── code_sandbox.py # Subprocess runner with hard timeout
│ ├── test_runner.py # pytest runner + structured failure extractor
│ └── git_ops.py # Branch, apply patches, commit, push (GitPython)
│
├── agents/ # LLM-backed functions
│ ├── planner.py # Issue → FixPlan (structured JSON)
│ ├── coder.py # FixPlan → FilePatch list (full file content)
│ ├── debugger.py # Failures → root-cause note (reflection)
│ └── reviewer.py # Diff → PR title + body
│
└── tests/
    ├── test_github_client.py
    ├── test_repo_inspector.py
    ├── test_sandbox.py
    ├── test_test_runner.py
    ├── test_git_ops.py
    └── test_graph.py # End-to-end smoke tests with all IO mocked
- Python 3.11+
- A GitHub account with a Personal Access Token (`repo` scope)
- An Anthropic API key
git clone https://github.com/rakeshguptak/swe-agent.git
cd swe-agent
pip install -r requirements.txt
cp .env.example .env

Edit .env:
GITHUB_TOKEN=ghp_your_token_here
ANTHROPIC_API_KEY=sk-ant-your_key_here

python main.py owner/repo 42

The agent will:
- Fetch issue `#42` from `owner/repo`
- Clone the repo locally to `$WORKSPACE_DIR`
- Inspect source files relevant to the issue
- Ask Claude to produce a fix plan
- Ask Claude to write the patched code
- Run `pytest` against the patch
- If tests fail, ask Claude to debug and retry (up to `MAX_RETRIES` times)
- Open a pull request with the working patch
All options are set via environment variables (or .env):
| Variable | Default | Description |
|---|---|---|
| `GITHUB_TOKEN` | required | GitHub Personal Access Token (`repo` scope) |
| `ANTHROPIC_API_KEY` | required | Anthropic API key |
| `LLM_MODEL` | `claude-sonnet-4-6` | Claude model to use for all LLM calls |
| `WORKSPACE_DIR` | `/tmp/swe_agent_workspace` | Where repos are cloned |
| `MAX_RETRIES` | `3` | Max debug→rewrite cycles before aborting |
| `SANDBOX_TIMEOUT` | `60` | Seconds before a subprocess is killed |
pytest tests/ -v
pytest tests/ --cov=. --cov-report=term-missing

Tests use mocks for all external calls (GitHub API, git clone, LLM, subprocess). No credentials or network access needed.
Coverage: 81% — all core graph logic, tools, and routing covered.
LangGraph's StateGraph makes the control flow explicit: each node is a function of the form (state) → partial_state, and edges encode the routing logic separately. This means:
- The debug loop (`run_tests → debug → write_code`) is a first-class graph construct, not an ad-hoc `while` loop buried in business logic.
- Every intermediate state is inspectable.
- The graph can be resumed from any checkpoint (LangGraph supports persistence out of the box).
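The separation of nodes and edges can be imitated in a few lines of plain Python. The sketch below is not LangGraph's API: `run_graph`, the toy nodes, and the `attempt` and `passed` keys are all hypothetical, and the toy routing deliberately omits the retry cap. It only shows how keeping routing in a separate table turns the debug loop into data and makes each intermediate state easy to record.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]      # returns a partial update
Router = Callable[[State], str]      # returns the name of the next node

END = "END"


def run_graph(nodes: dict[str, Node], edges: dict[str, Router],
              start: str, state: State, max_steps: int = 20):
    """Run nodes until END, recording a snapshot of the state after each one."""
    history: list[tuple[str, State]] = []
    current = start
    for _ in range(max_steps):       # guard against runaway loops
        if current == END:
            break
        update = nodes[current](state)
        state = {**state, **update}
        history.append((current, dict(state)))
        current = edges[current](state)
    return state, history


# Toy nodes: the "tests" pass on the second attempt.
nodes = {
    "write_code": lambda s: {"attempt": s.get("attempt", 0) + 1},
    "run_tests": lambda s: {"passed": s["attempt"] >= 2},
    "debug": lambda s: {"debug_note": f"attempt {s['attempt']} failed"},
}
# Routing lives in its own table, separate from the node logic.
edges = {
    "write_code": lambda s: "run_tests",
    "run_tests": lambda s: END if s["passed"] else "debug",
    "debug": lambda s: "write_code",
}

final_state, history = run_graph(nodes, edges, "write_code", {})
print([name for name, _ in history])
```

Each entry in `history` is a full snapshot, which is the property that makes checkpointing and resuming possible in a real graph runtime.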
Generating complete file content (rather than unified diffs) is more reliable with LLMs:
- Diffs require exact line-number matching, which LLMs get wrong under context pressure.
- Full content is unambiguous to apply — no patch conflict resolution needed.
- The 200 KB file size cap keeps context usage manageable.
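Applying a full-file patch amounts to overwriting the target file, as the sketch below illustrates. The patch shape (`path` and `content` keys), the `apply_patches` name, the path-escape check, and the choice to enforce the 200 KB cap on written content are assumptions for this example; the project may apply the cap elsewhere, such as when reading files into the prompt.

```python
import tempfile
from pathlib import Path

MAX_FILE_BYTES = 200 * 1024  # 200 KB cap; where it is enforced is assumed


def apply_patches(repo_root: str, patches: list[dict]) -> list[str]:
    """Overwrite each patched file with its complete new content."""
    root = Path(repo_root).resolve()
    written = []
    for patch in patches:
        target = (root / patch["path"]).resolve()
        # Refuse paths that would land outside the repository.
        if not target.is_relative_to(root):
            raise ValueError(f"path escapes repo: {patch['path']}")
        data = patch["content"].encode("utf-8")
        if len(data) > MAX_FILE_BYTES:
            raise ValueError(f"file too large: {patch['path']}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        written.append(target.relative_to(root).as_posix())
    return written


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    changed = apply_patches(tmp, [
        {"path": "pkg/hello.py", "content": "print('hello')\n"},
    ])
    content_on_disk = (Path(tmp) / "pkg" / "hello.py").read_text()
print(changed)
```

There is no hunk matching or conflict handling here, which is the simplification that full-file output buys.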
MAX_RETRIES=3 is a hard cap on the debug → write_code cycle. Without it, a pathological case (e.g. a test that requires external state the agent can't set up) would spin indefinitely. Three attempts was chosen empirically: most real bugs are fixed in 1–2 passes, and if three attempts fail, the issue likely needs human intervention.
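The routing decision after a test run can be written as a small function. This is a sketch that follows the pass / fail (retry<3) / fail (retry≥3) branches in the flow diagram; the function name `route_after_tests`, the `retry_count` key, and the `passed` flag inside `test_result` are assumed names, not necessarily those used in `edges.py`.

```python
MAX_RETRIES = 3


def route_after_tests(state: dict, max_retries: int = MAX_RETRIES) -> str:
    """Pick the next node after run_tests: open_pr, debug, or abort."""
    if state["test_result"]["passed"]:
        return "open_pr"
    if state.get("retry_count", 0) < max_retries:
        return "debug"       # another debug and rewrite cycle is allowed
    return "abort"           # cap reached, hand over to a human


print(route_after_tests({"test_result": {"passed": False}, "retry_count": 1}))
```

Keeping the cap in the router rather than in the nodes means the loop limit can be changed without touching any node logic.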
The agent uses a keyword heuristic (words extracted from the issue title + body) to select ≤20 files. This is intentionally simple:
- Embedding-based semantic search adds latency and cost.
- For the majority of issues, the relevant files contain the same domain words as the issue description.
- The 20-file cap keeps the planning prompt within the model's effective context window.
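A keyword heuristic of this kind might look like the sketch below. The stopword list, the scoring rule (count of keyword occurrences in path plus content), and the function names are assumptions for illustration; the actual logic in `repo_inspector.py` may weigh things differently.

```python
import re

# Small illustrative stopword list; a real one would be longer.
STOPWORDS = {"the", "and", "when", "with", "that", "this", "for", "from", "not"}


def extract_keywords(text: str) -> set[str]:
    """Lowercase identifier-like words of 3+ characters, minus stopwords."""
    words = re.findall(r"[a-z_][a-z0-9_]{2,}", text.lower())
    return {w for w in words if w not in STOPWORDS}


def select_relevant_files(issue_text: str, files: dict[str, str],
                          limit: int = 20) -> list[str]:
    """Rank files by keyword hits in path and content, keep the top `limit`."""
    keywords = extract_keywords(issue_text)
    scores: dict[str, int] = {}
    for path, content in files.items():
        haystack = (path + "\n" + content).lower()
        score = sum(haystack.count(word) for word in keywords)
        if score > 0:
            scores[path] = score
    # Highest score first; ties broken by path for stable output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:limit]


files = {
    "auth/login.py": "def login(user): check_password(user)",
    "README.md": "project docs",
    "utils/strings.py": "def slugify(s): ...",
}
selected = select_relevant_files("Login fails when password contains unicode", files)
print(selected)
```

Files with no keyword hits are dropped entirely, so small repositories may yield far fewer than 20 candidates.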

Known limitations:
- Python repos only for AST symbol extraction (other languages fall back to keyword-only file matching).
- pytest only for the test runner — no support for Jest, Go test, etc. yet.
- No execution sandbox beyond a subprocess timeout — the agent has full filesystem access inside the cloned repo.
- Single-repo — does not handle issues that require changes across multiple repositories.

Planned improvements:
- Multi-language test runner (Jest, `go test`, `cargo test`)
- Embedding-based file retrieval for larger codebases
- LangGraph checkpoint persistence (resume interrupted runs)
- Docker-based sandbox for true code isolation
- GitHub Actions integration (trigger on issue label)
- Streaming output with live patch preview
License: MIT