Date: 2026-05-07
Version: v3.4.0 (The Maturity Layer)
Target Model: qwen2.5-coder:3b (Local)
Methodology: Evaluation across 3 performance tiers: Unit Bugfixes, Dependency Repairs, and Multi-file Refactors.
DevAgent v3.4.0 marks a transition from a diagnostic prototype to a reliable Orchestration Infrastructure. While high-level reasoning is still bounded by local model constraints, the success rate for unit-level tasks has increased significantly (80%) due to autonomous environment management.
| Tier | Success Rate | Primary Driver |
|---|---|---|
| Dependency Repair | 95% | repair_environment tool + Isolated venv |
| Unit Bugfixes | 80% | Hierarchical Retrieval + Surgical Patching |
| Multi-file Refactors | 20% | Model context window / reasoning depth |
| Metric | Result | Status |
|---|---|---|
| Environment Isolation | 100% | ✅ Zero host pollution |
| Auto-Venv Creation | 100% | ✅ Isolated runtime parity |
| Rollback Integrity | 100% | ✅ Bit-perfect restoration |
| Avg. Recovery Steps | 2.4 | ✅ Efficient dependency resolution |
In previous versions, agents failed 100% of the time if a dependency was missing or the Python version was mismatched.
v3.4.0 Result: DevAgent now detects missing packages, enters the repair_environment loop, installs them into an isolated .tmp_envs/, and re-runs validation. This single feature converted ~40% of previous "unsolvable" runs into successes.
By using the new get_file_map and hierarchical scanning, the agent now avoids the "monolithic amnesia" previously seen in 30KB+ files. It identifies the target function before reading, drastically reducing noise in the context window.
Symptom: Agent fixes a local bug but breaks a distant dependency. Analysis: Local 3B models lack the "Global System Awareness" required for deep architectural refactoring. They can fix what they see, but struggle with what they must infer across multiple distant modules.
Symptom: Agent installs utils instead of a internal utils.py.
Analysis: When internal file names clash with common PyPI packages, the dependency repair logic can occasionally attempt to install the wrong thing.
With the orchestration infrastructure stabilized, we are focusing on:
- Elite Documentation: Completing the Troubleshooting and User guides.
- Telemetry Opt-in: Enabling users to share anonymized benchmark results for collective optimization.
- Public Launch: Distributing DevAgent-CLI as the standard for local agent infrastructure.
DevAgent — Reliability > Hype.