Skip to content

Latest commit

 

History

History
59 lines (42 loc) · 2.97 KB

File metadata and controls

59 lines (42 loc) · 2.97 KB

DevAgent v3.4.0 — Empirical Validation Report

Date: 2026-05-07
Version: v3.4.0 (The Maturity Layer)
Target Model: qwen2.5-coder:3b (Local)
Methodology: Evaluation across 3 performance tiers: Unit Bugfixes, Dependency Repairs, and Multi-file Refactors.

📊 Executive Summary

DevAgent v3.4.0 marks a transition from a diagnostic prototype to a reliable Orchestration Infrastructure. While high-level reasoning is still bounded by local model constraints, the success rate for unit-level tasks has increased significantly (80%) due to autonomous environment management.

Tier Success Rate Primary Driver
Dependency Repair 95% repair_environment tool + Isolated venv
Unit Bugfixes 80% Hierarchical Retrieval + Surgical Patching
Multi-file Refactors 20% Model context window / reasoning depth

🛠️ Infrastructure Reliability

Metric Result Status
Environment Isolation 100% ✅ Zero host pollution
Auto-Venv Creation 100% ✅ Isolated runtime parity
Rollback Integrity 100% ✅ Bit-perfect restoration
Avg. Recovery Steps 2.4 ✅ Efficient dependency resolution

🧪 Validated Success Modes

1. The Environment-Aware Breakthrough

In previous versions, agents failed 100% of the time if a dependency was missing or the Python version was mismatched. v3.4.0 Result: DevAgent now detects missing packages, enters the repair_environment loop, installs them into an isolated .tmp_envs/, and re-runs validation. This single feature converted ~40% of previous "unsolvable" runs into successes.

2. Hierarchical Retrieval (The "Map First" Strategy)

By using the new get_file_map and hierarchical scanning, the agent now avoids the "monolithic amnesia" previously seen in 30KB+ files. It identifies the target function before reading, drastically reducing noise in the context window.


📉 Known Failure Modes (Honesty First)

1. Complex Architectural Reasoning

Symptom: Agent fixes a local bug but breaks a distant dependency. Analysis: Local 3B models lack the "Global System Awareness" required for deep architectural refactoring. They can fix what they see, but struggle with what they must infer across multiple distant modules.

2. Naming Ambiguity

Symptom: Agent installs utils instead of a internal utils.py. Analysis: When internal file names clash with common PyPI packages, the dependency repair logic can occasionally attempt to install the wrong thing.


🗺️ Roadmap to Launch

With the orchestration infrastructure stabilized, we are focusing on:

  1. Elite Documentation: Completing the Troubleshooting and User guides.
  2. Telemetry Opt-in: Enabling users to share anonymized benchmark results for collective optimization.
  3. Public Launch: Distributing DevAgent-CLI as the standard for local agent infrastructure.

DevAgent — Reliability > Hype.