Hello DOX team!
We wanted to share some exciting empirical data backing the design philosophy of the DOX (AGENTS.md) framework.
We developed claude-code-optimizer to ingest and analyze developer sessions. In a deep audit of 9,911 real-world Claude Code session logs, we compared sessions that use custom agent rulesets (AGENTS.md / DOX) against sessions that do not.
Here is what the data showed:
| Metric | With DOX (AGENTS.md) | Without DOX | Difference / Impact |
|---|---|---|---|
| Average Human Turns / Session | 4.87 | 1.66 | +193% (turn capacity) |
| Average Prompt Input Tokens | 73,097.8 | 16,537.6 | +342% (deeper code context navigation) |
| Average Tool Errors / Session | 0.68 | 0.37 | +84% (contextually aligned tool executions) |
| Average API Cost per Session | $10.48 | $2.09 | +401% (deeper execution per cached session) |
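For anyone who wants to check the "Difference" column, here is a minimal sketch of the arithmetic, assuming each difference is the relative change of the DOX average over the non-DOX average. It only uses the averages reported in the table. The metric keys and the `relative_change` helper are illustrative names, not part of claude-code-optimizer.

```python
# Reported per-session averages from the table: (with DOX, without DOX).
# Keys are illustrative labels, not fields from claude-code-optimizer.
REPORTED = {
    "human_turns": (4.87, 1.66),
    "prompt_input_tokens": (73_097.8, 16_537.6),
    "tool_errors": (0.68, 0.37),
    "api_cost_usd": (10.48, 2.09),
}


def relative_change(with_dox: float, without_dox: float) -> float:
    """Percent change of the DOX cohort average relative to the non-DOX baseline."""
    return (with_dox - without_dox) / without_dox * 100.0


if __name__ == "__main__":
    for metric, (with_dox, without_dox) in REPORTED.items():
        ratio = with_dox / without_dox
        pct = relative_change(with_dox, without_dox)
        # e.g. "human_turns           2.93x   +193%"
        print(f"{metric:<20} {ratio:5.2f}x  {pct:+6.0f}%")
```

Running it reproduces the +193% turn figure and applies the same calculation to the other three rows (about 4.4x the prompt input tokens, 1.8x the tool errors, and 5x the API cost per session).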
Key Takeaway
Sessions using DOX are almost 3x longer on average (measured in interactive human turns) and carry roughly 4.4x more prompt input tokens per session. This suggests that structured instruction hierarchies successfully align the agent, allowing developers to execute longer, more complex coding tasks without the agent getting lost or drifting off-task.
Thank you for creating DOX! We hope this quantitative validation is helpful to the community.