Commit 817456a
committed
[feat][eval] Add Codex fallback and JSONL-based cost tracking for ACW modes
session.py: Add fallback_backend parameter to run_prompt() -- when primary
backend (e.g. Codex) exhausts retries, automatically retry with fallback
backend (e.g. Claude Opus), dropping provider-specific extra_flags.
pipeline.py: Restore consensus default to Codex (gpt-5.2-codex) with
Claude Opus as fallback_backend. Add Codex flags and fallback to
run_consensus_stage() for parity. Prevents empty-consensus failures
when Codex API key is missing or expired.
eval_harness.py: Add _snapshot_jsonl_usage() and _diff_jsonl_usage()
helpers that read ~/.claude/projects/**/*.jsonl session files to compute
token usage deltas. run_full_impl() now snapshots before/after the
pipeline to track cost for both full and impl modes, which previously
reported $0.00 because ACW does not return token data.1 parent 2bd2162 commit 817456a
3 files changed
Lines changed: 125 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
103 | 103 | | |
104 | 104 | | |
105 | 105 | | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
106 | 179 | | |
107 | 180 | | |
108 | 181 | | |
| |||
587 | 660 | | |
588 | 661 | | |
589 | 662 | | |
590 | | - | |
591 | | - | |
| 663 | + | |
| 664 | + | |
592 | 665 | | |
593 | | - | |
| 666 | + | |
594 | 667 | | |
595 | 668 | | |
596 | 669 | | |
| |||
626 | 699 | | |
627 | 700 | | |
628 | 701 | | |
| 702 | + | |
| 703 | + | |
| 704 | + | |
| 705 | + | |
| 706 | + | |
| 707 | + | |
| 708 | + | |
| 709 | + | |
629 | 710 | | |
630 | 711 | | |
631 | 712 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
165 | 165 | | |
166 | 166 | | |
167 | 167 | | |
| 168 | + | |
168 | 169 | | |
169 | 170 | | |
170 | 171 | | |
| |||
201 | 202 | | |
202 | 203 | | |
203 | 204 | | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
204 | 235 | | |
205 | 236 | | |
206 | 237 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
40 | | - | |
| 40 | + | |
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
| |||
268 | 268 | | |
269 | 269 | | |
270 | 270 | | |
| 271 | + | |
271 | 272 | | |
272 | 273 | | |
273 | 274 | | |
| |||
315 | 316 | | |
316 | 317 | | |
317 | 318 | | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
318 | 325 | | |
319 | 326 | | |
320 | 327 | | |
321 | 328 | | |
322 | 329 | | |
323 | 330 | | |
| 331 | + | |
| 332 | + | |
324 | 333 | | |
325 | 334 | | |
326 | 335 | | |
| |||
0 commit comments