fix: add retry limits and improve builder prompt for data engineering#106
fix: add retry limits and improve builder prompt for data engineering#106anandgupta42 wants to merge 2 commits intomainfrom
Conversation
… tasks - Add `RETRY_MAX_ATTEMPTS` (10) and `RETRY_MAX_TOTAL_TIME_MS` (120s) constants to `retry.ts` to prevent infinite retry loops on persistent API failures - Enforce retry limits in `processor.ts` — break out of retry loop when max attempts or total retry time exceeded, publish error and set session idle - Expand `builder.txt` with 5 new sections for better SQL/dbt output quality: - Column and Schema Fidelity (order, count, names, data types must match schema.yml) - JOIN Type Selection (INNER vs LEFT JOIN guidance with row count verification) - Temporal Determinism (avoid `current_date()`/`now()` on fixed datasets) - Fivetran & dbt Package Metadata Columns (`_fivetran_synced`, `source_relation`) - Completeness Checks Before dbt Run (verify all models, refs, intermediates) - Enhanced Self-Review with row count sanity checks and edge case validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details. |
|
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know. |
| } | ||
| retryErrorType = e?.name ?? "UnknownError" | ||
| attempt++ | ||
|
|
||
| // Give up after max attempts or total retry time exceeded | ||
| const totalRetryTime = retryStartTime ? Date.now() - retryStartTime : 0 | ||
| if ( | ||
| attempt > SessionRetry.RETRY_MAX_ATTEMPTS || | ||
| totalRetryTime > SessionRetry.RETRY_MAX_TOTAL_TIME_MS |
There was a problem hiding this comment.
Bug: The attempt counter is not reset after a successful recovery, causing subsequent retries to use an old count and fail prematurely.
Severity: MEDIUM
Suggested Fix
Reset the attempt counter to 0 after a successful recovery, in the same place where retryErrorType and retryStartTime are reset to null. This ensures that any new error sequence starts with a fresh retry state.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: packages/opencode/src/session/processor.ts#L496-L504
Potential issue: Within a single `process()` call, if a retryable error occurs and is
successfully handled, the `retryStartTime` and `retryErrorType` are reset, but the
`attempt` counter is not. If a second, unrelated error occurs later in the same process,
the retry logic will fail to set a new `retryStartTime` because `attempt` is no longer
0. This makes the time-based retry limit ineffective and can cause the attempt-based
limit to be reached prematurely, leading to process failure instead of recovery.
Did we get this right? 👍 / 👎 to inform future reviews.
Add benchmark runner, evaluator, and reporting scripts for the Spider 2.0-DBT benchmark suite. Auto-downloads Spider2 repo and DuckDB databases on first run if not available. - `run_benchmark.py`: Parallel task runner with auto-retry, resume support - `evaluate_results.py`: Official `duckdb_match` evaluation against gold DBs - `setup_spider2.py`: One-time setup (sparse clone + Google Drive downloads) - `report.py`: Leaderboard comparison and domain breakdown reports - `config.py`: Centralized paths, timeouts, and leaderboard data - `prompt_template.py`: Task prompt builder for the agent - `.gitignore`: Excludes dataset dirs (`spider2_repo/`, `workspace/`, `results/`) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| const totalRetryTime = retryStartTime ? Date.now() - retryStartTime : 0 | ||
| if ( | ||
| attempt > SessionRetry.RETRY_MAX_ATTEMPTS || | ||
| totalRetryTime > SessionRetry.RETRY_MAX_TOTAL_TIME_MS |
There was a problem hiding this comment.
Bug: The time-based retry limit (RETRY_MAX_TOTAL_TIME_MS) is ineffective for any errors that occur after a successful recovery because retryStartTime is not correctly re-initialized.
Severity: MEDIUM
Suggested Fix
Reset the attempt counter to 0 after a successful step recovery, similar to how retryStartTime and retryErrorType are reset. This will ensure that for a new, independent error, the retry logic starts fresh.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: packages/opencode/src/session/processor.ts#L501-L504
Potential issue: The time-based retry limit logic contains a flaw that disables it after
a successful recovery. When an error occurs, `attempt` is incremented and
`retryStartTime` is set. If this step recovers, `retryStartTime` is reset to `null`, but
`attempt` is not. On a subsequent error in the same session, the check `if (attempt ===
0)` fails, so `retryStartTime` is never re-initialized. This causes `totalRetryTime` to
always be calculated as 0, rendering the `RETRY_MAX_TOTAL_TIME_MS` check ineffective and
relying solely on the attempt limit.
|
This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines. |
Summary
RETRY_MAX_ATTEMPTS(10) andRETRY_MAX_TOTAL_TIME_MS(120s) constants to prevent infinite retry loops on persistent API failuresprocessor.ts— break out of retry loop when max attempts or total time exceeded, publish error event and set session idlebuilder.txtwith 5 new prompt sections for better SQL/dbt output quality:current_date()/now()on fixed datasets)Context
The infinite retry loop was discovered during Spider 2.0-DBT benchmark runs where API rate limits caused sessions to hang indefinitely. The builder prompt improvements address common failure patterns observed in data engineering tasks (wrong JOIN types, temporal non-determinism, missing metadata columns, schema mismatches).
Follow-up items (from multi-model code review)
Telemetry.track({ type: "retry_exhausted" })event for observabilityretry-afterserver headers against remaining time budgetTest plan
🤖 Generated with Claude Code