Skip to content

fix: add retry limits and improve builder prompt for data engineering#106

Closed
anandgupta42 wants to merge 2 commits intomainfrom
feat/builder-prompt-retry-improvements
Closed

fix: add retry limits and improve builder prompt for data engineering#106
anandgupta42 wants to merge 2 commits intomainfrom
feat/builder-prompt-retry-improvements

Conversation

@anandgupta42
Copy link
Contributor

Summary

  • Add RETRY_MAX_ATTEMPTS (10) and RETRY_MAX_TOTAL_TIME_MS (120s) constants to prevent infinite retry loops on persistent API failures
  • Enforce retry limits in processor.ts — break out of retry loop when max attempts or total time exceeded, publish error event and set session idle
  • Expand builder.txt with 5 new prompt sections for better SQL/dbt output quality:
    • Column and Schema Fidelity (order, count, names, data types must match schema.yml)
    • JOIN Type Selection (INNER vs LEFT JOIN guidance with row count verification)
    • Temporal Determinism (avoid current_date()/now() on fixed datasets)
    • Fivetran & dbt Package Metadata Columns
    • Completeness Checks Before dbt Run
    • Enhanced Self-Review with row count sanity checks and edge case validation

Context

The infinite retry loop was discovered during Spider 2.0-DBT benchmark runs where API rate limits caused sessions to hang indefinitely. The builder prompt improvements address common failure patterns observed in data engineering tasks (wrong JOIN types, temporal non-determinism, missing metadata columns, schema mismatches).

Follow-up items (from multi-model code review)

  • Add test coverage for retry limit enforcement (max attempts + max total time)
  • Add Telemetry.track({ type: "retry_exhausted" }) event for observability
  • Consider clamping retry-after server headers against remaining time budget

Test plan

  • TypeScript typecheck passes (all 13 packages)
  • Verify retry loop terminates after 10 retries in manual testing
  • Verify retry loop terminates after 120s total retry time
  • Verify builder agent follows new prompt guidance for dbt tasks

🤖 Generated with Claude Code

… tasks

- Add `RETRY_MAX_ATTEMPTS` (10) and `RETRY_MAX_TOTAL_TIME_MS` (120s) constants
  to `retry.ts` to prevent infinite retry loops on persistent API failures
- Enforce retry limits in `processor.ts` — break out of retry loop when max
  attempts or total retry time exceeded, publish error and set session idle
- Expand `builder.txt` with 5 new sections for better SQL/dbt output quality:
  - Column and Schema Fidelity (order, count, names, data types must match schema.yml)
  - JOIN Type Selection (INNER vs LEFT JOIN guidance with row count verification)
  - Temporal Determinism (avoid `current_date()`/`now()` on fixed datasets)
  - Fivetran & dbt Package Metadata Columns (`_fivetran_synced`, `source_relation`)
  - Completeness Checks Before dbt Run (verify all models, refs, intermediates)
  - Enhanced Self-Review with row count sanity checks and edge case validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions
Copy link

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

Comment on lines 496 to +504
}
retryErrorType = e?.name ?? "UnknownError"
attempt++

// Give up after max attempts or total retry time exceeded
const totalRetryTime = retryStartTime ? Date.now() - retryStartTime : 0
if (
attempt > SessionRetry.RETRY_MAX_ATTEMPTS ||
totalRetryTime > SessionRetry.RETRY_MAX_TOTAL_TIME_MS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The attempt counter is not reset after a successful recovery, causing subsequent retries to use an old count and fail prematurely.
Severity: MEDIUM

Suggested Fix

Reset the attempt counter to 0 after a successful recovery, in the same place where retryErrorType and retryStartTime are reset to null. This ensures that any new error sequence starts with a fresh retry state.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: packages/opencode/src/session/processor.ts#L496-L504

Potential issue: Within a single `process()` call, if a retryable error occurs and is
successfully handled, the `retryStartTime` and `retryErrorType` are reset, but the
`attempt` counter is not. If a second, unrelated error occurs later in the same process,
the retry logic will fail to set a new `retryStartTime` because `attempt` is no longer
0. This makes the time-based retry limit ineffective and can cause the attempt-based
limit to be reached prematurely, leading to process failure instead of recovery.

Did we get this right? 👍 / 👎 to inform future reviews.

Add benchmark runner, evaluator, and reporting scripts for the
Spider 2.0-DBT benchmark suite. Auto-downloads Spider2 repo and
DuckDB databases on first run if not available.

- `run_benchmark.py`: Parallel task runner with auto-retry, resume support
- `evaluate_results.py`: Official `duckdb_match` evaluation against gold DBs
- `setup_spider2.py`: One-time setup (sparse clone + Google Drive downloads)
- `report.py`: Leaderboard comparison and domain breakdown reports
- `config.py`: Centralized paths, timeouts, and leaderboard data
- `prompt_template.py`: Task prompt builder for the agent
- `.gitignore`: Excludes dataset dirs (`spider2_repo/`, `workspace/`, `results/`)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +501 to +504
const totalRetryTime = retryStartTime ? Date.now() - retryStartTime : 0
if (
attempt > SessionRetry.RETRY_MAX_ATTEMPTS ||
totalRetryTime > SessionRetry.RETRY_MAX_TOTAL_TIME_MS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The time-based retry limit (RETRY_MAX_TOTAL_TIME_MS) is ineffective for any errors that occur after a successful recovery because retryStartTime is not correctly re-initialized.
Severity: MEDIUM

Suggested Fix

Reset the attempt counter to 0 after a successful step recovery, similar to how retryStartTime and retryErrorType are reset. This will ensure that for a new, independent error, the retry logic starts fresh.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: packages/opencode/src/session/processor.ts#L501-L504

Potential issue: The time-based retry limit logic contains a flaw that disables it after
a successful recovery. When an error occurs, `attempt` is incremented and
`retryStartTime` is set. If this step recovers, `retryStartTime` is reset to `null`, but
`attempt` is not. On a subsequent error in the same session, the check `if (attempt ===
0)` fails, so `retryStartTime` is never re-initialized. This causes `totalRetryTime` to
always be calculated as 0, rendering the `RETRY_MAX_TOTAL_TIME_MS` check ineffective and
relying solely on the attempt limit.

@github-actions
Copy link

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

@github-actions github-actions bot closed this Mar 11, 2026
@anandgupta42 anandgupta42 deleted the feat/builder-prompt-retry-improvements branch March 17, 2026 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant