
Bug 2026725: Use load jobs instead of stream insert so if script fails previous rows can be deleted#16

Merged
dklawren merged 2 commits into main from snapshot-delete-fix
Mar 27, 2026

Conversation

@dklawren
Contributor

When trying to delete rows right after inserting them, BigQuery returns a 400 Bad Request error.

The root cause is insert_rows_json(), which uses the Streaming Insert API. Rows inserted this way go into a streaming buffer and cannot be changed by DML statements such as DELETE or UPDATE for up to 90 minutes.

This pull request fixes the issue by replacing the Streaming API with a load job, which writes directly to BigQuery storage. Rows inserted via load jobs are immediately mutable, so if the script fails, rerunning the DELETE in delete_existing_snapshot() works right away.

The trade-off: load jobs take slightly longer to start (a few seconds of overhead per job), but that is negligible for a batch ETL use case. The streaming API was designed for real-time, high-frequency appends that need sub-second availability, so it is not necessary for a batch ETL pipeline like this one.

@dklawren dklawren requested review from cgsheeh, Copilot and shtrom and removed request for cgsheeh and Copilot March 26, 2026 17:33
Member

@shtrom shtrom left a comment

r+ w/ nit

Copilot AI review requested due to automatic review settings March 27, 2026 02:33

Copilot AI left a comment

Pull request overview

This PR updates the BigQuery load path to avoid the streaming buffer (which blocks immediate DELETE/UPDATE) by using BigQuery load jobs for production runs, while retaining streaming inserts for the BigQuery emulator (which doesn’t support load jobs).

Changes:

  • Added _insert_rows_to_table() to centralize writing rows via either streaming inserts (emulator) or load jobs (prod).
  • Updated load_data() to accept use_streaming_insert and route writes through the new helper.
  • Wired _main() to enable streaming inserts automatically when BIGQUERY_EMULATOR_HOST is set.
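The routing described in the list above could be sketched roughly as follows. This is not the code from the PR: the function names `insert_rows_to_table` and `streaming_insert_enabled` are hypothetical stand-ins, `client` is assumed to behave like `google.cloud.bigquery.Client`, and raising `RuntimeError` is a choice made for this sketch.

```python
import os


def streaming_insert_enabled(environ=None):
    """Return True when the BigQuery emulator is configured (assumed wiring)."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("BIGQUERY_EMULATOR_HOST"))


def insert_rows_to_table(client, table, rows, use_streaming_insert=False):
    """Write rows via a streaming insert (emulator) or a load job (production).

    `client` is assumed to expose insert_rows_json() and load_table_from_json()
    the way google.cloud.bigquery.Client does.
    """
    if not rows:
        return  # nothing to write

    if use_streaming_insert:
        # Streaming path: rows go through the streaming buffer first.
        errors = client.insert_rows_json(table, rows)
        if errors:
            raise RuntimeError(f"BigQuery insert errors for table {table}: {errors}")
        return

    # Load-job path: rows are written directly to table storage.
    load_job = client.load_table_from_json(rows, table)
    load_job.result()  # wait for the job to finish
    if load_job.errors:
        raise RuntimeError(f"BigQuery load errors for table {table}: {load_job.errors}")
```

A caller would pass `use_streaming_insert=streaming_insert_enabled()` so that production runs take the load-job branch by default.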

Comment on lines +681 to +684
if load_job.errors:
    error_msg = f"BigQuery load errors for table {table}: {load_job.errors}"
    logger.error(error_msg)
    raise Exception(error_msg)
Copilot AI Mar 27, 2026

Same as the streaming-insert branch: raising a generic Exception here bypasses main()'s RuntimeError handler and will likely result in an uncaught exception traceback. Prefer raising RuntimeError (or another exception type that main() handles) for consistent failure behavior.

Comment on lines +667 to +669
error_msg = f"BigQuery insert errors for table {table}: {errors}"
logger.error(error_msg)
raise Exception(error_msg)
Copilot AI Mar 27, 2026

These branches raise a generic Exception, but main() only catches RuntimeError. If an insert/load fails, this will escape the top-level handler and produce an uncaught exception traceback instead of returning a clean exit code via main(). Consider raising RuntimeError (or a dedicated custom exception) here so failures are handled consistently.
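To illustrate the concern, here is a minimal sketch of a top-level handler that catches only RuntimeError. The `run_pipeline()` function and this simplified `main()` are hypothetical and only assume the behavior described in the comment above; the real main() in this repository is not shown in the conversation.

```python
import logging

logger = logging.getLogger(__name__)


def run_pipeline(error=None):
    """Hypothetical stand-in for the ETL steps; raises `error` if one is given."""
    if error is not None:
        raise error


def main(error=None):
    """Return a process exit code, handling only RuntimeError (assumed behavior)."""
    try:
        run_pipeline(error)
    except RuntimeError as exc:
        # Handled failure: log it and return a clean non-zero exit code.
        logger.error("Pipeline failed: %s", exc)
        return 1
    return 0
```

With this shape, `main(RuntimeError("load failed"))` returns 1, while `main(Exception("load failed"))` propagates the exception past the handler and would end in a traceback, which is the inconsistency the review points out.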

@dklawren dklawren merged commit 8081a57 into main Mar 27, 2026
8 checks passed
@dklawren dklawren deleted the snapshot-delete-fix branch March 27, 2026 03:13
3 participants