fix: cache ensured tables in BigQuery streaming v2 to stop 403 storm #68
Merged
load_to_bigquery_via_streaming unconditionally called create_table on every flush, so each bizon.stream.iteration issued one POST /tables per destination_id. With ~15 CDC tables routed through one destination, this blew past BigQuery's per-table metadata quota (5 ops / 10s) and returned HTTP 403 rateLimitExceeded. The google-cloud-bigquery SDK silently retried those 403s via DEFAULT_RETRY, so data kept flowing but every iteration burned wall-clock in backoff and flooded Datadog with red spans.

Cache (temp_table_id, schema_fingerprint) pairs on the destination instance and short-circuit the create_table/Conflict/schema-reconcile block when we've already ensured a table with that exact schema this process lifetime.

Drop matching cache entries in finalize() after delete_table in FULL_REFRESH and INCREMENTAL modes so a same-process re-run recreates the temp table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
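The caching idea described in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code in `destination.py`: the names `schema_fingerprint`, `EnsuredTableCache`, `ensure`, and `forget` are hypothetical, the schema is modelled as plain `(name, type, mode)` tuples, and the real create/reconcile step is replaced by a callback that only records calls.

```python
import hashlib
import json


def schema_fingerprint(schema_fields):
    """Hash (name, type, mode) tuples into a stable string (hypothetical helper)."""
    canonical = json.dumps([list(f) for f in schema_fields], separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class EnsuredTableCache:
    """Remembers which (table_id, schema fingerprint) pairs were already ensured."""

    def __init__(self):
        self._ensured = set()

    def ensure(self, table_id, schema_fields, create_or_reconcile):
        """Run create_or_reconcile only the first time this exact pair is seen."""
        key = (table_id, schema_fingerprint(schema_fields))
        if key in self._ensured:
            return False  # short-circuit: no metadata call is made
        create_or_reconcile(table_id, schema_fields)
        self._ensured.add(key)
        return True

    def forget(self, table_id):
        """Drop every entry for table_id, e.g. after the temp table is deleted."""
        self._ensured = {k for k in self._ensured if k[0] != table_id}


# Stand-in for the create_table / Conflict / reconcile block: it only records calls.
calls = []


def fake_create_or_reconcile(table_id, schema_fields):
    calls.append((table_id, len(schema_fields)))


cache = EnsuredTableCache()
fields_v1 = [("id", "INTEGER", "REQUIRED"), ("body", "STRING", "NULLABLE")]
fields_v2 = fields_v1 + [("status", "STRING", "NULLABLE")]

for _ in range(3):  # three flushes with the same schema: one call
    cache.ensure("cdc.tickets_temp", fields_v1, fake_create_or_reconcile)
cache.ensure("cdc.tickets_temp", fields_v2, fake_create_or_reconcile)  # new schema: one more
cache.forget("cdc.tickets_temp")  # what finalize() would do after delete_table
cache.ensure("cdc.tickets_temp", fields_v2, fake_create_or_reconcile)  # recreated
```

Keying on the fingerprint rather than the table id alone is what lets a genuine schema change fall through to the reconcile path while identical flushes are skipped.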
Summary
- `BigQueryStreamingV2Destination.load_to_bigquery_via_streaming` called `self.bq_client.create_table(...)` on every flush, only catching `Conflict` (409). In the helpdesk CDC pipeline, one `bizon.stream.iteration` routes records to ~15 `destination_id`s, so every iteration issued ~15 sequential `POST /bigquery/v2/projects/{proj}/datasets/cdc/tables` requests, blowing past BigQuery's per-table metadata quota (5 ops / 10s) and returning 403 `rateLimitExceeded` on every one.
- `google-cloud-bigquery`'s `DEFAULT_RETRY` silently retries `rateLimitExceeded` with exponential backoff for up to 10 minutes, so data flowed, but every iteration burned wall-clock in retries and every Datadog trace for `bizon-streaming` was red.
- Cache `(temp_table_id, schema_fingerprint)` pairs on the destination instance. The first call ensures the table (create or schema-reconcile); subsequent calls with the same schema skip the whole block. A genuine schema change (new column in `record_schemas`) produces a new fingerprint → runs the reconcile path exactly once.
- `finalize()` drops matching cache entries after `delete_table` in `FULL_REFRESH` and `INCREMENTAL` so a same-process re-run recreates the temp table.

No config changes, no migration, no new dependencies. Existing `Conflict` handling for schema drift is preserved. The SDK's built-in `rateLimitExceeded` retry remains as defense-in-depth.

Files
- `bizon/connectors/destinations/bigquery_streaming_v2/src/destination.py`

Test plan
- `uv run ruff format` + `uv run ruff check` clean
- `uv run pytest tests/connectors/destinations/bigquery_streaming_v2/test_bigquery_streaming_v2.py`: 4/4 pass
- Datadog query `service:bizon-streaming resource_name:"POST /bigquery/v2/projects/gorgias-pipeline-production/datasets/cdc/tables" @http.status_code:403` drops from ~15/iteration to 0 after warmup
- `bizon.stream.iteration` p50/p95 latency drops (SDK backoff no longer burning wall-clock on 403 retries)
- A genuine schema change (new column in `record_schemas`) still triggers the `Conflict` → `update_table` path

🤖 Generated with Claude Code
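The warmup and schema-change checks in the test plan could also be approximated offline with a test double that counts metadata calls. The sketch below is only an illustration under assumptions: `FakeBigQueryClient` and `flush` are hypothetical, the local `Conflict` class stands in for `google.api_core.exceptions.Conflict`, and the fake's `create_table` / `update_table` take a table id and a column list rather than the real SDK's `Table` objects.

```python
class Conflict(Exception):
    """Stand-in for google.api_core.exceptions.Conflict (HTTP 409)."""


class FakeBigQueryClient:
    """Hypothetical test double: counts metadata calls instead of sending them."""

    def __init__(self):
        self.tables = {}  # table_id -> tuple of column names
        self.create_calls = 0
        self.update_calls = 0

    def create_table(self, table_id, columns):
        self.create_calls += 1
        if table_id in self.tables:
            raise Conflict(table_id)  # table already exists
        self.tables[table_id] = tuple(columns)

    def update_table(self, table_id, columns):
        self.update_calls += 1
        self.tables[table_id] = tuple(columns)


def flush(client, ensured, table_id, columns):
    """Ensure the table at most once per (table_id, schema) pair."""
    key = (table_id, tuple(columns))  # the tuple stands in for a schema fingerprint
    if key in ensured:
        return  # already ensured in this process: no metadata call
    try:
        client.create_table(table_id, columns)
    except Conflict:
        client.update_table(table_id, columns)  # schema-reconcile path
    ensured.add(key)


client, ensured = FakeBigQueryClient(), set()
table_ids = [f"cdc.table_{i}" for i in range(15)]

# Ten iterations over 15 tables: only the first iteration touches metadata.
for _ in range(10):
    for table_id in table_ids:
        flush(client, ensured, table_id, ["id", "payload"])
creates_after_warmup = client.create_calls

# A new column changes the key, so create_table runs again and hits Conflict.
flush(client, ensured, "cdc.table_0", ["id", "payload", "status"])
```

With the cache in place the fake sees 15 `create_table` calls in total across the ten iterations instead of 150, and the added column triggers exactly one `update_table`.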