Skip to content

fix: cache ensured tables in BigQuery streaming v2 to stop 403 storm#68

Merged
anaselmhamdi merged 1 commit into
mainfrom
fix/bq-streaming-v2-create-table-cache
Apr 8, 2026
Merged

fix: cache ensured tables in BigQuery streaming v2 to stop 403 storm#68
anaselmhamdi merged 1 commit into
mainfrom
fix/bq-streaming-v2-create-table-cache

Conversation

@anaselmhamdi

Copy link
Copy Markdown
Collaborator

Summary

  • BigQueryStreamingV2Destination.load_to_bigquery_via_streaming called self.bq_client.create_table(...) on every flush, only catching Conflict (409). In the helpdesk CDC pipeline, one bizon.stream.iteration routes records to ~15 destination_ids, so every iteration issued ~15 sequential POST /bigquery/v2/projects/{proj}/datasets/cdc/tables requests — blowing past BigQuery's per-table metadata quota (5 ops / 10s) and returning 403 rateLimitExceeded on every one.
  • The pipeline kept producing data because google-cloud-bigquery's DEFAULT_RETRY silently retries rateLimitExceeded with exponential backoff up to 10 minutes — so data flowed, but every iteration burned wall-clock in retries and every Datadog trace for bizon-streaming was red.
  • Fix: cache (temp_table_id, schema_fingerprint) pairs on the destination instance. First call ensures (create or schema-reconcile), subsequent calls with the same schema skip the whole block. A genuine schema change (new column in record_schemas) produces a new fingerprint → runs the reconcile path exactly once.
  • finalize() drops matching cache entries after delete_table in FULL_REFRESH and INCREMENTAL so a same-process re-run recreates the temp table.

No config changes, no migration, no new dependencies. Existing Conflict handling for schema drift is preserved. The SDK's built-in rateLimitExceeded retry remains as defense-in-depth.

Files

  • bizon/connectors/destinations/bigquery_streaming_v2/src/destination.py

Test plan

  • uv run ruff format + uv run ruff check clean
  • uv run pytest tests/connectors/destinations/bigquery_streaming_v2/test_bigquery_streaming_v2.py — 4/4 pass
  • Deploy to staging helpdesk CDC pipeline; confirm via Datadog: service:bizon-streaming resource_name:"POST /bigquery/v2/projects/gorgias-pipeline-production/datasets/cdc/tables" @http.status_code:403 drops from ~15/iteration to 0 after warmup
  • Confirm bizon.stream.iteration p50/p95 latency drops (SDK backoff no longer burning wall-clock on 403 retries)
  • Sanity-check that schema drift (adding a column to record_schemas) still triggers the Conflictupdate_table path

🤖 Generated with Claude Code

load_to_bigquery_via_streaming unconditionally called create_table on
every flush, so each bizon.stream.iteration issued one POST /tables per
destination_id. With ~15 CDC tables routed through one destination, this
blew past BigQuery's per-table metadata quota (5 ops / 10s) and returned
HTTP 403 rateLimitExceeded. The google-cloud-bigquery SDK silently
retried those 403s via DEFAULT_RETRY, so data kept flowing but every
iteration burned wall-clock in backoff and flooded Datadog with red spans.

Cache (temp_table_id, schema_fingerprint) pairs on the destination
instance and short-circuit the create_table/Conflict/schema-reconcile
block when we've already ensured a table with that exact schema this
process lifetime. Drop matching cache entries in finalize() after
delete_table in FULL_REFRESH and INCREMENTAL modes so a same-process
re-run recreates the temp table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anaselmhamdi anaselmhamdi requested a review from aballiet April 8, 2026 14:23
@anaselmhamdi anaselmhamdi merged commit aef3739 into main Apr 8, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant