
Switch batch inference to Feature Store offline store #28

Merged
shlbatra merged 14 commits into main from feature/fp-step8-batch-inference on Jun 16, 2026


@shlbatra (Owner)

Summary

  • Read from canonical feature table (iris_features) instead of raw BQ tables (iris / iris_pubsub_data)
  • Filter to source = 'batch_input' server-side in SQL, so only unlabeled inference rows are scored
  • Remove the conditional column-rename hack (old lines 36-48); the canonical table has consistent names regardless of data source
  • Use canonical feature column names (sepal_length_cm, etc.) matching the retrained model
  • Use BigQueryClient(project=project_id) so query jobs are created in project_id and run under its job permissions (same pattern as Step 6)
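A minimal sketch of the new read path, assuming the PR's BigQueryClient corresponds to google-cloud-bigquery's bigquery.Client and that the dataset name is supplied by the caller. Only sepal_length_cm and sepal_width_cm appear in this PR; the petal column names and the helper names build_inference_query and load_batch_rows are hypothetical. The source value is bound as a query parameter, so the batch_input filter is applied by BigQuery before any rows reach the component.

```python
# Canonical feature columns. Only the two sepal names appear in the PR;
# the petal names are assumed to follow the same pattern.
FEATURE_COLUMNS = [
    "sepal_length_cm",
    "sepal_width_cm",
    "petal_length_cm",
    "petal_width_cm",
]


def build_inference_query(project_id: str, dataset: str) -> str:
    """Return SQL that selects canonical features for unlabeled batch rows.

    The source value is not interpolated; it is bound as @source at run time.
    """
    columns = ", ".join(FEATURE_COLUMNS)
    return (
        f"SELECT {columns}\n"
        f"FROM `{project_id}.{dataset}.iris_features`\n"
        "WHERE source = @source"
    )


def load_batch_rows(project_id: str, dataset: str):
    """Run the filtered query in BigQuery and return a pandas DataFrame.

    Requires google-cloud-bigquery (plus pandas and db-dtypes for to_dataframe).
    """
    from google.cloud import bigquery  # imported lazily; only needed at run time

    # Query jobs are created in project_id (hypothetical mirror of the Step 6 pattern).
    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("source", "STRING", "batch_input"),
        ]
    )
    sql = build_inference_query(project_id, dataset)
    return client.query(sql, job_config=job_config).to_dataframe()
```

The dataset and table are interpolated into the string because BigQuery query parameters cannot stand in for table identifiers; only the filter value is parameterized.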

What was removed

The old inference component had a brittle if bq_table == "iris_pubsub_data" branch that renamed snake_case Pub/Sub columns to CamelCase. With the feature store, all data flows through ingest.py into iris_features with canonical names, so no conditional renaming is needed at inference time.
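To illustrate why the branch can go: if ingestion maps every source's column names onto the canonical names once, before writing to iris_features, the inference side never needs to know where a row came from. This is a hypothetical sketch, not the actual ingest.py; the raw column names in iris and iris_pubsub_data are not listed in this PR, so the mapping below is assumed, as is the to_canonical helper.

```python
# Hypothetical mapping from raw source column names to canonical names.
# The real raw names in iris / iris_pubsub_data are not listed in this PR.
CANONICAL_NAMES = {
    "SepalLengthCm": "sepal_length_cm",
    "SepalWidthCm": "sepal_width_cm",
    "PetalLengthCm": "petal_length_cm",
    "PetalWidthCm": "petal_width_cm",
    "sepal_length": "sepal_length_cm",
    "sepal_width": "sepal_width_cm",
    "petal_length": "petal_length_cm",
    "petal_width": "petal_width_cm",
}


def to_canonical(row: dict, source: str) -> dict:
    """Rename a raw row's keys to canonical names and tag its source.

    Unknown keys raise instead of passing through silently, so schema
    drift surfaces at ingest time rather than at inference time.
    """
    out = {}
    for key, value in row.items():
        if key not in CANONICAL_NAMES:
            raise KeyError(f"unmapped column from {source!r}: {key}")
        out[CANONICAL_NAMES[key]] = value
    out["source"] = source
    return out
```

Rows from either naming scheme come out with identical feature keys, which is what lets the inference query select columns by canonical name without a per-table branch.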

Prerequisites

  • bq_dataloader.py --generate-random N must be run first to create batch_input data
  • ingest.py must then be run to populate iris_features with both training and batch_input rows
  • Model must be retrained on canonical column names (Step 6 PR)
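The retraining prerequisite can be checked before submitting the pipeline, in the spirit of the test added in this PR that compares the registered model's feature_names_in_ with the canonical names. A minimal sketch, assuming a scikit-learn estimator fit on a pandas DataFrame (scikit-learn sets feature_names_in_ only when fit on data with string column names); the function name and the petal column names are hypothetical.

```python
from typing import Sequence

# Canonical names; the petal entries are assumed, not taken from the PR.
CANONICAL_FEATURES = (
    "sepal_length_cm",
    "sepal_width_cm",
    "petal_length_cm",
    "petal_width_cm",
)


def check_canonical_features(model, expected: Sequence[str] = CANONICAL_FEATURES) -> None:
    """Raise ValueError if the model was not trained on the canonical names.

    A missing feature_names_in_ is also treated as an error, since it means
    the model was fit without column names and cannot be checked.
    """
    names = getattr(model, "feature_names_in_", None)
    if names is None:
        raise ValueError("model has no feature_names_in_; was it fit on a DataFrame?")
    actual = [str(n) for n in names]
    # Order-sensitive on purpose: recent scikit-learn versions also reject
    # prediction inputs whose columns are reordered relative to fit.
    if actual != list(expected):
        raise ValueError(f"feature mismatch: model={actual}, expected={list(expected)}")
```

A model trained before the migration (CamelCase columns) fails this check immediately instead of failing later inside the inference component.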

Test plan

  • Run bq_dataloader.py --generate-random 20, then ingest.py, to populate the feature table
  • Submit inference pipeline and confirm it reads only batch_input rows
  • Verify predictions are written to iris_predictions table
  • Confirm there are no column-rename errors and canonical names are used end-to-end

🤖 Generated with Claude Code

shlbatra and others added 14 commits June 16, 2026 09:03
Read from canonical feature table (iris_features) with server-side
source='batch_input' filter instead of raw BQ tables. Remove the
conditional column rename hack — canonical table has consistent names
regardless of data source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Append -training and -inference to PIPELINE_NAME in each pipeline file
so they show as distinct pipelines in Vertex AI (e.g.
pipeline-iris-staging-training, pipeline-iris-staging-inference).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
list_models returned models in creation order and [0] grabbed the
first (oldest) version — trained with CamelCase columns before the
feature store migration. Sort by create_time descending so [0] is
the most recently registered model.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop CamelCase aliases and ConfigDict — field names match the feature
platform directly. No backward compat needed since the model is
retrained on canonical names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Also fix sepal_width_cm type from integer to number to match the
other feature fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Loads the latest registered model from GCS and checks that
feature_names_in_ matches the canonical names from the feature store.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents stale bytecache or non-editable installs from causing
KFP to serialize old component code into pipeline YAML.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The upper bound <3.11 excluded the local Python 3.11.0, blocking
editable installs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
list_models returns parent model entries, not versions. Use
list_model_versions to get all versions of the model, then sort
by create_time to pick the latest one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
register.py already sets version_aliases=['blessed'] on each uploaded
model. Use get_model(name + '@blessed') to directly fetch the blessed
version instead of listing all versions and sorting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
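The final model-lookup change relies on Vertex AI resolving a version alias inside the model resource name (projects/{project}/locations/{location}/models/{model}@{alias}), so no listing or sorting is needed. A sketch using the low-level aiplatform_v1 client; the PR does not show which client its get_model call belongs to, and the helper names and the model_id argument are hypothetical.

```python
def blessed_model_name(project: str, location: str, model_id: str, alias: str = "blessed") -> str:
    """Build a Vertex AI model resource name that pins a version alias."""
    return f"projects/{project}/locations/{location}/models/{model_id}@{alias}"


def get_blessed_model(project: str, location: str, model_id: str):
    """Fetch the model version currently carrying the 'blessed' alias.

    Requires google-cloud-aiplatform; it is imported lazily so the name
    helper can be used without the dependency installed.
    """
    from google.cloud import aiplatform_v1

    # Vertex AI model APIs are regional, so point the client at the location's endpoint.
    client = aiplatform_v1.ModelServiceClient(
        client_options={"api_endpoint": f"{location}-aiplatform.googleapis.com"}
    )
    return client.get_model(name=blessed_model_name(project, location, model_id))
```

The returned Model carries version_id and artifact_uri (the GCS directory of the model files). Compared with sorting versions by create_time, the alias makes the choice explicit: the pipeline scores with the version that carries the alias rather than whichever was uploaded most recently.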
shlbatra (Owner, Author) commented Jun 16, 2026

Successful run:

https://console.cloud.google.com/agent-platform/pipelines/locations/us-central1/runs/pipeline-iris-staging-inference-20260616115218?project=deeplearning-sahil

Data loaded (50 rows):

/gcs/sb-vertex/staging/pipeline_root/57434141298/pipeline-iris-staging-inference-20260616115218/get-model_4592835321065897984/latest_model

shlbatra merged commit f159bac into main on Jun 16, 2026
1 check passed