
feat: BigQuery loading and GCS integration for Python pipeline #3

Merged
pmayd merged 4 commits into migration from copilot/plan-data-migration-steps
Feb 24, 2026

Conversation


Copilot AI commented Feb 20, 2026

  • Implement GCS storage client (src/a4d/gcp/storage.py)
  • Implement BigQuery loading client (src/a4d/gcp/bigquery.py)
  • Update src/a4d/gcp/__init__.py with re-exports and __all__
  • Add upload-tables, download-trackers, upload-output CLI commands
  • Add 18 unit tests for GCP modules (all mocked, all passing)
  • Update .env.example with GCP authentication documentation
  • Fix all 109 ruff linter warnings across the entire codebase (0 remaining)
  • All 239 tests passing
Original prompt

We are in the middle of a migration from R to Python. We are actually still in the testing phase, but testing is currently not possible without data. I would therefore like to plan the next steps and finish the pipeline. We should already have created the tables, so the next goal would be to load the data into BigQuery and then also test the pipeline in Google Cloud. What would be necessary for that, i.e. which services would need to be set up?



@pmayd
Collaborator

pmayd commented Feb 20, 2026

We also need to support running the pipeline both locally and within GCP. Locally we need some way to authenticate against GCP; when running inside a container on GCP, tools like gcloud are presumably already available. This needs to be considered when writing the client code for Google Cloud Storage and BigQuery.
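For reference, Application Default Credentials (ADC) resolve exactly this dual-mode concern. The sketch below mirrors the documented ADC lookup order in plain Python so the decision is visible; it is illustrative only, and the real resolution happens inside google.auth.default():

```python
import os


def describe_adc_source() -> str:
    """Report which Application Default Credentials source the
    google-auth library would likely pick up (illustrative sketch).

    Mirrors the documented ADC lookup order without importing
    the google-auth library itself.
    """
    # 1. An explicit service account key file wins.
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        return f"service account key file: {key_path}"
    # 2. On a developer machine, `gcloud auth application-default login`
    #    writes user credentials to a well-known file.
    adc_file = os.path.expanduser(
        "~/.config/gcloud/application_default_credentials.json"
    )
    if os.path.exists(adc_file):
        return f"gcloud user credentials: {adc_file}"
    # 3. Inside GCP (Cloud Run, GCE, ...), the metadata server provides
    #    credentials automatically; no local file is needed.
    return "GCP metadata server (when running inside Google Cloud)"
```

Because the client libraries walk this chain themselves, the same client code works unchanged on a laptop and in a GCP container.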

Copilot AI and others added 2 commits February 20, 2026 12:45
- Add gcp/bigquery.py with load_table() and load_pipeline_tables()
  matching R pipeline's ingest_data() with clustering fields
- Add gcp/storage.py with download_tracker_files() and upload_output()
  replacing R pipeline's gsutil CLI calls
- Add CLI commands: upload-tables, download-trackers, upload-output
- Add 18 unit tests for GCP modules (all mocked)
- Update .env.example with GCP auth documentation

Co-authored-by: pmayd <9614291+pmayd@users.noreply.github.com>
- Add __all__ to gcp/__init__.py for explicit public API
- Add comment explaining table_logs.parquet naming convention
- Extract _get_job_config() helper in tests to reduce duplication

Co-authored-by: pmayd <9614291+pmayd@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Plan next steps for data migration to BigQuery" to "feat: BigQuery loading and GCS integration for Python pipeline" Feb 20, 2026
Copilot AI requested a review from pmayd February 20, 2026 12:50
@pmayd
Collaborator

pmayd commented Feb 20, 2026

I allowed astral.sh for agents, you can try again @copilot


Copilot AI left a comment


Pull request overview

This PR implements GCP integration for the Python pipeline migration, enabling BigQuery table loading and Google Cloud Storage operations that were previously handled by R scripts using CLI tools (bq and gsutil).

Changes:

  • Added a4d.gcp.bigquery module with table loading functions and clustering configurations matching the R pipeline
  • Added a4d.gcp.storage module for downloading tracker files and uploading pipeline output to GCS
  • Added three new CLI commands: upload-tables, download-trackers, and upload-output
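The PR page does not show which CLI framework src/a4d/cli.py uses, so here is a minimal argparse sketch of the three subcommands; the command names come from the PR description, while the flag names are assumptions for illustration:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Sketch only: the real src/a4d/cli.py may use another framework
    # (e.g. Typer or Click); the --dataset/--bucket flags are invented.
    parser = argparse.ArgumentParser(prog="a4d")
    sub = parser.add_subparsers(dest="command", required=True)

    upload_tables = sub.add_parser(
        "upload-tables", help="Load pipeline tables into BigQuery"
    )
    upload_tables.add_argument("--dataset", required=True)

    download = sub.add_parser(
        "download-trackers", help="Download tracker files from GCS"
    )
    download.add_argument("--bucket", required=True)

    upload_out = sub.add_parser(
        "upload-output", help="Upload pipeline output to GCS"
    )
    upload_out.add_argument("--bucket", required=True)
    return parser
```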

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated no comments.

Summary per file:

src/a4d/gcp/bigquery.py: Implements BigQuery table loading with clustering configs for patient_data_monthly, patient_data_annual, patient_data_static, and logs tables
src/a4d/gcp/storage.py: Implements GCS operations for downloading tracker files and uploading pipeline output
src/a4d/gcp/__init__.py: Exports the public API for the GCP module
src/a4d/cli.py: Adds three new commands for BigQuery upload and GCS interactions
.env.example: Documents GCP authentication options using Application Default Credentials
tests/test_gcp/test_bigquery.py: 18 unit tests covering table loading, clustering configs, and error handling
tests/test_gcp/test_storage.py: Tests for GCS download/upload with mocked clients
tests/test_gcp/__init__.py: Empty init file for the test package
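The clustering fields themselves live in src/a4d/gcp/bigquery.py and are not visible on this page. A rough sketch of how such a per-table load configuration might be structured follows; the field names and the job_config_kwargs helper are hypothetical, not the module's actual API:

```python
# Hypothetical clustering-field mapping; the real fields are defined in
# src/a4d/gcp/bigquery.py and may differ.
CLUSTERING_FIELDS = {
    "patient_data_monthly": ["country_code", "clinic_code"],
    "patient_data_annual": ["country_code", "clinic_code"],
    "patient_data_static": ["country_code", "clinic_code"],
    "logs": ["file_name"],
}


def job_config_kwargs(table_name: str) -> dict:
    """Return the keyword arguments one would pass to
    bigquery.LoadJobConfig for a given table (sketch only)."""
    kwargs = {
        "source_format": "PARQUET",
        "write_disposition": "WRITE_TRUNCATE",
    }
    fields = CLUSTERING_FIELDS.get(table_name)
    if fields:
        # BigQuery allows up to four clustering fields per table.
        kwargs["clustering_fields"] = fields
    return kwargs
```

Keeping the mapping in one dict matches the R pipeline's ingest_data() approach of configuring clustering per table in a single place.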


@pmayd pmayd marked this pull request as ready for review February 20, 2026 15:10
@pmayd
Collaborator

pmayd commented Feb 20, 2026

Fix the CI errors; the tests will fail for now, but fix the ruff linter warnings.

@pmayd
Collaborator

pmayd commented Feb 20, 2026

@copilot Fix the CI errors; the tests will fail for now, but fix the ruff linter warnings.

- E501: Break long lines (comments, f-strings, docstrings) to ≤100 chars
- E501: Add noqa: E501 for file path literals in scripts/
- N806: Rename EXCEL_ERRORS -> excel_errors, FLOAT_REL_TOL -> float_rel_tol,
  FLOAT_ABS_TOL -> float_abs_tol (variables inside functions)
- B904: Add 'from e' to raise statements in except blocks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
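The B904 and N806 fixes described in the commit message amount to patterns like the following sketch (the functions here are invented for illustration, not taken from the codebase):

```python
# B904: inside an `except` block, chain the original exception with
# `from e` so the traceback preserves the root cause.
def parse_port(value: str) -> int:
    try:
        return int(value)
    except ValueError as e:
        raise RuntimeError(f"invalid port: {value!r}") from e


# N806: variables inside functions are lowercase_snake_case;
# UPPER_CASE is reserved for module-level constants.
def close_enough(a: float, b: float) -> bool:
    float_rel_tol = 1e-9  # was FLOAT_REL_TOL before the rename
    return abs(a - b) <= float_rel_tol * max(abs(a), abs(b))
```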
Author

Copilot AI commented Feb 20, 2026

@copilot Fix the CI errors; the tests will fail for now, but fix the ruff linter warnings.

Fixed all 109 ruff linting errors in commit ac2fa23. This includes:

  • 46 E501 (line too long) - wrapped comments, f-strings, and docstrings; added # noqa: E501 for file path literals in scripts/
  • Auto-fixed UP035, F841, I001, PT006, B007, PT018 via ruff --fix
  • 3 N806 (variable naming) - renamed uppercase variables to lowercase inside functions
  • 2 B904 (raise from) - added from e to raise statements in except blocks

All 239 tests pass, 0 ruff errors remain.

We also need to check and allow running the pipeline locally as well as within GCP, so locally we need to somehow authenticate against GCP I guess and when running inside a container on GCP I guess th...

The GCP client code (gcp/bigquery.py and gcp/storage.py) already uses Application Default Credentials (ADC), which handles both scenarios:

  • Locally: Authenticate with gcloud auth application-default login - no service account key file needed
  • In GCP (Cloud Run/GCE): Authentication is automatic via the metadata server - no extra setup
  • CI/CD or VMs: Set GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json environment variable

The .env.example documents these options. No gcloud CLI is needed at runtime - the Python client libraries (google-cloud-bigquery, google-cloud-storage) handle auth natively through ADC.

Copilot AI requested a review from pmayd February 20, 2026 15:23
@pmayd pmayd merged commit 7e06dbc into migration Feb 24, 2026