Finalize patient pipeline for end-to-end testing #5
…estion skip Co-authored-by: pmayd <9614291+pmayd@users.noreply.github.com>
@copilot we no longer support longitudinal data, so you can remove this part of the table creation and all related code. The schema must be the same as in the R pipeline because we need to fill the existing BigQuery tables, but we do not need the longitudinal table for now.
Co-authored-by: pmayd <9614291+pmayd@users.noreply.github.com>
Done in 72c7be9. Removed |
@copilot I asked you to grade the migration. Add a detailed grading and overview of the migration to this PR's description, showing the mapping between the R components and the Python components, whether we migrated each one, how, and what we improved. I need a complete analysis before going live.
I've updated the PR description with the full analysis. It includes:
Overall verdict: patient pipeline is ready for end-to-end testing. Product pipeline is intentionally excluded and can be integrated in a follow-up PR.
Pull request overview
This PR finalizes the patient data pipeline for end-to-end testing by removing the longitudinal HbA1c table, updating Python version support, fixing stale column references, and deferring product data ingestion to a future release.
Changes:
- Extended Python version support to include Python 3.13 (changed upper bound from `<3.13` to `<3.14`)
- Removed the longitudinal HbA1c table (`patient_data_hba1c`) and its creation function `create_table_longitudinal_data`
- Guarded product data BigQuery ingestion with a comment explaining it's deferred until the product pipeline is finalized
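The widened bound can be sanity-checked against a local interpreter. This is a minimal sketch, assuming the full constraint is `>=3.10,<3.14` as quoted in the PR description; the helper name `is_supported` is hypothetical and not part of the repository.

```python
import sys

# Bounds taken from the PR text: requires-python ">=3.10,<3.14".
LOWER = (3, 10)  # inclusive
UPPER = (3, 14)  # exclusive


def is_supported(version=None):
    """Return True if the (major, minor) pair lies inside the supported range.

    `version` is any tuple-like starting with (major, minor); it defaults to
    the running interpreter. Hypothetical helper for illustration only.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return LOWER <= major_minor < UPPER


if __name__ == "__main__":
    print("running interpreter supported:", is_supported())
```

With the old upper bound of `<3.13`, the same check would reject Python 3.13, which is the case this change is meant to fix.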
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| scripts/python/pyproject.toml | Updated Python version constraint to support Python 3.13 |
| scripts/R/run_script_3_create_tables.R | Removed longitudinal HbA1c table creation logic (48 lines) |
| scripts/R/run_pipeline.R | Commented out product data BigQuery ingestion with explanatory note |
| R/script3_create_table_patient_data_changes_only.R | Deleted entire file containing longitudinal table creation function |
- (`support_from_a4d` → `support_level`, remove `updated_2022_date`)
- (`pyproject.toml`: `<3.13` → `<3.14`)
- `run_pipeline.R` (commented out, prevents hard abort)
- (`script3_create_table_patient_data_changes_only.R` deleted, call block and BQ ingest removed)

Migration Analysis: Patient Pipeline
Architecture
Component Mapping
- `scripts/python/main.py` (CLI tool)
- `scripts/python/sort_yaml.py`
- `script1_read_patient_data.R`, `script1_helper_read_patient_data.R`, `script1_process_patient_data.R`
- `read_product_data.R`, `helper_product_data.R`, `wide_format_2_long_format.R`, `script1_process_product_data.R`
- `script1_helper_read_patient_data.R` + `reference_data/synonyms/synonyms_patient.yaml` (100 patient fields, 358 lines)
- `script1_get_tracker_year.R`
- `script2_process_patient_data.R`, `script2_helper_patient_data_fix.R`, `script2_helper_dates.R`, `script2_sanitize_str.R`
- `reference_data/data_cleaning.yaml` (22 columns)
- `reference_data/provinces/allowed_provinces.yaml` (215 entries)
- `script2_process_product_data.R`, `read_product_data.R` (step 2)
- `script3_create_table_patient_data.R`
- `script3_create_table_patient_data_static.R`
- `script3_create_table_patient_data_annual.R`
- `script3_create_table_clinic_static_data.R` + `reference_data/clinic_data.xlsx` (synced from Google Sheets)
- `script3_create_table_product_data.R`
- `script3_create_table_patient_data_changes_only.R`
- `script3_link_product_patient.R`
- `logger.R`
- `run_script_4_create_logs_table.R`
- `run_script_5_create_metadata_table.R`
- `scripts/R/run_pipeline.R`

BigQuery Tables: Patient Pipeline
- `patient_data_monthly`
- `patient_data_static`
- `patient_data_annual`
- `clinic_data_static`
- `logs`
- `tracker_metadata`
- `product_data`
- `patient_data_hba1c`

What Was Migrated / Improved vs Previous State
- `clinic_code` + `country_code` columns → `clinic_id` derived from parent folder; simpler join with `clinic_data_static`
- `support_from_a4d`, `updated_2022_date` in column list → `support_level`, stale column removed
- `>=3.10,<3.13` (excluded Python 3.13) → `>=3.10,<3.14` (supports current Python release)
- `patient_data_annual` added for 2024+ tracker format
- `human_insulin_pre_mixed/short_acting/intermediate_acting`, `analog_insulin_*` columns + `insulin_type`/`insulin_subtype` derived fields
- `hba1c_baseline_exceeds`, `hba1c_updated_exceeds` boolean flags
- `tryCatch` on every tracker/table step; single file failure does not stop batch

Grading
- `BadZipFile`, detects already-anonymized files, file logging; no automated tests

Overall patient pipeline readiness: ready for end-to-end testing.
The product pipeline is intentionally excluded and can be integrated in a follow-up PR.
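Two behaviours named in the analysis above, per-file error isolation (a single tracker failure does not stop the batch) and `clinic_id` derived from the parent folder, can be illustrated together. The following is only a sketch under stated assumptions: the function names, the logger name, and the choice to treat trackers as `.xlsx` (ZIP) archives are hypothetical and do not reproduce the pipeline's actual code.

```python
import logging
import zipfile
from pathlib import Path

# Hypothetical logger name; the real pipeline's logging setup is not shown in the PR.
logger = logging.getLogger("patient_pipeline_sketch")


def clinic_id_from_path(tracker: Path) -> str:
    """Assumption: the clinic ID is the name of the tracker's parent folder."""
    return tracker.parent.name


def process_batch(trackers):
    """Check each tracker independently and return (processed, failed).

    A corrupt or unreadable file is logged and skipped, so one bad tracker
    does not abort the rest of the batch.
    """
    processed, failed = [], []
    for tracker in map(Path, trackers):
        try:
            # .xlsx files are ZIP archives; a corrupt one raises BadZipFile.
            with zipfile.ZipFile(tracker) as archive:
                archive.namelist()
            processed.append((clinic_id_from_path(tracker), tracker.name))
        except (zipfile.BadZipFile, OSError) as exc:
            logger.warning("Skipping %s: %s", tracker, exc)
            failed.append(tracker.name)
    return processed, failed
```

Returning the failed file names alongside the successes keeps the batch result auditable, which matters when the output is loaded into existing BigQuery tables.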