Infrastructure and code to generate and maintain the HTAN Phase 1 longitudinal clinical table using Google Cloud Run.
The Phase 1 longitudinal table provides a structured view of each participant’s disease course. It is designed to:
- Align clinical events (diagnosis, vital status)
- Track biospecimen collection events
- Capture therapy exposure
- Link to downstream molecular data via biospecimen identifiers
This table serves as a mapping for integrated analyses that connect patient trajectories with HTAN assay data.
- Requires access to deploy resources in the HTAN Google Cloud Project, htan-dcc.
  Please contact an owner of htan-dcc to request access.
  (Owners in 2026: Dar'ya Pozhidayeva, Yamina Katariya, Vesteinn Thorsson, William Longabaugh, ISB)
- Create a Synapse Auth Token secret in Secret Manager.
  The workflow requires read access to HTAN Synapse metadata sources.
  Currently uses the synapse-service-HTAN-lambda service account.
- Install Terraform ≥ 1.7.0
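The Secret Manager prerequisite above can also be scripted. The following is a minimal sketch, assuming the `google-cloud-secret-manager` Python client is installed and application default credentials are configured. The secret name `synapse-auth-token` is a hypothetical placeholder, not a name taken from this repository; use whatever name the Terraform variables expect.

```python
"""Sketch: store a Synapse auth token in Secret Manager (names are assumptions)."""

PROJECT_ID = "htan-dcc"            # project named in this README
SECRET_ID = "synapse-auth-token"   # hypothetical secret name, not from this repo


def secret_parent(project_id: str) -> str:
    """Return the resource path under which secrets are created."""
    return f"projects/{project_id}"


def create_token_secret(token: str, project_id: str = PROJECT_ID,
                        secret_id: str = SECRET_ID) -> str:
    """Create the secret, add the token as its first version, return the version name."""
    # Imported lazily so this module loads without the client library installed.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    secret = client.create_secret(
        request={
            "parent": secret_parent(project_id),
            "secret_id": secret_id,
            "secret": {"replication": {"automatic": {}}},
        }
    )
    version = client.add_secret_version(
        request={"parent": secret.name, "payload": {"data": token.encode("utf-8")}}
    )
    return version.name
```

Calling `create_token_secret(token)` with a token read from an environment variable, rather than a literal in source, keeps the credential out of version control.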
The Cloud Run job:
- Loads HTAN clinical and metadata tables
- Identifies eligible participants for longitudinal analysis
- Extracts time-resolved clinical events (diagnosis, therapy, follow-up, vital status)
- Integrates biospecimen and molecular test events
- Cleans and validates event timing
- Flags suspicious or implausible records
- Writes the updated longitudinal table to BigQuery
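To illustrate the timing validation and flagging steps listed above, here is a small sketch in plain Python. It is not the job's actual implementation: the `Event` fields, the flag names, the `MAX_PLAUSIBLE_DAYS` threshold, and the assumption that times are stored as day offsets from an index date are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_PLAUSIBLE_DAYS = 120 * 365  # hypothetical bound on any day offset


@dataclass
class Event:
    participant_id: str
    event_type: str
    start_day: Optional[int]        # assumed: days relative to an index date
    end_day: Optional[int] = None
    flags: List[str] = field(default_factory=list)


def flag_event(event: Event) -> Event:
    """Attach quality control flags describing timing problems, if any."""
    flags = []
    if event.start_day is None:
        flags.append("missing_start")
    elif abs(event.start_day) > MAX_PLAUSIBLE_DAYS:
        flags.append("implausible_start")
    if event.end_day is not None:
        if abs(event.end_day) > MAX_PLAUSIBLE_DAYS:
            flags.append("implausible_end")
        if event.start_day is not None and event.end_day < event.start_day:
            flags.append("end_before_start")
    event.flags = flags
    return event
```

Flagging rather than dropping records keeps suspicious events visible in the output, so downstream users can decide how strictly to filter.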
Before creating the job, build and push a Docker image to Google Artifact Registry:
cd src
docker build . -t us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>
docker push us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>
Define variables in terraform.tfvars (variable descriptions are available in variables.tf), then initialize, review, and apply the configuration:
terraform init
terraform plan
terraform apply
The pipeline produces a BigQuery longitudinal table containing:
- HTAN_Participant_ID
- Event start and end times
- Event type
- Event details
- Quality control flags
This table supports longitudinal cohort definition, clinical trajectory analysis, and linkage to molecular assay data.
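As a sketch of how rows from this table might feed cohort definition and trajectory analysis, the example below groups events by participant and orders them in time. Only `HTAN_Participant_ID` is a column name taken from this README; `event_type`, `event_start`, and the participant values are hypothetical stand-ins for the actual schema.

```python
from collections import defaultdict


def build_trajectories(rows):
    """Map each participant to their event types, ordered by event start."""
    by_participant = defaultdict(list)
    for row in rows:
        by_participant[row["HTAN_Participant_ID"]].append(row)
    return {
        pid: [r["event_type"] for r in sorted(events, key=lambda r: r["event_start"])]
        for pid, events in by_participant.items()
    }


def select_cohort(trajectories, required_event_types):
    """Return participants whose trajectory contains every required event type."""
    required = set(required_event_types)
    return sorted(
        pid for pid, types in trajectories.items() if required <= set(types)
    )
```

For example, `select_cohort(trajectories, ["diagnosis", "therapy"])` would keep only participants with both a diagnosis and a therapy event recorded.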