HTAN Phase 1 Longitudinal Data Table

Infrastructure and code to generate and maintain the HTAN Phase 1 longitudinal clinical table using Google Cloud Run.

Purpose

The Phase 1 longitudinal table provides a structured view of each participant’s disease course. It is designed to:

Align clinical events (diagnosis, vital status)
Track biospecimen collection events
Capture therapy exposure
Link to downstream molecular data via biospecimen identifiers

This table serves as the mapping layer for integrated analyses that connect each participant's clinical trajectory to HTAN assay data.
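The biospecimen linkage amounts to a left join from events to assay records. The sketch below assumes that both are plain dictionaries sharing a hypothetical biospecimen_id key, and that assay records carry a hypothetical assay_type key. The actual table's column names may differ.

```python
from typing import Any

Row = dict[str, Any]


def link_assays(events: list[Row], assays: list[Row]) -> list[Row]:
    """Attach assay types to events that share a biospecimen ID.

    Column names (biospecimen_id, assay_type) are hypothetical placeholders.
    Events without a biospecimen ID (e.g. a diagnosis) get an empty list.
    """
    # Index assay records by biospecimen ID for O(1) lookups.
    by_biospecimen: dict[str, list[Row]] = {}
    for assay in assays:
        by_biospecimen.setdefault(assay["biospecimen_id"], []).append(assay)

    linked = []
    for event in events:
        bid = event.get("biospecimen_id")
        matches = by_biospecimen.get(bid, []) if bid else []
        # Copy the event so the caller's rows are not mutated.
        linked.append({**event, "assays": [a["assay_type"] for a in matches]})
    return linked
```

Keeping every event, including those with no assay match, preserves the full clinical timeline while still exposing which collections have molecular data.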

Requirements

Deployment requires permission to create resources in the HTAN Google Cloud project, htan-dcc.
Please contact an owner of htan-dcc to request access.
(Owners in 2026: Dar'ya Pozhidayeva, Yamina Katariya, Vesteinn Thorsson, William Longabaugh, ISB)

Prerequisites

Create a Synapse Auth Token secret in Secret Manager.
The token must have read access to the HTAN Synapse metadata sources.
The job currently uses the synapse-service-HTAN-lambda service account.
Install Terraform ≥ 1.7.0

What This Job Does

The Cloud Run job:

Loads HTAN clinical and metadata tables
Identifies eligible participants for longitudinal analysis
Extracts time-resolved clinical events (diagnosis, therapy, follow-up, vital status)
Integrates biospecimen and molecular test events
Cleans and validates event timing
Flags suspicious or implausible records
Writes the updated longitudinal table to BigQuery
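The timing checks can be sketched as follows. This is not the job's actual rule set. It assumes that event times are integer days relative to a per-participant index date, that death is recorded as an event of type "death", and that the flag names missing_start, end_before_start and after_death are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    participant_id: str
    event_type: str
    start_day: Optional[int]        # assumed: days relative to an index date
    end_day: Optional[int] = None
    flags: list[str] = field(default_factory=list)


def flag_events(events: list[Event]) -> list[Event]:
    """Attach hypothetical QC flags in place and return the same list."""
    # Earliest recorded death day per participant, if any.
    death_day: dict[str, int] = {}
    for ev in events:
        if ev.event_type == "death" and ev.start_day is not None:
            prev = death_day.get(ev.participant_id)
            death_day[ev.participant_id] = ev.start_day if prev is None else min(prev, ev.start_day)

    for ev in events:
        if ev.start_day is None:
            ev.flags.append("missing_start")
            continue
        if ev.end_day is not None and ev.end_day < ev.start_day:
            ev.flags.append("end_before_start")
        dd = death_day.get(ev.participant_id)
        if dd is not None and ev.event_type != "death" and ev.start_day > dd:
            ev.flags.append("after_death")
    return events
```

The sketch flags records instead of dropping them, so downstream users can decide how strictly to filter.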

Docker Image

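Before creating the job, build the Docker image from src and push it to Google Artifact Registry. If Docker is not yet authenticated to the registry, first run gcloud auth configure-docker us-docker.pkg.dev.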

cd src
docker build . -t us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>
docker push us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>

Deploy Cloud Resources

Set the variables in terraform.tfvars. Each variable is described in variables.tf.
Then run the following from the repository root:

terraform init
terraform plan
terraform apply

Output

The pipeline produces a BigQuery longitudinal table containing:

HTAN_Participant_ID
Event start and end times
Event type
Event details
Quality control flags

This table supports longitudinal cohort definition, clinical trajectory analysis, and linkage to molecular assay data.
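Downstream use can be sketched as follows. Rows fetched from the table are assumed to be dictionaries. HTAN_Participant_ID comes from the column list above, but event_start, event_type and qc_flags are hypothetical names for the start-time, type and QC-flag columns. The functions rebuild per-participant trajectories and select a simple longitudinal cohort.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def trajectories(rows: list[Row]) -> dict[str, list[Row]]:
    """Group rows by participant and order each group by event start."""
    by_participant: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        by_participant[row["HTAN_Participant_ID"]].append(row)
    # Events with an unknown start sort last within a participant.
    for events in by_participant.values():
        events.sort(key=lambda r: (r["event_start"] is None, r["event_start"] or 0))
    return dict(by_participant)


def longitudinal_cohort(rows: list[Row], min_collections: int = 2) -> list[str]:
    """Participants with at least min_collections unflagged collection events.

    The event_type value "biospecimen_collection" is a hypothetical label.
    """
    cohort = []
    for pid, events in trajectories(rows).items():
        n = sum(
            1
            for r in events
            if r["event_type"] == "biospecimen_collection" and not r["qc_flags"]
        )
        if n >= min_collections:
            cohort.append(pid)
    return sorted(cohort)
```

Excluding flagged rows in the cohort filter is one possible policy. Analyses that tolerate imperfect timing could count them instead.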
