ncihtan/phase1_longitudinal_data

HTAN Phase 1 Longitudinal Data Table

Infrastructure and code to generate and maintain the HTAN Phase 1 longitudinal clinical table using Google Cloud Run.

Purpose

The Phase 1 longitudinal table provides a structured view of each participant’s disease course. It is designed to:

  • Align clinical events (diagnosis, vital status)
  • Track biospecimen collection events
  • Capture therapy exposure
  • Link to downstream molecular data via biospecimen identifiers

This table serves as a mapping between patient trajectories and HTAN assay data for integrated analyses.


Requirements

You need permission to deploy resources in the HTAN Google Cloud project, htan-dcc.
To request access, contact an owner of htan-dcc.
(Owners in 2026: Dar'ya Pozhidayeva, Yamina Katariya, Vesteinn Thorsson, William Longabaugh, ISB)


Prerequisites

  • Create a Synapse Auth Token secret in Secret Manager.
    The workflow needs read access to the HTAN Synapse metadata sources.
    It currently uses the synapse-service-HTAN-lambda service account.

  • Install Terraform ≥ 1.7.0
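
As an illustration of how a job could consume the Synapse token once the secret exists, here is a minimal Python sketch. It assumes the secret is exposed to the container as an environment variable named SYNAPSE_AUTH_TOKEN; that name and the helper `load_synapse_token` are assumptions for this example, and the actual wiring is defined by the Terraform configuration and the code in src.

```python
import os


def load_synapse_token(env_var: str = "SYNAPSE_AUTH_TOKEN") -> str:
    """Return the Synapse token from the environment, failing fast if it is absent.

    The variable name is an assumption for this sketch, not taken from the repository.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; check that the Secret Manager secret "
            "is attached to the job"
        )
    return token
```

Failing at startup keeps a missing or empty secret from surfacing later as an opaque authentication error.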


What This Job Does

The Cloud Run job:

  1. Loads HTAN clinical and metadata tables
  2. Identifies eligible participants for longitudinal analysis
  3. Extracts time-resolved clinical events (diagnosis, therapy, follow-up, vital status)
  4. Integrates biospecimen and molecular test events
  5. Cleans and validates event timing
  6. Flags suspicious or implausible records
  7. Writes the updated longitudinal table to BigQuery
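
Steps 5 and 6 can be pictured with a small Python sketch. The field names (`start_days`, `end_days`), the flag names, the 120-year bound, and the checks themselves are assumptions chosen for illustration; they are not the rules the job applies, which live in the code under src.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    participant_id: str
    event_type: str
    start_days: Optional[float]        # hypothetical: days from an index date
    end_days: Optional[float] = None
    flags: List[str] = field(default_factory=list)


def flag_event(event: Event, max_days: float = 120 * 365.25) -> Event:
    """Attach QC flags to an event rather than dropping it."""
    if event.start_days is None:
        event.flags.append("missing_start")
    elif abs(event.start_days) > max_days:
        event.flags.append("implausible_start")
    if (
        event.start_days is not None
        and event.end_days is not None
        and event.end_days < event.start_days
    ):
        event.flags.append("end_before_start")
    return event


# Made-up participants and timings, for illustration only.
events = [
    Event("P1", "Diagnosis", 0.0),
    Event("P1", "Therapy", 200.0, 150.0),
    Event("P2", "Follow-up", None),
]
flagged = [flag_event(e) for e in events]
for e in flagged:
    print(e.participant_id, e.event_type, e.flags)
```

Flagging instead of deleting keeps questionable records visible to analysts, who can then filter on the flags as needed.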

Docker Image

Before creating the job, build and push a Docker image to Google Artifact Registry:

cd src
docker build . -t us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>
docker push us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>

Deploy Cloud Resources

Define variables in terraform.tfvars.
Variable descriptions are available in variables.tf.

terraform init
terraform plan
terraform apply

Output

The pipeline produces a BigQuery longitudinal table containing:

  • HTAN_Participant_ID
  • Event start and end times
  • Event type
  • Event details
  • Quality control flags

This table supports longitudinal cohort definition, clinical trajectory analysis, and linkage to molecular assay data.
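
To make the row shape concrete, the following Python sketch groups rows into per-participant timelines. Only HTAN_Participant_ID comes from the column list above; the other keys (`event_start`, `event_end`, `event_type`, `event_details`, `qc_flags`), the participant IDs, and the values are hypothetical stand-ins, so check the BigQuery schema for the real column names.

```python
from collections import defaultdict

# Hypothetical rows; every key except HTAN_Participant_ID is an assumed name.
rows = [
    {"HTAN_Participant_ID": "P1", "event_start": 30, "event_end": 30,
     "event_type": "Biospecimen", "event_details": "example", "qc_flags": []},
    {"HTAN_Participant_ID": "P1", "event_start": 0, "event_end": 0,
     "event_type": "Diagnosis", "event_details": "example", "qc_flags": []},
    {"HTAN_Participant_ID": "P2", "event_start": 10, "event_end": 40,
     "event_type": "Therapy", "event_details": "example", "qc_flags": []},
]


def build_timelines(rows):
    """Group rows by participant and order each group by event start."""
    timelines = defaultdict(list)
    for row in rows:
        timelines[row["HTAN_Participant_ID"]].append(row)
    for participant_events in timelines.values():
        participant_events.sort(key=lambda r: r["event_start"])
    return dict(timelines)


timelines = build_timelines(rows)
for participant, participant_events in timelines.items():
    print(participant, [e["event_type"] for e in participant_events])
```

A per-participant, time-ordered view like this is the starting point for cohort definition and for joining biospecimen events to assay data.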
