Upload v2 data to GCP #6

@adamklie

Context

Upload v2 results for igvf-data/igvf_sc-islet_10X-Multiome to GCP via bin/9_upload/1_upload_to_gcp.ipynb.

Upgraded to high priority on 2026-04-16. This is the canonical v2 snapshot that downstream collaborators pull from (the Gaulton lab for #2, cross-dataset integrative analyses for #9/#10). It also serves as the template for the v3 upload once the #8 ATAC integration lands: it defines the shared object layout that the "define → GCP-ize → v3-ize" sequence will reuse for every subsequent analysis.

Tasks

  • Design step: define the manifest, i.e. an exhaustive list of what gets uploaded (files, sizes, provenance), plus the bucket layout, directory structure, and naming convention. Write these up as MANIFEST.md / LAYOUT.md and get them agreed before any upload happens.
  • Format and upload single-cell matrices (RNA + ATAC)
  • Upload cell-type annotations (harmony_round_2_leiden_1.0_cell_type_annotation.csv)
  • Upload the 3 v2 peak sets (pseudoreplicates / timepoints / no_replicates) with post-processed + annotated consensus peaks
  • Format and upload pseudobulk matrices (RNA + ATAC)
  • Format and upload differential analysis results (blocked until #4, "Finalize DESeq2 pseudobulk methodology for RNA and ATAC", is finalized)
  • README + provenance notes at the bucket root so the layout is self-documenting for downstream consumers
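
A minimal sketch of how the manifest from the design step could be generated before anything is uploaded. It only reads the local results directory and never touches GCP. The function names, the column set, the `TODO` provenance placeholder, and the `gs://example-bucket/v2` prefix are all illustrative assumptions, not the agreed layout.

```python
import hashlib
from pathlib import Path
from typing import Optional


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large matrices are never fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(
    local_root: Path,
    bucket_prefix: str,
    provenance: Optional[dict] = None,
) -> list:
    """Return one row per file under local_root, sorted for stable diffs.

    `provenance` is a hypothetical mapping from relative path to a short
    note (e.g. which notebook produced the file); missing entries are
    marked TODO so gaps are visible during review.
    """
    provenance = provenance or {}
    prefix = bucket_prefix.rstrip("/")
    rows = []
    for path in sorted(p for p in local_root.rglob("*") if p.is_file()):
        relative = path.relative_to(local_root).as_posix()
        rows.append(
            {
                "relative_path": relative,
                "size_bytes": path.stat().st_size,
                "md5": md5_hex(path),
                "destination": f"{prefix}/{relative}",
                "provenance": provenance.get(relative, "TODO"),
            }
        )
    return rows


def manifest_to_markdown(rows: list) -> str:
    """Render the rows as a Markdown table suitable for MANIFEST.md."""
    columns = ["relative_path", "size_bytes", "md5", "destination", "provenance"]
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[c]) for c in columns) + " |")
    return "\n".join(lines)
```

Usage would look something like `rows = build_manifest(Path("results/v2"), "gs://example-bucket/v2")` followed by `Path("MANIFEST.md").write_text(manifest_to_markdown(rows))`, where both the local path and the bucket are placeholders. Recording checksums at design time also gives a way to verify the upload afterwards against what the bucket reports.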

Metadata

Labels

infrastructure (Data upload, pipeline, tooling), multiome (10X Multiome analysis)
