Get up and running with VCC-project in 5 minutes!
# Check Python version (need 3.9+)
python --version
# Check mamba
mamba --version
# 1. Clone repository
git clone https://github.com/ACTN3Bioinformatics/VCC-project.git
cd VCC-project
# 2. Create environment
mamba env create -f environment.yml
# 3. Activate environment
mamba activate vcc2025
# Download demo data
snakemake download_demo_data --cores 1
# Run complete pipeline
snakemake --cores 4 --configfile config/datasets.yaml
# View results
ls results/demo/
The pipeline:
- ✅ Downloaded ~10k cells from public Perturb-seq data
- ✅ Filtered low-quality cells (QC)
- ✅ Normalized and log-transformed counts
- ✅ Balanced perturbation classes
- ✅ Created train/val/test splits
- ✅ Generated QC reports
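
The balancing and splitting steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: the function name `balance_and_split`, the 80/10/10 split, and splitting within each perturbation class are all hypothetical choices. The `target_per_class` argument is meant to mirror the `target_cells_per_perturbation` setting in `config/datasets.yaml`. It works on a plain list of labels rather than an AnnData object so that it runs without scanpy installed.

```python
import random
from collections import Counter, defaultdict


def balance_and_split(labels, target_per_class=50, percents=(80, 10, 10), seed=0):
    """Downsample each perturbation class, then assign train/val/test splits.

    labels: list of perturbation labels, one per cell.
    percents: hypothetical train/val/test percentages (an assumption).
    Returns a dict mapping each kept cell index to its split name.
    """
    rng = random.Random(seed)

    # Group cell indices by perturbation label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    assignment = {}
    for label in sorted(by_class):
        cells = by_class[label]
        rng.shuffle(cells)
        kept = cells[:target_per_class]  # cap over-represented classes

        # Split within each class so every split contains every perturbation.
        n_train = len(kept) * percents[0] // 100
        n_val = len(kept) * percents[1] // 100
        for pos, idx in enumerate(kept):
            if pos < n_train:
                assignment[idx] = "train"
            elif pos < n_train + n_val:
                assignment[idx] = "val"
            else:
                assignment[idx] = "test"
    return assignment


# Toy input: one over-represented control class and two smaller perturbations.
labels = ["control"] * 200 + ["geneA"] * 60 + ["geneB"] * 20
splits = balance_and_split(labels, target_per_class=50)
print(Counter(splits.values()))
```

With the toy labels above, 120 of the 280 cells are kept: classes larger than the cap are reduced to 50 cells, while `geneB` keeps all 20. Lowering `target_per_class` shrinks the balanced dataset in the same way that lowering the corresponding config value is expected to.
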
import scanpy as sc
# Load processed data
adata = sc.read_h5ad('results/demo/final.h5ad')
# Check it out
print(adata)
print(adata.obs['split'].value_counts())
- 📖 Read PIPELINE_GUIDE.md for detailed documentation
- 📓 Explore data interactively: jupyter notebook notebooks/demo_exploration.ipynb
- 🔧 Customize parameters in config/datasets.yaml
- 📊 Check QC report: reports/demo/qc_report.html
- 🚀 Process your own data by adding to config/datasets.yaml
Want to visualize and explore your data? Launch the demo notebook:
# Start Jupyter
jupyter notebook notebooks/demo_exploration.ipynb
# The notebook covers:
# - Loading processed data
# - QC metrics visualization
# - Perturbation analysis
# - PCA and UMAP plots
# - Gene expression patterns
Out of memory?
# In config/datasets.yaml, reduce:
demo:
  max_genes: 4000
  target_cells_per_perturbation: 50
Download failed?
# Manual download
wget https://zenodo.org/records/7041849/files/ReplogleWeissman2022_K562_essential.h5ad
mv ReplogleWeissman2022_K562_essential.h5ad data_local/demo/replogle_subset.h5ad
Need more help? See TROUBLESHOOTING.md
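
If the pipeline still cannot read the file after a manual download, a quick sanity check is to confirm that it is an HDF5 file at all, since a failed download sometimes saves an HTML error page under the expected name. The sketch below is a hypothetical helper and not part of the repository. It assumes the `.h5ad` file has no HDF5 user block, so the standard 8-byte HDF5 signature sits at the very start of the file. It does not detect a truncated download.

```python
from pathlib import Path

# Standard HDF5 format signature, expected at byte 0 when no user block is present.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file exists and starts with the HDF5 signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(8) == HDF5_SIGNATURE


if __name__ == "__main__":
    # Path taken from the manual download step above.
    target = "data_local/demo/replogle_subset.h5ad"
    print(target, "looks like HDF5:", looks_like_hdf5(target))
```

A `False` result suggests re-running the download before retrying the pipeline.
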