BDB-Genomics ChIP-seq Pipeline

A scalable, reproducible, and modular Snakemake pipeline for end-to-end processing of paired-end ChIP-seq data. Starting from raw FASTQ files, the pipeline performs quality control, alignment, duplicate removal, peak calling, coverage generation, control-aware signal normalization, and downstream analysis.

Requirements

Software	Version	Purpose
Snakemake	≥ 8.0	Workflow manager
Singularity / Apptainer	≥ 3.8	Container runtime
Conda / Mamba	any	Environment manager
Python	≥ 3.10	Config validation script

Installation

# Create and activate a Snakemake environment
conda create -n snakemake -c conda-forge -c bioconda snakemake singularity graphviz -y
conda activate snakemake

# Clone the pipeline
git clone https://github.com/<your-org>/chipseq_pipeline.git
cd chipseq_pipeline

Quick Start

1. Prepare reference files

Place the following in data/reference/:

File	Description
`genome.fa`	Reference genome FASTA
`genome.chrom.sizes`	Chromosome sizes (from `samtools faidx` + `cut -f1,2`)
`ENCODE_blacklist.bed`	ENCODE blacklist (e.g., hg38)
`annotation.gtf`	Gene annotation GTF (Ensembl or GENCODE)
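The chromosome sizes file is simply the first two columns (sequence name and length) of the FASTA index written by `samtools faidx`. As a rough Python equivalent of `cut -f1,2 genome.fa.fai`, the sketch below assumes the index already exists next to the FASTA; the function name `fai_to_chrom_sizes` is hypothetical and not part of the pipeline.

```python
from pathlib import Path


def fai_to_chrom_sizes(fai_path, out_path):
    """Copy the name and length columns of a .fai index to out_path.

    Returns the number of sequences written.
    """
    n = 0
    with open(fai_path) as fai, open(out_path, "w") as out:
        for line in fai:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            out.write(f"{fields[0]}\t{fields[1]}\n")
            n += 1
    return n


if __name__ == "__main__":
    # Assumed location, following the data/reference/ layout described above.
    fai = Path("data/reference/genome.fa.fai")
    if fai.exists():
        count = fai_to_chrom_sizes(fai, "data/reference/genome.chrom.sizes")
        print(f"wrote {count} sequences")
```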

Build the Bowtie2 index:

mkdir -p data/02_alignment/bowtie2/index
bowtie2-build data/reference/genome.fa data/02_alignment/bowtie2/index/genome
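Before launching the pipeline it can be worth confirming that the index build finished. The sketch below checks for the six files `bowtie2-build` writes for a given prefix (`.bt2`, or `.bt2l` for large genomes). The helper name `missing_index_files` is an illustrative assumption, not something the pipeline provides.

```python
from pathlib import Path

# Suffixes written by bowtie2-build for a small index; large indexes append "l".
SMALL_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_index_files(prefix):
    """Return the index files missing for the given Bowtie2 index prefix."""
    small = [Path(prefix + s) for s in SMALL_SUFFIXES]
    large = [Path(prefix + s + "l") for s in SMALL_SUFFIXES]
    missing_small = [p for p in small if not p.exists()]
    missing_large = [p for p in large if not p.exists()]
    # The index is complete if either the small or the large set is fully present.
    if not missing_small or not missing_large:
        return []
    # Otherwise report against whichever format is closer to complete.
    return min(missing_small, missing_large, key=len)


if __name__ == "__main__":
    for path in missing_index_files("data/02_alignment/bowtie2/index/genome"):
        print(f"missing: {path}")
```

An empty result means a complete small or large index was found at that prefix, which is the value expected in `global.bowtie_index`.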

2. Define your samples

Create data/fastp/samples.tsv (tab-separated):

sample	condition	replicate	fastq_r1	fastq_r2	control
Sample1	Control	1	/path/to/Sample1_R1.fastq.gz	/path/to/Sample1_R2.fastq.gz	NONE
Sample2	Control	2	/path/to/Sample2_R1.fastq.gz	/path/to/Sample2_R2.fastq.gz	NONE
Sample3	Treatment	1	/path/to/Sample3_R1.fastq.gz	/path/to/Sample3_R2.fastq.gz	Sample1
Sample4	Treatment	2	/path/to/Sample4_R1.fastq.gz	/path/to/Sample4_R2.fastq.gz	Sample2

Note: Use absolute paths for FASTQ files.

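Set the control column to NONE for rows that are themselves controls (Sample1 and Sample2 above). For ChIP rows, set it to the sample name of the matched control.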
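The sample sheet rules above (six columns, absolute FASTQ paths, and a control column that is either NONE or the name of another row) are easy to check before a run. The following is a minimal sketch of such a check; `validate_samples` is a hypothetical helper and is not the pipeline's own config validation script.

```python
import csv
import io
import os

REQUIRED_COLUMNS = ["sample", "condition", "replicate", "fastq_r1", "fastq_r2", "control"]


def validate_samples(handle):
    """Return a list of problems found in a tab-separated sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    rows = list(reader)
    names = [r["sample"] for r in rows]
    problems = []
    for name in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"duplicate sample name: {name}")
    for r in rows:
        for col in ("fastq_r1", "fastq_r2"):
            # Short rows yield None for missing fields, so fall back to "".
            if not os.path.isabs(r[col] or ""):
                problems.append(f"{r['sample']}: {col} is not an absolute path")
        ctrl = r["control"] or ""
        if ctrl != "NONE" and ctrl not in names:
            problems.append(f"{r['sample']}: control '{ctrl}' is not a sample")
        if ctrl == r["sample"]:
            problems.append(f"{r['sample']}: sample is listed as its own control")
    return problems


if __name__ == "__main__":
    sheet = io.StringIO(
        "sample\tcondition\treplicate\tfastq_r1\tfastq_r2\tcontrol\n"
        "Sample1\tControl\t1\t/path/to/Sample1_R1.fastq.gz\t/path/to/Sample1_R2.fastq.gz\tNONE\n"
        "Sample3\tTreatment\t1\t/path/to/Sample3_R1.fastq.gz\t/path/to/Sample3_R2.fastq.gz\tSample1\n"
    )
    for problem in validate_samples(sheet):
        print(problem)
```

To check a real file, pass an open handle instead, for example `validate_samples(open("data/fastp/samples.tsv", newline=""))`.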

3. Configure parameters

Edit config.yaml — at minimum, verify:

global.bowtie_index — path to your Bowtie2 index prefix
global.genome_fa — path to reference FASTA
global.genome_chrom_sizes — chrom sizes file
mito_ChIP_calculate.params.mito_chr — "MT" (Ensembl) or "chrM" (UCSC)
macs_peakcall.params.genome_size — "hs" (human), "mm" (mouse), etc.
qc_gate.params.* — FRiP, NSC, RSC, mapping-rate, and duplicate-rate thresholds
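A quick way to confirm that the settings listed above are present is to walk the loaded config with dotted keys. This sketch assumes the dotted names map onto nested YAML mappings (for example `global:` containing `bowtie_index:`), which is an assumption about the layout of config.yaml; `lookup` and `missing_keys` are hypothetical helpers.

```python
REQUIRED_KEYS = [
    "global.bowtie_index",
    "global.genome_fa",
    "global.genome_chrom_sizes",
    "mito_ChIP_calculate.params.mito_chr",
    "macs_peakcall.params.genome_size",
]


def lookup(config, dotted_key):
    """Follow a dotted key through nested dicts; return None if any level is absent."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_keys(config, keys=REQUIRED_KEYS):
    """Return the dotted keys that are absent or empty in the config."""
    return [k for k in keys if lookup(config, k) in (None, "")]


if __name__ == "__main__":
    # Example dict standing in for a parsed config.yaml (values are placeholders).
    example = {
        "global": {"bowtie_index": "data/02_alignment/bowtie2/index/genome"},
        "mito_ChIP_calculate": {"params": {"mito_chr": "MT"}},
    }
    for key in missing_keys(example):
        print(f"not set: {key}")
```

In practice the dict would come from parsing config.yaml, for instance with `yaml.safe_load` from PyYAML.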

4. Run the pipeline

# Dry run (check DAG without executing)
snakemake --dry-run --cores 8

# Run locally with Singularity
snakemake --use-singularity --cores 8

# Run locally with Conda
snakemake --use-conda --cores 8

# Run on SLURM cluster
snakemake --use-singularity --profile slurm --jobs 50

Output Structure

results/
├── fastp/                      # Trimmed FASTQs + fastp reports
├── fastqc/                     # FastQC HTML + ZIP reports
├── bowtie2/                    # Raw aligned BAMs
├── samtools_sort/              # Coordinate-sorted BAMs
├── mito_ChIP/                  # Mitochondrial read statistics
├── remove_mito_reads/          # MT-filtered BAMs
├── samtools_fixmate/           # Fixmate BAMs
├── samtools_markdup/           # Deduplicated BAMs
├── samtools_index/             # BAM indices
├── samtools_view/              # MAPQ/flag filtered BAMs
├── samtools_stats/             # Alignment statistics
├── fragment_size_analysis/     # Fragment size distributions & plots
├── picard/
│   ├── CollectAlignmentSummaryMetrics/
│   └── CollectInsertSizeMetrics/
├── bedtools_genomecov/         # Raw BedGraphs
├── sorted_bedgraph_file/       # Sorted BedGraphs
├── bigwig/                     # Raw BigWig tracks
├── normalized_coverage/        # CPM-normalized BigWigs
├── bamCompare/                 # ChIP vs control log2 signal tracks
├── correlation_analysis/       # Sample correlation matrix & heatmap
├── macs2_peakcall/             # narrowPeak files
├── filtered_peaks/             # Blacklist-filtered peaks
├── heatmap/                    # TSS heatmaps (matrix + PDF)
├── frip_calculation/           # FRiP scores per sample
├── qc_gate/                    # QC pass/fail markers
├── peak_annotation/            # ChIPseeker annotation tables
├── motif_analysis/             # HOMER motif discovery output
├── phantompeakqualtools/       # NSC/RSC metrics
├── preseq/                     # Library complexity curves
├── qualimap/                   # BAM QC reports
└── multiqc/                    # Aggregated MultiQC report ← start here

Key Parameters

Parameter	Default	Description
`fastp.params.trim_front1/2`	5	Bases trimmed from 5' end (R1/R2)
`fastp.params.length_required`	30	Minimum read length after trimming
`bowtie2.sensitive`	`--very-sensitive`	Alignment sensitivity preset
`mito_ChIP_calculate.params.mito_chr`	`MT`	Mitochondrial chromosome name
`samtools_view.params.MAPQ`	30	Minimum mapping quality
`samtools_view.params.flags`	3844	SAM flags to exclude
`macs_peakcall.params.genome_size`	`hs`	Effective genome size
`macs_peakcall.params.qvalue`	0.01	MACS2 peak calling q-value threshold
`macs_peakcall.params.format`	`BAMPE`	Paired-end BAM input format
`qc_gate.params.min_frip`	0.05	Minimum FRiP threshold
`qc_gate.params.min_nsc`	1.05	Minimum NSC threshold
`qc_gate.params.min_rsc`	0.8	Minimum RSC threshold
`qc_gate.params.min_mapping_rate`	90.0	Minimum mapping rate (%)
`qc_gate.params.max_duplicate_rate`	20.0	Maximum duplicate rate (%)
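The default `samtools_view.params.flags` value of 3844 is a bitmask: 4 + 256 + 512 + 1024 + 2048. The sketch below decodes any flag value into the bit names defined in the SAM specification, which is handy when adjusting the exclusion filter. `decode_flags` is a hypothetical helper for illustration only.

```python
# Flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flags(value):
    """Return the names of all SAM flag bits set in value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


if __name__ == "__main__":
    for name in decode_flags(3844):
        print(name)
```

With the default, reads that are unmapped, secondary, QC-failed, duplicate, or supplementary are excluded.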

Software Versions (Singularity Containers)

Tool	Version	Container
fastp	1.1.0	`fastp:1.1.0--heae3180_0`
FastQC	—	`fastqc`
Bowtie2	2.5.4	`bowtie2:2.5.4--he96a11b_5`
samtools	—	`samtools`
Picard	—	`picard`
bedtools	2.31.1	`bedtools:2.31.1--h13024bc_3`
MACS2	2.2.9.1	`macs2:2.2.9.1--py39hbcbf7aa_2`
deepTools	—	`deeptools`
HOMER	4.11	`homer:4.11--pl5262h4ac6f70_9`
ChIPseeker	1.46.1	`bioconductor-chipseeker:1.46.1--r45hdfd78af_0`
PhantomPeakQualTools	1.2.2	`phantompeakqualtools:1.2.2--0`
MultiQC	—	`multiqc`

All containers are pulled from the Galaxy Project Singularity depot.

Authors

Himanshu Bhandary (email: 2032ushimanshu@gmail.com)

License

MIT License. See LICENSE for full terms.

Citation

Bhandary H. et al. (2026). Modular ChIP-seq Pipeline [Software]. GitHub. https://github.com/<your-org>/chipseq_pipeline

(Update with journal citation once published.)
