XR-seq Analysis Pipeline (C. elegans)

This repository contains a modular pipeline for processing and analyzing XR-seq (excision repair sequencing) data from raw .fastq.gz files through to simulation-normalized metaprofiles and quality control visualizations.

Repository Structure

xrseq_pipeline/
├── XR_Seq.sh                   # Main XR-seq processing pipeline (adapter trim → BAM/BED → filtering)
├── fordinucleotide.sh          # Script to generate fasta and count dinucleotide composition
├── bw.sh                       # Script to generate strand-separated simulation-normalized bigWig files
├── fa2kmerAbundanceTable.py    # k-mer abundance calculator
├── fasta.py                    # FASTA parsing utilities
├── sequence.py                 # Sequence filtering for dipyrimidine reads
├── exact_dam_site.py           # Locating the exact damage site in 2bp resolution 
├── plot_dinucleotide.R         # Dinucleotide positional enrichment barplot script
├── plot_monomer_rld.R          # Monomer frequency and read-length distribution plot

Requirements

Modules or tools:
- cutadapt, fastx_toolkit, bowtie2, samtools, bedtools, ucsctools
- Python ≥ 3.6 with BioPython
- R with ggplot2, wesanderson, ggthemes, dplyr, tidyr

Pipeline Overview

1. Core Processing (`XR_Seq.sh`)

Input: .fastq.gz files
Steps:
- Adapter trimming (cutadapt)
- Read collapsing (fastx_collapser)
- Genome alignment (bowtie2)
- Conversion to BED format (samtools → bedtools)
- Length filtering to retain reads [21–28 nt]
- FASTA extraction from BED (bedtools)
- Read count and length distribution generation

2. Dinucleotide Enrichment Analysis

Run:

bash fordinucleotide.sh

This prepares filtered FASTA files and performs dinucleotide counting for TT/TC/CT/CC at each position across reads.

Plot using:

Rscript plot_dinucleotide.R

3. Monomer Profile & Length Distribution

Run monomer.py to compute position-specific nucleotide ratios across filtered reads. Then visualize with:

Rscript plot_monomer_rld.R sample_name

This generates:

[sample]_monomer_graph.png
[sample]_rld_graph.png

4. Strand-Specific bigWig Files (`bw.sh`)

Generates plus- and minus-strand bigWig files normalized to RPM:

Removes mitochondrial DNA
Uses bedtools genomecov and bedGraphToBigWig

5. Locating the exact damage site in 2bp resolution

Run:

python3 exact_dam_site.py \
  --input merged_bed \
  --output merged_bed_exact_dam_site

Used only for nucleosome occupancy and dyad-centered analyses (Figure 5g,h and 6a,b)
Both input and output should be directories.
Inside merged_bed directory, stranded bed files and their simulations should be stored (see Notes below).

Output Files

*_filtered.bed: final XR-seq reads, 21–28 nt
*_readCount.txt: total filtered reads
*_plus.bw, *_minus.bw: RPM-normalized bigWigs
*_monomer_graph.png, *_rld_graph.png: QC plots
*_get2ker.csv: dinucleotide abundance

Notes

Genome index used: WBcel235 (ce11)
Adapted for use on HPC environments with SLURM
filtered.bed for two biological repeats can be merged and deposited in a new folder "merged_bed"
for xrseq around histone markers: stranded xrseq merged bed files generated by strandedmergedbedfiles.sh
simulations applied to merged.bed

Author

Cansu Kose, 2025 – UNC Chapel Hill | Sancar Lab Cem Azgari, 2025 – Sabancı University | Adebali Lab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

XR-seq Analysis Pipeline (C. elegans)

Repository Structure

Requirements

Pipeline Overview

1. Core Processing (`XR_Seq.sh`)

2. Dinucleotide Enrichment Analysis

3. Monomer Profile & Length Distribution

4. Strand-Specific bigWig Files (`bw.sh`)

5. Locating the exact damage site in 2bp resolution

Output Files

Notes

Author

FilesExpand file tree

README_XRseq.md

Latest commit

History

README_XRseq.md

File metadata and controls

XR-seq Analysis Pipeline (C. elegans)

Repository Structure

Requirements

Pipeline Overview

1. Core Processing (XR_Seq.sh)

2. Dinucleotide Enrichment Analysis

3. Monomer Profile & Length Distribution

4. Strand-Specific bigWig Files (bw.sh)

5. Locating the exact damage site in 2bp resolution

Output Files

Notes

Author

1. Core Processing (`XR_Seq.sh`)

4. Strand-Specific bigWig Files (`bw.sh`)