ChIP_SP: Spatial ChIP (ChIP-SP) as a New Bioinformatics Tool to Characterize Spatial Gene Regulation
ChIP_SP is an R package implementing the core ChIP-SP algorithm for integrating ChIP-seq transcription factor binding with Hi-C chromatin interaction data to identify spatially linked regulatory regions.
The package provides the core computational engine for ChIP-SP analysis in three steps: Step 0, optional removal of sex chromosomes (chrX, chrY); Step 1, merging Hi-C loop outputs across replicates and resolutions; and Step 2, spatial linking of ChIP-seq peaks to distal Hi-C loop anchors, followed by ranking of the resulting interactions.
Downstream analyses, including gene annotation, pathway enrichment, and visualization, are intentionally excluded from the package API and are instead provided as reference R scripts in inst/scripts/. This design allows users full flexibility in annotation strategy and downstream interpretation.
Hi-C identifies chromatin loops, but it does not resolve the exact nucleotide-level contact point within each interacting bin. ChIP-SP addresses this limitation by identifying ChIP-seq peaks that overlap one anchor of a Hi-C loop, assigning the partner anchor as a spatially linked regulatory region, and treating the full interacting Hi-C anchor region as a potential transcription factor regulatory site.
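The linking idea described above can be illustrated with a small base R sketch. This is not the package's implementation: the column names (chr1, start1, end1, chr2, start2, end2), the helper names, and the toy coordinates are all assumptions made for illustration only.

```r
# Toy ChIP-seq peaks (hypothetical coordinates).
peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(1200, 50000, 700),
  end   = c(1400, 50200, 900),
  stringsAsFactors = FALSE
)

# Toy Hi-C loops: each row holds two anchors (assumed column names).
loops <- data.frame(
  chr1 = c("chr1", "chr2"), start1 = c(1000, 0),      end1 = c(2000, 1000),
  chr2 = c("chr1", "chr2"), start2 = c(90000, 40000), end2 = c(91000, 41000),
  stringsAsFactors = FALSE
)

# TRUE where interval A overlaps interval B on the same chromosome.
overlaps <- function(chrA, sA, eA, chrB, sB, eB) {
  chrA == chrB & sA <= eB & sB <= eA
}

# Test every peak against both anchors of every loop and report the
# partner (opposite) anchor as the spatially linked region.
link_peaks <- function(peaks, loops) {
  pairs <- expand.grid(p = seq_len(nrow(peaks)), l = seq_len(nrow(loops)))
  pk <- peaks[pairs$p, ]
  lp <- loops[pairs$l, ]
  hit1 <- overlaps(pk$chr, pk$start, pk$end, lp$chr1, lp$start1, lp$end1)
  hit2 <- overlaps(pk$chr, pk$start, pk$end, lp$chr2, lp$start2, lp$end2)
  partner <- function(hit, chr, start, end) {
    data.frame(peak = pairs$p[hit], loop = pairs$l[hit],
               chr = chr[hit], start = start[hit], end = end[hit],
               stringsAsFactors = FALSE)
  }
  rbind(partner(hit1, lp$chr2, lp$start2, lp$end2),
        partner(hit2, lp$chr1, lp$start1, lp$end1))
}

linked <- link_peaks(peaks, loops)
print(linked)
```

The all-pairs comparison keeps the sketch short but scales poorly; for real data, an interval-overlap library would be the more practical choice.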
Spatial interactions are ranked using both ChIP-seq peak strength (pileup) and Hi-C loop confidence (FDR). The resulting ranked regions can then be used for downstream analyses such as gene annotation with ChIPpeakAnno, UCSC Genome Browser visualization, KEGG or pathway enrichment analysis, and transcription factor enrichment analysis.
Install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("Lattesnow/ChIP_SP")
Then load the package with:
library(ChIP_SP)
The current exported package API includes removeXYChromosomes(), mergeHiCLoops(), and chipSPLink().
removeXYChromosomes() is an optional preprocessing function to remove rows on chromosome X and chromosome Y from a genomic interval table. mergeHiCLoops() merges Hi-C loop output files across replicates and/or resolutions into a unified loop table. chipSPLink() performs the core ChIP-SP spatial integration by linking ChIP-seq peaks to distal Hi-C loop anchors and generating a ranked output table.
All input files should be placed in the same working directory unless full paths are provided.
Hi-C loop files are expected to follow the naming pattern *HiC.xls. These files may include multiple biological replicates and multiple loop-calling resolutions.
ChIP-seq peak files are expected to follow the naming pattern *ChIP.xls. These files are typically generated by MACS2 or equivalent peak-calling software.
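Despite the .xls extension, MACS2 peak tables are tab-delimited text with "#" comment lines at the top. The sketch below shows one way to locate such a file by suffix and inspect it in base R before passing it to the package. The file name, directory, and values are made up, and how chipSPLink() itself parses its input is not shown here.

```r
# Write a toy MACS2-style peak table to a temporary directory
# (hypothetical file name and values).
tmp_dir <- tempfile("chipsp_demo_")
dir.create(tmp_dir)
peak_path <- file.path(tmp_dir, "sample_ChIP.xls")
writeLines(c(
  "# Comment line, as found at the top of MACS2 output",
  "chr\tstart\tend\tlength\tabs_summit\tpileup",
  "chr1\t1200\t1400\t201\t1300\t35.0",
  "chrX\t500\t800\t301\t650\t12.5"
), peak_path)

# Locate peak files by the expected suffix.
chip_files <- list.files(tmp_dir, pattern = "ChIP\\.xls$", full.names = TRUE)

# Read as tab-delimited text, skipping '#' comment lines.
peaks <- read.delim(chip_files[1], comment.char = "#",
                    stringsAsFactors = FALSE)
str(peaks)
```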
Step 0 is an optional preprocessing step for removing chromosome X and chromosome Y from genomic interval tables before downstream analysis. This can be performed using removeXYChromosomes().
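To make the effect of this step concrete, here is a minimal base R sketch of this kind of filter. The function drop_sex_chromosomes() is a hypothetical stand-in written for illustration, not the package's removeXYChromosomes(); in particular, matching both "chrX" and "X" style names is an assumption.

```r
# Hypothetical stand-in for a sex-chromosome filter (not the package code).
drop_sex_chromosomes <- function(df, chr_col = "chr") {
  stopifnot(chr_col %in% names(df))
  # Assumption: match "chrX"/"chrY" as well as bare "X"/"Y", ignoring case.
  is_sex <- grepl("^(chr)?[XY]$", df[[chr_col]], ignore.case = TRUE)
  df[!is_sex, , drop = FALSE]
}

# Toy interval table (made-up coordinates).
toy <- data.frame(
  chr   = c("chr1", "chrX", "chr2", "chrY", "chr10"),
  start = c(100, 200, 300, 400, 500),
  end   = c(150, 250, 350, 450, 550),
  stringsAsFactors = FALSE
)

filtered <- drop_sex_chromosomes(toy)
print(filtered)
```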
Example:
library(ChIP_SP)
# hic_df is any genomic interval table with a chromosome column, for example the merged loop table from Step 1.
hic_df_filtered <- removeXYChromosomes(hic_df, chr_col = "chr")
Step 1 merges Hi-C loop outputs across replicates and resolutions. The mergeHiCLoops() function reads Hi-C loop files and combines them into a single merged data frame.
Example:
library(ChIP_SP)
hic_files <- list.files(
  pattern = "HiC\\.xls$",
  full.names = TRUE
)
hic_df <- mergeHiCLoops(hic_files)
The output is a merged data.frame containing Hi-C loops across all replicates and resolutions.
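For readers who want a feel for what merging involves, the following base R sketch stacks two toy loop files and keeps one copy of each loop. It is an illustration under stated assumptions, not the logic of mergeHiCLoops(): the column names, the file names, and the policy of keeping the lowest-FDR copy of a duplicated loop are all hypothetical.

```r
# Create two toy loop files (hypothetical names, columns, and values).
tmp_dir <- tempfile("hic_demo_")
dir.create(tmp_dir)
header <- "chr1\tstart1\tend1\tchr2\tstart2\tend2\tfdr"
writeLines(c(header,
             "chr1\t1000\t2000\tchr1\t90000\t91000\t0.01",
             "chr2\t0\t1000\tchr2\t40000\t41000\t0.20"),
           file.path(tmp_dir, "rep1_5kb_HiC.xls"))
writeLines(c(header,
             "chr1\t1000\t2000\tchr1\t90000\t91000\t0.001",
             "chr3\t5000\t6000\tchr3\t70000\t71000\t0.05"),
           file.path(tmp_dir, "rep2_5kb_HiC.xls"))

files <- list.files(tmp_dir, pattern = "HiC\\.xls$", full.names = TRUE)

# Read each file and record where every loop came from.
tables <- lapply(files, function(f) {
  df <- read.delim(f, stringsAsFactors = FALSE)
  df$source_file <- basename(f)
  df
})
merged <- do.call(rbind, tables)

# One possible policy (an assumption): when the same loop appears in
# several files, keep the copy with the lowest FDR.
merged <- merged[order(merged$fdr), ]
key <- paste(merged$chr1, merged$start1, merged$end1,
             merged$chr2, merged$start2, merged$end2)
merged_unique <- merged[!duplicated(key), ]
print(merged_unique)
```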
Step 2 performs ChIP–Hi-C spatial integration and ranking. The chipSPLink() function takes a ChIP-seq peak file together with the merged Hi-C loop table and identifies spatially linked regulatory regions.
Example:
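# chipSPLink() is described as taking one ChIP-seq peak file, so this pattern should match exactly one file.
chip_file <- list.files(
  pattern = "ChIP\\.xls$",
  full.names = TRUE
)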
chipsp_results <- chipSPLink(
  chip_file = chip_file,
  hic_df = hic_df
)
The output is a ranked data.frame containing ChIP-SP integrated spatial regulatory regions, including genomic coordinates such as chr, start, and end, together with ChIP-seq signal (pileup), Hi-C loop confidence (FDR), normalized scores, and the final ChIP-SP ranking score.
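The exact scoring formula is not documented in this README. The sketch below shows one plausible way to combine the two signals, purely as an assumption: min-max normalize pileup and -log10(FDR), then sum them. The helper min_max(), the column names, and the toy values are hypothetical, and the package's actual normalization and weighting may differ.

```r
# Toy linked regions (made-up values).
linked <- data.frame(
  chr    = c("chr1", "chr2", "chr3"),
  start  = c(90000, 40000, 70000),
  end    = c(91000, 41000, 71000),
  pileup = c(35, 12, 80),
  fdr    = c(0.001, 0.2, 0.05),
  stringsAsFactors = FALSE
)

# Rescale a numeric vector to [0, 1]; constant vectors map to 0.
min_max <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

linked$pileup_norm <- min_max(linked$pileup)
# Lower FDR means higher confidence, so use -log10(FDR);
# the floor avoids taking the log of zero.
linked$fdr_norm <- min_max(-log10(pmax(linked$fdr, 1e-300)))

# Assumed combination rule: equal-weight sum of the two normalized scores.
linked$score <- linked$pileup_norm + linked$fdr_norm
ranked <- linked[order(linked$score, decreasing = TRUE), ]
print(ranked[, c("chr", "start", "end", "pileup", "fdr", "score")])
```

An equal-weight sum is only one choice; a product or a weighted sum would emphasize regions that are strong in both signals or favor one data type.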
A typical minimal workflow is:
library(ChIP_SP)
hic_files <- list.files(pattern = "HiC\\.xls$", full.names = TRUE)
hic_df <- mergeHiCLoops(hic_files)
hic_df <- removeXYChromosomes(hic_df, chr_col = "chr")
# Assumes exactly one file in the working directory matches the ChIP pattern.
chip_file <- list.files(pattern = "ChIP\\.xls$", full.names = TRUE)
chipsp_results <- chipSPLink(
  chip_file = chip_file,
  hic_df = hic_df
)
Downstream annotation and enrichment analyses are not part of the package API. Reference scripts are provided in inst/scripts/.
These include scripts such as Step0_ChIP_SP_Remove_XY_Chromosomes.R, Step3_ChIP_SP_Gene_annotation_ChIPpeakAnno_UCSC.R, Step4_ChIP_SP_Gene_annotation_Pathwayanalysis_UCSC.R, Step5_ChIP_SP_Gene_annotation_KEGG_ChEA_UCSC.R, and Step6_ClusterGVis_ChIP_SP_Loop.R.
These scripts can be used as templates for ChIPpeakAnno-based gene annotation, pathway enrichment analysis, KEGG and ChEA enrichment, UCSC Genome Browser preparation, and ClusterGVis-based visualization.
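As one example of UCSC Genome Browser preparation, ranked regions can be written as a BED custom track. The sketch below is independent of the reference scripts and rests on assumptions: a results table with chr, start, end, and score columns, a positive maximum score, and coordinates that are already in the convention you intend to upload (BED uses 0-based starts, and no shift is applied here).

```r
# Toy ranked regions (made-up values; column names are assumptions).
ranked <- data.frame(
  chr   = c("chr1", "chr3", "chr2"),
  start = c(90000, 70000, 40000),
  end   = c(91000, 71000, 41000),
  score = c(1.34, 1.26, 0),
  stringsAsFactors = FALSE
)

# BED columns: chrom, chromStart, chromEnd, name, score (0 to 1000).
# Integers are used so large coordinates are not written in
# scientific notation.
bed <- data.frame(
  chrom      = ranked$chr,
  chromStart = as.integer(ranked$start),
  chromEnd   = as.integer(ranked$end),
  name       = paste0("region_", seq_len(nrow(ranked))),
  score      = as.integer(round(1000 * ranked$score / max(ranked$score))),
  stringsAsFactors = FALSE
)

# Write a track line followed by tab-separated BED rows.
bed_path <- tempfile(fileext = ".bed")
writeLines('track name="ChIP_SP_demo" description="Toy ChIP-SP regions"',
           bed_path)
write.table(bed, bed_path, append = TRUE, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

cat(readLines(bed_path), sep = "\n")
```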
The package includes the core ChIP-SP spatial integration algorithm, Hi-C loop merging, optional sex chromosome removal, and reproducible, modular workflows.
The package excludes genome-specific annotation choices, pathway and transcription factor enrichment APIs, visualization-specific dependencies, and web-based downstream tools.
This modular design supports reproducibility, reviewer transparency, and flexibility across species, annotations, and downstream analysis pipelines.
The package currently focuses on the core spatial integration workflow. Annotation and interpretation are intentionally left outside the package API. Users can customize downstream analyses according to species, genome build, and study design.
If you use ChIP_SP in your research, please cite the associated manuscript describing the ChIP-SP method. Citation details will be added once available.
For questions, issues, or feature requests, please open a GitHub issue at the repository or contact tianyi.zhou@childrens.harvard.edu.