This repository hosts the data and workflows for centromere and satellite annotation in the APGp1 genome assembly.
The project characterizes the centromeric regions of the assembly and provides:
- Complete centromere coordinates
- Detailed satellite and Higher-Order Repeat (HOR) annotation tracks
- Robust pipelines for HOR identification and satellite annotation
The Annotation directory contains the final, high-quality annotation data.
- Complete centromere coordinates in APGp1: APGp1_complete_centromere
- Satellite tracks for each phased assembly: native_bed_format
- HOR identified by HORmon: HOR_HORmon
- HOR identified by HiCAT: HOR_HiCAT
- CENP-A enrichment boundaries: native_bed_format
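As a quick way to inspect the BED tracks listed above, the following minimal Python sketch reads BED records and totals the annotated span per label. It assumes the tracks follow the standard tab-separated BED layout (chromosome, 0-based start, exclusive end, optional name in the fourth column); the exact columns used in this repository are not confirmed here. The helper names `parse_bed` and `span_by_name`, and the labels `sat_A` and `sat_B`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # empty string when the track has no name column


def parse_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield BedRecord tuples from BED-formatted lines (assumed tab-separated)."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and UCSC-style header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 BED columns, got: {line!r}")
        name = fields[3] if len(fields) > 3 else ""
        yield BedRecord(fields[0], int(fields[1]), int(fields[2]), name)


def span_by_name(records: Iterable[BedRecord]) -> Dict[str, int]:
    """Sum the covered length (end - start) for each value of the name column."""
    totals: Dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec.name] += rec.end - rec.start
    return dict(totals)


# Illustrative in-memory input; the labels are made up, not taken from the tracks.
demo_lines = [
    "# hypothetical example track\n",
    "chr1\t100\t200\tsat_A\n",
    "chr1\t300\t350\tsat_A\n",
    "chr2\t0\t10\tsat_B\n",
]
demo_totals = span_by_name(parse_bed(demo_lines))
print(demo_totals)
```

To run it on a real track, pass an open file handle instead of `demo_lines`, for example `span_by_name(parse_bed(open(path)))`, where `path` points at one of the BED files in the Annotation directory.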
We provide the following two pipelines and highly recommend them for future T2T human assemblies and related satellite studies.
Directory: SatelliteAnnotationWorkflow
This workflow was developed for the comprehensive annotation of centromeric satellites (including
Directory: HORmining
HORmining is a bioinformatics pipeline designed for robust identification of Higher-Order Repeat (HOR) structures within alpha satellite DNA. It integrates two complementary computational approaches: the graph-based HORmon algorithm and the hierarchical tandem repeat mining (HTRM)-based HiCAT algorithm.
The Analysis directory contains the source code used for the final, integrated analysis of global centromere diversity, structural characterization, and evolutionary comparisons.