Effelsberg Large-scale Data Exploration with Nextflow for Robust Identification of New Globular cluster pulsars.
A GPU-accelerated Nextflow pipeline for pulsar candidate detection featuring RFI mitigation, periodicity searches with peasoup, candidate folding with PulsarX, and machine learning classification.
- Features
- Pipeline Overview
- Requirements
- Installation
- Quick Start
- Input File Formats
- Available Workflows
- Configuration
- Output Structure
- Advanced Usage
- Troubleshooting
- Contributing
- License
- GPU-Accelerated Search: Fast periodicity searches using peasoup on NVIDIA GPUs
- RFI Mitigation: Automated RFI detection and filtering with spectral kurtosis
- Multi-Beam Support: Process multiple beams in parallel
- Coherent Dedispersion: Support for DADA baseband data with digifits conversion
- Filterbank Stacking: Stack multiple beams by coherent DM for improved sensitivity
- Segmented Searches: Search full observation and sub-segments for accelerated pulsars
- ML Classification: PICS-based candidate scoring
- Alpha-Beta-Gamma Scoring: Additional candidate ranking metrics
- Resume Support: Automatic caching and resume capability via Nextflow
- Cumulative Runtime Tracking: Track total processing time across resumed runs
- Email Notifications: Optional notifications on completion or failure
- Input Validation: Pre-flight checks for parameters and input files
┌─────────────────────────────────────────────────────────────────────────────┐
│ ELDEN-RING Pipeline │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ DADA Files ──► digifits ──┐ │
│ │ │
│ FITS/Filterbanks ─────────┼──► RFI Filter ──► filtool ──► Segmentation │
│ │ │
│ └──────────────────────────────────────────────►│
│ │
│ Segmentation ──► birdies ──► peasoup (GPU) ──► XML Parse ──► PulsarX │
│ │
│ PulsarX ──► Merge Folds ──► PICS Classifier ──► Alpha-Beta-Gamma │
│ │
│ Final Output: CandyJar tarball with ranked candidates │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
- Nextflow >= 21.10.0
- Singularity >= 3.0 (or Docker)
- NVIDIA GPU with CUDA support (for peasoup)
The pipeline uses containerized tools. Required images:
| Tool | Purpose |
|---|---|
| pulsarx_image | Candidate folding (PulsarX) |
| peasoup_image | GPU periodicity search |
| presto_image | Filterbank utilities (readfile) |
| rfi_mitigation_image | RFI analysis and filtering |
| pics_classifier_image | ML candidate classification |
| edd_pulsar_image | DADA to FITS conversion (digifits) |
git clone https://github.com/erc-compact/elden-ring.git
cd elden-ring
nextflow pull erc-compact/elden-ring
nextflow run elden.nf -entry setup_basedir --basedir /path/to/my_project
This creates:
/path/to/my_project/
├── params.config # Main configuration (edit this)
├── inputfile.txt # Input data CSV (edit this)
├── generate_inputfile.sh # Helper script for input generation
├── meta/ # Pipeline metadata
└── shared_cache/ # Reusable cached files
For filterbank/FITS files:
cd /path/to/my_project
bash generate_inputfile.sh \
--cluster NGC6544 \
--ra "18:07:20.5" \
--dec "-24:59:51" \
--utc "2024-01-15T10:00:00" \
--cdm "60.0 120.0" \
/path/to/data/*.fil
For DADA baseband directories:
bash generate_inputfile.sh \
--dada \
--cluster 2MASS-GC02 \
--ra "18:09:36.51" \
--dec "+20:46:43.99" \
--utc "2025-12-06T13:08:08" \
--cdm "156.0 428.0 700.0" \
/path/to/baseband3 /path/to/baseband4 /path/to/baseband5
vim params.config
Key parameters to review:
- basedir: Output directory (auto-set by setup)
- runID: Unique identifier for this search
- telescope: Your telescope (effelsberg, meerkat, etc.)
- ddplan.*: DM search range
- peasoup.*: Search parameters (acceleration, segments, SNR threshold)
nextflow run elden.nf \
-entry full \
-profile hercules \
-c params.config \
--runID my_search_v1 \
-resume
CSV format for filterbank/FITS files:
pointing,cluster,beam_name,beam_id,utc_start,ra,dec,fits_files,cdm
0,NGC6544,cfbf00001,1,2024-01-15T10:00:00,18:07:20.5,-24:59:51,/path/to/beam1.fil,60.0
0,NGC6544,cfbf00002,2,2024-01-15T10:00:00,18:07:20.5,-24:59:51,/path/to/beam2.fil,60.0

| Column | Description |
|---|---|
| pointing | Pointing index (integer) |
| cluster | Target name / cluster identifier |
| beam_name | Beam identifier (e.g., cfbf00001) |
| beam_id | Numeric beam ID |
| utc_start | Observation start time (ISO format) |
| ra | Right ascension (HH:MM:SS.ss) |
| dec | Declination (DD:MM:SS.ss) |
| fits_files | Full path to filterbank/FITS file |
| cdm | Coherent dedispersion DM |
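The column layout above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the function name `check_input_csv` and the specific checks (integer `pointing` and `beam_id`, numeric `cdm`) are assumptions drawn from the column descriptions, and the pipeline's own `validate_inputs` workflow may apply different rules.

```python
import csv
import io

# Column names copied from the filterbank/FITS CSV header shown above.
EXPECTED_COLUMNS = [
    "pointing", "cluster", "beam_name", "beam_id",
    "utc_start", "ra", "dec", "fits_files", "cdm",
]


def check_input_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {reader.fieldnames}")
        return problems
    for lineno, row in enumerate(reader, start=2):
        # DictReader stores extra fields under the key None and fills
        # missing fields with the value None.
        if None in row or None in row.values():
            problems.append(f"line {lineno}: wrong number of fields")
            continue
        if not row["pointing"].isdigit():
            problems.append(f"line {lineno}: pointing is not an integer")
        if not row["beam_id"].isdigit():
            problems.append(f"line {lineno}: beam_id is not an integer")
        try:
            float(row["cdm"])
        except ValueError:
            problems.append(f"line {lineno}: cdm is not a number")
    return problems


SAMPLE = (
    "pointing,cluster,beam_name,beam_id,utc_start,ra,dec,fits_files,cdm\n"
    "0,NGC6544,cfbf00001,1,2024-01-15T10:00:00,18:07:20.5,-24:59:51,/path/to/beam1.fil,60.0\n"
    "0,NGC6544,cfbf00002,2,2024-01-15T10:00:00,18:07:20.5,-24:59:51,/path/to/beam2.fil,60.0\n"
)

if __name__ == "__main__":
    for problem in check_input_csv(SAMPLE):
        print(problem)
```

An empty result means the file matches the documented layout; it does not confirm that the listed data files exist.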
CSV format for DADA baseband directories:
pointing,dada_files,cluster,beam_name,beam_id,utc_start,ra,dec,cdm_list
0,/path/to/baseband3/*dada,2MASS-GC02,cfbf00003,3,2025-12-06T13:08:08,18:09:36.51,+20:46:43.99,156.0 428.0 700.0
0,/path/to/baseband4/*dada,2MASS-GC02,cfbf00004,4,2025-12-06T13:08:08,18:09:36.51,+20:46:43.99,156.0 428.0 700.0
Note: cdm_list contains space-separated coherent DM values. The pipeline will process each CDM independently.
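To illustrate what "each CDM independently" implies for the amount of work, the sketch below expands every DADA row into one record per coherent DM. The function name `expand_cdm_rows` and the output key `cdm` are hypothetical; the pipeline performs this expansion internally in its own way.

```python
import csv
import io


def expand_cdm_rows(text):
    """Return one dict per (input row, CDM) pair from DADA-format CSV text."""
    expanded = []
    for row in csv.DictReader(io.StringIO(text)):
        # cdm_list holds space-separated coherent DM values.
        for cdm in row["cdm_list"].split():
            record = {k: v for k, v in row.items() if k != "cdm_list"}
            record["cdm"] = float(cdm)
            expanded.append(record)
    return expanded


DADA_SAMPLE = (
    "pointing,dada_files,cluster,beam_name,beam_id,utc_start,ra,dec,cdm_list\n"
    "0,/path/to/baseband3/*dada,2MASS-GC02,cfbf00003,3,2025-12-06T13:08:08,"
    "18:09:36.51,+20:46:43.99,156.0 428.0 700.0\n"
    "0,/path/to/baseband4/*dada,2MASS-GC02,cfbf00004,4,2025-12-06T13:08:08,"
    "18:09:36.51,+20:46:43.99,156.0 428.0 700.0\n"
)

if __name__ == "__main__":
    for record in expand_cdm_rows(DADA_SAMPLE):
        print(record["beam_name"], record["cdm"])
```

With two beams and three coherent DMs each, the sample yields six beam/CDM combinations.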
Select a workflow with the -entry flag:
| Workflow | Description |
|---|---|
| full | Complete pipeline: intake → RFI → clean → search → fold → classify |
| run_search_fold | Search & fold on pre-cleaned filterbanks |
| run_rfi_clean | RFI cleaning only (intake → filter → clean) |
| generate_rfi_filter | Generate RFI diagnostic plots only |

| Workflow | Description |
|---|---|
| run_dada_search | Full pipeline starting from DADA baseband files |
| run_digifits | Convert DADA to FITS/filterbank only |
| run_dada_clean_stack | DADA → FITS → clean → stack (no search) |

| Workflow | Description |
|---|---|
| fold_par | Fold data using a known pulsar ephemeris (.par file) |
| candypolice | Re-fold candidates from an existing CandyJar CSV |

| Workflow | Description |
|---|---|
| help | Display detailed usage information |
| setup_basedir | Initialize a new project directory |
| validate_inputs | Validate input files and parameters |
| cleanup_cache | Find orphaned files in shared cache |
// Required
params.basedir = "/path/to/project"
params.runID = "search_v1"
params.files_list = "inputfile.txt"
params.telescope = "effelsberg"
// DM Search Range
params.ddplan.dm_start = -10 // Relative to coherent DM
params.ddplan.dm_end = 10
params.ddplan.dm_step = 0.1
// Peasoup Search
params.peasoup.segments = [1, 2, 4] // Full, half, quarter segments
params.peasoup.acc_start = -50 // Acceleration range (m/s²)
params.peasoup.acc_end = 50
params.peasoup.min_snr = 8.0
// Processing Options
params.filtool.run_filtool = true
params.generateRfiFilter.run_rfi_filter = true
params.stack_by_cdm = false
params.split_fil = false
// Notifications (optional)
params.notification.enabled = true
params.notification.email = "user@example.com"
params.notification.on_complete = true
params.notification.on_fail = true
Select a profile with -profile:
| Profile | Description |
|---|---|
| local | Local execution (testing) |
| hercules | SLURM cluster with GPU nodes |
| edgar | Edgar cluster configuration |
| contra | Contra cluster configuration |
| condor | HTCondor submission |
Create custom profiles in conf/profiles/.
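As a rough guide to how the DM range and segment settings in the configuration above translate into work, the sketch below lists the absolute DM trials for one coherent DM and the sample ranges for each segment count. It rests on assumptions not stated in this document: that dm_start and dm_end are offsets added to the coherent DM with both endpoints included, and that segments split the observation into contiguous, near-equal parts. The function names are hypothetical.

```python
def dm_trials(cdm, dm_start=-10.0, dm_end=10.0, dm_step=0.1):
    """Absolute DM trials, assuming dm_start/dm_end are offsets from cdm."""
    n_steps = int(round((dm_end - dm_start) / dm_step))
    # Rounding suppresses floating-point noise such as 50.300000000000004.
    return [round(cdm + dm_start + i * dm_step, 6) for i in range(n_steps + 1)]


def segment_bounds(n_samples, n_segments):
    """Split n_samples into n_segments contiguous [start, end) ranges."""
    return [
        (i * n_samples // n_segments, (i + 1) * n_samples // n_segments)
        for i in range(n_segments)
    ]


if __name__ == "__main__":
    trials = dm_trials(60.0)
    print(len(trials), trials[0], trials[-1])
    # segments = [1, 2, 4] corresponds to full, half and quarter lengths.
    for n in (1, 2, 4):
        print(n, segment_bounds(1000, n))
```

Under these assumptions the search cost grows with the number of DM trials multiplied by the total number of segments, which is why reducing either setting is a common first step when runs are too slow or run out of memory.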
basedir/
├── shared_cache/                      # Reusable cached files
│   └── <cluster>/
│       ├── FITS/                      # Converted FITS files (from DADA)
│       └── <beam_name>/
│           ├── RFIFILTER/             # RFI diagnostic plots
│           └── CLEANEDFIL/            # Cleaned filterbanks
│
├── <runID>/                           # Run-specific outputs
│   ├── <beam_name>/
│   │   └── segment_<N>/
│   │       └── <seg_id>/
│   │           ├── BIRDIES/           # Birdie detection files
│   │           ├── SEARCH/            # Peasoup XML results
│   │           ├── PARSEXML/          # Parsed candidates
│   │           │   └── XML/           # Filtered XML files
│   │           ├── FOLDING/           # PulsarX outputs
│   │           │   ├── PNG/           # Diagnostic plots
│   │           │   ├── AR/            # Archive files
│   │           │   ├── CANDS/         # .cands files
│   │           │   ├── CSV/           # Merged CSVs
│   │           │   └── PROVENANCE/    # Tracking files
│   │           ├── ABG/               # Alpha-beta-gamma scores
│   │           ├── ZERODM/            # Zero-DM plots
│   │           └── CLASSIFICATION/    # PICS scores
│   │
│   ├── TARBALL_CSV/                   # CSV files for tarball
│   ├── CANDIDATE_TARBALLS/            # Final candidate packages
│   ├── DMFILES/                       # DM search files
│   └── pipeline_summary_*.txt         # Run summary
│
└── .cumulative_runtime_*.txt          # Runtime tracking
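When post-processing results, it can help to build these paths programmatically rather than by hand. The helper below is a hypothetical sketch that mirrors the per-segment layout shown above; it assumes `<N>` is the segment count and `<seg_id>` the index of the segment within that split, which this document does not state explicitly.

```python
from pathlib import PurePosixPath

# Per-segment stage directories listed in the output tree above.
STAGE_DIRS = {
    "BIRDIES", "SEARCH", "PARSEXML", "FOLDING",
    "ABG", "ZERODM", "CLASSIFICATION",
}


def stage_dir(basedir, run_id, beam_name, n_segments, seg_id, stage):
    """Build the path of one stage directory for one beam and segment."""
    if stage not in STAGE_DIRS:
        raise ValueError(f"unknown stage directory: {stage}")
    return (
        PurePosixPath(basedir)
        / run_id
        / beam_name
        / f"segment_{n_segments}"  # assumed meaning of segment_<N>
        / str(seg_id)              # assumed meaning of <seg_id>
        / stage
    )


if __name__ == "__main__":
    print(stage_dir("/path/to/project", "search_v1", "cfbf00001", 2, 0, "FOLDING"))
```

`PurePosixPath` only manipulates the path string, so the helper can run on a machine that does not have the output tree mounted.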
nextflow run elden.nf -entry full -profile hercules -c params.config -resume
nextflow run elden.nf -entry validate_inputs -c params.config
# Dry run (shows what would be deleted)
nextflow run elden.nf -entry cleanup_cache --basedir /path/to/project
# Actually delete orphaned files
bash scripts/cleanup_shared_cache.sh /path/to/project false
nextflow run elden.nf -entry fold_par \
-c params.config \
--parfold.parfile_path /path/to/pulsar.par
nextflow run elden.nf -entry candypolice \
-c params.config \
--candypolice.input_csv /path/to/candyjar.csv
Enable in params.config:
params.copy_from_tape.run_copy = true
params.copy_from_tape.remoteUser = "username"
params.copy_from_tape.remoteHost = "remote.cluster.edu"
# View recent log
cat .nextflow.log
# View execution history
nextflow log
# View specific run
nextflow log <run_name> -f name,status,exit,duration
GPU not detected
- Ensure CUDA drivers are installed
- Check Singularity GPU bindings: singularity exec --nv
Out of memory
- Reduce params.peasoup.segments to fewer segments
- Adjust SLURM memory requests in profile
Missing input files
- Run validate_inputs workflow to check paths
- Verify CSV file format matches expected columns
Cache corruption
- Delete work/ directory and re-run with -resume
- Clean shared_cache if needed
nextflow run elden.nf -entry help
- Fork the repository
- Create a feature branch
- Submit a pull request
This project is part of the ERC COMPACT project.
- Open an issue: https://github.com/erc-compact/elden-ring/issues
- Email: fkareem[at]mpifr-bonn.mpg.de

