
ELDEN-RING

Effelsberg Large-scale Data Exploration with Nextflow for Robust Identification of New Globular cluster pulsars.


A GPU-accelerated Nextflow pipeline for pulsar candidate detection featuring RFI mitigation, periodicity searches with peasoup, candidate folding with PulsarX, and machine learning classification.


Features

  • GPU-Accelerated Search: Fast periodicity searches using peasoup on NVIDIA GPUs
  • RFI Mitigation: Automated RFI detection and filtering with spectral kurtosis
  • Multi-Beam Support: Process multiple beams in parallel
  • Coherent Dedispersion: Support for DADA baseband data with digifits conversion
  • Filterbank Stacking: Stack multiple beams by coherent DM for improved sensitivity
  • Segmented Searches: Search full observation and sub-segments for accelerated pulsars
  • ML Classification: PICS-based candidate scoring
  • Alpha-Beta-Gamma Scoring: Additional candidate ranking metrics
  • Resume Support: Automatic caching and resume capability via Nextflow
  • Cumulative Runtime Tracking: Track total processing time across resumed runs
  • Email Notifications: Optional notifications on completion or failure
  • Input Validation: Pre-flight checks for parameters and input files
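The RFI bullet above refers to spectral kurtosis (SK). The sketch below illustrates only the statistic itself: the generalised SK estimator of Nita & Gary for single, unaveraged power samples (d = 1). It is not ELDEN-RING's RFI code. The estimator variant, the block size and the 0.8 to 1.2 flagging window are assumptions chosen for the example. Powers that are already averaged over several spectra need a different prefactor.

```python
import random
from typing import List, Sequence


def spectral_kurtosis(powers: Sequence[float]) -> float:
    """Generalised SK estimator (d = 1) for M power samples of one channel.

    SK = (M + 1) / (M - 1) * (M * S2 / S1**2 - 1), where S1 = sum(P) and
    S2 = sum(P**2). Gaussian noise gives SK close to 1, steady (CW-like)
    signals pull it towards 0 and impulsive bursts push it above 1.
    """
    m = len(powers)
    if m < 2:
        raise ValueError("need at least two power samples")
    s1 = sum(powers)
    s2 = sum(p * p for p in powers)
    return (m + 1) / (m - 1) * (m * s2 / (s1 * s1) - 1)


def flag_channels(spectra: List[Sequence[float]],
                  low: float = 0.8, high: float = 1.2) -> List[int]:
    """Indices of channels whose SK lies outside [low, high] (illustrative window)."""
    return [i for i, chan in enumerate(spectra)
            if not low <= spectral_kurtosis(chan) <= high]


if __name__ == "__main__":
    rng = random.Random(42)
    # Power of complex Gaussian noise is exponentially distributed.
    noise = [rng.expovariate(1.0) for _ in range(4096)]
    steady = [5.0] * 4096                                  # constant-power carrier
    bursty = [50.0 if i % 512 == 0 else 0.1 for i in range(4096)]  # sparse spikes
    print(flag_channels([noise, steady, bursty]))
```

In this toy input, the noise channel should pass while the steady and bursty channels are flagged.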

Pipeline Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                           ELDEN-RING Pipeline                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   DADA Files ──► digifits ──┐                                               │
│                             │                                               │
│   FITS/Filterbanks ─────────┼──► RFI Filter ──► filtool ──► Segmentation   │
│                             │                                               │
│                             └──────────────────────────────────────────────►│
│                                                                             │
│   Segmentation ──► birdies ──► peasoup (GPU) ──► XML Parse ──► PulsarX     │
│                                                                             │
│   PulsarX ──► Merge Folds ──► PICS Classifier ──► Alpha-Beta-Gamma         │
│                                                                             │
│   Final Output: CandyJar tarball with ranked candidates                     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘


Requirements

Software

  • Nextflow >= 21.10.0
  • Singularity >= 3.0 (or Docker)
  • NVIDIA GPU with CUDA support (for peasoup)

Container Images

The pipeline uses containerized tools. Required images:

Image                   Purpose
pulsarx_image           Candidate folding (PulsarX)
peasoup_image           GPU periodicity search (peasoup)
presto_image            Filterbank utilities (readfile)
rfi_mitigation_image    RFI analysis and filtering
pics_classifier_image   ML candidate classification (PICS)
edd_pulsar_image        DADA to FITS conversion (digifits)

Installation

Option 1: Clone the repository

git clone https://github.com/erc-compact/elden-ring.git
cd elden-ring

Option 2: Use Nextflow's built-in pull

nextflow pull erc-compact/elden-ring

Quick Start

1. Initialize a new project

nextflow run elden.nf -entry setup_basedir --basedir /path/to/my_project

This creates:

/path/to/my_project/
├── params.config           # Main configuration (edit this)
├── inputfile.txt           # Input data CSV (edit this)
├── generate_inputfile.sh   # Helper script for input generation
├── meta/                   # Pipeline metadata
└── shared_cache/           # Reusable cached files

2. Generate your input file

For filterbank/FITS files:

cd /path/to/my_project
bash generate_inputfile.sh \
    --cluster NGC6544 \
    --ra "18:07:20.5" \
    --dec "-24:59:51" \
    --utc "2024-01-15T10:00:00" \
    --cdm "60.0 120.0" \
    /path/to/data/*.fil

For DADA baseband directories:

bash generate_inputfile.sh \
    --dada \
    --cluster 2MASS-GC02 \
    --ra "18:09:36.51" \
    --dec "+20:46:43.99" \
    --utc "2025-12-06T13:08:08" \
    --cdm "156.0 428.0 700.0" \
    /path/to/baseband3 /path/to/baseband4 /path/to/baseband5

3. Edit configuration

vim params.config

Key parameters to review:

  • basedir - Output directory (auto-set by setup)
  • runID - Unique identifier for this search
  • telescope - Your telescope (effelsberg, meerkat, etc.)
  • ddplan.* - DM search range
  • peasoup.* - Search parameters (acceleration, segments, SNR threshold)
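Segments and acceleration are linked. A constant line-of-sight acceleration a gives a signal at spin frequency f a frequency derivative fdot = a f / c. Over a segment of length T, the signal therefore drifts across z = fdot T² = a f T² / c Fourier bins. Because z grows as T², searching halves or quarters of an observation cuts the drift by 4× or 16×. The sketch below only evaluates that standard relation and is not taken from the pipeline. It uses illustrative numbers: a 500 Hz pulsar at the 50 m/s² limit used in the example configuration, in a one-hour observation.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fourier_drift_bins(acc_ms2: float, spin_freq_hz: float,
                       tseg_s: float, harmonic: int = 1) -> float:
    """Fourier bins crossed by the h-th harmonic during a segment of length tseg_s.

    A constant line-of-sight acceleration a changes the observed frequency at
    fdot = a * h * f / c; with bin width 1/T, the drift over T seconds is
    z = fdot * T**2 bins.
    """
    fdot = acc_ms2 * harmonic * spin_freq_hz / SPEED_OF_LIGHT
    return fdot * tseg_s ** 2


if __name__ == "__main__":
    tobs = 3600.0  # illustrative one-hour observation
    for nseg in (1, 2, 4):  # full, half and quarter segments
        z = fourier_drift_bins(50.0, 500.0, tobs / nseg)
        print(f"segment length {tobs / nseg:6.0f} s -> drift {z:7.1f} bins")
```

For these values, the drift falls from roughly 1080 bins for the full hour to about 270 bins for halves and 68 bins for quarters.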

4. Run the pipeline

nextflow run elden.nf \
    -entry full \
    -profile hercules \
    -c params.config \
    --runID my_search_v1 \
    -resume

Input File Formats

Standard Input (inputfile.txt)

CSV format for filterbank/FITS files:

pointing,cluster,beam_name,beam_id,utc_start,ra,dec,fits_files,cdm
0,NGC6544,cfbf00001,1,2024-01-15T10:00:00,18:07:20.5,-24:59:51,/path/to/beam1.fil,60.0
0,NGC6544,cfbf00002,2,2024-01-15T10:00:00,18:07:20.5,-24:59:51,/path/to/beam2.fil,60.0

Column       Description
pointing     Pointing index (integer)
cluster      Target name / cluster identifier
beam_name    Beam identifier (e.g., cfbf00001)
beam_id      Numeric beam ID
utc_start    Observation start time (ISO 8601, e.g. 2024-01-15T10:00:00)
ra           Right ascension (HH:MM:SS.ss)
dec          Declination (±DD:MM:SS.ss)
fits_files   Full path to the filterbank/FITS file
cdm          Coherent dedispersion DM (pc cm⁻³)
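The pipeline's validate_inputs workflow performs its own pre-flight checks. The snippet below is only a hypothetical, standalone sketch of the checks the column table implies: column order, integer IDs, ISO timestamps, sexagesimal coordinates and a numeric CDM. It assumes that the header must match the order shown above exactly. The regular expressions and the name check_input_rows are illustrative, not part of ELDEN-RING.

```python
import csv
import io
import re
from datetime import datetime
from typing import List

EXPECTED_COLUMNS = ["pointing", "cluster", "beam_name", "beam_id",
                    "utc_start", "ra", "dec", "fits_files", "cdm"]
RA_RE = re.compile(r"^\d{1,2}:\d{2}:\d{2}(\.\d+)?$")        # HH:MM:SS[.ss]
DEC_RE = re.compile(r"^[+-]?\d{1,2}:\d{2}:\d{2}(\.\d+)?$")  # [+-]DD:MM:SS[.ss]


def check_input_rows(text: str) -> List[str]:
    """Return problems found in an inputfile.txt-style CSV (empty list if none)."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    for row in reader:
        where = f"line {reader.line_num}"
        # DictReader stores missing fields as None and extra fields under the key None.
        if None in row or None in row.values():
            problems.append(f"{where}: wrong number of fields")
            continue
        for col in ("pointing", "beam_id"):
            if not row[col].isdigit():
                problems.append(f"{where}: {col}={row[col]!r} is not an integer")
        try:
            datetime.fromisoformat(row["utc_start"])
        except ValueError:
            problems.append(f"{where}: utc_start={row['utc_start']!r} is not ISO 8601")
        if not RA_RE.match(row["ra"]):
            problems.append(f"{where}: ra={row['ra']!r} is not HH:MM:SS.ss")
        if not DEC_RE.match(row["dec"]):
            problems.append(f"{where}: dec={row['dec']!r} is not ±DD:MM:SS.ss")
        try:
            float(row["cdm"])
        except ValueError:
            problems.append(f"{where}: cdm={row['cdm']!r} is not a number")
    return problems
```

Run against the two example rows above, this returns an empty list.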

DADA Input (dada_files.csv)

CSV format for DADA baseband directories:

pointing,dada_files,cluster,beam_name,beam_id,utc_start,ra,dec,cdm_list
0,/path/to/baseband3/*dada,2MASS-GC02,cfbf00003,3,2025-12-06T13:08:08,18:09:36.51,+20:46:43.99,156.0 428.0 700.0
0,/path/to/baseband4/*dada,2MASS-GC02,cfbf00004,4,2025-12-06T13:08:08,18:09:36.51,+20:46:43.99,156.0 428.0 700.0

Note: cdm_list contains space-separated coherent DM values. The pipeline will process each CDM independently.
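This example makes "each CDM independently" concrete. If every (beam, CDM) pair becomes its own unit of work, the two example rows above fan out into six jobs. The sketch assumes exactly that fan-out. expand_cdm_jobs is a hypothetical helper, not part of the pipeline's Nextflow code.

```python
import csv
import io
from typing import Iterator, Tuple


def expand_cdm_jobs(text: str) -> Iterator[Tuple[str, str, float]]:
    """Yield one (beam_name, dada_files, cdm) job per value in each row's cdm_list."""
    for row in csv.DictReader(io.StringIO(text)):
        for cdm in row["cdm_list"].split():  # values are space-separated
            yield row["beam_name"], row["dada_files"], float(cdm)
```

Because cdm_list is split on whitespace, a single-CDM row yields exactly one job.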

Available Workflows

Select a workflow with the -entry flag:

Main Processing Pipelines

Workflow              Description
full                  Complete pipeline: intake → RFI → clean → search → fold → classify
run_search_fold       Search & fold on pre-cleaned filterbanks
run_rfi_clean         RFI cleaning only (intake → filter → clean)
generate_rfi_filter   Generate RFI diagnostic plots only

DADA Processing Pipelines

Workflow               Description
run_dada_search        Full pipeline starting from DADA baseband files
run_digifits           Convert DADA to FITS/filterbank only
run_dada_clean_stack   DADA → FITS → clean → stack (no search)

Specialized Workflows

Workflow      Description
fold_par      Fold data using a known pulsar ephemeris (.par file)
candypolice   Re-fold candidates from an existing CandyJar CSV

Utility Workflows

Workflow          Description
help              Display detailed usage information
setup_basedir     Initialize a new project directory
validate_inputs   Validate input files and parameters
cleanup_cache     Find orphaned files in the shared cache

Configuration

Key Parameters

// Required
params.basedir = "/path/to/project"
params.runID = "search_v1"
params.files_list = "inputfile.txt"
params.telescope = "effelsberg"

// DM Search Range
params.ddplan.dm_start = -10    // Relative to coherent DM
params.ddplan.dm_end = 10
params.ddplan.dm_step = 0.1

// Peasoup Search
params.peasoup.segments = [1, 2, 4]   // Full, half, quarter segments
params.peasoup.acc_start = -50        // Acceleration range (m/s²)
params.peasoup.acc_end = 50
params.peasoup.min_snr = 8.0

// Processing Options
params.filtool.run_filtool = true
params.generateRfiFilter.run_rfi_filter = true
params.stack_by_cdm = false
params.split_fil = false

// Notifications (optional)
params.notification.enabled = true
params.notification.email = "user@example.com"
params.notification.on_complete = true
params.notification.on_fail = true
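The comments in this block imply two search grids: DM trials placed around each coherent DM, and the time segments given by peasoup.segments. The sketch below shows one reading of those settings. It assumes that dm_end is inclusive and that a segment count N means N equal, contiguous pieces of the observation, matching the "Full, half, quarter" comment. Neither detail is confirmed by this README, and the function names are hypothetical.

```python
from typing import List, Tuple


def dm_trials(cdm: float, dm_start: float, dm_end: float, dm_step: float) -> List[float]:
    """Absolute DM trials for a range defined relative to the coherent DM.

    Assumes dm_end is inclusive. The number of steps is computed once and
    rounded, which avoids the drift of repeatedly adding a float step.
    """
    n_steps = int(round((dm_end - dm_start) / dm_step))
    return [round(cdm + dm_start + i * dm_step, 6) for i in range(n_steps + 1)]


def segment_windows(tobs_s: float, nseg: int) -> List[Tuple[float, float]]:
    """Split an observation into nseg equal, contiguous (start, end) windows in seconds."""
    length = tobs_s / nseg
    return [(i * length, (i + 1) * length) for i in range(nseg)]


if __name__ == "__main__":
    trials = dm_trials(cdm=60.0, dm_start=-10, dm_end=10, dm_step=0.1)
    print(len(trials), trials[0], trials[-1])
    for nseg in [1, 2, 4]:  # mirrors params.peasoup.segments
        print(nseg, segment_windows(3600.0, nseg))
```

With the example values and a coherent DM of 60.0, this gives 201 trials from 50.0 to 70.0 pc cm⁻³. segments = [1, 2, 4] yields 1 + 2 + 4 = 7 windows per DM trial.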

Cluster Profiles

Select a profile with -profile:

Profile    Description
local      Local execution (testing)
hercules   SLURM cluster with GPU nodes
edgar      Edgar cluster configuration
contra     Contra cluster configuration
condor     HTCondor submission

Create custom profiles in conf/profiles/.

Output Structure

basedir/
├── shared_cache/                    # Reusable cached files
│   └── <cluster>/
│       ├── FITS/                    # Converted FITS files (from DADA)
│       └── <beam_name>/
│           ├── RFIFILTER/           # RFI diagnostic plots
│           └── CLEANEDFIL/          # Cleaned filterbanks
│
├── <runID>/                         # Run-specific outputs
│   ├── <beam_name>/
│   │   └── segment_<N>/
│   │       └── <seg_id>/
│   │           ├── BIRDIES/         # Birdie detection files
│   │           ├── SEARCH/          # Peasoup XML results
│   │           ├── PARSEXML/        # Parsed candidates
│   │           │   └── XML/         # Filtered XML files
│   │           ├── FOLDING/         # PulsarX outputs
│   │           │   ├── PNG/         # Diagnostic plots
│   │           │   ├── AR/          # Archive files
│   │           │   ├── CANDS/       # .cands files
│   │           │   ├── CSV/         # Merged CSVs
│   │           │   └── PROVENANCE/  # Tracking files
│   │           ├── ABG/             # Alpha-beta-gamma scores
│   │           ├── ZERODM/          # Zero-DM plots
│   │           └── CLASSIFICATION/  # PICS scores
│   │
│   ├── TARBALL_CSV/                 # CSV files for tarball
│   ├── CANDIDATE_TARBALLS/          # Final candidate packages
│   ├── DMFILES/                     # DM search files
│   └── pipeline_summary_*.txt       # Run summary
│
└── .cumulative_runtime_*.txt        # Runtime tracking
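When post-processing results, it can help to build these paths programmatically. A minimal sketch follows, assuming the directory names shown in the tree above. The helper names are hypothetical. The format of <seg_id> is not documented here, so the helpers treat it as an opaque string.

```python
from pathlib import Path, PurePosixPath
from typing import List, Optional


def run_stage_dir(basedir: str, run_id: str, beam_name: str, nseg: int,
                  seg_id: str, stage: str, sub: Optional[str] = None) -> PurePosixPath:
    """<basedir>/<runID>/<beam_name>/segment_<N>/<seg_id>/<STAGE>[/<SUB>]"""
    path = PurePosixPath(basedir, run_id, beam_name, f"segment_{nseg}", seg_id, stage)
    return path / sub if sub else path


def shared_cache_dir(basedir: str, cluster: str, beam_name: str, stage: str) -> PurePosixPath:
    """<basedir>/shared_cache/<cluster>/<beam_name>/<STAGE>, e.g. CLEANEDFIL or RFIFILTER."""
    return PurePosixPath(basedir, "shared_cache", cluster, beam_name, stage)


def folding_plots(basedir: str, run_id: str) -> List[Path]:
    """All PulsarX diagnostic PNGs of one run, across beams and segments."""
    return sorted(Path(basedir, run_id).glob("*/segment_*/*/FOLDING/PNG/*.png"))
```

The glob pattern in folding_plots skips run-level directories such as TARBALL_CSV, because these contain no segment_<N> subdirectories.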

Advanced Usage

Resume a Failed Run

nextflow run elden.nf -entry full -profile hercules -c params.config -resume

Validate Inputs Before Running

nextflow run elden.nf -entry validate_inputs -c params.config

Clean Up Orphaned Cache Files

# Dry run (shows what would be deleted)
nextflow run elden.nf -entry cleanup_cache --basedir /path/to/project

# Actually delete orphaned files
bash scripts/cleanup_shared_cache.sh /path/to/project false

Fold with Known Pulsar Ephemeris

nextflow run elden.nf -entry fold_par \
    -c params.config \
    --parfold.parfile_path /path/to/pulsar.par

Re-fold Candidates from CandyJar

nextflow run elden.nf -entry candypolice \
    -c params.config \
    --candypolice.input_csv /path/to/candyjar.csv

Copy Data from Remote Cluster

Enable in params.config:

params.copy_from_tape.run_copy = true
params.copy_from_tape.remoteUser = "username"
params.copy_from_tape.remoteHost = "remote.cluster.edu"

Troubleshooting

Check Nextflow Logs

# View recent log
cat .nextflow.log

# View execution history
nextflow log

# View specific run
nextflow log <run_name> -f name,status,exit,duration
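For long runs, it can be handy to filter that output. A minimal sketch follows, assuming the field order requested above and the default tab separator of nextflow log. failed_tasks is a hypothetical helper; capturing the command's output (for example with subprocess) is left to the caller, and the sample lines are made up.

```python
from typing import List, Tuple


def failed_tasks(log_text: str) -> List[Tuple[str, str]]:
    """(name, exit) pairs for tasks that did not succeed.

    Expects the output of `nextflow log <run_name> -f name,status,exit,duration`:
    one task per line, fields separated by tabs.
    """
    failed = []
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, status, exit_code = fields[0], fields[1], fields[2]
        if status in ("FAILED", "ABORTED"):
            failed.append((name, exit_code))
    return failed


if __name__ == "__main__":
    sample = "peasoup (1)\tCOMPLETED\t0\t5m 2s\npeasoup (2)\tFAILED\t137\t1h 3m\n"
    print(failed_tasks(sample))
```

An exit code of 137 means the task was killed with SIGKILL, which on batch clusters is often a sign of exceeding the memory request (see "Out of memory" below).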

Common Issues

GPU not detected

  • Ensure CUDA drivers are installed
  • Check that Singularity passes the GPU through to the container: run with --nv, e.g. singularity exec --nv <image> nvidia-smi

Out of memory

  • Reduce params.peasoup.segments to fewer segments
  • Adjust SLURM memory requests in profile

Missing input files

  • Run validate_inputs workflow to check paths
  • Verify CSV file format matches expected columns

Cache corruption

  • Delete the work/ directory and re-run with -resume; tasks whose cached outputs lived in work/ will be re-executed
  • Clean shared_cache if needed

Get Help

nextflow run elden.nf -entry help

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

License

This project is part of the ERC COMPACT project.

Contact
