cnvkit.py batch hangs on very large WGS dataset #1015
Hi @etal,

I think I'm hitting a memory bottleneck on a large sample set while generating the reference.cnn, and I'm hoping you can help me find a way past it.

I'm running cnvkit.py batch to generate a reference.cnn from 5 normal samples. Each normal BAM file is between 150 and 200 GB (around 1 TB total). I'm curious whether the batch command attempts to open and run on all the files simultaneously. I'm allocating my institution's maximum per-user memory allowance (1200 GB) to this job, but it appears to hang indefinitely (over several days), and nohup shows no CPU usage for it.

Is there a way to generate the reference.cnn from all 5 of my normal samples by running each one individually and then merging the results? (A sketch of this appears at the end of the thread.)

Are there any other parameters you would recommend, both for creating the normal panel and for running CNV calling on tumor samples, given WGS BAM files of this magnitude?

Thank you,
George

Replies: 2 comments
Yes, the batch command optimizes for wall-clock time and will open and read files concurrently in as many processes as you give it. This means the peak memory usage grows with the number of processes allocated.
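If memory is the constraint rather than wall-clock time, the lever to pull is the batch command's -p/--processes option. A minimal sketch, assuming WGS mode and placeholder file names (normal1.bam through normal5.bam, hg38.fa):

```sh
# Build the pooled reference from the five normals in WGS mode,
# capping concurrency at one worker so only one BAM is read at a
# time and the memory ceiling stays low. File names are placeholders.
cnvkit.py batch -m wgs \
    -n normal1.bam normal2.bam normal3.bam normal4.bam normal5.bam \
    -f hg38.fa \
    --output-reference reference.cnn -d results/ \
    -p 1
```

The trade-off is runtime: with -p 1 the samples are processed sequentially, so expect several times the wall-clock time of a fully parallel run.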
I'm working on some WGS improvements for the next release and I'll try to document the best practices for it.
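On the question of building the reference one sample at a time and then merging: batch is a wrapper around smaller subcommands, so the coverage and reference steps can be run separately. A hedged sketch, assuming placeholder file names and a genome-wide bin BED (bins.bed here, e.g. the target BED emitted by a previous batch run or built with the access and target subcommands):

```sh
# 1) Compute binned coverage for each normal separately, so only one
#    large BAM is processed at a time. bins.bed is a placeholder for
#    the genome-wide bin BED.
for bam in normal1.bam normal2.bam normal3.bam normal4.bam normal5.bam; do
    cnvkit.py coverage "$bam" bins.bed -o "${bam%.bam}.targetcoverage.cnn"
done

# 2) Pool the per-sample coverage tables into a single reference.
cnvkit.py reference normal*.targetcoverage.cnn -f hg38.fa -o reference.cnn
```

Each per-sample coverage run only needs to hold one BAM's data, and the final reference step operates on the much smaller .cnn tables, so the merge itself is cheap.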