cnvkit.py batch hangs on very large WGS dataset #1015
Hi @etal,

I think I'm hitting a memory bottleneck on a large sample set while generating the reference.cnn, and I'm hoping you can help me find a way past it.

I'm running cnvkit.py batch to generate a reference.cnn from 5 normal samples. Each normal BAM file is between 150 and 200 GB (around 1 TB total). I'm curious whether the batch command attempts to open and run on all the files simultaneously. I'm allocating my institution's maximum per-user memory allowance (1200 GB) to this job, but it appears to hang indefinitely (over several days), and nohup shows no CPU usage for it.

Is there a way to generate the reference.cnn from all 5 of my normal samples by running each one individually and then merging the results? (A sketch of this appears at the end of the thread.)

Are there any other parameters you would recommend, both for creating the normal panel and for running CNV calling on tumor samples, given WGS BAM files of this magnitude?

Thank you,
George

Replies: 2 comments
Yes, the batch command optimizes for wall-clock time and will open and read files concurrently in as many processes as you give it. This means the peak memory usage grows with the number of processes allocated.
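If memory is the constraint rather than wall-clock time, the lever to pull is the batch command's -p/--processes option. A minimal sketch, assuming WGS mode and placeholder file names (normal1.bam through normal5.bam, hg38.fa):

```sh
# Build the pooled reference from the five normals in WGS mode,
# capping concurrency at one worker so only one BAM is read at a
# time and the memory ceiling stays low. File names are placeholders.
cnvkit.py batch -m wgs \
    -n normal1.bam normal2.bam normal3.bam normal4.bam normal5.bam \
    -f hg38.fa \
    --output-reference reference.cnn -d results/ \
    -p 1
```

The trade-off is runtime: with -p 1 the samples are processed sequentially, so expect several times the wall-clock time of a fully parallel run.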
I'm working on some WGS improvements for the next release and I'll try to document the best practices for it.
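On the question of building the reference one sample at a time and then merging: batch is a wrapper around smaller subcommands, so the coverage and reference steps can be run separately. A hedged sketch, assuming placeholder file names and a genome-wide bin BED (bins.bed here, e.g. the target BED emitted by a previous batch run or built with the access and target subcommands):

```sh
# 1) Compute binned coverage for each normal separately, so only one
#    large BAM is processed at a time. bins.bed is a placeholder for
#    the genome-wide bin BED.
for bam in normal1.bam normal2.bam normal3.bam normal4.bam normal5.bam; do
    cnvkit.py coverage "$bam" bins.bed -o "${bam%.bam}.targetcoverage.cnn"
done

# 2) Pool the per-sample coverage tables into a single reference.
cnvkit.py reference normal*.targetcoverage.cnn -f hg38.fa -o reference.cnn
```

Each per-sample coverage run only needs to hold one BAM's data, and the final reference step operates on the much smaller .cnn tables, so the merge itself is cheap.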