Hi Diptavo,
I hope you're doing well. I have been using Meta-MultiSKAT for meta-analysis on a large-scale genome-wide dataset. To improve computational efficiency, I have already split each chromosome into 150 smaller chunks. However, I am still running into problems with chunks whose genotype files are large: these runs are significantly slow, and in some cases they fail.
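For reference, the chunking follows roughly this pattern. This is a minimal, self-contained sketch with a random stand-in genotype matrix; the placeholder at the end marks where my actual Meta-MultiSKAT wrapper runs on each chunk:

```r
set.seed(1)
# Stand-in genotype matrix: 500 samples x 3000 variants (dosages 0/1/2).
# In the real pipeline this is read from one chromosome's genotype file.
genotype_matrix <- matrix(rbinom(500 * 3000, 2, 0.1), nrow = 500)
colnames(genotype_matrix) <- paste0("snp", seq_len(ncol(genotype_matrix)))

# Split the chromosome's variants into 150 roughly equal chunks.
n_chunks <- 150
chunk_id <- cut(seq_len(ncol(genotype_matrix)), n_chunks, labels = FALSE)
chunks   <- split(colnames(genotype_matrix), chunk_id)

results <- lapply(chunks, function(ids) {
  G <- genotype_matrix[, ids, drop = FALSE]
  # Real pipeline: the Meta-MultiSKAT test runs on G here;
  # dim(G) is just a placeholder so the sketch runs end to end.
  dim(G)
})
```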
I came across the computation-time estimates in your paper (attached) and wanted to ask whether you have any suggestions for optimizing runtime on large datasets. Specifically:
Time Adjustment Strategies – Based on your reported CPU time, how should I expect runtime to scale with significantly larger genotype files?
Memory and Computational Load – Are there hardware configurations (e.g., memory allocation, high-performance computing job settings) that you recommend?
Further Parallelization – Would increasing the number of parallel jobs or adjusting certain parameters help improve speed? (I have sketched the kind of change I mean after this list.)
Alternative Workarounds – Are there preprocessing steps or modifications to the Meta-MultiSKAT pipeline that could make it more scalable for genome-wide analysis?
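To make the parallelization question concrete: continuing from the sketch above, the change I have in mind is replacing the serial lapply with parallel::mclapply and tuning the core count, though I am unsure how this interacts with per-chunk memory use:

```r
library(parallel)

# Same chunks/genotype_matrix as in the sketch above; n_cores is the
# knob I would be tuning. mclapply forks workers (Linux/macOS), and
# each worker allocates its own genotype subset G, so memory grows
# with the number of cores.
n_cores <- 8
results <- mclapply(chunks, function(ids) {
  G <- genotype_matrix[, ids, drop = FALSE]
  # Placeholder for the per-chunk Meta-MultiSKAT call.
  dim(G)
}, mc.cores = n_cores)
```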
I would greatly appreciate any insights you could provide on making the workflow more efficient. I have attached a section from your paper for reference. Looking forward to your thoughts.
Sincerely,
Neetesh
