Hi Diptavo,
I hope you're doing well. I have been using Meta-MultiSKAT for meta-analysis on a large-scale genome-wide dataset. To improve computational efficiency, I have already split each chromosome into 150 smaller chunks. However, I am still running into problems with chunks whose genotype files are large: these runs are significantly slow, and in some cases they fail.
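For reference, the chunking follows roughly this pattern. This is a minimal, self-contained sketch with a random stand-in genotype matrix; the placeholder at the end marks where my actual Meta-MultiSKAT wrapper runs on each chunk:

```r
set.seed(1)
# Stand-in genotype matrix: 500 samples x 3000 variants (dosages 0/1/2).
# In the real pipeline this is read from one chromosome's genotype file.
genotype_matrix <- matrix(rbinom(500 * 3000, 2, 0.1), nrow = 500)
colnames(genotype_matrix) <- paste0("snp", seq_len(ncol(genotype_matrix)))

# Split the chromosome's variants into 150 roughly equal chunks.
n_chunks <- 150
chunk_id <- cut(seq_len(ncol(genotype_matrix)), n_chunks, labels = FALSE)
chunks   <- split(colnames(genotype_matrix), chunk_id)

results <- lapply(chunks, function(ids) {
  G <- genotype_matrix[, ids, drop = FALSE]
  # Real pipeline: the Meta-MultiSKAT test runs on G here;
  # dim(G) is just a placeholder so the sketch runs end to end.
  dim(G)
})
```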
I came across the computation-time estimates in your paper (attached) and wanted to ask whether you have any suggestions for optimizing runtime on large datasets. Specifically:
Time Adjustment Strategies – Based on your reported CPU time, how should I expect runtime to scale with significantly larger genotype files?
Memory and Computational Load – Are there hardware configurations (e.g., memory allocation, high-performance computing job settings) that you recommend?
Further Parallelization – Would increasing the number of parallel jobs or adjusting certain parameters help improve speed? (I have sketched the kind of change I mean after this list.)
Alternative Workarounds – Are there preprocessing steps or modifications to the Meta-MultiSKAT pipeline that could make it more scalable for genome-wide analysis?
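To make the parallelization question concrete: continuing from the sketch above, the change I have in mind is replacing the serial lapply with parallel::mclapply and tuning the core count, though I am unsure how this interacts with per-chunk memory use:

```r
library(parallel)

# Same chunks/genotype_matrix as in the sketch above; n_cores is the
# knob I would be tuning. mclapply forks workers (Linux/macOS), and
# each worker allocates its own genotype subset G, so memory grows
# with the number of cores.
n_cores <- 8
results <- mclapply(chunks, function(ids) {
  G <- genotype_matrix[, ids, drop = FALSE]
  # Placeholder for the per-chunk Meta-MultiSKAT call.
  dim(G)
}, mc.cores = n_cores)
```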
I would greatly appreciate any insights you could provide on making the workflow more efficient. I have attached a section from your paper for reference. Looking forward to your thoughts.
Sincerely,
Neetesh
