Hi there,
MASH is good for very similar sequences but might not be the best choice for long evolutionary distances. I saw that blastp is the alternative, but it's currently coded to run one sequence at a time in a loop. BLAST+ commands don't scale well past 2-4 threads, so it would be better to make the function itself multithreaded, running several queries at once. Alternatively, this could be done through GNU Parallel or DIAMOND, as they should be easy drop-in replacements and they scale pretty well. Another option could be to use MMseqs2.
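To make the suggestion concrete, here is a rough sketch of what I mean by running the per-sequence blastp calls concurrently instead of in a loop. It is not based on the current code: `blastp_command`, `run_all`, the one-FASTA-file-per-query layout, and the `.tsv` output naming are all hypothetical names and assumptions of mine. The only BLAST+ options used are `-query`, `-db`, `-out`, `-outfmt` and `-num_threads`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def blastp_command(query, db, out, threads=1):
    """Build a blastp call for one query FASTA file (tabular output)."""
    return [
        "blastp",
        "-query", str(query),
        "-db", str(db),
        "-out", str(out),
        "-outfmt", "6",
        "-num_threads", str(threads),
    ]


def run_all(queries, db, out_dir, workers=4, runner=subprocess.run):
    """Run one blastp process per query file, `workers` at a time.

    Assumes `out_dir` already exists and `db` is a formatted BLAST database.
    `runner` is injectable so the scheduling can be tried without BLAST+.
    """
    out_dir = Path(out_dir)
    commands = [
        blastp_command(q, db, out_dir / (Path(q).stem + ".tsv"))
        for q in queries
    ]
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: runner(cmd, check=True), commands))
```

With one thread per blastp process and several processes in flight, the cores stay busy without relying on BLAST+'s own threading. The same wrapper shape should carry over to DIAMOND or MMseqs2 by changing only the command builder.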
What do you think?