Conversation
chadbrewbaker
left a comment
There was a problem hiding this comment.
General comments. Suggested a few unit tests, and raised the issue of Boost license.
| saidx_t *SA, saidx_t *first, saidx_t *last, | ||
| trbudget_t *budget) { | ||
| #define STACK_SIZE TR_STACKSIZE | ||
| struct { const saidx_t *a; saidx_t *b, *c; saint_t d, e; }stack[STACK_SIZE]; |
There was a problem hiding this comment.
Is stack[x].e dead code? If so this change alone is a win for cache efficiency unless there is some alignment reason why a dummy value was in there.
There was a problem hiding this comment.
stack[x].e stored the trlink but the refactoring makes it unneccessary. It seems trlink was used to track when to switch to heap sort in case of tandem repeat.
| /* Tandem repeat sort */ | ||
| void | ||
| trsort(saidx_t *ISA, saidx_t *SA, saidx_t n, saidx_t depth) { | ||
| trsort(saidx_t *ISA, saidx_t *SA, saidx_t n, saidx_t depth, saidx_t *buf, saidx_t bufsize) { |
There was a problem hiding this comment.
Need to add a small and large test case so this code gets coverage.
| #undef STACK_SIZE | ||
| } | ||
|
|
||
| static |
There was a problem hiding this comment.
Need at least a few tests that pass the same string to each of the sort subroutines and verify they return the same sort.
Also, is any of this code lifted from Boost? If so it needs to be refactored into a separate file, add a note to the license file, and add a build option so you compile with MIT only code by default.
There was a problem hiding this comment.
I can't tell straigth from the code if it's equal in terms of computation but it seems by boost he refers to "improve" and not to the software library. The code looks like a modification of yuta mori's code and integrates quite closely. It would be nice to get an explaination for trsort and the changes as it's rather unoblivious and almost no documentation exists.
There was a problem hiding this comment.
I've got a flight coming up. I'll try to add a lightweight set of unit tests. Also make the CMake more llvm friendly.
and last partitions after tandem repeat partitioning.
It decreases total time for Manzini's corpus by 3% and total time for Gauntlet by 24%