Support ragged aux_labels in k2.compose #686
Ready for review.
repeats (i.e., multiplicity) of each output sequence.
The caller does not need to pre-allocate it. It is
allocated inside the function.
@param [out] new2old_indexes
Each output sublist could have appeared multiple times in the input, so is there a rule for which one is listed?
Should probably specify that this would be an idx0 if src had 2 axes, and an idx01 if src had 3 axes.
> Each output sublist could have appeared multiple times in the input, so is there a rule for which one is listed?
Sublists of a seq are reordered by their hash values by calling SortSublists.
Although the implementation of SortSublists does not mention that it is stable, the Python tests
show that the first one is kept and the other repeats are discarded.
I think we don't care which one is kept since their contents are identical (assuming there are no hash conflicts).
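To make the "first one is kept" behaviour concrete, here is a minimal pure-Python sketch. It is not the k2 implementation: the function name `unique_sequences_sketch` is hypothetical, it compares full sequence contents instead of hash values (so hash conflicts cannot occur), and the lexicographic output order is an assumption made only to keep the result deterministic.

```python
from typing import Dict, List, Sequence, Tuple


def unique_sequences_sketch(
    seqs: Sequence[Sequence[int]],
) -> Tuple[List[List[int]], List[int], List[int]]:
    """Return (unique, num_repeats, new2old_indexes) for a list of sequences.

    new2old_indexes[i] is the index in `seqs` of the FIRST occurrence of
    unique[i]; later repeats are discarded.
    """
    first_seen: Dict[Tuple[int, ...], int] = {}
    counts: Dict[Tuple[int, ...], int] = {}
    for old_index, seq in enumerate(seqs):
        key = tuple(seq)
        if key not in first_seen:
            # Remember only the first position where this content appears.
            first_seen[key] = old_index
            counts[key] = 0
        counts[key] += 1

    # Sorting the keys makes the output order independent of input order
    # (an assumption of this sketch, not a statement about k2).
    keys = sorted(first_seen)
    unique = [list(k) for k in keys]
    num_repeats = [counts[k] for k in keys]
    new2old_indexes = [first_seen[k] for k in keys]
    return unique, num_repeats, new2old_indexes


seqs = [[1, 2], [3], [1, 2], [], [3], [1, 2]]
unique, num_repeats, new2old_indexes = unique_sequences_sketch(seqs)
```

Because the first occurrence is always the one recorded, the same input gives the same `new2old_indexes` on every run.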
> Should probably specify that this would be an idx0 if src had 2 axes, and an idx01 if src had 3 axes.
Thanks. Done.
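For readers unfamiliar with the idx0 / idx01 naming, the following sketch shows how the two relate, assuming the usual row-splits layout of a ragged tensor. The helper `to_idx01` and the example `row_splits` values are hypothetical and plain Python; they are not part of k2's API.

```python
from typing import Sequence


def to_idx01(row_splits: Sequence[int], idx0: int, idx1: int) -> int:
    """Map (idx0, idx1) to a flat idx01 index over axes 0 and 1.

    row_splits[i] is the flat position where sublist i on axis 0 starts,
    so idx01 = row_splits[idx0] + idx1.
    """
    start, end = row_splits[idx0], row_splits[idx0 + 1]
    if not 0 <= idx1 < end - start:
        raise IndexError("idx1 is out of range for this sublist")
    return start + idx1


# Three sublists on axis 0 holding 2, 1 and 3 sequences respectively.
row_splits = [0, 2, 3, 6]
flat_index = to_idx01(row_splits, idx0=2, idx1=1)
```

With 2 axes each sequence is addressed directly by its idx0. With 3 axes the sequences live on axis 1, so an entry of new2old_indexes has to be a flat idx01 like `flat_index` above.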
If it just puts an arbitrary one of the inputs, you should specify that. (Also specify if this is nondeterministic, i.e.
may give different outputs for the same input on different runs).
> If it just puts an arbitrary one of the inputs, you should specify that. (Also specify if this is nondeterministic, i.e.
> may give different outputs for the same input on different runs).
Fixed. It is described in the comment that it uses a deterministic algorithm.
It is needed in
where aux_labels in decoding_lats is a ragged tensor. This pull request extends k2.compose to support that.
[EDITED]: Mark it as WIP. Will merge after testing it in snowfall.
Used in k2-fsa/snowfall#106