Fix implicit padding in split_between_processes when apply_padding=False and num_samples < num_processes#4052
Open
3manifold wants to merge 1 commit into
Open
Conversation
3manifold
commented
May 27, 2026
| result = inputs[-1:] | ||
| else: | ||
| result = inputs[start_index:end_index] | ||
| result = inputs[start_index:end_index] |
Contributor
Author
There was a problem hiding this comment.
Python slicing returns [] naturally for out-of-bounds, which is the correct behavior when apply_padding=False.
3manifold
commented
May 27, 2026
| result = pad_across_processes(tensorized_result, pad_index=inputs[-1]) | ||
| else: | ||
| result += [result[-1]] * (num_samples_per_process + (1 if num_extras > 0 else 0) - len(result)) | ||
| result += [inputs[-1]] * (num_samples_per_process + (1 if num_extras > 0 else 0) - len(result)) |
Contributor
Author
There was a problem hiding this comment.
Use inputs[-1] instead of result[-1]: result may be empty when this process received no samples.
3manifold
commented
May 27, 2026
| if end_index > len(inputs): | ||
| end_index = len(inputs) | ||
| result_idcs = list(range(start_index, end_index)) | ||
| clamped_start = min(start_index, len(inputs)) |
Contributor
Author
There was a problem hiding this comment.
Clamp both indices to valid range without artificially inflating start_index to len-1 when apply_padding is False.
3manifold
commented
May 27, 2026
| result_idcs = list(range(clamped_start, clamped_end)) | ||
| if apply_padding: | ||
| result_idcs += [end_index - 1] * ( | ||
| result_idcs += [len(inputs) - 1] * ( |
Contributor
Author
There was a problem hiding this comment.
Pad using the last real element (len(inputs)-1), not end_index-1 which can be wrong when result_idcs is empty.
…ding flag is disabled
3ce7d52 to
e81785a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Function
split_between_processesapplies implicit padding even ifapply_padding=False. This results to redundant calculations during distributed inference since the same inference (i.e. prompt) is called twice etc. Apart from wasting resources, this can cause conflicts among competing processes, for instance, disk I/O.The same bug is duplicated across two code branches of the same function. Both stem from the identical incorrect assumption — "a process always receives at least one element" — and both produce identical wrong behaviour (silent padding when
apply_padding=False). The only difference is syntax:result = inputs[-1:]start_index = len(inputs) - 1How to reproduce
Example with 1 sample and 2 GPUs
inputs[0:1]→[sample_0]✓start_index >= len(inputs)triggers! →inputs[-1:]=[sample_0]✗Process 1's
start_index(1) equalslen(inputs)(1), triggering the guard. Instead of returning an empty slice, it silently returns the last element — regardless ofapply_padding.resolves #4053
Who can review?
@BenjaminBossan @SunMarc