Skip to content

fix: validate sequence length divisibility for zigzag ring attention#57

Open
Bhavyashah20 wants to merge 1 commit into
jzhang38:mainfrom
Bhavyashah20:fix/zigzag-sequence-length-validation
Open

fix: validate sequence length divisibility for zigzag ring attention#57
Bhavyashah20 wants to merge 1 commit into
jzhang38:mainfrom
Bhavyashah20:fix/zigzag-sequence-length-validation

Conversation

@Bhavyashah20

Copy link
Copy Markdown

Found this while reading through the codebase. The zigzag attention
backward pass fails with a cryptic size mismatch error when sequence
length isn't divisible by 2 * world_size. Took me a while to trace
it back to the unequal chunks from torch.chunk().

Added a validation check in extract_local() with a clear error message
pointing users to the fix. Also added a pytest suite since there were
no tests for this module.

  ZigZag ring attention requires seq_len % (2 * world_size) == 0.
  When this constraint is violated, torch.Tensor.chunk() produces unequal
  chunks, causing a size mismatch error in the backward pass.

  Add validation in extract_local() with a clear error message explaining
  the issue and how to fix it. Add pytest suite for prepare_inputs.py.

  Fixes jzhang38#47
@Bhavyashah20

Copy link
Copy Markdown
Author

Bumping this in case it got buried — the zigzag attention backward pass fails with a cryptic size mismatch error when sequence length isn't divisible by 2 * world_size. It took a while to trace back to torch.chunk() producing unequal chunks in that case.

The fix adds an early validation check in extract_local() with a clear error message pointing users to the fix, plus a pytest suite since there were no tests for this path. Let me know if you'd like any changes.

@Bhavyashah20

Copy link
Copy Markdown
Author

Hi, just following up on this — it's been about a month since I opened this. The validation catches a silent failure in zigzag ring attention that's otherwise very hard to debug. Happy to revise if anything looks off. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant