
Add CI testcases and benchmark for allreduce #387

Merged
coderfeli merged 8 commits into main from allreduce on Apr 16, 2026

Conversation

@yanboshao
Contributor

@yanboshao yanboshao commented Apr 13, 2026

Motivation

  1. Remove the mem_ops module
  2. Add CI test cases for allreduce

Technical Details

Test Plan

Test the performance of allreduce on MI325.

Test Result

Submission Checklist

Comment thread .github/workflows/flydsl.yaml Fixed
@yanboshao changed the title from "Allreduce" to "Add CI testcases and benchmark for allreduce" on Apr 13, 2026
@coderfeli
Collaborator

coderfeli commented Apr 13, 2026

[rank=7] Error: [rank=7] cudagraph max_err=7.845e+00 >= atol=0.15
File "/tmp/flydsl-main/tests/kernels/test_allreduce.py", line 458, in _dist_worker
assert max_err < atol, f"[rank={rank}] cudagraph max_err={max_err:.3e} >= atol={atol}"
^^^^^^^^^^^^^^

Are these expected?
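For context, the failing assertion compares each rank's allreduce output against a reference and fails once the largest absolute deviation reaches `atol`. A minimal pure-Python sketch of that kind of check, assuming the reference is the elementwise sum of every rank's input; the helpers `reference_allreduce`, `max_abs_err`, and `check_rank` are hypothetical and not taken from `test_allreduce.py`:

```python
from typing import List


def reference_allreduce(inputs: List[List[float]]) -> List[float]:
    """Elementwise sum across ranks: what every rank should hold after a sum-allreduce."""
    return [sum(vals) for vals in zip(*inputs)]


def max_abs_err(out: List[float], ref: List[float]) -> float:
    """Largest absolute elementwise difference between a rank's output and the reference."""
    return max(abs(o - r) for o, r in zip(out, ref))


def check_rank(rank: int, out: List[float], ref: List[float], atol: float = 0.15) -> float:
    """Hypothetical per-rank check; raises AssertionError once max error reaches atol."""
    err = max_abs_err(out, ref)
    assert err < atol, f"[rank={rank}] max_err={err:.3e} >= atol={atol}"
    return err


if __name__ == "__main__":
    inputs = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [-1.0, 0.0, 1.0]]  # 3 hypothetical ranks
    ref = reference_allreduce(inputs)
    print(check_rank(0, [0.5, 2.5, 4.5], ref))  # output matches the reference
    try:
        # A rank that only kept its own input instead of the reduced result.
        check_rank(1, [1.0, 2.0, 3.0], ref)
    except AssertionError as e:
        print(e)
```

For scale, the reported max_err of 7.845 is more than 50 times the atol of 0.15.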

@coderfeli
Collaborator

Too many logs in benchmark. Difficult to figure out whether regressions exist.
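One way to make regressions easy to spot is to reduce the benchmark output to one line per configuration, compared against a stored baseline. A minimal sketch, assuming per-message-size latencies in microseconds and a hypothetical 5% regression threshold; none of these names or numbers come from the FlyDSL benchmark:

```python
from typing import Dict, List, Tuple


def find_regressions(
    baseline_us: Dict[int, float],
    current_us: Dict[int, float],
    threshold: float = 0.05,
) -> List[Tuple[int, float, float, float]]:
    """Return (size, baseline, current, ratio) for sizes slower than baseline by > threshold."""
    regressions = []
    for size in sorted(baseline_us):
        if size not in current_us:
            continue  # size was not benchmarked in this run
        base, cur = baseline_us[size], current_us[size]
        ratio = cur / base
        if ratio > 1.0 + threshold:
            regressions.append((size, base, cur, ratio))
    return regressions


def summarize(baseline_us: Dict[int, float], current_us: Dict[int, float],
              threshold: float = 0.05) -> str:
    """One line per regression, or a single line when nothing regressed."""
    regs = find_regressions(baseline_us, current_us, threshold)
    if not regs:
        return "no regressions"
    return "\n".join(
        f"REGRESSION size={s} baseline={b:.1f}us current={c:.1f}us ({(r - 1) * 100:+.1f}%)"
        for s, b, c, r in regs
    )


if __name__ == "__main__":
    baseline = {1 << 20: 40.0, 1 << 24: 310.0}  # hypothetical latencies
    current = {1 << 20: 41.0, 1 << 24: 360.0}
    print(summarize(baseline, current))
```

Printing only the flagged rows keeps the CI log short, while the full per-size numbers can still be written to an artifact if needed.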

@yanboshao yanboshao closed this Apr 14, 2026
@yanboshao
Contributor Author

Too many logs in benchmark. Difficult to figure out whether regressions exist.

[screenshot]

Comment thread .github/workflows/flydsl.yaml Fixed
coderfeli and others added 3 commits April 15, 2026 19:24
…om buffer_load

- vector.py: replace hardcoded _KDYNAMIC magic number with
  ir.ShapedType.get_dynamic_size() (consistent with moe_gemm_2stage.py)
- buffer_ops.py: remove the offset_is_bytes parameter from buffer_load; callers
  should pass element offsets (buffer_load already scales by sizeof(dtype))
- custom_all_reduce_kernel.py: use _ELEMS_PER_PACK (4) element offsets
  instead of _BYTES_PER_PACK (16) byte offsets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
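The switch from byte offsets to element offsets can be illustrated with a small model of a load that, like `buffer_load` as described in the commit message, multiplies the offset it receives by the element size. This is a pure-Python sketch, not FlyDSL's `buffer_ops` API: `fake_buffer_load` is hypothetical, and a 4-byte element type is assumed because 16 bytes per pack divided by 4 elements per pack gives 4 bytes per element.

```python
import struct

ELEM_BYTES = 4                                  # assumed sizeof(dtype), e.g. a 32-bit type
ELEMS_PER_PACK = 4                              # mirrors _ELEMS_PER_PACK
BYTES_PER_PACK = ELEMS_PER_PACK * ELEM_BYTES    # mirrors _BYTES_PER_PACK == 16


def fake_buffer_load(buf: bytes, elem_offset: int, count: int = ELEMS_PER_PACK) -> list:
    """Hypothetical load: converts an *element* offset to bytes, as buffer_load does."""
    byte_offset = elem_offset * ELEM_BYTES
    end = byte_offset + count * ELEM_BYTES
    if end > len(buf):
        raise IndexError(f"read [{byte_offset}, {end}) past buffer of {len(buf)} bytes")
    return list(struct.unpack_from(f"<{count}i", buf, byte_offset))


buf = struct.pack("<16i", *range(16))           # 16 int32 values = 4 packs

pack = 1
# Correct: pass the element offset and let the load scale it to bytes.
print(fake_buffer_load(buf, pack * ELEMS_PER_PACK))
# Wrong: a byte offset gets scaled a second time (16 * 4 = 64 bytes), overshooting 4x.
try:
    fake_buffer_load(buf, pack * BYTES_PER_PACK)
except IndexError as e:
    print(e)
```

In this model both conventions agree for pack 0 (offset 0), so passing byte offsets only goes wrong from the second pack onward.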
@coderfeli coderfeli merged commit 6e635c6 into main Apr 16, 2026
15 of 16 checks passed
@coderfeli coderfeli deleted the allreduce branch April 16, 2026 03:20


3 participants