Change graph JSON serialization for repro tool#280
Conversation
ebe0938 to
4e4d199
Compare
4e4d199 to
83fcc7b
Compare
|
@cudnn-ci-bot run |
|
🚀 Running mirror pipeline Branch: cudnn-gh/pr-280-2347b21 |
|
@CodeRabbit review |
✅ Action performedReview finished.
|
📝 WalkthroughWalkthroughThis PR refactors the cuDNN frontend library and repro tool to replace graph_uid with gid identifiers and migrate tensor representation from dict-keyed to tid-indexed lists. Core library changes introduce per-tensor IDs, update JSON serialization format, and modify graph key/logging. The repro tool is comprehensively refactored to support the new tensor list API, ragged detection, and gid-based log parsing with tensor dump injection. ChangesCore Graph and Tensor ID Infrastructure
Repro Tool Tensor Lookup and Payload Validation
Test Infrastructure and Payload Updates
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
tools/cudnn_repro/tests/test_cudnn_repro_fp8.py (1)
179-186: ⚡ Quick winReplace lambda assignment with a
defstatement.Lambda assignments reduce readability and are flagged by Ruff E731. Use a named function instead.
♻️ Refactor to use def
- payload = lambda tag: {"json_version": "2.0", "gid": 1, "nodes": [{"tag": tag}], "tensors": []} - - assert operations.detect_operation_key(payload("SDPA_FWD")) == "sdpa_fwd" - assert operations.detect_operation_key(payload("SDPA_BWD")) == "sdpa_bwd" - assert operations.detect_operation_key(payload("SDPA_FP8_FWD")) == "sdpa_fp8_fwd" - assert operations.detect_operation_key(payload("SDPA_FP8_BWD")) == "sdpa_fp8_bwd" - assert operations.detect_operation_key(payload("SDPA_MXFP8_FWD")) == "sdpa_fp8_fwd" - assert operations.detect_operation_key(payload("SDPA_MXFP8_BWD")) == "sdpa_fp8_bwd" + def payload(tag): + return {"json_version": "2.0", "gid": 1, "nodes": [{"tag": tag}], "tensors": []} + + assert operations.detect_operation_key(payload("SDPA_FWD")) == "sdpa_fwd" + assert operations.detect_operation_key(payload("SDPA_BWD")) == "sdpa_bwd" + assert operations.detect_operation_key(payload("SDPA_FP8_FWD")) == "sdpa_fp8_fwd" + assert operations.detect_operation_key(payload("SDPA_FP8_BWD")) == "sdpa_fp8_bwd" + assert operations.detect_operation_key(payload("SDPA_MXFP8_FWD")) == "sdpa_fp8_fwd" + assert operations.detect_operation_key(payload("SDPA_MXFP8_BWD")) == "sdpa_fp8_bwd"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/cudnn_repro/tests/test_cudnn_repro_fp8.py` around lines 179 - 186, The test defines payload as a lambda (payload = lambda tag: {...}) which triggers Ruff E731; replace it with a normal named function (def payload(tag): return {...}) keeping the same body and usages so the subsequent asserts calling payload("SDPA_FWD") etc. and operations.detect_operation_key remain unchanged; update only the payload definition in tests/test_cudnn_repro_fp8.py to remove the lambda and use def to satisfy the linter.tools/cudnn_repro/cudnn_repro/log_parser.py (1)
36-41: 💤 Low valueMinor type hint style inconsistency.
This function uses lowercase
tupleandlistfor generics, while_parse_context_entryat line 25 and_apply_tensor_dumpsat line 43 useTuplefrom typing. Consider using a consistent style throughout the module—either the importedTuple/Listor the built-in lowercase forms.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/cudnn_repro/cudnn_repro/log_parser.py` around lines 36 - 41, The return type annotation in _parse_tensor_dump currently mixes built-in generics (tuple[int, list[int]] | None) while other functions like _parse_context_entry and _apply_tensor_dumps use typing.Tuple/typing.List; make the module consistent by changing _parse_tensor_dump's signature to use the same typing forms (e.g., Tuple[int, List[int]] | None) or conversely update the other functions to built-in generics—also update/ensure the appropriate imports (Tuple, List) are present or remove them if switching all to built-ins so all functions (_parse_tensor_dump, _parse_context_entry, _apply_tensor_dumps) use the same style.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/cudnn_repro/cudnn_repro/log_parser.py`:
- Around line 79-86: The tensor-dump accumulator current_dumps should be reset
whenever an execution marker is seen to avoid misattributing dumps; in the
EXECUTE_GRAPH_PATTERN handling block (where gid = int(match.group(1)),
current_entry = graph_entries_by_gid.get(gid) and
execution_entries.append(_apply_tensor_dumps(current_entry, current_dumps)) is
called) move or add the statement current_dumps = {} so it executes
unconditionally when a new execution marker is encountered (i.e., outside the
inner if current_entry is not None), ensuring dumps are cleared regardless of
whether graph_entries_by_gid returned None.
In `@tools/cudnn_repro/cudnn_repro/sdpa_fp8_bwd.py`:
- Around line 144-152: The code returns ragged offsets using
inputs.get("RAGGED_OFFSET_Q") and inputs.get("RAGGED_OFFSET_KV") but misses the
plural key fallback; update the ragged_offset_q and ragged_offset_kv expressions
to try the singular then fall back to the plural keys (e.g. use
inputs.get("RAGGED_OFFSET_Q") or inputs.get("RAGGED_OFFSETS_Q") and similarly
for KV) before passing to utils.tensor_entry/utils.seq_len so behavior matches
sdpa_fp8_fwd.py and sdpa_fwd.extract_seq_and_ragged; keep the other symbols
(payload, tensors, inputs, utils.seq_len, utils.tensor_entry, seed) unchanged.
In `@tools/cudnn_repro/cudnn_repro/sdpa_fp8_fwd.py`:
- Around line 146-154: The function returning seq_len/ragged fields currently
only looks up "RAGGED_OFFSET_Q" and "RAGGED_OFFSET_KV" in the inputs dict and
misses plural keys; update the lookups to fall back to "RAGGED_OFFSETS_Q" and
"RAGGED_OFFSETS_KV" (like sdpa_fwd.extract_seq_and_ragged does) so
ragged_offset_q and ragged_offset_kv use inputs.get("RAGGED_OFFSET_Q") or
inputs.get("RAGGED_OFFSETS_Q") and inputs.get("RAGGED_OFFSET_KV") or
inputs.get("RAGGED_OFFSETS_KV") respectively when calling
utils.tensor_entry/tensor_seq_len, leaving other fields (seq_len_q, seq_len_kv,
rng_data_seed) unchanged.
---
Nitpick comments:
In `@tools/cudnn_repro/cudnn_repro/log_parser.py`:
- Around line 36-41: The return type annotation in _parse_tensor_dump currently
mixes built-in generics (tuple[int, list[int]] | None) while other functions
like _parse_context_entry and _apply_tensor_dumps use typing.Tuple/typing.List;
make the module consistent by changing _parse_tensor_dump's signature to use the
same typing forms (e.g., Tuple[int, List[int]] | None) or conversely update the
other functions to built-in generics—also update/ensure the appropriate imports
(Tuple, List) are present or remove them if switching all to built-ins so all
functions (_parse_tensor_dump, _parse_context_entry, _apply_tensor_dumps) use
the same style.
In `@tools/cudnn_repro/tests/test_cudnn_repro_fp8.py`:
- Around line 179-186: The test defines payload as a lambda (payload = lambda
tag: {...}) which triggers Ruff E731; replace it with a normal named function
(def payload(tag): return {...}) keeping the same body and usages so the
subsequent asserts calling payload("SDPA_FWD") etc. and
operations.detect_operation_key remain unchanged; update only the payload
definition in tests/test_cudnn_repro_fp8.py to remove the lambda and use def to
satisfy the linter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 24bba5a3-7814-4537-8eb9-f6ea51621510
📒 Files selected for processing (20)
include/cudnn_frontend/graph_interface.hinclude/cudnn_frontend/graph_properties.hinclude/cudnn_frontend/node/scaled_dot_product_flash_attention.hinclude/cudnn_frontend/node_interface.hinclude/cudnn_frontend/utils/serialize.htools/cudnn_repro/cudnn_repro/log_parser.pytools/cudnn_repro/cudnn_repro/operations.pytools/cudnn_repro/cudnn_repro/sdpa_bwd.pytools/cudnn_repro/cudnn_repro/sdpa_fp8_bwd.pytools/cudnn_repro/cudnn_repro/sdpa_fp8_fwd.pytools/cudnn_repro/cudnn_repro/sdpa_fwd.pytools/cudnn_repro/cudnn_repro/utils.pytools/cudnn_repro/tests/helpers.pytools/cudnn_repro/tests/test_cudnn_repro_bwd.pytools/cudnn_repro/tests/test_cudnn_repro_cli.pytools/cudnn_repro/tests/test_cudnn_repro_closed_loop.pytools/cudnn_repro/tests/test_cudnn_repro_fp8.pytools/cudnn_repro/tests/test_cudnn_repro_graph_uid.pytools/cudnn_repro/tests/test_cudnn_repro_log_parser.pytools/cudnn_repro/tests/test_cudnn_repro_schema.py
💤 Files with no reviewable changes (1)
- tools/cudnn_repro/tests/test_cudnn_repro_graph_uid.py
| std::optional<pass_by_values_t> compile_time_constant_value = std::nullopt; | ||
|
|
||
| TensorReordering_t reordering_type = TensorReordering_t::NONE; | ||
| tid_t tid = 0; |
There was a problem hiding this comment.
What is tid ? What is it used for?
There was a problem hiding this comment.
Tensor ID. It's an immutable ID that's generated on creation and cannot be changed.
The reason we do this is when we dump the graph JSON at compilation time, the UIDs actually change from the UIDs that are seen at execution time.
@CodeRabbit ignore
Summary
gidand tensor-localtidreferencesuidfields on tensorsRAGGED_OFFSET_*tensor refs as well as serialized tensor metadataVerification
PYTHONPATH=tools/cudnn_repro uv run --no-project --with pytest --with looseversion python -m pytest -q tools/cudnn_repro/tests(31 passed, 30 skipped)cmake --build /tmp/cudnn2-fe-codex-build --target tests -j 16CUDA_VISIBLE_DEVICES=2 /tmp/cudnn2-fe-codex-build/bin/tests "[graph][serialize]"d_qk=128,s_kv=7800,h_k=40,h_v=40Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests