[Perf] CUDA graph 2: graph_do_while #406
Merged
148 commits
61b5b36
Add CUDA graph MVP for multi-task kernels
hughperkins 49ce3c1
bug fixes for cuda graph
hughperkins 9c32a28
Add per-kernel @qd.kernel(cuda_graph=True) API
hughperkins cffb9ae
Add cross-platform test for cuda_graph=True annotation
hughperkins ed1cff9
Handle argument changes in CUDA graph replay
hughperkins 85dc8db
Fix formatting and disable cuda_graph on adjoint kernels
hughperkins d9ca32a
Add graph_while conditional nodes for GPU-side iteration loops
hughperkins d6cbd15
Fix graph_while arg_id for struct parameters and add cross-platform f…
hughperkins 0573c12
Add static_assert on CudaGraphNodeParams size to catch ABI drift
hughperkins 7fd81d3
Add compute capability check for graph_while (requires SM 9.0+)
hughperkins 9c75cee
Use CUDA_HOME/CUDA_PATH env vars to find libcudadevrt.a
hughperkins 7f80b72
Restore documentation comments removed during cuda-graph refactor
hughperkins 7762fd9
Add CUDA graph documentation and do-while semantics warning
hughperkins 47d59dc
Apply clang-format to kernel_launcher.h static_assert
hughperkins ad4eab6
Fix lint: formatting (black, clang-format, ruff)
hughperkins e00fc15
Fix clang-format whitespace in kernel_launcher.cpp
hughperkins 9bcc487
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins 0031619
Reject cuda_graph=True on kernels with struct return values
hughperkins 792ff34
Add test for cuda_graph with different-sized arrays
hughperkins 334c2e8
Restore comments removed during cuda graph refactor
hughperkins 8f56ffd
Add test for cuda_graph after qd.reset()
hughperkins 5dd2d66
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins f8ff3ee
Fix graph_while cache staleness when counter ndarray changes
hughperkins 96b43de
Validate graph_while parameter name at decoration time
hughperkins 8caa42c
Merge remote-tracking branch 'origin/main' into hp/cuda-graph-mvp-1-g…
hughperkins 501362f
Add CUDA graph documentation page
hughperkins 517d3db
Expose CUDA graph cache size for test observability
hughperkins da3ff27
Add get_cuda_graph_cache_used_on_last_call() for test observability
hughperkins a2abceb
Add cache size and cache used assertions to all CUDA graph tests
hughperkins 720f5d8
Inline expected cache size in cross-platform test assertion
hughperkins f158fd4
Run all CUDA graph tests on all platforms
hughperkins a8e6b8f
update doc
hughperkins dd4f48b
Add comment documenting resolve_ctx_ndarray_ptrs contract
hughperkins aa08442
Add comment explaining contexts_ population in graph path
hughperkins 98bf081
Add comment explaining single-task graph fallback guard
hughperkins 7b18674
Add comment explaining resolve_ctx_ndarray_ptrs fallback check
hughperkins 9907333
Add comment explaining kernelParams vs extra in graph node params
hughperkins 6ff327e
Add comment explaining graph_exec field in CachedCudaGraph
hughperkins 6796baf
Add comment explaining cuda_graph_cache_ key
hughperkins 1d4ebef
Parametrize test_cuda_graph_changed_args over ndarray and field
hughperkins a4cfdc3
Parametrize all CUDA graph tests over ndarray and field
hughperkins 2db1b05
Parametrize test_cuda_graph_different_sizes over ndarray and field
hughperkins 7775a4e
Merge branch 'main' into hp/cuda-graph-mvp-1-graph-build
hughperkins f1d397a
Merge remote-tracking branch 'origin/hp/cuda-graph-mvp-1-graph-build'…
hughperkins f5ff0af
Rename cuda_graphs.md to cuda_graph.md
hughperkins a91878b
merge doc from pr 1
hughperkins cd23e79
Use index.md from cuda-graph-mvp-1 branch
hughperkins aad0dd9
fix up merge
hughperkins d559a92
Use [()] instead of [None] in CUDA graph docs
hughperkins 712846f
ndarray vs field
hughperkins a14a072
Rename graph_while to graph_do_while
hughperkins aa4dd52
add caveats to doc
hughperkins 44781b8
Add comments to AMDGPU graph_do_while fallback code
hughperkins c63e201
Remove graph_do_while fallback, require CUDA with SM 9.0+
hughperkins 1605188
Remove cross-backend graph_do_while tests
hughperkins 9fbb433
Allow cuda_graph=True for single-task kernels
hughperkins ae8d5a9
Remove test_cuda_graph_single_loop test
hughperkins 398a101
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins aa82bcb
Fix cuda-cudart-dev package name in GPU workflow
hughperkins 4de7a64
Fix formatting (black + clang-format)
hughperkins b737cce
Simplify cuda_graph doc caveats section
hughperkins 81f580f
Fix wording: "Older CUDA GPUs"
hughperkins 8333f14
Update kernel_impl.py docstring: non-CUDA not supported
hughperkins aedac28
Add comments to JIT linker function declarations
hughperkins a7a3ad9
Add comments to graph_do_while condition kernel PTX
hughperkins cc64157
Improve comments in graph_do_while condition kernel PTX
hughperkins e235b23
Assert no gradient pointers in cuda_graph path
hughperkins ce09e5e
Add /*name=*/ comment to link_add_data call
hughperkins ea83519
Add description comment to ensure_condition_kernel_loaded
hughperkins 0301b76
Throw error if graph_do_while condition ndarray changes between calls
hughperkins 3402fae
Extract add_conditional_while_node from launch_llvm_kernel_graph
hughperkins 0603fd5
Add comments to link_state and conditional graph structure
hughperkins 03e8142
Error instead of fallback when graph_do_while has host-resident ndarrays
hughperkins 0c481e8
Extract add_kernel_node helper to deduplicate graph kernel node creation
hughperkins e223689
Add comment explaining why condition kernel must be last in body graph
hughperkins ff2d2ab
Add comment for conditional node in body graph
hughperkins b11fe26
Add comment explaining cached graph_do_while_flag_dev_ptr
hughperkins ee5b0b8
Extract CudaGraphManager from KernelLauncher into separate class
hughperkins 54c8bf0
Fix awkward string literal split in QD_TRACE
hughperkins cf86442
Extract launch_cached_graph from try_launch
hughperkins ddb552e
Extract CudaGraphManager from KernelLauncher into separate class
hughperkins 76181bf
Make on_cuda_device a free function shared by both launch paths
hughperkins f6d531e
Make on_cuda_device a free function shared by both launch paths
hughperkins 3fc7a9a
Move on_cuda_device to cuda_context where it belongs
hughperkins 683194d
Move on_cuda_device to cuda_context where it belongs
hughperkins c17e59c
Move on_cuda_device to runtime/cuda/cuda_utils
hughperkins ccf9a37
Move on_cuda_device to runtime/cuda/cuda_utils
hughperkins 3411acb
Merge hp/cuda-graph-mvp-1-graph-build into hp/cuda-graph-mvp-2-graph-…
hughperkins 2e42c12
Extract resolve_device_alloc_ptr helper to deduplicate DeviceAllocati…
hughperkins c94b41c
Extract resolve_device_alloc_ptr helper to deduplicate DeviceAllocati…
hughperkins 55f03c1
Revert "Extract resolve_device_alloc_ptr helper to deduplicate Device…
hughperkins 680c7dc
Revert "Extract resolve_device_alloc_ptr helper to deduplicate Device…
hughperkins 4f78c21
Error on gradient pointers in cuda_graph path instead of silently res…
hughperkins e0200e5
Add comment explaining scalar parameter skip in resolve_ctx_ndarray_ptrs
hughperkins 2f411a5
Merge hp/cuda-graph-mvp-1-graph-build into hp/cuda-graph-mvp-2-graph-…
hughperkins 05a7e4f
Clarify that fields are template parameters and not handled here
hughperkins ff5d021
Merge hp/cuda-graph-mvp-1-graph-build into hp/cuda-graph-mvp-2-graph-…
hughperkins a55c234
Re-add comments lost during merge conflict resolution
hughperkins b73dfb8
Add comment explaining resolved_data variable
hughperkins 34f685c
Add comment noting cache_size and used_on_last_call are for tests
hughperkins bf41337
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins e88fad4
Apply clang-format
hughperkins 51f898f
Apply clang-format
hughperkins caefc2b
Merge hp/cuda-graph-mvp-1-graph-build into hp/cuda-graph-mvp-2-graph-…
hughperkins 90639dc
Add comment explaining why CudaGraphNodeParams is defined locally
hughperkins e9d4af4
Add comment explaining CudaGraphNodeParams vs CudaKernelNodeParams
hughperkins 359c7d8
Rename increment_loop to graph_loop in test_graph_do_while_counter
hughperkins 99889ee
Remove unnecessary qd.sync() calls from do-while tests
hughperkins b57582a
Add second call with different counter to test_graph_do_while_counter
hughperkins 844a454
Add second call to all do-while tests to verify graph reuse
hughperkins 4a62b03
Use different values on second call in do-while tests
hughperkins bf9e9ba
Make threshold a runtime ndarray parameter in boolean done test
hughperkins ec0daca
Pass threshold as scalar int instead of ndarray
hughperkins 5a2a41a
Remove comment
hughperkins 2bdb112
Remove redundant test_graph_do_while_replay
hughperkins 955413d
Simplify changed-condition-ndarray test
hughperkins 3dfcf8a
Replace [None] with [()] in do-while tests
hughperkins f175378
Error instead of fallback when cuda_graph gets host-resident arrays
hughperkins 857b59f
Merge branch 1: error on host-resident arrays in cuda_graph
hughperkins b222dd1
Align autograd check and libcudadevrt error message with branch 3
hughperkins 339b084
Reorder use_graph_do_while declaration to match branch 3
hughperkins dd480d9
Fix clang-format indentation in QD_ERROR_IF
hughperkins 7206fcd
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins 7964d82
Fix macro parse error: avoid brace-init-list inside QD_ERROR_IF
hughperkins d2563b9
Skip graph_do_while tests on SM < 90
hughperkins 0c33d05
Revert "Skip graph_do_while tests on SM < 90"
hughperkins a5ebc33
xfail graph_do_while tests on SM < 90
hughperkins 32b1341
Add num_offloaded_tasks query for compiled kernels
hughperkins 2c1464b
Expose CUDA graph node count for test assertions
hughperkins 470b073
Add multi-func cuda graph test with 9 offloaded tasks
hughperkins 3079912
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins 70aac93
Change graph_do_while syntax from decorator param to in-kernel while …
hughperkins 791bab4
Remove implicit cuda_graph=True note from docs
hughperkins ee9178c
Require cuda_graph=True for graph_do_while instead of implicitly enab…
hughperkins 6623375
Update graph_do_while docstring to reflect SM 9.0+ only support
hughperkins b2d375e
Add tests for graph_do_while syntax errors
hughperkins 6748e17
Apply black formatting to ast_transformer.py
hughperkins 2996cb9
Fix offloaded tasks assertions to use >= for x64 ndarray compatibility
hughperkins e6d5adc
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins b39c3a9
Fix cuda graph tests: derive expected node count from offloaded tasks
hughperkins d750b1d
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins 3497560
Add graph_do_while to public API test list
hughperkins 31b73a6
Merge branch 'main' into hp/cuda-graph-mvp-1-graph-build
hughperkins 34de2b4
Merge branch 'hp/cuda-graph-mvp-1-graph-build' into hp/cuda-graph-mvp…
hughperkins a32efba
Merge origin/main into hp/cuda-graph-mvp-2-graph-while
hughperkins 1bdd202
Fix end-of-file newline in env.sh
hughperkins df0f753
Remove env.sh from git and add to .gitignore
hughperkins 8d10a35
Merge remote-tracking branch 'origin/main' into hp/cuda-graph-mvp-2-g…
hughperkins
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -96,3 +96,4 @@ imgui.ini` |
| | | `stubs/` |
| | | `CHANGELOG.md` |
| | | `python/quadrants/_version.py` |
| | | `env.sh` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -73,8 +73,18 @@ PER_CUDA_FUNCTION(import_external_semaphore, cuImportExternalSemaphore,CUexterna` |
| | | `// Graph management` |
| | | `PER_CUDA_FUNCTION(graph_create, cuGraphCreate, void **, uint32);` |
| | | `PER_CUDA_FUNCTION(graph_add_kernel_node, cuGraphAddKernelNode, void **, void *, const void *, std::size_t, const void *);` |
| | | `PER_CUDA_FUNCTION(graph_add_node, cuGraphAddNode, void **, void *, const void *, std::size_t, void *);` |
| | | `PER_CUDA_FUNCTION(graph_instantiate, cuGraphInstantiate, void **, void *, void *, char *, std::size_t);` |
| | | `PER_CUDA_FUNCTION(graph_launch, cuGraphLaunch, void *, void *);` |
| | | `PER_CUDA_FUNCTION(graph_destroy, cuGraphDestroy, void *);` |
| | | `PER_CUDA_FUNCTION(graph_exec_destroy, cuGraphExecDestroy, void *);` |
| | | `PER_CUDA_FUNCTION(graph_conditional_handle_create, cuGraphConditionalHandleCreate, void *, void *, void *, uint32, uint32);` |
| | | |
| | | `// JIT linker (for loading condition kernel with cudadevrt)` |
| | | `PER_CUDA_FUNCTION(link_create, cuLinkCreate_v2, uint32, void *, void *, void **);` |
| | | `PER_CUDA_FUNCTION(link_add_data, cuLinkAddData_v2, void *, uint32, void *, std::size_t, const char *, uint32, void *, void *);` |
| | | `PER_CUDA_FUNCTION(link_add_file, cuLinkAddFile_v2, void *, uint32, const char *, uint32, void *, void *);` |
| | | `PER_CUDA_FUNCTION(link_complete, cuLinkComplete, void *, void **, std::size_t *);` |
| | | `PER_CUDA_FUNCTION(link_destroy, cuLinkDestroy, void *);` |
| | | `PER_CUDA_FUNCTION(module_load_data, cuModuleLoadData, void **, const void *);` |
| | | `// clang-format on` |

Review comment (Collaborator, Author) on the `link_create` line: Let's have some comments describing each of these.
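The diff shows only entries in the `PER_CUDA_FUNCTION` list, not how the macro is expanded. As a rough illustration of the X-macro pattern that such lists commonly rely on, the sketch below declares one typed function-pointer slot per entry and collects the driver symbol names. All names in it (`DEMO_DRIVER_FUNCTIONS`, `DriverFunction`, `DemoDriver`) are hypothetical and not taken from this project. It does not load or call the CUDA driver, and the project's real expansion may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver function list.
// Each entry: (member name, driver symbol, argument types...).
#define DEMO_DRIVER_FUNCTIONS(X)                          \
  X(graph_create, cuGraphCreate, void **, std::uint32_t)  \
  X(graph_launch, cuGraphLaunch, void *, void *)          \
  X(graph_destroy, cuGraphDestroy, void *)

// A typed slot holding the symbol name and a pointer that a loader
// (e.g. dlsym) would fill in later. Assumption: an int status is returned.
template <typename... Args>
struct DriverFunction {
  const char *symbol_name;
  int (*fn)(Args...);

  explicit DriverFunction(const char *name) : symbol_name(name), fn(nullptr) {}

  int operator()(Args... args) const {
    assert(symbol_name != nullptr);
    return fn ? fn(args...) : -1;  // -1 here means "symbol not loaded"
  }
};

struct DemoDriver {
  // Expand the list once to declare one member per driver function.
#define DECLARE_MEMBER(name, symbol, ...) \
  DriverFunction<__VA_ARGS__> name{#symbol};
  DEMO_DRIVER_FUNCTIONS(DECLARE_MEMBER)
#undef DECLARE_MEMBER

  // Expand the same list again to enumerate the symbols to resolve.
  std::vector<std::string> symbol_names() const {
    std::vector<std::string> out;
#define COLLECT_NAME(name, symbol, ...) out.push_back(name.symbol_name);
    DEMO_DRIVER_FUNCTIONS(COLLECT_NAME)
#undef COLLECT_NAME
    return out;
  }
};
```

With a layout like this, adding a driver entry point is a one-line change to the list, which matches the shape of the additions in the hunk above.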
raise error if not in cuda graph already