feat: add dynamic kernel workload tracing (trace-kernel)#11
Open
irvineoy wants to merge 6 commits into
Add trace-kernel CLI support for patching Triton launches and Python-visible custom op wrappers, collecting JSONL workload metadata, and aggregating workload ranges. Implement local/Docker overlay injection, container-source patching for Docker benchmarks, Agent fallback wiring, and unit coverage for the required repo cases.
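As a rough illustration of how Triton launch sites could be located before patching, the sketch below walks a module's AST and reports every call whose callee is a subscript, which is the shape of a `kernel[grid](...)` launch. The helper name `find_triton_launches` and the sample source are hypothetical and are not taken from this PR's patcher.

```python
import ast

# Hypothetical sample module containing one Triton-style launch.
SOURCE = '''
def run(x, y, n):
    grid = (n // 128,)
    add_kernel[grid](x, y, n, BLOCK=128)
    helper(x)
'''


def find_triton_launches(source: str) -> list[tuple[int, str]]:
    """Return (line number, kernel expression) for each `name[grid](...)` call."""
    launches = []
    for node in ast.walk(ast.parse(source)):
        # A launch is a Call whose callee is a Subscript: kernel[grid](...)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Subscript):
            launches.append((node.lineno, ast.unparse(node.func.value)))
    return sorted(launches)


print(find_triton_launches(SOURCE))
```

A real patcher would also need to rewrite the matched call (for example, to record argument shapes before launching), which this sketch leaves out.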
Make module_import trace events unsampled so overlay activation can be reliably diagnosed even with low sample rates. Route aiter-compile-ops tracing through the central aiter.jit.core.compile_ops hook instead of patching only high-level wrapper files. Add static patch coverage for both aiter ctypes and pybind wrapper paths, plus trace-all support for discovering real low-level op names. Extend tracing tests and document trace-kernel usage, Docker overlay behavior, result interpretation, and runnable examples.
Render workload signatures and shape ranges as Markdown tables instead of dense inline dictionaries. Add postprocess coverage to keep the summary output readable for traced tensor input distributions.
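A minimal sketch of the kind of rendering described above, assuming shape ranges are held as a mapping from dimension name to `min`/`max` values. That data layout and the function name `ranges_to_markdown` are assumptions for illustration; the PR's postprocessor may structure its data differently.

```python
def ranges_to_markdown(ranges: dict[str, dict[str, int]]) -> str:
    """Render {dim: {"min": ..., "max": ...}} as a Markdown table."""
    header = "| dim | min | max |"
    separator = "| --- | --- | --- |"
    rows = [
        f"| {name} | {bounds['min']} | {bounds['max']} |"
        for name, bounds in sorted(ranges.items())
    ]
    return "\n".join([header, separator, *rows])


example = {"M": {"min": 1, "max": 4096}, "K": {"min": 128, "max": 128}}
print(ranges_to_markdown(example))
```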
        return line_idx


    def patch_triton_launch_file(
Collaborator
There are two patch_triton_launch_file definitions. Which one is the correct one?
    try:
        yield
    finally:
        os.environ.clear()
Collaborator
We might lose all environment variables here.
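One way to address this concern is to snapshot the environment before applying overrides and restore the snapshot right after clearing. The excerpt above stops at `os.environ.clear()`, so it is not visible whether the PR already restores a snapshot; the sketch below is an assumption about the intended behavior, and `temporary_env` is a hypothetical name.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(overrides: dict[str, str]):
    """Apply env overrides for the duration of the block, then restore."""
    saved = dict(os.environ)  # snapshot before touching anything
    os.environ.update(overrides)
    try:
        yield
    finally:
        # clear() alone would drop every variable; restore the snapshot too.
        os.environ.clear()
        os.environ.update(saved)
```

Because the restore runs in `finally`, the original environment comes back even if the traced workload raises.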
    from pathlib import Path


    RUNTIME_SOURCE = r'''
Collaborator
Is this the same code as serializer.py? Can we consolidate them?
    def test_required_case_matrix_has_30_cases():
        assert len(TRACE_TEST_CASES) == 30
Collaborator
Is it OK to hard-code 30 here?
    from pathlib import Path
    from typing import Any

    import yaml
Collaborator
Let's add PyYAML to requirements.txt.
Add a generated supported-kernels registry and switch trace-kernel to resolve targets by kernel ID. Include list-trace-kernels for discovery, expand patchability tests across the registry, and document the new flow. Harden tracing for E2E workloads by preserving container module origins, skipping unsafe torch tracing proxies, supporting trace-all wrapper instrumentation, and separating any-event discovery from exact target hits. Also guard aiter compile_ops overlays against uncheckable annotations.
… refresh
- allow trace-kernel to accept repeated or comma-separated kernel IDs
- patch multiple static trace targets in one workload run
- refresh supported kernel registry from benchmark Docker images
- record source image provenance and relax local checkout validation
- add coverage for multi-target tracing and registry update flows
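Accepting both repeated and comma-separated kernel IDs can be sketched with `argparse` as below. The flag name `--kernel-id` and the helper `parse_kernel_ids` are assumptions made for illustration; the actual option names in `trace-kernel` are not shown on this page.

```python
import argparse


def parse_kernel_ids(argv: list[str]) -> list[str]:
    """Collect kernel IDs from repeated and/or comma-separated flags."""
    parser = argparse.ArgumentParser()
    # action="append" gathers every occurrence of the (hypothetical) flag.
    parser.add_argument("--kernel-id", action="append", default=[])
    args = parser.parse_args(argv)

    ids: list[str] = []
    for value in args.kernel_id:
        for part in value.split(","):
            part = part.strip()
            # Skip empty pieces and duplicates, keeping first-seen order.
            if part and part not in ids:
                ids.append(part)
    return ids


print(parse_kernel_ids(["--kernel-id", "a,b", "--kernel-id", "b, c"]))
```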
Extend trace-kernel with a --disable-benchmark-cuda-graph option that rewrites the selected InferenceX benchmark script into a no-cudagraph overlay for Docker benchmark runs. SGLang launch scripts now get --disable-cuda-graph and --disable-piecewise-cuda-graph, while vLLM launch scripts get --enforce-eager. The generated script is bind-mounted through the existing docker wrapper so Magpie and InferenceX stay read-only.

Move trace_raw permission setup and benchmark-script override handling into the runner, keep stdout JSON output stable, and add the compact trace summary on stderr. Always generate target_kernel_tensor_shapes.json during trace postprocessing, preserving the broader workload_ranges.json while adding a target-kernel-oriented view for shape analysis.

Add a checked-in DeepSeek R1 multi-kernel tracing example script and update README coverage for the new outputs and options. Tests cover SGLang and vLLM script rewrites, docker extra mounts, CLI dry-run propagation, target shape artifact generation, and the new example script syntax.
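The script rewrite described above might look roughly like the following. This is a sketch under stated assumptions, not the PR's implementation: it assumes the launch command can be recognized by the substrings `sglang.launch_server` or `vllm serve`, and the names `EXTRA_FLAGS` and `disable_cuda_graph` are hypothetical. Only the flags themselves come from the commit message.

```python
# Marker substring (assumed) -> flags named in the commit message.
EXTRA_FLAGS = {
    "sglang.launch_server": ["--disable-cuda-graph", "--disable-piecewise-cuda-graph"],
    "vllm serve": ["--enforce-eager"],
}


def disable_cuda_graph(script: str) -> str:
    """Return a copy of the launch script with no-cudagraph flags added."""
    for marker, flags in EXTRA_FLAGS.items():
        tokens = script.split()
        # Only add flags that are not already present as whole tokens.
        missing = [flag for flag in flags if flag not in tokens]
        if marker in script and missing:
            # Insert right after the marker so line continuations stay intact.
            script = script.replace(marker, f"{marker} {' '.join(missing)}", 1)
    return script


original = "python3 -m sglang.launch_server \\\n  --model-path m\n"
print(disable_cuda_graph(original))
```

Writing the result to a separate file and bind-mounting it, as the commit describes, keeps the original benchmark script untouched.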
Summary
This PR adds the initial `trace-kernel` workflow for dynamic kernel workload tracing.

The new workflow can temporarily patch Python-visible kernel launch sites or wrapper calls, run a tracing workload, collect JSONL workload metadata, and aggregate the observed shapes/flags into workload ranges for later kernel optimization.
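To make the aggregation step concrete, here is a minimal sketch that folds JSONL trace events into per-kernel shape ranges. The record schema (a `kernel` name plus a `shapes` mapping from argument name to shape list) and the function name `aggregate_shape_ranges` are assumptions for illustration; the PR's actual JSONL fields are not shown on this page. The sketch also assumes each argument keeps the same rank across events.

```python
import json


def aggregate_shape_ranges(jsonl_lines):
    """Fold JSONL events into {kernel: {arg: {"min": [...], "max": [...]}}}."""
    ranges = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the trace file
        event = json.loads(line)
        per_kernel = ranges.setdefault(event["kernel"], {})
        for arg, shape in event["shapes"].items():
            bounds = per_kernel.setdefault(
                arg, {"min": list(shape), "max": list(shape)}
            )
            # Element-wise min/max over each dimension of the shape.
            bounds["min"] = [min(a, b) for a, b in zip(bounds["min"], shape)]
            bounds["max"] = [max(a, b) for a, b in zip(bounds["max"], shape)]
    return ranges


events = [
    '{"kernel": "my_kernel", "shapes": {"q": [1, 128]}}',
    '{"kernel": "my_kernel", "shapes": {"q": [8, 64]}}',
]
print(aggregate_shape_ranges(events))
```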
Key changes
- `workload_optimizer.py trace-kernel` subcommand with full CLI argument parsing
- `pipeline/kernel_tracing/` package: runtime, serializer, patchers, overlay runner, postprocessor, mode detection, and agent fallback harness
- Instrumentation of `kernel[grid](...)` calls via AST patching

New files (15 total)
- `.gitignore`: `results_*/` pattern
- `pipeline/kernel_tracing/__init__.py`
- `pipeline/kernel_tracing/agent_harness.py`
- `pipeline/kernel_tracing/mode_detection.py`
- `pipeline/kernel_tracing/overlay.py`
- `pipeline/kernel_tracing/patch_triton.py`
- `pipeline/kernel_tracing/patch_wrapper.py`
- `pipeline/kernel_tracing/postprocess.py`
- `pipeline/kernel_tracing/runner.py`
- `pipeline/kernel_tracing/runtime.py`
- `pipeline/kernel_tracing/serializer.py`
- `pipeline/kernel_tracing/test_cases.py`
- `tests/test_kernel_tracing.py`
- `tests/test_kernel_tracing_cases.py`
- `workload_optimizer.py`: `trace-kernel` subcommand + arg parser

Validation
- `trace-kernel` smoke test passed
- `kernel_unified_attention_2d`

Notes