
Pr/llm kernel assistant #348

Open
gujialiang123 wants to merge 5 commits into main from pr/llm-kernel-assistant

Conversation

@gujialiang123
Collaborator

Summary
Adds an optional AI kernel debug assistant to the visualizer: users can run an initial trace-aware analysis, then ask follow-up questions grounded in recorded ops, source snippets, and a compact compute trace. Configuration follows the rest of the stack (local JSON, env, and programmatic setup) without putting secrets in the UI.

What’s included
Backend (Flask)

- GET/POST /api/llm/config — inspect / merge effective settings (no API key echoed).
- POST /api/llm/analyze — first-pass bug-oriented summary over trace + code context.
- POST /api/llm/chat — follow-up Q&A after analysis is ready.
- Supporting endpoints: records snapshot, single record, prompt template preview.

Client & config

- OpenAI-compatible Chat Completions client (llm_utils.py).
- Layered config: defaults → llm_config.local.json → optional setup_llm(config_path=...) → env (TRITON_VIZ_LLM_*, OPENAI_API_KEY) → setup_llm(**kwargs) / POST /api/llm/config.
- Package exports: setup_llm, setup_llm_from_file, clear_llm_setup, LLM_SETUP_KEYS.
- CLI: optional --llm-api-key / --llm-base-url before launch.
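The layered precedence can be pictured as a simple "later layers win" dict merge. The sketch below is purely illustrative — `merge_llm_config`, the layer variable names, and the skip-empty rule are assumptions, not the actual triton_viz implementation:

```python
def merge_llm_config(*layers):
    """Merge config layers; later layers win, but None/"" never overrides."""
    merged = {}
    for layer in layers:
        for key, value in (layer or {}).items():
            if value not in (None, ""):
                merged[key] = value
    return merged

# Hypothetical layers, lowest precedence first (mirrors the list above).
DEFAULTS = {"base_url": "https://api.openai.com/v1", "model": "default-model"}
local_json = {"model": "model-from-local-json"}
env = {"api_key": "sk-from-env"}
runtime_kwargs = {"model": "model-from-setup-llm"}

effective = merge_llm_config(DEFAULTS, local_json, env, runtime_kwargs)
# The runtime model wins, while the env api_key and default base_url survive.
```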
Frontend

- Floating AI Assistant panel in index.html: Start analysis, chat input, English copy only for LLM UI strings.

Prompts

- visualizer/prompts/system_default.md — English system instructions focused on concrete kernel debugging.

Examples

- examples/LLMtest/ — small intentionally buggy kernels + README for smoke-testing the assistant.

Repo hygiene

- .gitignore entries for local LLM config and optional debug log (e.g. llm_config.local.json, llm_chat_debug.jsonl).
How to try it

1. Configure API access (e.g. llm_config.local.json from llm_config.example.json, or triton_viz.setup_llm(...) before launch(), or env vars).
2. Run a traced script and triton_viz.launch(...).
3. Open AI Assistant → Start analysis → then ask questions.

See examples/LLMtest/README.md for minimal runnable examples.
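The steps above, in code form (an illustrative pseudocode sketch; the exact signatures of `setup_llm`, `trace`, and `launch` may differ — only `@triton_viz.trace(client=Tracer())` is confirmed by the examples in this PR):

```python
import triton_viz
from triton_viz.clients import Tracer  # import path assumed

# 1. Configure API access programmatically
#    (alternatively: llm_config.local.json or TRITON_VIZ_LLM_* env vars).
triton_viz.setup_llm(api_key="sk-...", model="...")

# 2. Trace a kernel and run it once so records are captured.
@triton_viz.trace(client=Tracer())
@triton.jit
def my_kernel(...):
    ...

# 3. Launch the visualizer, then in the UI:
#    AI Assistant → Start analysis → ask follow-up questions.
triton_viz.launch()
```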

Notes for reviewers

- No secrets in git: example config only; real keys stay local / env / runtime setup.
- POST /api/llm/config does not accept config_path (file path is Python-only via setup_llm).
- LLM UI and prompt templates are English-only by design.

Gujiang Liang added 2 commits March 20, 2026 00:30
- Flask routes: /api/llm/records, record, prompt, chat, analyze
- Record store, OpenAI-compatible client, prompt templates
- index.html: AI Assistant panel and wiring
- Ignore llm_config.local.json and llm_chat_debug.jsonl
- examples: fix @triton_viz.trace(client=Tracer()) for flip/matmul/histogram
- examples/LLMtest: small buggy kernels for assistant smoke tests

Made-with: Cursor
- Export setup_llm, setup_llm_from_file, clear_llm_setup, LLM_SETUP_KEYS
- CLI: optional --llm-api-key / --llm-base-url before launch
- Extended POST/GET /api/llm/config; layered config with config_path
- LLM chat panel strings in English
- LLMtest: inline _LL_CONFIG_PATH / _LL_API_KEY, drop preflight helper

Made-with: Cursor
@github-actions

github-actions bot commented Mar 20, 2026

Sanitizer Performance Benchmark

| Benchmark | main (min) | PR (min) | Change |
| --- | --- | --- | --- |
| gemm | 0.184s | 0.183s | -0.6% |
| gemm_oob | 0.193s | 0.191s | -0.7% |
| indirect_load | 0.294s | 0.295s | +0.2% |
| nested_loop | 0.375s | 0.371s | -1.1% |
| block_pointer_loop_advance | 0.187s | 0.187s | -0.1% |
| liger_jsd | 0.150s | 0.150s | -0.2% |
| flaggems_layernorm | 0.461s | 0.458s | -0.6% |
| swiglu | 0.184s | 0.184s | -0.1% |
| cross_entropy | 0.172s | 0.172s | -0.3% |
| fused_linear_jsd | 0.228s | 0.225s | -0.9% |
| **Total** | 2.429s | 2.416s | -0.5% |

Iterations: 1 warmup + 20 measured

@mark14wu
Collaborator

@codex review


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1aaa45d380


@mark14wu
Collaborator

Code review

Found 6 issues:

  1. Hardcoded developer filesystem path in example file. _LL_CONFIG_PATH is set to /home/jgu7/work/triton-viz/... instead of "" like the other two example files. This makes the example non-functional for all other users.

# Optional: visualizer LLM — set one or both before running.
_LL_CONFIG_PATH = "/home/jgu7/work/triton-viz/triton_viz/visualizer/llm_config.local.json" # e.g. "/path/to/llm.json" (same shape as llm_config.example.json)
_LL_API_KEY = "" # e.g. "sk-..." if the key is not in that file

  2. DEFAULT_MODEL = "gpt-5-mini" is not a real OpenAI model name. This will cause every default LLM call to fail with a model-not-found error. Likely should be "gpt-4o-mini" or another valid model.

DEFAULT_BASE_URL = "https://api.openai.com/v1"
DEFAULT_MODEL = "gpt-5-mini"
PROMPTS_DIR = os.path.join(os.path.dirname(__file__), "prompts")

  3. activeOpUuid property does not exist on the active block object. window.__tritonVizActiveBlock is an OpWorkspaceBlock instance which has no activeOpUuid field. As a result, window.__tritonVizCurrentOp is always undefined, and context-aware chat always receives uuid: null, defeating the "current selected op" feature.

const active = window.__tritonVizActiveBlock;
if (active && active.activeOpUuid) {
    window.__tritonVizCurrentOp = active.activeOpUuid;
}

  4. JSON truncation at character boundaries produces malformed JSON. text[:LLM_SYS_CONTEXT_MAX_CHARS] (and similar) slices the serialized JSON string at an arbitrary character offset, producing syntactically invalid JSON that is sent to the LLM as context. The truncation should happen on the data structure before serialization, not on the serialized string.

if len(text) > LLM_SYS_CONTEXT_MAX_CHARS:
    text = text[:LLM_SYS_CONTEXT_MAX_CHARS]
return "Kernel run records summary (JSON, truncated due to size limit): " + text
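A sketch of the suggested fix — truncating the data before serialization so the context stays valid JSON. The record shape, the helper name `summarize_records`, and the limit value here are illustrative, not the PR's actual code:

```python
import json

LLM_SYS_CONTEXT_MAX_CHARS = 8000  # illustrative budget

def summarize_records(records):
    """Drop whole records from the tail until the serialized JSON fits,
    so the string handed to the LLM always parses."""
    kept = list(records)
    text = json.dumps(kept)
    while kept and len(text) > LLM_SYS_CONTEXT_MAX_CHARS:
        kept.pop()  # remove one complete record, re-serialize, re-check
        text = json.dumps(kept)
    return "Kernel run records summary (JSON, truncated due to size limit): " + text
```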

  5. Unauthenticated POST /api/llm/config endpoint accepts api_key. There is no authentication or CSRF protection. When share=True, the visualizer is network-accessible, allowing any reachable host to overwrite the in-memory API key.

            "model": cfg.model,
            "timeout_sec": cfg.timeout_sec,
            "max_tokens": cfg.max_tokens,
            "debug_log_enabled": cfg.debug_log_enabled,
            "llm_setup_file": setup_basename,
            "allowed_keys": sorted(LLM_SETUP_KEYS),
        }
    )

@app.route("/api/llm/config", methods=["POST"])
def post_llm_config():
    """
    Merge JSON fields into the in-memory LLM setup (same as ``triton_viz.setup_llm``).
    Allowed keys: same as ``LLM_SETUP_KEYS`` (see GET response). Does **not** accept
    ``config_path`` (use ``setup_llm(config_path=...)`` in Python before ``launch``).
    Pass ``null`` or ``\"\"`` for string fields to clear that patch entry.
    """
    data = request.json or {}
    payload = {k: data[k] for k in data if k in LLM_SETUP_KEYS}
    if not payload:
        return (
            jsonify(
                {
                    "error": f"Provide at least one of: {sorted(LLM_SETUP_KEYS)}",

  6. debug_log_path accepted via unauthenticated HTTP endpoint. The POST /api/llm/config endpoint also accepts debug_log_path, which calls os.makedirs and open() on the user-supplied path. This allows unauthenticated callers to create directories and write files at arbitrary filesystem locations.

        return value.strip().lower() in {"1", "true", "yes", "on"}
    return False

def _resolve_debug_log_path(path_value: Any) -> str:
    text = str(path_value or "").strip()
    if not text:
        return os.path.join(os.path.dirname(__file__), DEFAULT_DEBUG_LOG_NAME)
    if os.path.isabs(text):
        return text
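One way to close the arbitrary-write hole is to confine the resolved path to a fixed base directory by keeping only the filename. A sketch under stated assumptions — `BASE_DIR`, the helper name, and the escape check are illustrative, not the PR's code:

```python
import os

# Pin all debug logs under the module's own directory (illustrative choice).
BASE_DIR = os.path.realpath(os.path.dirname(os.path.abspath(__file__)))
DEFAULT_DEBUG_LOG_NAME = "llm_chat_debug.jsonl"

def resolve_debug_log_path(path_value):
    """Keep only the basename of the user-supplied value and resolve it
    under BASE_DIR; reject anything that still escapes the directory."""
    name = os.path.basename(str(path_value or "").strip()) or DEFAULT_DEBUG_LOG_NAME
    resolved = os.path.realpath(os.path.join(BASE_DIR, name))
    if os.path.dirname(resolved) != BASE_DIR:
        raise ValueError("debug_log_path escapes the allowed directory")
    return resolved
```

With this shape, a payload like `"../../etc/passwd"` collapses to a file named `passwd` inside `BASE_DIR` instead of touching `/etc`.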

🤖 Generated with Claude Code

