
docs: README important-first + architecture SVG + validated MI350/FlyDSL content#2

Merged
jhinpan merged 2 commits into main from docs/mi350-flydsl-refresh
Jun 11, 2026
@jhinpan jhinpan commented Jun 8, 2026


What & why

Restructures the README so the important stuff leads and acknowledgements come last, adds an architecture SVG, and surfaces our first-party MI350X-validated MI350/FlyDSL content (which wasn't in the wiki yet).

README (important-first)

New order: what-it-is → Hardware Scope (MI350/gfx950) → Validated on real silicon → What's Here → Install/Query → Architecture (+ SVG) → Maintenance/Quality Gates → License → Acknowledgements & Citation (moved to the very end).

Architecture diagram

  • docs/architecture.svg: hand-authored three-layer flow (sources/ → wiki/ → queries/, gated by data/ + scripts/), embedded in the README Architecture section. Renders on GitHub.

Validated MI350 / FlyDSL content (silicon-measured)

  • sources/refs/ref-flydsl-kernel-profiling.md — new source anchor for the rocprofv3 ATT sweep + GitHub Pages dashboard: 17 FlyDSL kernels profiled on MI350X (ROCm 7.2) vs AITER/CK/hipBLASLt.
  • wiki/kernels/flydsl-flash-attention.md — new page filling a real gap (FlyDSL FA had no page). Generic vs gfx950 dual-wave software-pipelined kernel, the #225→#334→#462→#629→#661 (layout MMA-atom API) arc, measured ~0.92× vs CK-tile, register-pressure-capped occupancy.
  • wiki/languages/flydsl.md — adds the FA / MMA-atom-API note and a "Measured on MI350X" section; FA added to kernel_types/related/sources.
  • data/tags.yaml: adds profiling, rocprofv3, kernel-profiling, register-pressure misc tags.
  • index.md — lists the new page + a silicon-validation pointer.
  • Regenerated queries/*.md.

Validation

  • python3 scripts/validate.py → 0 errors (7541 pages)
  • python3 tests/test_validate.py → all pass
  • python3 scripts/generate-indices.py → clean (indices committed)
  • docs/architecture.svg → well-formed XML
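
The well-formedness check on the SVG can be reproduced with the Python standard library. The following is a minimal sketch, not the command actually run for this PR; the helper name `is_well_formed_svg` and the sample strings (including the font name) are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def is_well_formed_svg(text: str) -> bool:
    """Return True if text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # ElementTree spells namespaced tags as "{namespace}local-name".
    return root.tag in ("svg", f"{{{SVG_NS}}}svg")


# Hypothetical sample documents, not the contents of docs/architecture.svg.
good = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text font-family="\'Fira Code\', monospace">wiki</text>'
    "</svg>"
)
bad = '<svg xmlns="http://www.w3.org/2000/svg"><text>wiki</svg>'  # <text> never closed

print(is_well_formed_svg(good))
print(is_well_formed_svg(bad))
```

To check the real file, read it with `pathlib.Path("docs/architecture.svg").read_text()` and pass the result to the helper.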

🤖 Generated with Claude Code


Summary by Sourcery

Restructure the README to foreground hardware scope, MI350X validation details, architecture overview, and usage, move acknowledgements to the end, and add an embedded architecture diagram. Add first-party MI350X profiling content for FlyDSL, including a new flash-attention kernel page, a profiling reference repo anchor, and updated FlyDSL language docs and indices. Extend tag vocabulary for profiling-related concepts and regenerate query indices to surface the new FlyDSL flash-attention and profiling content across hardware-feature, technique, language, and kernel-type views.

New Features:

  • Introduce a new FlyDSL flash-attention kernel wiki page documenting the generic implementation and gfx950 dual-wave fast path with MI350X performance claims.
  • Add a profiling reference page for the FlyDSL MI350X rocprofv3 ATT sweep and link it from the index, FlyDSL language page, and related query indices.
  • Embed a new SVG architecture diagram describing the sources → wiki → queries pipeline and reference it from the README architecture section.

Enhancements:

  • Reorganize README sections so hardware scope, silicon validation, feature overview, install/query usage, architecture, and maintenance/quality gates are emphasized before acknowledgements and citation.
  • Clarify MI350X validation claims, including runnable examples and hardware re-grounding details, and surface them in both README and the wiki index overview.
  • Update FlyDSL language documentation with flash-attention coverage, MMA-atom API notes, and a "Measured on MI350X" performance summary.
  • Expand query indices by hardware feature, technique, language, and kernel type to include the new FlyDSL flash-attention kernel and profiling reference.
  • Summarize supporting data/ files in README to include hardware verification metadata and clarify optional ROCM_WIKI_ROOT usage.

Documentation:

  • Document the new architecture diagram and three-layer design in README, alongside updated descriptions of contents, tools, and maintenance/quality gates.
  • Extend user-facing FlyDSL documentation with sections on flash attention, the MMA-atom API, and MI350X performance findings from the profiling sweep.

Chores:

  • Extend the controlled vocabulary in data/tags.yaml with profiling-related tags for use across the knowledge base.

…face validated MI350/FlyDSL content

README restructure (important-first; Acknowledgements moved to the very end):
- Lead with what-it-is → Hardware Scope (MI350/gfx950) → "Validated on real silicon"
  → What's Here → Install/Query → Architecture → Maintenance/Quality → License → Acknowledgements.
- Embed a hand-authored architecture diagram (docs/architecture.svg) in the Architecture section.

Validated MI350/FlyDSL content (first-party, MI350X silicon):
- New source anchor sources/refs/ref-flydsl-kernel-profiling.md — the rocprofv3 ATT sweep
  + GitHub Pages dashboard (17 kernels, AITER/CK/hipBLASLt baselines, ROCm 7.2).
- New wiki page wiki/kernels/flydsl-flash-attention.md — generic vs gfx950 dual-wave SWP,
  PR arc #225→#334→#462→#629→#661 (layout MMA-atom API), measured ~0.92x vs CK-tile
  (register-pressure-capped occupancy).
- Augment wiki/languages/flydsl.md with the FA/atom-API note + a "Measured on MI350X" section.
- data/tags.yaml: add profiling/rocprofv3/kernel-profiling/register-pressure misc tags.
- index.md: list the new page + a silicon-validation pointer.
- Regenerated queries/*.md indices.

validate.py: 0 errors. tests/test_validate.py: pass. generate-indices.py: clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sourcery-ai Bot commented Jun 8, 2026


Reviewer's Guide

Restructures the README to front-load hardware scope, MI350X silicon validation, and architecture details (including a new SVG diagram), and adds first-party MI350X-validated FlyDSL/flash-attention content plus corresponding tags and query index updates across the documentation set.

File-Level Changes

Change Details Files
Reordered and expanded README content to emphasize hardware scope, MI350X silicon validation, and architecture (with embedded SVG), while moving acknowledgements to the end and tightening maintenance/quality-gate descriptions.
  • Updated opening description to clarify the project as an agent-queryable Claude Code skill and adjusted wording of the knowledge cutoff block.
  • Moved Hardware Scope section near the top and added explicit note on gfx942 vs gfx950 FP8 incompatibility.
  • Introduced a detailed "Validated on real silicon" section summarizing MI350X verification, runnable examples, and FlyDSL profiling sweep results.
  • Reworked "What's Here" to reflect updated page counts, reference-repo studies, and runnable examples, and referenced VERIFICATION.md explicitly.
  • Embedded a new architecture SVG under the Architecture section and replaced bullet-style architecture description with a centered image and concise bullet list.
  • Simplified maintenance tooling table (removed summarize-diffs script row) and reshaped supporting-files description to point to data/ and references/ directories collectively.
  • Renamed and scoped the Quality Gates section and clarified MI350X verification language while removing redundant wording.
  • Moved acknowledgements/citation content from the top to a new final section and slightly reworded project non-affiliation with AMD/ROCm.
Files: README.md
Added and wired in a new FlyDSL flash-attention kernel page documenting generic and gfx950 dual-wave fast paths, along with MI350X performance characterization and evolution history.
  • Created a new kernel page with full YAML frontmatter describing architectures, tags, kernel_types, languages, hardware_features, techniques, related pages, sources, and a performance_claims block for MI350X.
  • Documented the generic FlashAttention-2 kernel and the gfx950 dual-wave software-pipelined fast path, including dispatch conditions and architectural details (MFMA, LDS usage, wave scheduling).
  • Described the build pattern for FlyDSL kernels (Python+MLIR trace, compile-time metaprogramming) and provided an illustrative builder snippet.
  • Summarized the upstream PR evolution of the FlyDSL flash-attention kernel, including post-cutoff work annotated as such.
  • Explained MI350X performance vs CK-tile baseline and attributed the gap to register-pressure-limited occupancy, with pointers to relevant technique pages.
Files: wiki/kernels/flydsl-flash-attention.md
Extended the FlyDSL language page to mention flash-attention, the layout MMA-atom API, and MI350X profiling results.
  • Updated frontmatter to add flash-attention/attention kernel_types, related flash-attention kernel, and new source ref for FlyDSL profiling.
  • Added a section on flash attention and the MMA-atom API describing dual implementations (generic and gfx950 fast path) and the shift to a layout-based MMA-atom abstraction.
  • Introduced a "Measured on MI350X" section summarizing profiling results across FlyDSL kernels (wins, parity, and headroom cases) and diagnosing register-pressure and cross-lane reduction bottlenecks.
Files: wiki/languages/flydsl.md
Introduced a new reference entry describing the FlyDSL MI350X profiling study and dashboard, and linked it through the indices and index page.
  • Created a new sources/refs entry for FlyDSL kernel profiling with metadata (repo, URL, tags, architectures, languages, retrieved_at) and a narrative summary of method, verdicts, and key findings.
  • Linked the new reference from the site index banner and recommended kernel list to surface the FlyDSL flash-attention page and silicon validation story.
  • Regenerated language-based indices to include the new reference entry wherever FlyDSL-related refs are aggregated.
Files: sources/refs/ref-flydsl-kernel-profiling.md, index.md, queries/by-language.md
Updated query indices to register the new FlyDSL flash-attention kernel under appropriate hardware features, techniques, and kernel types.
  • Added the FlyDSL flash-attention kernel to by-hardware-feature lists (e.g., mfma, vgpr) so it shows up under MFMA/occupancy-related queries.
  • Added it under multiple technique buckets (direct-to-lds, software-pipelining, vgpr-budgeting, wave-reduce, etc.) in by-technique indices.
  • Registered the kernel under flash-attention/attention-related kernel-type groupings in by-kernel-type, alongside the CK-tile and other attention kernels.
Files: queries/by-hardware-feature.md, queries/by-technique.md, queries/by-kernel-type.md
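
The query indices described above are effectively inverted indices over page frontmatter. The sketch below shows one way such a grouping could be built; it is an illustration under assumptions, not the logic of scripts/generate-indices.py. The function name `build_index` is hypothetical, and the frontmatter values for wiki/languages/flydsl.md are made up for the example.

```python
from collections import defaultdict


def build_index(pages, field):
    """Map each value of a list-valued frontmatter field to sorted page paths."""
    index = defaultdict(list)
    for path, frontmatter in pages.items():
        for value in frontmatter.get(field, []):
            index[value].append(path)
    # Sort keys and paths so regenerated output is stable across runs.
    return {key: sorted(paths) for key, paths in sorted(index.items())}


# Illustrative frontmatter only; the real pages carry more fields and values.
pages = {
    "wiki/kernels/flydsl-flash-attention.md": {
        "hardware_features": ["mfma", "vgpr"],
        "techniques": ["software-pipelining", "vgpr-budgeting"],
    },
    "wiki/languages/flydsl.md": {
        "hardware_features": ["mfma"],
    },
}

by_feature = build_index(pages, "hardware_features")
by_technique = build_index(pages, "techniques")
print(by_feature)
```

Sorting both keys and paths keeps the generated files deterministic, which is what lets a "clean" regeneration be checked with a plain diff.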
Extended the controlled vocabulary and tagging to capture profiling- and register-pressure-related concepts used by the new content.
  • Added new misc tags for profiling, rocprofv3, kernel-profiling, and register-pressure to the tag schema.
  • Ensured these tags are used in the new FlyDSL profiling reference and FlyDSL flash-attention kernel frontmatter so they validate cleanly.
Files: data/tags.yaml
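
A controlled vocabulary only helps if every page's tags are checked against it. The following sketch shows that check in isolation. It assumes frontmatter has already been parsed into a dict, and the names `ALLOWED_TAGS` and `unknown_tags` are hypothetical rather than taken from scripts/validate.py.

```python
# The four misc tags added by this PR; the real data/tags.yaml holds many more.
ALLOWED_TAGS = {"profiling", "rocprofv3", "kernel-profiling", "register-pressure"}


def unknown_tags(frontmatter, allowed):
    """Return the sorted tags on a page that are not in the controlled vocabulary."""
    return sorted(set(frontmatter.get("tags", [])) - set(allowed))


# Hypothetical frontmatter for two pages, one with a tag outside the vocabulary.
ok_page = {"tags": ["profiling", "rocprofv3"]}
bad_page = {"tags": ["profiling", "reg-pressure"]}

print(unknown_tags(ok_page, ALLOWED_TAGS))
print(unknown_tags(bad_page, ALLOWED_TAGS))
```

A validator built this way would report any non-empty result as an error, so a new tag has to land in the vocabulary in the same change that first uses it.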


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • In kernel-flydsl-flash-attention.md, the performance_claims.value field mixes a numeric and qualitative label ("~0.92x (HEADROOM)"); consider keeping this strictly machine-parseable (e.g., 0.92 or "~0.92") and representing the HEADROOM bucket in a separate field or tag for downstream tooling.
  • The MI350X/FlyDSL profiling summary is now spread across the README, lang-flydsl, kernel-flydsl-flash-attention, and ref-flydsl-kernel-profiling; you may want to pick one page (e.g., the new ref) as the single canonical source of detailed numbers and keep the others as shorter pointers to avoid future drift.
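
The first suggestion can be made concrete by splitting the mixed string into separate fields. This is a minimal sketch of one possible split; the output field names `approximate` and `bucket` are hypothetical and are not part of the wiki's frontmatter schema.

```python
import re

# Matches strings such as "~0.92x (HEADROOM)", "0.92x", or "0.92".
CLAIM = re.compile(
    r"^~?(?P<ratio>\d+(?:\.\d+)?)x?\s*(?:\((?P<bucket>[A-Z]+)\))?$"
)


def split_claim(value):
    """Split a mixed performance claim into a numeric ratio and an optional label."""
    text = value.strip()
    match = CLAIM.match(text)
    if match is None:
        raise ValueError(f"unrecognized performance claim: {value!r}")
    return {
        "value": float(match.group("ratio")),
        "approximate": text.startswith("~"),
        "bucket": match.group("bucket"),  # None when no label is present
    }


print(split_claim("~0.92x (HEADROOM)"))
```

With the ratio stored as a number, downstream tooling can sort or threshold on it directly, while the qualitative label stays available for display.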


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces comprehensive documentation and profiling data for FlyDSL kernels on real AMD Instinct MI350X silicon (gfx950, ROCm 7.2). Key additions include a new reference page for the FlyDSL kernel profiling sweep, a synthesized wiki page detailing the FlyDSL Flash Attention generic and dual-wave fast paths, and an architecture diagram (docs/architecture.svg) explaining the three-layer structure of the wiki. The README, index, and query indices have been updated accordingly. Feedback on the changes suggests quoting font family names with spaces in the SVG file, removing synthesized wiki pages from the sources metadata list in the new Flash Attention page to maintain architectural separation, and correcting a URL-encoding typo (seq%256==0) in the frontmatter metadata.
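
The exact typo is not shown in this conversation, so the following only illustrates why a condition like `seq%256==0` is fragile wherever percent-decoding may be applied: the literal `%` followed by two hex digits reads as an escape sequence. The variable names are hypothetical.

```python
from urllib.parse import quote, unquote

condition = "seq%256==0"

# A percent-decoder treats "%25" as an escaped "%", silently changing the modulus.
decoded = unquote(condition)

# Encoding the literal "%" first makes the round trip lossless.
encoded = quote(condition, safe="=")
round_trip = unquote(encoded)

print(decoded)
print(encoded)
print(round_trip)
```

Keeping such conditions out of URL-like contexts, or storing them as plain quoted strings in frontmatter, avoids the ambiguity.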


Comment thread docs/architecture.svg Outdated
Comment thread wiki/kernels/flydsl-flash-attention.md Outdated
Comment thread wiki/kernels/flydsl-flash-attention.md Outdated

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


@jhinpan jhinpan merged commit 27e8218 into main Jun 11, 2026
3 checks passed