From 71fb02689ccd082c5c9f9f72e7255b65744b8589 Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 11:08:02 -0800
Subject: [PATCH 1/9] BPF CI Bot: Make the agent prompt more structured

---
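Note on the Error Handling table added below: the lore-search row
describes a fallback chain (semcode, then `lei`, then `git log --grep`)
with a cap of 3 attempts per query across all methods. A minimal Python
sketch of one way to read that rule follows. The function name
`search_with_fallback`, the `(name, callable, tries)` tuple layout, and
the stub search functions are hypothetical and exist only for
illustration; the real agent dispatches tool calls rather than Python
callables.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (method name, search callable, tries allowed for this method) -- hypothetical layout
Method = Tuple[str, Callable[[str], List[str]], int]


def search_with_fallback(
    query: str, methods: Sequence[Method], max_attempts: int = 3
) -> Tuple[Optional[List[str]], List[str]]:
    """Try each method in order; stop at the first non-empty result.

    The attempt budget is shared across all methods, which is one
    possible interpretation of "max 3 attempts total per query".
    """
    attempts = 0
    log: List[str] = []
    for name, search, tries in methods:
        for _ in range(tries):
            if attempts >= max_attempts:
                return None, log  # budget exhausted: record and move on
            attempts += 1
            try:
                results = search(query)
            except Exception as exc:  # any tool error counts as a failed attempt
                log.append(f"{name}: error ({exc})")
                continue
            if results:
                log.append(f"{name}: ok")
                return results, log
            log.append(f"{name}: empty")
    return None, log


# Stubs standing in for the real tools (assumptions, not real APIs).
def semcode_down(query: str) -> List[str]:
    raise RuntimeError("down")


def lei_ok(query: str) -> List[str]:
    return ["msg1"]


results, log = search_with_fallback(
    "bpf ci flaky", [("semcode", semcode_down, 2), ("lei", lei_ok, 1)]
)
print(results, log)
```

Under this reading the budget is spent in order, so a method listed
late in the chain is only reached if earlier ones leave attempts unused.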
 ci/claude/bpf-ci-agent.md | 346 +++++++++++++++++++++++++++++++++++---
 1 file changed, 320 insertions(+), 26 deletions(-)

diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md
index e27b3db3..f350bce7 100644
--- a/ci/claude/bpf-ci-agent.md
+++ b/ci/claude/bpf-ci-agent.md
@@ -12,14 +12,14 @@ Current directory is the root of the Linux Kernel source repository
 (bpf-next) at the latest revision with full git history.
 
 You have access to:
-- BPF CI worklow job logs accessible via GitHub
+- BPF CI workflow job logs accessible via GitHub
   - You should have access to github cli (gh) and github tools via MCP
   - BPF CI workflows run in `kernel-patches/bpf` GitHub repository
 - semcode tools and database with
   - indexed Linux source code for efficient search
   - indexed lore archive of email discussions from BPF mailing list
-  - semcode lore search may be unreliable; use lei (local email
-    interface) command line tool as a fallback
+  - semcode lore search may be unreliable; see the error handling
+    table below for fallback procedures
 - You are free to access any other public information through GitHub
   CLI or web if useful: clone other repositories, examine PRs, issues
   etc.
@@ -46,7 +46,7 @@ When running code, such as executing selftests, make sure to build the
 kernel and use the vmtest tool (danobi/vmtest) to run the code in the
 context of that kernel.
 
-NOTES.md contains your own notes from the previous runs. Note that the
+NOTES.md contains your own notes from previous runs. Note that the
 environment you're running in may change between the runs.
 
 ## Guidelines
@@ -82,38 +82,332 @@ Your exploration should be driven by these principles:
     `kernel-patches/vmtest` GitHub issues, or if it has been addressed
     upstream. If so, discard it.
 
+---
+
 ## Protocol
 
-1. Explore BPF CI logs, recent email discussions in lore archive, and
-   the codebase to prepare a list of issues potentially interesting
-   right now.
-   - When reviewing lore archives during the exploration phase, don't
-     search for particular terms and be over-inclusive. Discussions
-     between developers and maintainers often contain hints about
-     potential improvements which may be worth looking into.
-2. Review the compiled list and pick a single issue to focus on.
-3. Do a thorough investigation of the issue, searching for the root
-   cause if it's a bug or CI failure, or exploring various approaches
-   if this is a potential quality/coverage improvement.
-4. Generate output covering this specific issue.
-
-## Output
+Follow these phases **in order**. Do not skip phases. Print the
+completion banner at the end of each phase before proceeding.
+
+### Phase 0: Load Context and Build Skip List
+
+Load your persistent state and existing issue tracker to avoid
+re-investigating known issues.
+
+**Step 0.1: Read NOTES.md**
+
+Read `NOTES.md` in the current directory. Extract:
+- Known flaky tests and their status (fixed, in-flight, open)
+- Known CI issues and their status
+- Any other context from previous runs
+
+If NOTES.md does not exist, proceed with an empty context.
+
+**Step 0.2: Check existing vmtest issues**
+
+Run:
+```
+gh issue list --repo kernel-patches/vmtest --state open --limit 50
+gh issue list --repo kernel-patches/vmtest --state closed --limit 30 \
+  --search "sort:updated-desc"
+```
+
+These two commands should be dispatched in parallel.
+
+**Step 0.3: Build skip list**
+
+Compile a skip list of issues that must NOT be re-investigated:
+- Issues already filed in `kernel-patches/vmtest` (open or recently
+  closed)
+- Issues marked as fixed or in-flight in NOTES.md
+- Issues with upstream fixes already merged
+
+The skip list is a table:
+
+| Issue | Source | Reason to skip |
+|-------|--------|----------------|
+| (name) | (vmtest#N / NOTES.md / upstream) | (already filed / fix merged / in-flight) |
+
+**Output:**
+```
+PHASE 0 COMPLETE: Context loaded
+  NOTES.md: <loaded N items | not found>
+  Open vmtest issues: <count>
+  Skip list entries: <count>
+```
+
+---
+
+### Phase 1: Gather Candidates
+
+Explore CI logs, lore archives, and the codebase to build a candidate
+list of issues worth investigating. Use parallel tool calls wherever
+possible within each step.
+
+**Step 1.1: Explore CI logs**
+
+Examine recent CI workflow runs for failures that appear across
+multiple independent PRs:
+
+```
+gh run list --repo kernel-patches/bpf --workflow vmtest \
+  --status failure --limit 20 --json databaseId,displayTitle,conclusion,createdAt
+```
+
+For the most recent 5–8 failed runs (covering independent PRs), fetch
+their job logs:
+
+```
+gh run view <run-id> --repo kernel-patches/bpf --log-failed 2>&1 | head -200
+```
+
+Dispatch these `gh run view` commands in parallel (up to 4 at a time).
+
+Look for:
+- Test names that fail across multiple independent PRs
+- Infrastructure failures (VM boot, network, timeout) vs test failures
+- Patterns in failure messages
+
+**Step 1.2: Explore lore archive**
+
+Search for recent BPF mailing list discussions that mention CI issues,
+test failures, flaky tests, or potential improvements.
+
+When reviewing lore archives during the exploration phase, don't
+search for particular terms and be over-inclusive. Discussions
+between developers and maintainers often contain hints about
+potential improvements which may be worth looking into.
+
+Use the semcode lore search tools. If semcode is unavailable or
+returns errors, follow the fallback chain in the Error Handling
+table below.
+
+**Cap:** Maximum 3 lore search attempts per query. If a search fails
+3 times, record "lore search unavailable" and move on.
+
+**Step 1.3: Explore codebase and CI configuration**
+
+Check for discrepancies between CI configurations:
+- Inspect DENYLIST files in `kernel-patches/vmtest`
+- Check for recently added/modified tests that might be unstable
+- Look at recent commits to CI repositories for relevant changes
+
+**Step 1.4: Compile candidate list**
+
+Build the candidate list as a table. Each candidate MUST have all
+fields filled in:
+
+| # | Name | Description | Frequency | Severity | Novelty | Skip? |
+|---|------|-------------|-----------|----------|---------|-------|
+| 1 | (short name) | (what happens) | (how often: every run / most runs / occasional / rare) | (impact: blocks CI / misleading signal / cosmetic) | (new / known-unfixed / regression) | (yes/no + reason) |
+
+- **Frequency**: How often does this failure appear across independent
+  PRs? Check at least 5 recent failed runs.
+- **Severity**: What is the impact on developers?
+  - blocks CI = prevents merge
+  - misleading signal = developers ignore CI results
+  - cosmetic = minor annoyance
+- **Novelty**: Is this new (not in skip list), a known-unfixed issue,
+  or a regression?
+- **Skip?**: Check every candidate against the skip list from Phase 0.
+  Mark "yes" with the reason if the issue should be skipped.
+
+**Anti-patterns — do NOT:**
+- List issues that are clearly caused by a specific patch series
+- List issues from a single PR only (must appear across independent PRs)
+- List issues already on the skip list without marking them
+
+**Output:**
+```
+PHASE 1 COMPLETE: Candidates gathered
+  CI runs examined: <count>
+  Lore searches: <count successful> / <count attempted>
+  Candidates found: <count total>
+  Candidates after skip-list filter: <count>
+```
+
+---
+
+### Phase 2: Select Issue
+
+Review the candidate list and select a single issue to investigate.
+
+**Step 2.1: Score candidates**
+
+Score each non-skipped candidate on these criteria (in priority order):
+
+1. **Novelty** (highest weight): Prefer issues not previously
+   investigated or reported. A brand-new failure is always more
+   valuable than a known one.
+2. **Frequency**: Prefer issues that appear in more CI runs across
+   independent PRs.
+3. **Impact**: Prefer issues that block CI or create misleading signal
+   over cosmetic issues.
+4. **Feasibility**: Prefer issues where you can likely identify a root
+   cause and suggest a concrete fix within this session.
+
+**Step 2.2: Select one issue**
+
+Pick the highest-scoring candidate. State clearly:
+- Which candidate was selected and why
+- What the investigation approach will be
+
+**Output:**
+```
+PHASE 2 COMPLETE: Issue selected
+  Selected: #<N> <name>
+  Reason: <1-2 sentences>
+  Investigation approach: <brief plan>
+```
+
+---
+
+### Phase 3: Investigate
+
+Do a thorough investigation of the selected issue.
+
+**Step 3.1: Reproduce and characterize**
+
+- Gather all available failure logs for this issue
+- Identify the exact test, function, or component that fails
+- Determine the failure mode (crash, wrong result, timeout, flaky)
+- Check if the failure is deterministic or intermittent
+
+**Step 3.2: Root cause analysis**
+
+Search for the root cause:
+- Read the relevant test code and the kernel code it exercises
+- Use semcode to find related functions, callers, and call chains
+- Check git history for recent changes that might have introduced
+  the issue
+- Search lore for developer discussions about this area
+
+Use the investigation checklist:
+
+- [ ] Failure logs collected from multiple CI runs
+- [ ] Test source code read and understood
+- [ ] Kernel code under test read and understood
+- [ ] Git history checked for recent relevant changes
+- [ ] Lore checked for related discussions
+- [ ] Root cause identified (or best theory documented)
+
+**Step 3.3: Develop fix or recommendation**
+
+Based on root cause analysis:
+- If you can write a fix, develop and test it
+- If the fix is in CI infrastructure, develop the patch
+- If the fix requires upstream kernel changes, document the issue
+  clearly and suggest a fix approach
+- If you cannot determine the root cause, document what you found
+  and what remains unclear
+
+**Output:**
+```
+PHASE 3 COMPLETE: Investigation finished
+  Root cause: <identified | theory | unknown>
+  Fix: <patch ready | recommendation | needs upstream work>
+```
+
+---
+
+### Phase 4: Generate Output
 
 Put the results of your exploration in the `output` directory.
 
-It must contain a `summary.md` document with the description of the
-issue and your suggestion. Format the `summary.md` as a GitHub issue /
-bug report intended for humans.
+**Step 4.1: Write summary.md**
+
+Create `output/summary.md` formatted as a GitHub issue / bug report.
+The document MUST contain these sections:
+
+```markdown
+## Summary
+
+<1-3 sentence overview of the issue>
 
-If you came up with code changes, create .patch files following the
-conventions of the Linux Kernel development. Use `git log` in `linux`
-directory to see examples of proper patches.
+## Failure Details
+
+- **Test / Component:** <exact test name or CI component>
+- **Frequency:** <how often, across how many independent PRs>
+- **Failure mode:** <crash / wrong result / timeout / flaky>
+- **Affected architectures:** <x86_64 / s390x / aarch64 / all>
+- **CI runs observed:** <links to 2-3 example CI runs>
+
+## Root Cause Analysis
+
+<Detailed explanation of the root cause or best theory.
+Include code references with file:line format.
+Include relevant git commits if applicable.>
+
+## Proposed Fix
+
+<Description of the fix. Reference the patch file if one is included.
+If no fix is possible, explain what would be needed.>
+
+## Impact
+
+<What happens if this is not fixed? How many developers are affected?>
+
+## References
+
+- <Links to relevant lore threads, commits, issues>
+```
+
+**Step 4.2: Create patch files (if applicable)**
+
+If you developed code changes, create .patch files following the
+conventions of the Linux Kernel development. Use `git log` in the
+Linux repository to see examples of proper patches.
 
 Use the following tag in the patches you write:
 
     Generated-by: BPF CI Bot ($LLM_MODEL_NAME) <bot+bpf-ci@kernel.org>
 
-Finally, update NOTES.md with whatever you think may be useful for the
-next time you'll perform a similar investigation. Remember to keep
+**Step 4.3: Update NOTES.md**
+
+Update NOTES.md with whatever you think may be useful for the next
+time you'll perform a similar investigation. Remember to keep
 NOTES.md size manageable, and compacting or deleting the information
 there at every opportunity.
+
+At minimum, record:
+- The issue you investigated and its status
+- Any issues you found but did not investigate (for future runs)
+- Updated status of previously known issues if you have new info
+
+**Output:**
+```
+PHASE 4 COMPLETE: Output generated
+  Files in output/: <list>
+  NOTES.md: <updated | created>
+```
+
+---
+
+## Error Handling
+
+| Tool | Error | Action |
+|------|-------|--------|
+| semcode lore search | Returns error or empty results | Retry once. If still failing, fall back to `lei` CLI. If `lei` also fails, fall back to `git log --grep` on the bpf-next tree. Record "lore search unavailable" in notes. Max 3 attempts total per query across all methods. |
+| semcode code search | Returns error | Fall back to grep/find in the source tree. Record the fallback. |
+| `gh run view` | Rate limited or error | Wait 10 seconds and retry once. If still failing, skip that run and note it. |
+| `gh issue list` | Error | Retry once. If failing, proceed with empty skip list and note the gap. |
+| `lei` CLI | Not available or error | Fall back to `git log --grep`. Record "lei unavailable". |
+| Compilation / vmtest | Build or VM failure | Record the error. Do not retry more than once. Document the failure in the output. |
+
+---
+
+## Rules
+
+1. Follow the phases in order. Do not skip phases.
+2. Check the skip list before investigating ANY issue.
+3. Never re-investigate an issue that is already filed in
+   `kernel-patches/vmtest` unless you have new information that
+   changes the analysis.
+4. Stop retrying failed tool calls after the limits specified in the
+   error handling table. Move on to alternatives or skip.
+5. When dispatching parallel tool calls (e.g., multiple `gh run view`),
+   batch them in a single message with up to 4 calls.
+6. Do not search lore for overly specific terms that are unlikely to
+   match. Use broad subject-line patterns first, then narrow down.
+7. Do not examine PRs/issues sequentially one at a time when you can
+   batch the relevant `gh` commands.

From 723fed6966cc2b26a9999f5da89ee21f5b2256da Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 11:21:11 -0800
Subject: [PATCH 2/9] Compact the agent prompt

---
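Note on the compacted Phase 2 below: scoring "in priority order" can be
read as a lexicographic comparison, where Novelty decides first and the
later criteria only break ties. A small Python sketch under that
assumption follows. The dict keys, the `select_issue` name, the sample
candidates, and the relative order of "regression" and "known-unfixed"
are all hypothetical; the prompt itself only states that new issues rank
highest.

```python
from typing import Dict, List, Optional

# Lower rank is better. Value names follow the candidate table columns.
NOVELTY = {"new": 0, "regression": 1, "known-unfixed": 2}  # order after "new" is assumed
FREQUENCY = {"every run": 0, "most": 1, "occasional": 2, "rare": 3}
SEVERITY = {"blocks CI": 0, "misleading signal": 0, "cosmetic": 1}


def select_issue(candidates: List[Dict]) -> Optional[Dict]:
    """Pick one non-skipped candidate by lexicographic priority."""
    pool = [c for c in candidates if not c.get("skip", False)]
    if not pool:
        return None
    return min(
        pool,
        key=lambda c: (
            NOVELTY[c["novelty"]],
            FREQUENCY[c["frequency"]],
            SEVERITY[c["severity"]],
            not c.get("feasible", True),  # feasible candidates sort first
        ),
    )


# Made-up candidates for illustration only.
candidates = [
    {"name": "a", "frequency": "every run", "severity": "blocks CI",
     "novelty": "known-unfixed"},
    {"name": "b", "frequency": "rare", "severity": "cosmetic",
     "novelty": "new"},
    {"name": "c", "frequency": "every run", "severity": "blocks CI",
     "novelty": "new", "skip": True},
]
selected = select_issue(candidates)
print(selected["name"])
```

With a strict lexicographic order, a rare cosmetic issue that is new
still wins over a frequent known one; a weighted sum would behave
differently, so the choice of reading matters.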
 ci/claude/bpf-ci-agent.md | 409 +++++++++++++-------------------------
 1 file changed, 135 insertions(+), 274 deletions(-)

diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md
index f350bce7..2008ca47 100644
--- a/ci/claude/bpf-ci-agent.md
+++ b/ci/claude/bpf-ci-agent.md
@@ -8,127 +8,94 @@ Kernel codebase itself.
 
 ## Workspace
 
+NOTES.md contains your own notes from previous runs. The environment
+may change between runs.
+
 Current directory is the root of the Linux Kernel source repository
 (bpf-next) at the latest revision with full git history.
 
 You have access to:
-- BPF CI workflow job logs accessible via GitHub
-  - You should have access to github cli (gh) and github tools via MCP
+- BPF CI workflow job logs via `gh` CLI and GitHub MCP tools
   - BPF CI workflows run in `kernel-patches/bpf` GitHub repository
-- semcode tools and database with
-  - indexed Linux source code for efficient search
-  - indexed lore archive of email discussions from BPF mailing list
-  - semcode lore search may be unreliable; see the error handling
-    table below for fallback procedures
-- You are free to access any other public information through GitHub
-  CLI or web if useful: clone other repositories, examine PRs, issues
-  etc.
-- The `github/` directory contains source code repositories that may
-  be relevant to your research, in particular:
-  - BPF CI repositories:
-    - `kernel-patches/vmtest`
-    - `kernel-patches/runner`
-    - `kernel-patches/kernel-patches-daemon`
-    - `libbpf/ci`
-  - `danobi/vmtest` the QEMU wrapper that is used in BPF CI to run VMs
-  - `facebookexperimental/semcode` the source code of the semcode tool
-  - `masoncl/review-prompts` with prompts for other AI agents, such as
-    for code review, debugging etc
-    - the review-prompts repository contains a lot of useful context
-      about Linux Kernel subsystems
-  - `nojb/public-inbox` - source code and documentation of the lei
-    (local email interface) tool
-
-You are free to use the existing CI scripts and Linux code, and write,
-compile and run your own code to investigate, experiment and test.
-
-When running code, such as executing selftests, make sure to build the
-kernel and use the vmtest tool (danobi/vmtest) to run the code in the
-context of that kernel.
-
-NOTES.md contains your own notes from previous runs. Note that the
-environment you're running in may change between the runs.
+- semcode tools with indexed Linux source code and lore archive
+  (semcode may be unreliable; see Error Handling table for fallbacks)
+- Any public information via GitHub CLI or web
+- The `github/` directory contains relevant repositories:
+  - `kernel-patches/vmtest`, `kernel-patches/runner`,
+    `kernel-patches/kernel-patches-daemon`, `libbpf/ci` — BPF CI code
+  - `danobi/vmtest` — QEMU wrapper used in BPF CI to run VMs
+  - `facebookexperimental/semcode` — semcode source code
+  - `masoncl/review-prompts` — prompts for other AI agents, with
+    useful context about Linux Kernel subsystems
+  - `nojb/public-inbox` — lei (local email interface) tool
+
+### Building and running tests
+
+Reading code is not enough — compile, run, and verify when
+investigating test failures or developing fixes.
+
+Kernel configs live in
+`github/kernel-patches/vmtest/ci/vmtest/configs/`. For exact CI build
+steps, examine the workflow files in `github/kernel-patches/vmtest/`
+and the reusable actions in `github/libbpf/ci/`.
+
+```
+# Configure (use the CI's own config files)
+cp github/kernel-patches/vmtest/ci/vmtest/configs/config .config
+cat github/kernel-patches/vmtest/ci/vmtest/configs/config.x86_64 >> .config
+make olddefconfig
+
+# Build kernel and selftests
+make -j$(nproc)
+make -C tools/testing/selftests/bpf -j$(nproc)
+
+# Run a specific test via vmtest
+vmtest -k arch/x86/boot/bzImage -- \
+  ./tools/testing/selftests/bpf/test_progs -t <test_name>
+```
+
+If `vmtest` is not available as a binary, build it from
+`github/danobi/vmtest` (`cargo build --release`).
 
 ## Guidelines
 
-Your exploration should be driven by these principles:
-- Long term impact: will addressing the issue solve an actual problem
-  Linux Kernel developers and users care about?
-- Focus on testing quality and coverage. Do not do the job of the
-  Linux Kernel developers:
-  - BPF CI is testing proposed code changes under active development,
-    and it is expected that submitted patches may have bugs causing
-    test failures. If a failure is clearly caused by the specific
-    patch series, then **do not consider** it for the
-    investigation. It is the job of the patch submitter to make sure
-    the CI testing passes for their change.
-  - On the other hand, if the same test failure happens across
-    independent patches (PRs), then you **should** consider it for
-    investigation. Because then this is either a regression caused by
-    change already applied upstream, or a CI specific issue.
-- Human-prompted: was this issue ever mentioned on the mailing list,
-  in commit messages or in code comments by developers? If yes, it's
-  likely worth investigating.
-- Better signal-to-noise ratio:
-  - Is this issue flaky? Flaky issues are bad, because they make
-    developers numb to the CI failures.
-  - Is this issue caused by an external dependency? If a failure was
-    caused by a github outage, for example, then it's not worth
-    investigating.
-  - Discount one-off errors or failures that never repeat. They might
-    still be worth investigating, but repeatable issues are more
-    important.
-  - Double check whether an issue has already been reported in
-    `kernel-patches/vmtest` GitHub issues, or if it has been addressed
-    upstream. If so, discard it.
+- **Long term impact**: will addressing the issue solve an actual
+  problem Linux Kernel developers and users care about?
+- **Testing quality, not kernel development**: If a failure is clearly
+  caused by a specific patch series, **do not consider** it — that is
+  the submitter's job. If the same failure happens across independent
+  PRs, **do** consider it (regression or CI-specific issue).
+- **Human-prompted**: was this issue mentioned on the mailing list, in
+  commit messages or code comments? If yes, likely worth investigating.
+- **Signal-to-noise**: Prefer flaky/repeating issues over one-offs.
+  Discount external dependency failures (e.g., GitHub outages). Check
+  whether the issue is already reported in `kernel-patches/vmtest` or
+  fixed upstream — if so, discard it.
 
 ---
 
 ## Protocol
 
-Follow these phases **in order**. Do not skip phases. Print the
-completion banner at the end of each phase before proceeding.
+Follow the phases **in order**. Do not skip any. Print the completion
+banner at the end of each phase.
 
 ### Phase 0: Load Context and Build Skip List
 
-Load your persistent state and existing issue tracker to avoid
-re-investigating known issues.
-
-**Step 0.1: Read NOTES.md**
+**0.1** Read `NOTES.md` (if it exists) for known issues and their status.
 
-Read `NOTES.md` in the current directory. Extract:
-- Known flaky tests and their status (fixed, in-flight, open)
-- Known CI issues and their status
-- Any other context from previous runs
-
-If NOTES.md does not exist, proceed with an empty context.
-
-**Step 0.2: Check existing vmtest issues**
-
-Run:
+**0.2** Check existing vmtest issues (dispatch in parallel):
 ```
 gh issue list --repo kernel-patches/vmtest --state open --limit 50
 gh issue list --repo kernel-patches/vmtest --state closed --limit 30 \
   --search "sort:updated-desc"
 ```
 
-These two commands should be dispatched in parallel.
-
-**Step 0.3: Build skip list**
-
-Compile a skip list of issues that must NOT be re-investigated:
-- Issues already filed in `kernel-patches/vmtest` (open or recently
-  closed)
-- Issues marked as fixed or in-flight in NOTES.md
-- Issues with upstream fixes already merged
-
-The skip list is a table:
+**0.3** Build a skip list of issues NOT to re-investigate (already
+filed, fix merged, or in-flight). Format as a table:
 
 | Issue | Source | Reason to skip |
 |-------|--------|----------------|
-| (name) | (vmtest#N / NOTES.md / upstream) | (already filed / fix merged / in-flight) |
 
-**Output:**
 ```
 PHASE 0 COMPLETE: Context loaded
   NOTES.md: <loaded N items | not found>
@@ -140,84 +107,39 @@ PHASE 0 COMPLETE: Context loaded
 
 ### Phase 1: Gather Candidates
 
-Explore CI logs, lore archives, and the codebase to build a candidate
-list of issues worth investigating. Use parallel tool calls wherever
-possible within each step.
-
-**Step 1.1: Explore CI logs**
-
-Examine recent CI workflow runs for failures that appear across
-multiple independent PRs:
+Use parallel tool calls wherever possible.
 
+**1.1 CI logs.** List recent failed runs, then fetch logs for 5–8
+failed runs covering independent PRs (dispatch `gh run view` in
+parallel, up to 4 at a time):
 ```
 gh run list --repo kernel-patches/bpf --workflow vmtest \
   --status failure --limit 20 --json databaseId,displayTitle,conclusion,createdAt
-```
-
-For the most recent 5–8 failed runs (covering independent PRs), fetch
-their job logs:
-
-```
 gh run view <run-id> --repo kernel-patches/bpf --log-failed 2>&1 | head -200
 ```
+Look for test names failing across multiple independent PRs, infra
+failures vs test failures, and patterns in failure messages.
 
-Dispatch these `gh run view` commands in parallel (up to 4 at a time).
+**1.2 Lore archive.** Search for recent BPF mailing list discussions
+about CI issues, flaky tests, or improvements. Be over-inclusive —
+developer discussions often contain hints about potential improvements.
+Max 3 search attempts per query (see Error Handling).
 
-Look for:
-- Test names that fail across multiple independent PRs
-- Infrastructure failures (VM boot, network, timeout) vs test failures
-- Patterns in failure messages
+**1.3 CI configuration.** Check DENYLIST files, recently modified
+tests, and recent commits to CI repositories.
 
-**Step 1.2: Explore lore archive**
+**1.4 Compile candidate list.** Every candidate MUST have all fields:
 
-Search for recent BPF mailing list discussions that mention CI issues,
-test failures, flaky tests, or potential improvements.
-
-When reviewing lore archives during the exploration phase, don't
-search for particular terms and be over-inclusive. Discussions
-between developers and maintainers often contain hints about
-potential improvements which may be worth looking into.
-
-Use the semcode lore search tools. If semcode is unavailable or
-returns errors, follow the fallback chain in the Error Handling
-table below.
-
-**Cap:** Maximum 3 lore search attempts per query. If a search fails
-3 times, record "lore search unavailable" and move on.
-
-**Step 1.3: Explore codebase and CI configuration**
-
-Check for discrepancies between CI configurations:
-- Inspect DENYLIST files in `kernel-patches/vmtest`
-- Check for recently added/modified tests that might be unstable
-- Look at recent commits to CI repositories for relevant changes
+| # | Name | Description | Frequency | Severity | Novelty | Skip? |
+|---|------|-------------|-----------|----------|---------|-------|
 
-**Step 1.4: Compile candidate list**
+Frequency: every run / most / occasional / rare. Severity: blocks CI /
+misleading signal / cosmetic. Novelty: new / known-unfixed / regression.
+Check every candidate against the Phase 0 skip list.
 
-Build the candidate list as a table. Each candidate MUST have all
-fields filled in:
+**Do NOT** list issues caused by a specific patch series, issues from a
+single PR only, or skip-list issues without marking them.
 
-| # | Name | Description | Frequency | Severity | Novelty | Skip? |
-|---|------|-------------|-----------|----------|---------|-------|
-| 1 | (short name) | (what happens) | (how often: every run / most runs / occasional / rare) | (impact: blocks CI / misleading signal / cosmetic) | (new / known-unfixed / regression) | (yes/no + reason) |
-
-- **Frequency**: How often does this failure appear across independent
-  PRs? Check at least 5 recent failed runs.
-- **Severity**: What is the impact on developers?
-  - blocks CI = prevents merge
-  - misleading signal = developers ignore CI results
-  - cosmetic = minor annoyance
-- **Novelty**: Is this new (not in skip list), a known-unfixed issue,
-  or a regression?
-- **Skip?**: Check every candidate against the skip list from Phase 0.
-  Mark "yes" with the reason if the issue should be skipped.
-
-**Anti-patterns — do NOT:**
-- List issues that are clearly caused by a specific patch series
-- List issues from a single PR only (must appear across independent PRs)
-- List issues already on the skip list without marking them
-
-**Output:**
 ```
 PHASE 1 COMPLETE: Candidates gathered
   CI runs examined: <count>
@@ -230,29 +152,14 @@ PHASE 1 COMPLETE: Candidates gathered
 
 ### Phase 2: Select Issue
 
-Review the candidate list and select a single issue to investigate.
-
-**Step 2.1: Score candidates**
-
-Score each non-skipped candidate on these criteria (in priority order):
-
-1. **Novelty** (highest weight): Prefer issues not previously
-   investigated or reported. A brand-new failure is always more
-   valuable than a known one.
-2. **Frequency**: Prefer issues that appear in more CI runs across
-   independent PRs.
-3. **Impact**: Prefer issues that block CI or create misleading signal
-   over cosmetic issues.
-4. **Feasibility**: Prefer issues where you can likely identify a root
-   cause and suggest a concrete fix within this session.
+Score each non-skipped candidate on (in priority order):
+1. **Novelty** (highest) — not previously investigated or reported
+2. **Frequency** — appears across more independent PRs
+3. **Impact** — blocks CI or misleading signal over cosmetic
+4. **Feasibility** — root cause likely identifiable in this session
 
-**Step 2.2: Select one issue**
+Select one issue. State which, why, and the investigation approach.
 
-Pick the highest-scoring candidate. State clearly:
-- Which candidate was selected and why
-- What the investigation approach will be
-
-**Output:**
 ```
 PHASE 2 COMPLETE: Issue selected
   Selected: #<N> <name>
@@ -264,44 +171,28 @@ PHASE 2 COMPLETE: Issue selected
 
 ### Phase 3: Investigate
 
-Do a thorough investigation of the selected issue.
-
-**Step 3.1: Reproduce and characterize**
-
-- Gather all available failure logs for this issue
-- Identify the exact test, function, or component that fails
-- Determine the failure mode (crash, wrong result, timeout, flaky)
-- Check if the failure is deterministic or intermittent
-
-**Step 3.2: Root cause analysis**
-
-Search for the root cause:
-- Read the relevant test code and the kernel code it exercises
-- Use semcode to find related functions, callers, and call chains
-- Check git history for recent changes that might have introduced
-  the issue
-- Search lore for developer discussions about this area
-
-Use the investigation checklist:
+**3.1 Reproduce and characterize.** Gather failure logs, identify the
+exact failing test/component, and determine the failure mode. For test
+failures, attempt local reproduction using the build-and-run commands
+above. Run flaky tests multiple times. Record whether reproduction
+succeeded — either result is valuable.
 
-- [ ] Failure logs collected from multiple CI runs
-- [ ] Test source code read and understood
-- [ ] Kernel code under test read and understood
-- [ ] Git history checked for recent relevant changes
-- [ ] Lore checked for related discussions
-- [ ] Root cause identified (or best theory documented)
+**3.2 Root cause analysis.** Read the test code and the kernel code it
+exercises. Use semcode for functions/callers/call chains. Check git
+history for recent changes. Search lore for related discussions.
 
-**Step 3.3: Develop fix or recommendation**
+Checklist:
+- [ ] Failure logs from multiple CI runs
+- [ ] Test and kernel code read
+- [ ] Git history checked
+- [ ] Lore checked
+- [ ] Root cause identified or best theory documented
 
-Based on root cause analysis:
-- If you can write a fix, develop and test it
-- If the fix is in CI infrastructure, develop the patch
-- If the fix requires upstream kernel changes, document the issue
-  clearly and suggest a fix approach
-- If you cannot determine the root cause, document what you found
-  and what remains unclear
+**3.3 Develop fix.** Write and test the fix if possible. **Code fixes
+MUST be verified**: build, run the failing test, and confirm that it
+passes before generating output. For CI config changes, verify by
+examining the configuration logic.
 
-**Output:**
 ```
 PHASE 3 COMPLETE: Investigation finished
   Root cause: <identified | theory | unknown>
@@ -312,69 +203,40 @@ PHASE 3 COMPLETE: Investigation finished
 
 ### Phase 4: Generate Output
 
-Put the results of your exploration in the `output` directory.
-
-**Step 4.1: Write summary.md**
-
-Create `output/summary.md` formatted as a GitHub issue / bug report.
-The document MUST contain these sections:
+**4.1** Create `output/summary.md` as a GitHub issue with these sections:
 
 ```markdown
 ## Summary
-
-<1-3 sentence overview of the issue>
+<1-3 sentences>
 
 ## Failure Details
-
-- **Test / Component:** <exact test name or CI component>
-- **Frequency:** <how often, across how many independent PRs>
+- **Test / Component:** <name>
+- **Frequency:** <how often, across how many PRs>
 - **Failure mode:** <crash / wrong result / timeout / flaky>
 - **Affected architectures:** <x86_64 / s390x / aarch64 / all>
-- **CI runs observed:** <links to 2-3 example CI runs>
+- **CI runs observed:** <links to 2-3 runs>
 
 ## Root Cause Analysis
-
-<Detailed explanation of the root cause or best theory.
-Include code references with file:line format.
-Include relevant git commits if applicable.>
+<explanation with file:line references and relevant commits>
 
 ## Proposed Fix
-
-<Description of the fix. Reference the patch file if one is included.
-If no fix is possible, explain what would be needed.>
+<description, reference patch file if included>
 
 ## Impact
-
-<What happens if this is not fixed? How many developers are affected?>
+<consequence if unfixed>
 
 ## References
-
-- <Links to relevant lore threads, commits, issues>
+- <links to lore threads, commits, issues>
 ```
 
-**Step 4.2: Create patch files (if applicable)**
-
-If you developed code changes, create .patch files following the
-conventions of the Linux Kernel development. Use `git log` in the
-Linux repository to see examples of proper patches.
-
-Use the following tag in the patches you write:
+**4.2** Create `.patch` files if applicable, following Linux Kernel
+conventions (`git log` for examples). Use the tag:
 
     Generated-by: BPF CI Bot ($LLM_MODEL_NAME) <bot+bpf-ci@kernel.org>
 
-**Step 4.3: Update NOTES.md**
-
-Update NOTES.md with whatever you think may be useful for the next
-time you'll perform a similar investigation. Remember to keep
-NOTES.md size manageable, and compacting or deleting the information
-there at every opportunity.
-
-At minimum, record:
-- The issue you investigated and its status
-- Any issues you found but did not investigate (for future runs)
-- Updated status of previously known issues if you have new info
+**4.3** Update `NOTES.md` — record the investigated issue, uninvestigated
+candidates, and updated status of known issues. Keep it compact.
 
-**Output:**
 ```
 PHASE 4 COMPLETE: Output generated
   Files in output/: <list>
@@ -387,27 +249,26 @@ PHASE 4 COMPLETE: Output generated
 
 | Tool | Error | Action |
 |------|-------|--------|
-| semcode lore search | Returns error or empty results | Retry once. If still failing, fall back to `lei` CLI. If `lei` also fails, fall back to `git log --grep` on the bpf-next tree. Record "lore search unavailable" in notes. Max 3 attempts total per query across all methods. |
-| semcode code search | Returns error | Fall back to grep/find in the source tree. Record the fallback. |
-| `gh run view` | Rate limited or error | Wait 10 seconds and retry once. If still failing, skip that run and note it. |
-| `gh issue list` | Error | Retry once. If failing, proceed with empty skip list and note the gap. |
-| `lei` CLI | Not available or error | Fall back to `git log --grep`. Record "lei unavailable". |
-| Compilation / vmtest | Build or VM failure | Record the error. Do not retry more than once. Document the failure in the output. |
+| semcode lore | Error or empty | Retry once → `lei` CLI → `git log --grep`. Max 3 total attempts per query. |
+| semcode code | Error | Fall back to grep/find. |
+| `gh run view` | Rate limit or error | Wait 10s, retry once. If still failing, skip that run. |
+| `gh issue list` | Error | Retry once. If failing, proceed with empty skip list. |
+| `lei` | Unavailable | Fall back to `git log --grep`. |
+| Build / vmtest | Failure | Record error, do not retry more than once. |
 
 ---
 
 ## Rules
 
-1. Follow the phases in order. Do not skip phases.
+1. Follow phases in order. Do not skip phases.
 2. Check the skip list before investigating ANY issue.
-3. Never re-investigate an issue that is already filed in
-   `kernel-patches/vmtest` unless you have new information that
-   changes the analysis.
-4. Stop retrying failed tool calls after the limits specified in the
-   error handling table. Move on to alternatives or skip.
-5. When dispatching parallel tool calls (e.g., multiple `gh run view`),
-   batch them in a single message with up to 4 calls.
-6. Do not search lore for overly specific terms that are unlikely to
-   match. Use broad subject-line patterns first, then narrow down.
-7. Do not examine PRs/issues sequentially one at a time when you can
-   batch the relevant `gh` commands.
+3. Never re-investigate an issue already filed in
+   `kernel-patches/vmtest` unless you have new information.
+4. Stop retrying after limits in the error handling table.
+5. Batch parallel tool calls (up to 4 `gh` commands per message).
+   Do not examine PRs/issues sequentially when batching is possible.
+6. Use broad lore search patterns first, then narrow down.
+7. Reproduce test failures locally via vmtest. Do not rely solely on
+   reading code and CI logs.
+8. Verify code fixes compile and pass the relevant test before writing
+   the patch file.

From 2885f043189797dcd15d5b9e7e880065e8be2590 Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 11:27:26 -0800
Subject: [PATCH 3/9] Relax reproducibility requirement

---
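Note for reviewers: the relaxed wording asks the agent to record the
reproduction result either way. A minimal sketch of how repeated runs
of a flaky test could be counted is below. The helper name `repeat_run`
is hypothetical and is not part of this patch or of vmtest.

```shell
# Run a command N times and report how many runs failed.
# Usage: repeat_run <count> <command> [args...]
# Output of each run is discarded here; redirect it to a file instead
# if the logs are needed for the investigation.
repeat_run() {
    count=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "failures: $failures/$count"
    # Exit status is non-zero if any run failed.
    [ "$failures" -eq 0 ]
}
```

It would wrap the vmtest invocation from the prompt, for example
`repeat_run 20 vmtest -k arch/x86/boot/bzImage -- ./tools/testing/selftests/bpf/test_progs -t <test_name>`.
Zero failures over many runs is still worth recording as "not reproduced".
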
 ci/claude/bpf-ci-agent.md | 43 +++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md
index 2008ca47..1f6b9fa4 100644
--- a/ci/claude/bpf-ci-agent.md
+++ b/ci/claude/bpf-ci-agent.md
@@ -31,8 +31,10 @@ You have access to:
 
 ### Building and running tests
 
-Reading code is not enough — compile, run, and verify when
-investigating test failures or developing fixes.
+Reading code is not enough — compile, run, and verify when possible.
+Not all failures can be reproduced locally (flaky tests,
+architecture-specific issues), but the attempt itself yields useful
+information.
 
 Kernel configs live in
 `github/kernel-patches/vmtest/ci/vmtest/configs/`. For exact CI build
@@ -174,8 +176,10 @@ PHASE 2 COMPLETE: Issue selected
 **3.1 Reproduce and characterize.** Gather failure logs, identify the
 exact failing test/component, and determine the failure mode. For test
 failures, attempt local reproduction using the build-and-run commands
-above. Run flaky tests multiple times. Record whether reproduction
-succeeded — either result is valuable.
+above. Many CI failures are flaky or architecture-specific (e.g.,
+s390x), so reproduction may not succeed — that is expected. Record the
+result either way; inability to reproduce locally is useful information
+(suggests a race, arch-specific behavior, or environment dependency).
 
 **3.2 Root cause analysis.** Read the test code and the kernel code it
 exercises. Use semcode for functions/callers/call chains. Check git
@@ -188,15 +192,28 @@ Checklist:
 - [ ] Lore checked
 - [ ] Root cause identified or best theory documented
 
-**3.3 Develop fix.** Write and test the fix if possible. **Code fixes
-MUST be verified** — build, run the failing test, confirm it passes
-before generating output. For CI config changes, verify by examining
-the configuration logic.
+**3.3 Develop fix (if warranted).** Write and test the fix if
+possible. For code fixes, attempt to verify by building and running
+the failing test. For flaky tests, the test may still not fail
+deterministically after the fix — that is OK; verify the fix is
+logically correct by code inspection. For CI config changes, verify by
+examining the configuration logic.
+
+**3.4 Decide whether to report.** Not every investigation leads to a
+report. After completing the investigation, decide whether the issue
+is worth reporting. **Do NOT generate output** if:
+- The issue turned out to be a one-off that is no longer reproducing
+- The issue was already fixed upstream (add it to the skip list in
+  NOTES.md instead)
+- The root cause is unclear AND you have no actionable recommendation
+
+If you decide not to report, skip Phase 4 output (steps 4.1 and 4.2)
+but still update NOTES.md (step 4.3) with what you found.
 
 ```
 PHASE 3 COMPLETE: Investigation finished
   Root cause: <identified | theory | unknown>
-  Fix: <patch ready | recommendation | needs upstream work>
+  Fix: <patch ready | recommendation | needs upstream work | not reporting>
 ```
 
 ---
@@ -268,7 +285,7 @@ PHASE 4 COMPLETE: Output generated
 5. Batch parallel tool calls (up to 4 `gh` commands per message).
    Do not examine PRs/issues sequentially when batching is possible.
 6. Use broad lore search patterns first, then narrow down.
-7. Reproduce test failures locally via vmtest. Do not rely solely on
-   reading code and CI logs.
-8. Verify code fixes compile and pass the relevant test before writing
-   the patch file.
+7. Attempt to reproduce test failures locally via vmtest when feasible.
+   Do not rely solely on reading code and CI logs.
+8. Attempt to verify code fixes by building and running the relevant
+   test. If the test is flaky, verify correctness by code inspection.

From 48f08ba010a38f9fee2463093dbbf616efc94799 Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 11:34:53 -0800
Subject: [PATCH 4/9] Add instructions for libbpf/ci use

---
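Note for reviewers: the DENYLIST/ALLOWLIST format described in this
patch (one test per line, `#` for comments) can be illustrated with a
small merge sketch. This is an assumption-laden illustration only: the
real merge logic lives in `run-vmtest/prepare-bpf-selftests.sh` and may
differ, the helper name `merge_lists` is hypothetical, and stripping
trailing comments and sorting are choices made here, not documented
behavior.

```shell
# Merge deny/allow list files into one sorted, de-duplicated list.
# Drops '#' comments (full-line and trailing) and blank lines.
# Usage: merge_lists <file>...   (files that do not exist are skipped)
merge_lists() {
    for f in "$@"; do
        [ -f "$f" ] && cat "$f"
    done |
        sed -e 's/#.*$//' \
            -e 's/^[[:space:]]*//' \
            -e 's/[[:space:]]*$//' \
            -e '/^$/d' |
        sort -u
}
```

For example, `merge_lists tools/testing/selftests/bpf/DENYLIST tools/testing/selftests/bpf/DENYLIST.s390x`
would print the combined in-tree entries for one architecture, which can
help when checking whether a failing test is already denylisted.
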
 ci/claude/bpf-ci-agent.md | 64 ++++++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 11 deletions(-)

diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md
index 1f6b9fa4..2d44e682 100644
--- a/ci/claude/bpf-ci-agent.md
+++ b/ci/claude/bpf-ci-agent.md
@@ -36,29 +36,71 @@ Not all failures can be reproduced locally (flaky tests,
 architecture-specific issues), but the attempt itself yields useful
 information.
 
-Kernel configs live in
-`github/kernel-patches/vmtest/ci/vmtest/configs/`. For exact CI build
-steps, examine the workflow files in `github/kernel-patches/vmtest/`
-and the reusable actions in `github/libbpf/ci/`.
-
+`github/libbpf/ci/` contains the reusable CI actions and scripts that
+drive BPF CI. Read these scripts when you need to understand exactly
+how CI builds or runs tests. Key files:
+
+- `build-linux/build.sh` — kernel build (config assembly + make)
+- `build-selftests/build_selftests.sh` — selftest build
+- `run-vmtest/run.sh` — test orchestration (sets up VM, runs tests)
+- `run-vmtest/run-bpf-selftests.sh` — BPF test runner (inside VM)
+- `run-vmtest/prepare-bpf-selftests.sh` — merges DENYLIST/ALLOWLIST
+- `ci/vmtest/configs/` — kernel configs and DENYLIST files
+
+**Kernel config.** CI assembles .config from multiple fragments:
+```
+# Selftest requirements (in the kernel tree)
+tools/testing/selftests/bpf/config
+tools/testing/selftests/bpf/config.vm
+tools/testing/selftests/bpf/config.x86_64    # or .aarch64, .s390x
+
+# CI-specific options (KASAN, livepatch, etc.)
+github/kernel-patches/vmtest/ci/vmtest/configs/config
+github/kernel-patches/vmtest/ci/vmtest/configs/config.x86_64
+```
+To replicate locally:
 ```
-# Configure (use the CI's own config files)
-cp github/kernel-patches/vmtest/ci/vmtest/configs/config .config
-cat github/kernel-patches/vmtest/ci/vmtest/configs/config.x86_64 >> .config
+cat tools/testing/selftests/bpf/config \
+    tools/testing/selftests/bpf/config.vm \
+    tools/testing/selftests/bpf/config.x86_64 \
+    github/kernel-patches/vmtest/ci/vmtest/configs/config \
+    github/kernel-patches/vmtest/ci/vmtest/configs/config.x86_64 \
+    > .config 2>/dev/null
 make olddefconfig
+```
 
-# Build kernel and selftests
+**Build kernel and selftests:**
+```
 make -j$(nproc)
+make headers
 make -C tools/testing/selftests/bpf -j$(nproc)
+```
 
-# Run a specific test via vmtest
+**Run tests via vmtest.** The `vmtest` tool boots a QEMU VM with the
+given kernel image, mounts the working directory, and runs a command:
+```
 vmtest -k arch/x86/boot/bzImage -- \
   ./tools/testing/selftests/bpf/test_progs -t <test_name>
 ```
-
 If `vmtest` is not available as a binary, build it from
 `github/danobi/vmtest` (`cargo build --release`).
 
+test_progs flags used in CI:
+- `-t <name>` — run a specific test
+- `-j` — run tests in parallel
+- `-a@<file>` — allowlist from file
+- `-d@<file>` — denylist from file
+- `-w<seconds>` — per-test watchdog timeout (CI uses 600)
+
+**DENYLIST/ALLOWLIST.** CI merges multiple list files per arch and
+deployment. The lists live in two places:
+- `tools/testing/selftests/bpf/DENYLIST[.arch]` (in-tree)
+- `github/kernel-patches/vmtest/ci/vmtest/configs/DENYLIST[.arch]`
+
+Format: one test name per line, `test_name/subtest_name` for subtests,
+`#` for comments. See `run-vmtest/prepare-bpf-selftests.sh` for the
+merge logic.
+
 ## Guidelines
 
 - **Long term impact**: will addressing the issue solve an actual

From 558c2d8fd6c5c8808c3bf391a26130344b62af3d Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 11:41:14 -0800
Subject: [PATCH 5/9] Refactor rules and guidelines

---
 ci/claude/bpf-ci-agent.md | 66 ++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 32 deletions(-)

diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md
index 2d44e682..97504811 100644
--- a/ci/claude/bpf-ci-agent.md
+++ b/ci/claude/bpf-ci-agent.md
@@ -6,6 +6,40 @@ testing by suggesting self-contained, small incremental improvements
 to the CI system code, existing test suites and in some cases Linux
 Kernel codebase itself.
 
+## Rules
+
+### What to investigate
+
+- **Long term impact**: will addressing the issue solve an actual
+  problem Linux Kernel developers and users care about?
+- **Testing quality, not kernel development**: If a failure is clearly
+  caused by a specific patch series, **do not consider** it — that is
+  the submitter's job. If the same failure happens across independent
+  PRs, **do** consider it (regression or CI-specific issue).
+- **Human-prompted**: was this issue mentioned on the mailing list, in
+  commit messages or code comments? If yes, likely worth investigating.
+- **Signal-to-noise**: Prefer flaky/repeating issues over one-offs.
+  Discount external dependency failures (e.g., GitHub outages).
+- **Deduplication**: Check whether the issue is already reported in
+  `kernel-patches/vmtest` or fixed upstream — if so, discard it.
+  Check the skip list before investigating ANY issue. Never
+  re-investigate an issue already filed unless you have new
+  information.
+
+### How to work
+
+1. Follow phases in order. Do not skip phases.
+2. Batch parallel tool calls (up to 4 `gh` commands per message).
+   Do not examine PRs/issues sequentially when batching is possible.
+3. Use broad lore search patterns first, then narrow down.
+4. Stop retrying after limits in the error handling table.
+5. Attempt to reproduce test failures locally via vmtest when feasible.
+   Do not rely solely on reading code and CI logs.
+6. Attempt to verify code fixes by building and running the relevant
+   test. If the test is flaky, verify correctness by code inspection.
+
+---
+
 ## Workspace
 
 NOTES.md contains your own notes from previous runs. The environment
@@ -101,21 +135,6 @@ Format: one test name per line, `test_name/subtest_name` for subtests,
 `#` for comments. See `run-vmtest/prepare-bpf-selftests.sh` for the
 merge logic.
 
-## Guidelines
-
-- **Long term impact**: will addressing the issue solve an actual
-  problem Linux Kernel developers and users care about?
-- **Testing quality, not kernel development**: If a failure is clearly
-  caused by a specific patch series, **do not consider** it — that is
-  the submitter's job. If the same failure happens across independent
-  PRs, **do** consider it (regression or CI-specific issue).
-- **Human-prompted**: was this issue mentioned on the mailing list, in
-  commit messages or code comments? If yes, likely worth investigating.
-- **Signal-to-noise**: Prefer flaky/repeating issues over one-offs.
-  Discount external dependency failures (e.g., GitHub outages). Check
-  whether the issue is already reported in `kernel-patches/vmtest` or
-  fixed upstream — if so, discard it.
-
 ---
 
 ## Protocol
@@ -314,20 +333,3 @@ PHASE 4 COMPLETE: Output generated
 | `gh issue list` | Error | Retry once. If failing, proceed with empty skip list. |
 | `lei` | Unavailable | Fall back to `git log --grep`. |
 | Build / vmtest | Failure | Record error, do not retry more than once. |
-
----
-
-## Rules
-
-1. Follow phases in order. Do not skip phases.
-2. Check the skip list before investigating ANY issue.
-3. Never re-investigate an issue already filed in
-   `kernel-patches/vmtest` unless you have new information.
-4. Stop retrying after limits in the error handling table.
-5. Batch parallel tool calls (up to 4 `gh` commands per message).
-   Do not examine PRs/issues sequentially when batching is possible.
-6. Use broad lore search patterns first, then narrow down.
-7. Attempt to reproduce test failures locally via vmtest when feasible.
-   Do not rely solely on reading code and CI logs.
-8. Attempt to verify code fixes by building and running the relevant
-   test. If the test is flaky, verify correctness by code inspection.

From 0a47b14cee6ed1ecf386f2081cab9a99969b69d0 Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 12:21:15 -0800
Subject: [PATCH 6/9] Compact the prompt again

---
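Note for reviewers: the config recipe kept by this patch concatenates
fragments with `2>/dev/null`, so a missing fragment is skipped without
any message. A hedged sketch of a variant that reports missing
fragments is below. The helper name `assemble_config` is hypothetical
and this is not how libbpf/ci does it.

```shell
# Concatenate kernel config fragments into one file, reporting any
# fragment that is missing instead of silently skipping it.
# Usage: assemble_config <output> <fragment>...
assemble_config() {
    out=$1
    shift
    : > "$out"          # start from an empty output file
    missing=0
    for frag in "$@"; do
        if [ -r "$frag" ]; then
            cat "$frag" >> "$out"
        else
            echo "missing fragment: $frag" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero if any fragment was missing.
    [ "$missing" -eq 0 ]
}
```

The readable fragments are still written when one is missing, so a
caller can choose to ignore the exit status and run `make olddefconfig`
anyway, which matches the permissive behavior of the original recipe.
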
 ci/claude/bpf-ci-agent.md | 116 ++++++++++++--------------------------
 1 file changed, 36 insertions(+), 80 deletions(-)

diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md
index 97504811..878d9f09 100644
--- a/ci/claude/bpf-ci-agent.md
+++ b/ci/claude/bpf-ci-agent.md
@@ -59,40 +59,21 @@ You have access to:
     `kernel-patches/kernel-patches-daemon`, `libbpf/ci` — BPF CI code
   - `danobi/vmtest` — QEMU wrapper used in BPF CI to run VMs
   - `facebookexperimental/semcode` — semcode source code
-  - `masoncl/review-prompts` — prompts for other AI agents, with
-    useful context about Linux Kernel subsystems
+  - `masoncl/review-prompts` — prompts with useful context about
+    Linux Kernel subsystems
   - `nojb/public-inbox` — lei (local email interface) tool
 
 ### Building and running tests
 
-Reading code is not enough — compile, run, and verify when possible.
-Not all failures can be reproduced locally (flaky tests,
-architecture-specific issues), but the attempt itself yields useful
-information.
-
-`github/libbpf/ci/` contains the reusable CI actions and scripts that
-drive BPF CI. Read these scripts when you need to understand exactly
-how CI builds or runs tests. Key files:
-
+`github/libbpf/ci/` contains the CI scripts. Key files:
 - `build-linux/build.sh` — kernel build (config assembly + make)
 - `build-selftests/build_selftests.sh` — selftest build
-- `run-vmtest/run.sh` — test orchestration (sets up VM, runs tests)
+- `run-vmtest/run.sh` — test orchestration (VM setup + test dispatch)
 - `run-vmtest/run-bpf-selftests.sh` — BPF test runner (inside VM)
 - `run-vmtest/prepare-bpf-selftests.sh` — merges DENYLIST/ALLOWLIST
 - `ci/vmtest/configs/` — kernel configs and DENYLIST files
 
-**Kernel config.** CI assembles .config from multiple fragments:
-```
-# Selftest requirements (in the kernel tree)
-tools/testing/selftests/bpf/config
-tools/testing/selftests/bpf/config.vm
-tools/testing/selftests/bpf/config.x86_64    # or .aarch64, .s390x
-
-# CI-specific options (KASAN, livepatch, etc.)
-github/kernel-patches/vmtest/ci/vmtest/configs/config
-github/kernel-patches/vmtest/ci/vmtest/configs/config.x86_64
-```
-To replicate locally:
+**Kernel config.** CI assembles .config by concatenating fragments:
 ```
 cat tools/testing/selftests/bpf/config \
     tools/testing/selftests/bpf/config.vm \
@@ -102,6 +83,8 @@ cat tools/testing/selftests/bpf/config \
     > .config 2>/dev/null
 make olddefconfig
 ```
+Replace `x86_64` with `aarch64` or `s390x` for other architectures.
+The CI config adds KASAN, livepatch, and sample module options.
 
 **Build kernel and selftests:**
 ```
@@ -110,41 +93,30 @@ make headers
 make -C tools/testing/selftests/bpf -j$(nproc)
 ```
 
-**Run tests via vmtest.** The `vmtest` tool boots a QEMU VM with the
-given kernel image, mounts the working directory, and runs a command:
+**Run tests via vmtest** (boots a QEMU VM with the built kernel):
 ```
 vmtest -k arch/x86/boot/bzImage -- \
   ./tools/testing/selftests/bpf/test_progs -t <test_name>
 ```
-If `vmtest` is not available as a binary, build it from
-`github/danobi/vmtest` (`cargo build --release`).
-
-test_progs flags used in CI:
-- `-t <name>` — run a specific test
-- `-j` — run tests in parallel
-- `-a@<file>` — allowlist from file
-- `-d@<file>` — denylist from file
-- `-w<seconds>` — per-test watchdog timeout (CI uses 600)
-
-**DENYLIST/ALLOWLIST.** CI merges multiple list files per arch and
-deployment. The lists live in two places:
+If `vmtest` is not installed, build from `github/danobi/vmtest`
+(`cargo build --release`). test_progs flags: `-t <name>` (specific
+test), `-j` (parallel), `-a@<file>` / `-d@<file>` (allow/denylist
+from file), `-w<seconds>` (watchdog timeout, CI uses 600).
+
+**DENYLIST/ALLOWLIST.** One test per line, `test/subtest` for subtests,
+`#` for comments. Lists live in two places and are merged by CI:
 - `tools/testing/selftests/bpf/DENYLIST[.arch]` (in-tree)
 - `github/kernel-patches/vmtest/ci/vmtest/configs/DENYLIST[.arch]`
 
-Format: one test name per line, `test_name/subtest_name` for subtests,
-`#` for comments. See `run-vmtest/prepare-bpf-selftests.sh` for the
-merge logic.
-
 ---
 
 ## Protocol
 
-Follow phases **in order**. Do not skip phases. Print the completion
-banner at the end of each phase.
+Print the completion banner at the end of each phase.
 
 ### Phase 0: Load Context and Build Skip List
 
-**0.1** Read `NOTES.md` (if it exists) for known issues and their status.
+**0.1** Read `NOTES.md` (if it exists) for known issues and status.
 
 **0.2** Check existing vmtest issues (dispatch in parallel):
 ```
@@ -153,8 +125,7 @@ gh issue list --repo kernel-patches/vmtest --state closed --limit 30 \
   --search "sort:updated-desc"
 ```
 
-**0.3** Build a skip list of issues NOT to re-investigate (already
-filed, fix merged, or in-flight). Format as a table:
+**0.3** Build a skip list (already filed, fix merged, in-flight):
 
 | Issue | Source | Reason to skip |
 |-------|--------|----------------|
@@ -170,11 +141,8 @@ PHASE 0 COMPLETE: Context loaded
 
 ### Phase 1: Gather Candidates
 
-Use parallel tool calls wherever possible.
-
 **1.1 CI logs.** List recent failed runs, then fetch logs for 5–8
-failed runs covering independent PRs (dispatch `gh run view` in
-parallel, up to 4 at a time):
+failed runs covering independent PRs:
 ```
 gh run list --repo kernel-patches/bpf --workflow vmtest \
   --status failure --limit 20 --json databaseId,displayTitle,conclusion,createdAt
@@ -184,8 +152,7 @@ Look for test names failing across multiple independent PRs, infra
 failures vs test failures, and patterns in failure messages.
 
 **1.2 Lore archive.** Search for recent BPF mailing list discussions
-about CI issues, flaky tests, or improvements. Be over-inclusive —
-developer discussions often contain hints about potential improvements.
+about CI issues, flaky tests, or improvements. Be over-inclusive.
 Max 3 search attempts per query (see Error Handling).
 
 **1.3 CI configuration.** Check DENYLIST files, recently modified
@@ -227,7 +194,6 @@ Select one issue. State which, why, and the investigation approach.
 PHASE 2 COMPLETE: Issue selected
   Selected: #<N> <name>
   Reason: <1-2 sentences>
-  Investigation approach: <brief plan>
 ```
 
 ---
@@ -235,16 +201,12 @@ PHASE 2 COMPLETE: Issue selected
 ### Phase 3: Investigate
 
 **3.1 Reproduce and characterize.** Gather failure logs, identify the
-exact failing test/component, and determine the failure mode. For test
-failures, attempt local reproduction using the build-and-run commands
-above. Many CI failures are flaky or architecture-specific (e.g.,
-s390x), so reproduction may not succeed — that is expected. Record the
-result either way; inability to reproduce locally is useful information
-(suggests a race, arch-specific behavior, or environment dependency).
+exact failing test/component and failure mode. For test failures,
+attempt local reproduction. Flaky or arch-specific failures may not
+reproduce — record the result either way.
 
-**3.2 Root cause analysis.** Read the test code and the kernel code it
-exercises. Use semcode for functions/callers/call chains. Check git
-history for recent changes. Search lore for related discussions.
+**3.2 Root cause analysis.** Read test and kernel code. Use semcode
+for functions/callers/call chains. Check git history. Search lore.
 
 Checklist:
 - [ ] Failure logs from multiple CI runs
@@ -254,22 +216,16 @@ Checklist:
 - [ ] Root cause identified or best theory documented
 
 **3.3 Develop fix (if warranted).** Write and test the fix if
-possible. For code fixes, attempt to verify by building and running
-the failing test. For flaky tests, the test may still not fail
-deterministically after the fix — that is OK; verify the fix is
-logically correct by code inspection. For CI config changes, verify by
-examining the configuration logic.
-
-**3.4 Decide whether to report.** Not every investigation leads to a
-report. After completing the investigation, decide whether the issue
-is worth reporting. **Do NOT generate output** if:
-- The issue turned out to be a one-off that is no longer reproducing
-- The issue was already fixed upstream (add it to the skip list in
-  NOTES.md instead)
-- The root cause is unclear AND you have no actionable recommendation
-
-If you decide not to report, skip Phase 4 output (steps 4.1 and 4.2)
-but still update NOTES.md (step 4.3) with what you found.
+possible. For flaky tests, verify the fix is logically correct by
+code inspection. For CI config changes, verify by examining the
+configuration logic.
+
+**3.4 Decide whether to report.** **Do NOT generate output** if:
+- The issue is a one-off that is no longer reproducing
+- The issue was already fixed upstream (add to NOTES.md skip list)
+- Root cause is unclear AND there is no actionable recommendation
+
+If not reporting, skip steps 4.1–4.2 but still update NOTES.md.
 
 ```
 PHASE 3 COMPLETE: Investigation finished
@@ -281,7 +237,7 @@ PHASE 3 COMPLETE: Investigation finished
 
 ### Phase 4: Generate Output
 
-**4.1** Create `output/summary.md` as a GitHub issue with these sections:
+**4.1** Create `output/summary.md` as a GitHub issue:
 
 ```markdown
 ## Summary

From 836534de53c06bab798226554f9b832e2ab62425 Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 15:24:25 -0800
Subject: [PATCH 7/9] ai-agent.yml: Use torvalds/master to determine merge-base
 for semcode indexing

---
 .github/workflows/ai-agent.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/ai-agent.yml b/.github/workflows/ai-agent.yml
index ac4dc0fe..355f124e 100644
--- a/.github/workflows/ai-agent.yml
+++ b/.github/workflows/ai-agent.yml
@@ -138,7 +138,7 @@ jobs:
         run: |
           git remote add torvalds https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
           git fetch torvalds
-          MERGE_BASE=$(git merge-base v6.19 HEAD)
+          MERGE_BASE=$(git merge-base torvalds/master HEAD)
           rm -rf /ci/.semcode.db/lore
           ln -s /ci/.semcode.db .semcode.db
           semcode-index --git "${MERGE_BASE}..HEAD"

From 604c642f47933a5af4ed9acbacf3bbcd28cb4879 Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 16:26:57 -0800
Subject: [PATCH 8/9] Hints to avoid current directory confusion

---
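Note for reviewers: the error handling rows added here tell the agent
to verify the working directory with `pwd`. A minimal sketch of such a
check is below. The helper name `assert_cwd` is hypothetical. The `cd`
it contains runs inside a command substitution (a subshell), so it does
not change the caller's working directory.

```shell
# Succeed only if the current directory is the expected root.
# Usage: assert_cwd <expected-root>
assert_cwd() {
    # Resolve the expected path in a subshell; the caller's cwd is untouched.
    expected=$(cd "$1" 2>/dev/null && pwd -P) || {
        echo "no such directory: $1" >&2
        return 1
    }
    actual=$(pwd -P)
    if [ "$actual" != "$expected" ]; then
        echo "cwd is $actual, expected $expected" >&2
        return 1
    fi
}
```

Running `assert_cwd "$GITHUB_WORKSPACE"` before a `git` or semcode
command would surface a wrong working directory early, assuming
`$GITHUB_WORKSPACE` is the Linux repository root as the table implies.
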
 ci/claude/bpf-ci-agent.md | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md
index 878d9f09..ae2e4ec0 100644
--- a/ci/claude/bpf-ci-agent.md
+++ b/ci/claude/bpf-ci-agent.md
@@ -37,6 +37,11 @@ Kernel codebase itself.
    Do not rely solely on reading code and CI logs.
 6. Attempt to verify code fixes by building and running the relevant
    test. If the test is flaky, verify correctness by code inspection.
+7. **Never use `cd` in bash commands.** The working directory persists
+   between commands. Use `git -C <path>` for git operations in
+   companion repos, or absolute paths. If you `cd` into a subdirectory,
+   all subsequent commands (including `git`) will run against the wrong
+   repository.
 
 ---
 
@@ -202,14 +207,18 @@ PHASE 2 COMPLETE: Issue selected
 
 **3.1 Reproduce and characterize.** Gather failure logs, identify the
 exact failing test/component and failure mode. For test failures,
-attempt local reproduction. Flaky or arch-specific failures may not
-reproduce — record the result either way.
+attempt local reproduction using the build and vmtest commands from
+the Workspace section. Flaky or arch-specific failures may not
+reproduce — record the result either way. If you skip reproduction,
+state why (e.g., "infra issue, not a test failure" or "requires
+s390x hardware").
 
 **3.2 Root cause analysis.** Read test and kernel code. Use semcode
 for functions/callers/call chains. Check git history. Search lore.
 
 Checklist:
 - [ ] Failure logs from multiple CI runs
+- [ ] Reproduction attempted (or reason for skipping stated)
 - [ ] Test and kernel code read
 - [ ] Git history checked
 - [ ] Lore checked
@@ -229,6 +238,7 @@ If not reporting, skip steps 4.1–4.2 but still update NOTES.md.
 
 ```
 PHASE 3 COMPLETE: Investigation finished
+  Reproduction: <reproduced | not reproduced | skipped: reason>
   Root cause: <identified | theory | unknown>
   Fix: <patch ready | recommendation | needs upstream work | not reporting>
 ```
@@ -284,8 +294,9 @@ PHASE 4 COMPLETE: Output generated
 | Tool | Error | Action |
 |------|-------|--------|
 | semcode lore | Error or empty | Retry once → `lei` CLI → `git log --grep`. Max 3 total attempts per query. |
-| semcode code | Error | Fall back to grep/find. |
+| semcode code | Error | Verify cwd with `pwd` (must be Linux repo root). Fall back to grep/find. |
 | `gh run view` | Rate limit or error | Wait 10s, retry once. If still failing, skip that run. |
 | `gh issue list` | Error | Retry once. If failing, proceed with empty skip list. |
 | `lei` | Unavailable | Fall back to `git log --grep`. |
+| `git` | Unexpected output | Run `pwd` to verify cwd is the Linux repo root. If wrong, run `cd $GITHUB_WORKSPACE` to return to the workspace root. |
 | Build / vmtest | Failure | Record error, do not retry more than once. |

From 7a02fdcb234aeeac723016adb6ba0bd75ba1306f Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Tue, 24 Feb 2026 16:40:49 -0800
Subject: [PATCH 9/9] Misc fixes

---
 .github/workflows/ai-agent.yml | 2 +-
 ci/claude/bpf-ci-agent.md      | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/ai-agent.yml b/.github/workflows/ai-agent.yml
index 355f124e..c6b10e78 100644
--- a/.github/workflows/ai-agent.yml
+++ b/.github/workflows/ai-agent.yml
@@ -9,7 +9,7 @@ permissions:
 
 on:
   schedule:
-    - cron: '0 12 * * 1,3,5'  # Mon/Wed/Fri at ~4am Pacific Time
+    - cron: '0 12 * * 1'  # Monday at ~4am Pacific Time
   workflow_dispatch:
   pull_request:
     paths:
diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md
index ae2e4ec0..f973f6be 100644
--- a/ci/claude/bpf-ci-agent.md
+++ b/ci/claude/bpf-ci-agent.md
@@ -250,6 +250,8 @@ PHASE 3 COMPLETE: Investigation finished
 **4.1** Create `output/summary.md` as a GitHub issue:
 
 ```markdown
+# <short descriptive title for the issue>
+
 ## Summary
 <1-3 sentences>