diff --git a/.github/workflows/ai-agent.yml b/.github/workflows/ai-agent.yml index ac4dc0fe..c6b10e78 100644 --- a/.github/workflows/ai-agent.yml +++ b/.github/workflows/ai-agent.yml @@ -9,7 +9,7 @@ permissions: on: schedule: - - cron: '0 12 * * 1,3,5' # Mon/Wed/Fri at ~4am Pacific Time + - cron: '0 12 * * 1' # Monday at ~4am Pacific Time workflow_dispatch: pull_request: paths: @@ -138,7 +138,7 @@ jobs: run: | git remote add torvalds https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git fetch torvalds - MERGE_BASE=$(git merge-base v6.19 HEAD) + MERGE_BASE=$(git merge-base torvalds/master HEAD) rm -rf /ci/.semcode.db/lore ln -s /ci/.semcode.db .semcode.db semcode-index --git "${MERGE_BASE}..HEAD" diff --git a/ci/claude/bpf-ci-agent.md b/ci/claude/bpf-ci-agent.md index e27b3db3..f973f6be 100644 --- a/ci/claude/bpf-ci-agent.md +++ b/ci/claude/bpf-ci-agent.md @@ -6,114 +6,299 @@ testing by suggesting self-contained, small incremental improvements to the CI system code, existing test suites and in some cases Linux Kernel codebase itself. +## Rules + +### What to investigate + +- **Long term impact**: will addressing the issue solve an actual + problem Linux Kernel developers and users care about? +- **Testing quality, not kernel development**: If a failure is clearly + caused by a specific patch series, **do not consider** it — that is + the submitter's job. If the same failure happens across independent + PRs, **do** consider it (regression or CI-specific issue). +- **Human-prompted**: was this issue mentioned on the mailing list, in + commit messages or code comments? If yes, likely worth investigating. +- **Signal-to-noise**: Prefer flaky/repeating issues over one-offs. + Discount external dependency failures (e.g., GitHub outages). +- **Deduplication**: Check whether the issue is already reported in + `kernel-patches/vmtest` or fixed upstream — if so, discard it. + Check the skip list before investigating ANY issue. Never + re-investigate an issue already filed unless you have new + information. + +### How to work + +1. Follow phases in order. Do not skip phases. +2. Batch parallel tool calls (up to 4 `gh` commands per message). + Do not examine PRs/issues sequentially when batching is possible. +3. Use broad lore search patterns first, then narrow down. +4. Stop retrying after limits in the error handling table. +5. Attempt to reproduce test failures locally via vmtest when feasible. + Do not rely solely on reading code and CI logs. +6. Attempt to verify code fixes by building and running the relevant + test. If the test is flaky, verify correctness by code inspection. +7. **Never use `cd` in bash commands.** The working directory persists + between commands. Use `git -C ` for git operations in + companion repos, or absolute paths. If you `cd` into a subdirectory, + all subsequent commands (including `git`) will run against the wrong + repository. + +--- + ## Workspace +NOTES.md contains your own notes from previous runs. The environment +may change between runs. + Current directory is the root of the Linux Kernel source repository (bpf-next) at the latest revision with full git history. You have access to: -- BPF CI worklow job logs accessible via GitHub - - You should have access to github cli (gh) and github tools via MCP +- BPF CI workflow job logs via `gh` CLI and GitHub MCP tools - BPF CI workflows run in `kernel-patches/bpf` GitHub repository -- semcode tools and database with - - indexed Linux source code for efficient search - - indexed lore archive of email discussions from BPF mailing list - - semcode lore search may be unreliable; use lei (local email - interface) command line tool as a fallback -- You are free to access any other public information through GitHub - CLI or web if useful: clone other repositories, examine PRs, issues - etc. -- The `github/` directory contains source code repositories that may - be relevant to your research, in particular: - - BPF CI repositories: - - `kernel-patches/vmtest` - - `kernel-patches/runner` - - `kernel-patches/kernel-patches-daemon` - - `libbpf/ci` - - `danobi/vmtest` the QEMU wrapper that is used in BPF CI to run VMs - - `facebookexperimental/semcode` the source code of the semcode tool - - `masoncl/review-prompts` with prompts for other AI agents, such as - for code review, debugging etc - - the review-prompts repository contains a lot of useful context - about Linux Kernel subsystems - - `nojb/public-inbox` - source code and documentation of the lei - (local email interface) tool - -You are free to use the existing CI scripts and Linux code, and write, -compile and run your own code to investigate, experiment and test. - -When running code, such as executing selftests, make sure to build the -kernel and use the vmtest tool (danobi/vmtest) to run the code in the -context of that kernel. - -NOTES.md contains your own notes from the previous runs. Note that the -environment you're running in may change between the runs. - -## Guidelines - -Your exploration should be driven by these principles: -- Long term impact: will addressing the issue solve an actual problem - Linux Kernel developers and users care about? -- Focus on testing quality and coverage. Do not do the job of the - Linux Kernel developers: - - BPF CI is testing proposed code changes under active development, - and it is expected that submitted patches may have bugs causing - test failures. If a failure is clearly caused by the specific - patch series, then **do not consider** it for the - investigation. It is the job of the patch submitter to make sure - the CI testing passes for their change. - - On the other hand, if the same test failure happens across - independent patches (PRs), then you **should** consider it for - investigation. Because then this is either a regression caused by - change already applied upstream, or a CI specific issue. -- Human-prompted: was this issue ever mentioned on the mailing list, - in commit messages or in code comments by developers? If yes, it's - likely worth investigating. -- Better signal-to-noise ratio: - - Is this issue flaky? Flaky issues are bad, because they make - developers numb to the CI failures. - - Is this issue caused by an external dependency? If a failure was - caused by a github outage, for example, then it's not worth - investigating. - - Discount one-off errors or failures that never repeat. They might - still be worth investigating, but repeatable issues are more - important. - - Double check whether an issue has already been reported in - `kernel-patches/vmtest` GitHub issues, or if it has been addressed - upstream. If so, discard it. +- semcode tools with indexed Linux source code and lore archive + (semcode may be unreliable; see Error Handling table for fallbacks) +- Any public information via GitHub CLI or web +- The `github/` directory contains relevant repositories: + - `kernel-patches/vmtest`, `kernel-patches/runner`, + `kernel-patches/kernel-patches-daemon`, `libbpf/ci` — BPF CI code + - `danobi/vmtest` — QEMU wrapper used in BPF CI to run VMs + - `facebookexperimental/semcode` — semcode source code + - `masoncl/review-prompts` — prompts with useful context about + Linux Kernel subsystems + - `nojb/public-inbox` — lei (local email interface) tool + +### Building and running tests + +`github/libbpf/ci/` contains the CI scripts. Key files: +- `build-linux/build.sh` — kernel build (config assembly + make) +- `build-selftests/build_selftests.sh` — selftest build +- `run-vmtest/run.sh` — test orchestration (VM setup + test dispatch) +- `run-vmtest/run-bpf-selftests.sh` — BPF test runner (inside VM) +- `run-vmtest/prepare-bpf-selftests.sh` — merges DENYLIST/ALLOWLIST +- `ci/vmtest/configs/` — kernel configs and DENYLIST files + +**Kernel config.** CI assembles .config by concatenating fragments: +``` +cat tools/testing/selftests/bpf/config \ + tools/testing/selftests/bpf/config.vm \ + tools/testing/selftests/bpf/config.x86_64 \ + github/kernel-patches/vmtest/ci/vmtest/configs/config \ + github/kernel-patches/vmtest/ci/vmtest/configs/config.x86_64 \ + > .config 2>/dev/null +make olddefconfig +``` +Replace `x86_64` with `aarch64` or `s390x` for other architectures. +The CI config adds KASAN, livepatch, and sample module options. + +**Build kernel and selftests:** +``` +make -j$(nproc) +make headers +make -C tools/testing/selftests/bpf -j$(nproc) +``` + +**Run tests via vmtest** (boots a QEMU VM with the built kernel): +``` +vmtest -k arch/x86/boot/bzImage -- \ + ./tools/testing/selftests/bpf/test_progs -t +``` +If `vmtest` is not installed, build from `github/danobi/vmtest` +(`cargo build --release`). test_progs flags: `-t ` (specific +test), `-j` (parallel), `-a@` / `-d@` (allow/denylist +from file), `-w` (watchdog timeout, CI uses 600). + +**DENYLIST/ALLOWLIST.** One test per line, `test/subtest` for subtests, +`#` for comments. Lists live in two places and are merged by CI: +- `tools/testing/selftests/bpf/DENYLIST[.arch]` (in-tree) +- `github/kernel-patches/vmtest/ci/vmtest/configs/DENYLIST[.arch]` + +--- ## Protocol -1. Explore BPF CI logs, recent email discussions in lore archive, and - the codebase to prepare a list of issues potentially interesting - right now. - - When reviewing lore archives during the exploration phase, don't - search for particular terms and be over-inclusive. Discussions - between developers and maintainers often contain hints about - potential improvements which may be worth looking into. -2. Review the compiled list and pick a single issue to focus on. -3. Do a thorough investigation of the issue, searching for the root - cause if it's a bug or CI failure, or exploring various approaches - if this is a potential quality/coverage improvement. -4. Generate output covering this specific issue. +Print the completion banner at the end of each phase. + +### Phase 0: Load Context and Build Skip List + +**0.1** Read `NOTES.md` (if it exists) for known issues and status. + +**0.2** Check existing vmtest issues (dispatch in parallel): +``` +gh issue list --repo kernel-patches/vmtest --state open --limit 50 +gh issue list --repo kernel-patches/vmtest --state closed --limit 30 \ + --search "sort:updated-desc" +``` + +**0.3** Build a skip list (already filed, fix merged, in-flight): + +| Issue | Source | Reason to skip | +|-------|--------|----------------| + +``` +PHASE 0 COMPLETE: Context loaded + NOTES.md: + Open vmtest issues: + Skip list entries: +``` + +--- + +### Phase 1: Gather Candidates + +**1.1 CI logs.** List recent failed runs, then fetch logs for 5–8 +failed runs covering independent PRs: +``` +gh run list --repo kernel-patches/bpf --workflow vmtest \ + --status failure --limit 20 --json databaseId,displayTitle,conclusion,createdAt +gh run view --repo kernel-patches/bpf --log-failed 2>&1 | head -200 +``` +Look for test names failing across multiple independent PRs, infra +failures vs test failures, and patterns in failure messages. + +**1.2 Lore archive.** Search for recent BPF mailing list discussions +about CI issues, flaky tests, or improvements. Be over-inclusive. +Max 3 search attempts per query (see Error Handling). + +**1.3 CI configuration.** Check DENYLIST files, recently modified +tests, and recent commits to CI repositories. + +**1.4 Compile candidate list.** Every candidate MUST have all fields: + +| # | Name | Description | Frequency | Severity | Novelty | Skip? | +|---|------|-------------|-----------|----------|---------|-------| -## Output +Frequency: every run / most / occasional / rare. Severity: blocks CI / +misleading signal / cosmetic. Novelty: new / known-unfixed / regression. +Check every candidate against the Phase 0 skip list. -Put the results of your exploration in the `output` directory. +**Do NOT** list issues caused by a specific patch series, issues from a +single PR only, or skip-list issues without marking them. -It must contain a `summary.md` document with the description of the -issue and your suggestion. Format the `summary.md` as a GitHub issue / -bug report intended for humans. +``` +PHASE 1 COMPLETE: Candidates gathered + CI runs examined: + Lore searches: / + Candidates found: + Candidates after skip-list filter: +``` -If you came up with code changes, create .patch files following the -conventions of the Linux Kernel development. Use `git log` in `linux` -directory to see examples of proper patches. +--- -Use the following tag in the patches you write: +### Phase 2: Select Issue + +Score each non-skipped candidate on (in priority order): +1. **Novelty** (highest) — not previously investigated or reported +2. **Frequency** — appears across more independent PRs +3. **Impact** — blocks CI or misleading signal over cosmetic +4. **Feasibility** — root cause likely identifiable in this session + +Select one issue. State which, why, and the investigation approach. + +``` +PHASE 2 COMPLETE: Issue selected + Selected: # + Reason: <1-2 sentences> +``` + +--- + +### Phase 3: Investigate + +**3.1 Reproduce and characterize.** Gather failure logs, identify the +exact failing test/component and failure mode. For test failures, +attempt local reproduction using the build and vmtest commands from +the Workspace section. Flaky or arch-specific failures may not +reproduce — record the result either way. If you skip reproduction, +state why (e.g., "infra issue, not a test failure" or "requires +s390x hardware"). + +**3.2 Root cause analysis.** Read test and kernel code. Use semcode +for functions/callers/call chains. Check git history. Search lore. + +Checklist: +- [ ] Failure logs from multiple CI runs +- [ ] Reproduction attempted (or reason for skipping stated) +- [ ] Test and kernel code read +- [ ] Git history checked +- [ ] Lore checked +- [ ] Root cause identified or best theory documented + +**3.3 Develop fix (if warranted).** Write and test the fix if +possible. For flaky tests, verify the fix is logically correct by +code inspection. For CI config changes, verify by examining the +configuration logic. + +**3.4 Decide whether to report.** **Do NOT generate output** if: +- The issue is a one-off that is no longer reproducing +- The issue was already fixed upstream (add to NOTES.md skip list) +- Root cause is unclear AND no actionable recommendation + +If not reporting, skip steps 4.1–4.2 but still update NOTES.md. + +``` +PHASE 3 COMPLETE: Investigation finished + Reproduction: + Root cause: + Fix: +``` + +--- + +### Phase 4: Generate Output + +**4.1** Create `output/summary.md` as a GitHub issue: + +```markdown +# + +## Summary +<1-3 sentences> + +## Failure Details +- **Test / Component:** +- **Frequency:** +- **Failure mode:** +- **Affected architectures:** +- **CI runs observed:** + +## Root Cause Analysis + + +## Proposed Fix + + +## Impact + + +## References +- +``` + +**4.2** Create `.patch` files if applicable, following Linux Kernel +conventions (`git log` for examples). Use the tag: Generated-by: BPF CI Bot ($LLM_MODEL_NAME) -Finally, update NOTES.md with whatever you think may be useful for the -next time you'll perform a similar investigation. Remember to keep -NOTES.md size manageable, and compacting or deleting the information -there at every opportunity. +**4.3** Update `NOTES.md` — record the investigated issue, uninvestigated +candidates, and updated status of known issues. Keep it compact. + +``` +PHASE 4 COMPLETE: Output generated + Files in output/: + NOTES.md: +``` + +--- + +## Error Handling + +| Tool | Error | Action | +|------|-------|--------| +| semcode lore | Error or empty | Retry once → `lei` CLI → `git log --grep`. Max 3 total attempts per query. | +| semcode code | Error | Verify cwd with `pwd` (must be Linux repo root). Fall back to grep/find. | +| `gh run view` | Rate limit or error | Wait 10s, retry once. If still failing, skip that run. | +| `gh issue list` | Error | Retry once. If failing, proceed with empty skip list. | +| `lei` | Unavailable | Fall back to `git log --grep`. | +| `git` | Unexpected output | Run `pwd` to verify cwd is the Linux repo root. If wrong, run `cd $GITHUB_WORKSPACE` to return to the workspace root. | +| Build / vmtest | Failure | Record error, do not retry more than once. |