Nix microvm tap preflight and update #2593
… missing
pcp-test-all-microvms previously reported VM_START_FAILED for the base-tap, eval-tap and grafana-tap variants when the host TAP bridge and device had not been created. The failure mode gave no hint that sudo nix run .#pcp-network-setup was the fix.
Detect the bridge and TAP device up front and, when they are not present, print a banner pointing at pcp-check-host / pcp-network-setup and mark the three tap variants as "SKIPPED (tap network not set up)" in the summary, mirroring the existing --skip-tap path (which is now labelled "SKIPPED (--skip-tap)" so the two skip reasons are distinguishable).
Non-tap variants are unaffected and the run still exits 0 when only tap variants are skipped, matching --skip-tap semantics.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
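The detection step described in this commit could look roughly like the sketch below. It is not the runner's actual code: the body of check_tap_network, the iface_exists helper and the use of /sys/class/net are assumptions made for illustration, and only the names check_tap_network, TAP_BRIDGE, TAP_DEVICE, pcpbr0 and pcptap0 come from this PR.

```shell
# Hypothetical sketch; the real check_tap_network in test-all-microvms.nix may differ.
TAP_BRIDGE="${TAP_BRIDGE:-pcpbr0}"
TAP_DEVICE="${TAP_DEVICE:-pcptap0}"

# Succeeds if a network interface with the given name exists.
# Linux exposes every interface as an entry under /sys/class/net.
iface_exists() {
  [ -n "$1" ] && [ -e "/sys/class/net/$1" ]
}

# Succeeds only when both the bridge and the TAP device are present.
check_tap_network() {
  iface_exists "$TAP_BRIDGE" && iface_exists "$TAP_DEVICE"
}
```

This sketch tests presence only. The banner in the PR says "not present/up", so the real check may also verify link state.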
Updates the nixpkgs input from 2026-01-05 to 2026-05-10.
Bumping nixpkgs alone broke MicroVM evaluation because the newer ZFS module forces resolution of fileSystems."/nix/store".fsType during initrd configuration, which is set by microvm.nix's 9p share but not in time under the new option-evaluation order.
Updating microvm.nix from 2026-02-22 to 2026-05-13 (its corresponding tracking version) restores eval and the MicroVM lifecycle tests pass with the new nixpkgs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Grafana NixOS module no longer provides a default for
services.grafana.settings.security.secret_key and asserts that
operators set their own. Without this, eval of the grafana MicroVM
variant fails with:
Failed assertions:
- Grafana's secret key (services.grafana.settings.security.secret_key)
doesn't have a default value anymore. Please generate your own
and use a file-provider on this option!
The dev MicroVM is already documented as local-development-only (the
admin password is the literal string "pcp" and a warning is emitted
on activation), so a hardcoded non-default value satisfies the
assertion without changing the security posture. A file-provider
would only meaningfully help if there were real persistent secrets
to protect, which there aren't in this test image.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
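For contrast with the hardcoded dev value, the file-provider route that the assertion message recommends would need a key file on disk. The sketch below shows one way to create such a file. The function name generate_secret_key and any path passed to it are hypothetical and not part of this PR. Grafana's file provider would then reference the file as $__file{/path/to/key}.

```shell
# Hypothetical helper, not part of this PR: writes 32 random bytes as
# 64 lowercase hex characters to the path given as $1.
generate_secret_key() {
  (
    umask 077   # create the file readable and writable by the owner only
    od -An -N32 -tx1 /dev/urandom | tr -d ' \n' > "$1"
  )
}
```

None of this is needed for the test image, which holds no persistent secrets, as the commit message explains.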
Walkthrough
This PR updates NixOS build and test infrastructure in two areas: Grafana development configuration receives a hardcoded secret key to satisfy newer module requirements, and the test harness gains TAP networking preflight detection to gracefully skip tests when host networking is unavailable.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@build/nix/tests/test-all-microvms.nix`:
- Around line 562-578: Wrap the TAP preflight block so it only runs when the
user did not pass the skip-tap flag: check the parsed --skip-tap indicator
(e.g., SKIP_TAP or skip_tap) and only execute the existing logic that calls
check_tap_network and sets TAP_NETWORK_READY / prints messages about TAP_BRIDGE
and TAP_DEVICE if the flag is false; if skip-tap is true, skip the whole
detection and banner so no TAP setup hints are shown. Ensure you reference the
same symbols already used (check_tap_network, TAP_NETWORK_READY, TAP_BRIDGE,
TAP_DEVICE) and do not change the existing messages aside from guarding their
execution behind the skip-tap condition.
Files ignored due to path filters (1)
- flake.lock is excluded by !**/*.lock
Files selected for processing (2)
- build/nix/grafana.nix
- build/nix/tests/test-all-microvms.nix
# Preflight: detect whether host TAP networking is set up.
# When it is not, tap variants are skipped with an actionable hint
# instead of failing with an opaque VM_START_FAILED.
TAP_NETWORK_READY=true
if check_tap_network; then
  log "TAP networking detected ($TAP_BRIDGE, $TAP_DEVICE)"
else
  TAP_NETWORK_READY=false
  log_section "TAP networking not set up — tap variants will be skipped"
  echo "Bridge '$TAP_BRIDGE' or TAP device '$TAP_DEVICE' is not present/up."
  echo ""
  echo "To enable tap variants (base-tap, eval-tap, grafana-tap), run:"
  echo " nix run .#pcp-check-host"
  echo " sudo nix run .#pcp-network-setup"
  echo ""
  echo "Then re-run this test. To suppress this message, pass --skip-tap."
fi
Preflight check should be conditional on --skip-tap flag.
The preflight check currently runs unconditionally, but the PR description explicitly states "no preflight banner" when --skip-tap is passed. Users who invoke --skip-tap will still see either the detection log (line 567) or the setup banner (lines 570-577), which is confusing—why tell users to set up TAP networking when they explicitly asked to skip it?
Proposed fix to make preflight conditional
  # Ensure clean state
  stop_all_vms
- # Preflight: detect whether host TAP networking is set up.
- # When it is not, tap variants are skipped with an actionable hint
- # instead of failing with an opaque VM_START_FAILED.
- TAP_NETWORK_READY=true
- if check_tap_network; then
-   log "TAP networking detected ($TAP_BRIDGE, $TAP_DEVICE)"
- else
-   TAP_NETWORK_READY=false
-   log_section "TAP networking not set up — tap variants will be skipped"
-   echo "Bridge '$TAP_BRIDGE' or TAP device '$TAP_DEVICE' is not present/up."
-   echo ""
-   echo "To enable tap variants (base-tap, eval-tap, grafana-tap), run:"
-   echo " nix run .#pcp-check-host"
-   echo " sudo nix run .#pcp-network-setup"
-   echo ""
-   echo "Then re-run this test. To suppress this message, pass --skip-tap."
+ # Preflight: detect whether host TAP networking is set up (unless --skip-tap).
+ # When it is not, tap variants are skipped with an actionable hint
+ # instead of failing with an opaque VM_START_FAILED.
+ TAP_NETWORK_READY=true
+ if [[ "$SKIP_TAP" == "false" ]]; then
+   if check_tap_network; then
+     log "TAP networking detected ($TAP_BRIDGE, $TAP_DEVICE)"
+   else
+     TAP_NETWORK_READY=false
+     log_section "TAP networking not set up — tap variants will be skipped"
+     echo "Bridge '$TAP_BRIDGE' or TAP device '$TAP_DEVICE' is not present/up."
+     echo ""
+     echo "To enable tap variants (base-tap, eval-tap, grafana-tap), run:"
+     echo " nix run .#pcp-check-host"
+     echo " sudo nix run .#pcp-network-setup"
+     echo ""
+     echo "Then re-run this test. To suppress this message, pass --skip-tap."
+   fi
  fi
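The three outcomes this guard is meant to produce can be exercised in isolation with a small sketch. Everything here other than the two SKIPPED labels and the names SKIP_TAP and check_tap_network is an assumption: tap_status and TAP_NETWORK_UP are hypothetical, and check_tap_network is stubbed rather than probing the host.

```shell
# Illustrative sketch only; tap_status and TAP_NETWORK_UP are hypothetical names.
SKIP_TAP="${SKIP_TAP:-false}"
TAP_NETWORK_UP="${TAP_NETWORK_UP:-false}"   # stub input standing in for host detection

# Stub: the real function inspects the host bridge and TAP device.
check_tap_network() {
  [ "$TAP_NETWORK_UP" = "true" ]
}

# Prints the summary label for a tap variant, or RUN if it should be started.
tap_status() {
  if [ "$SKIP_TAP" = "true" ]; then
    # User opted out: no detection and no setup banner.
    echo "SKIPPED (--skip-tap)"
  elif check_tap_network; then
    echo "RUN"
  else
    echo "SKIPPED (tap network not set up)"
  fi
}
```

Checking SKIP_TAP first is what keeps the setup hint away from users who explicitly asked to skip tap variants.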
nix: TAP-preflight in MicroVM test runner + bump lock for current nixos-unstable
G'day,
I hope you are doing well.
This PR updates the Nix flake inputs to their latest versions and slightly improves the automated testing.
I was pleased to see the nix continues to work well after updating across the 200+ commits.
Thanks,
Dave
Summary
Three related changes that make `nix run .#pcp-test-all-microvms` work cleanly on current `nixos-unstable` and give a better error when host TAP networking has not been set up.
Changes
build/nix/tests/test-all-microvms.nix (+45/-6)
- Check for bridge `pcpbr0` and TAP device `pcptap0` once before the variant loop
- When they are absent, print a banner pointing at `nix run .#pcp-check-host` and `sudo nix run .#pcp-network-setup`, then mark the three tap variants as `SKIPPED (tap network not set up)` in the summary
- Relabel the existing `--skip-tap` result to `SKIPPED (--skip-tap)` so the summary distinguishes the two skip reasons
- Non-tap variants are unaffected, and the run still exits 0 when only tap variants are skipped, matching `--skip-tap` semantics
flake.lock (+10/-10)
- `nixpkgs` 2026-01-05 → 2026-05-10
- `microvm.nix` 2026-02-22 → 2026-05-13 (the older microvm.nix fails to eval against the newer nixpkgs because of stricter `fileSystems."/nix/store".fsType` resolution via the ZFS module)
build/nix/grafana.nix (+4/-0)
- The newer Grafana NixOS module asserts that `services.grafana.settings.security.secret_key` is set explicitly (no default)
- The dev MicroVM is already local-development-only (admin password `pcp`, activation warning emitted) so a hardcoded non-default value satisfies the assertion without changing the security posture
Testing
All 7 lifecycle variants pass on a host with TAP networking set up:
TAP preflight verified across all three behaviour branches:
- `sudo nix run .#pcp-network-teardown` then `nix run .#pcp-test-all-microvms -- --only=base-tap`: banner appears, `SKIPPED (tap network not set up)` row in summary, exit 0
- `nix run .#pcp-test-all-microvms`: `TAP networking detected (pcpbr0, pcptap0)` logged, all 7 variants pass
- `--skip-tap` (`nix run .#pcp-test-all-microvms -- --skip-tap`): no preflight banner, `SKIPPED (--skip-tap)` rows in summary
Build / shellcheck of the modified runner:
- `nix run .#pcp-test-all-microvms -- --help` succeeds.
Follow-up (not in this PR)
While iterating I hit a stale `pcp-bpf-vm` qemu process from a prior session still holding the virtcon ports (24530/24531); the runner's `stop_all_vms` did not reach it. Worth tightening cleanup in a separate change.
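Until that cleanup lands, a quick diagnostic along these lines could show whether something is still holding those ports. This is a sketch under assumptions: it presumes the virtcon ports are TCP listeners visible to `ss`, and the helper name listening_ports is hypothetical.

```shell
# Hypothetical helper: reads `ss -H -ltn` style lines on stdin and prints
# each of the given ports that has a listener.
listening_ports() {
  awk -v want="$*" '
    BEGIN {
      n = split(want, w, " ")
      for (i = 1; i <= n; i++) ports[w[i]] = 1
    }
    {
      # Column 4 is "Local Address:Port"; the port is the last ":" field.
      k = split($4, a, ":")
      if (a[k] in ports) print a[k]
    }
  '
}
```

Typical use would be `ss -H -ltn | listening_ports 24530 24531`. Any port it prints is still bound, which suggests a leftover process to find and stop before re-running the tests.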