Skip to content

Nix microvm tap preflight and update#2593

Open
randomizedcoder wants to merge 3 commits into
performancecopilot:mainfrom
randomizedcoder:nix-microvm-tap-preflight
Open

Nix microvm tap preflight and update#2593
randomizedcoder wants to merge 3 commits into
performancecopilot:mainfrom
randomizedcoder:nix-microvm-tap-preflight

Conversation

@randomizedcoder
Copy link
Copy Markdown
Contributor

@randomizedcoder randomizedcoder commented May 13, 2026

nix: TAP-preflight in MicroVM test runner + bump lock for current nixos-unstable

G'day,

I hope you are doing well.

This PR updates the nix to latest and slightly improves the automated testing.

I was pleased to see the nix continues to work well, when updating the 200+ commits.

Thanks,
Dave

Summary

Three related changes that make nix run .#pcp-test-all-microvms work cleanly on current nixos-unstable and give a better error when host TAP networking has not been set up.

Changes

  • TAP preflight in the test runner (build/nix/tests/test-all-microvms.nix, +45/-6)
    • Detect host bridge pcpbr0 and TAP device pcptap0 once before the variant loop
    • When absent, print an actionable banner pointing at nix run .#pcp-check-host and sudo nix run .#pcp-network-setup, then mark the three tap variants as SKIPPED (tap network not set up) in the summary
    • Rename the existing --skip-tap result to SKIPPED (--skip-tap) so the summary distinguishes the two skip reasons
    • Exit code unchanged — matches existing --skip-tap semantics
  • Bump flake inputs (flake.lock, +10/-10)
    • nixpkgs 2026-01-05 → 2026-05-10
    • microvm.nix 2026-02-22 → 2026-05-13 (the older microvm.nix fails to eval against the newer nixpkgs because of stricter fileSystems."/nix/store".fsType resolution via the ZFS module)
  • Grafana secret_key for new nixpkgs (build/nix/grafana.nix, +4/-0)
    • Newer NixOS Grafana module asserts services.grafana.settings.security.secret_key is set explicitly (no default)
    • MicroVM is documented dev-only (admin password is the literal pcp, activation warning emitted) so a hardcoded non-default value satisfies the assertion without changing the security posture

Testing

All 7 lifecycle variants pass on a host with TAP networking set up:

Variant Result Checks Duration
base PASSED 6 40s
base-tap PASSED 6 39s
eval PASSED 7 45s
eval-tap PASSED 7 40s
grafana PASSED 11 126s
grafana-tap PASSED 11 41s
bpf PASSED 8 41s

TAP preflight verified across all three behaviour branches:

  • TAP not set upsudo nix run .#pcp-network-teardown then nix run .#pcp-test-all-microvms -- --only=base-tap: banner appears, SKIPPED (tap network not set up) row in summary, exit 0
  • TAP set up, full runnix run .#pcp-test-all-microvms: TAP networking detected (pcpbr0, pcptap0) logged, all 7 variants pass
  • --skip-tapnix run .#pcp-test-all-microvms -- --skip-tap: no preflight banner, SKIPPED (--skip-tap) rows in summary

Build / shellcheck of the modified runner: nix run .#pcp-test-all-microvms -- --help succeeds.

Follow-up (not in this PR)

While iterating I hit a stale pcp-bpf-vm qemu process from a prior session still holding the virtcon ports (24530/24531); the runner's stop_all_vms did not reach it. Worth tightening cleanup in a separate change.

  • Added TAP network preflight detection to test runner: detects host bridge and TAP device availability before running variants; marks TAP variants as skipped when networking not available instead of failing, with actionable setup instructions
  • Updated flake inputs for nixos-unstable compatibility: bumped nixpkgs to 2026-05-10 and microvm.nix to 2026-05-13 to address stricter ZFS module option resolution
  • Set explicit Grafana secret_key for compatibility with newer NixOS Grafana module assertions

Review Change Stack

randomizedcoder and others added 3 commits May 13, 2026 09:36
… missing

pcp-test-all-microvms previously reported VM_START_FAILED for the
base-tap, eval-tap and grafana-tap variants when the host TAP bridge
and device had not been created. The failure mode gave no hint that
sudo nix run .#pcp-network-setup was the fix.

Detect the bridge and TAP device up front and, when they are not
present, print a banner pointing at pcp-check-host / pcp-network-setup
and mark the three tap variants as "SKIPPED (tap network not set up)"
in the summary, mirroring the existing --skip-tap path (which is now
labelled "SKIPPED (--skip-tap)" so the two skip reasons are
distinguishable). Non-tap variants are unaffected and the run still
exits 0 when only tap variants are skipped, matching --skip-tap
semantics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates the nixpkgs input from 2026-01-05 to 2026-05-10. Bumping
nixpkgs alone broke MicroVM evaluation because the newer ZFS module
forces resolution of fileSystems."/nix/store".fsType during initrd
configuration, which is set by microvm.nix's 9p share but not in time
under the new option-evaluation order. Updating microvm.nix from
2026-02-22 to 2026-05-13 (its corresponding tracking version)
restores eval and the MicroVM lifecycle tests pass with the new
nixpkgs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Grafana NixOS module no longer provides a default for
services.grafana.settings.security.secret_key and asserts that
operators set their own. Without this, eval of the grafana MicroVM
variant fails with:

    Failed assertions:
    - Grafana's secret key (services.grafana.settings.security.secret_key)
      doesn't have a default value anymore. Please generate your own
      and use a file-provider on this option!

The dev MicroVM is already documented as local-development-only (the
admin password is the literal string "pcp" and a warning is emitted
on activation), so a hardcoded non-default value satisfies the
assertion without changing the security posture. A file-provider
would only meaningfully help if there were real persistent secrets
to protect, which there aren't in this test image.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

📝 Walkthrough

Walkthrough

This PR updates NixOS build and test infrastructure in two areas: Grafana development configuration receives a hardcoded secret key to satisfy newer module requirements, and the test harness gains TAP networking preflight detection to gracefully skip tests when host networking is unavailable.

Changes

NixOS Build and Test Infrastructure

Layer / File(s) Summary
Grafana development secret key
build/nix/grafana.nix
Grafana configuration now explicitly sets settings.security.secret_key to a hardcoded development-only value, as required by newer NixOS Grafana modules.
TAP networking detection infrastructure
build/nix/tests/test-all-microvms.nix
Test harness adds iproute2 runtime dependency, defines TAP_BRIDGE and TAP_DEVICE configuration variables, and implements check_tap_network() helper to verify host TAP device existence via ip link show.
TAP variant conditional skipping
build/nix/tests/test-all-microvms.nix
Test script executes TAP preflight check before running variants, sets TAP_NETWORK_READY flag, logs setup instructions when TAP is unavailable, and skips TAP variants either via --skip-tap flag or when preflight detection fails, recording skipped results with reasons.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A secret key for Grafana's heart,
And TAP networks ready to start,
With preflight checks in place,
Tests skip with grace,
When hardware's missing—tests depart!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Nix microvm tap preflight and update' directly and specifically describes the main changes in the PR: adding TAP preflight detection logic to the test runner and updating Nix configurations for compatibility with newer nixos-unstable.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@build/nix/tests/test-all-microvms.nix`:
- Around line 562-578: Wrap the TAP preflight block so it only runs when the
user did not pass the skip-tap flag: check the parsed --skip-tap indicator
(e.g., SKIP_TAP or skip_tap) and only execute the existing logic that calls
check_tap_network and sets TAP_NETWORK_READY / prints messages about TAP_BRIDGE
and TAP_DEVICE if the flag is false; if skip-tap is true, skip the whole
detection and banner so no TAP setup hints are shown. Ensure you reference the
same symbols already used (check_tap_network, TAP_NETWORK_READY, TAP_BRIDGE,
TAP_DEVICE) and do not change the existing messages aside from guarding their
execution behind the skip-tap condition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8f2d452b-377d-4ac8-a3a6-4f582ff8828b

📥 Commits

Reviewing files that changed from the base of the PR and between 57844fc and d1a1cbc.

⛔ Files ignored due to path filters (1)
  • flake.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • build/nix/grafana.nix
  • build/nix/tests/test-all-microvms.nix

Comment on lines +562 to +578
# Preflight: detect whether host TAP networking is set up.
# When it is not, tap variants are skipped with an actionable hint
# instead of failing with an opaque VM_START_FAILED.
TAP_NETWORK_READY=true
if check_tap_network; then
log "TAP networking detected ($TAP_BRIDGE, $TAP_DEVICE)"
else
TAP_NETWORK_READY=false
log_section "TAP networking not set up — tap variants will be skipped"
echo "Bridge '$TAP_BRIDGE' or TAP device '$TAP_DEVICE' is not present/up."
echo ""
echo "To enable tap variants (base-tap, eval-tap, grafana-tap), run:"
echo " nix run .#pcp-check-host"
echo " sudo nix run .#pcp-network-setup"
echo ""
echo "Then re-run this test. To suppress this message, pass --skip-tap."
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preflight check should be conditional on --skip-tap flag.

The preflight check currently runs unconditionally, but the PR description explicitly states "no preflight banner" when --skip-tap is passed. Users who invoke --skip-tap will still see either the detection log (line 567) or the setup banner (lines 570-577), which is confusing—why tell users to set up TAP networking when they explicitly asked to skip it?

Proposed fix to make preflight conditional
   # Ensure clean state
   stop_all_vms
 
-  # Preflight: detect whether host TAP networking is set up.
-  # When it is not, tap variants are skipped with an actionable hint
-  # instead of failing with an opaque VM_START_FAILED.
-  TAP_NETWORK_READY=true
-  if check_tap_network; then
-    log "TAP networking detected ($TAP_BRIDGE, $TAP_DEVICE)"
-  else
-    TAP_NETWORK_READY=false
-    log_section "TAP networking not set up — tap variants will be skipped"
-    echo "Bridge '$TAP_BRIDGE' or TAP device '$TAP_DEVICE' is not present/up."
-    echo ""
-    echo "To enable tap variants (base-tap, eval-tap, grafana-tap), run:"
-    echo "    nix run .#pcp-check-host"
-    echo "    sudo nix run .#pcp-network-setup"
-    echo ""
-    echo "Then re-run this test. To suppress this message, pass --skip-tap."
+  # Preflight: detect whether host TAP networking is set up (unless --skip-tap).
+  # When it is not, tap variants are skipped with an actionable hint
+  # instead of failing with an opaque VM_START_FAILED.
+  TAP_NETWORK_READY=true
+  if [[ "$SKIP_TAP" == "false" ]]; then
+    if check_tap_network; then
+      log "TAP networking detected ($TAP_BRIDGE, $TAP_DEVICE)"
+    else
+      TAP_NETWORK_READY=false
+      log_section "TAP networking not set up — tap variants will be skipped"
+      echo "Bridge '$TAP_BRIDGE' or TAP device '$TAP_DEVICE' is not present/up."
+      echo ""
+      echo "To enable tap variants (base-tap, eval-tap, grafana-tap), run:"
+      echo "    nix run .#pcp-check-host"
+      echo "    sudo nix run .#pcp-network-setup"
+      echo ""
+      echo "Then re-run this test. To suppress this message, pass --skip-tap."
+    fi
   fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Preflight: detect whether host TAP networking is set up.
# When it is not, tap variants are skipped with an actionable hint
# instead of failing with an opaque VM_START_FAILED.
TAP_NETWORK_READY=true
if check_tap_network; then
log "TAP networking detected ($TAP_BRIDGE, $TAP_DEVICE)"
else
TAP_NETWORK_READY=false
log_section "TAP networking not set up — tap variants will be skipped"
echo "Bridge '$TAP_BRIDGE' or TAP device '$TAP_DEVICE' is not present/up."
echo ""
echo "To enable tap variants (base-tap, eval-tap, grafana-tap), run:"
echo " nix run .#pcp-check-host"
echo " sudo nix run .#pcp-network-setup"
echo ""
echo "Then re-run this test. To suppress this message, pass --skip-tap."
fi
# Ensure clean state
stop_all_vms
# Preflight: detect whether host TAP networking is set up (unless --skip-tap).
# When it is not, tap variants are skipped with an actionable hint
# instead of failing with an opaque VM_START_FAILED.
TAP_NETWORK_READY=true
if [[ "$SKIP_TAP" == "false" ]]; then
if check_tap_network; then
log "TAP networking detected ($TAP_BRIDGE, $TAP_DEVICE)"
else
TAP_NETWORK_READY=false
log_section "TAP networking not set up — tap variants will be skipped"
echo "Bridge '$TAP_BRIDGE' or TAP device '$TAP_DEVICE' is not present/up."
echo ""
echo "To enable tap variants (base-tap, eval-tap, grafana-tap), run:"
echo " nix run .#pcp-check-host"
echo " sudo nix run .#pcp-network-setup"
echo ""
echo "Then re-run this test. To suppress this message, pass --skip-tap."
fi
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@build/nix/tests/test-all-microvms.nix` around lines 562 - 578, Wrap the TAP
preflight block so it only runs when the user did not pass the skip-tap flag:
check the parsed --skip-tap indicator (e.g., SKIP_TAP or skip_tap) and only
execute the existing logic that calls check_tap_network and sets
TAP_NETWORK_READY / prints messages about TAP_BRIDGE and TAP_DEVICE if the flag
is false; if skip-tap is true, skip the whole detection and banner so no TAP
setup hints are shown. Ensure you reference the same symbols already used
(check_tap_network, TAP_NETWORK_READY, TAP_BRIDGE, TAP_DEVICE) and do not change
the existing messages aside from guarding their execution behind the skip-tap
condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant