userspace: switch PID 1 to busybox init #263

Merged
sysheap merged 16 commits into main from busybox-init
Apr 19, 2026

sysheap commented Apr 19, 2026

Summary

  • Replaces Solaya's Rust init binary with busybox init as PID 1, driven by /etc/inittab from the buildroot overlay; busybox runs /etc/init.d/rcS, waits on /bin/dhcpd, and respawns /bin/dash -i on the serial console.
  • Adds the kernel primitives busybox expects on boot/shutdown: shebang exec (4 layers), rt_sigtimedwait, reboot(2) (with LINUX_REBOOT_* now bindgen'd from linux/reboot.h), a sync(2) stub, and open("/dev/console") → FileDescriptor::Tty (via a new CharDevice::is_tty trait method, replacing an Arc-identity static).
  • Converts socket(2)'s unsupported-domain/type panics into EAFNOSUPPORT / EPROTONOSUPPORT.
  • Deletes the Rust init binary, re-points load_init_bytes to /sbin/init first, and re-syncs QEMU boot on the dash shell prompt instead of the old Rust-init banner lines.

Review follow-ups included in the same branch: a shebang fast-path (peek SHEBANG_MAX_LINE before pulling a multi-MiB ELF into memory), loader::load_elf errors now propagate as ENOEXEC to userspace, PendingSignals::first_unblocked/first_in collapsed into one first_matching, and the rt_sigtimedwait wake path now uses a sigtimedwait_mask: Option<u64> armed/disarmed by the future instead of a bespoke signal_waker: Option<Waker> slot on Thread.

Test plan

  • cmake --build build --target clippy clean.
  • cmake --build build --target test-system: all 71 system tests pass, including shutdown (busybox halt → reboot(2) → HALT) and the signal/job-control suites.
  • Spot-check an interactive boot (just run): busybox inittab runs dhcpd, dash comes up on the console, halt cleanly exits QEMU.

🤖 Generated with Claude Code

sysheap and others added 16 commits April 19, 2026 10:24
Enables BR2_PACKAGE_BUSYBOX + BR2_INIT_BUSYBOX so buildroot installs
/sbin/init as a symlink to /bin/busybox.  Kernel's load_init_bytes now
searches /sbin/init first; /bin/init (Rust fallback) stays in the
overlay so a single-line reorder reverts the swap if bring-up gets
stuck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
do_socket was asserting on AF != AF_INET and panicking on unknown
socket types.  A syscall driven by userspace input must never panic
the kernel.  Busybox init calls socket(AF_UNIX, ...) during startup
and hit the assert immediately; returning the proper errnos lets init
proceed so we can see what it actually needs next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
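
As a rough illustration of the change described in the commit above, the panic-to-errno conversion can be sketched as a pure classification function. This is not the kernel's do_socket: the function name, the SocketKind enum, and the assumption that stream and datagram are the supported types are all hypothetical.

```rust
// Hypothetical sketch; names and the supported-type list are assumptions,
// not Solaya's actual do_socket.
const AF_UNIX: i32 = 1;
const AF_INET: i32 = 2;
const SOCK_STREAM: i32 = 1;
const SOCK_DGRAM: i32 = 2;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Errno {
    EAFNOSUPPORT,
    EPROTONOSUPPORT,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SocketKind {
    Stream,
    Datagram,
}

/// Validate userspace-supplied arguments and report an errno instead of
/// asserting, so bad input can never panic the caller.
fn classify_socket(domain: i32, ty: i32) -> Result<SocketKind, Errno> {
    if domain != AF_INET {
        return Err(Errno::EAFNOSUPPORT);
    }
    match ty {
        SOCK_STREAM => Ok(SocketKind::Stream),
        SOCK_DGRAM => Ok(SocketKind::Datagram),
        _ => Err(Errno::EPROTONOSUPPORT),
    }
}
```

With this shape, a call such as classify_socket(AF_UNIX, SOCK_STREAM) yields an error value that the syscall layer can hand back to busybox init.
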
Booting a busybox userspace crashed the kernel with "Cannot parse ELF
file: MagicNumberWrong" in do_execve when PID 1 tried to exec
/etc/init.d/rcS (a `#!/bin/dash` script). do_execve now peeks the first
two bytes of the file, and on a `#!` header resolves the interpreter
path + optional arg and re-execs against it, following up to
MAX_SHEBANG_DEPTH=4 layers (ELOOP beyond). Malformed shebangs or
non-ELF final files return ENOEXEC instead of panicking, so userspace
input can no longer crash the kernel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
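
The shebang handling described in the commit above could look roughly like the following byte-slice parser. It is a sketch under assumptions, not the kernel's parse_shebang: the SHEBANG_MAX_LINE value of 256, the Shebang struct, and the trim helper are hypothetical, and the depth limit (MAX_SHEBANG_DEPTH) is left out.

```rust
// Illustrative sketch only; not the kernel's parse_shebang.
const SHEBANG_MAX_LINE: usize = 256; // assumed value, not taken from the PR

#[derive(Debug, PartialEq, Eq)]
enum Errno {
    ENOEXEC,
}

#[derive(Debug, PartialEq, Eq)]
struct Shebang<'a> {
    interpreter: &'a [u8],
    arg: Option<&'a [u8]>,
}

fn is_blank(b: u8) -> bool {
    b == b' ' || b == b'\t'
}

/// Strip leading and trailing spaces/tabs; an all-blank slice becomes empty.
fn trim(s: &[u8]) -> &[u8] {
    let start = s.iter().position(|&b| !is_blank(b)).unwrap_or(s.len());
    let end = s.iter().rposition(|&b| !is_blank(b)).map_or(start, |i| i + 1);
    &s[start..end]
}

/// Ok(None): not a script. Ok(Some(..)): interpreter plus optional single
/// argument. Err(ENOEXEC): a "#!" header that is malformed.
fn parse_shebang(head: &[u8]) -> Result<Option<Shebang<'_>>, Errno> {
    if !head.starts_with(b"#!") {
        return Ok(None);
    }
    let window = &head[..head.len().min(SHEBANG_MAX_LINE)];
    let rest = &window[2..];
    // Stricter than Linux: a missing newline inside the window is an error.
    let end = rest.iter().position(|&b| b == b'\n').ok_or(Errno::ENOEXEC)?;
    let line = trim(&rest[..end]);
    if line.is_empty() {
        return Err(Errno::ENOEXEC);
    }
    match line.iter().position(|&b| is_blank(b)) {
        None => Ok(Some(Shebang { interpreter: line, arg: None })),
        Some(i) => Ok(Some(Shebang {
            interpreter: &line[..i],
            arg: Some(trim(&line[i..])), // everything after the path is one arg
        })),
    }
}
```

For a header such as "#!/bin/dash" followed by a newline, this returns the interpreter path with no argument; the caller would then re-run exec against that path.
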
Busybox init's event loop spins on rt_sigtimedwait waiting for SIGCHLD
(and SIGHUP/SIGUSR*/SIGTERM). Without the syscall we returned ENOSYS,
so init hot-looped and dash respawned forever. Implement the NULL-info
infinite-wait path (and a zero-timeout poll path) that busybox uses:
dequeue the lowest pending signal in the caller's set, or block on a
new per-thread signal_waker that send_signal wakes regardless of
sigmask (critical because callers typically block the set beforehand).
SIGKILL/SIGSTOP are stripped from the wait mask. Non-NULL siginfo and
finite non-zero timeouts are rejected with EINVAL for now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
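
A minimal sketch of the argument handling described in the commit above, assuming the usual Linux layout where signal n occupies bit n-1 of a 64-bit set. The helper names, the WaitMode enum, and the (seconds, nanoseconds) tuple are hypothetical stand-ins for whatever the kernel uses.

```rust
// Hypothetical sketch; not the kernel's do_rt_sigtimedwait.
const SIGKILL: u32 = 9;
const SIGSTOP: u32 = 19;

#[derive(Debug, PartialEq, Eq)]
enum Errno {
    EINVAL,
}

#[derive(Debug, PartialEq, Eq)]
enum WaitMode {
    Block, // NULL timeout: wait indefinitely
    Poll,  // zero timeout: check once and return
}

/// Bit for signal `sig`, assuming signal n lives at bit n-1.
fn sig_bit(sig: u32) -> u64 {
    1u64 << (sig - 1)
}

/// SIGKILL and SIGSTOP cannot be waited for, so drop them from the set.
fn sanitize_wait_set(set: u64) -> u64 {
    set & !(sig_bit(SIGKILL) | sig_bit(SIGSTOP))
}

/// `timeout` is (seconds, nanoseconds); finite non-zero values are rejected,
/// matching the "EINVAL for now" behaviour described above.
fn classify_timeout(timeout: Option<(i64, i64)>) -> Result<WaitMode, Errno> {
    match timeout {
        None => Ok(WaitMode::Block),
        Some((0, 0)) => Ok(WaitMode::Poll),
        Some(_) => Err(Errno::EINVAL),
    }
}
```
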
The boot-synchronization expectations for "init process started" and
"starting shell" were printed by the deprecated Rust init.  Busybox
init (now PID 1 via /sbin/init) does not emit them, so every system
test was hanging on boot.  Drop those markers and let the shell
prompt be the sync point; keep the dhcpd marker when the test boots
with networking since our Rust dhcpd still runs via inittab's
::wait:/bin/dhcpd entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Busybox init's console::respawn child closes FDs 0/1/2 and reopens
/dev/console before exec'ing dash. Before this change the reopened fd was
FileDescriptor::VfsFile, whose read path drains ConsoleCharDevice
synchronously and returns EAGAIN when the TTY buffer is empty — so dash's
blocking read on stdin never unblocked and typed input was silently
dropped. do_openat now recognises the console char device via Arc
identity (new CONSOLE_CHAR_DEVICE handle + as-char-device VfsNode hook)
and produces FileDescriptor::Tty, which blocks via the async ReadTty
future like the initial FDs in FdTable::new.

Two supporting fixes fell out:

- send_signal called ThreadWaker::wake() while holding the target
  thread's lock; wake() re-locks the same thread, which deadlocked on
  the same CPU the first time busybox dash self-suspended via
  kill(0, SIGTTIN). Take the waker out first, drop the lock, then wake.
- busybox init calls setsid() in the spawned child before opening the
  console, so dash's pgid (= its tid) never matches the TTY's default
  fg_pgid (=1) and dash loops on SIGTTIN to stop itself until
  foregrounded. On the openat path, set fg_pgid to the opener's pgid.
  Proper TIOCSCTTY/ctty handling remains deferred to #250.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
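
The "take the waker out first, drop the lock, then wake" fix from the commit above can be sketched in user-space Rust as follows. Everything here is an assumption for illustration: ThreadState, RelockingWaker, and the use of std::sync::Mutex stand in for the kernel's own thread lock and ThreadWaker.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

// Hypothetical stand-in for the kernel's per-thread state.
struct ThreadState {
    waker: Option<Waker>,
    pending: u64,
}

/// A waker that re-locks the thread it belongs to, mimicking the behaviour
/// the commit message attributes to ThreadWaker::wake().
struct RelockingWaker {
    thread: Arc<Mutex<ThreadState>>,
    wakes: AtomicUsize,
}

impl Wake for RelockingWaker {
    fn wake(self: Arc<Self>) {
        // Would self-deadlock if the caller still held the thread lock.
        let _guard = self.thread.lock().unwrap();
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

fn send_signal(thread: &Mutex<ThreadState>, sig: u32) {
    let mut guard = thread.lock().unwrap();
    guard.pending |= 1u64 << (sig - 1);
    // Move the waker out while locked...
    let waker = guard.waker.take();
    // ...release the lock...
    drop(guard);
    // ...and only then wake, so wake() is free to take the lock again.
    if let Some(w) = waker {
        w.wake();
    }
}
```

Calling wake() before drop(guard) in this sketch would reproduce the same-thread re-lock problem the commit describes.
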
…sybox init

Add a Linux-compatible reboot(2) handler (SYSCALL_NR_REBOOT = 142) so that
busybox init's boot-time reboot(LINUX_REBOOT_CMD_CAD_OFF) no longer hits the
UNIMPLEMENTED syscall path, and so userspace has a direct route to a clean
system shutdown. Magic values follow the public man 2 reboot contract: magic1
must be 0xfee1dead and magic2 one of the documented constants. CAD_OFF/CAD_ON
are no-ops; HALT/POWER_OFF print "shutting down system" and call
qemu_exit::exit_success(); RESTART delegates to platform::reset::trigger_reset().
RESTART2, SW_SUSPEND, and KEXEC return EINVAL. Credential checks are deferred
until we grow a capability subsystem.

The shutdown system test now uses busybox's "halt -n" instead of dash's "exit".
With busybox init as PID 1, dash's exit is caught by the console respawn entry
in inittab, so the process table never empties. "halt -n" skips the
unimplemented sync(2) call that plain "halt" makes, and reaches the shutdown
path via init, producing the same literal "shutting down system" message the
test already waits for.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
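
The validation and dispatch rules in the commit above can be summarised in a small sketch. The numeric values are the ones published in linux/reboot.h; the Action enum, the function name, and the plain `as u32` casts are hypothetical and only illustrate the decision table, not the actual do_reboot.

```rust
// Hypothetical sketch of the reboot(2) decision table; not the kernel's code.
const LINUX_REBOOT_MAGIC1: u32 = 0xfee1_dead;
const MAGIC2_SET: [u32; 4] = [672274793, 85072278, 369367448, 537993216];

const CMD_RESTART: u32 = 0x0123_4567;
const CMD_HALT: u32 = 0xCDEF_0123;
const CMD_CAD_ON: u32 = 0x89AB_CDEF;
const CMD_CAD_OFF: u32 = 0x0000_0000;
const CMD_POWER_OFF: u32 = 0x4321_FEDC;

#[derive(Debug, PartialEq, Eq)]
enum Errno {
    EINVAL,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Nothing,  // CAD_ON / CAD_OFF are accepted but do nothing
    Shutdown, // HALT / POWER_OFF
    Reset,    // RESTART
}

fn classify_reboot(magic1: i32, magic2: i32, cmd: u32) -> Result<Action, Errno> {
    // The ABI passes the magics as signed ints; compare raw 32-bit patterns.
    let (m1, m2) = (magic1 as u32, magic2 as u32);
    if m1 != LINUX_REBOOT_MAGIC1 || !MAGIC2_SET.contains(&m2) {
        return Err(Errno::EINVAL);
    }
    match cmd {
        CMD_CAD_ON | CMD_CAD_OFF => Ok(Action::Nothing),
        CMD_HALT | CMD_POWER_OFF => Ok(Action::Shutdown),
        CMD_RESTART => Ok(Action::Reset),
        // RESTART2, SW_SUSPEND, KEXEC and anything unknown
        _ => Err(Errno::EINVAL),
    }
}
```
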
Solaya has no writeback caches — filesystem operations either hit
tmpfs or the virtio-blk synchronous path, both of which persist
immediately. A no-op sync(2) is semantically correct for this kernel
and unblocks busybox's halt applet, which otherwise traps on an
unimplemented syscall before reaching reboot(HALT). The shutdown
system test can now invoke plain `halt` instead of `halt -n`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes userspace/src/bin/init.rs and drops /bin/init from the
kernel's INIT_PATHS search list.  Busybox (/sbin/init) is the only
supported PID 1; /init stays as the conventional initramfs fallback.
Also cleans up stale defconfig commentary that referenced the
fallback path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the userspace architecture section and the boot-sequence test
expectations to describe the busybox init flow (read /sbin/init,
inittab from overlay, dhcpd wait, dash respawn on console) now that
the Rust init is gone.  Removes references to the deleted init.rs
binary from the BUILD.md userspace build chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop comments that narrate the recent busybox switch or restate what
the code already says; keep only the non-obvious invariants. Also
update qemu_wrapper.sh to reference /sbin/init (the current PID-1
path) in its "no rootfs" error note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
first_unblocked and first_in did the same bitmask-then-ctz work with
different input mask interpretations.  Collapse into first_matching(),
have callers pass !sigmask for delivery and the set directly for
sigtimedwait.  Also drop duplicated sigmask-access boilerplate in
has_pending_unblocked_signal and take_next_pending_signal by routing
them through peek_first_unblocked_signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
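
The "bitmask-then-ctz" helper described in the commit above might be as small as the sketch below. It assumes signal n is stored at bit n-1 of a u64 and takes the pending set as a plain argument; the real PendingSignals method may differ in signature.

```rust
/// Lowest-numbered pending signal that is also selected by `mask`.
/// Sketch only: assumes signal n is stored at bit n-1.
fn first_matching(pending: u64, mask: u64) -> Option<u32> {
    let hits = pending & mask;
    if hits == 0 {
        None
    } else {
        // trailing_zeros is the "ctz" step: index of the lowest set bit.
        Some(hits.trailing_zeros() + 1)
    }
}

/// Delivery path: anything pending that the thread's sigmask does not block.
fn first_deliverable(pending: u64, sigmask: u64) -> Option<u32> {
    first_matching(pending, !sigmask)
}

/// sigtimedwait path: anything pending that is in the caller's wait set.
fn first_awaited(pending: u64, set: u64) -> Option<u32> {
    first_matching(pending, set)
}
```

The two wrappers exist only to show the two mask interpretations; the commit has callers pass !sigmask or the set directly.
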
A multi-MiB ELF binary was being pulled into memory on every execve
just to check whether its first two bytes were '#!'.  Split the old
try_read_from_vfs into resolve_against_cwd + read_full_node so
resolve_shebang can peek up to SHEBANG_MAX_LINE bytes first and only
commit to the full read once we know we're not following a shebang
chain (and reuse the peek buffer verbatim for files <= the peek size).

While here, map loader::load_elf errors to ENOEXEC so a corrupted-but-
parseable ELF reports back to userspace instead of panicking the
kernel — matching the ElfFile::parse path that already does this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
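
The peek-first idea from the commit above can be illustrated with a user-space analogue built on std::io::Read. This is only a sketch: the kernel reads VFS nodes rather than Read implementors, and the Peeked enum, the read_for_exec name, and the 256-byte SHEBANG_MAX_LINE are assumptions.

```rust
use std::io::{self, Read};

const SHEBANG_MAX_LINE: usize = 256; // assumed value, not taken from the PR

enum Peeked {
    /// Only the peek window was read; the caller parses the "#!" line.
    Script(Vec<u8>),
    /// The whole file, with the peek buffer reused as its prefix.
    Binary(Vec<u8>),
}

fn read_for_exec<R: Read>(mut file: R) -> io::Result<Peeked> {
    let mut buf = Vec::with_capacity(SHEBANG_MAX_LINE);
    // Read at most SHEBANG_MAX_LINE bytes before deciding anything.
    file.by_ref()
        .take(SHEBANG_MAX_LINE as u64)
        .read_to_end(&mut buf)?;
    if buf.starts_with(b"#!") {
        return Ok(Peeked::Script(buf));
    }
    // Not a script: commit to the full read, appending to the peeked bytes.
    file.read_to_end(&mut buf)?;
    Ok(Peeked::Binary(buf))
}
```

A file no longer than the peek window is fully contained in the first read, so the second read_to_end appends nothing and the peek buffer is returned as-is.
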
openat's "is this /dev/console?" check was comparing the node's
CharDevice Arc against a second global Arc stored just for this
purpose.  Push the decision into the trait: default is_tty() == false,
override on ConsoleCharDevice.  Drops the CONSOLE_CHAR_DEVICE static,
the is_console_char_device helper, and the RuntimeInitializedData
import — fs_ops::openat just asks the device.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
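
Pushing the decision into the trait, as the commit above describes, can be sketched with a default method. CharDevice::is_tty, ConsoleCharDevice, and the two FileDescriptor variant names come from the PR text; NullDevice, descriptor_for, and the payload-free FileDescriptor enum are hypothetical simplifications.

```rust
// Sketch of the trait-default pattern; not the kernel's actual definitions.
trait CharDevice {
    /// Devices are not TTYs unless they say so.
    fn is_tty(&self) -> bool {
        false
    }
}

struct ConsoleCharDevice;
struct NullDevice; // hypothetical second device, for contrast

impl CharDevice for ConsoleCharDevice {
    fn is_tty(&self) -> bool {
        true
    }
}

impl CharDevice for NullDevice {} // inherits the default: not a TTY

#[derive(Debug, PartialEq, Eq)]
enum FileDescriptor {
    Tty,
    VfsFile,
}

/// The open path just asks the device, with no global Arc to compare against.
fn descriptor_for(dev: &dyn CharDevice) -> FileDescriptor {
    if dev.is_tty() {
        FileDescriptor::Tty
    } else {
        FileDescriptor::VfsFile
    }
}
```
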
Per CLAUDE.md's "reuse Linux/musl header definitions" rule, pull the
reboot magic + command constants from linux/reboot.h via the bindgen
driver instead of redefining them inline.  MAGIC2_SET now reuses the
four bindgen'd LINUX_REBOOT_MAGIC2* values, and the magic compare
widens to u32 (via i32::cast_unsigned) to match the generated type
while preserving the raw 32-bit bit patterns the syscall ABI expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SigTimedWait::poll stored the task's cx.waker() in a dedicated
signal_waker field on Thread, and send_signal had to take() + wake()
that waker out of band (with the usual "release the lock before
waking to avoid the ThreadWaker re-lock deadlock" dance).  That slot
never cleared on Poll::Ready, and only held a single Waker so a
concurrent sigtimedwait would silently overwrite an earlier one.

Replace it with an Option<u64> mask that the Future arms on Pending
and disarms in its Drop impl (covering both Ready-return and task
cancellation).  send_signal just adds `sigtimedwait_matches(sig)` to
its existing Waiting->Runnable gate, and set_syscall_task_and_suspend
adds `sigtimedwait_pending()` to its race-avoidance check — the
existing ThreadWaker / RUN_QUEUE machinery re-polls the future without
any custom wake plumbing.

Shutdown test (busybox halt -> reboot/sync -> HALT) still passes, as
do the remaining 70 system tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
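
The arm-on-Pending, disarm-on-Drop scheme from the commit above can be sketched as a hand-written future. The field and method names sigtimedwait_mask and sigtimedwait_matches follow the commit message; the single-threaded Rc/Cell layout, the NoopWake helper, and the bit-per-signal encoding are assumptions made so the example stands alone.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical single-threaded stand-in for the kernel's Thread.
#[derive(Default)]
struct Thread {
    pending: Cell<u64>,
    sigtimedwait_mask: Cell<Option<u64>>,
}

impl Thread {
    /// What send_signal would consult before marking the thread runnable.
    fn sigtimedwait_matches(&self, sig: u32) -> bool {
        self.sigtimedwait_mask
            .get()
            .map_or(false, |m| m & (1u64 << (sig - 1)) != 0)
    }
}

struct SigTimedWait {
    thread: Rc<Thread>,
    set: u64,
}

impl Future for SigTimedWait {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        let hits = self.thread.pending.get() & self.set;
        if hits != 0 {
            let sig = hits.trailing_zeros() + 1;
            let remaining = self.thread.pending.get() & !(1u64 << (sig - 1));
            self.thread.pending.set(remaining);
            return Poll::Ready(sig);
        }
        // Nothing pending yet: arm the mask so a matching signal wakes us.
        self.thread.sigtimedwait_mask.set(Some(self.set));
        Poll::Pending
    }
}

impl Drop for SigTimedWait {
    fn drop(&mut self) {
        // Disarm on completion and on cancellation alike.
        self.thread.sigtimedwait_mask.set(None);
    }
}

/// Waker that does nothing; enough to drive poll() by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}
```

Because the mask is cleared in Drop rather than on the Ready path, the slot cannot stay armed after the future is cancelled mid-wait.
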

claude Bot commented Apr 19, 2026

AI Architectural Review

TL;DR: Good direction — swapping the Rust PID 1 for busybox init, wiring up the syscalls busybox actually exercises (rt_sigtimedwait, reboot, sync, 4-layer shebang) and turning bad socket(2) inputs into errnos instead of panics are all net wins for Linux-compat; the main smells are policy nits (a single-caller helper, duplicated path-composition) and a couple of stop-gaps (/dev/console implicit-ctty, unchecked reboot authority) that should be tracked so they don't calcify.

Must-fix (correctness / safety / policy violations)

none

Consider (architecture / duplication / design smell)

  • read_full_node is a helper with a single call site — crates/kernel/src/syscalls/exec_ops.rs:273 (invoked only at :228). CLAUDE.md explicitly bans single-caller helpers; the 64 MiB cap + read + truncate is 5 lines and reads fine inline.
  • resolve_against_cwd reinvents compose_abs (crates/kernel/src/syscalls/helpers.rs:12): the same three-way branching (absolute path, cwd ending in '/', everything else) appears at crates/kernel/src/syscalls/exec_ops.rs:260. compose_abs additionally canonicalizes ".." and "." segments, which is arguably more correct for shebang-resolved paths. Reuse the existing helper rather than carrying a parallel copy.
  • Implicit-ctty stop-gap at crates/kernel/src/syscalls/fs_ops.rs:67-77 unconditionally hijacks console_tty().fg_pgid on every openat("/dev/console"), not just the session leader's first open, so any later opener silently steals the foreground pgid from the shell. The comment points at #262 (Unprivileged foreground-group hijack via open(/dev/console)), but the failure mode (a rogue userspace program breaking dash's job control with a single open) is wide open until then; at minimum gate it on "fg_pgid unset or matches caller's session" so it degrades to a no-op after dash claims the console.
  • FileDescriptor::Tty(console_tty().clone()) at crates/kernel/src/syscalls/fs_ops.rs:77 ignores the node/dev that triggered is_tty — the CharDevice::is_tty trait method suggests a generic TTY-wrapping story, but the concrete wiring assumes exactly one global console. Either drop the trait method and special-case the console node in devfs, or thread the actual Arc<dyn CharDevice> through so a second TTY device doesn't silently alias to the console.
  • do_rt_sigtimedwait rejects every finite non-zero timeout with EINVAL (crates/kernel/src/syscalls/signal_ops.rs:171). That's fine for busybox-init's usage today, but it's a ticking Linux-compat trap — a posted finding/issue would keep it from being forgotten once something actually passes a 100 ms timeout.
  • parse_shebang (crates/kernel/src/syscalls/exec_ops.rs:173) is pure byte-slice logic with non-trivial edge cases (empty line, all-whitespace, no-newline-in-257-bytes, optional-arg trimming). CLAUDE.md calls these out as prime Kani targets — worth a proof that it never panics on arbitrary byte input and that layer count stays ≤ MAX_SHEBANG_DEPTH.
  • do_reboot has no authority check (crates/kernel/src/syscalls/process_ops.rs:221). The TODO: CAP_SYS_BOOT comment is honest, but until Solaya has credentials any userspace process (prog1, a buggy dhcpd) can halt the kernel — flag this as a tracked issue so it doesn't disappear into the diff.

Noted (weird but I'm not sure — maintainer eyeball please)

  • parse_shebang returns Errno::ENOEXEC when the first line lacks a \n within SHEBANG_MAX_LINE (crates/kernel/src/syscalls/exec_ops.rs:180). Linux's BINPRM_BUF_SIZE path truncates at the buffer and parses what it has — ours is stricter. Almost certainly irrelevant for real scripts, but flagging because it's a silent divergence from the reference implementation we're targeting.
  • info!("No more processes to schedule, shutting down system") in do_reboot (crates/kernel/src/syscalls/process_ops.rs:236) is factually wrong for a deliberate reboot(HALT) — there may still be other processes. The string is load-bearing for the shutdown system test, so swapping it requires a test update; noting in case the message is supposed to mean something specific and should be kept accurate.
  • No dedicated system test exercises the multi-layer shebang path — rcS (configs/overlay/etc/init.d/rcS, a single-layer #!/bin/dash) exercises depth-1 via boot, but depth-2+ and the ELOOP-on-4-layers edge go untested. Given this is the first implementation, a tiny system-test with a chain of three shebangs would lock in the argv-assembly semantics.

Skipped

Style, clippy-level lints, naming, docstrings on private items, commit-message conventions — just ci enforces these and the maintainer has them out of scope.

@sysheap sysheap merged commit 15720ab into main Apr 19, 2026
2 checks passed
@sysheap sysheap deleted the busybox-init branch April 19, 2026 17:48