Summary
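DragonOS's procfs implementation has a caching problem. After a process is reaped by wait4() or another wait-family system call, its /proc/<pid> directory entry remains in the procfs cache, so accessing the /proc directory of a process that has already been released still succeeds. This is inconsistent with Linux behavior.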
Environment
Git Commit ID:
546a9cad - fix(process): fix the de_thread race in multithreaded exec and thread-exit semantics (#1748)
DragonOS Version: 2026-02-01
Test Case
Source Code
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <errno.h>
#include <string.h>
int main(void) {
    pid_t pid;
    int status;
    char proc_path[256];
    printf("=== Test: procfs zombie PID entries ===\n");
    printf("This test verifies that /proc entries are cleaned up after process exit\n\n");
    // Flush before fork() so buffered output is not duplicated in the child
    // (stdout is fully buffered when it is not a terminal, e.g. a pipe)
    fflush(stdout);
    pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        // Child process
        printf("Child: PID=%d, exiting immediately\n", getpid());
        exit(42); // Exit with code 42
    }
    // Parent process
    printf("Parent: Child PID=%d\n", pid);
    // Check /proc/<pid> exists before wait
    snprintf(proc_path, sizeof(proc_path), "/proc/%d", pid);
    printf("Parent: Checking %s exists (BEFORE wait)...\n", proc_path);
    if (access(proc_path, F_OK) == 0) {
        printf("Parent: %s EXISTS (expected)\n", proc_path);
    } else {
        printf("Parent: %s DOES NOT EXIST (unexpected!)\n", proc_path);
    }
    // Wait for the child to exit; this also reaps it
    printf("Parent: Waiting for child...\n");
    if (wait(&status) < 0) {
        perror("wait");
        return 1;
    }
    // WEXITSTATUS extracts the child's exit code (42) from the wait status
    printf("Parent: Child exited with status %d\n", WEXITSTATUS(status));
    // Check /proc/<pid> exists AFTER wait and reap
    printf("\nParent: Checking %s exists (AFTER wait+reap)...\n", proc_path);
    if (access(proc_path, F_OK) == 0) {
        printf("Parent: %s STILL EXISTS (BUG! procfs not cleaned up)\n", proc_path);
        printf("Parent: This is the bug - process was reaped but /proc entry remains\n");
        return 1;
    } else {
        printf("Parent: %s DOES NOT EXIST (correct - procfs cleaned up)\n", proc_path);
    }
    // List what's in /proc to confirm; flush first so our output precedes the ls output
    printf("\nParent: Listing PIDs in /proc:\n");
    fflush(stdout);
    system("ls /proc | grep -E '^[0-9]+$' | head -20");
    printf("\n=== TEST PASSED: procfs cleanup works correctly ===\n");
    return 0;
}
Compilation
# DragonOS
make user
make write_diskimage
# Linux (Ubuntu 24.04 Docker)
gcc -Wall -O2 -static user/apps/c_unitest/test_procfs_zombie_cleanup.c -o /tmp/test_procfs_zombie_cleanup
Expected Behavior (Linux)
Linux Command:
docker run --rm -v "$(pwd):/dragonos" ubuntu:24.04 bash -c "
cd /dragonos
apt-get update && apt-get install -y gcc
gcc -Wall -O2 -static user/apps/c_unitest/test_procfs_zombie_cleanup.c -o /tmp/test_procfs_zombie_cleanup
/tmp/test_procfs_zombie_cleanup
"
Linux Output:
=== Test: procfs zombie PID entries ===
This test verifies that /proc entries are cleaned up after process exit
Child: PID=1096, exiting immediately
Parent: Child PID=1096
Parent: Checking /proc/1096 exists (BEFORE wait)...
Parent: /proc/1096 EXISTS (expected)
Parent: Waiting for child...
Parent: Child exited with status 42
Parent: Checking /proc/1096 exists (AFTER wait+reap)...
Parent: /proc/1096 DOES NOT EXIST (correct - procfs cleaned up)
Parent: Listing PIDs in /proc:
...
=== TEST PASSED: procfs cleanup works correctly ===
Summary: Linux correctly removes the /proc/1096 directory entry once the process has been reaped.
Actual Behavior (DragonOS)
DragonOS Command:
python3 dragonos_qemu_interactive.py --commands "/bin/test_procfs_zombie_cleanup"
DragonOS Output:
=== Test: procfs zombie PID entries ===
This test verifies that /proc entries are cleaned up after process exit
Child: PID=19, exiting immediately
Parent: Child PID=19
Parent: Checking /proc/19 exists (BEFORE wait)...
Parent: /proc/19 EXISTS (expected)
Parent: Waiting for child...
Parent: Child exited with status 42
Parent: Checking /proc/19 exists (AFTER wait+reap)...
Parent: /proc/19 STILL EXISTS (BUG! procfs not cleaned up)
Parent: This is the bug - process was reaped but /proc entry remains
Error Details:
- Return value: Test returns 1 (failure)
- errno: N/A
- Kernel panic/assert: None
- System behavior: Test completes, but incorrectly reports /proc entry still exists
Analysis
Root Cause
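This is a procfs cache-invalidation problem, not a process leak.
Code analysis shows the following:
- The process is in fact released correctly:
  - In exit.rs:998 and exit.rs:442, ProcessManager::release(pid) is called to remove the process from the ALL_PROCESS process table.
  - release() (kernel/src/process/mod.rs:824) correctly deletes the PID entry from the global process map:
    ALL_PROCESS.lock_irqsave().as_mut().unwrap().remove(&pid);
- The problem lies in the procfs caching mechanism:
  - RootDirOps::populate_children() (kernel/src/filesystem/procfs/root.rs:120) uses the cached_children cache when populating the children of /proc.
  - PidDirOps::new_inode() marks the inode as volatile (kernel/src/filesystem/procfs/pid/mod.rs:54), but that flag is never actually used for cache invalidation.
  - In ProcDir<Ops>::find() (kernel/src/filesystem/procfs/template/dir.rs:143), the lookup checks the cache first, but **validate_child() returns true by default**, so a cached entry is always accepted.
  - PidDirOps does not override validate_child() to check whether the process still exists.
- Comparison:
  - Linux: the procfs entry becomes invalid as soon as the process is reaped, and accessing it returns ENOENT.
  - DragonOS: the procfs entry stays cached in cached_children permanently, even after the process has been removed from ALL_PROCESS.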
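The cache-first lookup described above can be reproduced in a small, self-contained Rust model. This is a simplified sketch, not DragonOS code: PidInode, RootOps, ProcDir and lookup_after_release() are hypothetical stand-ins, a HashSet<u32> plays the role of ALL_PROCESS, and the trait is a reduced version of DirOps that keeps only the two hooks relevant here. It shows how a default validate_child() that returns true keeps serving a cached entry after the PID has left the process table.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical cached /proc/<pid> inode; it only remembers which PID it describes.
struct PidInode {
    pid: u32,
}

/// Reduced model of DirOps with only the hooks that matter for this bug.
trait DirOps {
    /// Default hook: trust every cached child (the behaviour described in the report).
    fn validate_child(&self, _child: &PidInode) -> bool {
        true
    }
    /// Slow path: build a child from the authoritative source (the process table).
    fn lookup_child(&self, name: &str) -> Option<Arc<PidInode>>;
}

/// Model of the /proc root ops; the shared HashSet stands in for ALL_PROCESS.
struct RootOps {
    processes: Arc<Mutex<HashSet<u32>>>,
}

impl DirOps for RootOps {
    fn lookup_child(&self, name: &str) -> Option<Arc<PidInode>> {
        let pid: u32 = name.parse().ok()?;
        if self.processes.lock().unwrap().contains(&pid) {
            Some(Arc::new(PidInode { pid }))
        } else {
            None
        }
    }
}

/// Model of ProcDir<Ops>: consults cached_children before calling lookup_child().
struct ProcDir<O: DirOps> {
    inner: O,
    cached_children: RwLock<HashMap<String, Arc<PidInode>>>,
}

impl<O: DirOps> ProcDir<O> {
    fn find(&self, name: &str) -> Option<Arc<PidInode>> {
        {
            // Cache first; the scope releases the read lock before any write.
            let cache = self.cached_children.read().unwrap();
            if let Some(inode) = cache.get(name) {
                if self.inner.validate_child(inode) {
                    return Some(Arc::clone(inode));
                }
            }
        }
        // Cache miss: ask the ops, then remember the result.
        let inode = self.inner.lookup_child(name)?;
        self.cached_children
            .write()
            .unwrap()
            .insert(name.to_string(), Arc::clone(&inode));
        Some(inode)
    }
}

/// Looks up PID 19, removes it from the table, and reports whether it is still found.
fn lookup_after_release() -> bool {
    let processes = Arc::new(Mutex::new(HashSet::from([19u32])));
    let root = ProcDir {
        inner: RootOps { processes: Arc::clone(&processes) },
        cached_children: RwLock::new(HashMap::new()),
    };
    // First lookup: cache miss, lookup_child() succeeds and the entry is cached.
    assert_eq!(root.find("19").map(|inode| inode.pid), Some(19));
    // Model of ProcessManager::release(): PID 19 leaves the process table.
    processes.lock().unwrap().remove(&19);
    // Second lookup: answered from cached_children because validate_child() returns true.
    root.find("19").is_some()
}
```

In this model lookup_after_release() returns true: the second lookup never reaches lookup_child(), which mirrors the "STILL EXISTS" line in the DragonOS output.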
Key code locations
Issue 1: cache lookups are never invalidated (kernel/src/filesystem/procfs/template/dir.rs:159-166)
// Check the cache first (the inner scope makes sure the read lock is released promptly)
{
    let cached_children = self.cached_children.read();
    if let Some(inode) = cached_children.get(name) {
        if self.inner.validate_child(inode.as_ref()) { // <-- the default implementation always returns true!
            return Ok(inode.clone());
        }
    }
}
Issue 2: PidDirOps does not override validate_child() (kernel/src/filesystem/procfs/pid/mod.rs)
impl DirOps for PidDirOps {
    fn lookup_child(...) { ... }
    fn populate_children(...) { ... }
    // Missing: fn validate_child(&self, child: &dyn IndexNode) -> bool
}
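To make the effect of an override concrete, here is the same kind of self-contained model with validate_child() overridden to consult the process table. All names are hypothetical simplifications, not the DragonOS API. Two choices are assumptions of this sketch: the override lives on the ops of the directory that find() searches (a model of the /proc root, since that is the object whose validate_child() the cache check calls), and the model's find() drops a stale entry before falling back to lookup_child().

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical cached /proc/<pid> inode.
struct PidInode {
    pid: u32,
}

trait DirOps {
    /// Default: trust every cached entry (what the report describes).
    fn validate_child(&self, _child: &PidInode) -> bool {
        true
    }
    fn lookup_child(&self, name: &str) -> Option<Arc<PidInode>>;
}

/// Model of the /proc root ops; the HashSet stands in for ALL_PROCESS.
struct RootOps {
    processes: Arc<Mutex<HashSet<u32>>>,
}

impl RootOps {
    fn alive(&self, pid: u32) -> bool {
        self.processes.lock().unwrap().contains(&pid)
    }
}

impl DirOps for RootOps {
    /// The fix: a cached entry is only valid while its PID is still in the table.
    fn validate_child(&self, child: &PidInode) -> bool {
        self.alive(child.pid)
    }
    fn lookup_child(&self, name: &str) -> Option<Arc<PidInode>> {
        let pid: u32 = name.parse().ok()?;
        self.alive(pid).then(|| Arc::new(PidInode { pid }))
    }
}

struct ProcDir<O: DirOps> {
    inner: O,
    cached_children: RwLock<HashMap<String, Arc<PidInode>>>,
}

impl<O: DirOps> ProcDir<O> {
    fn find(&self, name: &str) -> Option<Arc<PidInode>> {
        {
            let cache = self.cached_children.read().unwrap();
            if let Some(inode) = cache.get(name) {
                if self.inner.validate_child(inode) {
                    return Some(Arc::clone(inode));
                }
            }
        }
        // Miss or stale entry: evict it, then ask the ops for a fresh answer.
        let mut cache = self.cached_children.write().unwrap();
        cache.remove(name);
        let inode = self.inner.lookup_child(name)?;
        cache.insert(name.to_string(), Arc::clone(&inode));
        Some(inode)
    }

    fn cached_len(&self) -> usize {
        self.cached_children.read().unwrap().len()
    }
}

/// Returns (still_found_after_release, cache_entries_after_release).
fn lookup_after_release() -> (bool, usize) {
    let processes = Arc::new(Mutex::new(HashSet::from([19u32])));
    let root = ProcDir {
        inner: RootOps { processes: Arc::clone(&processes) },
        cached_children: RwLock::new(HashMap::new()),
    };
    assert!(root.find("19").is_some()); // populates the cache
    processes.lock().unwrap().remove(&19); // models ProcessManager::release()
    let found = root.find("19").is_some();
    (found, root.cached_len())
}
```

With the override, lookup_after_release() yields (false, 0): the stale entry fails validation, is evicted, and lookup_child() reports that PID 19 no longer exists, which corresponds to the ENOENT that Linux returns.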
Issue 3: the process has already been removed from the table (kernel/src/process/mod.rs:837)
pub(super) unsafe fn release(pid: RawPid) {
    let pcb = ProcessManager::find(pid);
    if let Some(ref pcb) = pcb {
        // ...
        ALL_PROCESS.lock_irqsave().as_mut().unwrap().remove(&pid); // <-- the process is removed here
    }
}
Comparison
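| Aspect | Linux | DragonOS | Impact |
|---|---|---|---|
| Process release | Process released correctly | Process released correctly | ✅ Consistent |
| procfs cache | Invalidated once the process is reaped | Entry cached permanently | ❌ Inconsistent |
| /proc/<pid> access | Returns ENOENT | Succeeds (bug) | ❌ Inconsistent |
| validate_child() | Checks that the process exists | Always returns true | ❌ Missing implementation |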
Severity
Severity Justification:
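This is a High-severity bug, for the following reasons:
- Semantic error: it violates Linux /proc semantics; applications may rely on /proc/<pid> disappearing once the process has been reaped.
- Security concern: the /proc entry of a released process may leak information (although its contents may already be stale).
- Resource leak: although the process itself is released, its procfs cache entry occupies memory permanently.
- Compatibility: tools that depend on Linux behavior may misbehave on DragonOS.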
Related Files
- DragonOS implementation:
  - kernel/src/filesystem/procfs/root.rs:120-148 - RootDirOps::populate_children() and lookup_child()
  - kernel/src/filesystem/procfs/pid/mod.rs:158-196 - PidDirOps (missing validate_child)
  - kernel/src/filesystem/procfs/template/dir.rs:143-172 - ProcDir::find() cache logic
  - kernel/src/process/exit.rs:998,442 - ProcessManager::release() call sites
  - kernel/src/process/mod.rs:824-839 - ProcessManager::release() implementation
- Linux reference:
  - Linux kernel: fs/proc/base.c - proc_pid_lookup() checks whether the process exists
  - Linux kernel: fs/proc/root.c - proc_root_lookup() cache management
  - man 5 proc: "/proc/ entries disappear after the process is reaped"
Suggested Fix
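- Implement validate_child() in PidDirOps:
  impl DirOps for PidDirOps {
      // ... existing methods ...
      fn validate_child(&self, _child: &dyn IndexNode) -> bool {
          // Check whether the process is still in ALL_PROCESS
          ProcessManager::find(self.pid).is_some()
      }
  }
  Note: ProcDir<Ops>::find() calls validate_child() on the Ops of the directory being searched, so a lookup of /proc/<pid> itself goes through RootDirOps rather than PidDirOps. An override in PidDirOps only guards entries under /proc/<pid>/; the same existence check may also be needed in RootDirOps::validate_child(), deriving the PID from the cached child.
- Alternatively, check process existence in RootDirOps::lookup_child():
  A check already exists at kernel/src/filesystem/procfs/root.rs:89, but find() consults the cache before lookup_child() is called, so that check is bypassed whenever the entry is cached.
- Alternatively, implement a cache-invalidation mechanism:
  When ProcessManager::release() is called, notify procfs so that it evicts the corresponding cache entry.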
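A minimal sketch of the third option, assuming a hypothetical notification hook (on_process_released()) that the release path calls after removing the PID from the process table. ProcessManager and ProcRootCache below are simplified, hypothetical models, not the kernel types, and the cached value is just the PID rather than a real inode.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical model of the /proc root cache: child name -> cached entry (here just the PID).
struct ProcRootCache {
    cached_children: RwLock<HashMap<String, u32>>,
}

impl ProcRootCache {
    /// Hypothetical hook: evict the cached /proc/<pid> entry, if any.
    fn on_process_released(&self, pid: u32) {
        self.cached_children.write().unwrap().remove(&pid.to_string());
    }
}

/// Hypothetical model of the process manager: a PID set standing in for ALL_PROCESS,
/// plus a handle to the procfs cache that must be notified.
struct ProcessManager {
    all_process: Mutex<HashSet<u32>>,
    procfs: Arc<ProcRootCache>,
}

impl ProcessManager {
    fn release(&self, pid: u32) {
        // The guard is a temporary, so the table lock is dropped at the end of this statement.
        let removed = self.all_process.lock().unwrap().remove(&pid);
        if removed {
            // Notify procfs only after the process-table lock has been released.
            self.procfs.on_process_released(pid);
        }
    }
}

/// Returns (pid_still_in_table, pid_still_cached) after releasing PID 19.
fn release_evicts_cache() -> (bool, bool) {
    let procfs = Arc::new(ProcRootCache {
        cached_children: RwLock::new(HashMap::new()),
    });
    let pm = ProcessManager {
        all_process: Mutex::new(HashSet::from([19u32])),
        procfs: Arc::clone(&procfs),
    };
    // Simulate an earlier lookup of /proc/19 that populated the cache.
    procfs.cached_children.write().unwrap().insert("19".to_string(), 19);
    pm.release(19);
    let in_table = pm.all_process.lock().unwrap().contains(&19);
    let cached = procfs.cached_children.read().unwrap().contains_key("19");
    (in_table, cached)
}
```

Here release_evicts_cache() returns (false, false). The hook runs only after the process-table guard has been dropped, so procfs locks are never taken while the table lock is held; since the kernel's release() uses lock_irqsave(), preserving a similar ordering would presumably matter there as well, though this has not been checked against DragonOS.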