[lab] CS537 Spring 2018 Project 4b #95
Pull request overview
Adds a new CourseLab benchmark task for UW-Madison CS537 Spring 2018 Project 4b (xv6 kernel threads), including setup, evaluation, and task instructions.
Changes:
- Introduces the new `cs537-projects-spring-2018__concurrency_xv6_threads` task (task text, container setup, and evaluation script).
- Adds an xv6-thread-focused test suite wired into the ostep-projects harness via a generated Makefile and `.run`/`.out` fixtures.
- Registers the CS537 Spring 2018 course in `courses.json`.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/task.md | Adds the task prompt/instructions for xv6 kernel threads. |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/preprocess.sh | Clones pinned sources, generates tests, builds xv6 artifacts, and writes protected-file checksums. |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/evaluate.sh | Verifies protected files and runs the ostep test harness under a timeout. |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/config.json | Declares the benchmark instance metadata and artifact allowlist. |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/compose.yaml | Defines the container image and init script for the task environment. |
| benchmarks/courselab_bench/data/courses.json | Adds the CS537 Spring 2018 course entry. |
Comments suppressed due to low confidence (1)
benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/task.md:74
- The task text points readers to `kernel/proc.c` for the `fork()` implementation, but in the xv6-public tree used by this benchmark it is `proc.c` at the repo root (i.e., `src/proc.c`). Updating these file paths will reduce confusion and make the instructions consistent with the provided source layout.
call. The `fork()` system call will serve as a template for `clone()`, with
some modifications. For example, in `kernel/proc.c`, we see the beginning of
the `fork()` implementation:
      printf(1, "XV6_TEST_OUTPUT clone_join_fail\n");
      exit();
    }
Copilot
AI
Jan 27, 2026
`test_clone` doesn't verify that `join()` returns the PID of the thread created by `clone()`. As written, a broken implementation that returns a different child PID could still pass. Please assert `joined == pid` (and keep the existing `< 0` failure check).
Suggested change:

    + if (joined != pid) {
    +   printf(1, "XV6_TEST_OUTPUT clone_join_fail\n");
    +   exit();
    + }
    EOT

    cat > tests/1.run <<'EOT'
    cd src; ../../tester/run-xv6-command.exp CPUS=1 Makefile.test test_clone | grep XV6_TEST_OUTPUT; cd ..
Copilot
AI
Jan 27, 2026
The generated `tests/*.run` scripts pipe the xv6 runner output into `grep` without `pipefail`. If `run-xv6-command.exp` fails (non-zero) but still writes matching output, the test could incorrectly pass because the pipeline exit code comes from `grep`. Consider adding `set -euo pipefail` to each `.run` script (or otherwise ensure the xv6 runner's exit status is checked).
Suggested change:

    - cd src; ../../tester/run-xv6-command.exp CPUS=1 Makefile.test test_clone | grep XV6_TEST_OUTPUT; cd ..
    + #!/bin/bash
    + set -euo pipefail
    + cd src
    + ../../tester/run-xv6-command.exp CPUS=1 Makefile.test test_clone | grep XV6_TEST_OUTPUT
    + cd ..
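The effect of `pipefail` on a pipeline's exit status can be checked in isolation. The following is a minimal sketch, assuming bash; `fake_runner` is a hypothetical stand-in for `run-xv6-command.exp` that prints a matching line and then exits non-zero.

```shell
#!/bin/bash
# fake_runner is a hypothetical stand-in for run-xv6-command.exp:
# it prints a line that grep will match, then exits with status 3.
fake_runner() {
  echo "XV6_TEST_OUTPUT clone_join_ok"
  return 3
}

# Without pipefail, the pipeline's status is that of the last command (grep).
set +o pipefail
status_default=0
fake_runner | grep XV6_TEST_OUTPUT > /dev/null || status_default=$?

# With pipefail, the rightmost non-zero status in the pipeline is reported.
set -o pipefail
status_pipefail=0
fake_runner | grep XV6_TEST_OUTPUT > /dev/null || status_pipefail=$?
set +o pipefail

echo "default=$status_default pipefail=$status_pipefail"
# prints: default=0 pipefail=3
```

In the default mode the runner's failure is hidden behind grep's success, which is the false pass described in the comment above.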

    One other thing you'll have to understand to make this all work is the x86
    calling convention, and exactly how the stack works when calling a function.
    This is you can read about in [Programming From The Ground
Copilot
AI
Jan 27, 2026
Grammar: "This is you can read about" is ungrammatical and reads like a typo. Consider changing it to "This you can read about" or "This is something you can read about".
Suggested change:

    - This is you can read about in [Programming From The Ground
    + This you can read about in [Programming From The Ground
    xv6. Thus, you should add prototypes to `user/user.h` and the actual code to
    implement the library routines in `user/ulib.c`.
Copilot
AI
Jan 27, 2026
The instructions mention adding prototypes to `user/user.h` and code to `user/ulib.c`, but this benchmark clones xv6-public, where these files are at the repo root (`user.h`, `ulib.c`). Please update the paths in the task text to match the layout learners will see (e.g., `src/user.h` / `src/ulib.c` in this harness).
Suggested change:

    - xv6. Thus, you should add prototypes to `user/user.h` and the actual code to
    - implement the library routines in `user/ulib.c`.
    + xv6. Thus, you should add prototypes to `src/user.h` and the actual code to
    + implement the library routines in `src/ulib.c`.
    echo "Creating checksums for protected files"
    mkdir -p /tmp/checksums
    CHECKSUM_FILE=/tmp/checksums/protected.sha256
    : > "$CHECKSUM_FILE"

    PROTECTED_FILES=(
      "tests/pre"
      "tests/1.desc"
      "tests/1.run"
      "tests/1.out"
      "tests/1.err"
      "tests/1.rc"
      "tests/2.desc"
      "tests/2.run"
      "tests/2.out"
      "tests/2.err"
      "tests/2.rc"
      "tests/test_clone.c"
      "tests/test_thread.c"
      "tests/test_join_none.c"
      "tests/test_thread_args.c"
      "tests/test_thread_join_none.c"
      "tests/3.desc"
      "tests/3.run"
      "tests/3.out"
      "tests/3.err"
      "tests/3.rc"
      "tests/4.desc"
      "tests/4.run"
      "tests/4.out"
      "tests/4.err"
      "tests/4.rc"
      "tests/5.desc"
      "tests/5.run"
      "tests/5.out"
      "tests/5.err"
      "tests/5.rc"
    )

    for file in "${PROTECTED_FILES[@]}"; do
      if [ -f "$file" ]; then
        sha256sum "$file" >> "$CHECKSUM_FILE"
        echo "  Protected: $file"
Copilot
AI
Jan 27, 2026
Integrity checksums for protected test files are written to CHECKSUM_FILE in /tmp, which is a world-writable location that code under test can modify or delete. An attacker can tamper with the checksum file between preprocessing and evaluation to make arbitrary changes to the supposedly protected test files while still passing the integrity check in evaluate.sh. Consider storing this checksum file in a directory not writable by the code under test (or tightening its permissions) and treating any modification or removal as a hard failure during evaluation.
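One way the preprocessing side of this could look is sketched below, assuming GNU coreutils (`sha256sum`, `stat -c`). The scratch directory layout, the single protected file, and the variable names are hypothetical and chosen only for illustration; the permission change is only meaningful if the code under test runs as a different, non-root user.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical layout for illustration: a scratch workspace holding one
# "protected" file and a separate directory for the checksum list.
work=$(mktemp -d)
mkdir -p "$work/tests" "$work/checksums"
echo "expected output" > "$work/tests/1.out"

checksum_file="$work/checksums/protected.sha256"
protected_files=("tests/1.out")

cd "$work"
: > "$checksum_file"
for file in "${protected_files[@]}"; do
  # In this sketch, a missing protected file aborts preprocessing.
  [ -f "$file" ] || { echo "FAIL: missing protected file: $file" >&2; exit 1; }
  sha256sum "$file" >> "$checksum_file"
done

# Drop write permission on the list and on its directory. This does not
# stop a process running as root or as the owning user from undoing it.
chmod 0444 "$checksum_file"
chmod 0555 "$work/checksums"

mode=$(stat -c '%a' "$checksum_file")
entries=$(wc -l < "$checksum_file")
echo "mode=$mode entries=$entries"
```

Tightened permissions are a partial measure; keeping the checksum list outside any path the submission can reach is the stronger option the comment recommends.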
    if [ -f /tmp/checksums/protected.sha256 ]; then
      sha256sum -c /tmp/checksums/protected.sha256 || {
        echo "FAIL: Protected files were modified"
        exit 1
      }
    fi
Copilot
AI
Jan 27, 2026
The integrity verification in evaluate.sh relies on /tmp/checksums/protected.sha256, but skips the check entirely if the file is missing even though /tmp is writable by the untrusted submission. A malicious submission can remove or overwrite this checksum file after preprocess.sh runs to bypass detection of modifications to the "protected" test files while still allowing the script to continue. To harden this, store the checksum file in a location not writable by the code under test and treat a missing or invalid checksum file as an immediate evaluation failure.
Suggested change:

    - if [ -f /tmp/checksums/protected.sha256 ]; then
    -   sha256sum -c /tmp/checksums/protected.sha256 || {
    -     echo "FAIL: Protected files were modified"
    -     exit 1
    -   }
    - fi
    + CHECKSUM_FILE="/workspace/checksums/protected.sha256"
    + if [ ! -f "$CHECKSUM_FILE" ]; then
    +   echo "FAIL: Protected checksums file missing"
    +   exit 1
    + fi
    + sha256sum -c "$CHECKSUM_FILE" || {
    +   echo "FAIL: Protected files were modified"
    +   exit 1
    + }
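The fail-closed behavior can be exercised against a scratch directory. This is a minimal sketch, assuming GNU `sha256sum`; `verify_protected` is a hypothetical helper, not part of `evaluate.sh`, and it additionally treats an empty checksum list as a failure.

```shell
#!/bin/bash
# verify_protected is a hypothetical helper: it returns 0 only when the
# checksum list exists, is non-empty, and every listed file still matches.
verify_protected() {
  local checksum_file=$1
  if [ ! -s "$checksum_file" ]; then
    echo "FAIL: Protected checksums file missing or empty"
    return 1
  fi
  sha256sum -c "$checksum_file" || {
    echo "FAIL: Protected files were modified"
    return 1
  }
}

work=$(mktemp -d)
cd "$work" || exit 1
echo "expected output" > 1.out
sha256sum 1.out > protected.sha256

# Case 1: nothing has been touched.
verify_protected protected.sha256 > /dev/null 2>&1 && intact=pass || intact=fail

# Case 2: a protected file was edited after checksums were recorded.
echo "tampered" > 1.out
verify_protected protected.sha256 > /dev/null 2>&1 && modified=pass || modified=fail

# Case 3: the checksum list itself was deleted.
rm protected.sha256
verify_protected protected.sha256 > /dev/null 2>&1 && missing=pass || missing=fail

echo "intact=$intact modified=$modified missing=$missing"
# prints: intact=pass modified=fail missing=fail
```

The third case is the one the original `if [ -f ... ]` guard let through: with the list gone, verification was skipped rather than failed.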
Description
This project was assigned as part of CS537 in Spring 2018.
Testing
Tested with openai/gpt-5.2-2025-12-11:
The model ran but failed: its xv6 user library code used PGSIZE without including the header that defines it, so the build stopped at compile time.
Also tested with anthropic/claude-opus-4-5-20251101:

    inspect eval courselab \
      --model anthropic/claude-opus-4-5-20251101 \
      -T 'task_ids=["cs537-projects-spring-2018__concurrency_xv6_threads"]' \
      -T 'max_turns=200'

The model ran but failed: xv6 booted, but the thread/lock code deadlocked or failed to make progress in test_thread, so the test timed out.