[lab] CS537 Spring 2018 Project 4b #95

qimcis · 2026-01-27T04:46:57Z

Description

This project was assigned as part of CS537 in Spring 2018.

Testing

Did a test with openai/gpt-5.2-2025-12-11:

inspect eval courselab \
  --model openai/gpt-5.2-2025-12-11 \
  -T 'task_ids=["cs537-projects-spring-2018__concurrency_xv6_threads"]' \
  -T 'max_turns=200'

The model ran but failed: its xv6 user library code used PGSIZE without including the header that defines it, so the build failed at compile time.

Completed in 0:08:33.
Pass rate: 0/1 (0.0%).

Then also tested with anthropic/claude-opus-4-5-20251101:

inspect eval courselab \
  --model anthropic/claude-opus-4-5-20251101 \
  -T 'task_ids=["cs537-projects-spring-2018__concurrency_xv6_threads"]' \
  -T 'max_turns=200'

The model ran but failed: xv6 booted, but the thread/lock code deadlocked or failed to make progress in test_thread, so the test timed out.

Completed in 0:19:40.
Pass rate: 0/1 (0.0%).

Copilot

Pull request overview

Adds a new CourseLab benchmark task for UW-Madison CS537 Spring 2018 Project 4b (xv6 kernel threads), including setup, evaluation, and task instructions.

Changes:

Introduces the new cs537-projects-spring-2018__concurrency_xv6_threads task (task text, container setup, and evaluation script).
Adds an xv6-thread-focused test suite wired into the ostep-projects harness via a generated Makefile and .run/.out fixtures.
Registers the CS537 Spring 2018 course in courses.json.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/task.md	Adds the task prompt/instructions for xv6 kernel threads.
benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/preprocess.sh	Clones pinned sources, generates tests, builds xv6 artifacts, and writes protected-file checksums.
benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/evaluate.sh	Verifies protected files and runs the ostep test harness under a timeout.
benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/config.json	Declares the benchmark instance metadata and artifact allowlist.
benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/compose.yaml	Defines the container image and init script for the task environment.
benchmarks/courselab_bench/data/courses.json	Adds the CS537 Spring 2018 course entry.

Comments suppressed due to low confidence (1)

benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/task.md:74

The task text points readers to kernel/proc.c for the fork() implementation, but in the xv6-public tree used by this benchmark it is proc.c at the repo root (i.e., src/proc.c). Updating these file paths will reduce confusion and make the instructions consistent with the provided source layout.

call. The `fork()` system call will serve as a template for `clone()`, with
some modifications. For example, in `kernel/proc.c`, we see the beginning of
the `fork()` implementation:


Copilot · 2026-01-27T05:17:19Z

...s/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/preprocess.sh

+        printf(1, "XV6_TEST_OUTPUT clone_join_fail\n");
+        exit();
+    }
+


test_clone doesn't verify that join() returns the PID of the thread created by clone(). As written, a broken implementation that returns a different child PID could still pass. Please assert joined == pid (and keep the existing < 0 failure check).

Suggested change

+    if (joined != pid) {
+        printf(1, "XV6_TEST_OUTPUT clone_join_fail\n");
+        exit();
+    }

Copilot · 2026-01-27T05:17:20Z

...s/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/preprocess.sh

+EOT
+
+cat > tests/1.run <<'EOT'
+cd src; ../../tester/run-xv6-command.exp CPUS=1 Makefile.test test_clone | grep XV6_TEST_OUTPUT; cd ..


The generated tests/*.run scripts pipe the xv6 runner output into grep without pipefail. If run-xv6-command.exp fails (non-zero) but still writes matching output, the test could incorrectly pass because the pipeline exit code comes from grep. Consider adding set -euo pipefail to each .run script (or otherwise ensure the xv6 runner's exit status is checked).

Suggested change

-cd src; ../../tester/run-xv6-command.exp CPUS=1 Makefile.test test_clone | grep XV6_TEST_OUTPUT; cd ..
+#!/bin/bash
+set -euo pipefail
+cd src
+../../tester/run-xv6-command.exp CPUS=1 Makefile.test test_clone | grep XV6_TEST_OUTPUT
+cd ..
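
To illustrate the exit-status concern, here is a minimal sketch assuming bash. `fake_runner` is a hypothetical stand-in for run-xv6-command.exp that prints a matching line and then fails; it is not part of the harness.

```shell
#!/bin/bash
# Hypothetical stand-in for the xv6 runner: emits a matching line, then fails.
fake_runner() {
  echo "XV6_TEST_OUTPUT clone_ok"
  return 3
}

# Without pipefail, the pipeline's status is that of the last command (grep),
# which is 0 because the line matched.
set +o pipefail
fake_runner | grep XV6_TEST_OUTPUT > /dev/null
status_default=$?

# With pipefail, the pipeline reports the rightmost non-zero status (3 here).
set -o pipefail
fake_runner | grep XV6_TEST_OUTPUT > /dev/null
status_pipefail=$?
set +o pipefail

echo "default=$status_default pipefail=$status_pipefail"
# prints: default=0 pipefail=3
```

Note that with `set -e` also enabled, a grep that finds no match exits non-zero and aborts the script before the final `cd ..` runs.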

Copilot · 2026-01-27T05:17:20Z

benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/task.md

+
+One other thing you'll have to understand to make this all work is the x86
+calling convention, and exactly how the stack works when calling a function.
+This is you can read about in [Programming From The Ground


Grammar: "This is you can read about" is ungrammatical and reads like a typo. Consider changing it to "This you can read about" or "This is something you can read about".

Suggested change

-This is you can read about in [Programming From The Ground
+This you can read about in [Programming From The Ground

Copilot · 2026-01-27T05:17:20Z

benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/task.md

+xv6. Thus, you should add prototypes to `user/user.h` and the actual code to
+implement the library routines in `user/ulib.c`.


The instructions mention adding prototypes to user/user.h and code to user/ulib.c, but this benchmark clones xv6-public, where these files are at the repo root (user.h, ulib.c). Please update the paths in the task text to match the layout learners will see (e.g., src/user.h / src/ulib.c in this harness).

Suggested change

-xv6. Thus, you should add prototypes to `user/user.h` and the actual code to
-implement the library routines in `user/ulib.c`.
+xv6. Thus, you should add prototypes to `src/user.h` and the actual code to
+implement the library routines in `src/ulib.c`.

Copilot · 2026-01-27T05:17:20Z

...s/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/preprocess.sh

+echo "Creating checksums for protected files"
+mkdir -p /tmp/checksums
+CHECKSUM_FILE=/tmp/checksums/protected.sha256
+: > "$CHECKSUM_FILE"
+
+PROTECTED_FILES=(
+  "tests/pre"
+  "tests/1.desc"
+  "tests/1.run"
+  "tests/1.out"
+  "tests/1.err"
+  "tests/1.rc"
+  "tests/2.desc"
+  "tests/2.run"
+  "tests/2.out"
+  "tests/2.err"
+  "tests/2.rc"
+  "tests/test_clone.c"
+  "tests/test_thread.c"
+  "tests/test_join_none.c"
+  "tests/test_thread_args.c"
+  "tests/test_thread_join_none.c"
+  "tests/3.desc"
+  "tests/3.run"
+  "tests/3.out"
+  "tests/3.err"
+  "tests/3.rc"
+  "tests/4.desc"
+  "tests/4.run"
+  "tests/4.out"
+  "tests/4.err"
+  "tests/4.rc"
+  "tests/5.desc"
+  "tests/5.run"
+  "tests/5.out"
+  "tests/5.err"
+  "tests/5.rc"
+)
+
+for file in "${PROTECTED_FILES[@]}"; do
+  if [ -f "$file" ]; then
+    sha256sum "$file" >> "$CHECKSUM_FILE"
+    echo "  Protected: $file"


Integrity checksums for protected test files are written to CHECKSUM_FILE in /tmp, which is a world-writable location that code under test can modify or delete. An attacker can tamper with the checksum file between preprocessing and evaluation to make arbitrary changes to the supposedly protected test files while still passing the integrity check in evaluate.sh. Consider storing this checksum file in a directory not writable by the code under test (or tightening its permissions) and treating any modification or removal as a hard failure during evaluation.
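
One possible shape for the "tightening its permissions" option, as a sketch only. It assumes bash and GNU coreutils (`sha256sum`, `stat -c`). `CHECKSUM_DIR` here is a hypothetical scratch directory, not a path used by the benchmark. Permission bits only help if the code under test runs as a different, unprivileged user; the owner or root can still change them back.

```shell
#!/bin/bash
# Hypothetical checksum location; a real setup would use a directory owned by
# a user other than the one running the submission.
CHECKSUM_DIR=$(mktemp -d)
CHECKSUM_FILE="$CHECKSUM_DIR/protected.sha256"

# Scratch workspace with one stand-in protected file.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p tests
echo "demo" > tests/1.run

# Record checksums, as preprocess.sh does.
: > "$CHECKSUM_FILE"
for file in tests/1.run; do
  [ -f "$file" ] && sha256sum "$file" >> "$CHECKSUM_FILE"
done

# Drop write permission on the checksum file and on its directory.
chmod 0444 "$CHECKSUM_FILE"
chmod 0555 "$CHECKSUM_DIR"

file_mode=$(stat -c %a "$CHECKSUM_FILE")
dir_mode=$(stat -c %a "$CHECKSUM_DIR")
echo "file=$file_mode dir=$dir_mode"
# prints: file=444 dir=555
```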

Copilot · 2026-01-27T05:17:20Z

...rks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_xv6_threads/evaluate.sh

+if [ -f /tmp/checksums/protected.sha256 ]; then
+  sha256sum -c /tmp/checksums/protected.sha256 || {
+    echo "FAIL: Protected files were modified"
+    exit 1
+  }
+fi


The integrity verification in evaluate.sh relies on /tmp/checksums/protected.sha256, but skips the check entirely if the file is missing even though /tmp is writable by the untrusted submission. A malicious submission can remove or overwrite this checksum file after preprocess.sh runs to bypass detection of modifications to the "protected" test files while still allowing the script to continue. To harden this, store the checksum file in a location not writable by the code under test and treat a missing or invalid checksum file as an immediate evaluation failure.

Suggested change

-if [ -f /tmp/checksums/protected.sha256 ]; then
-  sha256sum -c /tmp/checksums/protected.sha256 || {
-    echo "FAIL: Protected files were modified"
-    exit 1
-  }
-fi
+CHECKSUM_FILE="/workspace/checksums/protected.sha256"
+if [ ! -f "$CHECKSUM_FILE" ]; then
+  echo "FAIL: Protected checksums file missing"
+  exit 1
+fi
+sha256sum -c "$CHECKSUM_FILE" || {
+  echo "FAIL: Protected files were modified"
+  exit 1
+}
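
The fail-closed behavior described in this comment can be exercised in isolation. The sketch below assumes bash and GNU `sha256sum`. The `verify_protected` function and the scratch directory layout are hypothetical, used only to show the three cases (intact, modified, checksum file missing).

```shell
#!/bin/bash
# Fail-closed integrity check: a missing checksum file is an error, not a skip.
verify_protected() {
  local checksum_file="$1"
  if [ ! -f "$checksum_file" ]; then
    echo "FAIL: Protected checksums file missing"
    return 1
  fi
  sha256sum -c "$checksum_file" > /dev/null 2>&1 || {
    echo "FAIL: Protected files were modified"
    return 1
  }
  echo "OK: Protected files intact"
}

# Demo in a scratch directory (hypothetical layout, not the benchmark's).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p tests checksums
echo "expected output" > tests/1.out
sha256sum tests/1.out > checksums/protected.sha256

verify_protected checksums/protected.sha256; intact=$?
echo "tampered" > tests/1.out
verify_protected checksums/protected.sha256; modified=$?
rm checksums/protected.sha256
verify_protected checksums/protected.sha256; missing=$?

echo "intact=$intact modified=$modified missing=$missing"
# prints: intact=0 modified=1 missing=1
```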

qimcis added 4 commits January 26, 2026 12:07

Add CS537 Spring 2018 xv6 threads lab

db10232

Relax xv6 build warning for GCC 12

91bbd16

expand xv6 threads tests

b8d8118

sync xv6 threads task with upstream README

7fc8577

qimcis marked this pull request as ready for review January 27, 2026 04:55

xuafeng requested review from Copilot and tareknaser January 27, 2026 05:11

Copilot started reviewing on behalf of xuafeng January 27, 2026 05:12

Copilot AI reviewed Jan 27, 2026
