@qimcis qimcis commented Jan 27, 2026

Description

This project was offered as part of CS537 in Spring 2018.

Testing

Did a simple test with openai/gpt-5.2-2025-12-11:

inspect eval courselab \
  --model openai/gpt-5.2-2025-12-11 \
  -T 'task_ids=["cs537-projects-spring-2018__concurrency_mapreduce"]' \
  -T 'max_turns=200'

The model ran and successfully completed the project:

Completed in 0:01:47.
Pass rate: 1/1 (100.0%).

Checklist

  • Tests pass locally
  • Code follows project style guidelines
  • Documentation updated (if needed)

@qimcis qimcis marked this pull request as ready for review January 27, 2026 04:55
@xuafeng xuafeng requested review from Copilot and tareknaser January 27, 2026 05:12

Copilot AI left a comment

Pull request overview

Adds a new CourseLab benchmark task for UW-Madison CS537 Spring 2018 Project 4a (concurrency MapReduce), including the task prompt, container setup, and evaluation harness.

Changes:

  • Introduces a new CS537 Spring 2018 MapReduce task (prompt + configuration + compose setup).
  • Adds preprocessing to fetch the upstream scaffold and generate protected test inputs/programs.
  • Adds an evaluation script that builds mapreduce.c and runs functional tests; registers the new course in courses.json.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_mapreduce/task.md | Adds the MapReduce assignment prompt and API overview for the task. |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_mapreduce/preprocess.sh | Clones scaffold repo, creates test fixtures, and records checksums for protected files. |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_mapreduce/evaluate.sh | Compiles mapreduce.c, builds test drivers, and validates output against expectations. |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_mapreduce/config.json | Registers the task instance metadata and produced artifact path. |
| benchmarks/courselab_bench/data/cs537-projects-spring-2018/task_concurrency_mapreduce/compose.yaml | Defines the GCC-based container environment and preprocess hook. |
| benchmarks/courselab_bench/data/courses.json | Adds the CS537 Spring 2018 course entry to the course catalog. |

Comment on lines +47 to +48
output=$(printf "%s" "$output" | LC_ALL=C sort)

Copilot AI Jan 27, 2026

The evaluator sorts the program output before comparison, which means it no longer verifies the spec requirement that keys are processed in sorted order per partition. Consider adjusting the tests to validate per-partition key ordering without depending on global output order (e.g., make the test reducer emit partition_number + key and check ordering within each partition), or include at least one run where ordering is deterministic and must already be sorted.
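
One way to check ordering per partition, sketched under assumptions: suppose the test reducer prints each key as `partition<TAB>key`. That line format and the helper name `check_partition_order` are hypothetical and not part of this PR. The check compares each key only against the previous key seen for the same partition, so interleaving across partitions does not affect the result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "partition<TAB>key" lines on stdin and succeeds
# only if, within each partition, keys arrive in non-decreasing byte order.
check_partition_order() {
  LC_ALL=C awk -F'\t' '
    # Appending "" forces a string comparison even for numeric-looking keys.
    ($1 in last) && (($2 "") < (last[$1] "")) { bad = 1; exit }
    { last[$1] = $2 }
    END { exit bad + 0 }
  '
}
```

For example, `printf '0\tapple\n1\tzebra\n0\tbanana\n' | check_partition_order` succeeds, because partition 0 sees `apple` before `banana` even though partition 1 is interleaved between them.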

Copilot uses AI. Check for mistakes.

In 2004, engineers at Google introduced a new paradigm for large-scale
parallel data processing known as MapReduce (see the original paper
[here](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf),

Copilot AI Jan 27, 2026

The MapReduce paper link has a double slash (.../en//archive/...), which is likely accidental and may break if the host stops redirecting. Update the URL to a canonical path.

Suggested change
- [here](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf),
+ [here](https://static.googleusercontent.com/media/research.google.com/en/archive/mapreduce-osdi04.pdf),

"tests/input_copy.txt"
"tests/input_copy2.txt"
"tests/mr_wordcount.c"
"tests/mr_copytest.c"

Copilot AI Jan 27, 2026

mapreduce.h (the provided API contract) isn’t included in the protected-file checksums. As a result, a submission could modify the header to change/inline MR_* behavior and bypass parts of the intended interface. Add mapreduce.h (and any other provided, non-student-editable scaffolding files in this directory) to PROTECTED_FILES so the checksum check enforces the assignment boundary.

Suggested change
- "tests/mr_copytest.c"
+ "tests/mr_copytest.c"
+ "mapreduce.h"
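
A minimal sketch of how a protected-file check could cover the header, assuming GNU coreutils `sha256sum` is available in the container. The function names `record_checksums` and `verify_checksums` and the manifest path are hypothetical; the actual preprocess.sh and evaluate.sh may record and compare checksums differently.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the manifest argument are assumptions.
PROTECTED_FILES=(
  "tests/mr_wordcount.c"
  "tests/mr_copytest.c"
  "mapreduce.h"
)

# Write one "hash  path" line per protected file to the manifest given in $1.
record_checksums() {
  sha256sum -- "${PROTECTED_FILES[@]}" > "$1"
}

# Succeed only if every file listed in the manifest still matches its hash.
verify_checksums() {
  sha256sum --check --quiet -- "$1"
}
```

With this shape, a submission that edits `mapreduce.h` after the manifest is recorded makes `verify_checksums` return a non-zero status.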

Comment on lines +36 to +38
set +e
output=$(eval "$cmd" 2>"/tmp/${name}.err")
rc=$?

Copilot AI Jan 27, 2026

run_test() uses eval to execute the command string. This is unnecessary here and makes the script more fragile (quoting/escaping issues) and opens command-injection risks if inputs ever change. Prefer executing the command without eval (e.g., pass argv as an array) so arguments are handled safely.
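
A sketch of what an eval-free `run_test` might look like, assuming the function can take the test name first and the command as the remaining arguments. That signature is an assumption; the actual function in evaluate.sh may accept other parameters.

```shell
#!/usr/bin/env bash
# Sketch only: the signature "run_test NAME CMD [ARGS...]" is an assumption.
run_test() {
  local name=$1
  shift                       # the remaining "$@" is the command and its args
  local output rc
  # "$@" expands to one word per argument, so spaces and quotes in arguments
  # are passed through unchanged without re-parsing by the shell.
  output=$("$@" 2>"/tmp/${name}.err")
  rc=$?
  printf '%s\n' "$output"
  return "$rc"
}
```

A call such as `run_test wordcount ./mr_wordcount "tests/input with spaces.txt"` (binary and file names here are illustrative) would pass the path as a single argument, which the `eval` form would split unless it were re-quoted.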
