@qimcis qimcis commented Jan 27, 2026

Description

This project was assigned as part of CS537 in Spring 2018.

See original repo: https://github.com/remzi-arpacidusseau/ostep-projects

Testing

Did a simple test with openai/gpt-5.2-2025-12-11:

inspect eval courselab \
  --model openai/gpt-5.2-2025-12-11 \
  -T 'task_ids=["cs537-projects-spring-2018__filesystems_checker"]' \
  -T 'max_turns=200'

The model ran and successfully completed the project:

Completed in 0:02:35.
Pass rate: 1/1 (100.0%).

Checklist

  • Tests pass locally
  • Code follows project style guidelines
  • Documentation updated (if needed)

@qimcis qimcis marked this pull request as ready for review January 27, 2026 04:56
@xuafeng xuafeng requested review from Copilot and tareknaser January 27, 2026 05:11

Copilot AI left a comment

Pull request overview

This PR adds a CS537 Spring 2018 Project 5 task for file system checking to the courselab benchmark. The task requires implementing an xv6 file system checker (xcheck) that validates file system images for 13 different types of consistency errors.

Changes:

  • Added task description with detailed specifications for implementing a file system checker
  • Created preprocessing script to clone xv6 and ostep-projects repositories and generate 14 test file system images (1 valid, 13 with specific errors)
  • Implemented evaluation script that tests all error cases, usage handling, and valid image processing
  • Added course metadata to courses.json for CS537 Spring 2018

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| task.md | Comprehensive task description specifying the file system checker requirements, error messages, and hints |
| preprocess.sh | Setup script that clones dependencies, builds mkfs, and generates test images with specific file system inconsistencies |
| evaluate.sh | Test harness that validates xcheck implementation against all 16 test cases (usage, missing file, valid image, and 13 error types) |
| config.json | Task configuration with timeout, tags, and artifact paths |
| compose.yaml | Docker compose configuration using gcc:12 image |
| courses.json | Added CS537 Spring 2018 course entry with metadata |

is provided, you should print the usage error shown below:
```
prompt> xcheck
Usage: xcheck <file_system_image>
```
Copilot AI Jan 27, 2026

The usage message has a trailing space after <file_system_image>. While this doesn't cause test failures (since evaluate.sh strips trailing whitespace with sed), it's cleaner to remove the trailing space for consistency.

Suggested change
- Usage: xcheck <file_system_image> 
+ Usage: xcheck <file_system_image>
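
For context on why the trailing space is harmless to the tests, here is a minimal sketch of the kind of normalization the comment describes. It is not the code from evaluate.sh: the helper name `strip_trailing` and the exact sed expression are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not taken from evaluate.sh): strip trailing
# whitespace from every line read on stdin.
strip_trailing() {
    sed 's/[[:space:]]*$//'
}

expected='Usage: xcheck <file_system_image>'
actual='Usage: xcheck <file_system_image> '   # note the trailing space

# Compare the normalized output against the expected message.
if [ "$(printf '%s\n' "$actual" | strip_trailing)" = "$expected" ]; then
    echo "usage message matches after normalization"
else
    echo "usage message differs"
fi
```

With normalization like this, both spellings of the usage line compare equal, which is why the suggestion is a consistency fix rather than a correctness fix.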

If the file system image does not exist, you should print the error `image not
found.` to standard error and exit with the error code of 1.

If the checker detects any one of the 12 errors above, it should print the
Copilot AI Jan 27, 2026

The documentation states "any one of the 12 errors above" but actually lists 13 distinct error conditions (numbered 1-13 in the preceding list). This should be corrected to say "13 errors" to match the actual number of error cases.

Suggested change
- If the checker detects any one of the 12 errors above, it should print the
+ If the checker detects any one of the 13 errors above, it should print the
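
The missing-image requirement quoted in the diff context above (print `image not found.` to standard error and exit with code 1) could be checked roughly as follows. This is a sketch under stated assumptions, not the logic in evaluate.sh: `xcheck_stub` is a stand-in for the real binary and `expect_failure` is a hypothetical helper.

```shell
#!/bin/sh
# Stand-in for the real xcheck binary (an assumption, for illustration only):
# it mimics just the missing-image behavior quoted from task.md.
xcheck_stub() {
    if [ ! -f "$1" ]; then
        echo "image not found." >&2
        return 1
    fi
    return 0
}

# Hypothetical check: run a command, capture stderr only, and require both
# the expected message and an exit status of 1.
expect_failure() {
    want_msg=$1
    shift
    # 2>&1 sends stderr into the capture; >/dev/null then discards stdout.
    got_msg=$("$@" 2>&1 >/dev/null)
    got_status=$?
    [ "$got_status" -eq 1 ] && [ "$got_msg" = "$want_msg" ]
}

if expect_failure "image not found." xcheck_stub /nonexistent/fs.img; then
    echo "PASS: missing image"
else
    echo "FAIL: missing image"
fi
```

Capturing stderr separately from stdout matters here because the task text requires the message on standard error specifically.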
