
Conversation


@qimcis qimcis commented Jan 18, 2026

Description

Add CS 537 Spring 2018 Midterm exam dataset and solution file to the repository.

Changes

  • Add exam metadata for "CS 537 Spring 2018 Midterm" (ID: cs537_spring_2018_midterm).
  • Add raw exam directory at benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm.
  • Add solutions file 18-spring-mid-answers.pdf and note that there are no reference files.

Testing

Ran the data preparation script (python3 courseexam/prepare.py); it completed without errors.

Checklist

  • Tests pass locally
  • Code follows project style guidelines
  • Documentation updated (if needed)


@777lefty 777lefty left a comment


I didn't check every code snippet, but this one looks good to me as long as every code snippet is inside a code block.

@qimcis qimcis force-pushed the cs537-spring-2018-midterm branch from a6dfadc to a23f522 Compare January 26, 2026 15:39
@qimcis qimcis force-pushed the cs537-spring-2018-midterm branch from a23f522 to 52511e9 Compare January 26, 2026 16:34
@qimcis qimcis marked this pull request as ready for review January 27, 2026 04:55
@xuafeng xuafeng requested review from Copilot and tareknaser January 27, 2026 05:22

Copilot AI left a comment


Pull request overview

Adds a new CourseExam raw exam dataset for CS 537 Spring 2018 Midterm to the courseexam_bench benchmark corpus.

Changes:

  • Introduces a new raw exam markdown file with exam-level metadata and 30 question entries.
  • Encodes per-question grading fields (answer, llm_judge_instructions, tags, points) for dataset generation via prepare.py.
Comments suppressed due to low confidence (3)

benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:110

  • This C snippet still uses escaped quotes inside a Markdown code fence (e.g., printf(\"%d\\n\", counter);). Please remove the backslashes so the code is valid C and the rendered prompt matches the original exam.
```c
volatile int counter = 1000;
void *worker(void *arg) {
  counter--;
  return NULL;
}
int main(int argc, char *argv[]) {
  pthread_t p1, p2;
  pthread_create(&p1, NULL, worker, NULL);
  pthread_create(&p2, NULL, worker, NULL);
  pthread_join(p1, NULL);
  pthread_join(p2, NULL);
  printf(\"%d\\n\", counter);
  return 0;
}
```

benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:165

  • This code fence includes escaped quotes (printf(\"a\");, printf(\"b\");). In Markdown code blocks the quotes should not be escaped; please drop the backslashes to keep the snippet valid C.
```c
int main(int argc, char *argv[]) {
  printf(\"a\");
  fork();
  printf(\"b\");
  return 0;
}
```

benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:226

  • In this code fence, the string literal is escaped ({ \"/bin/true\", NULL }) and printf(\"1\") etc. Please remove the backslashes inside the C snippet so it compiles as written and renders cleanly in Markdown.
```c
int main(int argc, char *argv[]) {
  int rc = fork();
  if (rc == 0) {
    char *my_argv[] = { \"/bin/true\", NULL };
    execv(my_argv[0], my_argv);
    printf(\"1\");
  } else if (rc > 0) {
    wait(NULL);
    printf(\"2\");
  } else {
    printf(\"3\");
  }
```


Comment on lines +42 to +45
"type": "Freeform",
"tags": ["concurrency"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."

Copilot AI Jan 27, 2026


These A/B style questions are deterministic single-token answers, but they are marked as type: "Freeform" and graded via LLM rubric. To match the benchmark's established pattern (e.g., example_course_2024_midterm and cs537_fall_2021_final use ExactMatch for multiple-choice/True-False), consider using type: "ExactMatch" with choices (e.g., ["possible", "not possible"]) so grading is simple and consistent.

Suggested change

- "type": "Freeform",
- "tags": ["concurrency"],
- "answer": "A",
- "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
+ "type": "ExactMatch",
+ "tags": ["concurrency"],
+ "answer": "A",
+ "choices": ["A", "B"]

Comment on lines +41 to +45
"points": 1,
"type": "Freeform",
"tags": ["concurrency"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."

Copilot AI Jan 27, 2026


Many questions list multiple sub-items (e.g., Q1 lists 1–5) but the JSON block provides only a single-letter answer and rubric, making the intended target ambiguous and effectively ungradeable as written. Please either (a) split each sub-item into its own question with distinct problem_id values (the README explicitly allows forms like "2.1"/"4a") and per-sub-item points, or (b) change answer to encode all sub-answers (e.g., a 5-letter string) and update llm_judge_instructions (and points/score_total if needed) to grade accordingly.

Suggested change

- "points": 1,
- "type": "Freeform",
- "tags": ["concurrency"],
- "answer": "A",
- "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
+ "points": 5,
+ "type": "Freeform",
+ "tags": ["concurrency"],
+ "answer": "AAAAA",
+ "llm_judge_instructions": "The question has five sub-items (1–5). The expected answer is a 5-character string over {A, B}, where each character is the answer (A or B) to the corresponding sub-item in order. Compare the student's answer string to \"AAAAA\" character by character. Award 1 point for each position where the student's character matches the expected character (up to 5 points total). Award 0 points for mismatched or missing characters."

Comment on lines +54 to +63
```c
int value = 0;
int main(int argc, char *argv[]) {
while (1) {
printf(\"%d\", value);
value++;
}
return 0;
}
```

Copilot AI Jan 27, 2026


Several C code blocks contain escaped quotes (e.g., printf(\"%d\", value);, printf(\"a\");, and char *my_argv[] = { \"/bin/true\", NULL };). Inside Markdown code fences these backslashes should be removed; otherwise the snippets are not valid C and can confuse downstream consumers/LLMs.

This issue also appears in the following locations of the same file:

  • line 96
  • line 159
  • line 214


@tareknaser tareknaser left a comment


I reviewed only the first four questions. Please revise the PR and request another review.


---

## Question 1 [5 point(s)]


This question is worth only 1 point


---

## Question 2 [3 point(s)]


Same here and for the rest of the questions in the exam

Comment on lines +31 to +47

1. abababab ...
2. aaaaaaaa ...
3. bbbbbbbb ...
4. aaaabbbb ...
5. bbbbaaaa ...

```json
{
"problem_id": "1",
"points": 1,
"type": "Freeform",
"tags": ["concurrency"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```


That's not the correct answer; all of the options are possible:

1. abababab ... A. Possible
2. aaaaaaaa ... A. Possible
3. bbbbbbbb ... A. Possible
4. aaaabbbb ... A. Possible
5. bbbbaaaa ... A. Possible

Comment on lines +72 to +89
Which of the following are possible outputs of the increment process? To answer: Fill in A for possible, B for not possible.

6. 012345678 ...
7. 012301234 ...
8. 012345670123 ...
9. 01234567891011 ...
10. 123456789 ...

```json
{
"problem_id": "2",
"points": 1,
"type": "Freeform",
"tags": ["concurrency","memory-model"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```


That's not the correct answer. Options 6 and 9 are possible. Other options are not possible. Please double check the exam PDF

Comment on lines +120 to +129
```json
{
"problem_id": "3",
"points": 1,
"type": "Freeform",
"tags": ["concurrency","threads"],
"answer": "B",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'B'. Award 0 points otherwise."
}
```


Also not the correct answer. The possible options are 13 and 14. Other options are not possible. B is a wrong answer

Comment on lines +143 to +152
```json
{
"problem_id": "4",
"points": 1,
"type": "Freeform",
"tags": ["os","process-states"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```


Please check the reference solution
