
Conversation


@qimcis qimcis commented Jan 18, 2026

Description

Add CS 537 Spring 2018 Midterm exam dataset and solution file to the repository.

Changes

  • Add exam metadata for "CS 537 Spring 2018 Midterm" (ID: cs537_spring_2018_midterm).
  • Add raw exam directory at benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm.
  • Add solutions file 18-spring-mid-answers.pdf and note that there are no reference files.

Testing

Ran the data preparation script (python3 courseexam/prepare.py); it completed without errors.

Checklist

  • Tests pass locally
  • Code follows project style guidelines
  • Documentation updated (if needed)


@777lefty 777lefty left a comment


I didn't check every code snippet, but this one looks good to me as long as every code snippet is inside a code block.

@qimcis qimcis force-pushed the cs537-spring-2018-midterm branch from a6dfadc to a23f522 Compare January 26, 2026 15:39
@qimcis qimcis force-pushed the cs537-spring-2018-midterm branch from a23f522 to 52511e9 Compare January 26, 2026 16:34
@qimcis qimcis marked this pull request as ready for review January 27, 2026 04:55
@xuafeng xuafeng requested review from Copilot and tareknaser January 27, 2026 05:22

Copilot AI left a comment


Pull request overview

Adds a new CourseExam raw exam dataset for CS 537 Spring 2018 Midterm to the courseexam_bench benchmark corpus.

Changes:

  • Introduces a new raw exam markdown file with exam-level metadata and 30 question entries.
  • Encodes per-question grading fields (answer, llm_judge_instructions, tags, points) for dataset generation via prepare.py.
Comments suppressed due to low confidence (3)

benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:110

  • This C snippet still uses escaped quotes inside a Markdown code fence (e.g., printf(\"%d\\n\", counter);). Please remove the backslashes so the code is valid C and the rendered prompt matches the original exam.
```c
volatile int counter = 1000;
void *worker(void *arg) {
  counter--;
  return NULL;
}
int main(int argc, char *argv[]) {
  pthread_t p1, p2;
  pthread_create(&p1, NULL, worker, NULL);
  pthread_create(&p2, NULL, worker, NULL);
  pthread_join(p1, NULL);
  pthread_join(p2, NULL);
  printf(\"%d\\n\", counter);
  return 0;
}
```

benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:165

  • This code fence includes escaped quotes (printf(\"a\");, printf(\"b\");). In Markdown code blocks the quotes should not be escaped; please drop the backslashes to keep the snippet valid C.
```c
int main(int argc, char *argv[]) {
  printf(\"a\");
  fork();
  printf(\"b\");
  return 0;
}
```

benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:226

  • In this code fence, the string literal is escaped ({ \"/bin/true\", NULL }) and printf(\"1\") etc. Please remove the backslashes inside the C snippet so it compiles as written and renders cleanly in Markdown.
```c
int main(int argc, char *argv[]) {
  int rc = fork();
  if (rc == 0) {
    char *my_argv[] = { \"/bin/true\", NULL };
    execv(my_argv[0], my_argv);
    printf(\"1\");
  } else if (rc > 0) {
    wait(NULL);
    printf(\"2\");
  } else {
    printf(\"3\");
  }
```


Comment on lines +42 to +45
"type": "Freeform",
"tags": ["concurrency"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."

Copilot AI Jan 27, 2026


These A/B style questions are deterministic single-token answers, but they are marked as type: "Freeform" and graded via LLM rubric. To match the benchmark's established pattern (e.g., example_course_2024_midterm and cs537_fall_2021_final use ExactMatch for multiple-choice/True-False), consider using type: "ExactMatch" with choices (e.g., ["possible", "not possible"]) so grading is simple and consistent.

Suggested change

- "type": "Freeform",
- "tags": ["concurrency"],
- "answer": "A",
- "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
+ "type": "ExactMatch",
+ "tags": ["concurrency"],
+ "answer": "A",
+ "choices": ["A", "B"]

Comment on lines +41 to +45
"points": 1,
"type": "Freeform",
"tags": ["concurrency"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."

Copilot AI Jan 27, 2026


Many questions list multiple sub-items (e.g., Q1 lists 1–5) but the JSON block provides only a single-letter answer and rubric, making the intended target ambiguous and effectively ungradeable as written. Please either (a) split each sub-item into its own question with distinct problem_id values (the README explicitly allows forms like "2.1"/"4a") and per-sub-item points, or (b) change answer to encode all sub-answers (e.g., a 5-letter string) and update llm_judge_instructions (and points/score_total if needed) to grade accordingly.

Suggested change

- "points": 1,
- "type": "Freeform",
- "tags": ["concurrency"],
- "answer": "A",
- "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
+ "points": 5,
+ "type": "Freeform",
+ "tags": ["concurrency"],
+ "answer": "AAAAA",
+ "llm_judge_instructions": "The question has five sub-items (1–5). The expected answer is a 5-character string over {A, B}, where each character is the answer (A or B) to the corresponding sub-item in order. Compare the student's answer string to \"AAAAA\" character by character. Award 1 point for each position where the student's character matches the expected character (up to 5 points total). Award 0 points for mismatched or missing characters."

Comment on lines +54 to +63
```c
int value = 0;
int main(int argc, char *argv[]) {
while (1) {
printf(\"%d\", value);
value++;
}
return 0;
}
```

Copilot AI Jan 27, 2026


Several C code blocks contain escaped quotes (e.g., printf(\"%d\", value);, printf(\"a\");, and char *my_argv[] = { \"/bin/true\", NULL };). Inside Markdown code fences these backslashes should be removed; otherwise the snippets are not valid C and can confuse downstream consumers/LLMs.

This issue also appears in the following locations of the same file:

  • line 96
  • line 159
  • line 214


@tareknaser tareknaser left a comment


I reviewed only the first four questions. Please revise the PR and request another review.


---

## Question 1 [5 point(s)]


This question is worth only 1 point


---

## Question 2 [3 point(s)]


Same here and for the rest of the questions in the exam

Comment on lines +31 to +47

1. abababab ...
2. aaaaaaaa ...
3. bbbbbbbb ...
4. aaaabbbb ...
5. bbbbaaaa ...

```json
{
"problem_id": "1",
"points": 1,
"type": "Freeform",
"tags": ["concurrency"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```


That's not the correct answer; all of the options are possible:

1. abababab ... A. Possible
2. aaaaaaaa ... A. Possible
3. bbbbbbbb ... A. Possible
4. aaaabbbb ... A. Possible
5. bbbbaaaa ... A. Possible

Comment on lines +72 to +89
Which of the following are possible outputs of the increment process? To answer: Fill in A for possible, B for not possible.

6. 012345678 ...
7. 012301234 ...
8. 012345670123 ...
9. 01234567891011 ...
10. 123456789 ...

```json
{
"problem_id": "2",
"points": 1,
"type": "Freeform",
"tags": ["concurrency","memory-model"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```


That's not the correct answer. Options 6 and 9 are possible. Other options are not possible. Please double check the exam PDF

Comment on lines +120 to +129
```json
{
"problem_id": "3",
"points": 1,
"type": "Freeform",
"tags": ["concurrency","threads"],
"answer": "B",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'B'. Award 0 points otherwise."
}
```


Also not the correct answer. The possible options are 13 and 14. Other options are not possible. B is a wrong answer

Comment on lines +143 to +152
```json
{
"problem_id": "4",
"points": 1,
"type": "Freeform",
"tags": ["os","process-states"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```


Please check the reference solution
