add cs537 spring 2018 midterm #90
Conversation
777lefty
left a comment
I didn't check every code snippet, but this one looks good to me as long as every code snippet is in a code block.
Pull request overview
Adds a new CourseExam raw exam dataset for CS 537 Spring 2018 Midterm to the courseexam_bench benchmark corpus.
Changes:
- Introduces a new raw exam markdown file with exam-level metadata and 30 question entries.
- Encodes per-question grading fields (answer, llm_judge_instructions, tags, points) for dataset generation via prepare.py.
Comments suppressed due to low confidence (3)
benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:110
- This C snippet still uses escaped quotes inside a Markdown code fence (e.g., printf(\"%d\\n\", counter);). Please remove the backslashes so the code is valid C and the rendered prompt matches the original exam.
```c
volatile int counter = 1000;

void *worker(void *arg) {
    counter--;
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t p1, p2;
    pthread_create(&p1, NULL, worker, NULL);
    pthread_create(&p2, NULL, worker, NULL);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf(\"%d\\n\", counter);
    return 0;
}
```
benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:165
- This code fence includes escaped quotes (printf(\"a\");, printf(\"b\");). In Markdown code blocks the quotes should not be escaped; please drop the backslashes to keep the snippet valid C.
```c
int main(int argc, char *argv[]) {
    printf(\"a\");
    fork();
    printf(\"b\");
    return 0;
}
```
benchmarks/courseexam_bench/data/raw/cs537_spring_2018_midterm/exam.md:226
- In this code fence, the string literal is escaped ({ \"/bin/true\", NULL }), and so are printf(\"1\") etc. Please remove the backslashes inside the C snippet so it compiles as written and renders cleanly in Markdown.
```c
int main(int argc, char *argv[]) {
    int rc = fork();
    if (rc == 0) {
        char *my_argv[] = { \"/bin/true\", NULL };
        execv(my_argv[0], my_argv);
        printf(\"1\");
    } else if (rc > 0) {
        wait(NULL);
        printf(\"2\");
    } else {
        printf(\"3\");
    }
}
```
```json
"type": "Freeform",
"tags": ["concurrency"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
```
Copilot AI · Jan 27, 2026
These A/B style questions are deterministic single-token answers, but they are marked as type: "Freeform" and graded via LLM rubric. To match the benchmark's established pattern (e.g., example_course_2024_midterm and cs537_fall_2021_final use ExactMatch for multiple-choice/True-False), consider using type: "ExactMatch" with choices (e.g., ["possible", "not possible"]) so grading is simple and consistent.
Suggested change:
```diff
- "type": "Freeform",
- "tags": ["concurrency"],
- "answer": "A",
- "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
+ "type": "ExactMatch",
+ "tags": ["concurrency"],
+ "answer": "A",
+ "choices": ["A", "B"]
```
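For what it's worth, ExactMatch grading for questions like these is mechanical enough to sketch. This is only an illustrative Python sketch under assumed semantics (single-token answers, case-insensitive comparison, answers outside the listed choices score zero); the function name is hypothetical and not part of prepare.py:

```python
def exact_match_score(student_answer: str, expected: str, choices: list) -> int:
    """Hypothetical sketch of deterministic ExactMatch grading.

    Assumes answers are single tokens, compared case-insensitively,
    and that an answer outside `choices` earns no credit.
    """
    norm = student_answer.strip().upper()
    if choices and norm not in {c.strip().upper() for c in choices}:
        return 0  # not one of the listed choices
    return 1 if norm == expected.strip().upper() else 0
```

The point of the suggestion is exactly this: the grading collapses to a string comparison, with no LLM judge in the loop.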
```json
"points": 1,
"type": "Freeform",
"tags": ["concurrency"],
"answer": "A",
"llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
```
Copilot AI · Jan 27, 2026
Many questions list multiple sub-items (e.g., Q1 lists 1–5) but the JSON block provides only a single-letter answer and rubric, making the intended target ambiguous and effectively ungradeable as written. Please either (a) split each sub-item into its own question with distinct problem_id values (the README explicitly allows forms like "2.1"/"4a") and per-sub-item points, or (b) change answer to encode all sub-answers (e.g., a 5-letter string) and update llm_judge_instructions (and points/score_total if needed) to grade accordingly.
Suggested change:
```diff
- "points": 1,
- "type": "Freeform",
- "tags": ["concurrency"],
- "answer": "A",
- "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
+ "points": 5,
+ "type": "Freeform",
+ "tags": ["concurrency"],
+ "answer": "AAAAA",
+ "llm_judge_instructions": "The question has five sub-items (1–5). The expected answer is a 5-character string over {A, B}, where each character is the answer (A or B) to the corresponding sub-item in order. Compare the student's answer string to \"AAAAA\" character by character. Award 1 point for each position where the student's character matches the expected character (up to 5 points total). Award 0 points for mismatched or missing characters."
```
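The per-position rubric described above is deterministic, so a grader could apply it mechanically rather than via an LLM judge. A minimal sketch, assuming the helper name and the truncating treatment of short answers (hypothetical, not part of the benchmark's tooling):

```python
def grade_subitems(student: str, expected: str = "AAAAA") -> int:
    """Award 1 point per position where the student's character matches
    the expected answer string; missing or mismatched positions score 0."""
    return sum(
        1
        for s, e in zip(student.strip().upper(), expected)  # zip truncates, so missing chars score 0
        if s == e
    )
```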
```c
int value = 0;

int main(int argc, char *argv[]) {
    while (1) {
        printf(\"%d\", value);
        value++;
    }
    return 0;
}
```
Copilot AI · Jan 27, 2026
Several C code blocks contain escaped quotes (e.g., printf(\"%d\", value);, printf(\"a\");, and char *my_argv[] = { \"/bin/true\", NULL };). Inside Markdown code fences these backslashes should be removed; otherwise the snippets are not valid C and can confuse downstream consumers/LLMs.
This issue also appears in the following locations of the same file:
- line 96
- line 159
- line 214
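If it helps, the backslash cleanup can be done mechanically by unescaping only inside fenced code blocks, leaving prose untouched. A minimal sketch, assuming the file contains only \" and \\ escape sequences (hypothetical helper, not part of the repo's tooling):

```python
def unescape_code_fences(markdown: str) -> str:
    """Remove backslash escapes (\\" and \\\\) inside fenced code blocks,
    leaving text outside the fences untouched."""
    out, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on opening/closing fence
            out.append(line)
        elif in_fence:
            # collapse double backslashes first, then unescape quotes
            out.append(line.replace("\\\\", "\\").replace('\\"', '"'))
        else:
            out.append(line)
    return "\n".join(out)
```

Running something like this over exam.md (and eyeballing the diff) should resolve all four locations at once.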
tareknaser
left a comment
I reviewed only the first four questions. Please revise the PR and request another review.
```
## Question 1 [5 point(s)]
```
This question is worth only 1 point.
```
## Question 2 [3 point(s)]
```
Same here, and for the rest of the questions in the exam.
```
1. abababab ...
2. aaaaaaaa ...
3. bbbbbbbb ...
4. aaaabbbb ...
5. bbbbaaaa ...
```

```json
{
  "problem_id": "1",
  "points": 1,
  "type": "Freeform",
  "tags": ["concurrency"],
  "answer": "A",
  "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```
That's not the correct answer. All of the options are possible:
1. abababab ... A. Possible
2. aaaaaaaa ... A. Possible
3. bbbbbbbb ... A. Possible
4. aaaabbbb ... A. Possible
5. bbbbaaaa ... A. Possible
```
Which of the following are possible outputs of the increment process? To answer: Fill in A for possible, B for not possible.

6. 012345678 ...
7. 012301234 ...
8. 012345670123 ...
9. 01234567891011 ...
10. 123456789 ...
```

```json
{
  "problem_id": "2",
  "points": 1,
  "type": "Freeform",
  "tags": ["concurrency","memory-model"],
  "answer": "A",
  "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```
That's not the correct answer. Options 6 and 9 are possible; the other options are not. Please double-check the exam PDF.
```json
{
  "problem_id": "3",
  "points": 1,
  "type": "Freeform",
  "tags": ["concurrency","threads"],
  "answer": "B",
  "llm_judge_instructions": "Award 1 point if the student's answer equals 'B'. Award 0 points otherwise."
}
```
Also not the correct answer. The possible options are 13 and 14; the other options are not possible, so B is wrong.
```json
{
  "problem_id": "4",
  "points": 1,
  "type": "Freeform",
  "tags": ["os","process-states"],
  "answer": "A",
  "llm_judge_instructions": "Award 1 point if the student's answer equals 'A'. Award 0 points otherwise."
}
```
Please check the reference solution
Description
Add CS 537 Spring 2018 Midterm exam dataset and solution file to the repository.
Changes
Testing
Ran the data preparation script (python3 courseexam/prepare.py); it completed without errors.
Checklist