Redo commented code 2#11
Conversation
Maillman
left a comment
There was a problem hiding this comment.
The updated check is working really well against all of the student code that I've ran it against! It's even catching commented code that the previous check wasn't catching because of empty comment lines inserted in the commented code. Like this commented code that I found when I ran it on a certain student's code:
//Session session;
//URI socketURI = new URI(url+"/ws");
// this.notificationHandler = notificationHandler;
//
// WebSocketContainer container = ContainerProvider.getWebSocketContainer();
//
// this.session = container.connectToServer(this,socketURI);
I also ran it against the autograder code under src/. It unfortunately marked https://github.com/softwareconstruction240/autograder/blob/edcbd89b68d4fdd58d0181363d0cf308ed6af8e0/src/main/java/edu/byu/cs/service/ConfigService.java#L95-L97 as a "3 comment lines likely containing commented code (75% confidence)" and https://github.com/softwareconstruction240/autograder/blob/edcbd89b68d4fdd58d0181363d0cf308ed6af8e0/src/main/java/edu/byu/cs/service/ConfigService.java#L121-L123 as a "3 comment lines likely containing commented code (75% confidence)".
It seems to include the empty comment lines as part of the "likely containing commented code" when I don't think it should.
Could you make it so it doesn't include empty lines as part of the check?
…king reserved words
|
Yep, Thank you for catching that |
| static { | ||
| ENTIRE_COMMENT_CODE_PATTERNS.put(Pattern.compile("[\\s\\S]*(?:if|for|while)\\s*\\([\\s\\S]+\\)\\s*\\{[\\s\\S]*}[\\s\\S]*"), 0.9); //simple if/for/while | ||
| ENTIRE_COMMENT_CODE_PATTERNS.put(Pattern.compile("[\\s\\S]*if\\s*\\([\\s\\S]+\\)\\s*\\{[\\s\\S]*}\\s*else(?:\\sif\\s*\\([\\s\\S]+\\))?\\s*\\{[\\s\\S]*}[\\s\\S]*"), 0.95); //if/else or if/elseif | ||
| ENTIRE_COMMENT_CODE_PATTERNS.put(Pattern.compile("^(?:[\\s\\S]*\\n\\s*)?(?:private|public|protected)?\\s+\\w+\\s+\\w+\\s*\\([\\s\\S]*\\)\\s*\\{[\\s\\S]*}[\\s\\S]*$"), 0.9); //method declaration | ||
|
|
||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^(?:if|while)\\s*\\([\\s\\S]+\\)\\s*\\{?\\s$"), 0.85); //simple if/while | ||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^for\\s*\\([\\s\\S]*;[\\s\\S]*;[\\s\\S]*\\)\\s*\\{?\\s$"), 0.9); //start of for loop | ||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^\\w+\\s+\\w+\\s*=[\\s\\S]*$"), 0.9); //variable declaration | ||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^\\w+\\s*(?:\\+|-|\\*|/|%|&|\\^|\\||<<|>>|>>>)?=[\\s\\S]*$"), 0.85); //variable assignment | ||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^(?:\\w+(?:\\(.*\\))?\\.)*\\w+\\(.*\\);$"), 0.85); //method call | ||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^return\\s+.*;$"), 0.9); // return statements | ||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^\\w+(?:--|\\+\\+);?$"), 0.9); //increment/decrement | ||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^.*;$"), 0.8); //end semicolon | ||
| SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^}$"), 0.6); //only end curly | ||
|
|
||
|
|
||
| ENTIRE_COMMENT_NONCODE_PATTERNS.put(Pattern.compile("^(?:\\*.*\n)*\\*.*?$"), 0.85); //Javadoc comment | ||
| ENTIRE_COMMENT_NONCODE_PATTERNS.put(Pattern.compile("^\\{(?:\\s*\"\\w+\":\\s*(?:\"?\\w+\"?|\"\"|\\[[\\s\\S]*]|\\{[\\s\\S]*}),?\\s*)+}$"), 0.75); //JSON object | ||
|
|
||
| SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^\\*.*$"), 0.65); //start of javadoc comment line | ||
| SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^\\*\\s+@(?:param|return|throws)\\s+.*$"), 0.8); //start of javadoc comment line | ||
| SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^\"\\w+\":\\s*(?:\"?[\\w: ]+\"?|\"\"),?$"), 0.65); //individual line of json field | ||
| SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^(?i)(?:TODO|FIXME):.*$"), 0.8); | ||
| SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^NOTE:.*$"), 0.7); | ||
| } |
There was a problem hiding this comment.
@Maillman Are there any other code patterns or common non-code patterns you think we should be checking for?
There was a problem hiding this comment.
I think this is robust enough, and already significantly better than the previous commented code check. I don't think there's anything else I would add.
Maillman
left a comment
There was a problem hiding this comment.
Looks great! I'll get this merged in and make a PR for the autograder to the updated checkstyle jar.
Second improvement to the commented code check. Does more in-depth regex scanning (including regex for negative examples that shouldn't be counted as code, most importantly javadoc comments/documentation) rather than the simple approach of checking the end character of each line.
I'd recommend playing with this, maybe changing some of the default variables, and running it on real student code before merging (I no longer have access to the code of students taking the course)