Merged
Implements configurable score extraction from GitHub Actions logs with support for multiple patterns and automatic decimal separator detection.
Features:
- Score extraction from CI logs using configurable regex patterns
- Multiple pattern support (tried in order until match found)
- Automatic decimal separator detection from Google Sheets locale
- Score validation (all occurrences must match)
- Format: v@10.5 or v@10,5 depending on locale
- Combined format with penalty: v@10.5-3 or v@10,5-3
- Frontend display of extracted scores
- Comprehensive test coverage
Backend changes:
- New module: grading/score.py with score extraction logic
- Updated grading/grader.py with check_score() method
- Updated grading/sheets_client.py with get_decimal_separator()
- Updated main.py to integrate score processing
- Updated grading/__init__.py exports
Frontend changes:
- Updated RegistrationForm to display score information
Configuration:
- Add 'score.patterns' list to lab config in YAML
- Patterns are regex with first capturing group = score
- Optional feature (backward compatible)
Documentation:
- Updated CLAUDE.md with score configuration examples
- Added test suite in tests/test_score.py
Example config:
```yaml
labs:
  "1":
    score:
      patterns:
        - '##\[notice\]Points\s+(\d+(?:[.,]\d+)?)/\d+'
        - 'Score\s+is\s+(\d+(?:[.,]\d+)?)'
```
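A minimal sketch of how the extraction rules above could fit together, using the two patterns from the example config. The helper names `extract_score` and `format_score` are hypothetical and not taken from grading/score.py, and "all occurrences must match" is read here as "every match of a pattern must yield the same value", which is an assumption.

```python
import re
from typing import Optional, Sequence


def extract_score(log_text: str, patterns: Sequence[str]) -> Optional[str]:
    """Return the score from the first pattern that matches, or None.

    Hypothetical helper: the real logic in grading/score.py may differ.
    """
    for pattern in patterns:
        # Group 1 is the score; normalise ',' to '.' so 10,5 and 10.5 compare equal.
        found = {m.group(1).replace(",", ".") for m in re.finditer(pattern, log_text)}
        if not found:
            continue  # this pattern did not match, try the next one
        if len(found) > 1:
            # Assumed validation rule: all occurrences must agree.
            raise ValueError(f"conflicting scores: {sorted(found)}")
        return found.pop()
    return None


def format_score(score: str, decimal_separator: str = ".", penalty: int = 0) -> str:
    """Render 'v@10.5', 'v@10,5', or 'v@10,5-3' when a penalty applies."""
    text = "v@" + score.replace(".", decimal_separator)
    return f"{text}-{penalty}" if penalty else text


PATTERNS = [
    r"##\[notice\]Points\s+(\d+(?:[.,]\d+)?)/\d+",
    r"Score\s+is\s+(\d+(?:[.,]\d+)?)",
]

score = extract_score("##[notice]Points 10,5/12", PATTERNS)
cell = format_score(score, decimal_separator=",", penalty=3)
```

In this sketch the decimal separator is passed in by the caller; in the PR it comes from the Google Sheets locale via get_decimal_separator().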
- Added detailed section on score extraction from CI logs
- Included multiple pattern examples for different log formats
- Documented flexible pattern syntax with .*? for robust matching
- Added tips for creating reliable regex patterns
- Explained decimal separator auto-detection
- Updated process description with score extraction step
- Provided frontend display examples
Key improvements:
- Flexible pattern: 'ПРЕДВАРИТЕЛЬНАЯ.*?ОЦЕНКА.*?ЖУРНАЛ:\s*(\d+(?:[.,]\d+)?)'
- Handles variable whitespace and formatting in logs
- Better guidance for YAML regex configuration
Updated pattern to use .*? (non-greedy any character) for robust matching:
- 'ПРЕДВАРИТЕЛЬНАЯ.*?ОЦЕНКА.*?ЖУРНАЛ:\s*(\d+(?:[.,]\d+)?)'
This handles:
- Variable whitespace after GitHub Actions timestamp
- Any formatting between key words
- Optional whitespace before the score value
Added additional patterns for common formats:
- 'ИТОГО:\s*(\d+(?:[.,]\d+)?)\s*баллов' for total score format
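To illustrate what the non-greedy pattern tolerates, here is a small sketch that runs both patterns against made-up log lines. The sample lines (timestamp, spacing, the extra word between keywords) are assumptions for illustration, not real CI output.

```python
import re

FLEXIBLE = r"ПРЕДВАРИТЕЛЬНАЯ.*?ОЦЕНКА.*?ЖУРНАЛ:\s*(\d+(?:[.,]\d+)?)"
TOTAL = r"ИТОГО:\s*(\d+(?:[.,]\d+)?)\s*баллов"

# Hypothetical log lines: a GitHub Actions style timestamp, uneven spacing,
# and extra text between the keywords.
line_a = "2025-11-19T05:22:34.0000000Z   ПРЕДВАРИТЕЛЬНАЯ  ОЦЕНКА В ЖУРНАЛ:   10,5"
line_b = "ИТОГО: 8.5 баллов"

match_a = re.search(FLEXIBLE, line_a)
match_b = re.search(TOTAL, line_b)

# '.' does not match a newline by default, so the keywords must share one line.
split_line = "ПРЕДВАРИТЕЛЬНАЯ ОЦЕНКА\nЖУРНАЛ: 10,5"
match_split = re.search(FLEXIBLE, split_line)
```

If the report ever spreads the keywords across several lines, the pattern would need re.DOTALL or an explicit `[\s\S]*?` in place of `.*?`.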
Enhanced debugging to diagnose score pattern matching issues:
- Show first/last 500 chars of logs being searched
- Log each pattern attempt with detailed results
- Show sample lines containing keywords (ОЦЕНКА, ЖУРНАЛ)
- Number patterns and jobs for easier tracking
- Display matched values when pattern succeeds
- Indicate when no keyword lines found in logs
This will help identify:
- If logs contain expected text
- Encoding or formatting issues
- Which pattern (if any) should match
- Which jobs are being checked
The issue appears to be that the Python script output (ИТОГОВЫЙ ОТЧЁТ) is not present in the logs fetched from GitHub API, even though it's visible in the GitHub UI.
This commit adds diagnostics to:
- Search for common report keywords (ИТОГОВЫЙ ОТЧЁТ, ИТОГО:, баллов, etc.)
- Show context around any found keywords
- Check for timestamps around the expected output time (05:22:34)
- Warn if no report keywords found
This will help confirm whether the issue is:
1. Logs incomplete/truncated by GitHub API
2. Output in a different job or step
3. Some other fetching issue
Problem: Was only showing first/last 500 chars of logs in debug output, while logs contain 87K+ chars. The Python script output is in the middle.
Changes:
- Show middle 500 chars sample (around position len/2)
- Add case-insensitive keyword search (both exact and lowercase)
- Search for timestamp lines around 05:22:34 (expected score time)
- Show sample lines from 25%, 50%, 75% positions in logs
- Add ОЦЕНКА, ЖУРНАЛ, ПРЕДВАРИТЕЛЬНАЯ to keyword search list
- Show position and context when keyword found
This will reveal exactly where the score output is in the logs and why the pattern isn't matching it.
Added byte-level debugging to diagnose score pattern matching:
- Test simple string search for 'ПРЕДВАРИТЕЛЬНАЯ' to verify Cyrillic works
- Show UTF-8 bytes of both pattern and actual log line
- Show number of matches found by each pattern attempt
- Display sample line containing the target text with its bytes
This will reveal if the issue is:
- Encoding mismatch between pattern and logs
- Pattern syntax error
- Invisible characters in logs
Root cause: GitHub API returns job logs in UTF-8, but without proper charset in Content-Type headers. The requests library was auto-detecting encoding incorrectly (likely as Latin-1/ISO-8859-1), causing Cyrillic characters to be mojibake (e.g., 'ПРЕДВАРИТЕЛЬНАЯ' became 'Ð\x9fÑ\x80ед...').
Solution: Explicitly set resp.encoding = 'utf-8' before accessing resp.text to force proper UTF-8 decoding of GitHub Actions logs.
This fixes:
- Pattern matching for Cyrillic text in score extraction
- TASKID extraction with Cyrillic output
- Any other log parsing with non-ASCII characters
Evidence from debug logs:
- Simple string search for 'ПРЕДВАРИТЕЛЬНАЯ' was failing
- Line 731 showed mojibake: 'ЦенÑ\x82Ñ\x80...' instead of Cyrillic
- Pattern bytes were correct UTF-8 (b'\xd0\x9f\xd0\xa0...')
- But log text was decoded wrong
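The failure mode described above can be reproduced without any network access. The sketch below decodes the same UTF-8 bytes two ways; the sample text is illustrative, and the `resp` mentioned in the comments stands for a requests response object as in the commit message.

```python
# UTF-8 bytes as they would arrive in the HTTP response body.
raw = "ПРЕДВАРИТЕЛЬНАЯ ОЦЕНКА".encode("utf-8")

# Decoding as Latin-1 never fails, it just yields mojibake.
wrong = raw.decode("latin-1")

# Decoding as UTF-8 restores the Cyrillic text. With requests, the equivalent
# is setting resp.encoding = "utf-8" before reading resp.text.
right = raw.decode("utf-8")

found_in_wrong = "ПРЕДВАРИТЕЛЬНАЯ" in wrong
found_in_right = "ПРЕДВАРИТЕЛЬНАЯ" in right

# Mojibake produced this way is reversible, which confirms the diagnosis.
repaired = wrong.encode("latin-1").decode("utf-8")
```

Because Latin-1 maps every byte to a character, the wrong decoding raises no error, which is why the problem only showed up as patterns silently failing to match.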
Cleaned up verbose debug output now that encoding issue is resolved:
- Removed log samples (first/middle/last 500 chars)
- Removed keyword search diagnostics
- Removed byte-level encoding checks
- Removed pattern bytes display
- Kept essential logging: log size, pattern attempts, match results
The encoding fix (UTF-8) resolved the core issue, so detailed diagnostics are no longer needed for normal operation.
Created docs/COURSE_CONFIG.md with complete reference for all supported YAML configuration options:
**Top-level sections:**
- course: General course info (name, semester, university, etc.)
- course.github: GitHub integration (organization, teachers)
- course.google: Google Sheets integration (spreadsheet, columns)
- course.staff: Teaching staff list
- course.labs: Lab configurations (per lab settings)
- misc: System settings (timeouts, etc.)
**Lab-level options:**
- Basic: github-prefix, short-name
- CI/CD: ci (workflows/jobs configuration)
- TASKID: taskid-max, taskid-shift, ignore-task-id
- Penalties: penalty-max, penalty-strategy (weekly/daily)
- Score extraction: score.patterns (regex list for extracting points)
- File checks: files, forbidden-modifications
- MOSS: language, max-matches, local-path, additional, basefiles
- Requirements: report (required sections)
- Validation: commits, issues (custom validation rules)
**Key features documented:**
- Score extraction with regex patterns and decimal separator auto-detection
- Penalty strategies (weekly vs daily)
- TASKID validation with shift calculation
- CI workflow filtering
- MOSS plagiarism detection configuration
- Custom validation rules for commits/issues
Updated CLAUDE.md to reference new documentation.
Problem: Deadlines specified as dates only (e.g., '19.11.2025') were parsed as midnight (00:00:00) instead of end of day (23:59:59). This caused incorrect penalty calculations:
Example 1 (incorrect before fix):
- Deadline in sheet: 19.11.2025
- Parsed as: 19.11.2025 00:00:00 UTC+3
- Tests passed: 2025-11-18T22:55:34Z = 19.11.2025 01:55:34 UTC+3
- Result: 01:55:34 > 00:00:00 → late by ~2 hours → penalty -1
- Expected: No penalty (submitted before end of deadline day)
Example 2 (incorrect before fix):
- Deadline: 19.11.2025 00:00:00
- Submitted: 03.12.2025 10:00
- Delta: 14 days 10 hours
- Calculation: 14 // 7 = 2, 14 % 7 = 0, but the remaining 10 hours are > 0 → rounds up
- Result: 2 + 1 = 3 weeks → penalty -3
- Expected: penalty -2
Solution: When deadline is parsed without explicit time, set time to 23:59:59 (end of day). This makes deadlines like '19.11.2025' mean 'until 23:59:59 on that day' as users expect.
After fix:
- Deadline: 19.11.2025 23:59:59 UTC+3
- Example 1: 01:55:34 < 23:59:59 → no penalty ✓
- Example 2: 13 days 10 hours late → (13 // 7 + 1) = 2 weeks → penalty -2 ✓
The weekly penalty calculation formula is correct and unchanged:
- 1-7 days late = -1 point
- 8-14 days late = -2 points
- 15-21 days late = -3 points
- etc.
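The two examples can be checked with a short sketch of the end-of-day rule and the round-up weekly formula. The function names, the accepted date formats, and the `penalty_max` cap value are assumptions for illustration; only the 23:59:59 rule, the UTC+3 offset, and the weekly rounding come from the description above.

```python
from datetime import datetime, time, timedelta, timezone

UTC3 = timezone(timedelta(hours=3))  # offset used in the examples


def parse_deadline(text: str, tz: timezone = UTC3) -> datetime:
    """Parse 'DD.MM.YYYY HH:MM:SS' or 'DD.MM.YYYY'; a bare date means 23:59:59."""
    try:
        return datetime.strptime(text, "%d.%m.%Y %H:%M:%S").replace(tzinfo=tz)
    except ValueError:
        day = datetime.strptime(text, "%d.%m.%Y").date()
        return datetime.combine(day, time(23, 59, 59), tzinfo=tz)


def weekly_penalty(deadline: datetime, submitted: datetime, penalty_max: int) -> int:
    """One point per started week after the deadline, capped at penalty_max."""
    delta = submitted - deadline
    if delta <= timedelta(0):
        return 0
    weeks, remainder = divmod(delta, timedelta(days=7))
    if remainder:
        weeks += 1  # a partially elapsed week counts as a full one
    return min(weeks, penalty_max)


deadline = parse_deadline("19.11.2025")
old_deadline = parse_deadline("19.11.2025 00:00:00")  # pre-fix interpretation

tests_passed = datetime(2025, 11, 18, 22, 55, 34, tzinfo=timezone.utc)
late_submit = datetime(2025, 12, 3, 10, 0, tzinfo=UTC3)

example_1 = weekly_penalty(deadline, tests_passed, penalty_max=10)
example_2 = weekly_penalty(deadline, late_submit, penalty_max=10)
```

Running the same inputs against `old_deadline` reproduces the pre-fix penalties of 1 and 3, which makes the off-by-one-week effect of the midnight interpretation easy to see.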
Added ability to view archived courses for students with academic debt.
Frontend changes:
- Added status parameter to fetchCourses() API call (active/archived/all)
- Added course status toggle buttons above course list
  - Active courses (default): shows currently running courses
  - Archived courses: shows past courses for debt resolution
  - All courses: shows both (admin only)
- Status selection persists across course list updates
UI implementation:
- ButtonGroup with 3 status options (active/archived/all)
- Highlighted active selection with colors.selected
- Admin-only "All courses" option for full visibility
- Automatically refetch courses when status changes
Translations added (ru/en/zh):
- activeCourses: "Активные курсы" / "Active Courses" / "活跃课程"
- archivedCourses: "Архив курсов" / "Archived Courses" / "归档课程"
- allCourses: "Все курсы" / "All Courses" / "所有课程"
Backend support:
- Backend already supports ?status= query parameter
- active (default), archived, all
This allows students with academic debt to access archived course materials while keeping main page focused on current courses.
- Move from center to fixed top-right corner (below language selector)
- Change to compact vertical button group (120px width)
- Fix color contrast: selected=#3c3c43 (dark) with white text, unselected=#f5f5f5 (light) with dark text
- Reduce prominence: smaller font (12px), less padding
- Position at top:60px, right:16px, z-index:2999
This makes the toggle less obtrusive since archived courses are used by few students while active courses serve hundreds.
Changed position from top:60px to top:110px to prevent overlap with the "Для преподавателей" ("For teachers") button.
Increased z-index from 3000 to 3200 and added MenuProps to ensure dropdown menu appears above admin button (z-index 3100) and course status toggle (z-index 2999).
Using disablePortal renders the dropdown in normal DOM hierarchy instead of Portal, which should properly apply z-index stacking.