Human in the loop assignment submission: jdvakil#82
* Enhance audit pages with textbook-style layout - Add left sidebar with chapter navigation - Add right sidebar for table of contents - Use AuditLayout component to match textbook styling - Improves visual consistency across the site * Add FAQ section to Paper Audit assignment Integrate comprehensive FAQ covering: - Scope and technical focus (primary paper, interface auditing) - Format and objectives (argue vs summarize, Amazon Principle) - Presentation logistics (audit vs presentation, time allocation) Students now have clear guidance on what the audit should accomplish and how it differs from a traditional paper summary. * Fix staging deployment SSH timeout issue - Add stdin redirect (< /dev/null) to properly detach nohup - Add sleep before starting new server to ensure old process is killed - Remove PID echo that could cause hanging This should prevent the SSH action from being terminated with SIGTERM. * Fix PR preview paths for different content types - Detect whether PR contains audit or contributor profile - Generate correct preview URL based on content type - Show appropriate checklist (audit vs contributor) - Extract actual filename for accurate path Fixes incorrect preview URLs for contributor profile PRs like #25. * Fix content type detection in PR preview workflow - Fetch base branch explicitly for git diff - Store FILES output to avoid repeated git diff calls - Use fetch-depth: 0 to ensure full git history available This should properly detect contributor vs audit files. * Add contributor profile: Gyanig Kumar (#25) Co-authored-by: Christoffer Heckman <christoffer.heckman@colorado.edu> * Fix SSH timeout with setsid and explicit exit - Use setsid instead of nohup for better process daemonization - Add explicit exit 0 after starting background process - Move success message before starting background job - Increase sleep to 2s to ensure pkill completes This should prevent the SSH session from hanging and getting terminated. 
* Add debug output to staging deployment script * Split deployment into two SSH sessions Separate static site deployment from API server startup to avoid SSH timeout. Each SSH session is now shorter and focused on a single task. * Remove unused API server from staging deployment The API server was for a comments/review system that is not currently implemented. Removing it fixes the SSH timeout issue. * Remove unused API server infrastructure The comments/review system was never fully implemented. The frontend components (CommentSidebar, API routes) don't exist and the system is incompatible with static export mode. * Support audits in both staging and production directories Some students may mistakenly place audits in content/textbook/audits/ instead of content/textbook/audits/staging/. Detect both locations. * Update Scratch-1 due date to February 1 * Add contributor profile: Thanushraam Suresh Kumar (#28) --------- Co-authored-by: Gyanig Kumar <gyanig.kumar@gmail.com> Co-authored-by: Thanushraam <45840572+Tr0612@users.noreply.github.com>
- Complete testing infrastructure (public & internal tests) - Solution management scripts (inject/reset) - Sanitization pipeline for public sync - GitHub Actions workflow for automated sync - Complete documentation (setup guides, references) - Example solution for scratch-1 assignment See PRIVATE_REPO_SETUP.md for complete architecture details.
BREAKING CHANGES: - Reorganized scripts into CI-critical (scripts/) vs dev helpers (scripts/dev/) - Enhanced sanitization pipeline with fail-safe validation - Added frontmatter validation for audit MDX files - Implemented Review Mode banner for PR previews Phase 1: Script Consolidation - Created scripts/dev/ for local development helpers - Moved 6 helper scripts to scripts/dev/ - Added README.md to both scripts/ and scripts/dev/ - Clear separation: CI-critical scripts stay in scripts/ Phase 2: Sanitization Pipeline Hardening - scripts/_sanitize_todos.py now fail-safe with exit codes - Added 3-step validation (pre, sanitize, post) - Enhanced error messages with line numbers - sync-to-public.yml now includes pre/post validation - Zero-tolerance for [SOLUTION] leaks Phase 3: Unified Linting for Audits - audit_linter.py validates required frontmatter fields - Checks: title, author, topic, paper (no empty/placeholder values) - Updated vla-audit.yml error messages Phase 4: Next.js Routing Cleanup - Dynamic staging prefix handling with STAGING_PR_NUMBER - Added Review Mode banner to AuditLayout.tsx - Shows PR number in preview banner - Better visual distinction for review vs production Phase 5: Repository Cleanup - Removed vercel.json (GitHub Actions/Pages deployment) - Added SOURCE OF TRUTH section to README.md - Clear repository ownership map - sanitize.sh removes scripts/dev/ Security Improvements: - Fail-safe sanitization (cannot sync if markers remain) - Pre-flight validation before sanitization - Post-flight verification after sanitization - Proper exit codes for all scripts - Repository clarity prevents accidental public pushes Documentation: - Scripts categorized and documented - README.md warns against pushing to public - Audit requirements clearly specified - REFACTOR_SUMMARY.md with complete changelog See REFACTOR_SUMMARY.md for detailed breakdown.
Technical Hardening - Tasks 1, 2, 3, & 4: Documentation (Task 1): - Create PRIVATE_OPERATIONS.md merging all instructor docs - Paper Audit Review Workflow - Server Configuration (Apache, systemd, API) - Deployment Procedures - Troubleshooting Guide - Update README.md with instructor-facing PST guide - Document Shadow CI, solution management, assignment lifecycle Hardened Sanitization (Task 2): - Enhance sanitize.sh: - Add draft block removal for MDX files - Add README.md overwrite step during sync - Renumber steps for clarity - Enhance _sanitize_todos.py: - Add multi-line [SOLUTION] comment detection - Add triple-quoted docstring [SOLUTION] detection - Improve regex patterns with proper ordering - Enhanced fail-safe verification Shadow CI (Task 3): - Create shadow-tester.yml workflow - Triggered by repository_dispatch from public repo - Fetches student code from public PR - Runs internal rigorous tests - Comments Pass/Fail status on public PR Pre-Commit Guard (Task 4): - Add executable pre-commit hook - Scans staged files for [SOLUTION] markers - Blocks commits if leaks detected in public-facing files - Provides clear fix instructions Documentation: - Create HARDENING_COMPLETE.md with full implementation report - Create BEFORE_AFTER.md documenting changes
Task 1: Prune Documentation - Consolidate PRIVATE_OPERATIONS.md, REVIEW_SYSTEM.md, PRIVATE_REPO_SETUP.md into comprehensive INSTRUCTOR.md - Delete 11 obsolete documentation files: - APACHE_CONFIG.md - BEFORE_AFTER.md - DEPLOYMENT_SUCCESS.md - HARDENING_COMPLETE.md - PRIVATE_OPERATIONS.md (merged) - PRIVATE_REPO_SETUP.md (merged) - QUICK_REFERENCE.md - REFACTOR_SUMMARY.md - REVIEW_SYSTEM.md (merged) - SETUP_COMPLETE.md - SYSTEM_COMPLETE.md Task 2: Refactor Solution Management - Rename scripts/manage_solutions.py → scripts/dev_utils.py - Add --verify-clean command: - Scans src/assignments/ for solution code leaks - Compares files against private/solutions/ using difflib - Exits with error if similarity > 80% (prevents accidental commits) - Normalizes code (removes comments/whitespace) for accurate comparison Task 3: Harden Sync Workflow - Update .github/workflows/sync-to-public.yml to use Orphan Push strategy: - git checkout --orphan temp-public-branch - git add -A && git commit - git push public temp-public-branch:main --force - Breaks ALL git history links between private and public repos - Public repo has completely independent history - Update leak detection to check for dev_utils.py instead of manage_solutions.py - Update sanitize.sh to delete dev_utils.py Task 4: Cleanup Public README - Public README already student-centric from previous hardening - No changes needed Benefits: - Single comprehensive instructor guide (INSTRUCTOR.md) - Enhanced solution leak prevention (--verify-clean) - Maximum security via orphan push (no history exposure) - Cleaner repository structure
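The `--verify-clean` check described above (normalize, compare with difflib, fail above 80% similarity) could look roughly like the following. This is a minimal sketch, not the actual `dev_utils.py` code: the function names, the comment-stripping rule, and the use of `SequenceMatcher.ratio()` are assumptions made for illustration.

```python
import difflib
import re


def normalize(source: str) -> str:
    """Drop '#' comments and all whitespace so formatting changes cannot hide a copy.

    Naive on purpose: a '#' inside a string literal is also treated as a comment.
    """
    kept = []
    for line in source.splitlines():
        code = line.split("#", 1)[0]
        code = re.sub(r"\s+", "", code)
        if code:
            kept.append(code)
    return "\n".join(kept)


def similarity(candidate: str, solution: str) -> float:
    """Similarity in [0, 1] between two normalized sources."""
    matcher = difflib.SequenceMatcher(None, normalize(candidate), normalize(solution))
    return matcher.ratio()


def looks_like_leak(candidate: str, solution: str, threshold: float = 0.80) -> bool:
    """True when the candidate is more than `threshold` similar to the solution."""
    return similarity(candidate, solution) > threshold


if __name__ == "__main__":
    solution = "def f(x):\n    return x * 2 + 1  # secret\n"
    starter = "def f(x):\n    raise NotImplementedError\n"
    print(looks_like_leak(solution, solution), looks_like_leak(starter, solution))
```

A real check would walk `src/assignments/` and `private/solutions/` and exit non-zero on the first hit; the threshold is kept as a parameter because very short files can be dominated by shared boilerplate.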
Create Claude Code Project Skill for solution leak prevention: Task 1: Initialize Skill Structure - Create .claude/skills/vla-guard/ directory - Create .claude/skills/vla-guard/SKILL.md Task 2: Define SKILL.md Logic - Add frontmatter: - name: vla-guard - description: Final audit to prevent solution/internal test leaks - user-invocable: true - Implement 5-step audit process: 1. Identify all solutions (python3 scripts/dev_utils.py --list) 2. Scan for solution content leaks ([SOLUTION] markers) 3. Verify private/ and tests/internal/ not staged 4. Check git log for accidental solution commits 5. Check for sensitive file leaks Task 3: Create Custom Slash Command - Create .claude/commands/pre-flight.md - Invokes /vla-guard skill first - Runs scripts/sanitize.sh only if guard passes - Provides comprehensive pre-flight check before push/PR Features: - Color-coded audit reports (✅/❌/⚠️ ) - Integration with dev_utils.py --verify-clean - Fail-safe: blocks sanitization if audit fails - Clear remediation instructions on failure Usage: /vla-guard - Run security audit only /pre-flight - Run audit + sanitization pipeline Also added: - REFACTOR_COMPLETE.md - Documentation of consolidation work
Create 6 new Claude Code skills for VLA Foundations workflow automation: 1. /test-rigor - Internal grading test runner - Auto-injects solutions before testing - Runs pytest with rigor markers - Generates test reports - Auto-resets to starter code 2. /generate-fixtures - Gold standard fixture generator - Creates reference data for fidelity tests - Uses fixed random seeds (seed=42) - Generates model outputs and checkpoints - Verifies no NaNs in fixtures 3. /grade - Automated student PR grading - Fetches student code from GitHub - Runs public and internal tests - Generates detailed feedback reports - Posts comments on PRs - Updates PR labels 4. /release - Safe assignment publishing workflow - Runs VLA Guard audit (fail-fast) - Executes sanitization pipeline - Creates release tags - Monitors GitHub Actions - Verifies public repo - Checks deployment status 5. /new-assignment - Assignment scaffolding generator - Creates complete directory structure - Generates starter code with TODOs - Generates solution templates - Generates test templates - Creates MDX assignment spec 6. /sync-check - Post-release verification - Clones public repo (read-only) - Scans for solution leaks - Verifies orphan push strategy - Checks deployment status - Generates verification reports Additional changes: - Add command shortcuts in .claude/commands/ - Create directories for reports and releases - Update .gitignore to ignore generated reports - Add comprehensive README.md for skills Benefits: - Automates instructor workflows - Fail-safe protection against leaks - Comprehensive audit trails - Reduces manual errors - Speeds up grading and releases
Create comprehensive guide for AI SWE agents working with student code: - Student workflow (branch, implement, test, submit) - Public testing philosophy - Assignment structure and TODOs - Git hygiene (rebase-only workflow) - Semantic line breaks in MDX - Common issues and solutions - Engineering standards and grading rubric Key sections: - Complete assignment workflow example - Commands useful for students - Testing with public tests - PR submission process - Resources and documentation links Student-focused: - No references to private repo or solutions - Only public tests documented - Clear submission guidelines - Common troubleshooting tips - Resources for help Fixes from template: - Removed manage_solutions.py references (private only) - Removed audit_linter.py references (doesn't exist) - Fixed Google search link placeholders - Added actual file paths - Clarified student permissions (can't merge own PRs)
Create detailed guide for AI SWE agents working in private repo: - Dual-repository architecture explanation - Complete Claude Code skills documentation - Solution management workflow (dev_utils.py) - Testing philosophy (public vs internal) - Sanitization pipeline details - Security boundaries and leak prevention - Typical workflows for instructors - Shadow CI explanation - Orphan push strategy Key sections: - 7 Claude Code skills with usage examples - Commands useful in development - Pre-release checklist - Git hygiene and rebase-only workflow - File map with actual paths (not Google search links) Fixes: - Corrected manage_solutions.py → dev_utils.py - Added missing Claude Code skills section - Included Shadow CI documentation - Added security boundaries section - Removed Google search link placeholders
- Replace backbone_solution.py with corrected version matching student template - Uses combined qkv_proj (not separate q/k/v projections) - Uses F.silu() (SwiGLU) activation - Implements all 4 TODOs correctly - Fixes tensor contiguity issue in loss computation - Add gold standard test fixtures - private/fixtures/scratch1_gold_output.pt - private/fixtures/scratch1_attention_fixture.pt - private/fixtures/scratch1_rmsnorm_fixture.pt - Generator script for reproducible fixtures - Update test infrastructure - Add load_gold_standard and sample_batch fixtures - Mark DINOv2 tests as mastery (skipped for core assignment) - Fix subprocess calls to use sys.executable - Add mastery marker to pytest.ini - Add uv package management - pyproject.toml with torch, numpy, pytest dependencies - uv.lock for reproducible environments - Update claude.md with uv usage instructions Test results: 7 passed, 2 skipped (mastery tests) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The orphan push strategy was destroying all student work on the public repo. Now uses merge strategy that: - Fetches current public repo state - Merges sanitized content into main - Preserves all student branches and work
Files that document the sanitization process (README, INSTRUCTOR.md, etc.) mention [SOLUTION] markers as part of explaining how the system works. These are not actual solution code and should be excluded from the check. Also excludes: - .claude/ directory (skill definitions) - .github/ directory (workflow definitions)
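The exclusion rule above can be sketched as a path filter applied before the marker scan. This is an illustration under assumptions: the helper names and the exact exclusion sets are hypothetical, and only `README.md`, `INSTRUCTOR.md`, `.claude/`, and `.github/` are taken from the commit message.

```python
from pathlib import PurePosixPath

MARKER = "[SOLUTION]"
# Illustrative exclusion sets; the real workflow may list more files.
EXCLUDED_FILES = {"README.md", "INSTRUCTOR.md"}
EXCLUDED_DIRS = {".claude", ".github"}


def is_excluded(path: str) -> bool:
    """True for documentation files and directories that may mention the marker."""
    parts = PurePosixPath(path).parts
    return parts[-1] in EXCLUDED_FILES or any(part in EXCLUDED_DIRS for part in parts[:-1])


def find_leaks(files: dict[str, str]) -> list[str]:
    """Return the paths whose text contains the marker and that are not excluded."""
    return sorted(
        path for path, text in files.items()
        if MARKER in text and not is_excluded(path)
    )
```

Passing file contents in as a dictionary keeps the sketch testable without touching the filesystem; a CI step would build that mapping from the checked-out tree.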
YAML syntax error on line 224 caused by unquoted multi-line string. Using heredoc pattern for proper bash multi-line string handling.
Previous issue: Averaging 7 joint deltas into a single action caused massive information loss. Model achieved only ~4.6 loss (barely better than random ~5.5).

New action encoding:
- Direction: 8 octants (±X, ±Y, ±Z) → 3 bits
- Magnitude: Distance to target, 32 bins → 5 bits
- Total: 8 * 32 = 256 discrete actions

This is LEARNABLE because:
- Model sees state (joint angles + end-effector position)
- Can compute error vector toward target
- Can predict corresponding action deterministically

Additional improvements:
- Reduced noise (0.05 → 0.02) for clearer patterns
- Stronger gradient component toward target (0.01 → 0.05)
- More deterministic motion generation

Expected result: Loss should converge significantly below previous 4.6.
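One way to realize the 3-bit direction plus 5-bit magnitude encoding is sketched below. The bit layout (sign of X in the high bit), the `MAX_DISTANCE` clipping range, and the function names are assumptions for illustration; `generate_data.py` may pack the bits differently.

```python
import math

NUM_OCTANTS = 8      # 3 bits: sign of the error along X, Y, Z
NUM_MAG_BINS = 32    # 5 bits: binned distance to target
MAX_DISTANCE = 1.0   # hypothetical clipping range for the magnitude bins


def encode_action(error, max_distance=MAX_DISTANCE):
    """Map a 3D error vector (target minus end-effector) to an action id in [0, 255]."""
    ex, ey, ez = error
    # One bit per axis: 1 when the error component is non-negative.
    octant = (int(ex >= 0) << 2) | (int(ey >= 0) << 1) | int(ez >= 0)
    distance = math.sqrt(ex * ex + ey * ey + ez * ez)
    fraction = min(distance / max_distance, 1.0)
    mag_bin = min(int(fraction * NUM_MAG_BINS), NUM_MAG_BINS - 1)
    return octant * NUM_MAG_BINS + mag_bin


def decode_action(action):
    """Inverse of the packing step: return (octant, magnitude_bin)."""
    return divmod(action, NUM_MAG_BINS)


if __name__ == "__main__":
    action = encode_action((0.5, -0.5, 0.5))
    print(action, decode_action(action))
```

Because the action id is a deterministic function of the error vector, a model that can see the state and the target has everything it needs to predict it, which is the learnability argument made in the commit message.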
## Changes

### PyTorch CUDA 11.8 Support
- Updated pyproject.toml to use CUDA 11.8 index
- Constrained Python to 3.10-3.13 (CUDA wheels limitation)
- Now supports P6000 GPUs (compute capability 6.1)

### Training Improvements
- Training completes in 41s on GPU (vs 7+ hours on CPU)
- Updated training_run.py to use GPU when available
- Removed overly restrictive compute capability check

### Data Generation Fix (Previously Committed)
- Structured action encoding: direction (8 octants) + magnitude (32 bins)
- Actions now learnable from state (error vector)
- Loss improved from ~4.6 to ~1.97

## Results

**Training Performance:**
- Loss: 3.27 → 1.96 (consistent convergence)
- Time: 41 seconds on P6000
- All 7 internal rigor tests passing

**vs Previous (averaged actions):**
- Loss: ~4.6 (barely better than random)
- Model couldn't learn meaningful patterns

## Test Results

```
7 passed, 2 deselected (mastery features)
- test_training_convergence ✓
- test_attention_gradient_flow ✓
- test_causal_mask_prevents_future_leakage ✓
- test_rmsnorm_numerics ✓
- test_model_output_distribution ✓
- test_loss_computation_correctness ✓
- test_overfitting_on_single_batch ✓
```

Solution ready for grading student submissions.
**Problem**: Assignment required loss < 1.0, but actual correct implementation achieves ~1.9-2.0.

**Changes**:
- Updated Pass Level to require "clear convergence" not arbitrary threshold
- Added expected loss ranges: Initial ~3-4, Final ~1.9-2.2
- Added FAQ explaining what loss to expect
- Emphasize learning trajectory over absolute value

**Rationale**:
- Action encoding (direction + magnitude) is learnable but not trivial
- Random guessing: ~5.5 (log(256))
- Structured learning: ~1.9-2.0 (significant improvement)
- Internal tests only verify loss decreases, not absolute threshold

This aligns assignment expectations with reality.
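The ~5.5 baseline is the cross-entropy of a uniform guess over 256 classes, using the natural log. The short check below reproduces it and converts a loss into an "effective number of equally likely choices"; the helper names are illustrative, not from the repository.

```python
import math


def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (in nats) of predicting every class with equal probability."""
    return math.log(num_classes)


def effective_choices(loss: float) -> float:
    """exp(loss): how many equally likely actions the loss is equivalent to."""
    return math.exp(loss)


if __name__ == "__main__":
    print(round(uniform_cross_entropy(256), 3))
    print(round(effective_choices(1.96), 1))
```

Read this way, a final loss near 1.96 corresponds to roughly seven equally likely actions instead of 256, which is why a value well above 1.0 can still indicate real learning.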
Changed 'EOF' to EOF (without quotes) to allow ${TAG_NAME} and
${RELEASE_DATE} to be interpolated in the commit message.
Heredoc inside YAML run block caused parsing errors. Using simple multiline string assignment instead.
Previous approach with multiline string caused YAML parsing errors. Using git commit's multiple -m flag feature instead - each -m creates a new paragraph in the commit message.
Changes: - Verify PUBLIC_REPO_TOKEN is set before proceeding - Use git credential helper instead of embedding token in URL - More secure credential handling - Better error messages if token is missing
The credential helper approach wasn't working in GitHub Actions. Reverting to the simpler, proven approach of embedding the token directly in the remote URL.
Updated to use PUBLIC_REPO_TOKEN_2 as the original token was deleted.
Added: - Show token prefix to verify it's set - Test remote access with ls-remote before pushing - Better error messages for auth failures
The actions/checkout was configuring git with the default GITHUB_TOKEN, which was overriding our PUBLIC_REPO_TOKEN_2 during push operations. Changes: - Set persist-credentials: false on checkout action - Explicitly unset credential helpers before using our token - Set credential.useHttpPath to prevent token reuse across repos This ensures only PUBLIC_REPO_TOKEN_2 is used for pushing to public repo.
…c sync Added removal of: - INSTRUCTOR.md, INSTRUCTOR_GUIDE.md, API_SETUP.md - SETUP_WITH_GH_CLI.md, QUICK_START_SSH.md - .claude/ directory (instructor workflow automation) - .github/workflows/sync-to-public.yml (the sync workflow itself) Added validation checks to ensure these files are removed before push.
Sanitized content from private repository. This release includes updated assignment materials. Changes: - Updated assignment templates - Fixed bugs and improvements - Documentation updates
- New /course/top-reviewers page with podium, charts (recharts), and reviewer grid - Data collection script fetches review comments from audit PRs via gh API - Heuristic quality scoring (technical depth, constructiveness, clarifications) - Weekly GitHub Actions cron (Monday 7am MST) refreshes data and triggers deploy - Deploy workflow now runs npm install and accepts workflow_dispatch
Show instructor with distinct rose-colored "Instructor" badge and separate card above the student podium. Student charts and rankings remain instructor-free. Total comments now 217 (was 114 without instructor).
Explains scoring rules (+3 technical, +2 constructive, etc.) and tier bands. Disclaimer with crh-bot avatar notes that scores are auto-generated heuristics and do not reflect the instructor's explicit views.
Old formula averaged per-comment, penalizing prolific reviewers. New formula sums weighted points across all comments, then normalizes with sqrt scaling against the class max. lorinachey (7 technical depth comments, 22 total) now scores 10.0 Exemplary instead of 3.1.
Student scores now normalized against instructor's total contribution using sqrt scaling. Tier thresholds adjusted: Exemplary 5+ (50% of instructor), Strong 3.5+, Solid 2+, Developing <2. lorinachey is the sole Exemplary student at 5.5/10.
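The sum-then-sqrt normalization and the tier bands can be sketched as follows. The exact formula in `collect_reviewer_data.py` is not shown in this thread, so `10 * sqrt(points / reference)` is an assumption, as is the weight for clarification comments; the +3 and +2 weights and the tier thresholds come from the commit messages above.

```python
import math

# +3 technical and +2 constructive are stated in the scoring rules;
# the "clarification" weight is a hypothetical placeholder.
WEIGHTS = {"technical": 3, "constructive": 2, "clarification": 1}

# Tier thresholds from the commit message, checked from highest to lowest.
TIERS = [(5.0, "Exemplary"), (3.5, "Strong"), (2.0, "Solid")]


def weighted_points(comment_kinds):
    """Sum weighted points across all of a reviewer's comments (no per-comment average)."""
    return sum(WEIGHTS.get(kind, 0) for kind in comment_kinds)


def normalized_score(points, reference_points, scale=10.0):
    """Square-root scaling against a reference total (class max or instructor total)."""
    if reference_points <= 0:
        return 0.0
    return round(min(scale * math.sqrt(points / reference_points), scale), 1)


def tier(score):
    """Map a 0-10 score to its tier band."""
    for threshold, name in TIERS:
        if score >= threshold:
            return name
    return "Developing"
```

Under this assumed formula, a score of 5 corresponds to a quarter of the reference's raw points, since sqrt(0.25) = 0.5. The square root compresses the gap between prolific and occasional reviewers while still rewarding volume.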
github-actions[bot] cannot push directly to protected main branch. Instead, create a short-lived branch and auto-merge via PR with --admin.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
The assignment source files live under src/assignments/, not content/course/assignments/. https://claude.ai/code/session_01CQ4UYMQR6egdQLC62KdSYn
Student assignments are submitted under content/course/submissions/, not src/assignments/ which holds starter code and tests. https://claude.ai/code/session_01CQ4UYMQR6egdQLC62KdSYn
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Add resource-ledger.pdf for each capstone student (Hhy903, Soorej30, Tr0612, aritrach, jt7347, krusnim, lorinachey, yuni-wyx) and update their Report.mdx files with a resource_ledger frontmatter field and a download link section. Co-authored-by: Chris Heckman <chhe5305@colorado.edu>
Pull request overview
This PR significantly expands the course infrastructure by adding a structured pytest-based grading/public test suite for Scratch-1, updating Scratch-1 synthetic action generation for improved learnability, and introducing new site features (textbook index styling, audit review-mode UI, KaTeX handling, and a “Top Reviewers” dashboard) along with CI/deploy workflow changes.
Changes:
- Add public + internal pytest suites (markers, fixtures, documentation) for Scratch-1 grading and student self-checks.
- Update Scratch-1 dataset generation to use a structured 256-bin action encoding (direction octant + magnitude bin).
- Add site/UI features for audits/textbook and a reviewer leaderboard backed by generated GitHub stats.
Reviewed changes
Copilot reviewed 56 out of 70 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| tests/README.md | New documentation for public/internal test suites and usage. |
| tests/public/test_scratch1_basic.py | Adds public-facing Scratch-1 smoke tests for data gen and provided modules. |
| `tests/public/__init__.py` | Declares public tests package. |
| tests/internal/test_scratch1_rigor.py | Adds internal rigorous grading tests (imports, attention, RoPE, training). |
| `tests/internal/__init__.py` | Declares internal tests package. |
| tests/conftest.py | Shared pytest fixtures + attempted solution injection/reset logic. |
| `tests/__init__.py` | Declares overall tests package and structure. |
| pytest.ini | Adds pytest discovery + marker definitions. |
| src/assignments/scratch-1/generate_data.py | Updates action encoding + trajectory generation logic for learnability. |
| src/assignments/scratch-1/backbone.py | Minor typing import formatting change (trailing whitespace). |
| scripts/README.md | Documents CI/CD scripts and expectations. |
| scripts/collect_reviewer_data.py | New script to collect GitHub PR comment stats and generate reviewer leaderboard JSON. |
| scripts/audit_linter.py | Adds stronger MDX frontmatter validation (required fields + non-placeholder checks). |
| data/reviewer-stats.json | Adds generated reviewer statistics used by the new dashboard. |
| package.json | Adds recharts dependency for charts. |
| app/course/top-reviewers/ReviewerCharts.tsx | Client-side Recharts visualizations for reviewer stats. |
| app/course/top-reviewers/page.tsx | Server page reading reviewer-stats.json and rendering leaderboard. |
| app/course/page.tsx | Adds link/section entry to the “Top Reviewers” page. |
| app/course/assignments/capstone/[handle]/page.tsx | Adds dynamic capstone report page rendering MDX per student handle. |
| content/course/submissions/human-in-the-loop/.gitkeep | Adds placeholder directory for human-in-the-loop submissions. |
| content/course/assignments/scratch-1.mdx | Updates Scratch-1 assignment expectations (loss convergence guidance, removes draft banner). |
| content/course/assignments/capstone/zlaouar/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/yuni-wyx/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/yi-shiuan-tung/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/Tr0612/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/Soorej30/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/lorinachey/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/krusnim/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/kalhamilton/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/jt7347/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/jdvakil/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/himanshugupta1009/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/Hhy903/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/gyanigk/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/carson-jay/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/aritrach/Report.mdx | Adds capstone running log page. |
| content/course/assignments/capstone/antony-zhao/Report.mdx | Adds capstone running log page. |
| app/textbook/page.tsx | Adds a styled textbook index/landing page. |
| components/textbook/Sidebar.tsx | Updates textbook sidebar styling and adds “Living Textbook” link. |
| components/textbook/TextbookLayout.tsx | Updates textbook layout right-side TOC styling. |
| components/audit/AuditLayout.tsx | Adds audit review-mode banner and restyles audit layout. |
| app/textbook/audits/[...slug]/page.tsx | Adds review-mode wiring, KaTeX handling, and restyles audit header/banner. |
| components/KatexStyles.tsx | Adds client-side KaTeX CSS injection via CDN. |
| app/layout.tsx | Reorders KaTeX CSS import to precede globals.css. |
| app/globals.css | Adds KaTeX and prose styling improvements (code blocks, headings, tables, etc.). |
| tailwind.config.ts | Extends typography CSS overrides (links, headings, code, KaTeX display, etc.). |
| README.md | Removes a large “Repository Structure / Workflow” section. |
| pyproject.toml | Adds Python project metadata, dependencies, and pytest config (duplicated with pytest.ini). |
| .gitignore | Replaces “private repo files” ignores with narrower ignores and claude output patterns. |
| .continueignore | Adds ignore patterns for Continue/agent tooling. |
| .github/workflows/vla-audit.yml | Updates audit CI guidance text; removes shadow test dispatch job. |
| .github/workflows/shadow-tester.yml | Adds repository_dispatch-driven workflow to run internal tests from a private repo. |
| .github/workflows/refresh-reviewer-data.yml | Adds scheduled workflow to refresh reviewer stats, open PR, auto-merge, and trigger deploy. |
| .github/workflows/deploy.yml | Adjusts deploy trigger rules and remote deployment details (host/path/npm install). |
| .github/workflows/deploy-staging.yml | Adjusts staging deploy host and paths-ignore rules. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
    [rehypeKatex, {
      strict: false, // Don't fail on unknown LaTeX commands
      trust: true, // Allow some advanced LaTeX features
      throwOnError: false, // Gracefully handle errors
    }]

rehype-katex is configured with trust: true, which enables KaTeX trusted commands (e.g., \href / HTML-like macros). Since audit content is effectively user-authored (students), this increases XSS/phishing surface. Consider keeping trust disabled, or providing a restrictive trust callback that only allows a minimal safe subset of commands.
    seq_len = x.shape[1]
    causal_mask = torch.tril(torch.ones(seq_len, seq_len, device=x.device))

The causal mask passed to student attention implementations is always a float 0/1 matrix. Some correct implementations expect a boolean mask (and will do ~mask), which would raise a TypeError and make this test harness unnecessarily brittle. Consider trying both dtypes (e.g., bool then float) or normalizing to the starter-code convention before calling forward().
    import math
    from typing import Optional, Tuple

There are trailing spaces on this import line. Please remove them to avoid formatting/lint noise and accidental diffs.
    ## Resources

    ### Documentation

The PR title suggests a narrow change (“Human in the loop assignment submission”), but this diff includes large additions/changes across testing infrastructure, workflows, deployment, UI styling, and new course pages. Please update the PR title/description to reflect the actual scope, or split into smaller PRs to make review and rollback safer.
    'use client';

    import { useEffect } from 'react';

    export function KatexStyles() {
      useEffect(() => {
        // Ensure KaTeX CSS is loaded
        const link = document.createElement('link');
        link.rel = 'stylesheet';
        link.href = 'https://cdn.jsdelivr.net/npm/katex@0.16.27/dist/katex.min.css';
        link.integrity = 'sha384-yp+jpRNKIa0xGrYaVtwImDXkFq7ZOCV5kJZVDg/uAFfYPmtFcKr0sxhVJy1HqnWD';
        link.crossOrigin = 'anonymous';

        // Check if already loaded
        const existing = document.querySelector('link[href*="katex"]');
        if (!existing) {
          document.head.appendChild(link);
        }
      }, []);

KaTeX CSS is already imported globally in app/layout.tsx. Loading it again from a CDN here adds an external runtime dependency and risks duplicate styles / version skew (the existing selector won’t match Next’s bundled CSS href). Prefer removing this component and relying on the bundled KaTeX CSS import.
Suggested change:

    export function KatexStyles() {
      // KaTeX CSS is imported globally in app/layout.tsx; no runtime work is needed here.

    dev_utils_path = Path(__file__).parent.parent / "scripts" / "dev_utils.py"
    if not dev_utils_path.exists():
        # Skip injection in student/grading environment
        print("\n=== Skipping solution injection (dev_utils.py not found) ===")
        yield

This fixture tries to inject/reset solutions via scripts/dev_utils.py, but that file does not exist in the repository, so solution injection will always be skipped. Either add scripts/dev_utils.py (or switch to an existing script) or adjust the injection logic/docs to reflect the real workflow.
    [tool.pytest.ini_options]
    markers = [
        "internal: internal grading tests (never public)",
        "rigor: rigorous grading tests",
        "gradient: gradient flow tests",

Pytest configuration is duplicated between this [tool.pytest.ini_options] block and the new pytest.ini. Pytest will only use one config source, so the duplicate can drift and confuse contributors. Prefer consolidating into a single source of truth (either keep pytest.ini or keep the pyproject section).
    # Expected: (dim * hidden_dim) + hidden_dim + (hidden_dim * dim) + dim
    expected = (dim * hidden_dim) + hidden_dim + (hidden_dim * dim) + dim

    assert num_params == expected, f"Expected {expected} parameters, got {num_params}"

This parameter-count expectation assumes both Linear layers have biases (it adds hidden_dim + dim), but the provided FeedForward implementation in src/assignments/scratch-1/backbone.py constructs both nn.Linear(..., bias=False). As written, this test will fail on the starter code. Update the expected count to match the actual layer definitions (or compute it dynamically based on whether biases are enabled).
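A bias-aware expectation could be computed as in the sketch below. It assumes the FeedForward block consists of exactly two linear layers (dim → hidden_dim and hidden_dim → dim), as the comment describes; the helper name is hypothetical and a gated variant with a third projection would need a different count.

```python
def expected_ffn_params(dim: int, hidden_dim: int, bias: bool) -> int:
    """Parameter count for two linear layers: dim -> hidden_dim -> dim."""
    weights = dim * hidden_dim + hidden_dim * dim
    # Bias vectors exist only when the layers are built with bias=True.
    biases = (hidden_dim + dim) if bias else 0
    return weights + biases


if __name__ == "__main__":
    # bias=False matches the starter code described in the review comment.
    print(expected_ffn_params(4, 16, bias=False), expected_ffn_params(4, 16, bias=True))
```

In the test itself, the `bias` flag could be derived from the module (for example, by checking whether each linear layer's `bias` attribute is `None`) so that the expectation follows the layer definitions instead of hard-coding either case.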
    ### Production Scripts

    - **`manage_solutions.py`** - Inject/reset assignment solutions (used in testing)
    - **`sanitize.sh`** - Main sanitization pipeline for public sync
    - **`_sanitize_todos.py`** - Remove solution hints from code
    - **`audit_linter.py`** - Validate paper audit MDX files

This README references scripts/manage_solutions.py, but that script isn’t present in the repository (scripts/ currently contains audit_linter.py, collect_reviewer_data.py, deploy.sh, grade_scratch1.py, review-prs.sh). Either add manage_solutions.py or update this documentation and any callers (e.g., tests/conftest.py) to point to the actual solution-management script.
    @@ -0,0 +1,139 @@
    name: Shadow Tester

    on:

This workflow only runs on repository_dispatch (type run-shadow-tests), but this repo no longer contains any workflow/job that emits that dispatch event (the prior trigger in vla-audit.yml was removed). If shadow testing is meant to run automatically, consider re-adding a safe dispatch trigger or adding workflow_dispatch / pull_request triggers; otherwise this file will never run in normal CI.
Suggested change:

    on:
      workflow_dispatch:

No description provided.