Human in the loop assignment submission: jdvakil by Jdvakil · Pull Request #82 · arpg/vla-foundations

Jdvakil · 2026-03-23T15:12:08Z

No description provided.

* Enhance audit pages with textbook-style layout - Add left sidebar with chapter navigation - Add right sidebar for table of contents - Use AuditLayout component to match textbook styling - Improves visual consistency across the site * Add FAQ section to Paper Audit assignment Integrate comprehensive FAQ covering: - Scope and technical focus (primary paper, interface auditing) - Format and objectives (argue vs summarize, Amazon Principle) - Presentation logistics (audit vs presentation, time allocation) Students now have clear guidance on what the audit should accomplish and how it differs from a traditional paper summary. * Fix staging deployment SSH timeout issue - Add stdin redirect (< /dev/null) to properly detach nohup - Add sleep before starting new server to ensure old process is killed - Remove PID echo that could cause hanging This should prevent the SSH action from being terminated with SIGTERM. * Fix PR preview paths for different content types - Detect whether PR contains audit or contributor profile - Generate correct preview URL based on content type - Show appropriate checklist (audit vs contributor) - Extract actual filename for accurate path Fixes incorrect preview URLs for contributor profile PRs like #25. * Fix content type detection in PR preview workflow - Fetch base branch explicitly for git diff - Store FILES output to avoid repeated git diff calls - Use fetch-depth: 0 to ensure full git history available This should properly detect contributor vs audit files. * Add contributor profile: Gyanig Kumar (#25) Co-authored-by: Christoffer Heckman <christoffer.heckman@colorado.edu> * Fix SSH timeout with setsid and explicit exit - Use setsid instead of nohup for better process daemonization - Add explicit exit 0 after starting background process - Move success message before starting background job - Increase sleep to 2s to ensure pkill completes This should prevent the SSH session from hanging and getting terminated. 
* Add debug output to staging deployment script * Split deployment into two SSH sessions Separate static site deployment from API server startup to avoid SSH timeout. Each SSH session is now shorter and focused on a single task. * Remove unused API server from staging deployment The API server was for a comments/review system that is not currently implemented. Removing it fixes the SSH timeout issue. * Remove unused API server infrastructure The comments/review system was never fully implemented. The frontend components (CommentSidebar, API routes) don't exist and the system is incompatible with static export mode. * Support audits in both staging and production directories Some students may mistakenly place audits in content/textbook/audits/ instead of content/textbook/audits/staging/. Detect both locations. * Update Scratch-1 due date to February 1 * Add contributor profile: Thanushraam Suresh Kumar (#28) --------- Co-authored-by: Gyanig Kumar <gyanig.kumar@gmail.com> Co-authored-by: Thanushraam <45840572+Tr0612@users.noreply.github.com>

- Complete testing infrastructure (public & internal tests) - Solution management scripts (inject/reset) - Sanitization pipeline for public sync - GitHub Actions workflow for automated sync - Complete documentation (setup guides, references) - Example solution for scratch-1 assignment See PRIVATE_REPO_SETUP.md for complete architecture details.

BREAKING CHANGES: - Reorganized scripts into CI-critical (scripts/) vs dev helpers (scripts/dev/) - Enhanced sanitization pipeline with fail-safe validation - Added frontmatter validation for audit MDX files - Implemented Review Mode banner for PR previews Phase 1: Script Consolidation - Created scripts/dev/ for local development helpers - Moved 6 helper scripts to scripts/dev/ - Added README.md to both scripts/ and scripts/dev/ - Clear separation: CI-critical scripts stay in scripts/ Phase 2: Sanitization Pipeline Hardening - scripts/_sanitize_todos.py now fail-safe with exit codes - Added 3-step validation (pre, sanitize, post) - Enhanced error messages with line numbers - sync-to-public.yml now includes pre/post validation - Zero-tolerance for [SOLUTION] leaks Phase 3: Unified Linting for Audits - audit_linter.py validates required frontmatter fields - Checks: title, author, topic, paper (no empty/placeholder values) - Updated vla-audit.yml error messages Phase 4: Next.js Routing Cleanup - Dynamic staging prefix handling with STAGING_PR_NUMBER - Added Review Mode banner to AuditLayout.tsx - Shows PR number in preview banner - Better visual distinction for review vs production Phase 5: Repository Cleanup - Removed vercel.json (GitHub Actions/Pages deployment) - Added SOURCE OF TRUTH section to README.md - Clear repository ownership map - sanitize.sh removes scripts/dev/ Security Improvements: - Fail-safe sanitization (cannot sync if markers remain) - Pre-flight validation before sanitization - Post-flight verification after sanitization - Proper exit codes for all scripts - Repository clarity prevents accidental public pushes Documentation: - Scripts categorized and documented - README.md warns against pushing to public - Audit requirements clearly specified - REFACTOR_SUMMARY.md with complete changelog See REFACTOR_SUMMARY.md for detailed breakdown.
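The Phase 3 frontmatter check (required `title`, `author`, `topic`, `paper`, with no empty or placeholder values) could look roughly like the sketch below. This is not the actual `audit_linter.py`; the placeholder set, the simple `key: value` parsing, and the function names are assumptions made for illustration.

```python
REQUIRED_FIELDS = ("title", "author", "topic", "paper")
# Assumed placeholder values; the real linter may use a different list.
PLACEHOLDERS = {"", "todo", "tbd", "your name here"}


def parse_frontmatter(text):
    """Parse a leading '---' delimited block of simple 'key: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # the block was never closed, so treat it as absent


def frontmatter_errors(text):
    """Return one message per required field that is missing or a placeholder."""
    fields = parse_frontmatter(text)
    errors = []
    for name in REQUIRED_FIELDS:
        value = fields.get(name)
        if value is None:
            errors.append(f"missing field: {name}")
        elif value.lower() in PLACEHOLDERS:
            errors.append(f"empty or placeholder value: {name}")
    return errors
```

A CI step could exit non-zero whenever `frontmatter_errors` returns a non-empty list, which matches the fail-safe exit-code behavior described for the other scripts.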

Technical Hardening - Tasks 1, 2, 3, & 4: Documentation (Task 1): - Create PRIVATE_OPERATIONS.md merging all instructor docs - Paper Audit Review Workflow - Server Configuration (Apache, systemd, API) - Deployment Procedures - Troubleshooting Guide - Update README.md with instructor-facing PST guide - Document Shadow CI, solution management, assignment lifecycle Hardened Sanitization (Task 2): - Enhance sanitize.sh: - Add draft block removal for MDX files - Add README.md overwrite step during sync - Renumber steps for clarity - Enhance _sanitize_todos.py: - Add multi-line [SOLUTION] comment detection - Add triple-quoted docstring [SOLUTION] detection - Improve regex patterns with proper ordering - Enhanced fail-safe verification Shadow CI (Task 3): - Create shadow-tester.yml workflow - Triggered by repository_dispatch from public repo - Fetches student code from public PR - Runs internal rigorous tests - Comments Pass/Fail status on public PR Pre-Commit Guard (Task 4): - Add executable pre-commit hook - Scans staged files for [SOLUTION] markers - Blocks commits if leaks detected in public-facing files - Provides clear fix instructions Documentation: - Create HARDENING_COMPLETE.md with full implementation report - Create BEFORE_AFTER.md documenting changes

Task 1: Prune Documentation
- Consolidate PRIVATE_OPERATIONS.md, REVIEW_SYSTEM.md, PRIVATE_REPO_SETUP.md into comprehensive INSTRUCTOR.md
- Delete 11 obsolete documentation files:
  - APACHE_CONFIG.md
  - BEFORE_AFTER.md
  - DEPLOYMENT_SUCCESS.md
  - HARDENING_COMPLETE.md
  - PRIVATE_OPERATIONS.md (merged)
  - PRIVATE_REPO_SETUP.md (merged)
  - QUICK_REFERENCE.md
  - REFACTOR_SUMMARY.md
  - REVIEW_SYSTEM.md (merged)
  - SETUP_COMPLETE.md
  - SYSTEM_COMPLETE.md

Task 2: Refactor Solution Management
- Rename scripts/manage_solutions.py → scripts/dev_utils.py
- Add --verify-clean command:
  - Scans src/assignments/ for solution code leaks
  - Compares files against private/solutions/ using difflib
  - Exits with error if similarity > 80% (prevents accidental commits)
  - Normalizes code (removes comments/whitespace) for accurate comparison

Task 3: Harden Sync Workflow
- Update .github/workflows/sync-to-public.yml to use Orphan Push strategy:
  - git checkout --orphan temp-public-branch
  - git add -A && git commit
  - git push public temp-public-branch:main --force
- Breaks ALL git history links between private and public repos
- Public repo has completely independent history
- Update leak detection to check for dev_utils.py instead of manage_solutions.py
- Update sanitize.sh to delete dev_utils.py

Task 4: Cleanup Public README
- Public README already student-centric from previous hardening
- No changes needed

Benefits:
- Single comprehensive instructor guide (INSTRUCTOR.md)
- Enhanced solution leak prevention (--verify-clean)
- Maximum security via orphan push (no history exposure)
- Cleaner repository structure
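The `--verify-clean` comparison described above (normalize, compare with difflib, fail above 80% similarity) might be sketched as follows. The function names are hypothetical and the comment stripping is deliberately naive; the real `dev_utils.py` may normalize differently.

```python
import difflib
import re


def normalize(source: str) -> str:
    """Drop '#' comments and all whitespace so formatting changes cannot hide a copy."""
    lines = []
    for line in source.splitlines():
        code = line.split("#", 1)[0]  # naive: also cuts a '#' inside a string literal
        code = re.sub(r"\s+", "", code)
        if code:
            lines.append(code)
    return "\n".join(lines)


def similarity(candidate: str, solution: str) -> float:
    """Return a 0..1 similarity ratio between the two normalized sources."""
    matcher = difflib.SequenceMatcher(None, normalize(candidate), normalize(solution))
    return matcher.ratio()


def looks_like_leak(candidate: str, solution: str, threshold: float = 0.80) -> bool:
    """True when the candidate is more than `threshold` similar to the solution."""
    return similarity(candidate, solution) > threshold
```

A command-line wrapper would call `looks_like_leak` for each file under `src/assignments/` against its counterpart in `private/solutions/` and exit with a non-zero status on the first hit.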

Create Claude Code Project Skill for solution leak prevention: Task 1: Initialize Skill Structure - Create .claude/skills/vla-guard/ directory - Create .claude/skills/vla-guard/SKILL.md Task 2: Define SKILL.md Logic - Add frontmatter: - name: vla-guard - description: Final audit to prevent solution/internal test leaks - user-invocable: true - Implement 5-step audit process: 1. Identify all solutions (python3 scripts/dev_utils.py --list) 2. Scan for solution content leaks ([SOLUTION] markers) 3. Verify private/ and tests/internal/ not staged 4. Check git log for accidental solution commits 5. Check for sensitive file leaks Task 3: Create Custom Slash Command - Create .claude/commands/pre-flight.md - Invokes /vla-guard skill first - Runs scripts/sanitize.sh only if guard passes - Provides comprehensive pre-flight check before push/PR Features: - Color-coded audit reports (✅/❌/⚠️) - Integration with dev_utils.py --verify-clean - Fail-safe: blocks sanitization if audit fails - Clear remediation instructions on failure Usage: /vla-guard - Run security audit only /pre-flight - Run audit + sanitization pipeline Also added: - REFACTOR_COMPLETE.md - Documentation of consolidation work

Create 6 new Claude Code skills for VLA Foundations workflow automation: 1. /test-rigor - Internal grading test runner - Auto-injects solutions before testing - Runs pytest with rigor markers - Generates test reports - Auto-resets to starter code 2. /generate-fixtures - Gold standard fixture generator - Creates reference data for fidelity tests - Uses fixed random seeds (seed=42) - Generates model outputs and checkpoints - Verifies no NaNs in fixtures 3. /grade - Automated student PR grading - Fetches student code from GitHub - Runs public and internal tests - Generates detailed feedback reports - Posts comments on PRs - Updates PR labels 4. /release - Safe assignment publishing workflow - Runs VLA Guard audit (fail-fast) - Executes sanitization pipeline - Creates release tags - Monitors GitHub Actions - Verifies public repo - Checks deployment status 5. /new-assignment - Assignment scaffolding generator - Creates complete directory structure - Generates starter code with TODOs - Generates solution templates - Generates test templates - Creates MDX assignment spec 6. /sync-check - Post-release verification - Clones public repo (read-only) - Scans for solution leaks - Verifies orphan push strategy - Checks deployment status - Generates verification reports Additional changes: - Add command shortcuts in .claude/commands/ - Create directories for reports and releases - Update .gitignore to ignore generated reports - Add comprehensive README.md for skills Benefits: - Automates instructor workflows - Fail-safe protection against leaks - Comprehensive audit trails - Reduces manual errors - Speeds up grading and releases

Create comprehensive guide for AI SWE agents working with student code: - Student workflow (branch, implement, test, submit) - Public testing philosophy - Assignment structure and TODOs - Git hygiene (rebase-only workflow) - Semantic line breaks in MDX - Common issues and solutions - Engineering standards and grading rubric Key sections: - Complete assignment workflow example - Commands useful for students - Testing with public tests - PR submission process - Resources and documentation links Student-focused: - No references to private repo or solutions - Only public tests documented - Clear submission guidelines - Common troubleshooting tips - Resources for help Fixes from template: - Removed manage_solutions.py references (private only) - Removed audit_linter.py references (doesn't exist) - Fixed Google search link placeholders - Added actual file paths - Clarified student permissions (can't merge own PRs)

Create detailed guide for AI SWE agents working in private repo: - Dual-repository architecture explanation - Complete Claude Code skills documentation - Solution management workflow (dev_utils.py) - Testing philosophy (public vs internal) - Sanitization pipeline details - Security boundaries and leak prevention - Typical workflows for instructors - Shadow CI explanation - Orphan push strategy Key sections: - 7 Claude Code skills with usage examples - Commands useful in development - Pre-release checklist - Git hygiene and rebase-only workflow - File map with actual paths (not Google search links) Fixes: - Corrected manage_solutions.py → dev_utils.py - Added missing Claude Code skills section - Included Shadow CI documentation - Added security boundaries section - Removed Google search link placeholders

- Replace backbone_solution.py with corrected version matching student template - Uses combined qkv_proj (not separate q/k/v projections) - Uses F.silu() (SwiGLU) activation - Implements all 4 TODOs correctly - Fixes tensor contiguity issue in loss computation - Add gold standard test fixtures - private/fixtures/scratch1_gold_output.pt - private/fixtures/scratch1_attention_fixture.pt - private/fixtures/scratch1_rmsnorm_fixture.pt - Generator script for reproducible fixtures - Update test infrastructure - Add load_gold_standard and sample_batch fixtures - Mark DINOv2 tests as mastery (skipped for core assignment) - Fix subprocess calls to use sys.executable - Add mastery marker to pytest.ini - Add uv package management - pyproject.toml with torch, numpy, pytest dependencies - uv.lock for reproducible environments - Update claude.md with uv usage instructions Test results: 7 passed, 2 skipped (mastery tests) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The orphan push strategy was destroying all student work on the public repo. Now uses merge strategy that: - Fetches current public repo state - Merges sanitized content into main - Preserves all student branches and work

Files that document the sanitization process (README, INSTRUCTOR.md, etc.) mention [SOLUTION] markers as part of explaining how the system works. These are not actual solution code and should be excluded from the check. Also excludes: - .claude/ directory (skill definitions) - .github/ directory (workflow definitions)
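A minimal sketch of a marker scan with these exclusions is shown below. The exclusion lists contain only the names mentioned in the commit message (the "etc." is not filled in), and `find_leaks` and its in-memory `files` argument are hypothetical stand-ins for whatever the workflow actually walks.

```python
from pathlib import PurePosixPath

MARKER = "[SOLUTION]"
# Assumed exclusions, taken from the names listed in the commit message.
EXCLUDED_DIRS = (".claude", ".github")
EXCLUDED_FILES = {"README.md", "INSTRUCTOR.md"}


def is_excluded(path: str) -> bool:
    """Skip documentation files and the skill/workflow definition directories."""
    parts = PurePosixPath(path).parts
    return bool(parts) and (parts[0] in EXCLUDED_DIRS or parts[-1] in EXCLUDED_FILES)


def find_leaks(files: dict[str, str]) -> list[tuple[str, int]]:
    """Return (path, line_number) for each marker found outside excluded paths."""
    leaks = []
    for path, text in files.items():
        if is_excluded(path):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if MARKER in line:
                leaks.append((path, lineno))
    return leaks
```

Reporting line numbers keeps the error message actionable, in line with the "enhanced error messages with line numbers" goal of the earlier sanitization work.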

YAML syntax error on line 224 caused by unquoted multi-line string. Using heredoc pattern for proper bash multi-line string handling.

Previous issue: Averaging 7 joint deltas into single action caused massive information loss. Model achieved only ~4.6 loss (barely better than random ~5.5).

New action encoding:
- Direction: 8 octants (±X, ±Y, ±Z) → 3 bits
- Magnitude: Distance to target, 32 bins → 5 bits
- Total: 8 * 32 = 256 discrete actions

This is LEARNABLE because:
- Model sees state (joint angles + end-effector position)
- Can compute error vector toward target
- Can predict corresponding action deterministically

Additional improvements:
- Reduced noise (0.05 → 0.02) for clearer patterns
- Stronger gradient component toward target (0.01 → 0.05)
- More deterministic motion generation

Expected result: Loss should converge significantly below previous 4.6.
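One way to read the 3-bit direction plus 5-bit magnitude encoding is sketched below. The bit layout (octant in the high bits), the sign convention, and the `max_dist` clipping range are assumptions; `generate_data.py` may pack the fields differently.

```python
import math


def encode_action(error, max_dist=1.0, n_bins=32):
    """Map a 3-D error vector (target minus position) to an action id in [0, 255]."""
    ex, ey, ez = error
    # 3 direction bits: one sign bit per axis (1 means the component is negative).
    octant = (int(ex < 0) << 2) | (int(ey < 0) << 1) | int(ez < 0)
    # 5 magnitude bits: Euclidean distance clipped to max_dist, then binned.
    dist = min(math.sqrt(ex * ex + ey * ey + ez * ez), max_dist)
    mag_bin = min(int(dist / max_dist * n_bins), n_bins - 1)
    return octant * n_bins + mag_bin


def decode_action(action, n_bins=32):
    """Split an action id back into (octant, magnitude_bin)."""
    return divmod(action, n_bins)
```

Because the id is a deterministic function of the error vector, a model that can infer the error from the state has a well-defined target to predict, which is the learnability argument made in the commit message.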

## Changes

### PyTorch CUDA 11.8 Support
- Updated pyproject.toml to use CUDA 11.8 index
- Constrained Python to 3.10-3.13 (CUDA wheels limitation)
- Now supports P6000 GPUs (compute capability 6.1)

### Training Improvements
- Training completes in 41s on GPU (vs 7+ hours on CPU)
- Updated training_run.py to use GPU when available
- Removed overly restrictive compute capability check

### Data Generation Fix (Previously Committed)
- Structured action encoding: direction (8 octants) + magnitude (32 bins)
- Actions now learnable from state (error vector)
- Loss improved from ~4.6 to ~1.97

## Results

**Training Performance:**
- Loss: 3.27 → 1.96 (consistent convergence)
- Time: 41 seconds on P6000
- All 7 internal rigor tests passing

**vs Previous (averaged actions):**
- Loss: ~4.6 (barely better than random)
- Model couldn't learn meaningful patterns

## Test Results

```
7 passed, 2 deselected (mastery features)
- test_training_convergence ✓
- test_attention_gradient_flow ✓
- test_causal_mask_prevents_future_leakage ✓
- test_rmsnorm_numerics ✓
- test_model_output_distribution ✓
- test_loss_computation_correctness ✓
- test_overfitting_on_single_batch ✓
```

Solution ready for grading student submissions.

**Problem**: Assignment required loss < 1.0, but actual correct implementation achieves ~1.9-2.0.

**Changes**:
- Updated Pass Level to require "clear convergence" not arbitrary threshold
- Added expected loss ranges: Initial ~3-4, Final ~1.9-2.2
- Added FAQ explaining what loss to expect
- Emphasize learning trajectory over absolute value

**Rationale**:
- Action encoding (direction + magnitude) is learnable but not trivial
- Random guessing: ~5.5 (log(256))
- Structured learning: ~1.9-2.0 (significant improvement)
- Internal tests only verify loss decreases, not absolute threshold

This aligns assignment expectations with reality.
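The ~5.5 baseline can be checked directly: the cross-entropy of a uniform guess over K classes is log(K) nats. The short sketch below also converts a loss into a perplexity, which is one (assumed, not from the assignment text) way to interpret the reported values.

```python
import math

NUM_ACTIONS = 256  # 8 direction octants x 32 magnitude bins

# Cross-entropy of a uniform guess over K classes is -log(1/K) = log(K) nats.
uniform_loss = math.log(NUM_ACTIONS)


def perplexity(loss: float) -> float:
    """exp(loss): the effective number of equally likely actions still in play."""
    return math.exp(loss)
```

Under this reading, `uniform_loss` is about 5.545, and a final loss near 1.96 corresponds to a perplexity of roughly 7, meaning the model has narrowed 256 candidate actions down to about 7 effective choices.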

Changed 'EOF' to EOF (without quotes) to allow ${TAG_NAME} and ${RELEASE_DATE} to be interpolated in the commit message.

Heredoc inside YAML run block caused parsing errors. Using simple multiline string assignment instead.

Previous approach with multiline string caused YAML parsing errors. Using git commit's multiple -m flag feature instead - each -m creates a new paragraph in the commit message.

Changes: - Verify PUBLIC_REPO_TOKEN is set before proceeding - Use git credential helper instead of embedding token in URL - More secure credential handling - Better error messages if token is missing

The credential helper approach wasn't working in GitHub Actions. Reverting to the simpler, proven approach of embedding the token directly in the remote URL.

Updated to use PUBLIC_REPO_TOKEN_2 as the original token was deleted.

Added: - Show token prefix to verify it's set - Test remote access with ls-remote before pushing - Better error messages for auth failures

The actions/checkout was configuring git with the default GITHUB_TOKEN, which was overriding our PUBLIC_REPO_TOKEN_2 during push operations. Changes: - Set persist-credentials: false on checkout action - Explicitly unset credential helpers before using our token - Set credential.useHttpPath to prevent token reuse across repos This ensures only PUBLIC_REPO_TOKEN_2 is used for pushing to public repo.

…c sync Added removal of: - INSTRUCTOR.md, INSTRUCTOR_GUIDE.md, API_SETUP.md - SETUP_WITH_GH_CLI.md, QUICK_START_SSH.md - .claude/ directory (instructor workflow automation) - .github/workflows/sync-to-public.yml (the sync workflow itself) Added validation checks to ensure these files are removed before push.

Sanitized content from private repository. This release includes updated assignment materials. Changes: - Updated assignment templates - Fixed bugs and improvements - Documentation updates

…cal branches

- New /course/top-reviewers page with podium, charts (recharts), and reviewer grid - Data collection script fetches review comments from audit PRs via gh API - Heuristic quality scoring (technical depth, constructiveness, clarifications) - Weekly GitHub Actions cron (Monday 7am MST) refreshes data and triggers deploy - Deploy workflow now runs npm install and accepts workflow_dispatch

Show instructor with distinct rose-colored "Instructor" badge and separate card above the student podium. Student charts and rankings remain instructor-free. Total comments now 217 (was 114 without instructor).

Explains scoring rules (+3 technical, +2 constructive, etc.) and tier bands. Disclaimer with crh-bot avatar notes scores are auto-generated heuristics and do not reflect the instructor's explicit views.

Old formula averaged per-comment, penalizing prolific reviewers. New formula sums weighted points across all comments, then normalizes with sqrt scaling against the class max. lorinachey (7 technical depth comments, 22 total) now scores 10.0 Exemplary instead of 3.1.

Student scores now normalized against instructor's total contribution using sqrt scaling. Tier thresholds adjusted: Exemplary 5+ (50% of instructor), Strong 3.5+, Solid 2+, Developing <2. lorinachey is the sole Exemplary student at 5.5/10.
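The summed-points, sqrt-scaled scoring could be sketched as below. Only the +3 and +2 weights and the tier thresholds come from the commit messages; the exact formula `10 * sqrt(points / reference)`, the +1 default weight, and all function names are assumptions, not the code in `collect_reviewer_data.py`.

```python
import math

# Only the +3 / +2 weights are stated in the commit message; the +1 default
# for any other comment kind is a placeholder assumption.
WEIGHTS = {"technical": 3, "constructive": 2}
DEFAULT_WEIGHT = 1


def raw_points(comment_kinds):
    """Sum weighted points across all of a reviewer's comments."""
    return sum(WEIGHTS.get(kind, DEFAULT_WEIGHT) for kind in comment_kinds)


def scaled_score(points, reference_points, scale=10.0):
    """Square-root scale points against a reference total (e.g. the instructor's)."""
    if reference_points <= 0:
        return 0.0
    return min(scale, scale * math.sqrt(points / reference_points))


def tier(score):
    """Tier bands as listed in the commit message."""
    if score >= 5.0:
        return "Exemplary"
    if score >= 3.5:
        return "Strong"
    if score >= 2.0:
        return "Solid"
    return "Developing"
```

Under this assumed formula, a score of 5.0 is reached at one quarter of the reference's raw points, so the square root compresses the gap between prolific and occasional reviewers rather than ranking them linearly.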

…A-D)

github-actions[bot] cannot push directly to protected main branch. Instead, create a short-lived branch and auto-merge via PR with --admin.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

https://claude.ai/code/session_01CQ4UYMQR6egdQLC62KdSYn

The assignment source files live under src/assignments/, not content/course/assignments/. https://claude.ai/code/session_01CQ4UYMQR6egdQLC62KdSYn

Student assignments are submitted under content/course/submissions/, not src/assignments/ which holds starter code and tests. https://claude.ai/code/session_01CQ4UYMQR6egdQLC62KdSYn

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Add resource-ledger.pdf for each capstone student (Hhy903, Soorej30, Tr0612, aritrach, jt7347, krusnim, lorinachey, yuni-wyx) and update their Report.mdx files with a resource_ledger frontmatter field and a download link section. Co-authored-by: Chris Heckman <chhe5305@colorado.edu>

Copilot

Pull request overview

This PR significantly expands the course infrastructure by adding a structured pytest-based grading/public test suite for Scratch-1, updating Scratch-1 synthetic action generation for improved learnability, and introducing new site features (textbook index styling, audit review-mode UI, KaTeX handling, and a “Top Reviewers” dashboard) along with CI/deploy workflow changes.

Changes:

Add public + internal pytest suites (markers, fixtures, documentation) for Scratch-1 grading and student self-checks.
Update Scratch-1 dataset generation to use a structured 256-bin action encoding (direction octant + magnitude bin).
Add site/UI features for audits/textbook and a reviewer leaderboard backed by generated GitHub stats.

Reviewed changes

Copilot reviewed 56 out of 70 changed files in this pull request and generated 13 comments.


File	Description
tests/README.md	New documentation for public/internal test suites and usage.
tests/public/test_scratch1_basic.py	Adds public-facing Scratch-1 smoke tests for data gen and provided modules.
tests/public/__init__.py	Declares public tests package.
tests/internal/test_scratch1_rigor.py	Adds internal rigorous grading tests (imports, attention, RoPE, training).
tests/internal/__init__.py	Declares internal tests package.
tests/conftest.py	Shared pytest fixtures + attempted solution injection/reset logic.
tests/__init__.py	Declares overall tests package and structure.
pytest.ini	Adds pytest discovery + marker definitions.
src/assignments/scratch-1/generate_data.py	Updates action encoding + trajectory generation logic for learnability.
src/assignments/scratch-1/backbone.py	Minor typing import formatting change (trailing whitespace).
scripts/README.md	Documents CI/CD scripts and expectations.
scripts/collect_reviewer_data.py	New script to collect GitHub PR comment stats and generate reviewer leaderboard JSON.
scripts/audit_linter.py	Adds stronger MDX frontmatter validation (required fields + non-placeholder checks).
data/reviewer-stats.json	Adds generated reviewer statistics used by the new dashboard.
package.json	Adds `recharts` dependency for charts.
app/course/top-reviewers/ReviewerCharts.tsx	Client-side Recharts visualizations for reviewer stats.
app/course/top-reviewers/page.tsx	Server page reading reviewer-stats.json and rendering leaderboard.
app/course/page.tsx	Adds link/section entry to the “Top Reviewers” page.
app/course/assignments/capstone/[handle]/page.tsx	Adds dynamic capstone report page rendering MDX per student handle.
content/course/submissions/human-in-the-loop/.gitkeep	Adds placeholder directory for human-in-the-loop submissions.
content/course/assignments/scratch-1.mdx	Updates Scratch-1 assignment expectations (loss convergence guidance, removes draft banner).
content/course/assignments/capstone/zlaouar/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/yuni-wyx/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/yi-shiuan-tung/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/Tr0612/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/Soorej30/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/lorinachey/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/krusnim/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/kalhamilton/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/jt7347/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/jdvakil/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/himanshugupta1009/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/Hhy903/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/gyanigk/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/carson-jay/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/aritrach/Report.mdx	Adds capstone running log page.
content/course/assignments/capstone/antony-zhao/Report.mdx	Adds capstone running log page.
app/textbook/page.tsx	Adds a styled textbook index/landing page.
components/textbook/Sidebar.tsx	Updates textbook sidebar styling and adds “Living Textbook” link.
components/textbook/TextbookLayout.tsx	Updates textbook layout right-side TOC styling.
components/audit/AuditLayout.tsx	Adds audit review-mode banner and restyles audit layout.
app/textbook/audits/[...slug]/page.tsx	Adds review-mode wiring, KaTeX handling, and restyles audit header/banner.
components/KatexStyles.tsx	Adds client-side KaTeX CSS injection via CDN.
app/layout.tsx	Reorders KaTeX CSS import to precede globals.css.
app/globals.css	Adds KaTeX and prose styling improvements (code blocks, headings, tables, etc.).
tailwind.config.ts	Extends typography CSS overrides (links, headings, code, KaTeX display, etc.).
README.md	Removes a large “Repository Structure / Workflow” section.
pyproject.toml	Adds Python project metadata, dependencies, and pytest config (duplicated with pytest.ini).
.gitignore	Replaces “private repo files” ignores with narrower ignores and claude output patterns.
.continueignore	Adds ignore patterns for Continue/agent tooling.
.github/workflows/vla-audit.yml	Updates audit CI guidance text; removes shadow test dispatch job.
.github/workflows/shadow-tester.yml	Adds repository_dispatch-driven workflow to run internal tests from a private repo.
.github/workflows/refresh-reviewer-data.yml	Adds scheduled workflow to refresh reviewer stats, open PR, auto-merge, and trigger deploy.
.github/workflows/deploy.yml	Adjusts deploy trigger rules and remote deployment details (host/path/npm install).
.github/workflows/deploy-staging.yml	Adjusts staging deploy host and paths-ignore rules.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported


Copilot · 2026-03-23T15:20:55Z

app/textbook/audits/[...slug]/page.tsx

+              [rehypeKatex, {
+                strict: false, // Don't fail on unknown LaTeX commands
+                trust: true, // Allow some advanced LaTeX features
+                throwOnError: false, // Gracefully handle errors
+              }]


rehype-katex is configured with trust: true, which enables KaTeX trusted commands (e.g., \href / HTML-like macros). Since audit content is effectively user-authored (students), this increases XSS/phishing surface. Consider keeping trust disabled, or providing a restrictive trust callback that only allows a minimal safe subset of commands.

Copilot · 2026-03-23T15:20:55Z

tests/internal/test_scratch1_rigor.py

+    seq_len = x.shape[1]
+    causal_mask = torch.tril(torch.ones(seq_len, seq_len, device=x.device))
+


The causal mask passed to student attention implementations is always a float 0/1 matrix. Some correct implementations expect a boolean mask (and will do ~mask), which would raise a TypeError and make this test harness unnecessarily brittle. Consider trying both dtypes (e.g., bool then float) or normalizing to the starter-code convention before calling forward().

Copilot · 2026-03-23T15:20:56Z

src/assignments/scratch-1/backbone.py

 import math
-from typing import Optional, Tuple
-
+from typing import Optional, Tuple    


There are trailing spaces on this import line. Please remove them to avoid formatting/lint noise and accidental diffs.

Copilot · 2026-03-23T15:20:56Z

README.md

 ## Resources

 ### Documentation


The PR title suggests a narrow change (“Human in the loop assignment submission”), but this diff includes large additions/changes across testing infrastructure, workflows, deployment, UI styling, and new course pages. Please update the PR title/description to reflect the actual scope, or split into smaller PRs to make review and rollback safer.

Copilot · 2026-03-23T15:20:56Z

components/KatexStyles.tsx

+'use client';
+
+import { useEffect } from 'react';
+
+export function KatexStyles() {
+  useEffect(() => {
+    // Ensure KaTeX CSS is loaded
+    const link = document.createElement('link');
+    link.rel = 'stylesheet';
+    link.href = 'https://cdn.jsdelivr.net/npm/katex@0.16.27/dist/katex.min.css';
+    link.integrity = 'sha384-yp+jpRNKIa0xGrYaVtwImDXkFq7ZOCV5kJZVDg/uAFfYPmtFcKr0sxhVJy1HqnWD';
+    link.crossOrigin = 'anonymous';
+
+    // Check if already loaded
+    const existing = document.querySelector('link[href*="katex"]');
+    if (!existing) {
+      document.head.appendChild(link);
+    }
+  }, []);
+


KaTeX CSS is already imported globally in app/layout.tsx. Loading it again from a CDN here adds an external runtime dependency and risks duplicate styles / version skew (the existing selector won’t match Next’s bundled CSS href). Prefer removing this component and relying on the bundled KaTeX CSS import.

Suggested change

-'use client';
-
-import { useEffect } from 'react';
-
-export function KatexStyles() {
-  useEffect(() => {
-    // Ensure KaTeX CSS is loaded
-    const link = document.createElement('link');
-    link.rel = 'stylesheet';
-    link.href = 'https://cdn.jsdelivr.net/npm/katex@0.16.27/dist/katex.min.css';
-    link.integrity = 'sha384-yp+jpRNKIa0xGrYaVtwImDXkFq7ZOCV5kJZVDg/uAFfYPmtFcKr0sxhVJy1HqnWD';
-    link.crossOrigin = 'anonymous';
-
-    // Check if already loaded
-    const existing = document.querySelector('link[href*="katex"]');
-    if (!existing) {
-      document.head.appendChild(link);
-    }
-  }, []);
+export function KatexStyles() {
+  // KaTeX CSS is imported globally in app/layout.tsx; no runtime work is needed here.

Copilot · 2026-03-23T15:20:57Z

tests/conftest.py

+    dev_utils_path = Path(__file__).parent.parent / "scripts" / "dev_utils.py"
+    if not dev_utils_path.exists():
+        # Skip injection in student/grading environment
+        print("\n=== Skipping solution injection (dev_utils.py not found) ===")
+        yield


This fixture tries to inject/reset solutions via scripts/dev_utils.py, but that file does not exist in the repository, so solution injection will always be skipped. Either add scripts/dev_utils.py (or switch to an existing script) or adjust the injection logic/docs to reflect the real workflow.

Copilot · 2026-03-23T15:20:58Z

pyproject.toml

+[tool.pytest.ini_options]
+markers = [
+    "internal: internal grading tests (never public)",
+    "rigor: rigorous grading tests",
+    "gradient: gradient flow tests",


Pytest configuration is duplicated between this [tool.pytest.ini_options] block and the new pytest.ini. Pytest will only use one config source, so the duplicate can drift and confuse contributors. Prefer consolidating into a single source of truth (either keep pytest.ini or keep the pyproject section).

Copilot · 2026-03-23T15:20:58Z

tests/public/test_scratch1_basic.py

+        # Expected: (dim * hidden_dim) + hidden_dim + (hidden_dim * dim) + dim
+        expected = (dim * hidden_dim) + hidden_dim + (hidden_dim * dim) + dim
+
+        assert num_params == expected, f"Expected {expected} parameters, got {num_params}"


This parameter-count expectation assumes both Linear layers have biases (it adds hidden_dim + dim), but the provided FeedForward implementation in src/assignments/scratch-1/backbone.py constructs both nn.Linear(..., bias=False). As written, this test will fail on the starter code. Update the expected count to match the actual layer definitions (or compute it dynamically based on whether biases are enabled).
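The arithmetic behind this comment can be made explicit with a small helper. This is a sketch that assumes the two-layer shape implied by the test's formula (`dim -> hidden_dim -> dim`); `feedforward_param_count` is a hypothetical name, and a FeedForward with a different layout (for example, a gated variant with a third projection) would need a different formula.

```python
def feedforward_param_count(dim, hidden_dim, bias):
    """Parameter count for two linear layers: dim -> hidden_dim -> dim."""
    # Weight matrices of the up-projection and down-projection.
    weights = dim * hidden_dim + hidden_dim * dim
    # Bias vectors exist only when the layers are built with bias=True.
    biases = (hidden_dim + dim) if bias else 0
    return weights + biases


# With dim=4 and hidden_dim=16:
with_bias = feedforward_param_count(4, 16, bias=True)      # 64 + 16 + 64 + 4 = 148
without_bias = feedforward_param_count(4, 16, bias=False)  # 64 + 64 = 128
```

The two results differ by exactly `hidden_dim + dim`, which is the size of the mismatch the test would report against layers built with `bias=False`.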

Copilot · 2026-03-23T15:20:58Z

scripts/README.md

+### Production Scripts
+
+- **`manage_solutions.py`** - Inject/reset assignment solutions (used in testing)
+- **`sanitize.sh`** - Main sanitization pipeline for public sync
+- **`_sanitize_todos.py`** - Remove solution hints from code
+- **`audit_linter.py`** - Validate paper audit MDX files


This README references scripts/manage_solutions.py, but that script isn’t present in the repository (scripts/ currently contains audit_linter.py, collect_reviewer_data.py, deploy.sh, grade_scratch1.py, review-prs.sh). Either add manage_solutions.py or update this documentation and any callers (e.g., tests/conftest.py) to point to the actual solution-management script.
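A mismatch like this can be caught mechanically by comparing the script names a README mentions against the files actually present. The following is a minimal sketch: `missing_scripts` is a hypothetical helper, and it assumes script names appear in backticks and end in `.py` or `.sh`, as they do in the excerpt above.

```python
import re
from pathlib import Path

# A backticked file name ending in .py or .sh, e.g. `audit_linter.py`.
_SCRIPT_NAME = re.compile(r"`([\w.-]+\.(?:py|sh))`")


def missing_scripts(readme_text, scripts_dir):
    """Return script names mentioned in the README but absent from scripts_dir."""
    scripts_dir = Path(scripts_dir)
    mentioned = _SCRIPT_NAME.findall(readme_text)
    return [name for name in mentioned if not (scripts_dir / name).is_file()]
```

Run against the README text and the `scripts/` directory, a non-empty result lists the references that need either a new file or a documentation fix.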

Copilot · 2026-03-23T15:20:59Z

.github/workflows/shadow-tester.yml

@@ -0,0 +1,139 @@
+name: Shadow Tester
+
+on:


This workflow only runs on repository_dispatch (type run-shadow-tests), but this repo no longer contains any workflow/job that emits that dispatch event (the prior trigger in vla-audit.yml was removed). If shadow testing is meant to run automatically, consider re-adding a safe dispatch trigger or adding workflow_dispatch / pull_request triggers; otherwise this file will never run in normal CI.

Suggested change

-on:
+on:
+  workflow_dispatch:

crheckman and others added 30 commits January 23, 2026 15:08

Add setup completion guide

4ac4bd5

Merge staging into main: security hardening and Shadow CI

eb1227a

Remove draft watermark, add import

58e6e67

Remove draft watermark, add import

d9b2c8f

fix(sync): use safe merge strategy to preserve student branches

4d74523

The orphan push strategy was destroying all student work on the public repo. Now uses merge strategy that: - Fetches current public repo state - Merges sanitized content into main - Preserves all student branches and work

fix(workflow): use heredoc for multi-line commit message

8bb00b6

YAML syntax error on line 224 caused by unquoted multi-line string. Using heredoc pattern for proper bash multi-line string handling.

fix(workflow): allow variable interpolation in heredoc

0e76674

Changed 'EOF' to EOF (without quotes) to allow ${TAG_NAME} and ${RELEASE_DATE} to be interpolated in the commit message.

fix(workflow): replace heredoc with multiline string variable

8c21ad4

Heredoc inside YAML run block caused parsing errors. Using simple multiline string assignment instead.

fix(workflow): use multiple -m flags for multiline commit message

fffa5a8

Previous approach with multiline string caused YAML parsing errors. Using git commit's multiple -m flag feature instead - each -m creates a new paragraph in the commit message.

fix(workflow): add token validation and use git credential helper

7ffab82

Changes: - Verify PUBLIC_REPO_TOKEN is set before proceeding - Use git credential helper instead of embedding token in URL - More secure credential handling - Better error messages if token is missing

fix(workflow): revert to direct token authentication in remote URL

a2f7d94

The credential helper approach wasn't working in GitHub Actions. Reverting to the simpler, proven approach of embedding the token directly in the remote URL.

fix(workflow): use PUBLIC_REPO_TOKEN_2 secret

6d0742b

Updated to use PUBLIC_REPO_TOKEN_2 as the original token was deleted.

fix(workflow): add token debugging and remote access test

c581d2f

Added: - Show token prefix to verify it's set - Test remote access with ls-remote before pushing - Better error messages for auth failures

Release: release-scratch-1 (2026-01-28 17:22:36 UTC)

6aaecc4

Sanitized content from private repository. This release includes updated assignment materials. Changes: - Updated assignment templates - Fixed bugs and improvements - Documentation updates

crheckman and others added 25 commits February 16, 2026 19:57

fix: train for 100 steps, pass threshold is 20% loss drop

5475ec6

fix: train for 1000 steps in convergence test

cba1cd1

fix: auto-commit grading reports to student branches, handle stale local branches

b2960ac

fix: add pandas to dependencies for student code compatibility

7dc1004

fix: use --allow-empty for bot commits to handle re-grading

b2fd1ad

feat: include instructor (crheckman) on top reviewers page

29571e0

Show instructor with distinct rose-colored "Instructor" badge and separate card above the student podium. Student charts and rankings remain instructor-free. Total comments now 217 (was 114 without instructor).

feat: add quality score methodology and crh-bot disclaimer

407a629

Explains scoring rules (+3 technical, +2 constructive, etc.) and tier bands. Disclaimer with crh-bot avatar notes scores are auto-generated heuristics and do not reflect the instructor's explicit views.

feat: add capstone per-project report scaffolds (16 projects, Groups A-D)

142a683

feat: add Spring 2026 project roster to capstone.mdx

7c562ea

chore: update uv.lock resolution markers

45409e4

feat: prettify capstone project table with linked report pages

e8532c1

fix: escape bare less-than in kalhamilton Report.mdx (MDX parse error)

38fc74b

fix: use PR-based merge for weekly reviewer data refresh

802f8c9

github-actions[bot] cannot push directly to protected main branch. Instead, create a short-lived branch and auto-merge via PR with --admin.

fix: use BYPASS_TOKEN for auto-merge if available

ca2765d

chore: refresh reviewer data (2026-03-05) (#64)

203c88a

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

chore: refresh reviewer data (2026-03-09) (#65)

05b2a5a

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Add human-in-the-loop assignment directory

6ca0335

https://claude.ai/code/session_01CQ4UYMQR6egdQLC62KdSYn

Move human-in-the-loop directory to src/assignments

f61b46d

The assignment source files live under src/assignments/, not content/course/assignments/. https://claude.ai/code/session_01CQ4UYMQR6egdQLC62KdSYn

Move human-in-the-loop to content/course/submissions

0271fb0

Student assignments are submitted under content/course/submissions/, not src/assignments/ which holds starter code and tests. https://claude.ai/code/session_01CQ4UYMQR6egdQLC62KdSYn

Merge pull request #66 from arpg/claude/create-hitl-directory-w2s6n

67df312

chore: refresh reviewer data (2026-03-16) (#75)

dc41138

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 23, 2026 15:12

Copilot started reviewing on behalf of Jdvakil March 23, 2026 15:12 View session

Copilot AI reviewed Mar 23, 2026

Add HITL assignment

bda9759

Jdvakil changed the title ~~Human in the loop assignment submission~~ Human in the loop assignment submission: jdvakil Mar 24, 2026

Human in the loop assignment submission: jdvakil #82
Jdvakil wants to merge 96 commits into staging from jdvakil/hitl

7 participants

		seq_len = x.shape[1]
		causal_mask = torch.tril(torch.ones(seq_len, seq_len, device=x.device))
