123 changes: 123 additions & 0 deletions content/course/submissions/scratch-1/Zaaler.mdx
---
title: "Scratch-1 Submission: Zaaler"
student: "Zack Allen"
date: "2026-01-21"
---

# Scratch-1: The Transformer Backbone

## Loss Curve

<img
src="./images/loss_curves.png"
alt="Train and Validation Loss Curves with Best Model Shown"
width="500"
/>

The model converged after 5,358 iterations (19 epochs) with a final loss of 1.9523. Convergence was declared once the best model performed better on the validation set than each of the 5 most recent models. The best model is marked by the green star.
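The stopping rule above (halt once the best checkpoint beats the five most recent ones on validation) can be sketched as a small patience-style check. This is an illustrative sketch only; the function and variable names (`should_stop`, `val_losses`, `patience`) are assumptions, not taken from the actual training code.

```python
def should_stop(val_losses, patience=5):
    """Return True once the best validation loss is older than the
    `patience` most recent evaluations, i.e. none of the last
    `patience` checkpoints improved on the best one."""
    if len(val_losses) <= patience:
        return False
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    # Stop if the best checkpoint predates the `patience` newest ones.
    return best_idx < len(val_losses) - patience
```

With this rule, a run whose loss keeps improving never stops, while one whose best value sits more than five evaluations in the past does.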

## Attention Visualization

The following trajectory is the one that corresponds to the produced attention maps.

<img
src="./images/trajectory_0_end_effector.png"
alt="Example Trajectory"
width="500"
/>

I thought it would be important to understand the trajectory and what makes it unique before looking at the attention heads and layers.

Here are the attention maps for all 8 heads at layer 0 of the model.

<img
src="./images/trajectory_0_all_heads_layer_0.png"
alt="All Attention Heads within Layer 0"
width="500"
/>

This figure shows the different aspects that each attention head was keying in on. The average of all these attention heads is shown below.

<img
src="./images/trajectory_0_attention_layer_0.png"
alt="All Attention Heads within Layer 0"
width="500"
/>

This shows the average of all the individual heads. Overall, this first layer focuses primarily on information in the 5 to 10 timestamps following the current state (joints, position) in the trajectory.
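How a layer-level map like the one above can be reduced from the per-head maps is worth making concrete. A minimal sketch, with made-up shapes and random stand-in weights rather than the real model's tensors: taking the mean over heads keeps each row a valid attention distribution (rows still sum to 1), which a raw summation would not.

```python
import numpy as np

# Stand-in for one layer's attention weights, shape (n_heads, seq_len, seq_len).
# Real weights would come from the model; these are random placeholders.
rng = np.random.default_rng(0)
attn = rng.random((8, 50, 50))
attn /= attn.sum(axis=-1, keepdims=True)   # each row sums to 1, like softmax output

# Layer-level map: mean over heads. Averaging convex combinations
# preserves the row-stochastic property of attention weights.
layer_map = attn.mean(axis=0)              # shape (seq_len, seq_len)
```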

<img
src="./images/trajectory_0_attention_layer_1.png"
alt="All Attention Heads within Layer 1"
width="500"
/>

This shows the average of all the individual heads at layer 1, the second layer of the model. Overall, this layer shows that as we move through the trajectory, attention spreads more gradually over longer spans of the previous information.

<img
src="./images/trajectory_0_attention_layer_2.png"
alt="All Attention Heads within Layer 2"
width="500"
/>

This shows the average of all the individual heads at layer 2, the third layer of the model. Similarly, attention spreads gradually over longer spans of the previous information, with slightly more weight on the 5 most recent timestamps.

<img
src="./images/trajectory_0_attention_layer_3.png"
alt="All Attention Heads within Layer 3"
width="500"
/>

This shows the average of all the individual heads at layer 3, the fourth layer of the model. This layer centers its attention on the timestamps following the current timestamp, trying to recover information from the time directly after the current state.

## The Audit: Removing the Causal Mask

When I removed the causal mask, the validation loss dropped much lower, all the way down to 0.0573.

<img
src="./images/causal_mask_removed_loss_curves.png"
alt="Train and Validation Loss Curves with Best Model Shown"
width="500"
/>

The attention maps show that the layers are finding correlations between the current state and future states that haven't occurred yet. The model is gathering information from future events that would not be available at inference time.

<img
src="./images/mask_removed_trajectory_0_all_heads_layer_0.png"
alt="All Attention Heads within Layer 0"
width="500"
/>

### Why the Model "Cheats"

The model cheats because it can now see all the future states of the system, so it can drastically reduce its training loss by exploiting the patterns in the trajectories. Essentially, it easily predicts the best next step because it already knows where the trajectory is headed in the following timestamps. It never learned the actual task; it just learned how to copy from the future.
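The mechanism behind the cheat is visible in a minimal sketch of scaled dot-product attention. This is not the course's `backbone.py` implementation, just an illustrative numpy version with made-up names: with the causal mask, the upper triangle of the weight matrix is exactly zero (no position attends forward); with `causal=False`, those entries become nonzero and future states leak in.

```python
import numpy as np

def attention(q, k, v, causal=True):
    """Single-head scaled dot-product attention over a length-t sequence.
    With causal=True, position i can only attend to positions j <= i."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    if causal:
        # Set scores for j > i to -inf so softmax assigns them zero weight.
        scores = np.where(np.tril(np.ones((t, t), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
_, w_causal = attention(q, k, v, causal=True)
_, w_free = attention(q, k, v, causal=False)
# Masked: strict upper triangle is zero. Unmasked: future positions get weight.
```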

## Code Highlights

There are no special implementation highlights beyond some additional debugging utilities I added to better visualize the data. They can be enabled by setting `debug_info` to `True` at the top of `backbone.py`.

## Challenges and Solutions

### Failed KV Caching Implementation and Inference Speed Comparison

The attempt is logged in the git history and commits but removed from the final PR.

The attempt is likely incorrect, but I have tried to explain the produced plots. I was able to learn about KV caching and see in principle how it would have been valuable for reducing inference time.

<img
src="./images/kv_cache_benchmark.png"
alt="Comparison of Nominal versus KV Cache Inference Speed"
width="1000"
/>

The figure above shows the difference in inference times with and without KV caching. Avoiding the recomputation of every previous timestep's K and V values, and instead computing only the current timestep's K and V, is incredibly advantageous. The leftmost plot shows inference time versus generation length with a fixed prompt of 5 tokens; it shows basically no speed increase for trajectory generation lengths up to 50 points. The middle plot shows inference time versus prompt length; since our trajectories were only 50 steps long, I constrained prompt lengths accordingly. The time to predict the remaining trajectory decreases as prompt length increases, which makes sense (more provided, less to predict). The final plot shows the combined computational gains as a function of both prompt length and generation length.
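The asymptotic advantage can be made concrete with a small counting sketch. This is an illustrative model, not the benchmark code: it counts how many positions need a fresh K/V projection during autoregressive generation. Without a cache the cost is quadratic in sequence length (every step reprojects the whole prefix); with a cache it is linear (each position is projected exactly once).

```python
def count_kv_projections(prompt_len, gen_len, use_cache):
    """Number of positions receiving a fresh K/V projection while
    generating gen_len tokens after a prompt of prompt_len tokens."""
    if use_cache:
        # Prompt projected once during prefill, then one new position per step.
        return prompt_len + gen_len
    # Step t reprojects K/V for the entire prefix of prompt_len + t positions.
    return sum(prompt_len + t for t in range(1, gen_len + 1))

# Roughly matching the benchmark setup: 5-token prompt, 50 generated steps.
print(count_kv_projections(5, 50, use_cache=False))  # 1525
print(count_kv_projections(5, 50, use_cache=True))   # 55
```

This counts projection work only; real wall-clock curves also include attention and MLP costs, which is consistent with the small absolute gains seen at short generation lengths in the leftmost plot.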

## References

- [RMSNorm: Root Mean Square Layer Normalization](https://arxiv.org/abs/1910.07467) - Zhang & Sennrich, 2019
- [RMSNorm Implementation](https://github.com/bzhangGo/rmsnorm) - Reference implementation
- [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) - Su et al., 2021
- [Rotary Embeddings: A Relative Revolution](https://blog.eleuther.ai/rotary-embeddings/) - EleutherAI
- [Let's build GPT: from scratch, in code, spelled out](https://www.youtube.com/watch?v=kCc8FmEb1nY) - Andrej Karpathy