Autoencoder Lab (AE + GM-VAE)

Autoencoder Lab is a local-first, artifact-driven web app for training and exploring:

Supervised Autoencoder (ae)
Gaussian Mixture VAE (gmvae)

The backend runs training, computes projections and neighbors, and writes run artifacts to disk. The frontend renders those artifacts (plus live SSE updates) without recomputing ML outputs.

Current functionality

Run creation from UI with configurable AE/GM-VAE hyperparameters
Live run monitoring via SSE (run.*, train.*, artifact.created, projection.progress)
Epoch playback for metrics and reconstructions
Latent Explorer route: /runs/[id]/latent
Projection methods:
- PCA 2D (default per epoch)
- PCA 3D (per epoch + latest; generated during training for new runs, backfillable for older runs)
- UMAP 2D (on-demand enable/backfill + future epochs)
- UMAP 3D (on-demand/backfill + future epochs, same UMAP settings with n_components=3)
2D and 3D latent rendering with epoch/method/dimension controls
Category legend toggles (show/hide labels) in 2D and 3D
Selectable points with dataset sample preview (image + dataset index + label)
Per-run persisted point-size controls in 2D and 3D for:
- sample points
- centroids
- GM-VAE cluster centers
Latent Explorer UMAP controls (same behavior as run page):
- UMAP quality selector (Standard 1024, High 2048)
- compute UMAP directly from latent explorer when UMAP artifacts are missing
Sampling panel (GM-VAE, artifact-driven):
- posterior-local sampling for selected point (q(z|x_i) decode grids)
- component-conditional prior sampling (p(z|c) decode grids)
- deterministic seed by default, optional randomized seed
- cached artifact reuse and sample debug readout (path, seed, compute_ms)
Neighbors / Extent panel (PAIR-style):
- Focus modes: Selected, Random 25, Random 100, All
- k slider (2..50)
- extent slider (0..100%)
- edge modes: Gated by extent (PAIR) and Always show top-k
- optional radii and edge toggles
- debug panel with scale/radius/alpha/monotonicity counters
2D overlay modes in Latent Explorer:
- Off
- Density (GM-VAE only): Posterior/Prior source, opacity control, density debug stats
- Distortion: metric-ratio heatmap (mean d_proj / mean d_latent over kNN), opacity control, distortion debug stats
- Supplemental legends in 2D:
  - centroid and cluster-center marker legend (shown only when those layers are enabled)
  - density/distortion gradient legend (shown only for the active overlay mode)
- UMAP density guidance note shown in UI to explain nonlinear embedding distortion
Mixture Health panel (GM-VAE):
- reads empirical prior stats from artifacts
- shows Neff, Neff/K, pmax, entropy, and collapse-status badge
- student interpretation line (Healthy / Moderate collapse / Severe collapse)
On-demand backfill for legacy runs:
- UMAP projections
- PCA3 projections
- density artifacts (pca or umap)
- distortion artifacts (pca or umap)
- neighbors artifacts (pca or umap)
Artifact browser for each run
Run cancellation and resumable/reloadable run history
Insights artifacts and UI panel on run detail
GM-VAE anti-collapse training options in run config:
- categorical KL annealing (categorical_kl_anneal*)
- entropy bonus schedule (entropy_bonus_*)
- emitted metrics include cat_kl_weight, entropy_bonus_weight, qc_entropy_mean, qc_neff, qc_pmax, qc_argmax_counts
Network Architecture / Feature Map Explorer (artifact-driven):
- architecture diagram and node details from network/architecture*.json
- single-image forward-pass artifacts by epoch/index/layer (network/forward/...)
- channel inspector artifacts (network/inspect/...) with:
  - channel map, heat overlay, fixed overlay fallback
  - optional (x,y) probe and approximate receptive-field mapping back to input (rf)
  - RF spotlight (darkened outside region), hover zoom crop, and probe marker on channel map
  - single-convolution-step explainer for conv1-like layers (C_in=1, 3x3) with patch/kernel/product math and model consistency check
- latent perturb + decode artifacts (network/perturb/...) for decodable vectors:
  - AE: z
  - GM-VAE: mu, z_sample
- GM-VAE forward reconstruction sample controls:
  - lock sample / resample (sample_id)
  - forward sampling metadata artifact (forward_meta.json)
- kernel artifacts for selected conv channels (network/kernels/...):
  - conv1 direct 3x3 kernel display (C_in=1)
  - deeper conv aggregated kernel (L2 over C_in) + top input-channel kernel slices
- top activating examples for encoder spatial layer+channel (network/topk/...), with clickable thumbnails that set panel dataset index
  - score metric is backend-selected (max by default in UI; mean supported by API)
- compare mode:
  - synchronized layer/channel/probe/opacity controls across both panels
  - shared channel-level panels (kernel summary + top activating examples) shown once in full width
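
As a rough illustration of the Mixture Health quantities listed above (Neff, Neff/K, pmax, entropy), the sketch below summarizes a vector of component usage probabilities. The function name `mixture_health` is hypothetical, and Neff is assumed here to be the exponential of the Shannon entropy; the backend may define it differently (for example as 1 / sum(p^2)), so treat this as a reading aid rather than the app's implementation.

```python
import math


def mixture_health(probs):
    """Summarize component usage for a K-component mixture.

    Assumption: Neff = exp(entropy). The app's own definition may differ.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize so that raw counts are accepted as well as probabilities.
    p = [x / total for x in probs]
    # Shannon entropy in nats; zero-probability components contribute nothing.
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    neff = math.exp(entropy)
    return {
        "entropy": entropy,
        "neff": neff,
        "neff_over_k": neff / len(p),
        "pmax": max(p),
    }
```

Under this definition, a uniform mixture gives Neff equal to K, and a fully collapsed mixture (all mass on one component) gives Neff equal to 1.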

Architecture

Backend: FastAPI (backend.app.main:app) + PyTorch
Frontend: Next.js App Router + TypeScript
Transport: SSE for live events; filesystem artifacts for deterministic reloads
Storage root: runs/<run_id>/...
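
To make the SSE transport concrete, here is a minimal, simplified parser for the text/event-stream wire format using only the standard library. `parse_sse` is a hypothetical helper, not part of this repo, and the payload in the usage example is invented; only the event name `artifact.created` comes from this README.

```python
def parse_sse(lines):
    """Yield (event_id, event_name, data) tuples from an iterable of SSE text lines.

    Simplified: handles the id, event, and data fields plus comment lines.
    """
    event_id, name, data = None, "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event (if it carries data).
            if data:
                yield event_id, name, "\n".join(data)
            name, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keepalive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional space after the colon
        if field == "event":
            name = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            event_id = value  # persists until the stream sets a new id


if __name__ == "__main__":
    sample = ["id: 12\n", "event: artifact.created\n", 'data: {"path": "example"}\n', "\n"]
    for event_id, name, data in parse_sse(sample):
        print(event_id, name, data)
```

The last `event_id` seen is the value a client would send back in the `Last-Event-ID` header when reconnecting.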

Canonical contracts live in:

AGENTS.md (operating rules)
SPEC.md (architecture/artifact semantics)
http://localhost:8000/openapi.json (authoritative API schema)

Repo layout

backend/ FastAPI routes, run manager, training runner, projections/backfills, artifacts, tests
frontend/ Next.js UI (/, /runs, /runs/[id], /runs/[id]/latent, /runs/[id]/network)
models/ PyTorch model definitions
train_ae.py, train_vae.py legacy CLI entry points
app.py legacy Streamlit app (kept for incremental migration)
runs/ generated artifacts (gitignored)

Run artifact structure

Each run is persisted under:

runs/<run_id>/
  config.json
  events.jsonl
  summary.json
  checkpoints/
  samples/
  projections/
  neighbors/
  density/
  distortion/
  insights/
  network/
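
Because runs are plain files on disk, they can be inspected without the app. The sketch below reads a run's event log, assuming `events.jsonl` holds one JSON object per line (as the extension suggests). `read_events` is a hypothetical helper, and the fields inside each event are not specified here; see SPEC.md for the actual semantics.

```python
import json
from pathlib import Path


def read_events(run_dir):
    """Return the parsed JSON objects from <run_dir>/events.jsonl, skipping blank lines."""
    path = Path(run_dir) / "events.jsonl"
    events = []
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events
```

For example, `read_events("runs/<run_id>")` would return the list of logged events for that run in file order.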

Projection artifact naming:

PCA 2D:
- projections/latent_2d_epoch_<N>.json
- projections/latent_2d_latest.json
PCA 3D:
- projections/latent_3d_epoch_<N>_pca.json
- projections/latent_3d_latest_pca.json
UMAP 2D:
- projections/latent_2d_epoch_<N>_umap.json
- projections/latent_2d_latest_umap.json
UMAP 3D:
- projections/latent_3d_epoch_<N>_umap.json
- projections/latent_3d_latest_umap.json
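
The naming scheme above can be captured in a small helper. This is a sketch only: `projection_artifact` is a hypothetical name, and it assumes `<N>` is written as a plain, unpadded integer.

```python
def projection_artifact(method, dims, epoch=None):
    """Build a projection artifact path relative to runs/<run_id>/.

    epoch=None selects the "latest" artifact.
    """
    if method not in ("pca", "umap"):
        raise ValueError(f"unknown method: {method!r}")
    if dims not in (2, 3):
        raise ValueError(f"dims must be 2 or 3, got {dims!r}")
    stem = "latest" if epoch is None else f"epoch_{epoch}"
    # PCA 2D is the only variant without a method suffix.
    suffix = "" if (method == "pca" and dims == 2) else f"_{method}"
    return f"projections/latent_{dims}d_{stem}{suffix}.json"
```

Note the one irregularity the helper encodes: PCA 2D files carry no `_pca` suffix, while PCA 3D and both UMAP variants do.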

Neighbors artifact naming (k_max=50, latent-space Euclidean kNN):

PCA:
- neighbors/knn_pca_epoch_<N>.json
- neighbors/knn_pca_latest.json
UMAP:
- neighbors/knn_umap_epoch_<N>.json
- neighbors/knn_umap_latest.json
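
For readers unfamiliar with the term, latent-space Euclidean kNN with a cap of 50 neighbors can be sketched as a brute-force search. `knn_indices` is a hypothetical name, and the backend presumably uses a faster method; the JSON layout of the neighbors artifacts is not shown here.

```python
import math

K_MAX = 50  # matches the k_max stated above


def knn_indices(latents, k):
    """For each latent vector, return the indices of its k nearest neighbors.

    Distances are Euclidean, neighbors are ordered nearest first, and a point
    is never its own neighbor. Brute force: fine for small examples only.
    """
    k = max(0, min(k, K_MAX, len(latents) - 1))
    result = []
    for i, zi in enumerate(latents):
        order = sorted(
            (j for j in range(len(latents)) if j != i),
            key=lambda j: math.dist(zi, latents[j]),
        )
        result.append(order[:k])
    return result
```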

Density artifact naming (2D only):

PCA:
- density/posterior_2d_epoch_<N>_pca.json
- density/posterior_2d_latest_pca.json
- density/prior_2d_epoch_<N>_pca.json
- density/prior_2d_latest_pca.json
- density/prior_empirical_2d_epoch_<N>_pca.json
- density/prior_empirical_2d_latest_pca.json
UMAP:
- density/posterior_2d_epoch_<N>_umap.json
- density/posterior_2d_latest_umap.json
- density/prior_2d_epoch_<N>_umap.json
- density/prior_2d_latest_umap.json
- density/prior_empirical_2d_epoch_<N>_umap.json
- density/prior_empirical_2d_latest_umap.json

Sampling artifact naming (GM-VAE, on demand):

Posterior-local:
- samples/posterior_local_epoch_<N>_idx__m_<M>.png
- samples/posterior_local_latest_idx__m_<M>.png
- samples/posterior_local_epoch_<N>_idx__m_<M>.json
Prior-component:
- samples/prior_component_epoch_<N>_c_<c>_m_<M>.png
- samples/prior_component_latest_c_<c>_m_<M>.png
- samples/prior_component_epoch_<N>_c_<c>_m_<M>.json

Metric distortion artifact naming (2D only):

PCA:
- distortion/metric_ratio_2d_epoch_<N>_pca.json
- distortion/metric_ratio_2d_latest_pca.json
UMAP:
- distortion/metric_ratio_2d_epoch_<N>_umap.json
- distortion/metric_ratio_2d_latest_umap.json
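
The metric ratio behind these artifacts is described earlier as mean d_proj / mean d_latent over kNN. A per-point version of that ratio might look like the sketch below. `metric_ratio` is a hypothetical name, and any normalization or heatmap rasterization the backend applies afterwards is not covered.

```python
import math


def metric_ratio(latents, projected, neighbors):
    """Per-point ratio of mean projected distance to mean latent distance.

    neighbors[i] lists the indices of point i's nearest neighbors.
    Points with no neighbors or zero latent distance yield NaN.
    """
    ratios = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            ratios.append(float("nan"))
            continue
        d_proj = sum(math.dist(projected[i], projected[j]) for j in nbrs) / len(nbrs)
        d_lat = sum(math.dist(latents[i], latents[j]) for j in nbrs) / len(nbrs)
        ratios.append(d_proj / d_lat if d_lat > 0 else float("nan"))
    return ratios
```

A ratio that is roughly constant across points suggests the projection scales neighborhoods uniformly; large local variation marks regions where the 2D view stretches or compresses latent distances.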

Network explorer artifact naming (on demand):

Forward pass:
- network/forward/epoch_<N>/idx_/input.png
- network/forward/epoch_<N>/idx_/recon.png
- network/forward/epoch_<N>/idx_/layers.json
- network/forward/epoch_<N>/idx_/latent.json
- network/forward/epoch_<N>/idx_/forward_meta.json
Layer summary/image:
- network/forward/epoch_<N>/idx_/layer_<layer_id>/summary.json
- network/forward/epoch_<N>/idx_/layer_<layer_id>/grid.png (spatial)
- network/forward/epoch_<N>/idx_/layer_<layer_id>/vector.png (vector)
Channel inspector:
- network/inspect/epoch_<N>/idx_/layer_<layer_id>/ch_<c>/inspect.json
- network/inspect/epoch_<N>/idx_/layer_<layer_id>/ch_<c>/channel.png
- network/inspect/epoch_<N>/idx_/layer_<layer_id>/ch_<c>/heat_rgba.png
- network/inspect/epoch_<N>/idx_/layer_<layer_id>/ch_<c>/overlay.png
Perturb + decode:
- network/perturb/epoch_<N>/idx_/vector_<key>/dim_<d>/delta_<tag>/recon.png
- network/perturb/epoch_<N>/idx_/vector_<key>/dim_<d>/delta_<tag>/meta.json
Top activating examples:
- network/topk/epoch_<N>/layer_<layer_id>/ch_<c>/metric_<metric>/subset_<n>/k_<k>/topk.json
- network/topk/epoch_<N>/layer_<layer_id>/ch_<c>/metric_<metric>/subset_<n>/k_<k>/input_<dataset_idx>.png
Kernel artifacts:
- network/kernels/epoch_<N>/layer_<layer_id>/out_<c>/kernel.json (conv1)
- network/kernels/epoch_<N>/layer_<layer_id>/out_<c>/kernel.png (conv1 optional)
- network/kernels/epoch_<N>/layer_<layer_id>/out_<c>/kernel_agg.json (C_in>1)
- network/kernels/epoch_<N>/layer_<layer_id>/out_<c>/kernel_agg.png (optional)
- network/kernels/epoch_<N>/layer_<layer_id>/out_<c>/top_in_channels.json (C_in>1)
- network/kernels/epoch_<N>/layer_<layer_id>/out_<c>/kernel_in_<in_channel>.png (optional)
Single convolution step:
- network/conv_step/epoch_<N>/idx_/layer_<layer_id>/out_<c>/x_<x>_y_<y>/conv_step.json

Local development

Prerequisites

Python 3.13
uv
Node.js + npm

1) Start backend

From repo root:

uv sync
uv run uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

Backend environment (local dev):

Put local backend overrides in backend/.env (auto-loaded on backend startup via python-dotenv).
Keep secrets out of git; use backend/.env.example as the committed template.
Tutor-related keys commonly used:
- TUTOR_LLM_ENABLED=false
- TUTOR_PROVIDER=openai
- TUTOR_MODEL=gpt-4o-mini
- TUTOR_TEMPERATURE=0.2
- TUTOR_MAX_TOKENS=500
- OPENAI_API_KEY=...
- OPENAI_BASE_URL=https://api.openai.com/v1
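
As an illustration of how these keys might be read, the sketch below parses them from an environment mapping. `load_tutor_settings` is a hypothetical helper, and it assumes the values listed above act as defaults; the backend's actual parsing and defaults may differ.

```python
import os


def load_tutor_settings(env=None):
    """Read tutor-related settings from an environment mapping.

    Assumption: the example values from the README are used as fallbacks.
    """
    env = os.environ if env is None else env
    enabled_raw = env.get("TUTOR_LLM_ENABLED", "false").strip().lower()
    return {
        "enabled": enabled_raw in ("1", "true", "yes"),
        "provider": env.get("TUTOR_PROVIDER", "openai"),
        "model": env.get("TUTOR_MODEL", "gpt-4o-mini"),
        "temperature": float(env.get("TUTOR_TEMPERATURE", "0.2")),
        "max_tokens": int(env.get("TUTOR_MAX_TOKENS", "500")),
        "base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        # Report only presence of the key so the secret is never echoed.
        "has_api_key": bool(env.get("OPENAI_API_KEY")),
    }
```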

2) Start frontend

From frontend/:

npm install
npm run dev

Optional backend override:

NEXT_PUBLIC_BACKEND_URL=http://localhost:8000 npm run dev

For frontend-only local overrides, Next.js reads frontend/.env.local.

3) Open the app

Frontend: http://localhost:3000
OpenAPI docs: http://localhost:8000/docs
OpenAPI JSON: http://localhost:8000/openapi.json
Health: http://localhost:8000/api/health

Typical workflow

Create a run from /.
Monitor live training on /runs/<run_id>.
Open /runs/<run_id>/latent for method/dimension/epoch exploration.
If needed for older runs:
- click Compute PCA 3D for this run
- enable/recompute UMAP on run detail or latent explorer
- in Neighbors panel click Compute <METHOD> neighbors for this run
- in Projection controls set overlay to Density or Distortion; use compute buttons if artifacts are missing
Use point selection + sample preview, Sampling panel, Neighbors/Extent controls, Mixture Health, and overlay modes for exploration.

Network Explorer workflow

Open /runs/<run_id>/network.
Choose epoch, layer, and dataset index in the Forward Pass / Feature Map Explorer.
For GM-VAE, use Lock sample + Resample to control sampled-z reconstruction deterministically (sample_id).
In Channel Inspector (spatial layers):
- click activation tiles or channel map to inspect channel responses
- click activation-map coordinates to probe (x,y) and view receptive-field overlay on input
- for conv1-like layers, inspect Single convolution step to connect a probed activation to patch/kernel/product math
- use Top activating examples to compute/show top-K dataset items for a selected encoder channel
In latent/vector views, use Perturb + Decode to compare baseline (delta=0) vs perturbed reconstructions.
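
The patch/kernel/product math behind the Single convolution step view can be reproduced by hand. The sketch below assumes one input channel and a 3x3 kernel, and computes the value before any activation function; `conv_step` is a hypothetical name and not the backend's code.

```python
def conv_step(patch, kernel, bias=0.0):
    """Return (products, value) for one 3x3, single-input-channel convolution step.

    Like PyTorch's Conv2d, this is cross-correlation: the kernel is not flipped.
    """
    # Elementwise products between the input patch and the kernel weights.
    products = [[patch[r][c] * kernel[r][c] for c in range(3)] for r in range(3)]
    # The output value is the sum of all nine products plus the bias.
    value = sum(sum(row) for row in products) + bias
    return products, value
```

Comparing `value` against the stored pre-activation at the probed (x,y) location is one way to perform the kind of model consistency check the panel describes.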

Interpreting overlays

Density and distortion overlays are visualization-space diagnostics.
In PCA space, geometry is linear and often easier to interpret globally.
In UMAP space, geometry is nonlinear:
- UMAP is not a coordinate transform of latent probability space.
- UMAP does not preserve volume or Gaussian structure.
- UMAP primarily preserves local neighborhood relationships.
Practical guidance:
- compare PCA vs UMAP overlays side-by-side
- treat UMAP overlays as local-structure aids, not direct probability maps

API quick reference

Base prefix: /api

GET /health
POST /runs
GET /runs
GET /runs/{run_id}
POST /runs/{run_id}/cancel
GET /runs/{run_id}/events (SSE with Last-Event-ID replay)
GET /runs/{run_id}/artifacts
GET /runs/{run_id}/artifacts/{artifact_path}
GET /runs/{run_id}/dataset-samples/{sample_idx}
POST /runs/{run_id}/projections/umap (enable/backfill/recompute UMAP)
POST /runs/{run_id}/projections/pca3 (backfill PCA3)
POST /runs/{run_id}/neighbors/backfill?method=pca|umap
POST /runs/{run_id}/density/backfill?method=pca|umap
POST /runs/{run_id}/distortion/backfill?method=pca|umap
POST /runs/{run_id}/sample/posterior_local
POST /runs/{run_id}/sample/prior_component
POST /runs/{run_id}/network/forward
POST /runs/{run_id}/network/inspect
POST /runs/{run_id}/network/conv_step
POST /runs/{run_id}/network/kernel
POST /runs/{run_id}/network/perturb
POST /runs/{run_id}/network/topk
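
A minimal way to script these endpoints from Python, using only the standard library, is sketched below. `build_request` is a hypothetical helper, the base URL assumes the local development setup described earlier, and request bodies (for example for POST /runs) are not shown because their schema is defined by openapi.json.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api"  # assumption: local dev backend


def build_request(method, path, body=None, last_event_id=None):
    """Build (but do not send) a urllib Request for an API path under /api."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if last_event_id is not None:
        # Used when reconnecting to the SSE events endpoint to replay missed events.
        headers["Last-Event-ID"] = str(last_event_id)
        headers["Accept"] = "text/event-stream"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
```

With the backend running, a request such as `build_request("POST", "/runs/<run_id>/neighbors/backfill?method=pca")` can be sent with `urllib.request.urlopen(...)`.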

Troubleshooting

CORS / fetch errors:
- ensure frontend is on http://localhost:3000 or http://127.0.0.1:3000
- ensure backend is on http://localhost:8000
Neighbors artifact not ready yet for this method/epoch:
- use the Neighbors panel backfill button or POST /runs/{run_id}/neighbors/backfill?method=pca|umap
Density artifact not ready / Distortion artifact not ready:
- use the corresponding compute button in Latent Explorer
- or call POST /runs/{run_id}/density/backfill?method=pca|umap / POST /runs/{run_id}/distortion/backfill?method=pca|umap
UMAP options unavailable:
- enable UMAP from run detail or latent explorer (POST /runs/{run_id}/projections/umap)
3D PCA unavailable for old runs:
- trigger PCA3 backfill (POST /runs/{run_id}/projections/pca3)
Command-not-found issues:
- run uv sync for Python deps
- run npm install in frontend/

Verification commands

Backend tests:

uv run pytest -q backend/tests

Frontend typecheck:

cd frontend && npx tsc --noEmit
