gpsaggese · twood1 · Apr 1, 2026 · May 7, 2026 · May 7, 2026 · May 8, 2026
diff --git a/...605/Spring2026/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/.dockerignore b/...605/Spring2026/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/.dockerignore
@@ -0,0 +1,143 @@
+# Exclude files from the Docker build context. This keeps unnecessary files from
+# being sent to the Docker daemon, reducing build time and image size.
+
+# Python artifacts
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+*.egg-info/
+
+# Virtual environments
+venv/
+.venv/
+env/
+.env
+.envrc
+client_venv.helpers/
+ENV/
+
+# Jupyter
+.ipynb_checkpoints/
+.jupyter/
+
+# Build artifacts
+build/
+dist/
+*.eggs/
+.eggs/
+
+# Cache and temporary files
+*.log
+*.tmp
+*.cache
+.pytest_cache/
+.mypy_cache/
+.coverage
+htmlcov/
+
+# Git and version control
+.git/
+.gitignore
+.gitattributes
+.github/
+
+# Docker build scripts (not needed at runtime)
+docker_build.sh
+docker_push.sh
+docker_clean.sh
+docker_exec.sh
+docker_cmd.sh
+docker_bash.sh
+docker_jupyter.sh
+docker_name.sh
+run_jupyter.sh
+Dockerfile.*
+.dockerignore
+
+# Documentation
+README.md
+README.admin.md
+docs/
+*.md
+CHANGELOG.md
+LICENSE
+
+# Configuration and secrets
+.env.*
+.env.local
+.env.development
+.env.production
+.DS_Store
+Thumbs.db
+
+# Shell configuration
+.bashrc
+.bash_history
+.zshrc
+
+# Large data files (mount via volume instead)
+data/
+*.csv
+*.pkl
+*.h5
+*.parquet
+*.feather
+*.arrow
+*.npy
+*.npz
+
+# Generated images
+*.png
+*.jpg
+*.jpeg
+*.gif
+*.svg
+*.pdf
+
+# Test files and examples
+tests/
+test_*
+*_test.py
+tutorials/
+examples/
+
+# IDE and editor files
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.project
+.pydevproject
+.settings/
+*.iml
+.sublime-project
+.sublime-workspace
+
+# Node and frontend (if applicable)
+node_modules/
+npm-debug.log
+yarn-error.log
+.npm
+
+# Requirements management
+requirements.in
+Pipfile
+Pipfile.lock
+poetry.lock
+setup.py
+setup.cfg
+
+# CI/CD configuration
+.gitlab-ci.yml
+.travis.yml
+Jenkinsfile
+.circleci/
+
+# Miscellaneous
+*.bak
+.venv.bak/
+*.whl
+*.tar.gz
+*.zip
diff --git a/...ata605/Spring2026/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/.gitignore b/...ata605/Spring2026/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/.gitignore
@@ -0,0 +1,13 @@
+# Generated data — written by ingester pods, fetched for visualization
+data/
+
+# Jupyter
+.ipynb_checkpoints/
+
+# Python
+__pycache__/
+*.py[cod]
+
+# OS
+.DS_Store
+Thumbs.db
diff --git a/...ata605/Spring2026/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/Dockerfile b/...ata605/Spring2026/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/Dockerfile
@@ -0,0 +1,22 @@
+FROM python:3.12-slim
+
+ARG DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        ca-certificates \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /work
+COPY requirements.txt /tmp/requirements.txt
+
+# Install CPU-only torch + torchvision together so versions match
+# (open_clip pulls torchvision; default index would yield a CUDA-built one).
+RUN pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cpu \
+    && pip install --no-cache-dir jupyterlab -r /tmp/requirements.txt
+
+EXPOSE 8888
+CMD ["jupyter", "lab", \
+     "--ip=0.0.0.0", "--port=8888", \
+     "--no-browser", "--allow-root", \
+     "--ServerApp.token=", "--ServerApp.password=", \
+     "--ServerApp.root_dir=/work"]
diff --git a/...ring2026/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/README.md b/...ring2026/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/README.md
@@ -0,0 +1,97 @@
+# Kubernetes TurboQuant
+
+A CLIP image-embedding service deployed on Kubernetes, with TurboQuant
+compression on the resulting vectors to shrink the in-memory index ~8x.
+
+The setup imitates a production embedding service: an `embedder` Deployment
+serves `POST /embed`, an `ingester` Job reads a dataset and posts batches to
+it, and the HPA scales embedder pods up under load.
+
+## Repo layout
+
+- `services/embedder/` — FastAPI + ONNX CLIP. Scaled by the HPA.
+- `services/ingester/` — pulls the image tar from Hugging Face, batches the files, POSTs to the embedder.
+- `services/compression/` — TurboQuant ADC index used by the demo notebook.
+- `k8s/` — manifests (namespace, embedder Deployment/Service/HPA, ingester Job).
+- `Dockerfile` — demo container (JupyterLab + analysis deps).
+- `setup_cluster.sh` — builds the embedder/ingester images, loads them into
+  minikube, applies the static manifests.
+- `demo.ipynb` — end-to-end demo (polls for shards, runs TurboQuant, shows recall + visual results).
+
+## Local installs
+
+- [Docker](https://docs.docker.com/engine/install/)
+- [minikube](https://minikube.sigs.k8s.io/docs/start/)
+- [`kubectl`](https://kubernetes.io/docs/tasks/tools/)
+- `bash` — Git Bash works on Windows ([git-scm.com](https://git-scm.com/))
+
+## Running it
+
+From this directory:
+
+```bash
+minikube start
+minikube addons enable metrics-server
+./setup_cluster.sh
+```
+
+The first build pulls torch + open_clip into the embedder image and takes a
+while; subsequent builds are cached.
+
+In a second terminal, mount the project's `data/` dir into the cluster and
+**leave it running**.
+
+```bash
+minikube mount "$(pwd)/data:/data"
+```
+
+Then build and start the demo container:
+
+```bash
+docker build -t tq-demo .
+docker run --rm -it -p 8888:8888 -v "$(pwd):/work" tq-demo
+```
+
+Open `http://localhost:8888`, then `demo.ipynb`.
+
+In a third terminal, kick off the ingester:
+
+```bash
+kubectl apply -f k8s/ingester-job.yaml
+```
+
+Four ingester pods download the image tar, POST batches to the embedder
+service, and the HPA scales the embedder up under the load. Watch it
+happen:
+
+```bash
+kubectl get hpa -n turboquant -w
+```
+
+When all four shards land in `data/embeddings/`, the notebook continues
+through TurboQuant compression, a Recall@10 benchmark, and a grid of the
+top results.
+
+## Switching dataset size
+
+By default the ingester pulls a 1,000-image tar from Hugging Face. Larger
+options are available — to use them, edit `k8s/ingester-job.yaml` and
+change `DATASET_URL` to one of:
+
+- `https://huggingface.co/datasets/twood1/turboquant-art-1k/resolve/main/images_1k.tar` (default)
+- `https://huggingface.co/datasets/twood1/turboquant-art-5k/resolve/main/images_5k.tar`
+- `https://huggingface.co/datasets/twood1/turboquant-art-50k/resolve/main/images_50k.tar`
+
+## Tearing down
+
+```bash
+kubectl delete namespace turboquant
+minikube stop
+```
+
+## Embedder API
+
+- `POST /embed` — multipart upload, takes `files=[...]`, returns
+  `{"embeddings": [[...], ...]}`.
+- `GET /healthz` — readiness probe.
+- `GET /metrics` — Prometheus metrics.
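
Note on the Embedder API section above: a minimal client sketch for `POST /embed`, using only the Python standard library. The README states only that the endpoint takes a multipart upload under `files` and returns `{"embeddings": [[...], ...]}`. Everything else here is an assumption for illustration: the helper names `build_multipart` and `embed`, the `image/jpeg` part type, the timeout, and whatever base URL the service is reachable on.

```python
import json
import urllib.request
import uuid


def build_multipart(images, field="files"):
    """Encode {filename: bytes} as multipart/form-data, repeating one field name."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, payload in images.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{name}"\r\n'
            "Content-Type: image/jpeg\r\n\r\n"  # assumption: the embedder expects JPEGs
        )
        parts.append(header.encode() + payload + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def embed(base_url, images, timeout=60):
    """POST a batch of images to /embed and return the list of embedding vectors."""
    body, content_type = build_multipart(images)
    request = urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["embeddings"]
```

With the service reachable at some local address (hypothetical here), a call would look like `embed(base_url, {"a.jpg": jpeg_bytes})` and should return one vector per uploaded file.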
diff --git a/...26/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/architecture.md b/...26/projects/UmdTask454_DATA605_Spring2026_Kubernetes_TurboQuant/architecture.md
@@ -0,0 +1,63 @@
+# Architecture
+
+Three Docker images:
+
+- **`tq-embedder`** — FastAPI service that takes a batch of JPEGs and returns CLIP embeddings. Deployed as a K8s Deployment + HPA (scales 1→5 on CPU).
+- **`tq-ingester`** — One-shot K8s Job that downloads the dataset, POSTs batches to the embedder, and writes embedding shards to `/data/embeddings/`.
+- **`tq-demo`** — JupyterLab container running the notebook: combines the shards, runs TurboQuant compression, computes recall, and visualizes results.
+
+The host's `./data/` dir is shared with the cluster via `minikube mount` and with the demo container via a bind mount, so all three see the same files.
+
+## TurboQuant compression
+
+Two-stage scalar quantization, preceded by a random rotation:
+
+1. **Rotate** each vector by a random orthogonal matrix. Coordinates then follow a known Beta distribution where scalar quantization is near-optimal.
+2. **MSE stage** — quantize each rotated coordinate to `bits-1` bits using a precomputed codebook (centroids minimize MSE under the Beta distribution).
+3. **QJL stage** — encode the residual at 1 bit per dim via a random-sign projection. Recovers precision the MSE stage lost.
+
+Stored per vector: stage-1 indices, stage-2 signs, original norm, residual norm. At 3 bits/dim, a 512-d embedding takes 192 bytes of codes plus the two norms, versus 2,048 bytes in FP32: ~10× smaller.
+
+## Asymmetric distance (ADC)
+
+Queries stay in FP32 and are rotated with the same matrix as the DB vectors. To score a compressed DB vector against an uncompressed query, estimate their inner product ⟨q, x⟩:
+
+1. Look up the centroid that each stored index points to, which gives a vector of centroid values, and take its dot product with the rotated query.
+2. Add a correction term: project the query through the QJL sign matrix and dot the result with the stored signs.
+3. Multiply by the stored norm.
+
+The scheme is "asymmetric" because the query is full-precision while the DB is compressed. The DB vectors are never decompressed to compute the distance.
+
+### Worked example
+
+```
+# original vectors (fp32)
+x = [+1, −1, +1, +1]            (DB vector)
+q = [+0.3, +0.2, +0.4, −0.1]    (query)
+
+
+# NAIVE — dot q with x directly
+⟨q, x⟩ = 0.3·(+1) + 0.2·(−1) + 0.4·(+1) + (−0.1)·(+1)
+       = +0.30 − 0.20 + 0.40 − 0.10
+       = 0.40
+
+
+# ADC — we never keep x. we keep:
+codebook = [−0.5, +0.5]   # 2 allowed values, shared by every DB vector
+norm     = 2.0            # ||x||
+indices  = [1, 0, 1, 1]   # per dim, which codebook entry is closest to x[d]/norm
+
+# per dim, multiply q[d] by the codebook value the index points to
+dim 0:  q[0] · codebook[indices[0]] = +0.3 · (+0.5) = +0.15
+dim 1:  q[1] · codebook[indices[1]] = +0.2 · (−0.5) = −0.10
+dim 2:  q[2] · codebook[indices[2]] = +0.4 · (+0.5) = +0.20
+dim 3:  q[3] · codebook[indices[3]] = −0.1 · (+0.5) = −0.05
+
+partial = +0.15 − 0.10 + 0.20 − 0.05 = +0.20
+
+# multiply by stored norm
+⟨q, x⟩ = norm · partial = 2.0 · 0.20 = 0.40
+
+
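+# both paths land on 0.40, and ADC never read x. The match is exact only because
+# x/norm lies exactly on the codebook values; in general ADC returns an approximation.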
+```
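
Note on the worked example above: the same arithmetic can be checked with a short Python sketch. It covers only the centroid-lookup path shown in the example and deliberately omits the random rotation and the QJL correction term. The function names `encode` and `adc_inner_product` are illustrative and are not taken from the project's `services/compression/` code.

```python
import math


def encode(x, codebook):
    """Compress x to (indices, norm): per dim, the nearest codebook entry to x[d]/norm."""
    norm = math.sqrt(sum(v * v for v in x))  # assumes x is not the zero vector
    unit = [v / norm for v in x]
    indices = [
        min(range(len(codebook)), key=lambda k: abs(codebook[k] - u))
        for u in unit
    ]
    return indices, norm


def adc_inner_product(query, indices, codebook, norm):
    """Estimate <query, x> from the compressed form of x without rebuilding x."""
    partial = sum(q_d * codebook[i_d] for q_d, i_d in zip(query, indices))
    return norm * partial


x = [1.0, -1.0, 1.0, 1.0]       # DB vector
q = [0.3, 0.2, 0.4, -0.1]       # query
codebook = [-0.5, 0.5]          # shared by every DB vector

indices, norm = encode(x, codebook)
naive = sum(a * b for a, b in zip(q, x))
adc = adc_inner_product(q, indices, codebook, norm)
print(indices, norm, naive, adc)
```

Here `naive` and `adc` agree up to floating-point rounding because `x / norm` lands exactly on codebook entries. With a vector that does not, the two values would differ by the quantization error that the QJL stage is meant to reduce.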