@@ -0,0 +1,143 @@
# Exclude files from Docker build context. This prevents unnecessary files from
# being sent to Docker daemon, reducing build time and image size.

# Python artifacts
__pycache__/
*.pyc
*.pyo
*.pyd
*.egg-info/

# Virtual environments
venv/
.venv/
env/
.env
.envrc
client_venv.helpers/
ENV/

# Jupyter
.ipynb_checkpoints/
.jupyter/

# Build artifacts
build/
dist/
*.eggs/
.eggs/

# Cache and temporary files
*.log
*.tmp
*.cache
.pytest_cache/
.mypy_cache/
.coverage
htmlcov/

# Git and version control
.git/
.gitignore
.gitattributes
.github/

# Docker build scripts (not needed at runtime)
docker_build.sh
docker_push.sh
docker_clean.sh
docker_exec.sh
docker_cmd.sh
docker_bash.sh
docker_jupyter.sh
docker_name.sh
run_jupyter.sh
Dockerfile.*
.dockerignore

# Documentation
README.md
README.admin.md
docs/
*.md
CHANGELOG.md
LICENSE

# Configuration and secrets
.env.*
.env.local
.env.development
.env.production
.DS_Store
Thumbs.db

# Shell configuration
.bashrc
.bash_history
.zshrc

# Large data files (mount via volume instead)
data/
*.csv
*.pkl
*.h5
*.parquet
*.feather
*.arrow
*.npy
*.npz

# Generated images
*.png
*.jpg
*.jpeg
*.gif
*.svg
*.pdf

# Test files and examples
tests/
test_*
*_test.py
tutorials/
examples/

# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~
.project
.pydevproject
.settings/
*.iml
.sublime-project
.sublime-workspace

# Node and frontend (if applicable)
node_modules/
npm-debug.log
yarn-error.log
.npm

# Requirements management
requirements.in
Pipfile
Pipfile.lock
poetry.lock
setup.py
setup.cfg

# CI/CD configuration
.gitlab-ci.yml
.travis.yml
Jenkinsfile
.circleci/

# Miscellaneous
*.bak
.venv.bak/
*.whl
*.tar.gz
*.zip
@@ -0,0 +1,13 @@
# Generated data — written by ingester pods, fetched for visualization
data/

# Jupyter
.ipynb_checkpoints/

# Python
__pycache__/
*.py[cod]

# OS
.DS_Store
Thumbs.db
@@ -0,0 +1,22 @@
FROM python:3.12-slim

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /work
COPY requirements.txt /tmp/requirements.txt

# Install CPU-only torch + torchvision together so versions match
# (open_clip pulls torchvision; default index would yield a CUDA-built one).
RUN pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cpu \
&& pip install --no-cache-dir jupyterlab -r /tmp/requirements.txt

EXPOSE 8888
CMD ["jupyter", "lab", \
"--ip=0.0.0.0", "--port=8888", \
"--no-browser", "--allow-root", \
"--ServerApp.token=", "--ServerApp.password=", \
"--ServerApp.root_dir=/work"]
@@ -0,0 +1,97 @@
# Kubernetes TurboQuant

A CLIP image-embedding service deployed on Kubernetes, with TurboQuant
compression on the resulting vectors to shrink the in-memory index ~8x.

The setup imitates a production embedding service: an `embedder` Deployment
serves `POST /embed`, an `ingester` Job reads a dataset and posts batches to
it, and the HPA scales embedder pods up under load.

## Repo layout

- `services/embedder/` — FastAPI + ONNX CLIP. Scaled by the HPA.
- `services/ingester/` — pulls the image tar from Hugging Face, batches the files, POSTs to the embedder.
- `services/compression/` — TurboQuant ADC index used by the demo notebook.
- `k8s/` — manifests (namespace, embedder Deployment/Service/HPA, ingester Job).
- `Dockerfile` — demo container (JupyterLab + analysis deps).
- `setup_cluster.sh` — builds the embedder/ingester images, loads them into
minikube, applies the static manifests.
- `demo.ipynb` — end-to-end demo (polls for shards, runs TurboQuant, shows recall + visual results).

## Local installs

- [Docker](https://docs.docker.com/engine/install/)
- [minikube](https://minikube.sigs.k8s.io/docs/start/)
- [`kubectl`](https://kubernetes.io/docs/tasks/tools/)
- `bash` — Git Bash works on Windows ([git-scm.com](https://git-scm.com/))

## Running it

From this directory:

```bash
minikube start
minikube addons enable metrics-server
./setup_cluster.sh
```

The first build pulls torch + open_clip into the embedder image and takes a
while; subsequent builds are cached.

In a second terminal, mount the project's `data/` dir into the cluster and
**leave it running**.

```bash
minikube mount "$(pwd)/data:/data"
```

Then build and start the demo container:

```bash
docker build -t tq-demo .
docker run --rm -it -p 8888:8888 -v "$(pwd):/work" tq-demo
```

Open `http://localhost:8888`, then `demo.ipynb`.

In a third terminal, kick off the ingester:

```bash
kubectl apply -f k8s/ingester-job.yaml
```

Four ingester pods download the image tar and POST batches to the embedder
service; the HPA scales the embedder up under the load. Watch it
happen:

```bash
kubectl get hpa -n turboquant -w
```

When all four shards land in `data/embeddings/`, the notebook continues
through TurboQuant compression, a Recall@10 benchmark, and a grid of the
top results.
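
For readers unfamiliar with the metric, here is a minimal sketch of one common definition of Recall@k: the fraction of the exact (FP32) top-k neighbors that the compressed index also returns, averaged over queries. The function name and argument layout are hypothetical and may differ from what `demo.ipynb` actually does.

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Mean fraction of the exact top-k neighbors found by the approximate search.

    exact_ids / approx_ids: one ranked list of neighbor ids per query.
    """
    hits = 0
    for exact, approx in zip(exact_ids, approx_ids):
        # Count ids that appear in both top-k lists, ignoring rank order.
        hits += len(set(exact[:k]) & set(approx[:k]))
    return hits / (k * len(exact_ids))
```

A value of 1.0 means the compressed search returned exactly the same top-k set as the uncompressed one for every query.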

## Switching dataset size

By default the ingester pulls a 1,000-image tar from Hugging Face. Larger
options are available — to use them, edit `k8s/ingester-job.yaml` and
change `DATASET_URL` to one of:

- `https://huggingface.co/datasets/twood1/turboquant-art-1k/resolve/main/images_1k.tar` (default)
- `https://huggingface.co/datasets/twood1/turboquant-art-5k/resolve/main/images_5k.tar`
- `https://huggingface.co/datasets/twood1/turboquant-art-50k/resolve/main/images_50k.tar`

## Tearing down

```bash
kubectl delete namespace turboquant
minikube stop
```

## Embedder API

- `POST /embed` — multipart upload, takes `files=[...]`, returns
`{"embeddings": [[...], ...]}`.
- `GET /healthz` — readiness probe.
- `GET /metrics` — Prometheus metrics.
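
A minimal client sketch for `POST /embed`, using only the standard library. The field name `files` and the `embeddings` response key come from the description above; the URL and port, the `image/jpeg` part type, and the helper names are assumptions for illustration.

```python
import json
import urllib.request
import uuid


def build_multipart(files, field="files"):
    """Encode (filename, bytes) pairs as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, payload in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{name}"\r\n'
            "Content-Type: image/jpeg\r\n\r\n"
        ).encode()
        parts.append(header + payload + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def embed(files, url="http://localhost:8000/embed"):  # hypothetical address
    """POST images to the embedder and return the list of embedding vectors."""
    body, content_type = build_multipart(files)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embeddings"]
```

Calling `embed([("cat.jpg", open("cat.jpg", "rb").read())])` against a reachable embedder should return one vector per uploaded file.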
@@ -0,0 +1,63 @@
# Architecture

Three Docker images:

- **`tq-embedder`** — FastAPI service that takes a batch of JPEGs and returns CLIP embeddings. Deployed as a K8s Deployment + HPA (scales 1→5 on CPU).
- **`tq-ingester`** — One-shot K8s Job that downloads the dataset, POSTs batches to the embedder, and writes embedding shards to `/data/embeddings/`.
- **`tq-demo`** — JupyterLab container running the notebook: combines the shards, runs TurboQuant compression, computes recall, and visualizes results.
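
To make the 1→5 scaling concrete, here is a sketch of the core replica formula documented for the Kubernetes HPA, clamped to the bounds above. The 60% CPU target is a hypothetical value (the real one lives in the HPA manifest), and the sketch ignores the controller's tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=1, max_replicas=5):
    """ceil(current * currentMetric / targetMetric), clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


# One pod at 180% of its CPU request against a hypothetical 60% target.
print(desired_replicas(1, 180, 60))  # 3
```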

The host's `./data/` dir is shared with the cluster via `minikube mount` and with the demo container via a bind mount, so all three see the same files.

## TurboQuant compression

Two-stage scalar quantization, preceded by a random rotation:

1. **Rotate** each vector by a random orthogonal matrix. The coordinates of the rotated, unit-normalized vector then follow a known Beta distribution, for which per-coordinate scalar quantization is near-optimal.
2. **MSE stage** — quantize each rotated coordinate to `bits-1` bits using a precomputed codebook (centroids minimize MSE under the Beta distribution).
3. **QJL stage** — encode the residual at 1 bit per dim via a random-sign projection. Recovers precision the MSE stage lost.
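
The three steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code in `services/compression/`: the codebook is a uniform placeholder rather than the Beta-optimal centroids, the 1-bit stage uses a plain Gaussian projection, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, bits = 8, 3

# Random orthogonal rotation: the Q factor of a Gaussian matrix.
rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
# Gaussian projection for the 1-bit stage (assumed form of the QJL projection).
projection = rng.standard_normal((dim, dim))
# Placeholder codebook with 2**(bits-1) levels, NOT the Beta-optimal centroids.
codebook = np.linspace(-1.5, 1.5, 2 ** (bits - 1)) / np.sqrt(dim)


def encode(x):
    """Return (indices, signs, norm, residual_norm) for one vector."""
    norm = np.linalg.norm(x)
    rotated = rotation @ (x / norm)  # step 1: rotate the unit vector
    # Step 2 (MSE stage): index of the nearest centroid for each coordinate.
    indices = np.abs(rotated[:, None] - codebook[None, :]).argmin(axis=1)
    residual = rotated - codebook[indices]
    # Step 3 (QJL stage): one sign bit per dimension of the projected residual.
    signs = np.where(projection @ residual >= 0, 1, -1)
    return indices, signs, norm, np.linalg.norm(residual)


indices, signs, norm, residual_norm = encode(rng.standard_normal(dim))
```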

Stored per vector: stage-1 indices, stage-2 signs, original norm, residual norm. At 3 bits/dim on 512-d embeddings, ~10× smaller than FP32.
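
The ~10× figure can be checked with a little arithmetic. The sketch below assumes the two norms are stored as FP32 (4 bytes each), which the text above does not state.

```python
def bytes_per_vector(dim, bits, norm_bytes=4, n_norms=2):
    """Compressed size: packed codes plus the two stored norms."""
    code_bytes = dim * bits / 8  # (bits-1) MSE bits + 1 QJL sign bit per dim
    return code_bytes + n_norms * norm_bytes


fp32_bytes = 512 * 4                    # 2048 bytes uncompressed
compressed = bytes_per_vector(512, 3)   # 192 code bytes + 8 norm bytes
ratio = fp32_bytes / compressed
print(compressed, round(ratio, 2))  # 200.0 10.24
```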

## Asymmetric distance (ADC)

Queries stay in FP32 and are rotated the same way as the DB vectors. The distance between a compressed DB vector and an uncompressed query vector is then estimated in three steps:

1. Look up the centroid that each stored index points to, which gives a vector of centroid values, and take its dot product with the rotated query.
2. Add a correction term: project the query through the QJL sign matrix and dot with the stored signs.
3. Multiply by the stored norm.

It is "asymmetric" because the query is full-precision while the DB is compressed. The DB vectors are never decompressed to compute the distance.

### Worked example

```
# original vectors (fp32)
x = [+1, −1, +1, +1] (DB vector)
q = [+0.3, +0.2, +0.4, −0.1] (query)


# NAIVE — dot q with x directly
⟨q, x⟩ = 0.3·(+1) + 0.2·(−1) + 0.4·(+1) + (−0.1)·(+1)
= +0.30 − 0.20 + 0.40 − 0.10
= 0.40


# ADC — we never keep x. we keep:
codebook = [−0.5, +0.5] # 2 allowed values, shared by every DB vector
norm = 2.0 # ||x||
indices = [1, 0, 1, 1] # per dim, which codebook entry is closest to x[d]/norm

# per dim, multiply q[d] by the codebook value the index points to
dim 0: q[0] · codebook[indices[0]] = +0.3 · (+0.5) = +0.15
dim 1: q[1] · codebook[indices[1]] = +0.2 · (−0.5) = −0.10
dim 2: q[2] · codebook[indices[2]] = +0.4 · (+0.5) = +0.20
dim 3: q[3] · codebook[indices[3]] = −0.1 · (+0.5) = −0.05

partial = +0.15 − 0.10 + 0.20 − 0.05 = +0.20

# multiply by stored norm
⟨q, x⟩ = norm · partial = 2.0 · 0.20 = 0.40


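# both paths land on 0.40, and ADC never read x.
# The match is exact here only because every x[d]/norm sits exactly on a
# codebook value; in general ADC is an approximation. This toy example also
# skips the rotation and the QJL correction.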
```
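
The same toy example as a runnable Python check. It mirrors the numbers above and, like the hand calculation, leaves out the rotation and the QJL correction term; the variable names are illustrative only.

```python
import math

x = [1.0, -1.0, 1.0, 1.0]    # DB vector (only used to build the stored fields)
q = [0.3, 0.2, 0.4, -0.1]    # query
codebook = [-0.5, 0.5]       # shared by every DB vector

# What the index stores for x: its norm and one codebook index per dimension.
norm = math.sqrt(sum(v * v for v in x))
indices = [
    min(range(len(codebook)), key=lambda i: abs(codebook[i] - v / norm))
    for v in x
]

# Naive path: dot product against the original vector.
naive = sum(qd * xd for qd, xd in zip(q, x))
# ADC path: dot product against the looked-up centroids, scaled by the norm.
adc = norm * sum(qd * codebook[i] for qd, i in zip(q, indices))

assert math.isclose(naive, adc)
```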