2 changes: 1 addition & 1 deletion .vscode/settings.json
@@ -1,7 +1,7 @@
{
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true,
"editor.formatOnSave": false,
},
"python.testing.pytestArgs": [
"src"
106 changes: 97 additions & 9 deletions POSITRONIC_README.md
@@ -8,10 +8,13 @@ In `src/openpi/training/config.py`, modify the config with your new dataset name.

```python
TrainConfig(
name="pi0_positronic_lowmem",
# Here is an example of loading a pi0-FAST model for LoRA finetuning.
# For setting action_dim, action_horizon, and max_token_len, see the comments above.
model=pi0.Pi0Config(paligemma_variant="gemma_2b_lora", action_expert_variant="gemma_300m_lora"),
name="pi05_positronic_lowmem",
# Pi05 model with LoRA finetuning for low memory usage on Positronic dataset.
model=pi0.Pi0Config(
pi05=True,
paligemma_variant="gemma_2b_lora",
action_expert_variant="gemma_300m_lora",
),

data=LeRobotPositronicDataConfig(
repo_id="<PUT YOUR DATASET NAME HERE>",
@@ -25,21 +28,106 @@
Run the script to compute normalization stats for your dataset.

```bash
HF_LEROBOT_HOME="path/to/dataset" uv run scripts/compute_norm_stats.py --config-name pi0_positronic_lowmem
HF_LEROBOT_HOME="path/to/dataset" uv run scripts/compute_norm_stats.py --config-name pi05_positronic_lowmem
```
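Note that the `HF_LEROBOT_HOME=...` prefix sets the variable for that single command only. A minimal sketch of this env-prefix form (the path is a hypothetical example):

```shell
# The VAR=value prefix scopes the variable to the one command that follows;
# it is not exported into the surrounding shell session.
HF_LEROBOT_HOME="/tmp/example_dataset" sh -c 'echo "inside: $HF_LEROBOT_HOME"'
echo "outside: ${HF_LEROBOT_HOME:-unset}"
```

This is why the examples below repeat the prefix on each command rather than exporting it once.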

### Run train

Train the policy.

```bash
HF_LEROBOT_HOME="path/to/dataset" XLA_PYTHON_CLIENT_MEM_FRACTION=0.995 uv run scripts/train.py pi0_positronic_lowmem --exp-name=<expriment_name> --overwrite
HF_LEROBOT_HOME="path/to/dataset" XLA_PYTHON_CLIENT_MEM_FRACTION=0.995 uv run scripts/train.py pi05_positronic_lowmem --exp-name=<experiment_name> --overwrite
```

### Serve policy

After training, you can find the weights in `checkpoints/pi05_positronic_lowmem/<experiment name>/`. Run the policy server with:

```bash
uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi05_positronic_lowmem --policy.dir checkpoints/pi05_positronic_lowmem/<experiment name>/29999/
```

By default, the server listens on port 8000, so if you run it on Nebius or another cloud provider, forward that port to your local machine:
```bash
ssh -L 8000:localhost:8000 <YOUR-MACHINE-IP>
```

Then, on your local machine, run the inference script (from the `positronic` repository root):
```bash
python -m positronic.run_inference sim_pi0 --output_dir=../datasets/inference/ --show_gui --num_iterations=10 --simulation_time=15
```

## Using Docker

### Build and Push Docker Image

Build and push the training image to Nebius Container Registry:

```bash
cd docker
make push
```

This will:
1. Build the Docker image with all source code baked in
2. Tag it with version, git SHA, and `latest`
3. Push to the Nebius registry at `cr.eu-north1.nebius.cloud/e00a0ahqzcp9x0xczz/openpi-training`
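The three tags follow a simple pattern. Here is a standalone sketch of how they are assembled (the `VERSION` and `GIT_SHA` values are placeholders; the Makefile extracts them from `pyproject.toml` and `git rev-parse --short HEAD`):

```shell
REGISTRY_URL="cr.eu-north1.nebius.cloud/e00a0ahqzcp9x0xczz"
IMAGE_NAME="openpi-training"
VERSION="0.4.0"    # placeholder; read from pyproject.toml by the Makefile
GIT_SHA="abc1234"  # placeholder; read from git by the Makefile

# The same image is pushed under all three tags:
echo "${REGISTRY_URL}/${IMAGE_NAME}:latest"
echo "${REGISTRY_URL}/${IMAGE_NAME}:v${VERSION}"
echo "${REGISTRY_URL}/${IMAGE_NAME}:${GIT_SHA}"
```

Pulling `:latest` always gets the most recent push, while the version and SHA tags pin a reproducible build.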

### Run Training in Docker

All commands can be run using the cloud Docker image. The code is already baked into the image, so no source code mounting is needed:

```bash
docker run --rm -it --gpus all --pull=always \
-v /datasets:/datasets \
-v /outputs:/outputs \
cr.eu-north1.nebius.cloud/e00a0ahqzcp9x0xczz/openpi-training \
<your command>
```

#### Docker Examples

**Compute norm stats:**
```bash
docker run --rm -it --gpus all --pull=always \
-v /datasets:/datasets \
-v /outputs/openpi/assets:/openpi/assets \
-e HF_LEROBOT_HOME=/datasets/<path to your dataset> \
cr.eu-north1.nebius.cloud/e00a0ahqzcp9x0xczz/openpi-training \
python -m scripts.compute_norm_stats --config-name pi05_positronic_lowmem
```

**Run train:**
```bash
docker run --rm -it --gpus all --pull=always \
-v /datasets:/datasets \
-v /outputs/openpi/assets:/openpi/assets \
-v /outputs/checkpoints:/openpi/checkpoints \
-e HF_LEROBOT_HOME=/datasets/<path to your dataset> \
cr.eu-north1.nebius.cloud/e00a0ahqzcp9x0xczz/openpi-training \
python -m scripts.train pi05_positronic_lowmem --exp-name=<experiment_name>
```

**Serve policy:**
```bash
docker run --rm -it --gpus all --pull=always \
-v /outputs/openpi/checkpoints:/openpi/checkpoints \
-p 8000:8000 \
cr.eu-north1.nebius.cloud/e00a0ahqzcp9x0xczz/openpi-training \
python -m scripts.serve_policy policy:checkpoint \
--policy.config=pi05_positronic_lowmem \
--policy.dir=/openpi/checkpoints/pi05_positronic_lowmem/<experiment_name>/29999/
```

### Serve polciy
### Local Development (Optional)

After training you could find weights in `checkpoints/pi0_positronic_lowmem/<experiment name>/`. Run policy server with:
For local development with live code editing, use `Dockerfile.training`:

```bash
uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi0_positronic_lowmem --policy.dir checkpoints/pi0_positronic_lowmem/<experiment name>/29999/
docker build -f docker/Dockerfile.training -t openpi-training-dev .
docker run --rm -it --gpus all \
-v $PWD:/openpi \
-v /datasets:/datasets \
openpi-training-dev \
bash -lc 'uv pip install --no-deps -e /openpi && <your command>'
```
41 changes: 41 additions & 0 deletions docker/Dockerfile.training
@@ -0,0 +1,41 @@
# Local development Dockerfile with bind-mount support.
# The virtualenv and dependencies are pre-installed in the image, then at runtime you mount
# your local source code over /openpi and reinstall in editable mode for live code changes.
#
# Build:
# docker build -f Dockerfile.training -t openpi-training .
#
# Run:
# docker run --rm -it --gpus all \
# -v $PWD:/openpi \
# -v /datasets:/datasets \
# openpi-training \
# bash -lc 'uv pip install --no-deps -e /openpi && python scripts/your_script.py'
#
# Note: after mounting your code, reinstall with 'uv pip install --no-deps -e /openpi'
# to refresh package metadata while keeping dependencies from the image.
FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
RUN apt-get update && apt-get install -y \
build-essential \
curl \
git \
ffmpeg \
libturbojpeg \
&& rm -rf /var/lib/apt/lists/*
ENV UV_LINK_MODE=copy
ENV PIP_DEFAULT_TIMEOUT=600
ENV UV_HTTP_TIMEOUT=600
RUN uv venv --python 3.11
ENV VIRTUAL_ENV=/.venv
ENV PATH=$VIRTUAL_ENV/bin:$PATH
COPY pyproject.toml uv.lock ./
# Copy workspace packages needed for dependency resolution
COPY packages ./packages
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --frozen --extra dev --no-install-project
COPY . /openpi
WORKDIR /openpi
RUN uv pip install --no-deps .
ENV PYTHONPATH=/openpi
CMD ["bash"]
50 changes: 50 additions & 0 deletions docker/Dockerfile.training.cloud
@@ -0,0 +1,50 @@
FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

# Copy uv for fast Python package management
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Install minimal system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
git \
ffmpeg \
libturbojpeg \
&& rm -rf /var/lib/apt/lists/*

# Configure uv and pip timeouts
ENV UV_LINK_MODE=copy
ENV PIP_DEFAULT_TIMEOUT=600
ENV UV_HTTP_TIMEOUT=600

# Create Python virtual environment
RUN uv venv --python 3.11

# Activate venv for docker run
ENV VIRTUAL_ENV=/.venv
ENV PATH=$VIRTUAL_ENV/bin:$PATH

# Copy project metadata and lock for reproducible deps
COPY pyproject.toml uv.lock ./
# Copy workspace packages needed for dependency resolution
COPY packages ./packages
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --frozen --extra dev --no-install-project

# Copy source code into the image
COPY . /openpi

# Set working directory
WORKDIR /openpi

# Install the project so runtime metadata (e.g. importlib.metadata) is present.
# For local dev containers that bind-mount the repo, run
# `uv pip install --no-deps -e /openpi` once after start so edits reflect
# immediately while preserving metadata.
RUN uv pip install --no-deps .

# Set Python path
ENV PYTHONPATH=/openpi

# Default command
CMD ["bash"]
89 changes: 89 additions & 0 deletions docker/Makefile
@@ -0,0 +1,89 @@
.PHONY: all build tag push clean prune help

# Image configuration
IMAGE_NAME := openpi-training

# Extract version from pyproject.toml
VERSION := $(shell grep '^version = ' ../pyproject.toml | sed 's/version = "\(.*\)"/\1/')

# Get git commit SHA (short)
GIT_SHA := $(shell git rev-parse --short HEAD 2>/dev/null || echo "unknown")

# Registry URL - can be overridden via environment variable or command line
# Example: make push REGISTRY_URL=cr.eu-north1.nebius.cloud/e00a0ahqzcp9x0xczz
REGISTRY_URL ?= cr.eu-north1.nebius.cloud/e00a0ahqzcp9x0xczz

# Image tags
TAG_LATEST := $(REGISTRY_URL)/$(IMAGE_NAME):latest
TAG_VERSION := $(REGISTRY_URL)/$(IMAGE_NAME):v$(VERSION)
TAG_SHA := $(REGISTRY_URL)/$(IMAGE_NAME):$(GIT_SHA)
LOCAL_TAG := $(IMAGE_NAME):local

help:
@echo "OpenPI Training Container - Makefile"
@echo ""
@echo "Configuration:"
@echo " Image Name: $(IMAGE_NAME)"
@echo " Version: $(VERSION)"
@echo " Git SHA: $(GIT_SHA)"
@echo " Registry URL: $(REGISTRY_URL)"
@echo ""
@echo "Targets:"
@echo " make build Build the training image with baked source code"
@echo " make tag Tag the image (depends on build)"
@echo " make push Push all tags to registry (depends on tag)"
@echo " make all Build, tag, and push (alias for push)"
@echo " make clean Remove local training images"
@echo " make prune Remove dangling/unused Docker images"
@echo " make help Show this help message"
@echo ""
@echo "To override registry URL:"
@echo " make push REGISTRY_URL=your-registry-url"

build:
@echo "Building $(IMAGE_NAME) with source code baked in..."
@if [ -z "$(VERSION)" ]; then \
echo "Error: Could not extract version from pyproject.toml"; \
exit 1; \
fi
docker build -f Dockerfile.training.cloud -t $(LOCAL_TAG) ..

tag: build
@echo "Tagging image with multiple tags..."
@if [ -z "$(REGISTRY_URL)" ]; then \
echo "Error: REGISTRY_URL is not set."; \
echo "Please set it via: make push REGISTRY_URL=your-registry-url"; \
exit 1; \
fi
docker tag $(LOCAL_TAG) $(TAG_LATEST)
docker tag $(LOCAL_TAG) $(TAG_VERSION)
docker tag $(LOCAL_TAG) $(TAG_SHA)
@echo "Tagged with:"
@echo " - $(TAG_LATEST)"
@echo " - $(TAG_VERSION)"
@echo " - $(TAG_SHA)"

push: tag
@echo "Pushing images to Container Registry..."
docker push $(TAG_LATEST)
docker push $(TAG_VERSION)
docker push $(TAG_SHA)
@echo ""
@echo "Successfully pushed to Container Registry!"
@echo "Pull with:"
@echo " docker pull $(TAG_LATEST)"

all: push

clean:
@echo "Removing local training images..."
-docker rmi $(LOCAL_TAG)
-docker rmi $(TAG_LATEST)
-docker rmi $(TAG_VERSION)
-docker rmi $(TAG_SHA)
@echo "Local images removed."

prune:
@echo "Pruning dangling and unused Docker images..."
docker image prune -f
@echo "Prune complete."
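The Makefile's `VERSION` extraction is a plain grep/sed pipeline; it can be exercised standalone against a sample `pyproject.toml` (the file written here is a throwaway example, not the project's real metadata):

```shell
# Reproduce the Makefile's VERSION extraction against a sample file.
printf 'name = "openpi"\nversion = "0.4.0"\n' > /tmp/pyproject_sample.toml
VERSION=$(grep '^version = ' /tmp/pyproject_sample.toml | sed 's/version = "\(.*\)"/\1/')
echo "$VERSION"
```

The `^version = ` anchor keeps the grep from matching other keys (e.g. `requires-python`), and the sed capture group strips the surrounding quotes.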
2 changes: 1 addition & 1 deletion packages/openpi-client/pyproject.toml
@@ -5,7 +5,7 @@ requires-python = ">=3.7"
dependencies = [
"dm-tree>=0.1.8",
"msgpack>=1.0.5",
"numpy>=1.22.4,<2.0.0",
"numpy>=1.22.4", # ,<2.0.0
"pillow>=9.0.0",
"tree>=0.2.4",
"websockets>=11.0",