Fictionarry · jmanhype · Mar 7, 2025 · Mar 7, 2025 · Mar 7, 2025 · Mar 14, 2025
diff --git a/.github/workflows/docker-build.yml b/.github/workflows/docker-build.yml
@@ -0,0 +1,62 @@
+name: Docker Build
+
+on:
+  push:
+    branches: [ bilingual-docs ]
+    paths:
+      - 'Dockerfile'
+      - 'Dockerfile.ci'
+      - '.github/workflows/docker-build.yml'
+  pull_request:
+    branches: [ bilingual-docs ]
+    paths:
+      - 'Dockerfile'
+      - 'Dockerfile.ci'
+  workflow_dispatch:
+
+jobs:
+  build-ci:
+    name: CI Optimized Build
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v3
+        with:
+          submodules: false
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v2
+
+      - name: Build CI-optimized Docker image
+        uses: docker/build-push-action@v4
+        with:
+          context: .
+          file: ./Dockerfile.ci
+          push: false
+          tags: instag:ci
+          cache-from: type=gha,scope=ci
+          cache-to: type=gha,mode=max,scope=ci
+
+  build-full:
+    name: Full Production Build
+    runs-on: ubuntu-latest
+    needs: build-ci
+    if: ${{ github.event_name == 'workflow_dispatch' }}
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v3
+        with:
+          submodules: true
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v2
+
+      - name: Build full Docker image
+        uses: docker/build-push-action@v4
+        with:
+          context: .
+          file: ./Dockerfile
+          push: false
+          tags: instag:latest
+          cache-from: type=gha,scope=full
+          cache-to: type=gha,mode=max,scope=full
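
Review note on `docker-build.yml`: the trigger filters and the `if:` gate on `build-full` interact in a way that is easy to misread. The sketch below is a rough Python model of that logic, not GitHub's implementation. It assumes exact path matches (the workflow uses no globs), assumes `branch` means the pushed branch for `push` and the base branch for `pull_request`, and assumes `build-ci` succeeds. The function names are made up for illustration.

```python
# Rough model of the triggers in docker-build.yml (illustrative only).
BRANCH = "bilingual-docs"
PUSH_PATHS = {"Dockerfile", "Dockerfile.ci", ".github/workflows/docker-build.yml"}
PR_PATHS = {"Dockerfile", "Dockerfile.ci"}


def workflow_triggers(event_name, branch=None, changed_paths=()):
    """Return True if the event would start the workflow."""
    changed = set(changed_paths)
    if event_name == "workflow_dispatch":
        # Manual runs have no branch or path filter in this workflow.
        return True
    if event_name == "push":
        return branch == BRANCH and bool(changed & PUSH_PATHS)
    if event_name == "pull_request":
        # The pull_request path list omits the workflow file itself.
        return branch == BRANCH and bool(changed & PR_PATHS)
    return False


def jobs_to_run(event_name, branch=None, changed_paths=()):
    """List the jobs that run, assuming build-ci succeeds."""
    if not workflow_triggers(event_name, branch, changed_paths):
        return []
    jobs = ["build-ci"]
    if event_name == "workflow_dispatch":
        # build-full also waits for build-ci through `needs`.
        jobs.append("build-full")
    return jobs
```

Under this model the full production image is only ever built on a manual dispatch, and a pull request that edits only the workflow file starts nothing.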
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,26 @@
+# InsTaG Framework Commands and Guidelines
+
+## Common Commands
+- **Build Environment**: `conda env create --file environment.yml`
+- **Process Video**: `python data_utils/process.py data/<ID>/<ID>.mp4`
+- **Generate Teeth Mask**: `python data_utils/easyportrait/create_teeth_mask.py ./data/<ID>`
+- **Extract Audio Features**: `python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav`
+- **Pre-training**: `bash scripts/pretrain_con.sh data/pretrain output/<project_name> <GPU_ID>`
+- **Fine-tuning**: `bash scripts/train_xx_few.sh data/<ID> output/<project_name> <GPU_ID>`
+- **Synthesis**: `python synthesize_fuse.py -S data/<ID> -M output/<project_name> --audio <path> --audio_extractor <type>`
+- **Docker Commands**: Use `./docker-run.sh` with various subcommands (see README_docker.md)
+
+## Code Style Guidelines
+- **Python Version**: 3.9 for main code, 3.10 for Sapiens
+- **Formatting**: Follow existing style in files (indentation, line breaks)
+- **Imports**: Group standard library, third-party, and local imports
+- **Naming**: Use snake_case for variables/functions, CamelCase for classes
+- **Error Handling**: Use try/except blocks for file operations and external calls
+- **Documentation**: Add docstrings for new functions and classes
+
+## Project Structure
+- `/data`: Input videos and processed data
+- `/output`: Generated models and results
+- `/data_utils`: Processing utilities for various modalities
+- `/scene`: Core rendering and modeling code
+- `/utils`: Helper functions for audio, image, and graphics processing
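
Review note on `CLAUDE.md`: the Code Style Guidelines are easier to apply with a concrete instance. The following is a minimal sketch of a module that follows them (grouped imports, snake_case and CamelCase naming, try/except around file access, docstrings). `FeatureManifest` and `load_feature_manifest` are hypothetical names and do not exist in the repository.

```python
"""Example module laid out according to the CLAUDE.md style guidelines."""

# Standard library imports form the first group.
import json
from pathlib import Path

# Third-party imports would form the second group (none needed here).
# Local project imports would form the third group (none needed here).


class FeatureManifest:
    """Holds the file names listed in a manifest (hypothetical helper)."""

    def __init__(self, entries):
        self.entries = list(entries)

    def __len__(self):
        return len(self.entries)


def load_feature_manifest(manifest_path):
    """Read a JSON list of file names; return an empty manifest on failure."""
    try:
        with Path(manifest_path).open("r", encoding="utf-8") as handle:
            entries = json.load(handle)
    except (OSError, json.JSONDecodeError) as err:
        # File operations are wrapped so a missing or corrupt file is reported
        # instead of crashing the caller.
        print(f"Could not read {manifest_path}: {err}")
        return FeatureManifest([])
    return FeatureManifest(entries)
```

Whether a failed read should return an empty result or re-raise is a design choice the guidelines leave open; the sketch picks the former only to keep the example short.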
diff --git a/DOCUMENTATION_CN.md b/DOCUMENTATION_CN.md
@@ -0,0 +1,67 @@
+# Docker Setup for InsTaG Training Framework
+
+## English
+
+This pull request provides a complete Docker-based environment for the InsTaG training framework. It addresses several setup challenges documented in the issues by providing a consistent, containerized environment.
+
+### Key Features:
+
+1. **Dual Container Architecture:**
+   - Main container (CUDA 11.7, Python 3.9) for training and inference
+   - Separate Sapiens container (CUDA 12.1, Python 3.10) for geometry priors
+
+2. **Helper Scripts:**
+   - `docker-run.sh` - Simplifies common operations
+   - `setup-docker.sh` - Automates initial setup and dependency installation
+
+3. **Comprehensive Documentation:**
+   - Complete workflow examples
+   - Detailed troubleshooting guidance
+   - Support for different audio feature extractors (DeepSpeech, Wav2Vec, AVE, HuBERT)
+
+4. **Automated Setup:**
+   - OpenFace integration for facial AU extraction
+   - EasyPortrait model download
+   - Sapiens model download
+
+5. **Workflow Improvements:**
+   - No need to resolve environment conflicts by hand
+   - Simplified audio feature extraction
+   - Streamlined teeth mask generation
+   - Container-based geometry prior generation
+
+The documentation includes examples for both short-video adaptation (with geometry priors) and long-video training, making it easier to use the framework in various scenarios.
+
+---
+
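+## Chinese (English translation)
+
+This pull request provides a complete Docker-based environment for the InsTaG training framework. By providing a consistent, containerized environment, it resolves several setup challenges recorded in the issues.
+
+### Key Features:
+
+1. **Dual Container Architecture:**
+   - Main container (CUDA 11.7, Python 3.9) for training and inference
+   - Separate Sapiens container (CUDA 12.1, Python 3.10) for geometry prior generation
+
+2. **Helper Scripts:**
+   - `docker-run.sh` - Simplifies common operations
+   - `setup-docker.sh` - Automates initial setup and dependency installation
+
+3. **Comprehensive Documentation:**
+   - Complete workflow examples
+   - Detailed troubleshooting guide
+   - Support for different audio feature extractors (DeepSpeech, Wav2Vec, AVE, HuBERT)
+
+4. **Automated Setup:**
+   - OpenFace integration for facial AU extraction
+   - EasyPortrait model download
+   - Sapiens model download
+
+5. **Workflow Improvements:**
+   - No need to resolve environment conflicts by hand
+   - Simplified audio feature extraction
+   - Simplified teeth mask generation
+   - Container-based geometry prior generation
+
+The documentation includes examples of short-video adaptation (with geometry priors) and long-video training, making the framework easier to use in a variety of scenarios.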
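
Review note on the dual container architecture: the description pairs each kind of work with one environment. A minimal sketch of that pairing as a lookup table is shown here. The CUDA and Python versions are taken from the description above; the stage labels and the `ContainerSpec` type are assumptions made for illustration and are not part of the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSpec:
    """Versions the description assigns to one container."""

    name: str
    cuda: str
    python: str


MAIN = ContainerSpec(name="main", cuda="11.7", python="3.9")
SAPIENS = ContainerSpec(name="sapiens", cuda="12.1", python="3.10")

# Stage labels are hypothetical; the mapping follows the feature list.
STAGE_TO_CONTAINER = {
    "training": MAIN,
    "inference": MAIN,
    "geometry_priors": SAPIENS,
}


def container_for(stage):
    """Return the container spec for a stage, or raise on an unknown stage."""
    try:
        return STAGE_TO_CONTAINER[stage]
    except KeyError:
        raise ValueError(f"Unknown stage: {stage!r}") from None
```

For example, `container_for("geometry_priors")` returns the Sapiens spec, which is the only stage that needs Python 3.10.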
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,158 @@
+# Version: 1.3.0 (Production Ready)
+ARG BASE_IMAGE=nvcr.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04
+FROM $BASE_IMAGE
+
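+# No VOLUME for /instag here: build steps that write to a path after it is declared a volume may be discarded, so mount /instag/data and /instag/output at run time instead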
+
+# Install system dependencies
+RUN apt-get update -yq --fix-missing \
+ && DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends \
+    git \
+    wget \
+    cmake \
+    build-essential \
+    libboost-all-dev \
+    libopenblas-dev \
+    liblapack-dev \
+    libx11-dev \
+    libopencv-dev \
+    libgtk-3-dev \
+    pkg-config \
+    libavcodec-dev \
+    libavformat-dev \
+    libswscale-dev \
+    ffmpeg \
+    libsm6 \
+    libxext6 \
+    libgl1-mesa-glx \
+    libglib2.0-0 \
+    libsndfile1 \
+    portaudio19-dev \
+    ninja-build \
+    git-lfs \
+    vim \
+    curl \
+    libopenexr-dev \
+    openexr \
+    python3-dev \
+    libffi-dev \
+    libeigen3-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Miniconda
+RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
+ && bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
+ && rm Miniconda3-latest-Linux-x86_64.sh
+
+# Add conda to PATH
+ENV PATH="/opt/conda/bin:${PATH}"
+
+# Initialize conda in bash
+RUN conda init bash
+
+# Clone InsTaG repository
+RUN git lfs install \
+ && git clone https://github.com/Fictionarry/InsTaG.git /instag \
+ && cd /instag \
+ && git submodule update --init --recursive
+
+# Set up conda environment for InsTaG
+WORKDIR /instag
+RUN conda config --append channels conda-forge \
+ && conda config --append channels nvidia \
+ && conda create -n instag python=3.9 cudatoolkit=11.7 pytorch=1.13.1 torchvision=0.14.1 torchaudio -c pytorch -c nvidia -y \
+ && echo "conda activate instag" >> ~/.bashrc
+
+# Print debug information (no GPU is visible during docker build, so report the CUDA version PyTorch was built against)
+RUN conda run -n instag python -c "import torch; print('PyTorch version:', torch.__version__); print('CUDA build version:', torch.version.cuda); print('CUDA available at build time:', torch.cuda.is_available())"
+
+# Install dependencies for InsTaG
+RUN conda run -n instag pip install -r requirements.txt
+
+# Install MMCV with specific CUDA version
+RUN conda run -n instag pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13.0/index.html
+
+# Install CUDA submodules
+RUN conda run -n instag bash -c "cd /instag/submodules/diff-gaussian-rasterization && FORCE_CUDA=1 pip install -e ."
+RUN conda run -n instag bash -c "cd /instag/submodules/simple-knn && FORCE_CUDA=1 pip install -e ."
+RUN conda run -n instag bash -c "cd /instag/gridencoder && pip install -e ."
+RUN conda run -n instag bash -c "cd /instag/shencoder && pip install -e ."
+
+# Install PyTorch3D dependencies
+RUN conda run -n instag pip install "fvcore>=0.1.5" "iopath>=0.1.7" "nvidiacub-dev"
+
+# Install PyTorch3D with maximum compatibility
+RUN conda run -n instag bash -c "\
+    pip install --no-cache-dir pytorch3d==0.7.4 || \
+    pip install --no-cache-dir 'git+https://github.com/facebookresearch/pytorch3d.git@stable' || \
+    echo 'PyTorch3D installation failed, but continuing. You can install it manually later.'"
+
+# Install TensorFlow
+RUN conda run -n instag pip install tensorflow-gpu==2.10.0
+
+# Install OpenFace (critical for training)
+# Split into multiple steps to avoid timeout issues
+RUN mkdir -p /instag/OpenFace \
+    && git clone https://github.com/TadasBaltrusaitis/OpenFace.git /tmp/OpenFace
+
+# Download models 
+RUN cd /tmp/OpenFace && bash ./download_models.sh
+
+# Build OpenFace with all cores for speed
+RUN cd /tmp/OpenFace \
+    && mkdir -p build \
+    && cd build \
+    && cmake -D CMAKE_BUILD_TYPE=RELEASE .. \
+    && make -j$(nproc) \
+    && make install
+
+# Copy binaries and libraries to our OpenFace directory
+RUN cp -r /tmp/OpenFace/build/bin /instag/OpenFace/ \
+    && cp -r /tmp/OpenFace/lib /instag/OpenFace/ \
+    && cp -r /tmp/OpenFace/build/lib /instag/OpenFace/ \
+    && rm -rf /tmp/OpenFace
+
+# Download EasyPortrait model
+RUN mkdir -p /instag/data_utils/easyportrait \
+ && conda run -n instag wget -O /instag/data_utils/easyportrait/fpn-fp-512.pth \
+    https://rndml-team-cv.obs.ru-moscow-1.hc.sbercloud.ru/datasets/easyportrait/experiments/models/fpn-fp-512.pth
+
+# Run prepare script to download required models (critical for training)
+RUN cd /instag && bash scripts/prepare.sh
+
+# Create the Sapiens lite environment (PyTorch 2.2.1 conda builds target CUDA 11.8 or 12.1, not 11.7)
+RUN conda create -n sapiens_lite python=3.10 -y \
+ && conda install -n sapiens_lite pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=12.1 -c pytorch -c nvidia -y \
+ && conda run -n sapiens_lite pip install opencv-python tqdm json-tricks
+
+# Create directories for data and outputs
+RUN mkdir -p /instag/data /instag/output /instag/jobs
+
+# Set up environment paths
+ENV PATH="/opt/conda/bin:/instag/OpenFace/bin:${PATH}"
+
+# Create startup script to activate environment
+RUN echo '#!/bin/bash' > /instag/startup.sh \
+ && echo 'echo "Welcome to InsTaG on RunPod!"' >> /instag/startup.sh \
+ && echo 'echo ""' >> /instag/startup.sh \
+ && echo 'echo "Available environment commands:"' >> /instag/startup.sh \
+ && echo 'echo "conda activate instag    - Activate the main InsTaG environment"' >> /instag/startup.sh \
+ && echo 'echo "conda activate sapiens_lite - Activate the Sapiens environment for geometry priors"' >> /instag/startup.sh \
+ && echo 'echo ""' >> /instag/startup.sh \
+ && echo 'echo "Common workflows:"' >> /instag/startup.sh \
+ && echo 'echo "1. Process a video:        python data_utils/process.py data/<ID>/<ID>.mp4"' >> /instag/startup.sh \
+ && echo 'echo "2. Generate teeth masks:   python data_utils/easyportrait/create_teeth_mask.py ./data/<ID>"' >> /instag/startup.sh \
+ && echo 'echo "3. Run Sapiens (optional): bash data_utils/sapiens/run.sh ./data/<ID>"' >> /instag/startup.sh \
+ && echo 'echo "4. Fine-tune the model:    bash scripts/train_xx_few.sh data/<ID> output/<project_name> <GPU_ID>"' >> /instag/startup.sh \
+ && echo 'echo "5. Synthesize:            python synthesize_fuse.py -S data/<ID> -M output/<project_name> --audio <path> --audio_extractor <type>"' >> /instag/startup.sh \
+ && echo 'echo ""' >> /instag/startup.sh \
+ && echo 'source /opt/conda/etc/profile.d/conda.sh' >> /instag/startup.sh \
+ && echo 'conda activate instag' >> /instag/startup.sh \
+ && echo 'exec bash' >> /instag/startup.sh \
+ && chmod +x /instag/startup.sh
+
+# Set working directory
+WORKDIR /instag
+
+# Default command
+CMD ["/instag/startup.sh"]
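
Review note on the startup banner: the five "Common workflows" lines describe an ordered pipeline. The sketch below assembles those same commands as argument lists for one video. It only builds the commands and runs nothing. `workflow_commands` and `format_commands` are hypothetical helpers, and the value passed for `--audio_extractor` is left to the caller because the banner only shows `<type>`.

```python
import shlex


def workflow_commands(video_id, project_name, gpu_id, audio_path,
                      audio_extractor, use_sapiens=False):
    """Build the banner's workflow commands, in order, as argv lists."""
    data_dir = f"data/{video_id}"
    output_dir = f"output/{project_name}"
    commands = [
        # 1. Process a video
        ["python", "data_utils/process.py", f"{data_dir}/{video_id}.mp4"],
        # 2. Generate teeth masks
        ["python", "data_utils/easyportrait/create_teeth_mask.py", f"./{data_dir}"],
    ]
    if use_sapiens:
        # 3. Optional geometry priors; the banner pairs this step with the
        #    sapiens_lite environment rather than the main one.
        commands.append(["bash", "data_utils/sapiens/run.sh", f"./{data_dir}"])
    # 4. Fine-tune the model
    commands.append(["bash", "scripts/train_xx_few.sh", data_dir, output_dir, str(gpu_id)])
    # 5. Synthesize
    commands.append([
        "python", "synthesize_fuse.py", "-S", data_dir, "-M", output_dir,
        "--audio", audio_path, "--audio_extractor", audio_extractor,
    ])
    return commands


def format_commands(commands):
    """Render argv lists as shell-quoted strings for display."""
    return [shlex.join(argv) for argv in commands]
```

Keeping each command as a list avoids quoting problems if an ID or path ever contains spaces; `format_commands` is only for printing.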
diff --git a/Dockerfile.ci b/Dockerfile.ci
@@ -0,0 +1,49 @@
+# Version: 1.0.0 (CI Optimized)
+# This is a CI-optimized Dockerfile for GitHub Actions validation
+# It skips time-consuming steps while still verifying build correctness
+FROM nvcr.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04
+
+# Install system dependencies (minimal set)
+RUN apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+    git wget cmake build-essential \
+    libopencv-dev ffmpeg libsm6 libxext6 libgl1-mesa-glx \
+    libsndfile1 portaudio19-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Miniconda
+RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh && \
+    bash /tmp/miniconda.sh -b -p /opt/conda && \
+    rm /tmp/miniconda.sh
+
+# Add conda to PATH
+ENV PATH="/opt/conda/bin:${PATH}"
+
+# Initialize conda in bash
+RUN conda init bash
+
+# Clone InsTaG repository (shallow clone to speed up)
+RUN git clone --depth 1 https://github.com/Fictionarry/InsTaG.git /instag
+
+# Set up conda environment with PyTorch
+WORKDIR /instag
+RUN conda config --append channels conda-forge && \
+    conda config --append channels nvidia && \
+    conda create -n instag python=3.9 cudatoolkit=11.7 pytorch=1.13.1 torchvision=0.14.1 torchaudio -c pytorch -c nvidia -y && \
+    echo "conda activate instag" >> ~/.bashrc
+
+# Install only core dependencies
+RUN conda run -n instag pip install numpy==1.24.3 pillow==9.5.0 scipy opencv-python tqdm && \
+    conda run -n instag pip install -r requirements.txt
+
+# Create mock directories and files for validating scripts
+RUN mkdir -p /instag/data /instag/output && \
+    mkdir -p /instag/OpenFace/bin && \
+    printf '#!/bin/bash\necho "OpenFace mock for CI"\n' > /instag/OpenFace/bin/FeatureExtraction && \
+    chmod +x /instag/OpenFace/bin/FeatureExtraction
+
+# Set up environment paths
+ENV PATH="/opt/conda/bin:/instag/OpenFace/bin:${PATH}"
+
+# Validation command; it runs only when the container is started, and the workflow above builds the image without running it
+CMD ["conda", "run", "-n", "instag", "python", "-c", "import torch; print(f'PyTorch {torch.__version__} built for CUDA {torch.version.cuda}'); import numpy; import cv2; print('Core imports successful')"]
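
Review note on the validation `CMD`: the one-liner stops at the first failing import and is hard to read inside a JSON array. A possible alternative, sketched here under the assumption that a small script could be copied into the image, reports every module separately. `check_imports` is a hypothetical helper, not something the PR contains.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map name -> (ok, version or error text)."""
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as err:
            results[name] = (False, str(err))
        else:
            # Not every module exposes __version__, so fall back to a label.
            results[name] = (True, getattr(module, "__version__", "unknown"))
    return results


def all_ok(results):
    """Return True when every module in the report imported cleanly."""
    return all(ok for ok, _ in results.values())
```

Calling `check_imports(["torch", "numpy", "cv2"])` inside the CI image would cover the same modules as the current `CMD`, and `all_ok` could decide the exit status.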