LiteVPNet is a scene-split QP prediction pipeline for AV1 video encoding. This package provides a standalone, containerised deployment environment including all necessary dependencies (FFmpeg, VCA, AOM inspect, VMAF, and PyTorch/CUDA).
The initial version here (v0.1.0) is trained on videos ranging from 270p to 1080p. The model outputs 8 QPs per scene, one for each VMAF target in {99, 97, 95, 91, 88, 85, 83, 80}.
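Because the eight predicted QPs line up one-to-one with the eight VMAF targets, choosing a QP for a desired quality floor reduces to a lookup. A minimal sketch of that idea (the names `VMAF_TARGETS` and `select_qp` are illustrative, not part of the package API, and the real pipeline may select differently):

```python
# The eight VMAF targets the model predicts QPs for, highest quality first.
VMAF_TARGETS = [99, 97, 95, 91, 88, 85, 83, 80]

def select_qp(predicted_qps, vmaf_target_min):
    """Return the predicted QP paired with the lowest VMAF target that
    still meets vmaf_target_min (the cheapest encode above the floor)."""
    if len(predicted_qps) != len(VMAF_TARGETS):
        raise ValueError("expected one QP per VMAF target")
    # Targets are sorted descending, so scan from the low-quality end.
    for target, qp in reversed(list(zip(VMAF_TARGETS, predicted_qps))):
        if target >= vmaf_target_min:
            return qp
    raise ValueError("no VMAF target meets the requested minimum")

# Example: with --vmaf-target-min 91 the QP paired with target 91 wins.
qps = [18, 22, 26, 30, 34, 38, 42, 46]  # dummy model output
print(select_qp(qps, 91.0))  # -> 30
```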
The model here targets NVIDIA GPUs with AV1 NVENC encoding support, i.e. RTX 40-series and 50-series GPUs.
The Docker image is built on Ubuntu 22.04. Because the pipeline runs in a container, it works on the following host operating systems:
- Linux (natively supported)
- Windows (via Docker Desktop + WSL2)
(Note: macOS does not support NVIDIA NVENC, so it is not currently supported.)
```
docker pull ghcr.io/sigmedia/litevpnet-deploy:latest
```

Mount your input video directory to /data and your desired output directory to /output.
```
docker run --gpus all \
  -v /path/to/my/videos:/data \
  -v /path/to/my/output:/output \
  -it ghcr.io/sigmedia/litevpnet-deploy:latest \
  /data/input_video.mp4 \
  --output-dir /output/run1 \
  --fast --cleanup --smart --chunker --inline -j 2
```

For the --gpus all flag to work and NVENC encoding to succeed, your host machine must have the NVIDIA Container Toolkit installed.
- On Linux: Follow the NVIDIA Container Toolkit Install Guide.
- On Windows: Install Docker Desktop and ensure the WSL2 backend is enabled; GPU passthrough then works automatically.
Note: Local (non-Docker) installation is not currently applicable. We will update this section once we have a stable local installation.
The main entrypoint is src/scene_qp_pipeline.py (which is also the default Docker entrypoint).
```
positional arguments:
  input_video           Path to the input MP4 video file

options:
  -h, --help            show this help message and exit
  --output-dir DIR      Directory for all outputs (shots, encodes, final)
  --model MODEL         Model name for QP prediction
                        (default: qptraining_8qp_270p_allres-v2)
  --fast                Only encode the first 3 predicted QPs instead of all 8
  --vmaf-target-min V   Minimum VMAF target for QP selection (default: 91.0)
  --vmaf-bin PATH       Path to VMAF binary (default: vmaf)
  --no-encode           Dry run: scene detect + QP prediction only, skip encoding
  --parallel N, -j N    Number of scenes to process in parallel (default: 1).
                        Limited by NVENC sessions; 2-3 is usually safe.
  --cleanup             Delete scene .y4m shot files after processing
  --chunker             Use Python-API scene detection with 1/4 downscale and
                        long-scene splitting (min 2.5 s, max 5 s chunks)
                        instead of the scenedetect CLI
  --smart               With --chunker: predict QP on the first chunk of each
                        shot only; reuse that QP for the remaining chunks
                        (skips feature extraction/prediction)
  --inline              With --chunker: process each chunk immediately as it
                        is detected (merges Phase 1 + 2).
```
High-speed parallel processing (Docker):
Uses smart chunking, processes chunks inline, parallelizes 2 nvenc streams, and cleans up intermediate y4m files.
```
docker run --gpus all \
  -v /media/dataloop/media:/data \
  -v /media/colourloop/emerald/mog_test/deploy:/output \
  ghcr.io/sigmedia/litevpnet-deploy:latest \
  /data/chimera_hevc-short.mp4 \
  --output-dir /output/run1 \
  --fast --cleanup --smart --chunker --inline -j 2 \
  --vmaf-target-min 95
```

Local execution:
```
python3 src/scene_qp_pipeline.py /media/colourloop/videos/MOGDataset/7_Uma_pedra_no_bolso.mp4 \
  --output-dir /media/colourloop/emerald/mog_test/vid_62 \
  --cleanup --fast -j 4 --chunker --smart --inline --vmaf-target-min 95
```
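The -j/--parallel flag maps naturally onto a worker pool whose size is capped by the driver's concurrent NVENC session limit (hence the "2-3 is usually safe" note above). A hedged sketch of that pattern; `encode_scene` and `NVENC_SESSION_LIMIT` are stand-ins, not the package's real names:

```python
from concurrent.futures import ThreadPoolExecutor

# Consumer drivers typically allow only a handful of concurrent NVENC
# sessions; the exact limit is an assumption here and varies by driver.
NVENC_SESSION_LIMIT = 3

def encode_scene(scene_id):
    # Placeholder for the real per-scene encode (an av1_nvenc subprocess).
    return f"scene_{scene_id} encoded"

def process_scenes(scene_ids, jobs=2):
    """Encode scenes in parallel, never exceeding the NVENC session cap."""
    workers = min(jobs, NVENC_SESSION_LIMIT)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_scene, scene_ids))

print(process_scenes(range(4), jobs=2))
```

Threads (rather than processes) fit here because the heavy lifting happens in external encoder subprocesses, so the GIL is not a bottleneck.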