FaceFusionCpp uses a flexible YAML configuration system. There are two main configuration files:
- `config/app_config.yaml`: Global application settings (hardware foundation, paths, logging, observability).
- `config/task_config.yaml`: Task-specific settings (I/O strategy, pipeline topology, algorithm parameters).
This file is usually located in the `config/` directory. It defines the runtime environment and infrastructure; these settings apply globally to the entire application.
```yaml
config_version: "0.34.0"

# --- Inference Infrastructure (Graphics Card Settings) ---
inference:
  device_id: 0 # GPU Device ID (Default: 0. Leave as 0 if you only have one dedicated GPU).
  engine_cache:
    enable: true # Enable inference engine caching (Default: true. Speeds up startup).
    path: "./.cache/tensorrt" # Cache location (relative to root).
    max_entries: 3 # LRU (Least Recently Used) cache limit (Default: 3. Max number of model engines to keep in VRAM; the oldest are unloaded first).
    idle_timeout_seconds: 60 # TTL (Time To Live) auto-release after idle (Default: 60s. How long before unloading an engine to free VRAM).
  default_providers: # Default inference backend priority (Default: tensorrt > cuda > cpu).
    - tensorrt
    - cuda
    - cpu

# --- Resource & Performance (How to manage RAM/VRAM) ---
resource:
  # Memory strategy (Default: "strict")
  # strict: "Throw away after use" mode. Loads models into VRAM only at the exact moment of face swapping. Ideal for low-VRAM machines (e.g., <8GB).
  # tolerant: "Resident memory" mode. Loads everything upfront. Ideal for high-end setups (12GB+) that demand maximum processing speed.
  memory_strategy: "strict"
  # Adaptive backpressure limit (Default: "4GB").
  # This serves as a critical "safety valve" mechanism.
  # When memory usage nears this cap, the producer (video decoder) automatically throttles or pauses until the consumer (AI inference engine) has processed the queued frames, preventing OOM crashes at the source.
  max_memory_usage: "4GB"

# --- Logging (Where to look if things go wrong) ---
logging:
  level: "info" # Log level. Supports trace, debug, info, warn, error (Default: "info").
  directory: "./logs" # Where logs are saved.
  rotation: "daily" # How often to create a new log file (Default: "daily").
  max_files: 7 # How many old log files to keep (Default: 7).
  max_total_size: "1GB" # Maximum total size for all logs combined (Default: "1GB").

# --- Metrics (Tools to plot performance charts) ---
metrics:
  enable: true # Enable tracking (Default: true).
  step_latency: true # Log exactly how many milliseconds each step takes (Default: true).
  gpu_memory: true # Track the VRAM usage curve (Default: true).
  report_path: "./logs/metrics_{timestamp}.json" # Where the report is saved.

# --- Model Management ---
models:
  path: "./assets/models"
  # What if a model is missing? (Default: "auto" - download automatically)
  # Other options: "skip" (skip and throw an error, if you prefer manual downloads), "force" (force re-download everything).
  download_strategy: "auto"

temp_directory: "./temp" # Temporary file directory.

# --- Default Task Settings (Fallback Mechanism) ---
# When you forget to specify these params in a face-swapping task, the program uses these defaults.
default_task_settings:
  io:
    output:
      video_encoder: "libx264" # Video encoder (Default "libx264" for best compatibility across most players).
      video_quality: 80 # Video quality (Default 80. Range 0-100; higher is clearer but the file is larger).
      prefix: "result_" # Prefix automatically added to the generated filename (Default "result_").
      suffix: "" # Suffix automatically added to the generated filename (Default empty).
      conflict_policy: "error" # What to do on filename collision (Default "error". Use "overwrite" to replace).
      audio_policy: "copy" # What to do with the video audio (Default "copy" to keep the original audio. Use "skip" to mute).
```

This file defines the specific task you want to execute (e.g., swapping a face in a particular video). You can pass this file via the `-c`/`--task-config` command line argument.
- `task_info`: Task metadata. Supports `enable_logging` (independent logging, default `false`) and `enable_resume` (resume from breakpoint, default `false`. E.g., if a long video crashes, it can pick up where it left off).
- `io`: Input sources (`source_paths`) and targets (`target_paths`). Supports images, videos, and directory scanning.
- `io.output`:
  > [!TIP]
  > If you omit these parameters, they will automatically fall back to the `default_task_settings` defaults defined in `app_config.yaml`.
  - `path`: Output directory (must be an absolute path).
  - `prefix`: Output filename prefix (Default `result_`).
  - `suffix`: Output filename suffix (Default empty).
  - `conflict_policy`: `overwrite`, `rename`, `error` (Default `error`).
  - `audio_policy`: `copy` (keeps the original track, Default `copy`), `skip` (mutes output).
- `resource`:
  - `thread_count`: Concurrent task thread count (Default `0`, which lets the program decide; usually half your CPU thread count).
  - `max_queue_size`: Max queue capacity; buffers frames to prevent VRAM overflow (Default `20`. If you get constant OOM errors, drop this to 10 or 5).
  - `execution_order`:
    - `sequential` (Default): Low-latency mode. Processes frame by frame in order. Best if VRAM fits all active models.
    - `batch`: High-throughput mode. Processes all frames through Processor A, buffers the results, then shuts down A to load B.
      - Advantage: combined with `strict` memory mode, it keeps only a single model's VRAM footprint resident, enabling large pipelines on 4GB-8GB cards.
  - `batch_buffer_mode`: `memory` (saves to RAM, fast) or `disk` (saves to disk to prevent RAM blowouts; slower but stable).
  - `segment_duration_seconds`: Video segment processing length (Default `0`, no segmentation. Can be set to a few minutes for ultra-long videos).
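For ultra-long videos on a low-VRAM card, the fields above combine naturally. A minimal sketch (the field names come from the list above; the values are illustrative choices, not defaults):

```yaml
# task_config.yaml - illustrative resource block for a very long video
resource:
  execution_order: "batch"        # run one model at a time over all frames
  batch_buffer_mode: "disk"       # buffer intermediate frames on disk, not RAM
  segment_duration_seconds: 300   # process the video in 5-minute segments
  max_queue_size: 5               # small queue to keep VRAM pressure low
```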
This section configures global parameters for how faces are detected and cropped. None of the algorithms here affect how the final face looks; they only affect how accurately it is found.
```yaml
config_version: "1.0"

task_info:
  id: "my_first_swap"
  description: "Swap face A into video B"

io:
  source_paths:
    - "inputs/face_a.jpg" # Source face image
  target_paths:
    - "inputs/video_b.mp4" # Target video
  output:
    path: "outputs/result.mp4"
    prefix: "result_"
    suffix: "_v1"

resource:
  thread_count: 0
  max_queue_size: 20
  execution_order: "sequential"

face_analysis:
  face_detector:
    models: ["yoloface", "retinaface"] # Try yolo first, fall back to retina.
    score_threshold: 0.5 # Detection confidence (Default 0.5. Lowering to e.g. 0.3 finds blurry faces but might mistake leaves for faces. Raising to 0.8 is highly accurate but misses blurry side profiles).
  face_landmarker:
    model: "2dfan4" # Model for finding 68 facial keypoints (Default 2dfan4, the most stable right now).
  face_recognizer:
    model: "arcface_w600k_r50" # Model used to decide whether two faces belong to the same person.
    similarity_threshold: 0.6 # Similarity threshold (Default 0.6. Takes effect under 'reference' mode. 0.7 is extremely strict; 0.4 swaps almost any face).
  face_masker:
    # Mask fusion strategies: how to seamlessly stitch the face back on without wiping out occluding hair or glasses.
    types: ["box", "occlusion", "region"] # box: geometric boundary; occlusion: occlusion detection; region: semantic segmentation.
    occluder_model: "xseg" # Model to mask out hands/objects in front of the face.
    parser_model: "bisenet_resnet_34" # Model to parse out facial features.
    region: ["skin", "nose", "mouth"] # Semantic regions. Supports: left-eyebrow, neck, cloth, hair, hat, etc.
    # Default "all". Beginners should leave this untouched.
```
---
## 5. Error Handling & Diagnostics
When the program reports an error, refer to the `E` prefix codes for quick troubleshooting:
* **E1xx (System)**: E.g., `E101` (VRAM Overflow/OOM). Try lowering `max_queue_size` or switching to `batch` execution.
* **E2xx (Config)**: E.g., `E201` (YAML Syntax Error). Check your indentation.
* **E3xx (Model)**: E.g., `E302` (Model File Missing). Check your asset repository.
* **E4xx (Runtime)**: E.g., `E403` (No Face Detected). This is expected behavior, not a failure; the frame will pass through unchanged.
> [!IMPORTANT]
> **Pass-through Mechanism**: To ensure long video tasks don't halt and keep audio-video sync, when a face isn't detected or a reference fails to match, the program will **skip processing and output the original frame** instead of stopping or dropping the frame.
The core business logic! These are the "workers" (processors) that execute the tasks. They form a series of steps, passing each frame sequentially down the line in the order you configure them.

> [!NOTE]
> If you omit parameters inside `params`, they will automatically inherit values from `global_pipeline_step_params` or their respective hard defaults.
The most important worker: replaces the face in the target with the source face.

- `model`: Model name; supports `inswapper_128` or `inswapper_128_fp16` (Default `inswapper_128_fp16`. Any `fp16` variant runs faster on most cards with zero visual difference).
- `face_selector_mode`: (Default `many`)
  - `many`: Swaps all detected faces in the frame.
  - `one`: Only swaps the largest face in the frame (the protagonist), ignoring background actors.
  - `reference`: Only swaps faces that closely match the photo provided via `reference_face_path` (tells the program exactly "who is who").
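Put together, these parameters sit under a `face_swapper` step's `params` block, matching the pipeline examples later in this document. A minimal sketch:

```yaml
pipeline:
  - step: "face_swapper"
    params:
      model: "inswapper_128_fp16" # fp16: faster on most cards, same visual result
      face_selector_mode: "many"  # swap every detected face in the frame
```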
Restores facial details and removes mosaic artifacts. (Since face swapping usually outputs low-resolution 128x128 faces, this step is required for HD output.)

- `model`: Model name; supports `codeformer` and `gfpgan_1.2`~`1.4` (Default `gfpgan_1.4`. Try `codeformer` if the result looks severely degraded).
- `blend_factor`: The blend ratio between the enhanced face and the original face (0.0 - 1.0). (Default `0.8`. 1.0 creates a flawless but artificial "3D" look; 0.8 retains roughly 20% of the original photo's lighting atmosphere and looks most natural.)
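A minimal sketch of this step's `params` block, using the documented defaults:

```yaml
pipeline:
  - step: "face_enhancer"
    params:
      model: "gfpgan_1.4" # try "codeformer" for severely degraded sources
      blend_factor: 0.8   # keep ~20% of the original lighting atmosphere
```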
Corrects the stiffness of the cropped, swapped face so that the eyes and micro-expressions match the original target's mood.

- `model`: Model name; supports `live_portrait` (Default `live_portrait`).
- `restore_factor`: Restoration ratio (0.0 - 1.0). (Default `0.8`. Best left at the default.)
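A minimal sketch of this step's configuration. Note that this document's examples never show this step's id, so `expression_restorer` below is an assumption; check your build's actual step name:

```yaml
pipeline:
  - step: "expression_restorer" # assumed step id; verify against your build
    params:
      model: "live_portrait"
      restore_factor: 0.8 # documented default
```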
Makes an entire blurry old photo or video crisp (including the background). A highly resource-intensive step.

- `model`: Model name; supports various scale factors such as `real_esrgan_x4_fp16`.
- `enhance_factor`: Unsharp mask intensity on output frames (Default `1.0`).
- Tile chunking strategy: the app automatically splits the frame into tiles based on your VRAM, preventing OOM crashes on 4K videos; this usually requires zero manual tweaking.
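A minimal sketch of this step's `params` block (`frame_enhancer` is the step id used in the pipeline examples below; the model name is taken from the description above):

```yaml
pipeline:
  - step: "frame_enhancer"
    params:
      model: "real_esrgan_x4_fp16" # 4x super resolution, fp16 variant
      enhance_factor: 1.0          # documented default sharpening intensity
```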
If you want to swap only a specific person, you don't have to copy-paste the same long path into every single `face_swapper` and `face_enhancer` param block. You can define the default reference target globally:

```yaml
# This is placed at the same indentation level as `pipeline`
global_pipeline_step_params:
  # Globally specify: only look for the referenced person when selecting faces
  face_selector_mode: "reference"
  # Set the picture of the target person. All steps default to this configuration.
  reference_face_path: "D:/assets/my_face.jpg"
```

Enables face swapping and enhancement, exporting a high-quality video encode.
```yaml
io:
  output:
    video_quality: 95

pipeline:
  - step: "face_swapper"
  - step: "face_enhancer"
    params:
      blend_factor: 1.0
  - step: "frame_enhancer" # Optional: full-frame super resolution
```

Reduces VRAM usage via sequential execution and the strict memory strategy.
```yaml
# In app_config.yaml
resource:
  memory_strategy: "strict"

# In task_config.yaml
resource:
  execution_order: "sequential"
  max_queue_size: 2 # Keep the pipeline queue small
```

Only swaps a specific face within the video.
```yaml
pipeline:
  - step: "face_swapper"
    params:
      face_selector_mode: "reference"
      reference_face_path: "inputs/person_to_replace.jpg"
```