# Copy this file to .env and fill in the values you need. .env is gitignored;
# never commit your keys.
# Every value here is OPTIONAL; with none set, ingestion still runs (the LLM
# features just stay off).

# ── Optional LLM features (quality auditor + LLM redaction) ───────────────────
# The CORE pipeline (parse -> dataset + regex sensitive-data scan) needs NONE of
# this and runs with no setup. Uncomment below to ALSO enable the LLM auditor /
# redaction.
#
# Run a LOCAL OpenAI-compatible server so your chat data never leaves your machine
# (vLLM, LM Studio, llama.cpp). Serve an open model, then uncomment:
#
# LLM_VALIDATE=true
# LLM_API_BASE_URL=http://localhost:8000/v1     # vLLM (LM Studio uses :1234/v1)
# LLM_MODEL=Qwen/Qwen2.5-7B-Instruct            # the model your server serves
# LLM_API_KEY=local                             # local servers accept any value
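#
# A sketch of the same block for the other servers named above. The ports are
# each server's usual default and "your-model-id" is a placeholder (assumptions,
# not values taken from this repo); adjust both to match your setup.
#
# LM Studio:
# LLM_VALIDATE=true
# LLM_API_BASE_URL=http://localhost:1234/v1
# LLM_MODEL=your-model-id
# LLM_API_KEY=local
#
# llama.cpp (llama-server):
# LLM_VALIDATE=true
# LLM_API_BASE_URL=http://localhost:8080/v1
# LLM_MODEL=your-model-id
# LLM_API_KEY=local
#
# To check that the server is reachable before enabling the LLM features, list
# the models it serves (swap in your own port):
#   curl http://localhost:8000/v1/models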

# ── Optional: Hugging Face ────────────────────────────────────────────────────
# Only needed to download GATED models during training (e.g. Gemma). The default
# Qwen model in configs/train_lora.yaml is open and needs no token. Read by the
# training stack (huggingface_hub), not by this repo's ingestion code.
# HF_TOKEN=
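#
# Alternative sketch (assumes the huggingface_hub CLI is installed in your
# training environment, which this file does not confirm): run
#   huggingface-cli login
# once. huggingface_hub then reads the cached token, so HF_TOKEN can stay unset.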