Merged
README.md: 42 changes (25 additions, 17 deletions)
@@ -17,7 +17,7 @@

---

Works with any OpenAI-compatible endpoint — local llama.cpp servers, OpenAI, Groq, DeepSeek, OpenRouter, and more.
Works with any OpenAI-compatible endpoint — local servers, OpenAI, Groq, DeepSeek, OpenRouter, and more.

Two interfaces share the same pipeline:

@@ -26,8 +26,10 @@ Two interfaces share the same pipeline:

## Highlights

- **Batched translation** — sends ~15 subtitle blocks at a time so small models don't drift, skip short lines, or merge split sentences.
- **Strict validation** — every batch is checked for block count, numbering, and unchanged timestamps; failures retry with back-off.
- **Batched translation** — sends ~10 subtitle blocks at a time so small models don't drift, skip short lines, or merge split sentences.
- **Cast & register prepass** — a pre-scan extracts characters, recurring terms, and the written register so every batch translates names and formality consistently.
- **Strict validation** — every batch is checked for block count, numbering, and unchanged timestamps; failures retry with back-off and recursively split on repeated failure.
- **Auto-detect source language** — omit the source and the model infers it from the text, so mixed-language batches translate to a single target cleanly.
- **Any OpenAI-compatible provider** — local or cloud, no vendor lock-in.
- **Parallelism** — translate many batches per file and many files at once.
- **Live progress** — per-file progress bars in the web app, an in-place status line (elapsed / ETA / throughput) in the CLI.
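
The batching bullet above comes down to fixed-size chunking before anything touches the model. A minimal sketch; `make_batches` is a hypothetical helper illustrating the documented default of 10, not code from this PR:

```python
def make_batches(blocks: list, batch_size: int = 10) -> list[list]:
    """Split parsed subtitle blocks into batches of at most batch_size.

    batch_size=10 mirrors the documented default; small batches keep
    weak models from drifting, skipping, or merging lines mid-file.
    """
    return [blocks[i:i + batch_size] for i in range(0, len(blocks), batch_size)]
```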
@@ -40,7 +42,7 @@ npm install
ng serve
```

Open http://localhost:4200, drop in one or more subtitle files, pick source/target languages and a provider, and download translated files individually or as a ZIP.
Open http://localhost:4200, drop in one or more subtitle files, pick a target language (source defaults to Auto-detect) and a provider, and download translated files individually or as a ZIP.

## Command line

@@ -49,16 +51,20 @@ cd cli

# Option A — pip
pip install -r requirements.txt
python translora.py movie.srt -s English -t Arabic \
python translora.py movie.srt -t Arabic \
--api-url http://127.0.0.1:8080/v1/chat/completions

# Option B — uv (faster, auto-manages the venv)
uv sync
uv run translora.py movie.srt -s English -t Arabic \
uv run translora.py movie.srt -t Arabic \
--api-url http://127.0.0.1:8080/v1/chat/completions

# Explicit source language (skip auto-detect)
python translora.py movie.srt -s English -t Arabic \
--api-url http://127.0.0.1:8080/v1/chat/completions

# Cloud provider, whole folder in parallel
python translora.py ./subs/ -s English -t Arabic \
# Cloud provider, whole folder in parallel (source auto-detected per file)
python translora.py ./subs/ -t Arabic \
--api-url https://api.openai.com/v1/chat/completions \
--api-key sk-... --model gpt-4.1-mini -c 10 -pf 3
```
@@ -67,28 +73,31 @@ Frequently used flags:

| Flag | Description |
| --- | --- |
| `-s, --source` / `-t, --target` | Source and target language names |
| `-t, --target` | Target language name (required) |
| `-s, --source` | Source language (optional; omit to auto-detect — useful for mixed-language batches) |
| `--api-url` | OpenAI-compatible `/v1/chat/completions` endpoint |
| `--api-key` | API key; use `none` for local servers |
| `--model` | Model name (optional for local) |
| `--batch-size` | Subtitle blocks per batch (default **15**) |
| `-c, --concurrency` | Parallel batches per file (default **1**) |
| `--batch-size` | Subtitle blocks per batch (default **10**) |
| `-c, --concurrency` | Parallel batches per file (default **1** — raise for cloud providers) |
| `-pf, --parallel-files` | Files translated in parallel (default **1**) |
| `--max-retries` | Retries per batch (default **5**) |
| `--force` | Re-translate even if the output exists |
| `-v, --verbose` | Show retry/validation warnings (hidden by default) |
| `-o, --output` | Output path (single file only) |

Set `NO_COLOR=1` to disable ANSI colors; output auto-falls back to plain lines when piped.
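
These flags map one-to-one onto the `TranslationConfig` dataclass that appears later in this diff, so a run can also be configured from Python. A sketch under that assumption; the import path is inferred from the `cli/core/` layout and the endpoint values are placeholders:

```python
from core.config import TranslationConfig  # path assumed from cli/core/config.py

cfg = TranslationConfig(
    source_lang="",     # "" means auto-detect, same as omitting -s
    target_lang="Arabic",
    api_url="http://127.0.0.1:8080/v1/chat/completions",
    api_key="none",     # local servers need no key
    batch_size=10,      # --batch-size
    concurrency=4,      # -c, worth raising for cloud providers
    max_retries=5,      # --max-retries
)
```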

## How it works

Small and medium LLMs have known failure modes on long subtitle files: skipping one-word blocks (`"Oh!"`, `"Hmm."`), merging sentences split across two blocks for timing, and drifting mid-file. TransLora defends against that with a five-step pipeline:
Small and medium LLMs have known failure modes on long subtitle files: skipping one-word blocks (`"Oh!"`, `"Hmm."`), merging sentences split across two blocks for timing, drifting mid-file, and switching dialect or formality between batches. TransLora defends against that with a six-step pipeline:

1. Parse the subtitle file into numbered blocks with timestamps (SRT, VTT, ASS, SSA, SBV, SUB).
2. Split blocks into batches small enough that the model can't drift.
3. Send each batch with a structure-preserving system prompt.
4. Validate the response: block count in = out, numbers and timestamps untouched.
5. Retry failed batches up to `--max-retries` before flagging the file, then stitch the validated batches back in order.
2. Pre-scan the file with one extra LLM call to extract the cast, recurring terms, and the written register (e.g. Modern Standard Arabic, peninsular Spanish, polite Japanese). The relevant slice is attached to each batch so names and formality stay consistent across the whole file.
3. Split blocks into batches small enough that the model can't drift.
4. Send each batch with a structure-preserving system prompt.
5. Validate the response: block count in = out, numbers and timestamps untouched. Repeated failures recursively split the batch down to singletons before giving up.
6. Retry failed batches up to `--max-retries` before flagging the file, then stitch the validated batches back in order.
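
The checks in step 5 are mechanical enough to sketch. A standalone approximation, where `Block` and `check_batch` are hypothetical stand-ins (the field names follow the `SubtitleBlock` visible in this PR's diff, but this is not the repo's `validate_batch`):

```python
from dataclasses import dataclass

@dataclass
class Block:
    number: int
    timestamp: str
    text: str

def check_batch(source: list[Block], output: list[Block]) -> str | None:
    """Return an error message, or None when the batch passes all checks."""
    if len(output) != len(source):
        return f"expected {len(source)} blocks, got {len(output)}"
    for src, out in zip(source, output):
        if out.number != src.number:
            return f"block {src.number}: numbering changed"
        if out.timestamp != src.timestamp:
            return f"block {src.number}: timestamp changed"
    return None  # counts, numbers, and timestamps all intact
```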

## Providers

@@ -128,7 +137,6 @@ Anything else that speaks the OpenAI chat-completions protocol will work the same.
## Roadmap

- Side-by-side preview and per-block editing in the web app
- Translation memory for character-voice consistency across a file
- General document/text translation beyond subtitles

## License
cli/core/batch_runner.py: 124 changes (82 additions, 42 deletions)
@@ -1,10 +1,4 @@
"""Per-batch HTTP call, response sanitizing, and retry loop.

This is the "send one batch, get it back validated" layer. It knows how
to talk to an OpenAI-compatible chat endpoint and how to recover from
transient failures. Everything above this layer (translator.py) just
asks for batches and stitches them together.
"""
"""Per-batch HTTP call, response sanitizing, and retry loop."""

from __future__ import annotations

@@ -14,7 +8,8 @@

import httpx

from .srt_parser import SubtitleBlock, parse_srt, serialize_srt, validate_batch
from .context_pass import FileContext
from .srt_parser import SubtitleBlock, parse_lite, serialize_lite, validate_batch
from .config import TranslationConfig
from .prompt import SYSTEM_PROMPT

@@ -25,16 +20,11 @@


class FileTranslationError(Exception):
"""A batch used up all its retries — the whole file is considered failed."""

"""A batch exhausted its retries; the whole file is considered failed."""

# ---------------------------------------------------------------------------
# Input sanitization — users paste URLs/keys in all kinds of shapes.
# ---------------------------------------------------------------------------

def sanitize_api_url(url: str) -> str:
"""Drop credential query params like `?key=...` so we don't authenticate
twice when the user pastes a pre-keyed URL."""
"""Drop credential query params so we don't authenticate twice."""
url = (url or "").strip()
if not url:
return url
@@ -49,7 +39,6 @@ def sanitize_api_url(url: str) -> str:


def sanitize_api_key(key: str) -> str:
"""Strip whitespace, surrounding quotes, and any `Bearer ` prefix."""
k = (key or "").strip()
if (k.startswith('"') and k.endswith('"')) or \
(k.startswith("'") and k.endswith("'")):
@@ -60,7 +49,6 @@ def sanitize_api_key(key: str) -> str:


def strip_markdown_fences(text: str) -> str:
"""LLMs sometimes wrap output in ```...``` despite being told not to."""
text = text.strip()
if text.startswith("```"):
text = re.sub(r"^```[a-zA-Z]*\n?", "", text)
@@ -69,29 +57,23 @@ def strip_markdown_fences(text: str) -> str:


def is_retryable_http(code: int) -> bool:
"""Retry on timeout / rate-limit / server errors. Everything else is fatal."""
return code in (408, 429) or code >= 500


# ---------------------------------------------------------------------------
# HTTP call + retry
# ---------------------------------------------------------------------------

async def call_chat_api(
client: httpx.AsyncClient,
batch_srt: str,
system_prompt: str,
user_message: str,
cfg: TranslationConfig,
block_count: int,
max_tokens: int,
) -> str:
"""POST one batch to the OpenAI-compatible chat endpoint, return raw text."""
body: dict = {
"messages": [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content":
f"Translate from {cfg.source_lang} to {cfg.target_lang}:\n\n{batch_srt}"},
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_message},
],
"temperature": 0.1,
"max_tokens": max(block_count, 1) * 120,
"max_tokens": max(max_tokens, 1),
"stream": False,
"cache_prompt": True,
}
@@ -109,25 +91,67 @@ async def call_chat_api(
return resp.json()["choices"][0]["message"]["content"]


def _build_user_message(
cfg: TranslationConfig,
batch_wire: str,
file_context: FileContext | None,
batch: list[SubtitleBlock],
) -> str:
if cfg.source_lang:
header = f"Translate from {cfg.source_lang} to {cfg.target_lang}:"
else:
header = f"Translate to {cfg.target_lang}:"
if file_context is not None:
ctx = file_context.render_for_batch(batch)
if ctx:
return f"Glossary for this scene:\n{ctx}\n\n{header}\n\n{batch_wire}"
return f"{header}\n\n{batch_wire}"


_ATTEMPTS_BEFORE_SPLIT = 2


async def translate_batch_with_retry(
client: httpx.AsyncClient,
batch_idx: int,
batch: list[SubtitleBlock],
cfg: TranslationConfig,
file_context: FileContext | None = None,
_split_path: str = "",
) -> list[SubtitleBlock]:
"""Translate one batch; retry on transient errors; raise on exhaustion."""
batch_srt = serialize_srt(batch)
label = f"Batch {batch_idx + 1}"
"""Translate one batch; on repeated validation failure, halve and recurse.

Persistent count mismatches usually mean the model is deterministically
merging two adjacent similar-looking blocks. Halving keeps terminating
because at N=1 a count mismatch is impossible.
"""
batch_wire = serialize_lite(batch)
user_msg = _build_user_message(cfg, batch_wire, file_context, batch)
label = f"Batch {batch_idx + 1}" + (f".{_split_path}" if _split_path else "")
first_block = batch[0].number

for attempt in range(1, cfg.max_retries + 1):
tag = f"attempt {attempt}/{cfg.max_retries}"
can_split = len(batch) > 1
attempts = _ATTEMPTS_BEFORE_SPLIT if can_split else cfg.max_retries
hit_validation_failure = False
> **Copilot AI** commented on lines +133 to +135 (Apr 21, 2026):
>
> `attempts` is set to `_ATTEMPTS_BEFORE_SPLIT` for any batch with >1 block, which limits retries for transient HTTP/network errors as well as validation failures. This can reduce resilience to 429/5xx spikes. Consider keeping `cfg.max_retries` for request failures, and only triggering split-after-N when validation keeps failing.
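
A runnable sketch of the reviewer's suggestion: give transport errors the full retry budget and let only validation failures count toward the split. Every name here (the exception class, the callable parameters) is hypothetical, not code from this PR:

```python
import asyncio

class TransientError(Exception):
    """Stand-in for retryable HTTP/network failures (408/429/5xx)."""

async def run_with_split_budget(
    call,                                   # async () -> str; raises TransientError
    validate,                               # (str) -> bool
    split,                                  # async () -> str; recurses on halves
    can_split: bool,
    max_transport_retries: int = 5,
    validation_attempts_before_split: int = 2,
) -> str:
    transport_failures = 0
    validation_failures = 0
    while True:
        try:
            raw = await call()              # may raise TransientError
        except TransientError:
            transport_failures += 1
            if transport_failures >= max_transport_retries:
                raise                       # transport budget exhausted
            await asyncio.sleep(min(transport_failures, 3))
            continue
        if validate(raw):
            return raw                      # validated translation
        validation_failures += 1            # only these count toward splitting
        if can_split and validation_failures >= validation_attempts_before_split:
            return await split()            # halve and recurse
        if validation_failures >= max_transport_retries:
            raise RuntimeError("batch never validated")
```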

for attempt in range(1, attempts + 1):
tag = f"attempt {attempt}/{attempts}"
try:
raw = await call_chat_api(client, batch_srt, cfg, len(batch))
output = parse_srt(strip_markdown_fences(raw))
raw = await call_chat_api(
client, SYSTEM_PROMPT, user_msg, cfg, max(len(batch), 1) * 120,
)
output = parse_lite(strip_markdown_fences(raw))
if len(output) == len(batch):
output = [
SubtitleBlock(number=batch[i].number,
timestamp=batch[i].timestamp,
text=output[i].text)
> **Copilot AI** commented on lines +144 to +148 (Apr 21, 2026):
>
> Here the parsed lite output is rewritten with the input batch's numbers/timestamps before validation. That prevents `validate_batch` from catching incorrect numbering or reordered blocks (those fields get overwritten), which can silently misalign text with timestamps. Prefer validating the returned numbering/order first, and then reattaching timestamps by matching on the returned block number (or only overwriting timestamps).
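
A sketch of the reviewer's alternative ordering: verify the returned numbering first, then reattach only the timestamps, keyed by block number. It assumes `SubtitleBlock` is the dataclass visible in this diff; the helper name is hypothetical:

```python
from dataclasses import replace

def reattach_timestamps(batch, output):
    """Validate the model's numbering/order, then restore source timestamps.

    Both arguments are lists of SubtitleBlock-like dataclass instances
    with number, timestamp, and text fields.
    """
    expected = [b.number for b in batch]
    returned = [o.number for o in output]
    if returned != expected:
        raise ValueError(f"numbering mismatch: {returned} != {expected}")
    timestamps = {b.number: b.timestamp for b in batch}
    # Only the timestamp is overwritten; numbering survives for validation.
    return [replace(o, timestamp=timestamps[o.number]) for o in output]
```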
for i in range(len(batch))
]
check = validate_batch(batch, output)
if check.ok:
return output
hit_validation_failure = True
cfg.warn(f" {label} validation failed ({tag}): {check.error}")

except httpx.HTTPStatusError as e:
@@ -139,19 +163,35 @@ async def translate_batch_with_retry(
raise FileTranslationError(
f"{label} (block {first_block}) HTTP {code}: {snippet}"
)
if code == 429 and attempt < cfg.max_retries:
if code == 429 and attempt < attempts:
delay = 2 ** attempt
cfg.warn(f" Rate limited waiting {delay}s...")
cfg.warn(f" Rate limited - waiting {delay}s...")
await asyncio.sleep(delay)
continue

except Exception as e: # network error, JSON decode error, etc.
except Exception as e:
cfg.warn(f" {label} request failed ({tag}): {e}")

# Small back-off before the next attempt (1s, 2s, 3s cap).
if attempt < cfg.max_retries:
if attempt < attempts:
await asyncio.sleep(min(attempt, 3))

if hit_validation_failure and can_split:
mid = len(batch) // 2
left, right = batch[:mid], batch[mid:]
cfg.warn(
f" {label} splitting {len(batch)} -> {len(left)} + {len(right)} blocks"
)
left_path = (_split_path + "L") if _split_path else "L"
right_path = (_split_path + "R") if _split_path else "R"
# Sequential: parallel halves would oversubscribe the outer semaphore.
left_result = await translate_batch_with_retry(
client, batch_idx, left, cfg, file_context, left_path,
)
right_result = await translate_batch_with_retry(
client, batch_idx, right, cfg, file_context, right_path,
)
return left_result + right_result

raise FileTranslationError(
f"{label} (block {first_block}) failed all {cfg.max_retries} retries"
f"{label} (block {first_block}) failed all {attempts} retries"
)
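
Worked example of the split labels above, following the `_split_path` scheme in the diff: if Batch 3 keeps failing validation it splits into Batch 3.L and Batch 3.R; if Batch 3.L still fails it splits again into Batch 3.LL and Batch 3.LR, terminating at single blocks, where a count mismatch cannot occur.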
cli/core/config.py: 21 changes (11 additions, 10 deletions)
@@ -8,25 +8,26 @@
DEFAULT_MAX_RETRIES = 5


def _default_warn(msg: str) -> None:
def _silent_warn(msg: str) -> None:
pass


def _stderr_warn(msg: str) -> None:
print(msg, file=sys.stderr)


@dataclass
class TranslationConfig:
"""Everything a translation run needs beyond the file paths.

Bundled so we aren't threading 8+ arguments through every helper.
`warn` lets callers intercept retry/validation messages so they can be
routed around a live progress line instead of clobbering it.
"""
source_lang: str
"""Per-run config. `warn` is the retry/validation sink — silent by default,
rebindable by callers so it can route around a live progress line."""
source_lang: str # "" means auto-detect
target_lang: str
api_url: str
api_key: str
model: str | None = None
batch_size: int = 15
batch_size: int = 10
concurrency: int = 1
max_retries: int = DEFAULT_MAX_RETRIES
quiet: bool = False
warn: Callable[[str], None] = field(default=_default_warn)
verbose: bool = False
warn: Callable[[str], None] = field(default=_silent_warn)
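
Because `warn` is a plain callable field, a caller can rebind it per run: silent by default, `_stderr_warn` for verbose mode, or any sink that writes above a live progress line. A usage sketch against the dataclass shown above (constructor values are placeholders):

```python
cfg = TranslationConfig(
    source_lang="",   # auto-detect
    target_lang="Arabic",
    api_url="http://127.0.0.1:8080/v1/chat/completions",
    api_key="none",
)
cfg.warn = _stderr_warn  # route retry/validation messages to stderr
```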