5 changes: 3 additions & 2 deletions docs/architecture/api.md
@@ -21,8 +21,9 @@ yields `None`. Import it from `batchling`.
strict queue key `(provider, endpoint, model)`.
- **`dry_run` behavior**: when `dry_run=True`, requests are still intercepted, queued,
and grouped using normal window/size triggers, but provider batch submission and polling
-are skipped. Requests resolve with synthetic `httpx.Response` objects marked with
-`x-batchling-dry-run: 1`.
+are skipped. Intercepted requests raise `DryRunEarlyExit` on return instead of
+producing synthetic provider responses. A static dry-run summary report is emitted
+at context teardown.
- **`cache` behavior**: when `cache=True` (default), intercepted requests are fingerprinted
and looked up in a persistent request cache. Cache hits bypass queueing and resume polling
from an existing provider batch when not in dry-run mode.
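
The strict queue key above can be sketched with plain dictionaries. This is an illustrative sketch, not batchling's internals; the request shape here is assumed:

```python
from collections import defaultdict

# Hypothetical intercepted requests; in batchling these would be real
# provider calls captured by the interception layer.
requests = [
    {"provider": "openai", "endpoint": "/v1/responses", "model": "gpt-4o-mini"},
    {"provider": "openai", "endpoint": "/v1/responses", "model": "gpt-4o-mini"},
    {"provider": "anthropic", "endpoint": "/v1/messages", "model": "claude-haiku-4-5"},
]

# Group by the strict queue key (provider, endpoint, model): only requests
# sharing all three fields can land in the same provider batch.
queues = defaultdict(list)
for req in requests:
    key = (req["provider"], req["endpoint"], req["model"])
    queues[key].append(req)

for key, members in queues.items():
    print(key, len(members))
```

Here the two `gpt-4o-mini` requests share one queue while the `claude-haiku-4-5` request gets its own, so a dry run would report two expected batches.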
Expand Down
8 changes: 5 additions & 3 deletions docs/architecture/context.md
@@ -10,6 +10,7 @@ a context variable.
- Yield `None` for scope-only lifecycle control.
- Support sync and async context manager patterns for cleanup and context scoping.
- Start and stop optional Rich live activity display while the context is active.
- In dry-run mode, aggregate and print a static Rich summary at teardown.

## Flow summary

@@ -20,9 +21,10 @@ a context variable.
4. If `live_display=True`, the context attempts to start Rich panel rendering at
enter-time when terminal auto-detection passes (`TTY`, non-`dumb`, non-`CI`).
Otherwise it registers an `INFO` logging fallback that emits progress at poll-time.
-5. `__aexit__` resets the context and awaits `batcher.close()` to flush pending work.
-6. The live display listener is removed and the panel is stopped when context cleanup
-finishes.
+5. In dry-run mode, a dedicated summary listener is also registered at enter-time.
+6. `__aexit__` resets the context and awaits `batcher.close()` to flush pending work.
+7. On teardown, the context prints one static dry-run summary report when dry-run is enabled.
+8. Display/listener cleanup runs after close completes.
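
The enter/teardown flow above can be sketched as a toy async context manager. `DrySummaryContext`, `on_request`, and the report shape are all hypothetical stand-ins, not batchling's real classes:

```python
import asyncio

class DrySummaryContext:
    """Toy sketch of the flow above; not batchling's actual context class."""

    def __init__(self):
        self.counts = {}    # totals seen by the enter-time summary listener
        self.report = None  # static summary, built once at teardown

    def on_request(self, key):
        # Step 5: a dedicated summary listener aggregates per-queue totals.
        self.counts[key] = self.counts.get(key, 0) + 1

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        # Steps 6-8: flush pending work, emit one static report, clean up.
        await asyncio.sleep(0)  # stands in for awaiting `batcher.close()`
        self.report = dict(sorted(self.counts.items()))
        for key, n in self.report.items():
            print(f"{key}: {n} request(s)")
        return False

ctx = DrySummaryContext()

async def main():
    async with ctx:
        ctx.on_request(("openai", "/v1/responses", "gpt-4o-mini"))
        ctx.on_request(("openai", "/v1/responses", "gpt-4o-mini"))

asyncio.run(main())
```

Nothing is printed while the context is open; the single report appears only once `__aexit__` runs, mirroring the static summary at teardown.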

## Code reference

6 changes: 4 additions & 2 deletions docs/architecture/core.md
@@ -34,9 +34,11 @@ resolves futures back to callers.
7. `close()` flushes remaining requests and cancels timers.

In `dry_run` mode, step 3 and provider polling are bypassed: `_process_batch()` still
-creates `_ActiveBatch` for tracking, then resolves each request immediately with a
-synthetic `httpx.Response` (`200`) marked with `x-batchling-dry-run: 1`.
+creates `_ActiveBatch` for tracking, then resolves each request by raising
+`DryRunEarlyExit`.
Cache lookups remain enabled in dry-run mode for hit accounting, but cache writes are disabled.
+`close()` also waits for in-flight background submission/poll tasks so teardown
+reporting has stable totals.
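
The close-time drain can be sketched with a plain asyncio task set. `MiniBatcher` is an assumed toy class, not batchling's real batcher:

```python
import asyncio

class MiniBatcher:
    """Toy sketch of the close-time drain; not batchling's real batcher."""

    def __init__(self):
        self.tasks = set()  # in-flight background submission/poll tasks
        self.completed = 0

    def submit(self, delay):
        # Background work registers itself so close() can find it later.
        task = asyncio.create_task(self._work(delay))
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)

    async def _work(self, delay):
        await asyncio.sleep(delay)
        self.completed += 1

    async def close(self):
        # Drain every in-flight task so teardown totals are stable.
        if self.tasks:
            await asyncio.gather(*list(self.tasks))

async def main():
    batcher = MiniBatcher()
    for delay in (0.01, 0.02, 0.03):
        batcher.submit(delay)
    await batcher.close()
    return batcher.completed

completed = asyncio.run(main())
print(completed)  # all three background tasks finished before close() returned
```

Without the `gather` in `close()`, teardown could observe `completed` mid-flight and report partial totals.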

## Extension notes

35 changes: 32 additions & 3 deletions docs/dry-run.md
@@ -4,9 +4,38 @@

This feature exists so users can debug and better understand what WILL happen when they ultimately disable the flag, giving them the transparency required to be confident in the library.

-In practice, the dry run feature deactivates all batch submissions, but everything is done virtually, which means we can count incoming requests, number of batch we would have created, etc..
-
-To put it simply, it provides users with an exact breakdown of what their batched inference run would have been for real.
+In practice, dry-run deactivates all provider submissions while keeping the
+internal batching path active (queueing, windowing, and per-queue grouping).
+
+To put it simply, it provides users with an exact breakdown of what their
+batched inference run would have been for real.

Sample output:

```text
╭────────────────────────────────────────────── batchling dry run summary ───────────────────────────────────────────────╮
│ Batchable Requests: 8 - Cache Hit Requests: 0 │
│ ┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ │
│ ┃ provider ┃ endpoint ┃ model ┃ expected reques… ┃ expected batch… ┃ │
│ ┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │
│ │ anthropic │ /v1/messages │ claude-haiku-4-5 │ 1 │ 1 │ │
│ │ doubleword │ /v1/responses │ openai/gpt-oss-20b │ 1 │ 1 │ │
│ │ gemini │ /v1beta/models/gemini-2.5-flash-… │ gemini-2.5-flash-lite │ 1 │ 1 │ │
│ │ groq │ /openai/v1/chat/completions │ llama-3.1-8b-instant │ 1 │ 1 │ │
│ │ mistral │ /v1/chat/completions │ mistral-medium-2505 │ 1 │ 1 │ │
│ │ openai │ /v1/responses │ gpt-4o-mini │ 1 │ 1 │ │
│ │ together │ /v1/chat/completions │ google/gemma-3n-E4B-it │ 1 │ 1 │ │
│ │ xai │ /v1/chat/completions │ grok-4-1-fast-non-reasoning │ 1 │ 1 │ │
│ └─────────────┴───────────────────────────────────┴─────────────────────────────┴──────────────────┴─────────────────┘ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```

## Avoid partial counts

Dry-run exits as soon as the first intercepted request returns, which can lead
to partial totals if requests are awaited one by one. To let batchling see the
full request set before exit, schedule requests together and await them with
`asyncio.gather`.
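
The advice above can be sketched with stdlib asyncio alone; `fake_request` is a hypothetical stand-in for an intercepted call, not a batchling API:

```python
import asyncio

async def fake_request(i):
    # Stand-in for an intercepted provider call; in a real run each of
    # these would be an API request captured by batchling.
    await asyncio.sleep(0)
    return i

async def main():
    # Awaiting one by one (`for i in range(8): await fake_request(i)`)
    # would surface the first return, and the dry-run exit, before the
    # remaining requests are ever seen. Scheduling the full set and
    # awaiting it together avoids partial totals:
    return await asyncio.gather(*(fake_request(i) for i in range(8)))

results = asyncio.run(main())
print(results)  # results arrive in submission order
```

With all eight requests scheduled up front, the summary table counts the full set rather than whatever happened to complete first.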

## Activating dry run
