diff --git a/README.md b/README.md
index 5582f00..a85ca92 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,9 @@ NetCopy is **not**:
   bearer token,
 - a backup tool — there is no scheduling, snapshotting, or retention.
 
-Tested on Linux. Runs on macOS and Windows too; see [Known issues](#known-issues)
-for platform caveats.
+Linux is the only platform under CI and the only one we ship release images
+for. The pure-Java parts run on macOS and Windows too, but a few platform
+quirks aren't tested on every commit — see [Known issues](#known-issues).
 
 ## Quick start
 
@@ -145,40 +146,71 @@ NetCopy splits cleanly into a **control plane** and a **data plane**.
 |     |          |                            |  data  |     |          |                            |
 |  HttpPuller  TcpPuller  (port 7778 server)  |<------>|  HttpPuller  TcpPuller  (port 7778 server)  |
 |     |          |                            |        |     |          |                            |
-|  SidecarStore (data.partial + chunks.bitmap + meta.json)                                            |
+|  SidecarStore (data.partial + chunks.bitmap + chunks.hashes + meta.json)                            |
 |  JsonJobStore (<state-dir>/jobs/<id>.json)                                                          |
 +---------------------------------------------+         +---------------------------------------------+
 ```
 
 **Control plane (HTTP + WebSocket via Javalin, port 7777):**
 
-- `GET /api/health` — liveness probe (no auth).
-- `GET /api/browse` — list a directory under one of the peer's `--shared-root`s.
-- `POST /api/manifest` — ask the peer to plan a transfer; returns a
-  `manifestId` plus a flat list of files, sizes, mtimes, and chunk plans.
-- `POST /api/transfers` — start a job locally that will pull a manifest from
-  a remote peer. Persists a `JobState` in `<state-dir>/jobs/<id>.json`.
-- `GET /api/transfers/{id}` — poll job state.
-- `WS /ws/progress` — live `ProgressEvent`s (subscribed per `transferId`).
+| Endpoint | Auth | Purpose |
+|---|---|---|
+| `GET /api/health` | no | Liveness probe (open). |
+| `GET /api/peer/info` | yes | Peer self-description: hostname, version, TCP blob port, root counts. |
+| `GET /api/browse?root=&path=` | yes | List a directory under a `--shared-root`. |
+| `GET /api/browse-local?root=&path=` | yes | Same shape, rooted under a `--receive-root` (UI uses it for the target panel). |
+| `POST /api/browse/stats` | yes | Recursive file count + byte total per path; powers the selection-stats footer. |
+| `POST /api/manifest` | yes | Plan a transfer. Returns the full manifest (entries, sizes, mtimes, chunk plans, `manifestId`). |
+| `POST /api/manifest/register` | yes | Re-register a previously-issued manifest (used by the puller after a source-side restart). |
+| `GET /api/blob/{manifestId}/{fileId}` | yes | HTTP data-plane: file bytes (with `Range` support, `X-Chunk-Hash` response header). |
+| `GET /api/hash/{manifestId}/{fileId}` | yes | Lazy XXH3-128 of a manifest entry; returns `202` while computing. |
+| `POST /api/transfers` | yes | Start a job (target host pulls from a remote source). |
+| `GET /api/transfers` | yes | List status snapshots (newest first). |
+| `GET /api/transfers/{id}` | yes | Single status snapshot, including per-file table and per-chunk metrics. |
+| `POST /api/transfers/{id}/{pause,resume,cancel}` | yes | Lifecycle controls. |
+| `DELETE /api/transfers/{id}` | yes | Dismiss a terminal-state job from the persistent store. |
+| `POST /api/relay/push` | yes | "Push from here to peer" — proxies `POST /api/transfers` to the peer using its token. |
+| `GET /api/metrics` | yes | Host metrics (CPU/RAM/disk/GC, top threads) + per-server serve metrics. |
+| `WS /ws/progress` | yes | Live `ProgressEvent` stream (subscribe per transfer or wildcard). |
 
 **Data plane (two interchangeable protocols):**
 
 - `GET /api/blob/{manifestId}/{fileId}` with HTTP `Range` headers, served by
   Javalin via `FileChannel.transferTo`.
 - A custom binary TCP protocol on port 7778: framed `[len:u32][type:u8][payload]`
-  with `HELLO`/`REQUEST`/`DATA_HEAD`/`DATA`/`DATA_END`/`ERR`/`BYE`. Designed to
-  reuse one connection across many `pullChunk` calls and avoid HTTP parsing
-  overhead at the price of a more interesting wire format.
+  with `HELLO` / `REQUEST` / `DATA_HEAD` / `DATA` / `DATA_END` / `DATA_END_V2`
+  (xxh3 trailer, single-pass; v0.3.0+) / `ERR` / `BYE`. Designed to reuse one
+  connection across many `pullChunk` calls and avoid HTTP parsing overhead at
+  the price of a more interesting wire format.
 
 The protocol is selected per job at start time. See
-[docs/protocol-comparison.md](docs/protocol-comparison.md) for benchmarks.
+[docs/protocol-comparison.md](docs/protocol-comparison.md) for design notes.
 
 **State and resume:**
 
 - Each in-progress target file owns a sidecar directory `<file>.netcopy/`
-  containing `data.partial` (sparse, written at offsets), `meta.json`
-  (size, mtime, chunk plan), and `chunks.bitmap` (1 bit per chunk, set after
-  the chunk is downloaded **and** its xxh3-128 hash verified).
+  containing four files:
+  - `data.partial` — sparse, pre-allocated to the final size, written via
+    positional FileChannel writes;
+  - `meta.json` — immutable per-file descriptor (relPath, size, sourceMtime,
+    chunk plan, `schemaVersion`);
+  - `chunks.bitmap` — one bit per chunk, set after the chunk's bytes are
+    fsynced **and** its **xxh3-128** chunk-level hash matches what the source
+    advertised on the wire;
+  - `chunks.hashes` — fixed-size array of XXH3-128 digests (16 bytes per
+    chunk), positionally written as each chunk completes. Used by the
+    selective re-verify path on full-file hash mismatch so resume re-pulls
+    only the corrupted chunks instead of the whole file.
+- Hashing has two layers:
+  - **Per-chunk** verification (and the on-the-wire `X-Chunk-Hash` /
+    `DATA_END_V2`) is **XXH3-128** — fast, ~10 GB/s on x86, allocates a small
+    per-chunk buffer.
+  - **Full-file finalize** is **SHA-256** in 256 KiB strides. Streaming
+    XXH3-128 in this codebase buffers all bytes into a `ByteArrayOutputStream`
+    that overflows the array-size limit on multi-GiB files — SHA-256 streams
+    cleanly via `MessageDigest.update`. The resulting digest lives in the
+    JSON's `hashHex` field for v0.x wire-format stability (the field name
+    will change in a future major bump).
 - After all chunks are verified, `FileFinalizer` rehashes the whole file and
   atomic-renames `data.partial` to the final target path.
 - A job's overall state lives in `<state-dir>/jobs/<id>.json` (one JSON per
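
Review note on the README hunk above: the full-file finalize step is described as SHA-256 computed in 256 KiB strides via `MessageDigest.update`. The following is a minimal sketch of that idea, not NetCopy's actual `FileFinalizer`; the class name `StreamingSha256Sketch` and the method `sha256Hex` are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not taken from the NetCopy codebase. */
final class StreamingSha256Sketch {
    /** 256 KiB stride, the figure quoted in the README text. */
    static final int STRIDE = 256 * 1024;

    /** Streams the file through SHA-256 without holding more than one stride in memory. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        ByteBuffer buf = ByteBuffer.allocate(STRIDE);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip();
                digest.update(buf); // consumes everything between position and limit
                buf.clear();
            }
        }
        // Lowercase hex, two characters per digest byte (64 characters in total).
        StringBuilder hex = new StringBuilder(64);
        for (byte b : digest.digest()) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }
}
```

Because memory use is bounded by the stride rather than the file size, the same loop works for multi-GiB files, which is the property the README contrasts with the buffered XXH3-128 path.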
diff --git a/docs/protocol-comparison.md b/docs/protocol-comparison.md
index 8e645a1..943b2de 100644
--- a/docs/protocol-comparison.md
+++ b/docs/protocol-comparison.md
@@ -1,82 +1,88 @@
-# HTTP vs TCP — protocol comparison
-
-NetCopy ships two interchangeable data-plane protocols. This document is
-the home of the quantitative comparison between them. The numbers below
-are produced by task **V5 — protocol comparison** and are placeholders
-until that pass runs.
-
-## What we are measuring
-
-The same workload runs back-to-back over both protocols, on the same two
-hosts, with the same chunk plan. Each row in the table below should report
-median and p95 of three runs.
-
-- **Throughput**: useful payload bytes per wall-clock second, averaged over
-  the whole transfer.
-- **Time to first byte (TTFB)**: from `POST /api/transfers` accepting to the
-  first `ChunkCompleted` ProgressEvent.
-- **CPU time**: server-side and client-side `getrusage` deltas, normalised
-  per GB transferred.
-- **Connection count**: peak concurrent sockets the data plane opened.
-- **Behaviour under loss**: same transfer with `tc qdisc add ... netem
-  loss 1%` applied to the receive interface — does the protocol recover
-  cleanly, and what is the throughput delta.
-
-## Test workloads
+# HTTP vs TCP — protocol design notes
+
+NetCopy ships two interchangeable data-plane protocols. The user picks one
+per transfer. This document explains the trade-offs and points to a manual
+reproduction for benchmark numbers.
+
+## What's different
+
+Both protocols carry the same byte payload (file contents, in chunks, with
+XXH3-128 chunk-level verification). They differ in framing and how the
+hash gets onto the wire:
+
+- **HTTP** — `GET /api/blob/{manifestId}/{fileId}` with a `Range:
+  bytes=START-END` header per chunk. Connection reuse via keep-alive. Server
+  pre-computes the chunk's XXH3-128, sets it as `X-Chunk-Hash` response
+  header, then streams the body via `FileChannel.transferTo` (which on Linux
+  typically maps to `sendfile(2)`). Pro: trivial to debug with `curl`, plays
+  well with any HTTP-aware proxy. Con: HTTP parsing overhead per chunk, and
+  HTTP/1.1 needs one connection per concurrent chunk.
+- **TCP** — one long-lived connection per peer, multiplexed by `reqId`.
+  Custom binary framing (see `tasks/contracts/data-formats.md`). Versioned
+  protocol: v1 is two-pass (hash → DataHead → stream → DataEnd, identical to
+  the HTTP path conceptually); **v2 (default since v0.3.0)** streams and
+  hashes in a single pass, putting the digest in a trailing `DataEndV2`
+  frame. Pro: fewer TCP connections (one per peer), no HTTP overhead, single
+  read pass on the source-side disk. Con: needs its own port (`--tcp-port`),
+  not curl-debuggable.
+
+## Where the difference matters
+
+- **Many small files (≤ 1 MB each).** TCP wins clearly. HTTP pays a full
+  request line + headers per chunk; with thousands of files this dominates.
+- **One big file (multi-GB) on a fast disk.** Mostly identical. Both
+  protocols are CPU-bound on the hash and IO-bound on the disk; framing
+  overhead is in the noise.
+- **One big file on a cold-cache HDD.** TCP v2 is meaningfully faster
+  because it does one disk read per chunk on the source instead of two.
+  v1's two-pass design was tolerable on SSDs (the second pass came from the
+  page cache) but on HDD the source ended up reading the file twice with
+  cold seeks. v0.3.0 fixed that.
+- **Lossy network.** Both rely on the kernel's TCP retransmit; the
+  application layers don't differ. NetCopy retries failed chunks with
+  exponential backoff identically.
+
+In practice the user-visible bottleneck on a LAN is almost always **the
+slower of the two disks** (source HDD seek + receiver fsync), not the
+protocol. We've measured ~50–60 MB/s sustained from a single HDD source
+with 8 parallel chunks regardless of which protocol we pick.
+
+## Reproducing a comparison by hand
+
+1. Start two daemons with identical flags except `--port`, `--tcp-port`, and
+   roots. Pin the JVM with `-XX:ActiveProcessorCount=N` if you want to
+   compare across CPU budgets.
+2. Pre-generate the workload under one daemon's `--shared-root`.
+3. From the UI on the other daemon, plan a transfer, then start it twice in
+   a row — once with `protocol: "http"`, once with `"tcp"`. Record the
+   `TransferCompleted` event's `totalDurationMs` and `avgThroughputBps`,
+   and screenshot the Performance modal's "This transfer (chunks)" tile for
+   per-chunk timings.
+4. For loss runs:
+
+   ```bash
+   sudo tc qdisc add dev <iface> root netem loss 1%
+   # ... run the transfer ...
+   sudo tc qdisc del dev <iface> root
+   ```
+
+5. Repeat with the TCP server disabled (`--tcp-port 0`) on the source side
+   to confirm the HTTP fallback works.
+
+We deliberately don't ship a canned benchmark table here: numbers from a
+single hardware setup mislead readers comparing to their own. The
+Performance modal already exposes the per-chunk timings (source latency,
+wire time, persist time, pool acquire wait) you need to identify your own
+bottleneck.
+
+## Suggested workloads
 
 | ID | Description |
 |---|---|
-| W1 | One 32 GB file (large-chunk path) |
-| W2 | 1000 small files of ~64 KB each (small-chunk path, file-parallelism dominates) |
+| W1 | One 32 GB file (large-chunk path; tests sustained throughput) |
+| W2 | 1000 small files of ~64 KB each (request count dominates) |
 | W3 | Mixed: 4 GB ISO + 50 MB of small docs (typical real-world mix) |
-| W4 | W1 again, but with `--file-parallelism=1 --chunks-per-file=1` (single-stream baseline) |
-
-Each workload runs once over HTTP (`--tcp-port 0` on the server side) and
-once over TCP (`protocol: "tcp"` in the transfer request).
-
-## Results — placeholder
+| W4 | W1 again with `--file-parallelism=1 --chunks-per-file=1` (single-stream baseline) |
 
-Filled in by V5.
-
-| Workload | Protocol | Throughput (MB/s) | TTFB (ms) | Server CPU (s/GB) | Peak conns | Loss 1% throughput |
-|---|---|---|---|---|---|---|
-| W1 | HTTP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
-| W1 | TCP  | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
-| W2 | HTTP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
-| W2 | TCP  | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
-| W3 | HTTP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
-| W3 | TCP  | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
-| W4 | HTTP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
-| W4 | TCP  | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
-
-## Provisional reasoning
-
-Until V5 produces real numbers, the design intuition is:
-
-- **W1 (one big file)**: the two protocols should be within a few percent.
-  Both are dominated by `FileChannel.transferTo` on the server and direct
-  `pwrite` on the client; the framing overhead is amortised across multi-MB
-  chunks.
-- **W2 (many small files)**: TCP should win materially. HTTP pays a full
-  request/response round-trip per chunk, plus header parsing; TCP reuses
-  one connection and sends only an 8-byte `REQUEST` frame per chunk.
-- **W3 (mixed)**: closer to W1 by byte count; closer to W2 by request count.
-  Expect TCP to be modestly ahead.
-- **W4 (single stream)**: both protocols saturate one TCP flow; the
-  bottleneck is the kernel and the NIC, not the framing.
-
-## Reproducing the benchmark
-
-V5 will publish a script under `verify/V5/` that drives both daemons in
-the same JVM (or two JVMs on the same host) using a tmpfs receive root
-to factor out disk speed. Until then, reproduce by hand:
-
-1. Start two daemons with identical flags except `--port`, `--tcp-port`,
-   and roots. Pin the JVM with `-XX:ActiveProcessorCount=N` if you want
-   to compare across CPU budgets.
-2. Pre-generate the workload under one daemon's `--shared-root`.
-3. From the UI on the other daemon, plan a transfer, then start it twice
-   in a row — once with protocol HTTP, once with TCP. Record the
-   `TransferCompleted` event's `totalDurationMs` and `avgThroughputBps`.
-4. For loss runs: `sudo tc qdisc add dev <iface> root netem loss 1%`.
-   Don't forget to `tc qdisc del` afterwards.
+W2 is the workload where TCP shows its largest advantage; W4 is where the
+two protocols converge.
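
Review note on the protocol-comparison hunk above: the HTTP path issues one `Range: bytes=START-END` request per chunk. The sketch below shows how a fixed-size chunk plan could be turned into those header values. `RangePlanSketch`, `rangeHeader`, and `planRanges` are hypothetical names, and the fixed chunk size is an assumption; the real planner may size chunks differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not taken from the NetCopy codebase. */
final class RangePlanSketch {
    /** Builds a Range header value. HTTP byte ranges are inclusive at both ends, hence the -1. */
    static String rangeHeader(long offset, long len) {
        if (offset < 0 || len <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and len must be > 0");
        }
        return "bytes=" + offset + "-" + (offset + len - 1);
    }

    /** Splits a file into fixed-size chunks; the last chunk may be shorter. */
    static List<String> planRanges(long fileSize, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long off = 0; off < fileSize; off += chunkSize) {
            ranges.add(rangeHeader(off, Math.min(chunkSize, fileSize - off)));
        }
        return ranges;
    }
}
```

With a 32 MiB chunk size (33554432 bytes, the `len` used in the `meta.json` example), a 1000-file workload of ~64 KB files yields one request per file, which is why request count rather than byte count dominates W2.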
diff --git a/tasks/contracts/data-formats.md b/tasks/contracts/data-formats.md
index f310247..9cb0bc4 100644
--- a/tasks/contracts/data-formats.md
+++ b/tasks/contracts/data-formats.md
@@ -2,6 +2,13 @@
 
 Все REST-payload-ы и persisted JSON-файлы используют эти структуры. Изменения требуют согласования с тимлидом.
 
+> **Note for v0.4.0+:** persisted state files (`<state-dir>/jobs/*.json` and
+> `<file>.netcopy/meta.json`) carry a `schemaVersion` field. Readers MUST refuse
+> any file with `schemaVersion > CURRENT_SCHEMA_VERSION` to avoid
+> misinterpreting a future format. Pre-v0.4.0 files lack the field; Jackson
+> defaults it to 0, which readers treat as `1`. Bumping the constant signals an
+> intentional break.
+
 ---
 
 ## REST — `POST /api/manifest`
@@ -10,15 +17,22 @@ Request:
 ```json
 {
   "rootIdx": 0,
+  "baseSubdir": "Media",
   "paths": ["movies/big.mkv", "docs/"]
 }
 ```
 
+`baseSubdir` (optional, since v0.2.5) is the user's source-side breadcrumb. Entry
+`relPath` values become relative to it, so the receiver lays out
+`movies/big.mkv` under its target root **without** the `Media/` prefix that the
+user navigated through.
+
 Response (`Manifest`):
 ```json
 {
   "manifestId": "9f6a2c8e-...",
   "rootIdx": 0,
+  "baseSubdir": "Media",
   "totalBytes": 5500000000,
   "totalFiles": 3,
   "entries": [
@@ -43,6 +57,19 @@ Response (`Manifest`):
 
 ---
 
+## REST — `POST /api/manifest/register`
+
+Body: a complete `Manifest` (the JSON the source returned from `POST /api/manifest`).
+Response: `204 No Content`.
+
+Used when the puller resumes a transfer whose source restarted — the in-memory
+registry on the source is gone and would otherwise reject every blob request as
+"manifest not found". Re-registration is gated server-side: every FILE entry
+must, with `NOFOLLOW_LINKS`, resolve to a regular file whose real path is still
+inside the shared root. Symlink-traversed entries are rejected with `400`.
+
+---
+
 ## REST — `GET /api/browse?root=<idx>&path=<rel>`
 
 Response:
@@ -54,10 +81,39 @@ Response:
   "entries": [
     { "name": "big.mkv", "type": "file", "size": 5368709120, "mtime": 1714000000 },
     { "name": "subs", "type": "dir" }
+  ],
+  "rootFree": { "usable": 723000000000, "total": 1000000000000 }
+}
+```
+
+`rootFree` (v0.3.1+) reflects `Files.getFileStore(root)`. May be absent on
+filesystems where the JVM cannot report usage.
+
+`GET /api/browse-local` has the same shape but takes a `--receive-root` index.
+
+---
+
+## REST — `POST /api/browse/stats`
+
+Request:
+```json
+{ "rootIdx": 0, "kind": "shared", "baseSubdir": "Media", "paths": ["movies/", "docs/"] }
+```
+
+Response:
+```json
+{
+  "totalFiles": 1234,
+  "totalBytes": 56789012345,
+  "perPath": [
+    { "path": "movies/", "files": 1200, "bytes": 56000000000 },
+    { "path": "docs/",   "files": 34,   "bytes": 789012345 }
   ]
 }
 ```
 
+Walks each path with `NOFOLLOW_LINKS` and a depth cap (`maxDepth=32`, v0.4.0+).
+
 ---
 
 ## REST — `POST /api/transfers`
@@ -66,17 +122,25 @@ Request:
 ```json
 {
   "sourcePeerUrl": "http://192.168.1.10:7777",
-  "sourcePeerToken": "github_pat_xxx_or_netcopy_token",
+  "sourcePeerToken": "...",
   "sourcePeerTcpPort": 7778,
-  "manifestId": "9f6a2c8e-...",
+  "manifest": { "manifestId": "9f6a2c8e-...", "...": "as returned by POST /api/manifest" },
   "protocol": "tcp",
   "targetRootIdx": 0,
-  "targetSubdir": "from-deb1/2026-04-29",
+  "targetSubdir": "from-host-a/2026-04-29",
   "conflictPolicy": "skip",
+  "acknowledgeOverwrite": false,
   "parallelism": { "files": 4, "chunksPerFile": 8 }
 }
 ```
 
+- `manifest` MUST be the full manifest object (the puller has no copy locally;
+  passing only `manifestId` is rejected with `400`).
+- `conflictPolicy ∈ {"skip", "rename", "overwrite"}`.
+- `acknowledgeOverwrite` (v0.4.0+) MUST be `true` when `conflictPolicy = "overwrite"`;
+  otherwise the request is rejected with `400`. The UI sends its checkbox
+  state; a scripted client must send the same explicit acknowledgement.
+
 Response:
 ```json
 { "transferId": "uuid", "state": "running" }
@@ -103,13 +167,25 @@ Response:
   "totalDurationMs": null,
   "files": [
     { "id": "f1", "relPath": "movies/big.mkv", "size": 5368709120,
-      "bytesDone": 1234567, "state": "downloading", "hash": null }
-  ]
+      "bytesDone": 1234567, "state": "downloading", "hashHex": null }
+  ],
+  "metrics": {
+    "sourceLatency": { "p50Ms": 1.4, "p95Ms": 5.2, "maxMs": 18.0 },
+    "wireTime":      { "p50Ms": 16.7, "p95Ms": 42.0, "maxMs": 110.0 },
+    "persistTime":   { "p50Ms": 0.6, "p95Ms": 2.0, "maxMs": 8.0 },
+    "poolAcquireWait": { "p50Ms": 0.0, "p95Ms": 0.1, "maxMs": 1.2 },
+    "succeeded": 1234, "retried": 2, "failed": 0, "inFlight": 8
+  }
 }
 ```
 
-`state ∈ {"planned","running","paused","completed","failed","cancelled"}`
-`files[].state ∈ {"pending","downloading","verifying","done","failed","skipped"}`
+- `state ∈ {"planned", "running", "paused", "completed", "failed", "cancelled"}` — **lowercase wire format**.
+- `files[].state ∈ {"pending", "downloading", "verifying", "done", "failed", "skipped"}`.
+- `files[].hashHex` — final-file digest as 64-char SHA-256 hex (the field name retains
+  its historical "hash" identifier for v0.x wire compatibility; the actual algorithm
+  is documented in `FileFinalizer.java`). Per-chunk hashes on the wire are XXH3-128.
+- `metrics` is `null` when the puller for this job is not currently in memory
+  (terminal jobs, reload after restart).
 
 ---
 
@@ -118,26 +194,36 @@ Response:
 ```json
 {
   "id": "uuid",
-  "direction": "INBOUND",
+  "direction": "inbound",
   "peerUrl": "http://192.168.1.10:7777",
   "peerTcpPort": 7778,
-  "protocol": "TCP",
-  "manifestJson": "{...full manifest as a string...}",
+  "protocol": "tcp",
+  "manifestJson": "{\"manifestId\":\"...\",\"...\":\"...\"}",
   "targetRootIdx": 0,
-  "targetSubdir": "from-deb1/2026-04-29",
-  "conflictPolicy": "SKIP",
+  "targetSubdir": "from-host-a/2026-04-29",
+  "conflictPolicy": "skip",
   "fileParallelism": 4,
   "chunksPerFile": 8,
-  "state": "RUNNING",
+  "state": "running",
   "createdAt": 1714000000000,
   "updatedAt": 1714000123000,
   "completedAt": null,
   "totalDurationMs": null,
-  "avgThroughputBps": null
+  "avgThroughputBps": null,
+  "schemaVersion": 1
 }
 ```
 
-Atomic write: `<id>.json.tmp` → fsync → `rename(<id>.json.tmp, <id>.json)`.
+All enum-valued fields (`direction`, `protocol`, `conflictPolicy`, `state`) are
+serialised **lowercase** since v0.3.0. Pre-v0.3.0 files used UPPERCASE; the
+`@JsonCreator` accepts either casing on read for backward compat.
+
+`schemaVersion` is `1` for new writes (v0.4.0+). Pre-v0.4.0 files have no
+field; readers default it to 0 and treat that as schema 1. Future bumps signal
+an intentional break.
+
+Atomic write: `<id>.json.tmp` → `force(true)` → `Files.move(..., ATOMIC_MOVE,
+REPLACE_EXISTING)`. POSIX perms (v0.4.0+): file `0600`, dir `0700`.
 
 ---
 
@@ -153,17 +239,54 @@ Atomic write: `<id>.json.tmp` → fsync → `rename(<id>.json.tmp, <id>.json)`.
     { "idx": 0, "offset": 0, "len": 33554432 },
     { "idx": 1, "offset": 33554432, "len": 33554432 }
   ],
-  "expectedHashHex": null
+  "expectedHashHex": null,
+  "schemaVersion": 1
 }
 ```
 
+`expectedHashHex` is the hex-encoded XXH3-128 if the source pre-computed it at
+planning time, else `null`. `schemaVersion` has the same semantics as in `JobState`.
+
+Written exactly once on sidecar creation with `CREATE_NEW + force(true)`.
+
 ---
 
 ## Sidecar — `<file>.netcopy/chunks.bitmap`
 
-Бинарный формат. `chunkCount` бит, padded до байта. Bit i = 1 если chunk[i] завершён и hash проверен.
+Binary. `chunkCount` bits, padded to a whole byte. Bit i = 1 if chunk[i] is
+complete and its XXH3-128 hash has been verified.
 
-Атомарная запись: `chunks.bitmap.tmp` → fsync → `rename(chunks.bitmap.tmp, chunks.bitmap)`. Можно делать batch update раз в 100 мс или после fsync chunk-байтов в `data.partial`, что наступит раньше.
+Updated in place via positional `FileChannel.write(buf, 0)`. Bitmap updates
+are idempotent: a torn write at most loses bits, so those chunks are
+re-pulled on resume. A successful flush of the bitmap is always preceded by an
+fsync of the corresponding `data.partial` byte range and `chunks.hashes` slot.
+On v0.3.1+ the partial / hashes / bitmap fsync triple is batched (every 8
+chunks or 500 ms).
+
+---
+
+## Sidecar — `<file>.netcopy/chunks.hashes`
+
+Binary. Fixed length = `chunkCount × 16` bytes. Each chunk's XXH3-128 digest
+lives at byte offset `idx * 16`. Zero-filled at create time; positionally
+overwritten as each chunk completes (before the bitmap bit flips).
+
+Read only by the post-pull selective-re-verify path: on full-file SHA-256
+mismatch, the puller hashes each chunk's bytes from the partial and compares
+against this file. Only the bits of chunks whose stored hash diverges are
+cleared, so resume re-pulls only the corrupted chunks.
+
+Files written by NetCopy ≤ v0.2.4 don't have this companion; the verify-fail
+path falls back to wiping the whole bitmap.
+
+---
+
+## Sidecar — `<file>.netcopy/data.partial`
+
+Sparse, pre-allocated to the final size via `RandomAccessFile.setLength`.
+Written via positional `FileChannel.write(buf, offset)` from N parallel chunk
+workers. After all chunks are verified and the full-file SHA-256 check passes,
+`FileFinalizer` atomically renames this file onto its final target.
 
 ---
 
@@ -171,16 +294,27 @@ Atomic write: `<id>.json.tmp` → fsync → `rename(<id>.json.tmp, <id>.json)`.
 
 ```json
 { "type": "Subscribe", "transferId": "uuid" }
+{ "type": "Subscribe" }
 { "type": "Unsubscribe", "transferId": "uuid" }
 { "type": "UnsubscribeAll" }
 ```
 
+`Subscribe` without `transferId` is a wildcard: the client receives events for all
+transfers on this server. Cap (v0.4.0+): 256 distinct subscriptions per session.
+
 ## WebSocket — сервер → клиент (`ProgressEvent`)
 
-Дискриминатор — `type`. Все события включают `transferId` и `timestamp`.
+The discriminator is `type`. All events include `transferId` (or `null` for
+"global" events) and `timestamp`.
 
 ```json
-{ "type": "ManifestReady", "transferId": "...", "timestamp": 1714000001234,
+{ "type": "TransferRegistered", "transferId": "...", "timestamp": ...,
+  "state": "running", "protocol": "tcp", "peerUrl": "...",
+  "totalBytes": 5500000000, "totalFiles": 3 }
+
+{ "type": "TransferDismissed", "transferId": "...", "timestamp": ... }
+
+{ "type": "ManifestReady", "transferId": "...", "timestamp": ...,
   "manifestId": "...", "totalBytes": 5500000000, "totalFiles": 3 }
 
 { "type": "TransferStarted", "transferId": "...", "timestamp": ..., "protocol": "tcp" }
@@ -189,7 +323,9 @@ Atomic write: `<id>.json.tmp` → fsync → `rename(<id>.json.tmp, <id>.json)`.
   "bytesDone": 1234567, "totalBytes": 5500000000,
   "filesDone": 1, "totalFiles": 3,
   "currentThroughputBps": 102400000, "avgThroughputBps": 95000000,
-  "currentFiles": [ { "fileId": "f7", "relPath": "movies/big.mkv", "bytesDone": 1234, "size": 5368709120 } ] }
+  "currentFiles": [ { "id": "f7", "relPath": "movies/big.mkv",
+                      "bytesDone": 1234, "size": 5368709120,
+                      "state": "downloading", "verifyBytesDone": 0 } ] }
 
 { "type": "ChunkCompleted", "transferId": "...", "timestamp": ...,
   "fileId": "f7", "chunkIdx": 5, "bytes": 33554432 }
@@ -209,6 +345,13 @@ Atomic write: `<id>.json.tmp` → fsync → `rename(<id>.json.tmp, <id>.json)`.
   "reason": "source_changed", "detail": "movies/big.mkv mtime mismatch" }
 ```
 
+`TransferProgress.currentFiles[].verifyBytesDone` (v0.3.1+) climbs from 0 → size
+while the file is in the `verifying` state. Used by the UI to show a sub-progress
+bar while the post-pull SHA-256 streams.
+
+`TransferProgress` events are throttled: at most one per (session, transferId)
+every 200 ms.
+
 ---
 
 ## TCP wire — Frame layout
@@ -225,11 +368,12 @@ Atomic write: `<id>.json.tmp` → fsync → `rename(<id>.json.tmp, <id>.json)`.
 | 0x06 | DATA_END | `reqId:u32` |
 | 0x07 | ERR | `reqId:u32` `code:u16` `messageLen:u16` `message:bytes(messageLen)` |
 | 0x08 | BYE | (no payload) |
+| 0x09 | DATA_END_V2 | `reqId:u32` `xxh3:bytes(16)` |
 
 Все строки — UTF-8. Все длины — Big-Endian. UUID — 16 байт raw (8 байт MSB BE + 8 байт LSB BE).
 
 Max payload size:
-- HELLO/HELLO_OK/REQUEST/DATA_HEAD/DATA_END/ERR/BYE: 64 KB
+- HELLO/HELLO_OK/REQUEST/DATA_HEAD/DATA_END/DATA_END_V2/ERR/BYE: 64 KB
 - DATA: до 1 MB (рекомендуется 256 KB чанки внутри одного REQUEST)
 
 Если frame превышает лимит — closing connection с ERR_BAD_REQUEST.
@@ -241,4 +385,23 @@ Codes для ERR:
 - 1020 ERR_BAD_REQUEST (out-of-range offset, etc.)
 - 1500 ERR_INTERNAL
 
-Protocol version: текущая = 1.
+### Protocol versions
+
+- `protoVer = 1` — original two-pass design. Source reads the requested byte
+  range twice: once to compute XXH3-128 (which is then placed in `DATA_HEAD`
+  before the body), then again to ship it as a sequence of `DATA` frames
+  followed by a single `DATA_END`. Receiver verifies the XXH3 from
+  `DATA_HEAD`.
+- `protoVer = 2` (server default since v0.3.0) — single-pass. Source reads the
+  range once, hashing inline. `DATA_HEAD` carries an all-zero sentinel
+  `xxh3` (receiver ignores it). The real XXH3 trailer arrives in
+  `DATA_END_V2` after the last `DATA` frame. Saves one full disk read on the
+  source — meaningful on cold-cache HDDs.
+
+The negotiated version is `min(client.protoVer, server.protoVer)` and is
+returned in `HELLO_OK`. Older v1-only clients keep working transparently;
+v2 clients downgrade gracefully against v1 servers.
+
+Authentication: a connection that does not deliver a valid `HELLO` within 30
+seconds is closed by the server (v0.4.0+ HELLO timeout watchdog). The server
+also caps concurrent connections at 1024.
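
Review note on the data-formats hunk above: the new `DATA_END_V2` frame (type `0x09`) carries `reqId:u32` followed by a 16-byte XXH3-128 digest, inside the `[len:u32][type:u8][payload]` framing with big-endian lengths. Below is a minimal encoding sketch plus the `min(...)` version negotiation. The names `DataEndV2FrameSketch`, `encode`, and `negotiate` are hypothetical. It is also an assumption that `len` counts only the payload bytes (not the type byte); this diff does not show the frame-layout section that would settle that.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration; not taken from the NetCopy codebase. */
final class DataEndV2FrameSketch {
    static final byte TYPE_DATA_END_V2 = 0x09;
    static final int XXH3_128_BYTES = 16;

    /** Encodes one DATA_END_V2 frame: [len:u32][type:u8][reqId:u32][xxh3:16 bytes]. */
    static byte[] encode(int reqId, byte[] xxh3) {
        if (xxh3 == null || xxh3.length != XXH3_128_BYTES) {
            throw new IllegalArgumentException("xxh3 digest must be exactly 16 bytes");
        }
        int payloadLen = Integer.BYTES + XXH3_128_BYTES; // reqId + digest = 20 bytes
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + 1 + payloadLen)
                .order(ByteOrder.BIG_ENDIAN);
        buf.putInt(payloadLen);    // ASSUMPTION: len excludes the type byte
        buf.put(TYPE_DATA_END_V2);
        buf.putInt(reqId);
        buf.put(xxh3);
        return buf.array();
    }

    /** Version negotiation as described: the lower of the two advertised versions wins. */
    static int negotiate(int clientProtoVer, int serverProtoVer) {
        return Math.min(clientProtoVer, serverProtoVer);
    }
}
```

Under these assumptions the frame is 25 bytes on the wire, and a v2 client talking to a v1 server gets `negotiate(2, 1) == 1`, so it should expect the digest in `DATA_HEAD` and a plain `DATA_END`.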