Skip to content

config: explicit routing.mode enum replaces implicit full_cdc derivation#38

Merged
jghoman merged 2 commits into
mainfrom
jakob/viaduck-explicit-routing-mode
Jun 19, 2026
Merged

config: explicit routing.mode enum replaces implicit full_cdc derivation#38
jghoman merged 2 commits into
mainfrom
jakob/viaduck-explicit-routing-mode

Conversation

@jghoman

@jghoman jghoman commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

Replaces an implicit pipeline-shape derivation (full_cdc = len(key_columns) > 0) with a required routing.mode enum: full_cdc | append_only. Validates the matrix in RoutingConfig.__post_init__ so misconfiguration fails at startup with an operator-actionable error instead of silently selecting the wrong shape.

Simultaneously rips out the append_at_least_once flag and its supporting machinery (PR #36) — that fast path lived inside apply_full_cdc and could never fire on posthog's production pipeline because posthog has always been on append_only (where key_columns: [] routes straight to apply.append_only, never touching apply_full_cdc).

Validation matrix

mode key_columns Result
full_cdc non-empty OK
full_cdc empty ConfigError("requires a non-empty key_columns list")
append_only empty OK
append_only non-empty ConfigError("forbids key_columns; the append path does not use them")
unset / unknown / wrong type * ConfigError("must be one of [...], got <value>")
mode: yes (YAML 1.1 bool coercion) * ConfigError("must be a string ... quote the value if YAML coerced it")
mode: (null value) * ConfigError("is required, no default") (coerced via or "" so it doesn't hit the type-error message)

DeliveryManager.__init__ takes mode as a keyword-only kwarg with its own ConfigError guard (belt-and-suspenders for callers bypassing config validation). Positional callers fail loudly at the call site instead of running an unrelated string past the enum check.

main.py threads mode (not the derived full_cdc bool) through _poll_cycle. The status payload reads mode directly instead of reconstructing it via "full_cdc" if full_cdc else "append_only" — the last implicit-derivation thread of the change.

What got ripped out

  • DestinationConfig.append_at_least_once field
  • _is_pure_insert_batch helper + per-batch fast-path branch in _apply_changes
  • used_append return field on the counts dict
  • viaduck_dest_apply_mode gauge + viaduck_dest_apply_fast_path_batches_total counter (verified no Grafana/vmalert references via grep across grafana-dashboards/, charts/, posthog-cloud-infra/)
  • Per-destination dict on DeliveryManager + startup log line
  • _validate_bool config helper (no remaining caller)
  • Integration test file test_append_at_least_once_integration.py (was the only one exercising the deleted path against real pyducklake)

Test plan

  • just lint clean
  • just fmt-check clean
  • just test398 unit tests passing (+3 vs main: yaml-bool-coercion, empty-mode-distinct, null-mode-routes-to-required)
  • DOCKER_HOST=unix:///Users/jakob/.orbstack/run/docker.sock just test-integration70 integration tests passing
  • HOLD MERGE until the chart PR (PostHog/charts#NNNN) lands. The configmap needs to carry routing.mode: append_only before pods restart on this image, otherwise startup fails with the "is required, no default" error.
  • Post-deploy: confirm Viaduck started: ... mode='append_only', ... shows up in pod logs and viaduck_dest_lag_snapshots continues to drain.

Review history

Three rounds of Lead-QE + Principal-SWE parallel review.

Round 1 addressed: append_at_least_once removal scope, DeliveryManager signature shape, AGENT.md rewrite.

Round 2 caught 3 integration-test blockers I'd silently broken because I didn't run just test-integration (9 stale DeliveryManager(...) call sites + 1 used_append leftover assert + 1 missing mode= in fixture), plus majors: positional mode was a footgun (made keyword-only), README still stale (rewrote "Two Modes"), _make_real_delivery re-introduced the derivation (now keyword-only required), ValueError vs ConfigError inconsistency (unified on ConfigError).

Round 3 caught one real loader bug: YAML mode: (null value) routed to the type-error "quote the value" message instead of the empty-string "is required" message. Also: stale cdc_replay row in README's seed_mode table, an unjustified in-method ConfigError import in delivery.py (no actual cycle — moved to module top), a docstring referencing a deleted symbol, and the last full_cdc re-derivation in _poll_cycle/status. All addressed.

Migration

Hard-break config change — pods restart and CrashLoop until the configmap carries routing.mode. The chart-side update is the paired PR; chart PR must merge first.

jghoman added 2 commits June 18, 2026 18:39
viaduck used to flip between two completely different pipeline shapes
based on len(key_columns) > 0. An empty/typo'd key_columns silently
switched a destination from MERGE-upsert to plain-append; an
accidentally non-empty list did the inverse. The hazard was the
implicitness, not the modes themselves.

Replace with a required routing.mode enum: full_cdc | append_only.
RoutingConfig.__post_init__ validates the matrix at config load:

  full_cdc    + non-empty key_columns  → OK
  full_cdc    + empty key_columns      → ConfigError (need join keys)
  append_only + empty key_columns      → OK
  append_only + non-empty key_columns  → ConfigError (apply path
                                                      doesn't use them)
  unset / unknown / wrong type / null  → ConfigError with operator-
                                         actionable hint

The validator orders the type-mismatch check (catches YAML 1.1 bool
coercion `mode: yes` → Python True) ahead of the empty-string check
(catches `mode: ''` and `mode:` distinct from the type-error path)
ahead of the enum check. The loader coerces `mode:` (null) → "" so
the most common operator omission shape hits the "is required, no
default" branch rather than the "quote the value" hint.

DeliveryManager.__init__ takes mode as a keyword-only kwarg with its
own ConfigError guard (belt-and-suspenders for callers bypassing
config). Positional callers fail loudly at the call site instead of
running an unrelated string past the enum check.

main.py threads mode (not the derived full_cdc bool) through
_poll_cycle; the status payload reads cfg.routing.mode directly
instead of reconstructing it. This was the last implicit derivation
the original PR carried.

Also rips append_at_least_once and supporting machinery
-------------------------------------------------------
The append_at_least_once flag (PR #36) added a per-batch fast path
inside apply_full_cdc that called tbl.append() when the batch was
100% inserts. While shipping #36 we discovered posthog had always
been on the OTHER branch (`append_only`, where key_columns: [] in
charts/argocd/viaduck/values/common.yaml routed through delivery.py
straight to apply.append_only) — the fast path could never have
fired in production because apply_full_cdc was never called.

Removed:
  - DestinationConfig.append_at_least_once field
  - _is_pure_insert_batch helper + per-batch fast-path branch
  - used_append return field on counts dict
  - viaduck_dest_apply_mode gauge (verified no Grafana/vmalert
    references via grep across grafana-dashboards/, charts/,
    posthog-cloud-infra/)
  - viaduck_dest_apply_fast_path_batches_total counter (same)
  - per-destination dict on DeliveryManager + startup log line
  - _validate_bool helper (no remaining caller)
  - integration test file (was the only one exercising the deleted
    path against real pyducklake)

If a future destination genuinely needs full_cdc-with-append-fast-
path semantics, the cleaner re-introduction is on the source-read
side (e.g. ducklake_table_insertions for known-insert-only sources
even when key_columns is set for dedup purposes) rather than as a
post-Phase-2 apply optimization.

Tests
-----
398 unit + 70 integration tests passing locally. Lint + fmt clean.
Three review rounds (Lead-QE + Principal-SWE in parallel) before
merge; round 2 found 3 integration-test blockers that round 1 had
silently broken because I didn't run `just test-integration` —
fixed and re-verified. Round 3 caught one real loader bug
(YAML `mode:` null value misrouted to type-error message), one stale
seed_mode entry in README, and three smaller cleanups; all addressed.

Migration
---------
This is a hard-break config change. RoutingConfig.mode is required
with no default — pods restart and CrashLoop until the configmap
carries `routing.mode: append_only` (or full_cdc). The chart-side
update is a separate PR; that must merge first so ArgoCD has the
new configmap before pods restart on this image.
mode=append_only routes through source.read_cdc which calls
ducklake_table_insertions — that macro structurally returns only
inserts (no change_type column synthesized). If a future viaduck
change accidentally routes a ducklake_table_changes batch (which
DOES carry change_type) into append_only(), the destination would
silently land deletes and update_postimages as plain inserts.

Cheap O(1) check: refuse the batch if `change_type` is in the
column names. Catches both classes of regression — a viaduck
dispatch bug (wrong source-read function called for the mode) AND
a hypothetical DuckLake contract change where table_insertions
starts synthesizing change_type.

The empty-batch fast-path runs before the boundary check, so an
empty batch with change_type (degenerate but reachable via routing
filter that masked everything) stays a no-op rather than a noisy
error.

Three new unit tests:
- accepts the read_cdc shape (snapshot_id + rowid present,
  change_type absent)
- rejects a read_cdc_changes-shaped batch (change_type present)
- empty batch with change_type is still a no-op

401 unit + 70 integration tests passing.
@jghoman jghoman merged commit f43234f into main Jun 19, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant