Skip to content

Classify NULL-to-non-Nullable MV CDC errors#4464

Closed
dtunikov wants to merge 3 commits into
mainfrom
dbi-863-classify-null-to-non-nullable-mv-cdc-errors
Closed

Classify NULL-to-non-Nullable MV CDC errors#4464
dtunikov wants to merge 3 commits into
mainfrom
dbi-863-classify-null-to-non-nullable-mv-cdc-errors

Conversation

@dtunikov

Copy link
Copy Markdown
Collaborator

Problem

ClickHouse CDC failures with code: 349, Cannot convert NULL value to non-Nullable type were surfacing as errorClass=OTHER / errorCode=UNKNOWN (DBI-863). This happens when a user-defined materialized view casts a nullable source column to a non-Nullable type (e.g. String), causing inserts pushed to the view to fail.

Reported failing views/columns:

  • cdc_user_api.stg_cdc_user_api__customer_address_mvstreet
  • cdc_otc_api.stg_cdc_otc_api__otc_otc_trade_mvsettlement_txn_id_incoming

The error code 349 (CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN) is not handled in the ClickHouse exception switch in classifier.go, and in this path the error is not wrapped as a ViewError/NormalizationError, so it fell through to the OTHER/UNKNOWN catch-all.

Fix

Add an explicit case for chproto.ErrCannotInsertNullInOrdinaryColumn (349) that classifies the error as NOTIFY_MV_OR_VIEW. PeerDB always creates Nullable destination columns, so a 349 only arises through a user-defined MV/view casting a column to a non-Nullable type — making NOTIFY_MV_OR_VIEW the correct, actionable classification.

Tests

Added TestCannotInsertNullInOrdinaryColumnShouldBeNotifyMV covering the real-world error message shape (including while pushing to view), asserting ErrorNotifyMVOrView with source clickhouse and code 349.

🤖 Generated with Claude Code

ClickHouse error code 349 (CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN,
"Cannot convert NULL value to non-Nullable type") was surfacing as
errorClass=OTHER / errorCode=UNKNOWN because it is not handled in the
exception switch and is not wrapped as a ViewError/NormalizationError.

PeerDB always creates Nullable destination columns, so a 349 only arises
when a user-defined MV/view casts a nullable source column to a
non-Nullable type. Classify it as NOTIFY_MV_OR_VIEW so the user is told
their MV/view is the problem.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dtunikov dtunikov requested a review from a team as a code owner June 19, 2026 07:55
@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

1 similar comment
@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

The mid-CDC failure surfaces at the classifier as a Temporal-serialized
string with the underlying *clickhouse.Exception type stripped, so it
fails every typed check and falls through to ErrorOther / "UNKNOWN"
(matching the reported errorClass=OTHER / errorCode=UNKNOWN). The typed
switch case for code 349 added earlier only fires when the cause is still
typed, which is not the production path.

Match the distinctive message "Cannot convert NULL value to non-Nullable
type" on the raw error string and classify it as NOTIFY_MV_OR_VIEW
(source clickhouse, code 349). PeerDB always creates Nullable destination
columns, so this only arises through a user-defined MV/view casting a
nullable source column to a non-Nullable type.

Add a test covering the serialized (string-only) production path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dtunikov

Copy link
Copy Markdown
Collaborator Author

Correction: the error is wrapped as a ViewError, and the original fix didn't actually fix it

Thanks to the review pointer (logs show Normalization Error: ... ClickHouse view error: code: 349, ...), I re-investigated. My original comment was wrong — it is wrapped as a ViewError/NormalizationError.

I verified the real behavior empirically against the classifier:

Scenario Result
Typed chain, code 349 NOTIFY_MV_OR_VIEW, code 349, source clickhouse
Typed ViewError chain, code 777 (no switch case at all) NOTIFY_MV_OR_VIEW, code 777, source clickhouse
String-only (Temporal-serialized: types stripped, string kept) OTHER / UNKNOWN / source other

Two conclusions:

  1. The existing catch-all already classifies any typed ViewError as MV/view, so the type-based case chproto.ErrCannotInsertNullInOrdinaryColumn is largely redundant on the typed path.
  2. The reported production signature (errorClass=OTHER / errorCode=UNKNOWN) is only reachable via the string-only path — the error reaches the classifier after Temporal serialization has stripped the *clickhouse.Exception Go type, so the entire errors.AsType[*clickhouse.Exception] block (including the new case) is skipped.

Updated fix

Added a string-based match on the distinctive message Cannot convert NULL value to non-Nullable type (raw err.Error()), placed before the Temporal/OTHER fallback, classifying it as NOTIFY_MV_OR_VIEW (source clickhouse, code 349). This is consistent with the existing top-level string matches in the classifier (e.g. the Mongo ones) and is what actually catches the production failure.

The type-based switch case is kept as a defensive net (catches code 349 even if ClickHouse changes the message wording). Added a test covering the serialized (string-only) production path; the typed-path test is retained.

Pushed in 58310d2.

@dtunikov dtunikov closed this Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant