Skip to content

fix(qwp): accept empty arrays in the egress result decoder#50

Merged
bluestreak01 merged 3 commits into
mainfrom
mt_cleanup-null-empty-array
Jun 17, 2026
Merged

fix(qwp): accept empty arrays in the egress result decoder#50
bluestreak01 merged 3 commits into
mainfrom
mt_cleanup-null-empty-array

Conversation

@mtopolnik

Copy link
Copy Markdown
Contributor

What

QuestDB emits a non-null empty array (cardinality 0) on QWP egress inline with a 0-length dimension, distinct from a NULL array (carried in the null bitmap). QwpResultBatchDecoder.parseArrayColumn rejected dl < 1, so reading an empty array back failed with ARRAY dim N must be >= 1.

Changes

  • Accept dl >= 0, keeping nDims >= 1 and a per-dimension cap so a 0 in one dimension cannot slip an arbitrary value past the element-count cap. Matches the server's QwpArrayColumnCursor and the spec.
  • Flip the QwpResultBatchDecoderHardeningTest zero-dim case to assert the empty array decodes, while a hostile oversized dimension is still rejected (now via the per-dim cap).

Verification

QwpResultBatchDecoderHardeningTest (22) and QwpResultBatchDecoderColumnTypesTest (33) pass. Proven end-to-end via the matching questdb-core e2e test: a real server's empty-array egress frame is now decoded instead of rejected.

🤖 Generated with Claude Code

QuestDB emits a non-null empty array (cardinality 0) on QWP egress inline
with a 0-length dimension, distinct from a NULL array (carried in the null
bitmap). parseArrayColumn rejected dl < 1, so reading an empty array back
failed with "ARRAY dim N must be >= 1". Accept dl >= 0, keeping nDims >= 1
and a per-dimension cap so a 0 in one dimension cannot slip an arbitrary
value past the element-count cap. Flip the zero-dim hardening case to
assert the empty array decodes while a hostile oversized dim is still
rejected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mtopolnik

Copy link
Copy Markdown
Contributor Author

Review: PR #50 (level 2)

Verdict: Approve with one recommended change

The PR fixes a real, confirmed bug: the decoder rejected the server's own empty-array egress frames (dl < 1 threw ARRAY dim N must be >= 1). The fix is correct for the common case and proven end-to-end by QwpEgressBootstrapTest.testEmptyAndNullArrayColumns (1-D ARRAY[]::DOUBLE[] → non-null, nDims=1, 0 elements, kept distinct from NULL). Memory safety is preserved: dl < 0 still guards the only value that could drive a negative payload size, and the long elements product check still bounds total size and cannot overflow (elements ≤ MAX_ARRAY_ELEMENTS ≈ 2^28 before each multiply, dl ≤ 2^28, product ≤ 2^56 < 2^63).

There is one substantive issue, plus minor notes.

Moderate

1. The per-dimension cap (MAX_ARRAY_ELEMENTS) is below the server's real per-dimension limit (DIM_MAX_LEN) — the client now rejects a narrow class of empty arrays the server actually emits, and the "Matches the server" comment is false

In-diff: QwpResultBatchDecoder.java:615-617

if (dl < 0 || dl > MAX_ARRAY_ELEMENTS) {   // MAX_ARRAY_ELEMENTS = (Integer.MAX_VALUE - 1024)/8 = 268_435_327
    throw new QwpDecodeException("ARRAY dim " + d + " out of range [0, " + MAX_ARRAY_ELEMENTS + "]: " + dl);
}

The comment claims this "Matches the server's QwpArrayColumnCursor." It does not. I traced both server components that touch this wire format:

  • QwpArrayColumnCursor.of (server reader), QwpArrayColumnCursor.java:247-261: per dimension it rejects only dimSize < 0 and relies on Math.multiplyExact for the product. No per-dimension upper cap.
  • QwpResultBatchBuffer.appendArrayBytesDirect (server's actual egress encoder), QwpResultBatchBuffer.java:551-571: per dimension it rejects only dim < 0 (line 553) and checks the product against the same MAX_ARRAY_ELEMENTS constant (line 558). It then writes the raw av.getDimLen(d) (line 569) with no per-dim cap and no normalization.

Because a 0-length dimension collapses the product to 0, the server's encoder ships an empty array with sibling dimensions up to the construction limit ArrayView.DIM_MAX_LEN = (1 << 28) - 1 = 268,435,455 (ArrayView.java:224; enforced in MutableArray.setDimLen). QuestDB explicitly supports empty arrays with large shapes — MutableArray.resetToDefaultStrides documents the example (100_000_000, 100_000_000, 0) and short-circuits stride init precisely so such shapes don't error.

So the effective server contract on the wire is: each dim ∈ [0, 268_435_455], product ≤ MAX_ARRAY_ELEMENTS. The client's new cap rejects every dim in (268_435_327, 268_435_455] — a 128-value window — whenever a sibling dimension is 0. That is a fail-closed divergence: a legitimate server frame makes the user's query error out instead of returning the empty array.

Note the cap is also redundant for non-empty arrays: if all dims ≥ 1 and any dl > MAX_ARRAY_ELEMENTS, the product already exceeds the limit and the existing product check (line 620) rejects it. The per-dim cap therefore has independent effect only on empty arrays — exactly the case this PR exists to accept.

Reachability: low. Triggering it needs a multi-dimensional array with one 0 dimension and a sibling dimension specifically in the top 128-wide sliver below 2^28. The documented (100_000_000, …, 0) shape does not hit it (100M < the cap). I did not find a concrete SQL that yields a dimension in that exact window, so a production hit is unlikely. But both server components demonstrably permit and transmit such frames, and the in-code justification is provably incorrect — which will mislead the next maintainer about where the client/server boundary actually sits.

Suggested fix — cap per-dimension at the server's real limit so the client accepts everything the server can emit while keeping the hostile-input guard:

// client cairo/ColumnType.java (no DIM_MAX_LEN constant exists there yet)
public static final int ARRAY_DIM_MAX_LEN = (1 << 28) - 1; // mirror server ArrayView.DIM_MAX_LEN
if (dl < 0 || dl > ColumnType.ARRAY_DIM_MAX_LEN) {
    throw new QwpDecodeException("ARRAY dim " + d + " out of range [0, "
            + ColumnType.ARRAY_DIM_MAX_LEN + "]: " + dl);
}

This keeps the smuggling/overflow defense (a dim > 2^28-1 in an empty array is still rejected; 2^28 * 2^28 = 2^56 keeps long elements safe), still rejects the test's hostile {0, Integer.MAX_VALUE} (2.1B > 2^28-1), and makes the "matches the server" comment true. Then update the comment to reference DIM_MAX_LEN, not the element-count cap. (Alternatively, to match the server exactly, drop the per-dim upper bound entirely and keep only dl < 0 + the product check — but the DIM_MAX_LEN bound is the better choice for a decoder hardened against a hostile server.)

Minor

  • Client-side test asserts only "doesn't throw," not the decoded result. testArrayDimZeroIsEmptyArrayButHostileDimRejected verifies decode() succeeds for {0, 5} but never checks getArrayNDims/getDoubleArrayElements return the empty-array semantics. The result assertion exists only in the parent-repo e2e test (which needs a live server). A client-side unit assertion (getDoubleArrayElements(...) .length == 0, isNull == false) would be server-independent and stronger. Adding a boundary case at dl == DIM_MAX_LEN (accept) and dl == DIM_MAX_LEN + 1 (reject) would also directly pin the disputed cap value.

  • No labels on the PR. Per the repo convention, a fix(qwp) client fix should carry labels (e.g. Bug, and the ILP/client-protocol label). The PR currently has none.

  • Test javadoc phrasing ("The original concern -- that a 0 collapses…") reads as a changelog of the prior design rather than a declarative statement of current behavior. Per the global comment guideline, prefer stating the present invariant directly.

Summary

  • Tradeoff: the fix correctly trades the old over-strict dl < 1 for dl >= 0, but substitutes a per-dimension cap (MAX_ARRAY_ELEMENTS) that is 128 below the server's real per-dimension ceiling (DIM_MAX_LEN) and that the server applies nowhere. Net effect is still a clear improvement (the common empty array now works), but it does not fully achieve the PR's stated goal of matching the server, and the justifying comment is inaccurate. Recommend capping at DIM_MAX_LEN before merge.

The egress decoder bounded each array dimension by MAX_ARRAY_ELEMENTS,
the total element-count cap. That value is 128 below DIM_MAX_LEN
((1 << 28) - 1), the server's per-dimension ceiling. For an empty
array the 0-length dimension drives the element-count product to 0,
so a sibling dimension is bounded only by the per-dimension check.
The old cap therefore rejected empty arrays carrying a dimension above
MAX_ARRAY_ELEMENTS but within DIM_MAX_LEN -- shapes the server can
build and emit, since it checks each dimension against DIM_MAX_LEN at
construction and applies no per-dimension cap when encoding egress.

Add MAX_ARRAY_DIM_LEN, mirroring the server's DIM_MAX_LEN, and bound
each dimension by it. The element-count product guard is unchanged, so
the payload still fits the int-addressed buffer and the product stays
within long range. A boundary test pins acceptance at DIM_MAX_LEN and
rejection one element past it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mtopolnik

Copy link
Copy Markdown
Contributor Author

[PR Coverage check]

😍 pass : 2 / 2 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/client/cutlass/qwp/client/QwpResultBatchDecoder.java 2 2 100.00%

@bluestreak01 bluestreak01 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Verified at level 3 against the server source.

The fix is correct, memory-safe, and precisely mirrors both the server egress encoder (QwpResultBatchBuffer.appendArrayBytesDirect) and reader (QwpArrayColumnCursor): per-dimension [0, DIM_MAX_LEN] plus the unchanged order-dependent MAX_ARRAY_ELEMENTS product cap. The product guard mirrors the server exactly — shapes whose leading-dimension prefix product exceeds the cap (e.g. (30000, 30000, 0)) throw at the server encoder before any bytes hit the wire, so the client cannot under-accept a real egress frame. Downstream accessors (getArrayNDims, getDoubleArrayElements) handle the newly-accepted 0-element arrays correctly (non-null, double[0], no int overflow). Both affected test classes pass (23 + 33, 0 failures).

Non-blocking minor suggestions:

  • Reference the existing public ArrayView.DIM_MAX_LEN instead of re-deriving the literal as the private MAX_ARRAY_DIM_LEN — the Javadoc already names that constant, so referencing it makes the mirror a compile-time fact and avoids drift.
  • Add a client-side unit assertion of the empty-array result (getArrayNDims, getDoubleArrayElements().length == 0, not-null) in QwpColumnBatchViewsTest — currently only no-throw is asserted client-side; the result-level proof lives only in the parent-repo e2e test.
  • Add the repo's Bug / client-protocol labels.

@bluestreak01 bluestreak01 merged commit 38ab83c into main Jun 17, 2026
12 checks passed
@bluestreak01 bluestreak01 deleted the mt_cleanup-null-empty-array branch June 17, 2026 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants