Scan multi-line quoted scalars incrementally by jskoiz · Pull Request #59 · jskoiz/saneyaml

jskoiz · 2026-06-05T19:21:40Z

Summary

Closes a parse-time O(N²) denial-of-service in the multi-line quoted-scalar collector — the same vulnerability class previously fixed for flow collections, but the quoted-scalar path was missed.

The collector built up a multi-line quoted scalar line by line and, on every appended continuation line, re-scanned the entire accumulated buffer from byte 0 (via quoted_scalar_close_end) to locate the closing quote. That makes close detection Σ O(k) = O(N²) in the number of accumulated bytes, so a long unterminated quoted scalar took quadratic time to parse-or-reject. Before the fix, a multi-line unterminated double-quoted scalar scaled ~4× per input doubling (16k continuation lines ≈ 0.5s, growing toward tens of seconds at larger sizes); single-quoted scalars were identically quadratic.
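The quadratic growth can be illustrated with a small cost model. This is only a sketch: `rescan_cost` and `incremental_cost` are hypothetical helpers that count inspected bytes. They are not functions from the crate, and they ignore everything except line lengths.

```rust
/// Bytes inspected when every appended line triggers a re-scan from byte 0.
/// Hypothetical cost model, not code from the crate.
fn rescan_cost(line_lens: &[usize]) -> usize {
    let mut buffered = 0; // bytes accumulated so far
    let mut inspected = 0; // total bytes looked at across all scans
    for &len in line_lens {
        buffered += len;
        inspected += buffered; // the whole buffer is scanned again
    }
    inspected
}

/// Bytes inspected when each appended line is looked at exactly once.
fn incremental_cost(line_lens: &[usize]) -> usize {
    line_lens.iter().sum()
}
```

With 4 and then 8 lines of 10 bytes each, the re-scan model inspects 100 and 360 bytes, a 3.6× step that approaches 4× as the line count grows. The incremental model inspects 40 and 80 bytes, exactly 2×.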

Fix

A new private incremental scanner (QuotedScalarScan) carries the close-detection state across appended lines and inspects only the newly appended bytes instead of re-scanning from byte 0. That state consists of the escape carry-over for "-style (double-quoted) scalars, the first closing-quote offset (cached once found), and the trailing characteristics (first trailing char and first non-whitespace trailing char). This mirrors the incremental FlowCollectionState scanner already used for flow collections.
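A minimal sketch of what such a scanner could look like follows. It is not the crate's `QuotedScalarScan`: the type `IncrementalQuoteScan`, its field names, and the assumption that the buffer starts just after the opening quote are all illustrative.

```rust
/// Hypothetical incremental close-detection state (illustrative only).
#[derive(Debug)]
pub struct IncrementalQuoteScan {
    pub quote: char,                         // '\'' or '"'
    pub scanned: usize,                      // bytes of the buffer already inspected
    pub escape_pending: bool,                // a `\` in a "-style scalar awaits its escaped char
    pub close_end: Option<usize>,            // offset just past the first closing quote
    pub first_trailing: Option<char>,        // first char after the closing quote
    pub first_non_ws_trailing: Option<char>, // first non-whitespace char after it
}

impl IncrementalQuoteScan {
    pub fn new(quote: char) -> Self {
        Self {
            quote,
            scanned: 0,
            escape_pending: false,
            close_end: None,
            first_trailing: None,
            first_non_ws_trailing: None,
        }
    }

    /// Inspect only the bytes appended since the previous call.
    /// `buf` is the whole accumulated buffer (assumed to start after the
    /// opening quote and to grow only by appending). Returns bytes inspected.
    pub fn feed(&mut self, buf: &str) -> usize {
        let start = self.scanned;
        let mut chars = buf[start..].char_indices().peekable();
        while let Some((i, c)) = chars.next() {
            if self.close_end.is_some() {
                // Already closed: only record the trailing characteristics.
                if self.first_trailing.is_none() {
                    self.first_trailing = Some(c);
                }
                if self.first_non_ws_trailing.is_none() && !c.is_whitespace() {
                    self.first_non_ws_trailing = Some(c);
                }
                continue;
            }
            if self.escape_pending {
                // This char is consumed by the preceding backslash.
                self.escape_pending = false;
                continue;
            }
            if self.quote == '"' && c == '\\' {
                self.escape_pending = true;
            } else if c == self.quote {
                if self.quote == '\'' && matches!(chars.peek(), Some((_, '\''))) {
                    chars.next(); // '' is an escaped quote, not a close
                } else {
                    // A lone quote, including one at the chunk end, closes.
                    self.close_end = Some(start + i + c.len_utf8());
                }
            }
        }
        self.scanned = buf.len();
        buf.len() - start
    }
}
```

The caller appends a line to its buffer and then calls `feed` with the whole buffer. Because `scanned` remembers how far the previous call got, the total work over all calls is proportional to the buffer length.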

The scanner reproduces the previous quoted_scalar_close_end / quoted_scalar_accepted_end semantics exactly: the close offset is the byte just past the first closing quote, and an accepted close additionally requires the trailing text to be all-whitespace or a whitespace-separated comment. Because the collector evaluates the close state after every appended line, each scanned chunk always ends at the current end of the buffer — matching the original chars.peek() seeing None at the buffer end, so a lone ' at a chunk boundary closes (a '' escape can never straddle the boundary, since the first ' would already have closed the scalar and stopped the collector).
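The acceptance rule described above can be sketched as a pure function over the cached state. The function name `accepted_end` and its parameters are assumptions for illustration, and the crate's real check may handle more cases.

```rust
/// Hypothetical acceptance check: returns the close offset only when the text
/// after the closing quote is all whitespace or a whitespace-separated comment.
fn accepted_end(
    close_end: Option<usize>,
    first_trailing: Option<char>,
    first_non_ws_trailing: Option<char>,
) -> Option<usize> {
    let end = close_end?; // no closing quote seen yet
    match first_non_ws_trailing {
        // Nothing but whitespace (or nothing at all) follows the quote.
        None => Some(end),
        // A comment is accepted only if whitespace separates it from the quote.
        Some('#') if first_trailing.map_or(false, char::is_whitespace) => Some(end),
        // Any other trailing content rejects the close.
        Some(_) => None,
    }
}
```

Under this sketch, `'text' # note` is accepted, while `'text'# note` and `'text' more` are not.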

Behavior is byte-for-byte unchanged for all valid and invalid inputs: same parse trees, same error messages, same spans. The now-unused quoted_scalar_accepted_end free function is removed. The public API is unchanged.

Tests

Two regression tests added to tests/dos_hardening.rs, covering both single- and double-quoted scalars with inputs that are fully scanned to end-of-input (an unterminated quoted scalar is not a nested collection, so the nesting-depth limit never short-circuits the scan — the timing is a genuine measurement of close-detection cost):

multiline_unterminated_quoted_scalar_scales_subquadratically — doubling the line count (100k → 200k) must scale well below 4× (asserts ≤ 3× with headroom for timer/allocator noise).
large_multiline_unterminated_quoted_scalar_finishes_quickly — a >1 MB unterminated quoted scalar must parse-or-reject under 1s.
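A doubling test of this kind could be structured roughly as below. This is a sketch under stated assumptions: the input shape, the helper names, and the `parse` closure are hypothetical stand-ins, and the real tests in tests/dos_hardening.rs may be built differently.

```rust
use std::time::{Duration, Instant};

/// Build `key: <quote>start` followed by `lines` continuation lines and no
/// closing quote. The exact input shape is an assumption for illustration.
fn unterminated_quoted_input(quote: char, lines: usize) -> String {
    let mut s = format!("key: {quote}start\n");
    for _ in 0..lines {
        s.push_str("  more text\n");
    }
    s
}

/// Fastest of `runs` timed calls, to damp timer and allocator noise.
fn best_of<F: FnMut(&str)>(parse: &mut F, input: &str, runs: u32) -> Duration {
    (0..runs)
        .map(|_| {
            let t = Instant::now();
            parse(input);
            t.elapsed()
        })
        .min()
        .unwrap_or_default()
}

/// Time ratio between parsing `2 * lines` and `lines` continuation lines.
/// `parse` stands in for the parser entry point under test.
fn doubling_ratio<F: FnMut(&str)>(mut parse: F, lines: usize) -> f64 {
    let small = unterminated_quoted_input('"', lines);
    let large = unterminated_quoted_input('"', lines * 2);
    let t_small = best_of(&mut parse, &small, 3);
    let t_large = best_of(&mut parse, &large, 3);
    // Guard against a zero denominator on very fast runs.
    t_large.as_secs_f64() / t_small.as_secs_f64().max(f64::EPSILON)
}
```

A test would then assert something like `doubling_ratio(parse, 100_000) <= 3.0`: a linear scan should land near 2×, a quadratic one near 4×.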

Post-fix timing is cleanly linear (50k/100k/200k lines ≈ 2.1 / 4.3 / 8.8 ms; a 1.2 MB run ≈ 26 ms). All listed suites pass: dos_hardening, yaml_test_suite, event_parity, tree_parity, parser_properties, parser_indicator_regressions, diagnostics; cargo clippy --all-features is clean; scripts/check-public-api.sh reports no diff.

The multi-line quoted-scalar collector re-scanned the entire accumulated buffer from byte 0 on every appended continuation line to find the closing quote, making close detection O(N^2) in the input size. A long unterminated quoted scalar therefore took quadratic time to parse-or-reject: with many continuation lines, parse time grew toward tens of seconds for inputs that should be handled in milliseconds. Carry the close-detection state (escape carry-over, the first closing-quote offset, and the trailing characteristics) across appended lines and inspect only newly appended bytes, mirroring the incremental scanner already used for flow collections. Parse results, error messages, and spans are unchanged for all valid and invalid inputs; only the scanning cost drops from quadratic to linear. Add regression tests covering single- and double-quoted scalars.

jskoiz added 3 commits June 5, 2026 09:21

Loosen wall-clock ceiling on the large quoted-scalar DoS test for slo…

3461357

…w CI

Merge remote-tracking branch 'origin/main' into rrmerge59

abf89b2

# Conflicts: # tests/dos_hardening.rs

jskoiz merged commit 73f9479 into main Jun 5, 2026
7 checks passed

jskoiz merged 3 commits into main from fix/quoted-scalar-quadratic-dos
