Fix SizeLimitedStream cursor over-counting elements across pages#45
Conversation
Co-authored-by: Lucila Stancato <lucila@users.noreply.github.com>
|
|
||
| // Limit reached: next page is empty and exhaustive, and the inner stream is not queried again. | ||
| $result = $size_limited_stream->enumerate($page_size, $cursor); | ||
| $this->assertSame(0, $result->get_size()); |
There was a problem hiding this comment.
| $this->assertSame(0, $result->get_size()); | |
| $this->assertSame(2, $result->get_size()); |
can we make stream return 3 elements instead of 5 so that we need to fetch more pages. The last page should contain less elements => 2. because the inner stream returns 3 elements each time ,but size limited stream will truncate the results....
pagination would result in:
| page | elements | exhausted | count so far |
|1 | 3 | false | 3 |
|2 | 3 | false | 6 |
|3 | 3 | false | 9 |
|4 | 3 | false | 12 |
|5 | 3 | false | 15 |
|6 | 3 | false | 18 |
|7 | 2 | false | 20 |
There was a problem hiding this comment.
Good call — reworked the test to walk 7 pages and assert the truncated final page, matching your table: pages serve 3, 3, 3, 3, 3, 3, 2 for limit: 20, with exactly(7) inner queries.
One adjustment worth flagging: I set the requested page size to 3 (equal to the inner batch) rather than having the inner return fewer than the requested count. SizeLimitedStream computes its own exhaustion as count(derived) < $count, so if we request 5 and the inner returns 3, every short page reports is_exhaustive === true — which would contradict the "exhausted: false" rows in your table and means a real consumer would stop after page 1. Requesting 3 keeps count == derived on full pages, so they stay non-exhausted exactly as charted, and the last page still truncates 3 → 2 against the remaining budget.
(The single-page "inner returns fewer than requested" path is already covered by test_enumerate_less_count.)
lucila
left a comment
There was a problem hiding this comment.
can we adjust unit tests?
🤔 shouldn't the exhaust condition depend on the inner stream?? original code $result = $this->stream->enumerate($count, $cursor->get_inner_cursor(), $tracer, $option);
$elements = $result->get_elements();
...
// if this page size is greater than stream's limit, we don't want to keep paginating.
$is_exhausted = count($derived_elements) < $count || count($derived_elements) >= $this->limit;suggestion $result = $this->stream->enumerate($count, $cursor->get_inner_cursor(), $tracer, $option);
$elements = $result->get_elements();
...
// if this page size is greater than stream's limit, we don't want to keep paginating.
$is_exhausted = $result->is_exhaustive() || count($derived_elements) >= $this->limit; |
lucila
left a comment
There was a problem hiding this comment.
thanks for working on the fix.
Since you are improving SizeLimitedStream, it would be nice to address this early exhaustion issue: #45 (comment)
SizeLimitedStream inferred exhaustion from count(derived) < requested count, which prematurely ended pagination whenever an inner stream returned a short-but-not-exhausted page (e.g. a FilteredStream that spent its retries). Trust the inner stream's is_exhaustive() signal instead; the limit is still enforced by the >= limit clause and the balance check. Replace test_enumerate_less_count (which only covered the deleted heuristic) with test_enumerate_inner_exhausted / _inner_not_exhausted.
|
Good call — applied in fe469c7. |
What and why? 🤔
SizeLimitedStreamis meant to cap a stream atlimitelements across a paginated session, but alimit: 20served across four pages stream ever served 18: three pages of 5, a starved 4th page of 3, then exhaustion.Root cause.
SizeLimitedStreamCursor::_combine_withusedmax($a, $b) + 1. The+1is correct only for the within-page fold (StreamResult::get_combined_cursor), but a typical paginating caller also re-combines a page's already-combined cursor with the incoming request cursor — combining two aggregates — so the+1double-counts each page boundary once per page.current_sizedrifts above the true count bypages − 1:current_size(before)balance ≤ 0By page 4 the drifted size (17) leaves
balance = 20 − 17 = 3, so the page is clamped to 3 and the stream stops short of its own limit.Fix. Make the cursor's
counta true absolute cumulative position and combine idempotently:SizeLimitedStream::_enumeratetags each element withcurrent_size + $index + 1(not a constant+ 1).SizeLimitedStreamCursor::_combine_withuses plainmax(no+ 1).Both changes are required together. After the fix,
current_sizeruns5 → 10 → 15 → 20and the stream serves exactly 20.Testing Steps ✍️
vendor/bin/phpunit tests/unit/Tumblr/StreamBuilder/Streams/SizeLimitedStreamTest.phpvendor/bin/phpunit tests/unit/Tumblr/StreamBuilder/StreamCursors/SizeLimitedStreamCursorTest.phptestEnumerateReachesLimitAcrossPagespasses — it walks four equal pages and asserts the cursor's cumulative size is5 → 10 → 15 → 20(it fails on the old+ 1arithmetic with11 !== 10at page 2).You're it! 👑
@Automattic/stream-builders