Open
Conversation
Spawns the engine API as a child process, seeds 5,000 rows into a custom Stripe list test server, then runs 60 successive pipeline_sync requests with time_limit=2. Measures RSS after each iteration and asserts that post-warmup growth stays stable (slope <500 KB/iter, total <200 MB). Runs as a dedicated e2e_memory CI job — no secrets required. Made-with: Cursor Committed-By-Agent: cursor
Made-with: Cursor Committed-By-Agent: cursor
CI data shows the engine stabilizes around iteration 25 (JIT, connection pools, V8 heap growth). After that RSS is flat at ~503 MB. Bumping warmup from 10 to 25 so the regression analysis only covers the plateau. Made-with: Cursor Committed-By-Agent: cursor
CI and local data show the fix works: RSS plateaus after warmup. Minor V8 heap growth at high iteration counts is normal and should not trip the regression test. Thresholds now target the original leak pattern (>5000 KB/iter) with margin. Made-with: Cursor Committed-By-Agent: cursor
With time_limit=2, the test server responds fast enough that syncs complete before the limit fires — the leak path is never exercised. Dropping to 0.1s ensures every window is cut short mid-pagination. Verified locally: Pre-fix: slope 3259 KB/iter, 109 MB growth → FAIL Post-fix: slope 403 KB/iter, 41 MB growth → PASS Made-with: Cursor Committed-By-Agent: cursor
The test now checks that >=80% of sync windows produce an eof with reason=time_limit. Without this, a fast test server could let syncs complete before the limit fires, silently bypassing the leak path (exactly what happened with time_limit=2s). Also: parse NDJSON responses to detect the time_limit eof marker. Made-with: Cursor Committed-By-Agent: cursor
Refactors the RSS detector into a shared e2e harness and adds a fast synthetic self-test that proves the detector stays green for a stable child process and trips for an intentionally leaky one. The dedicated memory leak CI job now runs both the harness self-test and the real engine regression test. Made-with: Cursor Committed-By-Agent: cursor
Keep the synthetic harness self-test as the strict proof that the detector catches a real leak, and relax the real-engine slope threshold to reduce flake on noisy shared CI runners while still exercising the time_limit code path. Made-with: Cursor Committed-By-Agent: cursor
3033a14 to
b85d7cb
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds an automated memory leak regression test that validates the fix from #279.
createStripeListServerpipeline_sync?time_limit=2requestsps -o rss=New
e2e_memoryCI job runs this test independently — no secrets required.Test plan
e2e_memoryCI job passesMade with Cursor