test(issue-135): integration repro for Resource SDK search after restart by kriszyp · Pull Request #140 · HarperFast/harper-pro

kriszyp · 2026-05-12T23:17:03Z

Summary

Adds integration tests reproducing canopy issue #135 (Resource SDK tables.X.search returning subset after Harper restart while ops API returns full set), and bumps the core submodule to include the companion fix that enables multi-node cluster integration tests.

Companion PR

Depends on HarperFast/harper#523 in core (fix: register operations in worker threads so replication WS can dispatch them). Without that fix, Scenario B and any other multi-node cluster test fail with `Operation 'add_node_back' not found` and `connection was required to sign certificate`.

Test results

| Layer | Result |
| --- | --- |
| Unit: in-process `resetDatabases()` + indexed search | ✅ Pass |
| Unit: real subprocess boundary (fork writer, read in parent) | ✅ Pass |
| Unit: schema-change reindex with mid-flight interruption | ✅ Pass |
| Integration Scenario A: single-node graceful restart (`issue135-resource-search-after-restart.test.mjs`) | ✅ Pass |
| Integration Scenario B: 2-node, write on node 0 → replicate to node 1 → restart node 1 (`issue135-replicated-search-after-restart.test.mjs`) | ✅ Pass |

The bug does not reproduce in any harness scenario. This is meaningful negative data: the issue #135 fingerprint is narrower than expected (single-node restart, replication-receiving-node restart, schema-change reindex with restart — none reproduce it). Likely triggers remaining: Fabric topology (>2 nodes, multi-region), production-scale load, or a specific combination of rolling deploy_component with active replication backlog that this harness doesn't exercise.
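The comparison each scenario makes after a restart amounts to a set difference between the two read paths. The sketch below is a hypothetical helper, not code from this PR: `opsRows` and `sdkRows` stand in for the rows returned by the ops API and by `tables.X.search`, and the `id` primary-key field is an assumption.

```javascript
// Hypothetical helper (not from the PR): report primary keys that one read
// path returns and the other does not. `id` is an assumed primary-key field.
function diffRowSets(opsRows, sdkRows, key = 'id') {
  const opsIds = new Set(opsRows.map((row) => row[key]));
  const sdkIds = new Set(sdkRows.map((row) => row[key]));
  return {
    // Visible through the ops API but not through the Resource SDK search:
    // the symptom described in issue #135.
    missingFromSdk: [...opsIds].filter((id) => !sdkIds.has(id)),
    // The reverse direction, included for completeness.
    missingFromOps: [...sdkIds].filter((id) => !opsIds.has(id)),
  };
}
```

A passing run corresponds to both arrays being empty; a reproduction of the bug would show up as a non-empty `missingFromSdk`.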

Changes in this PR

- `integrationTests/cluster/issue135-resource-search-after-restart.test.mjs`: Scenario A (single-node).
- `integrationTests/cluster/issue135-replicated-search-after-restart.test.mjs`: Scenario B (multi-node, new).
- `integrationTests/cluster/issue135-fixture/`: fixture component (SearchCount Resource + ScoreEvidence schema).
- `integrationTests/cluster/ISSUE-135-FINDINGS.md`: what was tried, what passes, what the result implies.
- `integrationTests/cluster/ISSUE-135-PLAN.md`: original plan doc kept for context.
- Bump `@harperfast/integration-testing` to ^0.3.0.
- Core submodule pointer bumped to include harper#523 (fix: register operations in worker threads so replication WS can dispatch them).

Review attention

- The two suites are split across two files because `node --test` runs top-level suites concurrently: cluster startup in Scenario B interferes with Scenario A's single-node restart timing when colocated. Worth confirming this is the right pattern vs. forcing `--test-concurrency=1`.
- Scenario A now polls `describe_table` after the post-deploy `pollHealth` to avoid a race where the graphql schema hasn't finished loading and the first insert fails with "database 'data' does not exist". This may be a Harper Pro issue worth investigating separately.
- The fixture's SearchCount resource uses `for await` over `tables.X.search` (an AsyncIterable). Worth confirming this is the canonical Resource handler pattern.
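For reference, the `for await` counting pattern in the last point can be sketched without Harper at all. This is a minimal illustration under stated assumptions: `fakeSearch` is a hypothetical stand-in for `tables.X.search` that only mimics the AsyncIterable contract, and `countResults` is an invented name, not the fixture's actual handler.

```javascript
// Stand-in for tables.X.search: any AsyncIterable works with `for await`.
// fakeSearch is hypothetical and only mimics the iteration contract.
async function* fakeSearch(rows, predicate) {
  for (const row of rows) {
    if (predicate(row)) yield row;
  }
}

// Count results by draining the iterable, as a Resource handler might.
async function countResults(asyncIterable) {
  let count = 0;
  for await (const _row of asyncIterable) {
    count += 1;
  }
  return count;
}
```

Draining the iterable this way never holds the full result set in memory, which is one reason to prefer it over collecting rows into an array first.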

🤖 Generated with Claude Sonnet 4.6 (1M context)

Three distinct failures in tests that were never running because the glob only matched *.ts files:

- cluster/fullyConnectedReplication, cluster/replicationLoad: imported targz from a relative path (../../core/integrationTests/utils/targz.ts) that no longer exists; moved to @harperfast/integration-testing, which already exports it (matches how the issue135 test does it)
- analytics fixture: spawn() calls lacked the required name option (Harper's wrapped spawn enforces this to deduplicate long-lived processes); added a unique name per child using Date.now() + index
- cloneNode "three more nodes": waitForAvailableStatus only means the node is up, not that the full replication mesh has been established; added a poll loop (same pattern as replicationTopology.test.mjs) that retries up to 20× with backoff before asserting all sockets are connected

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
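A poll loop of the kind described for cloneNode might look like the sketch below. It is an illustration, not the code added in this commit: `pollUntil` and its option names are hypothetical, and exponential backoff with a cap is an assumption, since the commit message only says "with backoff".

```javascript
// Hypothetical helper (not from the PR): retry an async check with capped
// exponential backoff until it returns a truthy value or attempts run out.
async function pollUntil(check, { attempts = 20, baseDelayMs = 100, maxDelayMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      const result = await check(attempt);
      if (result) return result;
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
    if (attempt < attempts - 1) {
      // Delay doubles on each attempt, capped at maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error(`condition not met after ${attempts} attempts`, { cause: lastError });
}
```

The test would pass a check that resolves truthy once every expected socket reports connected, and only then run its assertions.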


…restart

Adds a single-node integration test and fixture component for serent-canopy issue #135 (tables.X.search returns subset after Harper restart while ops API returns full set).

Results summary:

- Scenario A (single-node graceful restart): PASSES, not reproduced here.
- Scenarios B/C (multi-node replication): BLOCKED, cluster JWT key generation fails in the integration test environment, same as replicationLoad/fullyConnectedReplication. Scenario B left as test.skip with explanation.
- Unit-level results (subprocess restart, schema-change reindex): all pass.

Bug is scoped to multi-node replication state. The most likely mechanism: index entries for rows received via replication replay are not rebuilt on restart, leaving them invisible to tables.X.search (secondary index walk) while ops API scan of primaryStore still sees them.

Files added:

- integrationTests/cluster/issue135-resource-search-after-restart.test.mjs
- integrationTests/cluster/issue135-fixture/ (SearchCount resource + ScoreEvidence schema)
- integrationTests/cluster/ISSUE-135-PLAN.md / ISSUE-135-FINDINGS.md

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-05-12T23:19:40Z

Reviewed; no blockers found.

- Adds Scenario B (`issue135-replicated-search-after-restart.test.mjs`) which exercises the most likely repro vector for serent-canopy issue #135: insert rows on node 0 → wait for replication to drain to node 1 → restart node 1 → query node 1 via the Resource SDK (`tables.X.search`). Bug does not reproduce: both ops API and Resource SDK return the full row set.
- Keeps the two scenarios in separate files because `node --test` runs top-level suites concurrently and the cluster startup of Scenario B interferes with Scenario A's single-node restart timing when colocated.
- Adds a post-deploy `describe_table` poll to Scenario A, because `pollHealth` alone doesn't guarantee the graphql schema has finished loading and the first insert sometimes returned "database 'data' does not exist".
- Bumps `@harperfast/integration-testing` to ^0.3.0.
- Bumps the core submodule pointer to include the companion fix (`fix: register operations in worker threads so replication WS can dispatch them`) which the cluster setup in Scenario B depends on.

Updates ISSUE-135-FINDINGS.md to reflect the new test results.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

socket-security · 2026-05-13T01:16:52Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	@harperfast/integration-testing@0.2.0 ⏵ 0.3.0	⁺³			⁺⁵

cb1kenobi · 2026-05-13T15:25:17Z

name is not a valid spawn option.

It is with Harper's spawn (where we replace the native Node module with a constrained child_process module), and name is required (it performs deduplication, checking whether the process has already been spawned by another thread). This was based on our conversation where you said you thought it was best to preserve the existing spawn function for this functionality.
We could still add our own spawn if this creates confusion.

I'm confused. node:child_process is not Node's node:child_process? I vividly remember this conversation. Given the context, I don't think we have many options.

It is an attenuated/constrained version of node:child_process, and it is what gets returned (imported) for modules that import node:child_process. But maybe we should have the process de-duping version imported from harper? (This goes back to the underlying problem, though: I have never seen a module actually use node:child_process correctly. Failing fast isn't really any worse than feigning success 🤷.)
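To make the behavior under discussion concrete, here is a single-process analogue of a spawn wrapper that requires `name` and deduplicates on it. This is a hypothetical sketch, not Harper's implementation: `createNamedSpawn` is an invented name, the underlying spawn function is injected, and the cross-thread coordination mentioned above is not modeled.

```javascript
// Hypothetical sketch, NOT Harper's implementation: wrap a spawn function so
// that a `name` option is required and repeated names reuse the live child.
function createNamedSpawn(spawnImpl) {
  const running = new Map(); // name -> child process still considered alive
  return function namedSpawn(command, args = [], options = {}) {
    const { name, ...rest } = options;
    if (typeof name !== 'string' || name.length === 0) {
      // Fail fast instead of silently spawning an untracked process.
      throw new TypeError("spawn requires a non-empty 'name' option");
    }
    const existing = running.get(name);
    if (existing) return existing; // already spawned under this name
    const child = spawnImpl(command, args, rest);
    running.set(name, child);
    child.once('exit', () => running.delete(name)); // allow respawn after exit
    return child;
  };
}
```

With a wrapper like this, callers that want separate children need distinct names, which is consistent with the analytics fixture change that derives a unique name per child from `Date.now()` and an index.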

cb1kenobi · 2026-05-21T03:46:34Z

I'm confused. node:child_process is not Node's node:child_process? I vividly remember this conversation. Given the context, I don't think we have many options.

kriszyp and others added 4 commits May 12, 2026 16:01

Reenable integration testing by matching the test files

3fc88e6

Push lightweight tag explicitly (npm version creates lightweight, --follow-tags only pushes annotated)

0fb69db

kriszyp requested a review from a team as a code owner May 12, 2026 23:17

Base automatically changed from re-enable-integration-testing to main May 13, 2026 14:54

cb1kenobi approved these changes May 13, 2026

cb1kenobi approved these changes May 21, 2026

kriszyp wants to merge 5 commits into main from test/issue-135-replication-repro
