Skip to content

Fixed multiple configNode bugs#17609

Open
Caideyipi wants to merge 9 commits into
masterfrom
config-bug
Open

Fixed multiple configNode bugs#17609
Caideyipi wants to merge 9 commits into
masterfrom
config-bug

Conversation

@Caideyipi

Copy link
Copy Markdown
Collaborator

Description

  1. In Simple mode, a successful response will be returned stating "Metadata has taken effect, but WAL has not been flushed to disk." Configuration changes will be lost after a restart.
  2. In the Simple mode, some files have the same sorting key, posing a risk of log replay in disorder.
  3. The return TSStatus of ConsensusManager.write() is not checked, and the exception is only a warn followed by proceeding to the success branch. This means that if consensus writing is rejected, redirected, or the state machine returns a failure, the caller will still receive ACCEPT, while the local heartbeat cache/metrics have already been created in advance, leading to a divergence between metadata and runtime states.
  4. The value returned by getRegionGroupCount() increases by 1 every time it is counted, and this value is used for capacity expansion judgment, display, and maxRegionGroupNum calculation, which directly affects the region scaling decision.
  5. When taking a snapshot, the regionMaintainTaskList lock is not acquired before traversing. Concurrent offer/poll may throw ConcurrentModificationException or put the maintenance queue snapshot into an inconsistent state.
  6. If the pipe listener fails to process the snapshot, the entire takeSnapshot() method will return false; however, ConfigRegionStateMachine combines and absorbs the failures of loadSnapshot and the pipe listener, which can amplify auxiliary link failures into consensus snapshot failures, and even mask the real failure of snapshot loading.

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

@codecov

codecov Bot commented May 6, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 42.80000% with 143 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.95%. Comparing base (011bdec) to head (65ae336).

Files with missing lines Patch % Lines
...che/iotdb/confignode/manager/node/NodeManager.java 42.30% 75 Missing ⚠️
...nsensus/statemachine/ConfigRegionStateMachine.java 42.59% 62 Missing ⚠️
...gnode/persistence/executor/ConfigPlanExecutor.java 0.00% 2 Missing ⚠️
...onfignode/persistence/partition/PartitionInfo.java 71.42% 2 Missing ⚠️
...de/procedure/impl/node/AddConfigNodeProcedure.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17609      +/-   ##
============================================
+ Coverage     40.90%   40.95%   +0.05%     
  Complexity     2610     2610              
============================================
  Files          5186     5186              
  Lines        351420   351563     +143     
  Branches      44991    45018      +27     
============================================
+ Hits         143758   143995     +237     
+ Misses       207662   207568      -94     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@CRZbulabula CRZbulabula left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall

This PR fixes 6 real correctness bugs in ConfigNode (WAL ordering, log sorting, unchecked consensus writes, off-by-one, concurrent snapshot, pipe-listener failure masking). Changes are well-structured, rollback logic is carefully designed, and test coverage for core scenarios is solid.

Inline comments below cover a few items worth attention before merge.

if (persistStatus.getCode() != TSStatusCode.SUCCESS_STATUS.getStatusCode()) {
return persistStatus;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WAL replay idempotency requirement

With the new write-ahead ordering (persist → execute), if persistPlanForSimpleConsensus succeeds at line 125 but executor.executeNonQueryPlan fails at line 133, the plan is already persisted in WAL and will be replayed on restart.

This is correct WAL semantics, but it introduces a hard requirement: all ConfigPhysicalPlan implementations must be idempotent under replay. For example, a RegisterDataNodePlan replayed against an already-registered node must not fail or double-register.

Is this idempotency property currently guaranteed across all ConfigPhysicalPlan types? If not, this could cause issues on crash recovery in edge cases.

ByteBuffer buffer = plan.serializeToByteBuffer();
buffer.position(buffer.limit());
simpleLogWriter.write(buffer);
simpleLogWriter.force();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force() on every write makes scheduled flush redundant

Now that every write is synchronously forced here, the scheduled flushWALForSimpleConsensus thread (lines 447-455) becomes redundant — it calls simpleLogWriter.force() periodically, but there's nothing left to flush.

This is harmless (double-force is idempotent) but worth either:

  1. Removing the scheduled thread to avoid confusion, or
  2. Adding a brief comment explaining the intentional belt-and-suspenders approach.

For ConfigNode metadata operations (low write frequency), the per-write force() is the right durability choice.

"Persist current ConfigPhysicalPlan for ConfigNode SimpleConsensus mode failed", e);
return new TSStatus(TSStatusCode.EXECUTE_STATEMENT_ERROR.getStatusCode())
.setMessage(
"Persist ConfigNode SimpleConsensus log failed: " + String.valueOf(e.getMessage()));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: String.valueOf(e.getMessage()) is redundant — string concatenation with + already handles null by converting to "null". Can simplify to:

"Persist ConfigNode SimpleConsensus log failed: " + e.getMessage()

final List<RegionMaintainTask> copiedRegionMaintainTaskList;
synchronized (regionMaintainTaskList) {
copiedRegionMaintainTaskList = new ArrayList<>(regionMaintainTaskList);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix — synchronized copy before iterating. One minor observation: this copies the entire list under the lock. If regionMaintainTaskList can grow very large, this could cause a brief pause. In practice, for ConfigNode this list is typically small, so this is fine. Just noting for future awareness.

@CRZbulabula CRZbulabula left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL.

}
});
return result.getAndIncrement();
return result.get();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a bug?

Caideyipi added 4 commits May 22, 2026 14:02
# Conflicts:
#	iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/consensus/statemachine/ConfigRegionStateMachine.java
#	iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/procedure/impl/node/AddConfigNodeProcedure.java
luoluoyuyu

This comment was marked as outdated.

@luoluoyuyu luoluoyuyu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

Fixes several ConfigNode correctness issues including WAL persist-before-execute and rollback on execute failure. Before merge, document or test that all ConfigPhysicalPlan types are idempotent under WAL replay when persist succeeds and execute fails.

Also resolve CI failures in this PR.

protected TSStatus write(ConfigPhysicalPlan plan) {
SimpleConsensusPersistResult persistResult = null;
if (ConsensusFactory.SIMPLE_CONSENSUS.equals(CONF.getConfigNodeConsensusProtocolClass())) {
persistResult = persistPlanForSimpleConsensus(plan);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With persist before execute, a successful persistPlanForSimpleConsensus followed by a failed executeNonQueryPlan leaves the plan in WAL and it will be replayed on restart. Every ConfigPhysicalPlan executed on replay must be idempotent. Please add an audit or tests for the plan types touched by this PR.


TSStatus result;
try {
result = executor.executeNonQueryPlan(plan);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollbackFailedPlanForSimpleConsensus on execute failure is the right safety net. Ensure rollback failure paths are logged clearly enough for operators when WAL and in-memory state may diverge.

Caideyipi added 2 commits June 3, 2026 11:30
# Conflicts:
#	iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/path/PartialPath.java
@Caideyipi

Copy link
Copy Markdown
Collaborator Author

Addressed the review feedback and pushed the updates.

Changes:

  • Added WAL rollback/replay coverage: failed SimpleConsensus writes now assert that the rolled-back WAL contains no replayable plan.
  • Added idempotency coverage for replay-relevant NodeInfo plans touched by this PR: RegisterDataNodePlan, ApplyConfigNodePlan, and UpdateVersionInfoPlan.
  • Improved SimpleConsensus rollback failure logging with plan type, log file, truncate offset, and endIndexBeforeWrite; the returned status now also notes that the persisted plan may replay after restart if rollback fails.
  • Removed newly introduced TODO comments; todo-check is now passing.
  • Merged latest master and resolved the PartialPath conflict by keeping master's implementation, so the PR is mergeable again.

Verification:

  • mvn -pl iotdb-core/confignode spotless:apply
  • mvn -pl iotdb-core/node-commons spotless:apply
  • mvn -pl iotdb-core/confignode '-Dtest=ConfigRegionStateMachineTest,NodeInfoTest' test
  • git diff --check origin/master...HEAD

Current CI note: the Simple (17) failures were checkout/network failures on self-hosted runners (gnutls_handshake before tests started). I reran the failed jobs. todo-check has passed; the remaining jobs are still pending.

@sonarqubecloud

sonarqubecloud Bot commented Jun 3, 2026

Copy link
Copy Markdown

# Conflicts:
#	iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/consensus/statemachine/ConfigRegionStateMachine.java
#	iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/persistence/executor/ConfigPlanExecutor.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants