Skip to content

fix: reduce optimize lock hold time with revision check#17

Open
PhantomInTheWire wants to merge 1 commit intomasterfrom
codex/p0-incremental-optimize
Open

fix: reduce optimize lock hold time with revision check#17
PhantomInTheWire wants to merge 1 commit intomasterfrom
codex/p0-incremental-optimize

Conversation

@PhantomInTheWire
Copy link
Copy Markdown
Owner

@PhantomInTheWire PhantomInTheWire commented Apr 26, 2026

Summary

  • Clone state under read lock, rebuild optimized segments outside RwLock
  • Use revision counter to detect concurrent writes/checkpoint mutations
  • Return clear FailedPrecondition error if optimize sees stale revision (no silent skip)
  • Serialize checkpoint file writes with checkpoint_lock to avoid races
  • Add regression tests for optimize persistence and concurrent write durability

P0 Issue

Optimize is a blocking full-dataset rewrite under the collection write lock.


Open in Devin Review

- Clone state under read lock, rebuild optimized segments outside RwLock
- Use revision counter to detect concurrent writes/checkpoint mutations
- Return clear FailedPrecondition error if optimize sees stale revision
- Serialize checkpoint file writes with checkpoint_lock to avoid races
- Add regression test for optimize persistence without extra flush
- Add regression test for optimize vs concurrent write durability

P0: optimize is a blocking full-dataset rewrite
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 26, 2026

Warning

Rate limit exceeded

@PhantomInTheWire has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 59 minutes and 59 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 59 minutes and 59 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8ea08e0-6d94-4b92-839a-723e7279e76d

📥 Commits

Reviewing files that changed from the base of the PR and between 1d36c34 and 4fe485a.

📒 Files selected for processing (4)
  • crates/garuda-engine/src/lib.rs
  • crates/garuda-engine/src/recovery_service.rs
  • crates/garuda-engine/src/state.rs
  • crates/garuda-engine/tests/test_collection_optimize.rs
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/p0-incremental-optimize

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment on lines +191 to +204
let mut optimized = {
let state = self.read_state();
ensure_collection_is_writable(&state)?;
let mut optimized = state.clone();
let meta = optimized.meta.clone();
optimized.segments.optimize(
&mut optimized.next_segment_id,
optimized.options.segment_max_docs,
&optimized.schema,
|doc_id| meta.is_deleted(doc_id),
);
state.rebuild_indexes();
optimized.rebuild_indexes();
optimized
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 RwLock read guard held during expensive optimization prevents concurrent writes

The self.read_state() guard (state) at line 192 lives until the end of the block at line 204. This means the RwLock read lock is held during the expensive segments.optimize() (lines 196-201) and rebuild_indexes() (line 202) calls. Since Rust's RwLock blocks writers while any read lock is held, all concurrent write operations (insert, upsert, update, delete, delete_by_filter, DDL, flush) are blocked for the entire optimization duration.

The revision-based conflict check at line 209 was specifically introduced to detect concurrent modifications, but the long-held read lock prevents writers from making progress during optimization, defeating much of the stated purpose ("reduce optimize lock hold time"). After state.clone() at line 194, the read guard is no longer needed — only the cloned optimized is used. Dropping the read guard immediately after cloning would allow concurrent writes to proceed during the optimization computation phase.

Suggested change
let mut optimized = {
let state = self.read_state();
ensure_collection_is_writable(&state)?;
let mut optimized = state.clone();
let meta = optimized.meta.clone();
optimized.segments.optimize(
&mut optimized.next_segment_id,
optimized.options.segment_max_docs,
&optimized.schema,
|doc_id| meta.is_deleted(doc_id),
);
state.rebuild_indexes();
optimized.rebuild_indexes();
optimized
};
let mut optimized = {
let state = self.read_state();
ensure_collection_is_writable(&state)?;
state.clone()
};
let meta = optimized.meta.clone();
optimized.segments.optimize(
&mut optimized.next_segment_id,
optimized.options.segment_max_docs,
&optimized.schema,
|doc_id| meta.is_deleted(doc_id),
);
optimized.rebuild_indexes();
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant