Skip to content

If a cache entry can't be found on disk, remove the cache entry#894

Merged
mostynb merged 1 commit intobuchgr:masterfrom
mostynb:fix_ghost_lru_entries
May 6, 2026
Merged

If a cache entry can't be found on disk, remove the cache entry#894
mostynb merged 1 commit intobuchgr:masterfrom
mostynb:fix_ghost_lru_entries

Conversation

@mostynb
Copy link
Copy Markdown
Collaborator

@mostynb mostynb commented May 5, 2026

Initial mitigation for #893.

im-nick-adams added a commit to im-nick-adams/bazel-remote that referenced this pull request May 5, 2026
Matches upstream PR buchgr#894 by mostynb. When the slow-path retry in
availableOrTryProxy() fails to open a blob file, remove the stale
LRU entry while still holding the lock. This prevents ghost entries
from persisting indefinitely and causing repeated 'Lost inputs no
longer available remotely' failures for clients using
--remote_download_toplevel.

Upstream issue: buchgr#893
Upstream PR: buchgr#894
Copy link
Copy Markdown
Contributor

@ulrfa ulrfa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good! I agree it make sense to delete phantom entries to minimize the impact of issues.

However, see suggestion regarding the logging.

@ulrfa
Copy link
Copy Markdown
Contributor

ulrfa commented May 6, 2026

The github.com web UI gives weird error messages for my review comment. I'm trying to write it here instead:

If os.Open() in the fast path (line 465) is failing with IsNotExist and lru.Get() on line 471 is not finding the key, then I assume it is caused by an undersized cache and the operator needs to address the problem. The existing warning is reasonable for that scenario:

"Warning: expected %q to exist on disk, undersized cache?"

But I expect the os.Open() in the slow path (line 474) to never fail, and if it does it could indicate either:

  • Bug in bazel-remote.
  • Something external manipulating the disk storage.
  • Several bazel-remote instances incorrectly configured to share same disk storage.
  • Other unknown disk storage issue.

Therefore I suggest having separate log messages so that we can distinguish between if failing in fast or slow path. Perhaps also including err in the log message in case of failure in slow path?

@mostynb mostynb force-pushed the fix_ghost_lru_entries branch from 342e7cf to c283c2c Compare May 6, 2026 20:32
@mostynb
Copy link
Copy Markdown
Collaborator Author

mostynb commented May 6, 2026

Good! I agree it make sense to delete phantom entries to minimize the impact of issues.

However, see suggestion regarding the logging.

Updated the log to show if we took the slow path, and the last reported error.

@mostynb mostynb force-pushed the fix_ghost_lru_entries branch from c283c2c to acdfca9 Compare May 6, 2026 20:34
@mostynb mostynb force-pushed the fix_ghost_lru_entries branch from acdfca9 to 7ee9ad2 Compare May 6, 2026 20:57
@mostynb mostynb merged commit 7ee9ad2 into buchgr:master May 6, 2026
3 checks passed
@mostynb mostynb deleted the fix_ghost_lru_entries branch May 6, 2026 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants