evict_to_len crash: node in linked list but missing from hash trie #154

@droideck

Environment

  • concread 0.5.10, plus diagnostic commit dd13f6a
  • rustc 1.88.0
  • Consumer: 389-ds-base

Problem

Under sustained concurrent load, evict_to_len panics because a node popped from the freq/rec linked list has no corresponding entry in the hash trie. cache.get_mut(&key) returns None for a key that was inserted moments earlier in the same commit.

This happens when a read transaction completes and try_quiesce_stats opportunistically acquires the write lock to commit pending cache maintenance.
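To make the violated invariant concrete, here is a minimal sketch using `std` collections as stand-ins. It is not concread's code: `Slot`, `MissingKey`, and this `evict_to_len` signature are hypothetical, a `VecDeque` replaces the intrusive linked list, and a `HashMap` replaces the hash trie. It only illustrates the expectation that every key popped from the list resolves in the map, and reports the inconsistency as an error instead of panicking.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the state a cache map entry can be in.
#[derive(Debug, PartialEq)]
enum Slot {
    Rec,
    GhostRec,
}

/// Hypothetical error: a key was present in the list but absent from the map.
#[derive(Debug, PartialEq)]
struct MissingKey {
    key: String,
    list_len_after_pop: usize,
}

/// Pop keys from the front of `list` until it holds at most `target` keys.
/// Each popped key is expected to have a map entry, which is demoted to a
/// ghost and tracked in `ghost`. Stops at the first key with no map entry.
fn evict_to_len(
    list: &mut VecDeque<String>,
    ghost: &mut VecDeque<String>,
    map: &mut HashMap<String, Slot>,
    target: usize,
) -> Result<usize, MissingKey> {
    let mut evicted = 0;
    while list.len() > target {
        let Some(key) = list.pop_front() else { break };
        match map.get_mut(&key) {
            Some(slot) => {
                // Normal path: the list node and the map entry agree.
                *slot = Slot::GhostRec;
                ghost.push_back(key);
                evicted += 1;
            }
            None => {
                // The situation in this report: list node without map entry.
                return Err(MissingKey {
                    key,
                    list_len_after_pop: list.len(),
                });
            }
        }
    }
    Ok(evicted)
}
```

Calling this with a list that holds a key which was never inserted into the map returns `Err(MissingKey { .. })` for that key, which is the same shape of failure as the `r = Option::None` seen in the traces below.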

Initial crash (concread 0.5.10, no diagnostics)

#13 ARCache::evict_to_len                            (mod.rs:1346)
      r = Option::None
      owned = LLNodeOwned { inner: 0x7f91877cfb40 }
#14 ARCache::evict                                   (mod.rs:1455)
      rec_to_len=120587, freq_to_len=4243, delta=1, p=86482
#15 ARCache::commit                                  (mod.rs:1694)
      commit_txid=40047203
#17 ARCache::try_quiesce_stats                       (mod.rs:693)
      tlocal: items=0, hit: len=0
      CursorWrite.length=136719, stats.includes=1
#18 cache_char_read_complete                         (cache.rs:228)

Diagnostic build crash (0.5.10 + dd13f6a)

I added a scan for duplicates in the source list and a check against the destination ghost list, then reproduced the crash:

evict_to_len: KEY MISSING from cache map for key "<dn_value>".
  Popped node ptr=0x7fb8d8ac7470, node_txid=41318110, node_size=1
  commit_txid=41318110, ll.len()=107677, to_ll.len()=34329,
  target_size=107677, dupes_remaining_in_src_ll=0,
  key_found_in_dest_ghost_ll=false
#12 ARCache::evict_to_len                            (mod.rs:1457)
      in_to_ll = false, dupes_in_ll = 0
      r = Option::None
      owned = LLNodeOwned { inner: 0x7fb8d8ac7470 }
#13 ARCache::evict                                   (mod.rs:1584)
      rec_to_len=107677, freq_to_len=17153, delta=1, p=84990
#14 ARCache::commit                                  (mod.rs:1881)
      commit_txid=41318110
#16 ARCache::try_quiesce_stats                       (mod.rs:711)
      tlocal: items=0, hit: len=0
      CursorWrite.length=159160, stats.includes=1
#17 cache_char_read_complete                         (cache.rs:228)

IIUC, node_txid == commit_txid means the node was added to the rec list during this commit by one of the drain functions. node_size=1 is HAUNTED_SIZE, which is consistent with a Haunted revival (I filed the drain_inc_rx size bug fix). There are no duplicates in the source list, and the key is not in the ghost list. The key string itself is intact and readable in the panic message, so the linked list node's memory is fine. The hash trie just can't find it.

I wasn't able to find any place where we remove a cache map entry without also removing the corresponding linked list node. As far as I can tell, the 389-ds-base code is fine too.
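One way to narrow this down might be a bidirectional audit run after each drain step, so that the first step to break list/map agreement is the one that reports it. The sketch below is only an illustration under assumptions: `AuditReport` and `audit` are hypothetical names, plain slices stand in for the linked lists, and a `HashMap` stands in for the hash trie. Which lists should take part in the audit depends on ARCache internals that are not modeled here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical report from one audit pass over the lists and the map.
#[derive(Debug, Default, PartialEq)]
struct AuditReport {
    in_list_not_in_map: Vec<String>,
    in_map_not_in_list: Vec<String>,
    duplicated_in_list: Vec<String>,
}

/// Compare every key held by the given lists against the map, in both
/// directions, and also flag keys that occur more than once across lists.
fn audit<V>(lists: &[&[String]], map: &HashMap<String, V>) -> AuditReport {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut report = AuditReport::default();

    for key in lists.iter().flat_map(|l| l.iter()) {
        // `insert` returns false when the key was already seen.
        if !seen.insert(key.as_str()) {
            report.duplicated_in_list.push(key.clone());
        }
        if !map.contains_key(key) {
            report.in_list_not_in_map.push(key.clone());
        }
    }

    for key in map.keys() {
        if !seen.contains(key.as_str()) {
            report.in_map_not_in_list.push(key.clone());
        }
    }

    // HashMap iteration order is unspecified, so sort for stable output.
    report.in_map_not_in_list.sort();
    report
}
```

An empty report after one drain function and a non-empty one after the next would point at the step that inserted the list node without a matching map entry.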
