Skip to content

feat: reference deduplication and cycle-safe decoding via DecodeSession#4257

Open
evanchooly wants to merge 1 commit into
masterfrom
claude/resume-reference-caching-p1lyU
Open

feat: reference deduplication and cycle-safe decoding via DecodeSession#4257
evanchooly wants to merge 1 commit into
masterfrom
claude/resume-reference-caching-p1lyU

Conversation

@evanchooly

Copy link
Copy Markdown
Member

Add a per-document ThreadLocal decode session that caches entity instances
by (collection, id). Nested activate() calls reuse the existing session so
all @reference fetches within one document decode share the same cache.

  • DecodeSession: lightweight ThreadLocal cache with activate/deactivate;
    activate() returns true only for the root caller (owns deactivation)
  • MorphiaCursor: activates a session around next()/tryNext(); only
    deactivates if it was the root activator
  • EntityDecoder: pre-registers the entity instance via peekId() (mark/reset)
    before decodeProperties(), enabling cycle detection
  • ReferenceCodec: checks session cache before firing DB queries for both
    single and collection references
  • TestReferences: adds testReferenceDeduplication (within-document) and
    testCyclicReferenceDoesNotStackOverflow

https://claude.ai/code/session_01NRjHAe69rLVqEZYZHnBsBF

Copilot AI review requested due to automatic review settings May 19, 2026 01:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a per-decode ThreadLocal cache (DecodeSession) that maps (collection, id) → entity so that, within a single document decode, repeated @Reference lookups resolve to the same Java instance and self-referential graphs do not stack-overflow. MorphiaCursor and MorphiaCodec activate/deactivate the session, EntityDecoder pre-registers each decoded entity (using a "peek the id" trick), and ReferenceCodec consults the session before issuing DB fetches.

Changes:

  • Introduce DecodeSession ThreadLocal with activate/deactivate/register/lookup/contains.
  • Hook activation into MorphiaCursor.next/tryNext and MorphiaCodec.decode; register early via peekId in EntityDecoder.
  • Make ReferenceCodec.fetch consult the session cache for both single and collection references, and add TestNG tests for deduplication and cyclic references.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
core/src/main/java/dev/morphia/mapping/codec/DecodeSession.java New ThreadLocal-backed cache for decoded entities.
core/src/main/java/dev/morphia/query/MorphiaCursor.java Activates/deactivates a session around each next/tryNext.
core/src/main/java/dev/morphia/mapping/codec/pojo/MorphiaCodec.java Activates/deactivates a session around decode.
core/src/main/java/dev/morphia/mapping/codec/pojo/EntityDecoder.java Pre-reads _id, registers entity instance, and post-registers via mapper id.
core/src/main/java/dev/morphia/mapping/codec/references/ReferenceCodec.java Looks up single and collection references in the session before fetching from DB.
core/src/test/java/dev/morphia/test/mapping/TestReferences.java Adds tests for in-document dedup and cyclic graph decoding.

Comment on lines +49 to 60
T instance = (T) instanceCreator.getInstance();

DecodeSession session = DecodeSession.current();
Object prereadId = null;
if (session != null) {
prereadId = peekId(reader);
if (prereadId != null) {
session.register(classModel.collectionName(), prereadId, instance);
}
}

decodeProperties(reader, decoderContext, instanceCreator, classModel);
}
}
return null;
} catch (Exception e) {
Comment on lines +63 to +64
PropertyModel idProp = classModel.getIdProperty();
if (idProp != null) {
Comment on lines +81 to +88
boolean root = DecodeSession.activate();
try {
return getDecoder().decode(reader, decoderContext);
} finally {
if (root) {
DecodeSession.deactivate();
}
}
Comment on lines +51 to +70
DecodeSession session = DecodeSession.current();
Object prereadId = null;
if (session != null) {
prereadId = peekId(reader);
if (prereadId != null) {
session.register(classModel.collectionName(), prereadId, instance);
}
}

decodeProperties(reader, decoderContext, instanceCreator, classModel);

if (session != null && prereadId == null) {
PropertyModel idProp = classModel.getIdProperty();
if (idProp != null) {
Object id = morphiaCodec.getDatastore().getMapper().getId(instance);
if (id != null) {
session.register(classModel.collectionName(), id, instance);
}
}
}
@evanchooly

Copy link
Copy Markdown
Member Author

Code review

Found 2 issues:

  1. ConstructorCreator-backed entities are pre-registered with a null/zero-valued instance. instanceCreator.getInstance() is called before decodeProperties(), which for all-args-constructor entities (Java records, Kotlin data classes) constructs the instance immediately with all parameters at their zero defaults. Because ConstructorCreator caches the first-constructed instance, subsequent set() calls during decodeProperties() fall into the instance != null branch and attempt to set fields via reflection — which silently fails for final fields. The session cache then holds a permanently broken shell that is returned to any caller who looks it up.

LOG.debug(format("Decoding document using codec for %s'", morphiaCodec.getEntityModel().getType().getName()));
MorphiaInstanceCreator instanceCreator = getInstanceCreator();
T instance = (T) instanceCreator.getInstance();
DecodeSession session = DecodeSession.current();
Object prereadId = null;
if (session != null) {
prereadId = peekId(reader);
if (prereadId != null) {
session.register(classModel.collectionName(), prereadId, instance);
}
}
decodeProperties(reader, decoderContext, instanceCreator, classModel);
if (session != null && prereadId == null) {

  1. LifecycleDecoder.decode() overrides EntityDecoder.decode() without calling super.decode(), so all session registration logic (peekId, session.register) is entirely skipped for entities annotated with @PreLoad/@PostLoad. Cyclic or duplicate @Reference graphs involving lifecycle-annotated entities will still cause stack overflows or duplicate DB fetches, silently defeating the feature for that class of entities.

@SuppressWarnings("unchecked")
public T decode(BsonReader reader, DecoderContext decoderContext) {
Document document = getMorphiaCodec().getRegistry().get(Document.class).decode(reader, decoderContext);
EntityModel model = getMorphiaCodec().getEntityModel();
if (model.useDiscriminator()) {
String discriminator = document.getString(model.discriminatorKey());
if (discriminator != null) {
Class<?> discriminatorClass = getMorphiaCodec().getDiscriminatorLookup().lookup(discriminator);
// need to load the codec to initialize cachedCodecs in field models
Codec<?> codec = getMorphiaCodec().getRegistry().get(discriminatorClass);
if (codec instanceof MorphiaCodec) {
model = ((MorphiaCodec<?>) codec).getEntityModel();
} else {
throw new CodecConfigurationException(format("Non-entity class used as discriminator: '%s'.", discriminator));
}
}
}
final MorphiaInstanceCreator instanceCreator = model.getInstanceCreator(getMorphiaCodec().getConversions());
T entity = (T) instanceCreator.getInstance();
model.callLifecycleMethods(PreLoad.class, entity, document, getMorphiaCodec().getDatastore());
decodeProperties(new DocumentReader(document, getMorphiaCodec().getConversions()), decoderContext, instanceCreator,
model);
model.callLifecycleMethods(PostLoad.class, entity, document, getMorphiaCodec().getDatastore());
return entity;
}

Introduces a per-cursor Identity Map (DecodeSession) that eliminates
redundant DB fetches and prevents infinite recursion on cyclic @reference
graphs.

Key changes:
- DecodeSession: ThreadLocal cache scoped to cursor lifetime (not per-next()
  call), shared across all documents iterated by one query. Removed unused
  contains() method.
- MorphiaCursor: session activated in constructor, deactivated in close();
  toList() uses try(this) so close() is always called. next()/tryNext()
  simplified to plain delegation.
- MorphiaCodec: activates session as fallback for non-cursor decode paths.
- EntityDecoder: pre-reads _id via peekId() before decodeProperties() so
  the instance is registered before any child references are resolved,
  enabling cycle detection. Logs at DEBUG on peekId failure instead of
  swallowing silently.
- ReferenceCodec: scalar/List/Set/Map reference paths all check the session
  before hitting the DB. List/Set use a partial-hit strategy: cached entries
  are reused and only misses are fetched. fetchSingle now uses
  try-with-resources to close its inner cursor, preventing session leaks
  that would otherwise corrupt lazy-reference behaviour in later decodes.
  buildIdMap extracted from fetchCollection for reuse.
@evanchooly evanchooly force-pushed the claude/resume-reference-caching-p1lyU branch from a64cc50 to 6a8b3be Compare June 24, 2026 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants