
Draft: feat(format): schema evolution for the Java row codec#3714

Draft
stevenschlansker wants to merge 36 commits into apache:main from stevenschlansker:row-codec-schema-versions

Conversation

@stevenschlansker stevenschlansker commented May 29, 2026

Contributor

Opt in with .withSchemaEvolution() on any row, array, or map codec builder. Fields carry @ForyVersion(since, until); removed fields are listed on a nested interface referenced from
@ForySchema(removedFields = ...). Older payloads are dispatched at read time; nothing changes when the flag is off. Standard and compact formats supported.

Why?

Currently, changing a row-format schema definition in any way invalidates all existing records.

What does this PR do?

Proposes a new concept of format versions: each succeeding version may add or remove fields from types, and the deserialization machinery picks the version based on the schema hash.

  Schema evolution for the Java row codec

  Lets a consumer built from the current bean decode rows written by older versions of that bean.

  Adds opt-in schema evolution to the Java row codec, enabled with
  `withSchemaEvolution()` on the bean/array/map codec builders. A reader can decode
  payloads written against older versions of a bean by dispatching on a per-payload
  strict schema hash to a projection codec that reads the historical layout and
  maps it onto the current type, applying defaults for fields that did not yet
  exist and discarding fields that have been removed.
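
  The projection step described above can be pictured with a small, self-contained sketch. It is not the generated codec: rows are modelled as plain maps, and the names `ProjectionSketch`, `project`, and `defaultFor` are hypothetical. It only shows the mapping rule: a field missing from the historical layout gets its type's default, and a historical field absent from the current type is dropped.

```java
import java.lang.reflect.Array;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; the real projection codecs are generated classes.
final class ProjectionSketch {
  // Default for a slot the writer never wrote: 0 / 0.0 / false for primitives,
  // null for everything else. A one-element array yields the primitive default.
  static Object defaultFor(Class<?> type) {
    return type.isPrimitive() ? Array.get(Array.newInstance(type, 1), 0) : null;
  }

  // Map a decoded historical row onto the current field set.
  static Map<String, Object> project(
      Map<String, Object> historicalRow, Map<String, Class<?>> currentFields) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Class<?>> field : currentFields.entrySet()) {
      String name = field.getKey();
      Object value =
          historicalRow.containsKey(name)
              ? historicalRow.get(name)
              : defaultFor(field.getValue());
      out.put(name, value);
    }
    // Historical fields not named in currentFields are simply never copied.
    return out;
  }
}
```

  In this sketch a v1 row holding `name` and a since-removed `nicknames` field, projected onto a current type with `name`, `age` (int), and `email`, yields the original name, an `age` of 0, and a null `email`.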

  Field history is declared with `@ForyVersion(since/until)` on live fields and on
  a `@ForySchema(removedFields = ...)` history interface for removed fields.
  `SchemaHistory` enumerates the version history, including the cross-product over
  nested versioned beans; each historical layout gets a projection codec whose row
  layout is precomputed once so projection decode costs the same per call as
  current-schema decode. The wire format uses an 8-byte strict-hash slot
  (reusing the row's existing hash slot, and new for evolution-enabled array/map elements),
  and producer/consumer must agree on the flag.

  Covers the standard and compact row formats, records (including
  `@ForyVersion` on record components) and interface beans, and nested versioned
  beans to arbitrary depth.


  API
  - `@ForyVersion` — since/until version window on a field, record component, or accessor method, defining when that field is present in the schema history.
  - `@ForySchema` — declares a bean's evolution intent and points at a nested interface describing removed fields.
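
  To make the since/until window concrete, here is a minimal sketch that uses stand-in annotations rather than the real ones. `VersionWindow` and `SchemaSketch` are hypothetical substitutes for `@ForyVersion` and `@ForySchema`, whose actual members may differ; `PersonSketch` and `fieldsAt` are likewise invented for illustration. The window is read as half-open (`since <= version < until`), which matches the commit notes stating that a latest version at or above `until` excludes the field.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedElement;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for @ForyVersion (assumption: the real annotation may differ).
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface VersionWindow {
  int since() default 1;
  int until() default Integer.MAX_VALUE;
}

// Stand-in for @ForySchema(removedFields = ...).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SchemaSketch {
  Class<?> removedFields() default void.class;
}

@SchemaSketch(removedFields = PersonSketch.History.class)
class PersonSketch {
  String name; // present from the first version
  @VersionWindow(since = 2) String email; // added in version 2

  // Removed fields live on a nested interface, so parameterized types survive.
  interface History {
    @VersionWindow(since = 1, until = 3)
    List<String> nicknames(); // written by versions 1 and 2, gone in 3
  }
}

final class VersionWindowSketch {
  static boolean present(AnnotatedElement member, int version) {
    VersionWindow w = member.getAnnotation(VersionWindow.class);
    int since = w == null ? 1 : w.since();
    int until = w == null ? Integer.MAX_VALUE : w.until();
    return since <= version && version < until;
  }

  // Sorted field names that make up the schema at the given version.
  static List<String> fieldsAt(Class<?> bean, int version) {
    List<String> names = new ArrayList<>();
    for (Field f : bean.getDeclaredFields()) {
      if (!f.isSynthetic() && present(f, version)) names.add(f.getName());
    }
    SchemaSketch schema = bean.getAnnotation(SchemaSketch.class);
    if (schema != null && schema.removedFields() != void.class) {
      for (Method m : schema.removedFields().getDeclaredMethods()) {
        if (present(m, version)) names.add(m.getName());
      }
    }
    Collections.sort(names);
    return names;
  }
}
```

  Under these assumptions, `fieldsAt(PersonSketch.class, v)` gives `[name, nicknames]` for version 1, `[email, name, nicknames]` for version 2, and `[email, name]` for version 3.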

  How it works

  Row payloads already carry an 8-byte schema hash. Evolution-enabled encoders keep
  a map of historical hashes; on decode, a hash mismatch is resolved against that
  map instead of failing.

  - `SchemaHistory` derives the ordered historical schemas from the version
    annotations and a strict hash per version (hash includes field name and
    nullability).
  - `RowCodecBuilder` generates one projection codec per historical version
    (`_V<n>` classes), each reading its historical schema and producing
    current-bean instances.
  - `BinaryRowEncoder.decode` dispatches on the peer hash: exact match → current
    codec; known historical hash → projection codec; else
    `ClassNotCompatibleException`.
  - Nested versioned beans dispatch recursively by strict hash, routed through
    array and map projection codecs so versioning composes through `List<Bean>` and
    `Map<_, Bean>`.
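
  The three-way dispatch in the list above can be sketched as follows. This is an illustration only: `HashDispatchSketch` and its methods are hypothetical, codecs are modelled as plain functions, and `IllegalStateException` stands in for `ClassNotCompatibleException` so the snippet compiles without Fory on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of decode-time dispatch on the peer's strict schema hash.
final class HashDispatchSketch<T> {
  private final long currentHash;
  private final Function<byte[], T> currentCodec;
  private final Map<Long, Function<byte[], T>> projections = new HashMap<>();

  HashDispatchSketch(long currentHash, Function<byte[], T> currentCodec) {
    this.currentHash = currentHash;
    this.currentCodec = currentCodec;
  }

  // Register the projection codec for one historical layout.
  HashDispatchSketch<T> withHistorical(long hash, Function<byte[], T> projection) {
    projections.put(hash, projection);
    return this;
  }

  T decode(long peerHash, byte[] body) {
    if (peerHash == currentHash) {
      return currentCodec.apply(body); // exact match: current codec
    }
    Function<byte[], T> projection = projections.get(peerHash);
    if (projection != null) {
      return projection.apply(body); // known historical hash: projection codec
    }
    // Stand-in for ClassNotCompatibleException.
    throw new IllegalStateException("unknown schema hash " + Long.toHexString(peerHash));
  }
}
```

  The sketch keys its map by boxed `Long` for brevity; a later commit in this PR switches the real lookup to a primitive-keyed `LongMap` to avoid boxing on the historical path.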

  Performance

  Both decode paths reuse the `CompactRowLayout` cache from #3717. The
  current-schema path allocates rows via `writer.newRow()`; the historical path via
  a per-projection `RowFactory` that holds its historical schema's layout, built
  once. Per-decode cost is the same on both — no layout recomputation. The
  evolution-enabled array/map codecs keep a single allocation per encode.

AI Contribution Checklist

AI Usage Disclosure

  • substantial_ai_assistance: yes
  • scope: all
  • affected_files_or_subsystems: row format Java
  • ai_review: <line-by-line self-review completed; summarize the two-reviewer loop and final no-further-comments result>
  • ai_review_artifacts:
  • human_verification: <checks run locally or in CI + pass/fail summary + contributor reviewed results>
  • performance_verification: ✔️
  • provenance_license_confirmation: ✔️
  • If yes, I included a completed AI Contribution Checklist in this PR description and the required AI Usage Disclosure.
  • If yes, my PR description includes the required ai_review summary and screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes.

Does this PR introduce any user-facing change?

New codec option: schema evolution, exposed through two small annotations (@ForyVersion, @ForySchema) and a builder method.
Existing row-format compatibility is unchanged.

Benchmark

withSchemaEvolution() is an opt-in feature that adds a new row-codec path; it does not modify any existing serialization hot path.
There is no apache/main baseline to compare against — SchemaEvolutionSuite exercises withSchemaEvolution(), which does not exist
on main — so the benchmark measures two things directly: the steady-state cost of enabling the flag, and projection-vs-current
parity.

Bounded JMH run (JDK 26, 2 forks × 4 iterations, 1s each, -prof gc). B/op is gc.alloc.rate.norm (bytes allocated per operation).

| Benchmark | Throughput (ops/s) | B/op |
| --- | --- | --- |
| currentDecode | 17.6M | 312 |
| currentDecodeNoEvolution | 16.6M | 312 |
| encode | 15.9M | 152 |
| encodeNoEvolution | 15.8M | 152 |
| compactCurrentDecode | 16.4M | 280 |
| compactCurrentDecodeNoEvolution | 16.3M | 280 |
| compactEncode | 16.4M | 144 |
| compactEncodeNoEvolution | 15.5M | 144 |
| olderDecode | 24.5M | 216 |
| compactOlderDecode | 24.9M | 192 |

Findings:

  • Enabling evolution adds zero allocation on the current path. B/op is byte-identical between the evolution-on and *NoEvolution
    variants on every path (decode 312/312, encode 152/152, compact decode 280/280, compact encode 144/144).
  • Throughput overhead of the flag is within the run's noise band. Every on-vs-off pair overlaps within error. This bounded run has
    ~10% confidence intervals on the no-evolution variants, so the throughput claim is "no measurable difference," not a tight bound;
    allocation is exact.
  • Projection (older-version) decode is not penalized versus current decode. It allocates less here (216 vs 312 B/op standard; 192
    vs 280 compact) because it reads the narrower V1 schema, not because projection is inherently cheaper. Each projection codec holds its
    historical schema's precomputed row layout, so there is no per-decode rebuild.

Limitations

  • Producer and consumer must agree on the withSchemaEvolution() flag; the two
    framings are not wire-compatible. A flag-mismatched peer fails loudly with
    ClassNotCompatibleException (except evolution-off reading evolution-on bytes,
    which is undefined). Adopt by enabling the flag on both sides in a release that
    changes no schema, then evolve schemas once every peer is on the new build.
  • Evolution-enabled payloads are Java-only; cross-language consumers (Python,
    C++) cannot read them.
  • A versioned bean used as a map key is read with the current schema only, not
    dispatched to a projection codec (map keys carry no per-payload hash).
  • The number of generated projection codec classes grows as the product of the
    version counts of the distinct nested versioned bean classes. Retire entries
    from a bean's History interface once you no longer need to read payloads from
    that range to bound the growth.
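
As a back-of-the-envelope aid for the last limitation, the sketch below multiplies per-class version counts. `ProjectionCountSketch` and `combinations` are hypothetical names, and which classes enter the product (for example whether the outer bean's own versions are counted) is left to the caller; the snippet only shows how quickly the product grows.

```java
import java.util.Map;

// Hypothetical estimator: product of version counts over distinct bean classes.
final class ProjectionCountSketch {
  static long combinations(Map<String, Integer> versionsPerClass) {
    long product = 1;
    for (int versions : versionsPerClass.values()) {
      // multiplyExact fails loudly instead of overflowing silently.
      product = Math.multiplyExact(product, (long) versions);
    }
    return product;
  }
}
```

For instance, three classes with 3, 4, and 2 versions give 24 combinations; retiring one history entry from the 4-version class brings that down to 18.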

Comment thread docs/guide/java/row-format.md
Comment thread docs/guide/java/row-format.md Outdated
Comment thread docs/guide/java/row-format.md Outdated
Comment thread docs/guide/java/row-format.md
@stevenschlansker stevenschlansker force-pushed the row-codec-schema-versions branch from eabbce4 to 30c5ba3 Compare June 26, 2026 15:17
Claude (on behalf of Steven Schlansker) added 9 commits June 26, 2026 15:27
Opt in with `.withSchemaEvolution()` on any row, array, or map codec
builder. Fields carry `@ForyVersion(since, until)`; removed fields are
listed on a nested interface referenced from
`@ForySchema(removedFields = ...)`, which preserves parameterized types
like `List<String>`. Older payloads are dispatched at read time; nothing
changes when the flag is off. Standard and compact formats supported;
interface-typed beans included.
…p codecs

BinaryArrayEncoder.encode(T) and BinaryMapEncoder.encode(T) previously composed
the hash-prefixed payload through MemoryUtils.buffer + writeInt64 + writeBytes +
getBytes, allocating three byte[] copies and a MemoryBuffer per call. Build the
result directly into a single byte[]: wrap it to write the 8-byte hash header,
then System.arraycopy the body in. The non-evolution paths are unchanged.
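
A minimal sketch of the single-array framing described in this commit, under stated assumptions: `HashPrefixFraming`, `frame`, and `readHash` are hypothetical names, the header is written little-endian (the actual byte order is not stated here), and the hash is written with shifts instead of a buffer wrapper so that the result array is the only allocation.

```java
// Hypothetical helper; the real encoders write through Fory's own buffer types.
final class HashPrefixFraming {
  static final int HASH_BYTES = 8;

  // Build [8-byte schema hash][body] in one byte[] allocation.
  static byte[] frame(long schemaHash, byte[] body) {
    byte[] result = new byte[HASH_BYTES + body.length];
    for (int i = 0; i < HASH_BYTES; i++) {
      result[i] = (byte) (schemaHash >>> (8 * i)); // little-endian (assumption)
    }
    System.arraycopy(body, 0, result, HASH_BYTES, body.length);
    return result;
  }

  // Read the hash back from a framed payload of at least 8 bytes.
  static long readHash(byte[] payload) {
    long hash = 0;
    for (int i = 0; i < HASH_BYTES; i++) {
      hash |= (payload[i] & 0xFFL) << (8 * i);
    }
    return hash;
  }
}
```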
Adds RowFormatAllocationProbe, a thread-allocation harness that measures
per-encode allocations for the evolution-enabled array/map row codecs so
the one-allocation-per-encode property can be checked directly.

(The compact-row layout caching this commit originally introduced is now
provided by upstream's CompactRowLayout; only the probe remains.)
The strict schema hash already recurses through StructType, so two payloads
whose inner-struct shapes differ produce different outer hashes. The
implementation gap was in SchemaHistory.build, which only enumerated the
outer bean's own version boundaries — projection codecs for "outer V=K with
inner V=L" weren't generated, so older inner shapes failed to deserialize
even though the hash distinguished them.

Implementation:

- SchemaHistory.build now recurses into nested-bean fields whose type
  carries schema-evolution annotations, builds each inner's history, and
  cross-products over inner versions when enumerating outer versions. Each
  VersionedSchema now carries a map of (nested bean class -> chosen inner
  version) so the codec builder can wire the right inner projection codec.

- RowCodecBuilder.evolvingBuildForWriter emits one projection codec class
  per cross-product combination, using a per-nested-bean-type suffix map
  passed down through Encoding/RowEncoderBuilder. BaseBinaryEncoderBuilder
  exposes a `nestedBeanSuffix(TypeRef)` hook that the projection builder
  overrides to look up each nested bean's right suffix.

- Inner projection classes are generated recursively from
  nestedSuffixesFor(), so a deeply-nested versioned bean produces the
  required class tree at outer-build time.

Class-count complexity is O(product of versions across nesting), but each
projection class is small (decode-only) and only those reachable from the
outer's enumeration are generated.

Regression test nestedInnerEvolution_readerInnerNewerThanWriter and the
two-axis crossOuterAndInnerEvolution both pass. 138 tests in fory-format
green.
…decs

Array and map evolution paths were generating per-outer-version projection
classes named with only the outer version suffix and instantiated without an
inner-version routing map. When the element bean contained a versioned
nested bean, multiple cross-product entries collided on the codegen cache:
the projection always read inner beans at whichever version was compiled
first. The row codec already did this correctly; lift its suffix and nested-
suffix logic into a shared ProjectionRouting helper and reuse it from
ArrayCodecBuilder and MapCodecBuilder. Add array/map regression tests that
fail before the fix and pass after.
… codecs

The existing row test (evolutionFlagAsymmetryFailsLoud) had no array or map
equivalent. Add both. The evolution-on consumer reading evolution-off bytes
direction is loud (ClassNotCompatibleException); the reverse direction is
undefined per the wire format but must not silently return a structurally
plausible value. Rename isVersionedBeanElement/Value to isBeanElement/Value
with a doc comment, since the predicate is just isBean — calling it
"versioned" suggested the unversioned-bean case was excluded.
…ures collapse

bySignature.putIfAbsent could store a non-all-current cross-product combination
under the signature that build() later marks as the writer-side current. The
stored VS's nestedBeanVersions would then misreport at least one inner bean
as living at a non-current version, violating the documented contract on
current().nestedBeanVersions(). Reachable only if two combinations canonicalize
to the same outer signature, which today's inner-bySignature collapse prevents,
but the contract should not depend on that. Add a contract test that asserts
the invariant for a deeply nested versioned bean.
@ForyVersion declares RECORD_COMPONENT as a valid target but no test exercised
the record path. Add three cases in fory-latest-jdk-tests: a record with a
String field added at v2, a record with the @ForySchema-removed-field History
interface, and a record with a primitive int field added at v2 (verifying the
0 default).
Tighten the row-format schema-evolution doc to reflect the actual flag-mismatch
behavior (loud in one direction, undefined in the reverse for array/map) and
add a note that the projection codec class count grows as the product of
per-bean version counts in a composition, with retiring history entries as
the way to bound it.
@stevenschlansker stevenschlansker force-pushed the row-codec-schema-versions branch from 30c5ba3 to bc25986 Compare June 26, 2026 15:41
Claude (on behalf of Steven Schlansker) added 4 commits June 26, 2026 16:50
Three small edits in the row-format schema-evolution section: name all
primitive defaults (0, 0.0, false), fold the "parameterized types are
expressed naturally" assertion into the lead-in to the removed-field example,
and drop the trailing sentence that restated what the example already showed.
- Guard array/map evolution decode against payloads smaller than the 8-byte
  schema-hash prefix, failing with ClassNotCompatibleException instead of
  feeding a negative size into pointTo.
- Remove the dead 5-arg loadOrGenProjectionRowCodecClass overload; all callers
  pass the nested-suffix map.
- Replace fully-qualified java.util.* and Schema references with imports.
- Add tests covering the new too-small-payload guards.
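
The too-small-payload guard from the first bullet amounts to a bounds check before the body size is computed. A sketch, with hypothetical names (`PrefixGuardSketch`, `bodySize`) and `IllegalArgumentException` standing in for `ClassNotCompatibleException`:

```java
// Hypothetical guard; the real decode paths throw ClassNotCompatibleException.
final class PrefixGuardSketch {
  static final int HASH_BYTES = 8;

  // Returns the body length, rejecting payloads that cannot hold the hash prefix.
  static int bodySize(byte[] payload) {
    if (payload.length < HASH_BYTES) {
      throw new IllegalArgumentException(
          "payload of " + payload.length + " bytes is smaller than the "
              + HASH_BYTES + "-byte schema-hash prefix");
    }
    return payload.length - HASH_BYTES; // never negative past the guard
  }
}
```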
Adds SchemaEvolutionSuite under benchmarks/java: encode plus current-version
and older-version (projection) decode benchmarks for evolution-enabled row
codecs. Run with the JMH gc profiler (-prof gc) for repeatable per-op
allocation numbers, including evidence that the projection decode path
allocates no more than the current-schema path (each projection holds its
historical schema's cached row layout).

Replaces the earlier hand-rolled allocation probe main(), which measured only
the non-evolution path and was never run by CI.
Formatting-only: google-java-format line wrapping across the schema-evolution
files. No logic changes.
@stevenschlansker stevenschlansker force-pushed the row-codec-schema-versions branch from b1a051a to ecb433c Compare June 26, 2026 17:13
Claude (on behalf of Steven Schlansker) added 5 commits June 26, 2026 17:54
…te per class

Carry the chosen inner VersionedSchema (with its strict hash) through the
cross-product instead of a bare version number, so nested projection routing
identifies the correct inner subtree to arbitrary depth. Enumerate one
cross-product dimension per nested bean class rather than per field: a writer
writes one definition of a class, so all fields of that class share a version
on the wire. This makes deep nesting and same-class-in-two-fields correct, and
makes the projection-class count a product over distinct nested classes rather
than over fields.
… path

Add the size<8 lower-bound guard to BinaryRowEncoder.decode so a truncated row
payload fails with ClassNotCompatibleException like the array and map paths
already do, instead of computing a negative body size. Swap the runtime
projection lookup maps (row/array/map) from Map<Long,_> to the primitive-keyed
LongMap to drop per-decode Long boxing on the historical-version path; the
current-schema hot path is unaffected. Narrow the catch in
SchemaHistory.isBeanWithVersioning from Exception to RuntimeException with an
accurate comment, and remove a dead null-check in RowEncoderBuilder. Add tests
for the removed-field @ForyVersion validation messages.
ArrayEncoderBuilder and MapEncoderBuilder divided the elapsed nanos by 1_000_000
(milliseconds) but logged the value with a "us" unit, so the logged figure was
1000x smaller than the true microsecond count. Divide by 1000 so the logged value
is microseconds, matching the unit label and RowEncoderBuilder.
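
One way to keep the unit and the number from drifting apart is to let `java.util.concurrent.TimeUnit` do the conversion. The helper below is a hypothetical sketch, not the code in the builders:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: convert a System.nanoTime() interval to whole microseconds.
final class ElapsedMicrosSketch {
  static long elapsedMicros(long startNanos, long endNanos) {
    // Equivalent to (endNanos - startNanos) / 1000, with the unit named in code.
    return TimeUnit.NANOSECONDS.toMicros(endNanos - startNanos);
  }
}
```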
… path

Add evolution-off PersonV2 codecs (standard + compact) and four *NoEvolution
benchmarks so the suite measures the steady-state cost of withSchemaEvolution()
when reading and writing current-version data, not only projection parity.

Bounded JMH run (JDK 26, 2 forks x 4 iters, -prof gc), B/op = gc.alloc.rate.norm:

  currentDecode            17.6M ops/s   312 B/op
  currentDecodeNoEvolution 16.6M ops/s   312 B/op
  encode                   15.9M ops/s   152 B/op
  encodeNoEvolution        15.8M ops/s   152 B/op
  compactCurrentDecode     16.4M ops/s   280 B/op
  compactCurrentDecodeNoEvolution 16.3M 280 B/op
  compactEncode            16.4M ops/s   144 B/op
  compactEncodeNoEvolution 15.5M ops/s   144 B/op
  olderDecode              24.5M ops/s   216 B/op
  compactOlderDecode       24.9M ops/s   192 B/op

Enabling evolution adds zero allocation on the current path (B/op identical
on/off across all four paths); throughput differences are within the bounded
run's noise band. Projection (older) decode is not penalized versus current
decode; it allocates less here because it reads the narrower V1 schema.
SchemaHistory.isBeanWithVersioning probed every nested field's raw type with
Descriptor.getDescriptors to find @ForyVersion descriptors. TypeInference.inferField,
the real encode/decode path, routes collection/map/array/enum field types away from
getDescriptors (they are classified before the isBean branch), so a collection subclass
that shadows a field name across its hierarchy round-trips fine even though getDescriptors
rejects it for duplicate fields. The unguarded probe threw IllegalArgumentException and
broke SchemaHistory.build for such a bean.

Gate getDescriptors behind TypeUtils.isBean, matching inferField's classification, so only
genuine bean field types are introspected. A class that truly cannot be a bean still surfaces
its error through isBean, which fails identically on the real path.

Add a MemoryBuffer streaming round-trip test through a projection hit, covering the
sizeEmbedded int32-prefix framing the byte[] tests skip, and a reproducer
(versionedBeanWithShadowedCollectionFieldBuilds) for the shadowed-collection regression.
@stevenschlansker stevenschlansker force-pushed the row-codec-schema-versions branch from ca39fe8 to aba4338 Compare June 26, 2026 19:20
Claude (on behalf of Steven Schlansker) added 18 commits June 26, 2026 19:30
Move inferNamedField out from between the inferField overloads in
TypeInference so OverloadMethodsDeclarationOrder no longer fails the
Code Style Check job.
SchemaHistory.build discovered nested versioned beans only at a field's raw
type, so a versioned bean appearing as a List element or Map value was never
found: the outer's cross-product carried no dimension for the inner bean, and
its history was never enumerated. A reader whose inner bean had evolved then
had no projection matching an older payload's inner layout, and decode threw
ClassNotCompatibleException.

findVersionedBean now looks through list/array element and map key/value type
refs to locate the versioned bean, mirroring TypeInference's element handling
(component type for arrays, getElementType for iterables) and keeping the
collection-first classification that lets a shadowed-field collection subclass
short-circuit before any Descriptor.getDescriptors probe. The cross-product is
keyed by the discovered bean class, preserving the one-dimension-per-class
invariant. substituteNestedStruct rebuilds the list/map field with the chosen
historical struct in the bean's slot, leaving the wrapper and its nullability
exactly as inferNamedField produced them, so existing direct-field schemas and
hashes are unchanged.

Add evolvingBeanInCollectionField covering an inner bean evolved across a List
and a Map value read by a newer codec.
…11 safety

ElementType.RECORD_COMPONENT is a JDK 16 enum constant. fory-format compiles
with source/target=11 (no --release), so a modern build JDK accepts it but the
class fails at runtime on JDK 11 when @ForyVersion's @Target is materialized.
Record components stay covered by FIELD+METHOD: the compiler propagates a
record-component annotation to the backing field and accessor, where
SchemaHistory.lookupForyVersion already reads it.

Add a nested-versioned-record evolution test (RecordRowTest) covering the
cross-product enumeration path with record-component naming, and fix the stale
comment that claimed @ForyVersion targets RECORD_COMPONENT.

Hoist the duplicated schema-history build (the compact-format sort transform plus
SchemaHistory.build) from the row/map/array codec builders into
BaseCodecBuilder.buildSchemaHistory, and extract an evolvingCodec(Class) helper
in the schema-evolution tests to remove repeated builder boilerplate. No wire or
behavior change.
A live field still exists as a Java member, so a finite until silently dropped
it from the current schema (until extends the version set, so latestVersion >=
until excludes the field) and the writer stopped serializing a field the bean
still has, with no error. collectLiveFields now rejects a finite until on a live
field and points the user at the @ForySchema.removedFields history class, which
is the only place a removal should be declared. Mirrors the existing
until==MAX_VALUE guard in collectRemovedFields.
findVersionedBean inspected map keys, so a row field typed Map<KeyBean, V> added
a key-version dimension to the cross-product and generated one projection codec
class per key version. Map keys carry no per-payload hash and are always read
with the current schema (see row-format.md), so those key-version projections
are never dispatched: dead classes plus inflated cross-product growth.

Restrict findVersionedBean and substituteNestedStruct to the map value, matching
the wire format's only routable nested-map position. Add a row-field
Map<versionedKey, V> evolution test that exercises this path.
…g builder

The evolution build path rotated this.schema to the history-derived current
version, and build() relied on reading it back after buildForWriter(). A reused
builder, or a direct buildForWriter() caller such as Encoders.bean, would then
observe the rotated schema. Bundle the resolved schema with the per-writer
factory (RowEncoderFactory) so build() creates its writer from the factory's
schema and the build no longer mutates builder state. No behavior change.
ProjectionCodecFactory.instantiate rebuilt a RowFactory per encoder instance,
though it depends only on the historical schema and codec format, both fixed at
build time. Under the documented one-encoder-per-thread usage this recomputed K
row factories per thread. Build it once in the factory constructor; instantiate()
now only rebuilds the generated codec, which binds the per-instance writer. The
Map and Array projection factories allocate per-instance BinaryArrayWriters that
genuinely bind per-encoder buffers, so they have no analogous hoistable work.
…ion bug

A top-level Map<structKeyBean, versionedValue> codec with schema evolution
corrupts the key when the value is read at a non-current version: the value's
version suffix is applied to the key bean too, and a same-class key/value share
one bean codec keyed by type rather than position, so the key decodes with the
value's historical layout. The fix is position-scoped bean-codec registration in
the map codegen and must activate during the lazy genCode of the value subtree;
it spans shared codegen, so it is tracked separately. The reproducer is disabled
to keep the suite green while documenting the failure precisely.
…ojection

A schema-evolution map codec whose value reads at a historical version
corrupted a struct key. The projection codec applied the value's version
suffix to every nested bean via the type-blind nestedBeanSuffix, and the
bean-codec registration maps were keyed by typeRef. When the key and value
share a class (the reader side is effectively Map<Bean,Bean> with the value
historical and the key current), both collapsed to one registration entry,
so the key reused the value's historical row codec and decoded a current
key row with the wrong field count.

Map keys carry no per-payload version hash and are always read at the
current schema, so route the key position to the current, unsuffixed codec
under a distinct registration key. BaseBinaryEncoderBuilder gains a
beanCodecKey(TypeRef) indirection (default identity, so row/array codecs are
unchanged) and keys its bean maps by it. MapEncoderBuilder overrides
nestedBeanSuffix and beanCodecKey for the key position, gated by an
inKeyPosition flag. The flag is scoped around both expression construction
and genCode of the key subtree, because the encode ForEach registers nested
beans eagerly in its constructor while the decode lazy array registers them
during genCode.

Enables the previously-disabled SchemaEvolutionStressTest#mapStructKeyValueEvolution.
…ersioned bean

A top-level array or map codec only took the schema-evolution path when its
element/value type was directly a bean, so Collection<List<Bean>> and
Map<K, List<Bean>> (or Map<K, Map<.., Bean>>) silently skipped projection: the
writer emitted no strict-hash prefix and the reader decoded older payloads at
the current layout, corrupting reads.

Route both top-level builders through the versioned bean reachable through the
element/value wrapper. SchemaHistory.evolutionBean descends list/map/array
wrappers and returns the bean at the leaf (versioned or not, so an unversioned
bean still emits the prefix and stays wire-compatible); projectThroughWrapper
rebuilds the historical element/value field with the wrapper preserved around
the projected struct, the same substitution the row-field path already uses for
a versioned bean nested in a collection field. The generated projection codec
already reads the wrapper from the container type, so no codegen change is
needed.

Covers the array-codec variant of the same bug as well as the reported map case.
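
The wrapper descent described here can be approximated with plain `java.lang.reflect` types. This is a simplified sketch, not `SchemaHistory.evolutionBean`: the names `LeafTypeSketch`, `leaf`, and `leafOfField` are hypothetical, Fory's own `TypeRef` is not used, and wildcards, type variables, and collection subclasses with fixed type arguments are not handled. Following the map-key restriction from the earlier commit, it descends map values only.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "find the bean at the leaf of list/map/array wrappers".
final class LeafTypeSketch {
  static final class Bean {}

  static final class Holder {
    Map<String, List<Bean>> byName;
    Bean[] array;
    Map<Bean, String> keyedByBean;
  }

  static Class<?> leaf(Type type) {
    if (type instanceof Class) {
      Class<?> c = (Class<?>) type;
      return c.isArray() ? leaf(c.getComponentType()) : c;
    }
    if (type instanceof GenericArrayType) {
      return leaf(((GenericArrayType) type).getGenericComponentType());
    }
    if (type instanceof ParameterizedType) {
      ParameterizedType p = (ParameterizedType) type;
      Class<?> raw = (Class<?>) p.getRawType();
      Type[] args = p.getActualTypeArguments();
      if (Map.class.isAssignableFrom(raw) && args.length == 2) {
        return leaf(args[1]); // map value only; keys are never routed
      }
      if (Iterable.class.isAssignableFrom(raw) && args.length == 1) {
        return leaf(args[0]); // list/collection element
      }
      return raw;
    }
    throw new IllegalArgumentException("unsupported type: " + type);
  }

  // Convenience for looking at a declared field's generic type.
  static Class<?> leafOfField(Class<?> owner, String fieldName) {
    try {
      return leaf(owner.getDeclaredField(fieldName).getGenericType());
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```

In this sketch both `Map<String, List<Bean>>` and `Bean[]` resolve to `Bean`, while `Map<Bean, String>` resolves to `String` because the key position is skipped.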
collectLiveFields already rejects any finite until on a live field, so the
subsequent since >= until check could only fire for since == Integer.MAX_VALUE
and was dead for any real annotation. The reachable ordering check remains on
the removed-field path in collectRemovedFields, where a finite until is valid.
Document that schema-evolution decode selects a layout from the 8-byte strict
hash, and that a payload whose hash coincides with one of the reader's historical
layouts is decoded against it. This is the same hash-based dispatch the row
format has always used; the note makes the accepted trade-off explicit.
…row evolution

Every existing added-field evolution test defaults a scalar; defaulting an
added struct or collection slot is a distinct projection path that was
untested. Add a v1->v2 case where v2 introduces a nested struct and a list of
structs absent from the v1 wire, and assert both read back as null.

Also correct the RowFactory Javadoc: the layout is computed once only for the
compact format, which captures a CompactRowLayout in the factory; the default
format builds a BinaryRow per call, matching BinaryRowWriter#newRow.
…when nested in evolving beans

Two related fixes let interface beans work as map keys/values in the row
codec, both for plain inference and schema evolution:

- TypeUtils.isSupported dropped the TypeResolutionContext when recursing
  into map key/value types, calling the context-less overload that resets
  synthesizeInterfaces to false. An interface bean was therefore rejected
  as a map key or value even though the same type is supported as a direct
  field or list element (which thread the context). Thread ctx into both
  map key and value recursions, matching the iterable branch. The error
  surfaced as "Unsupported type <Outer>" because the failed map field made
  isBean(Outer) return false.

- SchemaHistory.isBeanWithVersioning probed for a nested versioned bean
  with the context-less TypeUtils.isBean, so a nested interface bean was
  never recognized as versioned. Its older versions were not enumerated
  into the outer cross-product, and an older inner payload had no matching
  projection, so decode failed with a schema-hash mismatch. Use the same
  synthesize-interfaces context as inferField and evolutionBean.

Tests: ImplementInterfaceTest#testMapValueInterface covers the plain-row
map-value case; SchemaEvolutionTest#evolvingInterfaceBeanNestedInOuterBean
covers a versioned interface bean nested as a field, list element, and map
value across an evolution boundary.
…BeanCodec()

The decode-time IllegalStateException claimed the encoder "should have be
added in serializeForBean()", but this branch moved nested-bean codec
registration into registerBeanCodec(), which serializeForBean() and the
decode-only projection path both call. On the projection path
serializeForBean() never runs, so the old message points a debugger at the
wrong method. Name registerBeanCodec() and fix the "be added" grammar.
…jection fields

isAccessorOfAbsentField matched a leftover interface method to an absent
field's descriptor by name and return type alone. A parameterized method
sharing that name and return type (e.g. a getScore(int) overload of a
since=2 getScore() field) was therefore silenced into a default-value body
during projection instead of throwing, returning wrong data. Guard on
parameterCount() == 0 since an accessor is always no-arg; the live-member
pass only ever removes the no-arg signature.
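
The no-arg guard can be illustrated with standard reflection. The sketch below is not `isAccessorOfAbsentField` itself; `AccessorMatchSketch`, `isAccessorOf`, `countMatches`, and the `Scores` interface are hypothetical. It shows that adding `getParameterCount() == 0` to a name-and-return-type match stops a `getScore(int)` overload from being mistaken for the `getScore()` accessor.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of matching an accessor by name, return type, and arity.
final class AccessorMatchSketch {
  interface Scores {
    int getScore(); // the accessor for a "score" field

    int getScore(int round); // same name and return type, but not an accessor
  }

  static boolean isAccessorOf(Method m, String accessorName, Class<?> fieldType) {
    return m.getParameterCount() == 0 // an accessor is always no-arg
        && m.getName().equals(accessorName)
        && m.getReturnType() == fieldType;
  }

  // Number of public methods on the type that qualify as the accessor.
  static long countMatches(Class<?> type, String accessorName, Class<?> fieldType) {
    long count = 0;
    for (Method m : type.getMethods()) {
      if (isAccessorOf(m, accessorName, fieldType)) count++;
    }
    return count;
  }
}
```

With the guard, `countMatches(Scores.class, "getScore", int.class)` finds exactly one method; without it, both overloads would match.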

Also document why SchemaHistory.build needs no cycle guard: inferField's
checkNoCycle, run from RowCodecBuilder's constructor before build(), already
rejects self-referential beans, so the nested-bean recursion is unreachable
for a cycle.
…e framing

Add a soft warn-log in BaseCodecBuilder.buildSchemaHistory when a bean
resolves more than 256 historical schemas, since each becomes a generated
projection codec class and the count grows as the product of per-class
version counts across nested versioned beans. The count is read from the
already-materialized history, so tracking adds only one comparison.

Correct the decode(byte[]) comments in the row/array/map encoders: they
claimed encode writes no prefix, which is misleading now that the schema
hash leads the body (always for rows, under evolution for arrays/maps).
Rename the array/map decode body-length local from payloadSize to bodySize
per the codec read-identifier naming rule.
collectLiveFields and collectRemovedFields read ann.since() without a
lower-bound check, so since=0 (or negative) silently injected a schema
version no writer can emit, unlike every other malformed annotation which
fails fast at build. Validate since >= FIRST_VERSION on both paths.
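
Taken together with the earlier `until` checks, the build-time validation rules read roughly as below. This is a sketch under assumptions: `WindowValidationSketch` and its methods are hypothetical, `FIRST_VERSION` is assumed to be 1 (the commit only says that 0 and negative values are rejected), and `IllegalArgumentException` stands in for whatever the builder actually throws.

```java
// Hypothetical restatement of the since/until validation rules.
final class WindowValidationSketch {
  static final int FIRST_VERSION = 1; // assumption: not stated in the source

  // A live field still exists on the bean, so it may not declare a finite until.
  static void checkLiveField(String field, int since, int until) {
    checkSince(field, since);
    if (until != Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          field + ": a live field cannot declare until; declare the removal"
              + " on the removed-fields history interface instead");
    }
  }

  // A removed field must say when it was removed, and after it was added.
  static void checkRemovedField(String field, int since, int until) {
    checkSince(field, since);
    if (until == Integer.MAX_VALUE) {
      throw new IllegalArgumentException(field + ": a removed field needs a finite until");
    }
    if (since >= until) {
      throw new IllegalArgumentException(field + ": since must be lower than until");
    }
  }

  private static void checkSince(String field, int since) {
    if (since < FIRST_VERSION) {
      throw new IllegalArgumentException(
          field + ": since must be at least " + FIRST_VERSION);
    }
  }
}
```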

Also point the nested-bean decode lookup miss at its real cause: a
beanCodecKey() miss means the decode ran outside the key/value position
scope that registered the codec, so name that in the message and comment
the coupling at the choke point instead of the generic registerBeanCodec
hint.