fix: validate schema field IDs during unmarshal#1315
Conversation
tanmayrauth
left a comment
There was a problem hiding this comment.
LGTM, one nit on test coverage.
| assert.True(t, tableSchemaSimple.Equals(&schema)) | ||
| } | ||
|
|
||
| func TestUnmarshalSchemaRejectsDuplicateFieldIDs(t *testing.T) { |
There was a problem hiding this comment.
Nice — the top-level and nested-struct cases are the important ones. Since checkDuplicateFieldIDs also walks list element IDs and map key/value IDs (schema.go:118-130), would you consider adding a list or map dup-ID case too? Those branches are only exercised indirectly right now, and they're the easiest to silently break later.
b5da3aa to
30d2edd
Compare
zeroshade
left a comment
There was a problem hiding this comment.
Thanks for adding this. The duplicate-ID detection is correct, and I like the recursion into struct/list/map plus the negative-placeholder carve-out. Requesting changes on one gap relative to the PR title (validate schema field IDs) - noted inline.
| return err | ||
| } | ||
|
|
||
| if err := checkDuplicateFieldIDs(nil, aux.Fields); err != nil { |
There was a problem hiding this comment.
This catches duplicate positive IDs, but missing list/map IDs still slip through. checkDuplicateFieldIDs only records f.ID > 0 (schema.go:101), and an omitted list element-id or map key-id/value-id decodes to 0 - so a schema that omits one of those still unmarshals cleanly (a lone 0 isn't a duplicate). If the goal is to validate field IDs on unmarshal, it'd be worth rejecting missing/zero element/key/value IDs too (e.g. decode them as *int and error when absent), while still allowing the negative placeholders used during construction. Is the broader validation in scope here, or is this intentionally limited to duplicates?
laskoviymishka
left a comment
There was a problem hiding this comment.
Almost there — the missing-key (nil) checks and the checkDuplicateFieldIDs reuse across the constructor and the decode path are exactly the right shape. One blocker before this lands.
The == 0 rejection on element-id/key-id/value-id is stricter than the spec asks for. The spec only requires these IDs to be unique in the table schema — it doesn't reserve 0 or require positivity, and Java, PyIceberg, and iceberg-rust all parse 0 without complaint. So this makes iceberg-go permanently unable to read otherwise-valid metadata those clients wrote, and it breaks our own round-trip: MarshalJSON(&ListType{}) emits element-id: 0, which this same UnmarshalJSON now refuses. Worth noting too that these methods fire on every list/map decode — partition fields, sort orders, REST responses — so the effect is a global tightening of list/map decoding, not just schema validation.
Before merge:
- Drop the three
*aux.ID == 0/*aux.KeyID == 0/*aux.ValueID == 0checks; keep the nil checks and the duplicate-ID check. - Add a blank line before each consecutive guard block — nlreturn will fail CI otherwise.
- Flip the two "zero …" test rows to expect success, add the missing zero-map-key-id / missing-map-value-id cases, and add one positive round-trip case asserting a valid list/map schema still unmarshals.
The nil checks and duplicate-ID validation are good as-is. Happy to take another pass once the zero-checks come out.
| if aux.ID == nil { | ||
| return fmt.Errorf("%w: field is missing required 'element-id' key in JSON", ErrInvalidSchema) | ||
| } | ||
| if *aux.ID == 0 { | ||
| return fmt.Errorf("%w: field 'element-id' must not be 0", ErrInvalidSchema) | ||
| } |
There was a problem hiding this comment.
The nil check here is right — it mirrors Java's JsonUtil.getInt existence check, so keep it. The == 0 rejection is the blocker, though, and I'd drop it.
The spec only requires field IDs to be unique in the table schema — it doesn't reserve 0 or require positivity, and Java, PyIceberg, and iceberg-rust all parse 0 without complaint. So this makes iceberg-go permanently unable to read otherwise-valid metadata any of them wrote with element-id: 0. It also breaks our own round-trip: ListType.ElementID is a plain int, so json.Marshal(&iceberg.ListType{}) emits element-id: 0 and this same method then refuses it.
| if aux.ID == nil { | |
| return fmt.Errorf("%w: field is missing required 'element-id' key in JSON", ErrInvalidSchema) | |
| } | |
| if *aux.ID == 0 { | |
| return fmt.Errorf("%w: field 'element-id' must not be 0", ErrInvalidSchema) | |
| } | |
| if aux.ID == nil { | |
| return fmt.Errorf("%w: field is missing required 'element-id' key in JSON", ErrInvalidSchema) | |
| } |
Reject 0 vs. accept it as the spec allows — wdyt?
| if aux.KeyID == nil { | ||
| return fmt.Errorf("%w: field is missing required 'key-id' key in JSON", ErrInvalidSchema) | ||
| } | ||
| if *aux.KeyID == 0 { | ||
| return fmt.Errorf("%w: field 'key-id' must not be 0", ErrInvalidSchema) | ||
| } | ||
| if aux.ValueID == nil { | ||
| return fmt.Errorf("%w: field is missing required 'value-id' key in JSON", ErrInvalidSchema) | ||
| } | ||
| if *aux.ValueID == 0 { | ||
| return fmt.Errorf("%w: field 'value-id' must not be 0", ErrInvalidSchema) | ||
| } |
There was a problem hiding this comment.
Same blocker on the map side — keep both nil checks, drop the two == 0 checks. Java's mapFromJson parses zero key-id/value-id fine, so rejecting them breaks reading Java/Py/Rust-written maps the same way. The suggestion also spaces the two guards out, which keeps nlreturn happy.
| if aux.KeyID == nil { | |
| return fmt.Errorf("%w: field is missing required 'key-id' key in JSON", ErrInvalidSchema) | |
| } | |
| if *aux.KeyID == 0 { | |
| return fmt.Errorf("%w: field 'key-id' must not be 0", ErrInvalidSchema) | |
| } | |
| if aux.ValueID == nil { | |
| return fmt.Errorf("%w: field is missing required 'value-id' key in JSON", ErrInvalidSchema) | |
| } | |
| if *aux.ValueID == 0 { | |
| return fmt.Errorf("%w: field 'value-id' must not be 0", ErrInvalidSchema) | |
| } | |
| if aux.KeyID == nil { | |
| return fmt.Errorf("%w: field is missing required 'key-id' key in JSON", ErrInvalidSchema) | |
| } | |
| if aux.ValueID == nil { | |
| return fmt.Errorf("%w: field is missing required 'value-id' key in JSON", ErrInvalidSchema) | |
| } |
| contains: "field is missing required 'element-id' key in JSON", | ||
| }, | ||
| { | ||
| name: "zero list element id", |
There was a problem hiding this comment.
Once the == 0 checks come out, this row and "zero map value id" assert the wrong outcome — they'd need to flip to require.NoError. While here: the table never exercises zero map key-id or missing map value-id (both are real branches), and there's no positive case proving a valid non-zero list/map schema still unmarshals after the *int change — tableSchemaNested goes through the constructor, not JSON decode, so a regression there wouldn't be caught. I'd add a round-trip case asserting require.NoError.
Summary
Why
Schema constructors already reject duplicate positive field IDs, but Schema.UnmarshalJSON assigned decoded fields without running the same check. Malformed schema JSON could enter the system and later corrupt ID-based lookups, projections, name mapping, or writes.
Testing