fix: validate schema field IDs during unmarshal by fallintoplace · Pull Request #1315 · apache/iceberg-go

fallintoplace · 2026-06-24T21:42:32Z

Summary

validate duplicate field IDs while unmarshalling schemas from JSON
add coverage for top-level and nested duplicate IDs returning ErrInvalidSchema

Why

Schema constructors already reject duplicate positive field IDs, but Schema.UnmarshalJSON assigned decoded fields without running the same check. Malformed schema JSON could enter the system and later corrupt ID-based lookups, projections, name mapping, or writes.

Testing

go test . -run '^TestUnmarshalSchemaRejectsDuplicateFieldIDs$' -count=1\n- go test . -count=1\n- go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.4 run --timeout=10m .\n- git diff --check

tanmayrauth

LGTM, one nit on test coverage.

tanmayrauth · 2026-06-25T04:47:00Z

 	assert.True(t, tableSchemaSimple.Equals(&schema))
 }

+func TestUnmarshalSchemaRejectsDuplicateFieldIDs(t *testing.T) {


Nice — the top-level and nested-struct cases are the important ones. Since checkDuplicateFieldIDs also walks list element IDs and map key/value IDs (schema.go:118-130), would you consider adding a list or map dup-ID case too? Those branches are only exercised indirectly right now, and they're the easiest to silently break later.

zeroshade

Thanks for adding this. The duplicate-ID detection is correct, and I like the recursion into struct/list/map plus the negative-placeholder carve-out. Requesting changes on one gap relative to the PR title (validate schema field IDs) - noted inline.

zeroshade · 2026-06-26T17:10:46Z

 		return err
 	}

+	if err := checkDuplicateFieldIDs(nil, aux.Fields); err != nil {


This catches duplicate positive IDs, but missing list/map IDs still slip through. checkDuplicateFieldIDs only records f.ID > 0 (schema.go:101), and an omitted list element-id or map key-id/value-id decodes to 0 - so a schema that omits one of those still unmarshals cleanly (a lone 0 isn't a duplicate). If the goal is to validate field IDs on unmarshal, it'd be worth rejecting missing/zero element/key/value IDs too (e.g. decode them as *int and error when absent), while still allowing the negative placeholders used during construction. Is the broader validation in scope here, or is this intentionally limited to duplicates?

laskoviymishka

Almost there — the missing-key (nil) checks and the checkDuplicateFieldIDs reuse across the constructor and the decode path are exactly the right shape. One blocker before this lands.

The == 0 rejection on element-id/key-id/value-id is stricter than the spec asks for. The spec only requires these IDs to be unique in the table schema — it doesn't reserve 0 or require positivity, and Java, PyIceberg, and iceberg-rust all parse 0 without complaint. So this makes iceberg-go permanently unable to read otherwise-valid metadata those clients wrote, and it breaks our own round-trip: MarshalJSON(&ListType{}) emits element-id: 0, which this same UnmarshalJSON now refuses. Worth noting too that these methods fire on every list/map decode — partition fields, sort orders, REST responses — so the effect is a global tightening of list/map decoding, not just schema validation.

Before merge:

Drop the three *aux.ID == 0 / *aux.KeyID == 0 / *aux.ValueID == 0 checks; keep the nil checks and the duplicate-ID check.
Add a blank line before each consecutive guard block — nlreturn will fail CI otherwise.
Flip the two "zero …" test rows to expect success, add the missing zero-map-key-id / missing-map-value-id cases, and add one positive round-trip case asserting a valid list/map schema still unmarshals.

The nil checks and duplicate-ID validation are good as-is. Happy to take another pass once the zero-checks come out.

laskoviymishka · 2026-06-29T09:53:19Z

+	if aux.ID == nil {
+		return fmt.Errorf("%w: field is missing required 'element-id' key in JSON", ErrInvalidSchema)
+	}
+	if *aux.ID == 0 {
+		return fmt.Errorf("%w: field 'element-id' must not be 0", ErrInvalidSchema)
+	}


The nil check here is right — it mirrors Java's JsonUtil.getInt existence check, so keep it. The == 0 rejection is the blocker, though, and I'd drop it.

The spec only requires field IDs to be unique in the table schema — it doesn't reserve 0 or require positivity, and Java, PyIceberg, and iceberg-rust all parse 0 without complaint. So this makes iceberg-go permanently unable to read otherwise-valid metadata any of them wrote with element-id: 0. It also breaks our own round-trip: ListType.ElementID is a plain int, so json.Marshal(&iceberg.ListType{}) emits element-id: 0 and this same method then refuses it.

Suggested change

if aux.ID == nil {

return fmt.Errorf("%w: field is missing required 'element-id' key in JSON", ErrInvalidSchema)

}

if *aux.ID == 0 {

return fmt.Errorf("%w: field 'element-id' must not be 0", ErrInvalidSchema)

}

if aux.ID == nil {

return fmt.Errorf("%w: field is missing required 'element-id' key in JSON", ErrInvalidSchema)

}

Reject 0 vs. accept it as the spec allows — wdyt?

laskoviymishka · 2026-06-29T09:53:20Z

+	if aux.KeyID == nil {
+		return fmt.Errorf("%w: field is missing required 'key-id' key in JSON", ErrInvalidSchema)
+	}
+	if *aux.KeyID == 0 {
+		return fmt.Errorf("%w: field 'key-id' must not be 0", ErrInvalidSchema)
+	}
+	if aux.ValueID == nil {
+		return fmt.Errorf("%w: field is missing required 'value-id' key in JSON", ErrInvalidSchema)
+	}
+	if *aux.ValueID == 0 {
+		return fmt.Errorf("%w: field 'value-id' must not be 0", ErrInvalidSchema)
+	}


Same blocker on the map side — keep both nil checks, drop the two == 0 checks. Java's mapFromJson parses zero key-id/value-id fine, so rejecting them breaks reading Java/Py/Rust-written maps the same way. The suggestion also spaces the two guards out, which keeps nlreturn happy.

Suggested change

if aux.KeyID == nil {

return fmt.Errorf("%w: field is missing required 'key-id' key in JSON", ErrInvalidSchema)

}

if *aux.KeyID == 0 {

return fmt.Errorf("%w: field 'key-id' must not be 0", ErrInvalidSchema)

}

if aux.ValueID == nil {

return fmt.Errorf("%w: field is missing required 'value-id' key in JSON", ErrInvalidSchema)

}

if *aux.ValueID == 0 {

return fmt.Errorf("%w: field 'value-id' must not be 0", ErrInvalidSchema)

}

if aux.KeyID == nil {

return fmt.Errorf("%w: field is missing required 'key-id' key in JSON", ErrInvalidSchema)

}

if aux.ValueID == nil {

return fmt.Errorf("%w: field is missing required 'value-id' key in JSON", ErrInvalidSchema)

}

laskoviymishka · 2026-06-29T09:53:20Z

+			contains: "field is missing required 'element-id' key in JSON",
+		},
+		{
+			name: "zero list element id",


Once the == 0 checks come out, this row and "zero map value id" assert the wrong outcome — they'd need to flip to require.NoError. While here: the table never exercises zero map key-id or missing map value-id (both are real branches), and there's no positive case proving a valid non-zero list/map schema still unmarshals after the *int change — tableSchemaNested goes through the constructor, not JSON decode, so a regression there wouldn't be caught. I'd add a round-trip case asserting require.NoError.

fallintoplace requested a review from zeroshade as a code owner June 24, 2026 21:42

tanmayrauth reviewed Jun 25, 2026

View reviewed changes

validate duplicate schema field ids on unmarshal

30d2edd

fallintoplace force-pushed the fix/schema-unmarshal-duplicate-field-ids branch from b5da3aa to 30d2edd Compare June 25, 2026 07:30

zeroshade requested changes Jun 26, 2026

View reviewed changes

fix: require list and map field ids on schema unmarshal

bbc5516

laskoviymishka requested changes Jun 29, 2026

View reviewed changes

fix: allow zero collection field ids on unmarshal

a7970ce

laskoviymishka approved these changes Jun 29, 2026

View reviewed changes

zeroshade approved these changes Jul 3, 2026

View reviewed changes

zeroshade merged commit 4fd86d5 into apache:main Jul 3, 2026
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: validate schema field IDs during unmarshal#1315

fix: validate schema field IDs during unmarshal#1315
zeroshade merged 3 commits into
apache:mainfrom
fallintoplace:fix/schema-unmarshal-duplicate-field-ids

fallintoplace commented Jun 24, 2026

Uh oh!

tanmayrauth left a comment

Uh oh!

tanmayrauth Jun 25, 2026

Uh oh!

zeroshade left a comment

Uh oh!

zeroshade Jun 26, 2026

Uh oh!

laskoviymishka left a comment

Uh oh!

laskoviymishka Jun 29, 2026

Uh oh!

laskoviymishka Jun 29, 2026

Uh oh!

laskoviymishka Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

fallintoplace commented Jun 24, 2026

Uh oh!

tanmayrauth left a comment

Choose a reason for hiding this comment

Uh oh!

tanmayrauth Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

zeroshade left a comment

Choose a reason for hiding this comment

Uh oh!

zeroshade Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka left a comment

Choose a reason for hiding this comment

Uh oh!

laskoviymishka Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants