fix(table): reject nil uuid assignment updates#1329
Conversation
tanmayrauth
left a comment
There was a problem hiding this comment.
LGTM, clean fix and the guard sits in the right place. One optional question about reassignment semantics, inline.
| return nil | ||
| } | ||
|
|
||
| func (b *MetadataBuilder) SetUUID(uuid uuid.UUID) error { |
There was a problem hiding this comment.
Nice, routing the guard through SetUUID means the assign-uuid update path (updates.go:177 -> SetUUID) is covered by the same check, so no second guard is needed. And the uuid -> newUUID rename is actually required here, since the old uuid param shadowed the package and uuid.Nil wasn't referenceable. Matches Java's assignUUID precondition too.
| return errors.New("cannot set uuid to null") | ||
| } | ||
|
|
||
| if b.uuid == newUUID { |
There was a problem hiding this comment.
Optional / possibly out of scope: Java's assignUUID enforces a second invariant beyond non-nul, "Cannot reassign uuid", i.e. once a non-nil UUID is set you can't change it to a different one. Here SetUUID
still allows reassigning to a different non-nil UUID. Given the rationale ("the table UUID uniquely identifies the table"), do you want to reject that case too, or keep it for a separate PR?
laskoviymishka
left a comment
There was a problem hiding this comment.
Almost there. The SetUUID guard is the right shape and it matches Java's assignUUID precondition exactly, and I like that you kept the new-table path safe by converting nil to a fresh UUID inside NewMetadataWithUUID before it ever hits the guard.
One blocking thing before merge: I'd wrap the error with iceberg.ErrInvalidArgument like the rest of the validation paths in this builder. Right now it's the only public MetadataBuilder validation that returns a bare errors.New, so callers doing errors.Is(err, iceberg.ErrInvalidArgument) at the catalog layer silently miss it. It also lets both new tests assert on the sentinel instead of the string.
Two things I'd consider but won't hold the PR on: the assign-uuid test's NotEqual(uuid.Nil, ...) is weaker than it looks (capture and compare the original UUID instead), and the guard only covers post-construction mutations — metadata loaded from disk or promoted via ToV2 with a zero UUID still slips through Build(). That's pre-existing and fair to leave for a follow-up, but worth a tracking note so the "no zero UUID on disk" goal doesn't read as fully closed.
Fix the wrapping and this is good to land — happy to take another pass once it's in.
| if b.uuid == uuid { | ||
| func (b *MetadataBuilder) SetUUID(newUUID uuid.UUID) error { | ||
| if newUUID == uuid.Nil { | ||
| return errors.New("cannot set uuid to null") |
There was a problem hiding this comment.
This is the one thing I'd want fixed before merge. Every other input-validation error in MetadataBuilder (SetCurrentSchemaID, SetDefaultSortOrderID, SetDefaultSpecID, SetFormatVersion, …) wraps iceberg.ErrInvalidArgument, so this is the only public validation path that breaks the errors.Is contract — a catalog layer can't classify it without string-matching.
| return errors.New("cannot set uuid to null") | |
| return fmt.Errorf("%w: cannot set uuid to nil", iceberg.ErrInvalidArgument) |
With that, both new tests can move from require.ErrorContains to require.ErrorIs(t, err, iceberg.ErrInvalidArgument), which is the more robust assertion anyway.
| if b.uuid == uuid { | ||
| func (b *MetadataBuilder) SetUUID(newUUID uuid.UUID) error { | ||
| if newUUID == uuid.Nil { | ||
| return errors.New("cannot set uuid to null") |
There was a problem hiding this comment.
Minor, but the Go zero value here is uuid.Nil, so "null" reads as a port of the Java message — I'd say "nil" to match the language. The view builder uses "null" too, so this is a wash on internal consistency, but if you take the wrapping suggestion above this folds into the same line.
|
|
||
| meta, err := b.Build() | ||
| require.NoError(t, err) | ||
| require.NotEqual(t, uuid.Nil, meta.TableUUID()) |
There was a problem hiding this comment.
NotEqual(uuid.Nil, ...) passes for any non-nil value, so this doesn't actually prove the nil update was a no-op — the builder could have mutated to some other UUID and this still goes green. Since buildFromBase parses a real metadata file, I'd capture that original UUID up front and assert meta.TableUUID() equals it, the way TestSetUUIDRejectsNil checks originalUUID. That nails down "rejected and left untouched" rather than just "not nil".
|
|
||
| meta, err := builder.Build() | ||
| require.NoError(t, err) | ||
| require.Equal(t, originalUUID, meta.TableUUID()) |
There was a problem hiding this comment.
The whole design rests on NewMetadata/NewMetadataWithUUID converting uuid.Nil to uuid.New() before reaching the new guard, but nothing here asserts it. A future regression where that conversion gets dropped would now turn into a hard "cannot set uuid" error instead of a generated UUID, and we'd have no test catching it. Could we add a quick case calling NewMetadataWithUUID(..., uuid.Nil) and asserting meta.TableUUID() != uuid.Nil? Cheap, and it locks down the path the guard relies on.
Summary
reject uuid.Nil in the table metadata builder's SetUUID path
add regression coverage for direct builder calls and assign-uuid updates applied through the builder
preserve the existing NewMetadata/NewMetadataWithUUID behavior where uuid.Nil means "generate a fresh table UUID" for brand-new metadata
Why
New table creation already treats uuid.Nil as a sentinel for auto-generation, but the mutable table metadata builder still accepted uuid.Nil later through SetUUID and assign-uuid updates. That made it possible to write the zero UUID into table metadata even though the table UUID is supposed to uniquely identify the table.
Testing
go test ./table -run 'Test(SetUUIDRejectsNil|AssignUUIDUpdate_ApplyRejectsNilUUID|SetFormatVersionV1ToV2AssignsUUID)$' -count=1
go test ./table -count=1
go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.4 run --timeout=10m ./table/...
git diff --check