Skip to content

direct: fix plan JSON round-trip for RemoteState typed fields#5671

Open
denik wants to merge 10 commits into
mainfrom
denik/serialized-plan-number
Open

direct: fix plan JSON round-trip for RemoteState typed fields#5671
denik wants to merge 10 commits into
mainfrom
denik/serialized-plan-number

Conversation

@denik

@denik denik commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Bug: When deploying from a saved plan file (bundle deploy --plan), PlanEntry.RemoteState was decoded as map[string]interface{} instead of the typed struct, causing type assertions in resource adapters to silently fail.
  • Grants: removedGrantPrincipals got an untyped map → principal revocation was skipped on the second deploy.
  • Models: MlflowModelRemote embeds ml.ModelDatabricks which has a custom MarshalJSON that shadowed the outer model_id field — it was dropped from the plan JSON entirely. Permissions were then applied to an empty resource ID.
  • Apps/Clusters: remoteIsStarted/remoteClusterIsRunning got nil lifecycle/state → incorrect lifecycle decisions when deploying from plan.

Fixes:

  1. InitForApply re-hydrates each RemoteState entry by round-tripping through json.Marshal+json.Unmarshal into reflect.New(adapter.RemoteType().Elem()).
  2. MlflowModelRemote gets custom MarshalJSON/UnmarshalJSON (using the SDK marshal package, same pattern as AppRemote) so model_id is preserved.
  3. decoder.UseNumber() in LoadPlanFromFile prevents large int64 values from degrading to float64 through the re-hydration round-trip.

Test plan

  • acceptance/bundle/deploy/readplan/grants-remove-principal: deploys with extra grant principal, saves plan, removes principal, deploys from plan — verifies revocation happens.
  • acceptance/bundle/resources/apps/readplan-lifecycle: verifies no spurious app start when deploying description change from plan.
  • acceptance/bundle/resources/clusters/readplan-lifecycle: verifies no spurious cluster start when deploying tag change from plan.
  • acceptance/bundle/resources/models/readplan-permissions: verifies permission update (manager removed) uses correct numeric model ID when deploying from plan.

All tests run with READPLAN=["", "1"] to cover both inline and file-loaded plan paths.

This pull request and its description were written by Isaac.

@denik denik temporarily deployed to test-trigger-is June 21, 2026 05:11 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 21, 2026 05:11 — with GitHub Actions Inactive
@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Approval status: pending

/acceptance/bundle/ - needs approval

30 files changed
Suggested: @janniklasrose
Also eligible: @pietern, @andrewnester, @shreyas-goenka, @anton-107, @lennartkats-db

/bundle/ - needs approval

4 files changed
Suggested: @janniklasrose
Also eligible: @pietern, @andrewnester, @shreyas-goenka, @anton-107, @lennartkats-db

General files (require maintainer)

Files: libs/testserver/clusters.go
Based on git history:

  • @pietern -- recent work in bundle/direct/, bundle/direct/dresources/

Any maintainer (@andrewnester, @anton-107, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Integration test report

Commit: 9402fa7

Run: 28077145698

Env 🟨​KNOWN ✅​pass 🙈​skip Time
🟨​ aws linux 1 216 99 2:55
🟨​ aws windows 1 218 97 2:33
🟨​ aws-ucws linux 1 297 18 3:30
🟨​ aws-ucws windows 1 299 16 3:36
🟨​ azure linux 1 216 98 3:08
🟨​ azure windows 1 218 96 2:30
🟨​ azure-ucws linux 1 299 15 3:41
🟨​ azure-ucws windows 1 301 13 3:26
🟨​ gcp linux 1 215 100 2:59
🟨​ gcp windows 1 217 98 2:46
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 🟨​K 🟨​K 🟨​K 🟨​K 🟨​K 🟨​K 🟨​K 🟨​K

@denik denik force-pushed the denik/serialized-plan-number branch from 345ff9c to 467acab Compare June 23, 2026 12:44
@denik denik temporarily deployed to test-trigger-is June 23, 2026 12:45 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 23, 2026 12:45 — with GitHub Actions Inactive
denik added 9 commits June 23, 2026 18:35
When loading a plan from JSON, PlanEntry.RemoteState (typed any) is decoded
as map[string]interface{} rather than *GrantsState. The type assertion in
removedGrantPrincipals then failed silently, so principals removed from config
were never revoked on deploy --plan.

Fix by re-serializing the map and unmarshaling into *GrantsState when the
direct type assertion fails.

Co-authored-by: Isaac
LoadPlanFromFile decoded any-typed fields (RemoteState, ChangeDesc.Old/New/Remote)
as map[string]interface{}, causing type assertions in resource adapters to fail
(e.g. grants.removedGrantPrincipals did entry.RemoteState.(*GrantsState)).

Fix at the deserialization layer instead of in each adapter:
- Add UseNumber() to LoadPlanFromFile so large integers stay exact through
  the re-hydration step rather than losing precision as float64.
- In InitForApply, re-hydrate each entry's RemoteState from the generic map
  back into the correct typed struct using the adapter's StateType(), mirroring
  the existing NewState re-hydration that was already there.

Resource implementations are now unaware of whether a plan came from a file
or was computed in memory.

Co-authored-by: Isaac
ml.ModelDatabricks embeds a custom MarshalJSON that shadows outer struct fields,
causing model_id to be dropped when serializing MlflowModelRemote to the plan JSON.
Add MarshalJSON/UnmarshalJSON (using the SDK marshal package, same as AppRemote) so
model_id survives the plan file round-trip.

Also fix the RemoteType() vs StateType() comment/code in InitForApply.

Co-authored-by: Isaac
Tests cover the bug where RemoteState loaded from a plan JSON file loses
type information (deserialization to map[string]interface{}), which caused
incorrect lifecycle decisions for apps/clusters and missing model_id for
model permissions.

Also fix libs/testserver/clusters.go to preserve runtime-only fields (State,
ClusterId) on ClustersEdit, which the cluster readplan test depends on.

Co-authored-by: Isaac
…wModelRemote

Rebase introduced duplicate method declarations; keep only the intended pair.

Co-authored-by: Isaac
Also update cluster plan snapshot to include state/lifecycle fields now
preserved by the testserver on cluster edit.

Co-authored-by: Isaac
@denik denik force-pushed the denik/serialized-plan-number branch from 467acab to af3adc7 Compare June 24, 2026 05:04
@denik denik temporarily deployed to test-trigger-is June 24, 2026 05:04 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 24, 2026 05:04 — with GitHub Actions Inactive
yamlfmt requires single space before inline comments.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 24, 2026 05:22 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 24, 2026 05:22 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants