Add telemetry for bundle config-remote-sync #5556

Merged
ilyakuz-db merged 8 commits into main from config-remote-sync-telemetry on Jun 17, 2026

@ilyakuz-db

Contributor

Changes

Add a bundle_config_remote_sync_event telemetry event, emitted on every execution of the bundle config-remote-sync command (previously this command emitted no telemetry at all).

The event records aggregate data only: change counts by operation and resource type, file counts, variable-reference restoration counts by mechanism, a count of remote string values containing ${, the engine, and an enumerated error category. No resource names, keys, paths, or configuration values are logged.
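To illustrate what "aggregate data only" can look like in practice, here is a minimal sketch. The `Change` and `ChangeStats` types, the `collectStats` helper, and the operation names are all hypothetical and are not the types used in this PR; the point is only that identifying fields are read locally but never copied into the counters.

```go
package main

import (
	"fmt"
	"strings"
)

// Change is a hypothetical, simplified view of one detected remote change.
type Change struct {
	Operation    string // assumed names such as "add", "remove", "replace"
	ResourceType string // e.g. "jobs", "pipelines"
	ResourceKey  string // available locally, never copied into the stats
	RemoteValue  any    // inspected for "${" only, never stored
}

// ChangeStats holds counts only, so no names, keys, paths, or values leave
// the process.
type ChangeStats struct {
	ByOperation         map[string]int
	ByResourceType      map[string]int
	RemoteValuesWithRef int
}

// collectStats reduces a list of changes to aggregate counters.
func collectStats(changes []Change) ChangeStats {
	stats := ChangeStats{
		ByOperation:    map[string]int{},
		ByResourceType: map[string]int{},
	}
	for _, c := range changes {
		stats.ByOperation[c.Operation]++
		stats.ByResourceType[c.ResourceType]++
		// Count remote strings that contain "${" without recording them.
		if s, ok := c.RemoteValue.(string); ok && strings.Contains(s, "${") {
			stats.RemoteValuesWithRef++
		}
	}
	return stats
}

func main() {
	stats := collectStats([]Change{
		{Operation: "replace", ResourceType: "jobs", ResourceKey: "my_job", RemoteValue: "${var.env}-job"},
		{Operation: "add", ResourceType: "jobs", ResourceKey: "my_job", RemoteValue: 3},
	})
	fmt.Println(stats.ByOperation, stats.ByResourceType, stats.RemoteValuesWithRef)
}
```

Keeping the stats type free of string fields other than map keys drawn from fixed vocabularies makes it harder to leak a resource name by accident.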

Also wraps the missing-snapshot failure in a sentinel error (configsync.ErrStateSnapshotNotFound) so it can be classified.
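The sentinel-error pattern mentioned above can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: `ErrStateSnapshotNotFound` is redeclared locally to keep the example self-contained, `loadSnapshot` and `classify` are hypothetical helpers, and only the `STATE_NOT_FOUND` category name comes from this PR (`NONE` and `UNKNOWN` are made up).

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStateSnapshotNotFound stands in for the sentinel named in the PR
// description; the real one lives in the configsync package.
var ErrStateSnapshotNotFound = errors.New("state snapshot not found")

// loadSnapshot is a hypothetical loader. It wraps the sentinel with %w so
// callers keep the context while still being able to match on the cause.
func loadSnapshot(path string, exists bool) error {
	if !exists {
		return fmt.Errorf("reading %s: %w", path, ErrStateSnapshotNotFound)
	}
	return nil
}

// classify maps an error to an enumerated category for telemetry.
// Only STATE_NOT_FOUND appears in the PR; the other names are assumptions.
func classify(err error) string {
	switch {
	case err == nil:
		return "NONE"
	case errors.Is(err, ErrStateSnapshotNotFound):
		return "STATE_NOT_FOUND"
	default:
		return "UNKNOWN"
	}
}

func main() {
	err := loadSnapshot("state.json", false)
	fmt.Println(classify(err)) // prints "STATE_NOT_FOUND"
}
```

Matching with `errors.Is` rather than comparing message strings means the category survives any additional wrapping added higher up the call stack.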

Note: the server-side proto changes for this event are in flight; until they ship, the new event field is dropped at ingestion.

Why

The command is used by the in-workspace UI sync flow, and today its executions and failures are invisible in telemetry, which makes investigating sync-related issues unnecessarily hard.

Tests

Unit tests for stats collection and restoration counters; two new acceptance tests (happy path detecting a remote job rename, and the error path without deployed state reporting STATE_NOT_FOUND).

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 11, 2026

Collaborator

Integration test report

Commit: 0786514

Run: 27676676699

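| Env | 🟨 KNOWN | 🔄 flaky | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|---|
| 🟨 aws linux | 7 | | | 15 | 264 | 998 | 8:46 |
| 🟨 aws windows | 7 | | | 15 | 266 | 996 | 10:48 |
| 💚 aws-ucws linux | | | 7 | 15 | 360 | 912 | 9:32 |
| 💚 aws-ucws windows | | | 7 | 15 | 362 | 910 | 11:36 |
| 💚 azure linux | | | 1 | 17 | 267 | 996 | 7:04 |
| 💚 azure windows | | | 1 | 17 | 269 | 994 | 9:11 |
| 🔄 azure-ucws linux | | 4 | 1 | 17 | 361 | 908 | 14:56 |
| 💚 azure-ucws windows | | | 1 | 17 | 367 | 906 | 11:29 |
| 💚 gcp linux | | | 1 | 17 | 263 | 999 | 9:24 |
| 💚 gcp windows | | | 1 | 17 | 265 | 997 | 9:00 |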
26 interesting tests: 15 SKIP, 7 KNOWN, 4 flaky
| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
|---|---|---|---|---|---|---|---|---|---|---|
| 🟨 TestAccept | 🟨 K | 🟨 K | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 🔄 TestAccept/bundle/deployment/bind/alert | 🙈 s | 🙈 s | 🙈 s | 🙈 s | ✅ p | ✅ p | 🔄 f | ✅ p | ✅ p | ✅ p |
| 🔄 TestAccept/bundle/deployment/bind/alert/DATABRICKS_BUNDLE_ENGINE=terraform | | | | | ✅ p | ✅ p | 🔄 f | ✅ p | ✅ p | ✅ p |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🔄 TestAccept/bundle/resources/apps/inline_config | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p | 🔄 f | ✅ p | ✅ p | ✅ p |
| 🔄 TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p | ✅ p | 🔄 f | ✅ p | ✅ p | ✅ p |
| 🙈 TestAccept/bundle/resources/permissions | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 🟨 K | 🟨 K | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 🟨 K | 🟨 K | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨 K | 🟨 K | 💚 R | 💚 R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/grants/select | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/ssh/connection | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
Top 26 slowest tests (at least 2 minutes):
| duration | env | testname |
|---|---|---|
| 5:22 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 5:15 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:58 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:43 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:53 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:50 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:39 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:33 | aws-ucws windows | TestAccept |
| 3:30 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:30 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:25 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:22 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:20 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:15 | gcp windows | TestAccept |
| 3:14 | azure windows | TestAccept |
| 3:13 | azure-ucws windows | TestAccept |
| 3:12 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:08 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:06 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:00 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:50 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:47 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:44 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:36 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:07 | azure-ucws linux | TestAccept/bundle/resources/dashboards/generate_inplace/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:03 | azure-ucws linux | TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=direct |

@ilyakuz-db ilyakuz-db force-pushed the config-remote-sync-telemetry branch from fa19e19 to d955798 Compare June 15, 2026 15:39
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 15, 2026 15:40 — with GitHub Actions Inactive
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 15, 2026 15:40 — with GitHub Actions Inactive
…ounters

The unit tests cover add_count/remove_count, recreate_forcing_changes, and the
variable-restoration counts, but they were missing acceptance coverage. These
scenarios all round-trip against the local mock testserver, so they don't need a
cloud test.

- job_fields: assert the telemetry event (add_count/remove_count + per-resource
  breakdown). The remote edits already add and remove keyed fields.
- resolve_variables: assert refs_retargeted (env param re-targeted to a different
  variable) and refs_from_siblings (added params/tasks restored from siblings).
- New telemetry/config-remote-sync-recreate: a remote change to the immutable
  ingestion_definition.connection_name makes recreate_forcing_changes non-zero.
  storage (the other immutable pipeline field) can't be used because configsync
  always skips it as a backend-generated default.

The two enriched tests run on both engines, so the asserted event drops the
engine name (every other counter is engine-agnostic); RecordRequests is enabled
and out.requests.txt is removed in cleanup so it isn't compared.
Comment thread libs/telemetry/scrub.go
@@ -1,11 +1,25 @@
-package phases
+package telemetry

Contributor Author

Moved this to telemetry/ because we also need it for config-remote-sync errors.

Comment thread bundle/configsync/variables.go Outdated
	}
	if len(refs) == 1 {
		for ref := range refs {
			stats.FromSiblings++

Contributor

nit: Add nil check to avoid panics?

Contributor

We should not ever fail because of telemetry code paths.

Telemetry is best-effort and must not break config-remote-sync:

- CollectChangeStats and LogTelemetry now defer a recover() that logs at
  debug and swallows any panic (e.g. the dresources.MustLoad* lifecycle-config
  loaders called from isRecreateForcing, or anything during proto build/upload).
- RestoreStats counters are incremented through nil-safe incRetargeted /
  incFromSiblings methods, so threading the pointer through the deep restoration
  recursion can never nil-panic; the now-redundant nil guard in
  RestoreVariableReferences is removed.
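
The two safeguards described in this commit message can be sketched roughly as below. This is a simplified illustration, not the code from this PR: `incRetargeted`, `incFromSiblings`, and the `FromSiblings` field are named in the PR, while the `Retargeted` field name and the `safely` wrapper are assumptions made for the example.

```go
package main

import "fmt"

// RestoreStats sketches the restoration counters. FromSiblings appears in
// the reviewed diff; Retargeted is an assumed field name.
type RestoreStats struct {
	Retargeted   int
	FromSiblings int
}

// A method with a pointer receiver may be called on a nil pointer, as long
// as it checks for nil before dereferencing.
func (s *RestoreStats) incRetargeted() {
	if s != nil {
		s.Retargeted++
	}
}

func (s *RestoreStats) incFromSiblings() {
	if s != nil {
		s.FromSiblings++
	}
}

// safely is a hypothetical wrapper: it runs fn and turns any panic into a
// debug-style log line, reporting whether a panic was swallowed.
func safely(name string, fn func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
			fmt.Printf("debug: %s panicked, skipping telemetry: %v\n", name, r)
		}
	}()
	fn()
	return false
}

func main() {
	var stats *RestoreStats // nil: the caller did not ask for stats
	stats.incFromSiblings() // safe, does nothing

	// A panic inside best-effort telemetry code does not reach the caller.
	safely("collect stats", func() { panic("lifecycle config missing") })
	fmt.Println("config-remote-sync continues")
}
```

The nil-safe methods remove the need for a guard at every call site in a deep recursion, while the deferred `recover()` acts as a last line of defense for panics that originate in code the telemetry path merely calls into.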
…-tests

# Conflicts:
#	bundle/phases/telemetry.go
@ilyakuz-db ilyakuz-db added this pull request to the merge queue Jun 17, 2026
Merged via the queue into main with commit 7a8119d Jun 17, 2026
25 checks passed
@ilyakuz-db ilyakuz-db deleted the config-remote-sync-telemetry branch June 17, 2026 10:16