feat: Add file-based dashboard provisioner#1962

Open
ZeynelKoca wants to merge 1 commit intohyperdxio:mainfrom
ZeynelKoca:feature/dashboard-provisioner

Conversation

@ZeynelKoca
Contributor

@ZeynelKoca ZeynelKoca commented Mar 22, 2026

Summary

Add a provision-dashboards task that reads .json files from a directory and upserts dashboards into MongoDB, following the existing task system pattern (same as check-alerts).

Provisioned dashboards are flagged with provisioned: true so they never overwrite user-created dashboards with the same name. Files are validated against DashboardWithoutIdSchema. Removing a file does not delete the dashboard (safe by default, same as Grafana). The task is deployment-agnostic: it reads from a directory, regardless of how files get there.
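As a rough illustration of the sync semantics described above, a sketch might look like the following (hypothetical names, with a `Map` standing in for MongoDB; the real task validates files with `DashboardWithoutIdSchema` and upserts via mongoose):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical shape; the real task validates with Zod's DashboardWithoutIdSchema.
type Dashboard = { name: string; tiles: unknown[]; tags: string[]; provisioned: boolean };

// Minimal structural check standing in for the schema validation.
function isValidDashboard(v: unknown): v is Omit<Dashboard, "provisioned"> {
  if (typeof v !== "object" || v === null) return false;
  const d = v as Record<string, unknown>;
  return typeof d.name === "string" && Array.isArray(d.tiles) && Array.isArray(d.tags);
}

// Read every .json file in `dir` and upsert it, keyed on the name within a
// "provisioned" namespace, so user-created dashboards with the same name are
// never touched. Deleting a file later intentionally leaves the stored
// dashboard in place (no delete-on-removal, same as Grafana).
export function syncDashboards(dir: string, store: Map<string, Dashboard>): void {
  for (const file of fs.readdirSync(dir)) {
    if (!file.endsWith(".json")) continue;
    const raw = JSON.parse(fs.readFileSync(path.join(dir, file), "utf8"));
    if (!isValidDashboard(raw)) {
      console.warn(`skipping invalid dashboard file: ${file}`);
      continue;
    }
    store.set(`provisioned:${raw.name}`, { ...raw, provisioned: true });
  }
}
```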

When DASHBOARD_PROVISIONER_DIR is set, entry.prod.sh automatically starts the task as an additional process alongside the API, App, and check-alerts.

Note: Users can currently edit provisioned dashboards through the UI, but changes will be overwritten on the next sync cycle. Grafana handles this by blocking saves on provisioned dashboards. Adding a similar guard would be a good follow-up to improve UX.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `DASHBOARD_PROVISIONER_DIR` | Yes | | Directory to read `.json` files from |
| `DASHBOARD_PROVISIONER_TEAM_ID` | No* | | Scope to a specific team ID |
| `DASHBOARD_PROVISIONER_ALL_TEAMS` | No* | `false` | Set to `true` to provision to all teams |

*One of DASHBOARD_PROVISIONER_TEAM_ID or DASHBOARD_PROVISIONER_ALL_TEAMS=true is required.
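The contract above could be sketched as a small config parser (a hypothetical helper for illustration, not the actual code):

```typescript
// Hypothetical parser illustrating the env-var contract in the table above.
type ProvisionerConfig =
  | { dir: string; scope: { kind: "team"; teamId: string } }
  | { dir: string; scope: { kind: "allTeams" } };

export function parseProvisionerEnv(
  env: Record<string, string | undefined>,
): ProvisionerConfig {
  const dir = env.DASHBOARD_PROVISIONER_DIR;
  if (!dir) throw new Error("DASHBOARD_PROVISIONER_DIR is required");
  if (env.DASHBOARD_PROVISIONER_ALL_TEAMS === "true") {
    return { dir, scope: { kind: "allTeams" } };
  }
  const teamId = env.DASHBOARD_PROVISIONER_TEAM_ID;
  if (teamId) return { dir, scope: { kind: "team", teamId } };
  // Neither scope variable set: refuse to run rather than guess.
  throw new Error(
    "Set DASHBOARD_PROVISIONER_TEAM_ID or DASHBOARD_PROVISIONER_ALL_TEAMS=true",
  );
}
```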

How to test locally or on Vercel

1. Create a directory with a dashboard JSON file:

   ```sh
   mkdir /tmp/dashboards
   echo '{"name":"Test Dashboard","tiles":[],"tags":[]}' > /tmp/dashboards/test.json
   ```

2. Run the task (note the line continuation so the env vars apply to the command):

   ```sh
   DASHBOARD_PROVISIONER_DIR=/tmp/dashboards DASHBOARD_PROVISIONER_ALL_TEAMS=true \
     ./packages/api/bin/hyperdx task provision-dashboards
   ```

3. Verify the dashboard appears in the UI.
4. Modify the JSON file, run the task again, and verify the dashboard updates.
5. Delete the JSON file, run the task again, and verify the dashboard persists.

References

@changeset-bot

changeset-bot bot commented Mar 22, 2026

🦋 Changeset detected

Latest commit: 6023b15

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages:

| Name | Type |
| --- | --- |
| @hyperdx/api | Minor |
| @hyperdx/app | Minor |
| @hyperdx/otel-collector | Minor |


@vercel

vercel bot commented Mar 22, 2026

@ZeynelKoca is attempting to deploy a commit to the HyperDX Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions
Contributor

github-actions bot commented Mar 22, 2026

PR Review

  • ⚠️ Duplicate dashboard names create confusing UX → When a provisioned file shares a name with a user-created dashboard, both coexist with identical names in the UI (see syncDashboards L53-63). The warning is logged but the user sees two "Foo" dashboards. Consider skipping provision for that name instead of coexisting, or at minimum note this explicitly in the UI. The follow-up guard mentioned in the PR description should be prioritized.

  • ⚠️ asyncDispose closes the shared mongoose connection (provisionDashboards/index.ts:165) → Consistent with the checkAlerts provider pattern, and fine here since each task runs as a single-shot process. No issue — just confirming it's intentional.

  • ✅ Task runner, shell entry point, Zod validation, partial unique index, and test coverage all look correct and follow project conventions.

@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch 17 times, most recently from 4c24c45 to abdafb3 Compare March 22, 2026 23:40
ZeynelKoca added a commit to ZeynelKoca/ClickStack-helm-charts that referenced this pull request Mar 22, 2026
k8s-sidecar watches ConfigMaps labeled "hyperdx.io/dashboard: true"
across all namespaces and writes dashboard JSON to a shared volume.
HyperDX reads and upserts them natively via file-based provisioner.

Requires hyperdxio/hyperdx#1962
@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch 5 times, most recently from 3e9be8a to ffe6158 Compare March 23, 2026 12:43
@ZeynelKoca ZeynelKoca changed the title Add file-based dashboard provisioner featu: Add file-based dashboard provisioner Mar 23, 2026
@ZeynelKoca ZeynelKoca changed the title featu: Add file-based dashboard provisioner feat: Add file-based dashboard provisioner Mar 23, 2026
@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch from ffe6158 to 10ae4d2 Compare March 24, 2026 08:21
Contributor

@dhable dhable left a comment


Thanks for the well-thought-out contribution. Overall this looks good, but we don't typically run long-running processes inside the API process; we put background processing like this in the task system.

This approach allows flexible deployment: in our full-stack image, we start the tasks as separate processes and have a cron-job library manage execution scheduling. For more advanced deployments, these scheduled tasks can run outside the main process, e.g. via a Kubernetes CronJob or another scheduling system.

I think this implementation could easily be adapted to that design, since the startDashboardProvisioner() function is almost a direct fit for the task system. The check-alerts task also needs to access Mongo, so you should be able to connect the same way.

@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch 4 times, most recently from 32daf0f to 65f5509 Compare March 29, 2026 18:01
@ZeynelKoca
Contributor Author

> **PR Review**
>
> The implementation is well-structured and follows existing task patterns closely. A few items worth addressing:
>
> * ⚠️ **`asyncDispose()` closes the mongoose connection after every cron tick** → In built-in scheduler mode (`RUN_SCHEDULED_TASKS_EXTERNALLY=false`), `asyncDispose()` is called in the `finally` block after each tick, closing the connection, then `connectDB()` re-opens it on the next tick. Verify `connectDB()` handles reconnection after explicit close (pre-existing pattern for other tasks, but worth confirming it works here).
>
> * ⚠️ **No API-level guard for provisioned dashboards** → Users can freely edit or delete provisioned dashboards via the API. The provisioner will overwrite their edits on the next sync. Consider either blocking edits to `provisioned: true` dashboards in the update/delete API, or at minimum surfacing the `provisioned` flag in the UI so users understand the behavior.
>
> * ⚠️ **Partial unique index added to an existing collection** → The new `{ name, team }` unique partial index on documents with `provisioned: true` will be built against existing data on first deployment. Since no existing documents have `provisioned: true`, this is safe, but worth verifying the index creation is non-blocking for large deployments (consider `{ background: true }` if the schema DSL supports it).
>
> * ✅ Shell script changes look safe — `DASHBOARD_PROVISIONER_DIR` is only used in a conditional check; no injection risk.
>
> * ✅ `asyncDispose()` matches the `HdxTask` interface contract correctly.
>
> * ✅ Test coverage is comprehensive and uses real DB (consistent with project pattern of no DB mocks).
Responses:

1. asyncDispose reconnection: tested and works as intended. It's the same pattern as check-alerts, which also closes and reconnects each tick via its provider.
2. API guard: already mentioned in the PR description. I recommend a follow-up PR so this one doesn't balloon with extra code changes (it also involves frontend work).
3. Index creation: as mentioned, no existing documents have `provisioned: true`, so the index builds instantly.
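For illustration, the partial-index semantics discussed above can be simulated in a few lines (a sketch, not the real schema code; MongoDB enforces this server-side via a `partialFilterExpression` on the unique index):

```typescript
type Doc = { name: string; team: string; provisioned: boolean };

// Simulates a unique index on { name: 1, team: 1 } with
// partialFilterExpression { provisioned: true }: uniqueness is only
// enforced among documents the partial filter matches, so a user-created
// dashboard can share a name with a provisioned one.
export function insertWithPartialUniqueIndex(coll: Doc[], doc: Doc): void {
  const conflict =
    doc.provisioned &&
    coll.some(d => d.provisioned && d.name === doc.name && d.team === doc.team);
  if (conflict) throw new Error("E11000 duplicate key error (simulated)");
  coll.push(doc);
}
```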

ZeynelKoca added a commit to ZeynelKoca/ClickStack-helm-charts that referenced this pull request Mar 29, 2026
@ZeynelKoca
Contributor Author

> Thanks for the well-thought-out contribution. Overall this looks good, but we don't typically run long-running processes inside the API process; we put background processing like this in the task system.
>
> This approach allows flexible deployment: in our full-stack image, we start the tasks as separate processes and have a cron-job library manage execution scheduling. For more advanced deployments, these scheduled tasks can run outside the main process, e.g. via a Kubernetes CronJob or another scheduling system.
>
> I think this implementation could easily be adapted to that design, since the startDashboardProvisioner() function is almost a direct fit for the task system. The check-alerts task also needs to access Mongo, so you should be able to connect the same way.

Reimplemented with the existing concurrent task system. The relevant ClickStack helm PR is also updated.

@ZeynelKoca ZeynelKoca requested a review from dhable March 29, 2026 18:25
@dhable
Contributor

dhable commented Apr 2, 2026

@claude summarize why the integration test action is failing

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Claude Code is working…

I'll analyze this and get back to you.


@dhable
Contributor

dhable commented Apr 2, 2026

Hey! I took a look at the integration test failure and tracked down the issue.

The "provisions to multiple teams" test in provisionDashboards.test.ts (line 149-151) calls createTeam() twice in the same test case — once for "Team A" and once for "Team B". The problem is that createTeam() in controllers/team.ts calls isTeamExisting(), which checks countDocuments({}) > 0 and throws "Team already exists" if any team is already in the database. So the second createTeam call always fails because Team A already exists.

Every other test in the codebase only calls createTeam() once per test case. A quick fix would be to use the Team model directly for the second team, e.g.:

```ts
// You'll also need: import Team from '@/models/team';
const teamB = await new Team({ name: 'Team B' }).save();
```

The knip check failure is unrelated — it's a 403 permissions error when the workflow tries to post its comment to the PR.
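The failure mode described above reduces to a simplified sketch (hypothetical shapes; the real `createTeam()` lives in `controllers/team.ts` and talks to MongoDB):

```typescript
type Team = { name: string };

// In-memory stand-in for the Team collection.
const teams: Team[] = [];

// Simplified shape of the described createTeam(): isTeamExisting() checks
// countDocuments({}) > 0, i.e. whether *any* team exists, not whether a
// team with this particular name exists.
async function isTeamExisting(): Promise<boolean> {
  return teams.length > 0;
}

export async function createTeam(name: string): Promise<Team> {
  if (await isTeamExisting()) throw new Error("Team already exists");
  const team = { name };
  teams.push(team);
  return team;
}
```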

ZeynelKoca added a commit to ZeynelKoca/ClickStack-helm-charts that referenced this pull request Apr 2, 2026
@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch from fcb0ecc to f914080 Compare April 2, 2026 18:06
@ZeynelKoca
Contributor Author

> Hey! I took a look at the integration test failure and tracked down the issue.
>
> The "provisions to multiple teams" test in provisionDashboards.test.ts (line 149-151) calls createTeam() twice in the same test case — once for "Team A" and once for "Team B". The problem is that createTeam() in controllers/team.ts calls isTeamExisting(), which checks countDocuments({}) > 0 and throws "Team already exists" if any team is already in the database. So the second createTeam call always fails because Team A already exists.
>
> Every other test in the codebase only calls createTeam() once per test case. A quick fix would be to use the Team model directly for the second team, e.g.:
>
> ```ts
> const teamB = await new Team({ name: 'Team B' }).save();
> ```
>
> (You'll need to add `import Team from '@/models/team'` as well.)
>
> The knip check failure is unrelated — it's a 403 permissions error when the workflow tries to post its comment to the PR.

Thanks for taking a look; I implemented your suggestion for the failing test case.

ZeynelKoca added a commit to ZeynelKoca/ClickStack-helm-charts that referenced this pull request Apr 3, 2026
@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch from f914080 to 6023b15 Compare April 3, 2026 22:14
@github-actions github-actions bot added the review/tier-4 Critical — deep review + domain expert sign-off label Apr 6, 2026
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (5):
    • packages/api/src/tasks/__tests__/types.test.ts
    • packages/api/src/tasks/index.ts
    • packages/api/src/tasks/provisionDashboards/__tests__/provisionDashboards.test.ts
    • packages/api/src/tasks/provisionDashboards/index.ts
    • packages/api/src/tasks/types.ts

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Files changed: 9
  • Lines changed: 263 (+ 351 in test files, excluded from tier calculation)
  • Branch: feature/dashboard-provisioner
  • Author: ZeynelKoca

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

ZeynelKoca added a commit to ZeynelKoca/ClickStack-helm-charts that referenced this pull request Apr 7, 2026

Labels

review/tier-4 Critical — deep review + domain expert sign-off waiting-on-engineering
