Integrations: Telegram workstation journey, onboarding, and admin panel #181

Open
scion-gteam[bot] wants to merge 88 commits into main from scion/integrationsadmin

Conversation

@scion-gteam scion-gteam Bot commented Jun 8, 2026

Summary

First increment of the integrations architecture (design: .design/integrations-admin.md, #115). Delivers the complete Workstation Telegram journey end-to-end, the onboarding flow it builds on (workstation-improvements work, delivered here), an integrations admin panel, and supporting runtime/project fixes.

Highlights

  • Hot-start transport: the Telegram broker is configured and started at runtime with no hub restart (WireBrokerPlugin, with rollback on failure); FanOutEventBus gains AddSpoke/RemoveSpoke, which close replaced or removed spokes; per-plugin env is set via EnsureTelegramEnv (single source of truth for SCION_TELEGRAM_V2).
  • Onboarding: optional Telegram step (getMe validation, polling, settings.yaml); durable wizard resume (re-inits resumed step); image-pull hang fix (SSE pre-check); default image_registry accept-and-continue.
  • Telegram v2: polling starts on Configure when credentialed (idempotent); /register mode-aware reply (plain-text code on workstation; no invalid button URLs).
  • Admin: /admin/integrations multi-integration shell + Telegram panel (status + restart-safe enable/disable, config preserved).
  • Runtime: macOS auto-detection now probes podman (DetectLocalRuntime is the memoized single source of truth), fixing the container-only default (#177: Broker heartbeat logs spurious 'container executable not found' errors on podman-only hosts).
  • Local-dir projects: /workspace mounts the project dir (not its parent); folder picker auto-selects new folders.

Testing

  • go build ./..., unit tests (eventbus/plugin/config/hub/runtime/telegram), tsc — green.
  • Live end-to-end on macOS workstation: onboarding -> Telegram bot config -> agent conversation.
  • Two-pass code review (backend + frontend); all findings addressed.

Notes

  • Includes the workstation-improvements onboarding commits (delivered here, not landing separately).
  • Workstation-focused; Cloud Solo / Single VM / HA modes deferred per the design.

Scion added 30 commits June 9, 2026 14:01
Add workstation-only system endpoints gated by requireWorkstation:

- GET /system/check: GatherDiagnostics returns structured results (git,
  runtime, config checks) with a ready flag
- GET /system/runtime: detect available runtime, return configured profile
- PUT /system/runtime: validate and persist runtime choice
- GET /system/status: ComputeOnboardingStatus for first-run wizard
- POST /system/init: call InitMachine with user-selected harnesses
- PUT /system/identity: already wired in Phase 1

All endpoints require loopback origin and return JSON.
Adds comprehensive tests for the workstation onboarding endpoints and
security primitives introduced in phases 0-5:

- requireWorkstation middleware: 404 when disabled, pass-through when enabled
- assertLoopback: table-driven IPv4/IPv6 loopback validation
- ClassifyPath: managed path detection, legacy groves path, AlreadyLinked via store
- POST /system/init: valid harnesses, unknown harness rejection, empty list rejection
- PUT /system/identity: writes and echoes display name and email
- POST /system/fs/validate-path: managed-path overlap error, normal path classification
- GET /system/fs/list: home directory listing, hidden file filtering, outside-home rejection

Also adds missing server.auth.display_name, server.auth.email, and
server.auth.username key handling in UpdateVersionedSetting, which the
identity endpoint depends on.
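
The gating described above (404 when workstation mode is disabled, loopback-only otherwise) can be sketched roughly as follows. This is a minimal illustration, not the hub's code: `isLoopbackAddr` and the shape of `requireWorkstation` here are hypothetical, and the 403 for non-loopback callers is an assumption, since the commit does not state the status code.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// isLoopbackAddr reports whether a "host:port" remote address is a loopback IP.
// Hostnames such as "localhost" are deliberately rejected: only literal IPs count.
func isLoopbackAddr(remoteAddr string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present; treat the whole string as the host
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// requireWorkstation hides the wrapped handler behind a 404 when workstation
// mode is off, and rejects non-loopback callers (status code is an assumption).
func requireWorkstation(enabled bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !enabled {
			http.NotFound(w, r)
			return
		}
		if !isLoopbackAddr(r.RemoteAddr) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	_ = requireWorkstation(true, http.NotFoundHandler())
	for _, addr := range []string{"127.0.0.1:8080", "[::1]:8080", "10.0.0.5:8080"} {
		fmt.Printf("%-16s loopback=%v\n", addr, isLoopbackAddr(addr))
	}
}
```
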
Replace the "not yet able to provide pre-built binaries" note with a
Homebrew quick start path. Lead with `brew install scion` + `scion
server start` which opens the onboarding wizard, then keep the existing
go install path as "Install from Source". Add a tip noting that the
wizard handles machine init automatically.
ptone added 29 commits June 9, 2026 14:01
Fix 2: Reorder setup handler to hot-start the plugin BEFORE persisting
to settings.yaml. If hot-start fails, return a clear error (Success:false)
and leave no half-applied state. Config is only written after the plugin
is confirmed running.
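
A minimal sketch of the ordering in Fix 2, assuming the two phases can be modelled as plain callbacks. `setupTelegram`, `hotStart`, `persist`, and `errStartFailed` are hypothetical names for illustration only:

```go
package main

import (
	"errors"
	"fmt"
)

// errStartFailed is a hypothetical error used to simulate a failed hot-start.
var errStartFailed = errors.New("plugin did not start")

// setupTelegram runs the hot-start first and persists configuration only
// after the plugin is confirmed running, so a failure leaves nothing written.
func setupTelegram(hotStart, persist func() error) error {
	if err := hotStart(); err != nil {
		return fmt.Errorf("hot-start failed, nothing persisted: %w", err)
	}
	return persist()
}

func main() {
	persisted := false
	err := setupTelegram(
		func() error { return errStartFailed },
		func() error { persisted = true; return nil },
	)
	fmt.Println("error:", err)
	fmt.Println("persisted:", persisted)
}
```
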

Fix 3: Add wireBrokerMu to serialize the get-or-create proxy path in
WireBrokerPlugin. Prevents concurrent setups from both seeing proxy==nil
and creating separate FanOutBrokers (the second StartMessageBroker is
a no-op, losing that spoke).
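
The race in Fix 3 is a classic unguarded get-or-create. The sketch below shows the serialized version under simplified assumptions: `fanOutBroker`, `server`, and `wireSpoke` are hypothetical stand-ins, and only the mutex name `wireBrokerMu` comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOutBroker is a hypothetical stand-in for the real fan-out proxy.
type fanOutBroker struct {
	spokes map[string]struct{}
}

type server struct {
	wireBrokerMu sync.Mutex // serializes the get-or-create path
	proxy        *fanOutBroker
	created      int // number of proxies ever created (for illustration)
}

// wireSpoke creates the proxy at most once and registers the spoke on it.
func (s *server) wireSpoke(name string) *fanOutBroker {
	s.wireBrokerMu.Lock()
	defer s.wireBrokerMu.Unlock()
	if s.proxy == nil {
		s.proxy = &fanOutBroker{spokes: map[string]struct{}{}}
		s.created++
	}
	s.proxy.spokes[name] = struct{}{}
	return s.proxy
}

// concurrentSetups runs n setups in parallel and reports how many proxies
// were created and how many spokes ended up registered.
func concurrentSetups(n int) (created, spokes int) {
	s := &server{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.wireSpoke(fmt.Sprintf("spoke-%d", i))
		}(i)
	}
	wg.Wait()
	return s.created, len(s.proxy.spokes)
}

func main() {
	created, spokes := concurrentSetups(16)
	fmt.Println("proxies created:", created, "spokes registered:", spokes)
}
```
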
The telegram plugin (broker_v2.go) only accepts "poll" or "webhook"
for inbound_mode — anything else makes Configure() return an error,
silently preventing the bot from starting getUpdates polling.

Our setup code was writing "polling" in both the hot-start entry
and settings.yaml persistence, causing the bot to never respond.

Add regression test asserting the value matches the plugin's
accepted set.
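
For illustration, a check of the kind the regression test guards could look like this. `validateInboundMode` is a hypothetical helper, not the plugin's actual function; only the accepted values "poll" and "webhook" come from the commit message.

```go
package main

import "fmt"

// validateInboundMode accepts only the two values the commit message says the
// plugin understands; anything else (including "polling") is an error.
func validateInboundMode(mode string) error {
	switch mode {
	case "poll", "webhook":
		return nil
	default:
		return fmt.Errorf("invalid inbound_mode %q: want \"poll\" or \"webhook\"", mode)
	}
}

func main() {
	for _, m := range []string{"poll", "webhook", "polling"} {
		fmt.Printf("%-8s -> %v\n", m, validateInboundMode(m))
	}
}
```
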
The telegram plugin runs v1 by default (silently ignores inbound_mode,
no /setup, no group links). The hub requires v2 behavior. Without
SCION_TELEGRAM_V2=1 in the subprocess env, the bot never polls.

Add per-plugin Env support:
- PluginEntry.Env (config.go): extra env vars for plugin subprocess
- DiscoveredPlugin.Env (discovery.go): propagated through discovery
- loadPlugin (manager.go): sets cmd.Env when Env is non-empty
- V1PluginEntryLike.Env (settings.go): keeps adapter in sync

Set SCION_TELEGRAM_V2=1 for the telegram plugin in both paths:
- Hot-start: system_telegram.go PluginEntry.Env
- Startup: server_foreground.go initPluginManager conversion loop

Add tests verifying the env var is set and propagated.
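
A rough sketch of the "set cmd.Env when Env is non-empty" step. The function name `pluginCommand`, the map-typed env, and the binary path are assumptions for illustration. The sketch relies on two documented `os/exec` behaviours: a nil `Cmd.Env` inherits the parent environment, and when keys are duplicated the last value wins.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
)

// pluginCommand builds the subprocess command for a plugin binary. Extra
// variables are appended after the parent environment so they take
// precedence over any inherited value with the same key.
func pluginCommand(path string, extraEnv map[string]string) *exec.Cmd {
	cmd := exec.Command(path)
	if len(extraEnv) == 0 {
		return cmd // nil Env: the child inherits the parent environment
	}
	keys := make([]string, 0, len(extraEnv))
	for k := range extraEnv {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for tests and logs
	cmd.Env = os.Environ()
	for _, k := range keys {
		cmd.Env = append(cmd.Env, k+"="+extraEnv[k])
	}
	return cmd
}

func main() {
	// "./scion-plugin-telegram" is a hypothetical path; the command is not run.
	cmd := pluginCommand("./scion-plugin-telegram", map[string]string{"SCION_TELEGRAM_V2": "1"})
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```
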
Polling was gated on Subscribe() being called — but on hot-start, the
telegram spoke's Subscribe() is never reached (existing subscriptions
were snapshotted before the spoke was added). Even on fresh install
with no running agents, bootstrapExistingProjects finds nothing to
subscribe.

Fix: start polling idempotently at the end of Configure() when
inbound_mode==poll AND hub credentials (hub_url + hmac_key) are
present. Configure() is called twice: once at load (no creds), once
via ConfigureBroker (with creds). startPolling() is already idempotent
(no-op if pollCancel != nil or webhook mode).

This is correct because inbound messages are delivered via HTTP POST
to /api/v1/broker/inbound, independent of Subscribe() handlers.
Subscribe() was an incidental gate — the broker should poll whenever
it's configured and credentialed.

Add tests:
- TestV2_Configure_PollStartsWithHubCreds: no polling without creds,
  polling starts after second Configure with creds
- TestV2_Configure_PollIdempotent: re-configure doesn't restart polling
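
The two-phase Configure behaviour can be sketched as below. This is a simplified model, not broker_v2.go: the `broker` struct, its fields, and the `starts` counter are hypothetical, and the polling goroutine is a placeholder for the real getUpdates loop.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type broker struct {
	mu          sync.Mutex
	inboundMode string
	hubURL      string
	hmacKey     string
	pollCancel  context.CancelFunc
	starts      int // how many times polling actually started (illustration only)
}

// Configure merges the given settings and starts polling once the broker is
// in poll mode and has hub credentials. It is safe to call repeatedly.
func (b *broker) Configure(cfg map[string]string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if v := cfg["inbound_mode"]; v != "" {
		b.inboundMode = v
	}
	if v := cfg["hub_url"]; v != "" {
		b.hubURL = v
	}
	if v := cfg["hmac_key"]; v != "" {
		b.hmacKey = v
	}
	if b.inboundMode == "poll" && b.hubURL != "" && b.hmacKey != "" {
		b.startPolling()
	}
}

// startPolling is a no-op if polling is already running or the mode is not
// "poll". The caller must hold b.mu.
func (b *broker) startPolling() {
	if b.pollCancel != nil || b.inboundMode != "poll" {
		return
	}
	ctx, cancel := context.WithCancel(context.Background())
	b.pollCancel = cancel
	b.starts++
	go func() { <-ctx.Done() }() // placeholder for the getUpdates loop
}

// Stop cancels polling if it is running.
func (b *broker) Stop() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.pollCancel != nil {
		b.pollCancel()
		b.pollCancel = nil
	}
}

func main() {
	b := &broker{}
	b.Configure(map[string]string{"inbound_mode": "poll"}) // load: no creds yet
	b.Configure(map[string]string{"hub_url": "http://127.0.0.1:8080", "hmac_key": "k"})
	b.Configure(map[string]string{"hub_url": "http://127.0.0.1:8080", "hmac_key": "k"})
	fmt.Println("polling starts:", b.starts)
	b.Stop()
}
```
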
Telegram rejects http:// and localhost URLs in inline keyboard buttons
(BUTTON_URL_INVALID), causing /register to silently fail on workstation.

Fix: check if hub URL is a valid public HTTPS URL. If not (localhost,
http), send a plain-text message with the linking code and local URL
instead of an inline button. If even the keyboard send fails on a
public URL, fall back to plain text so the user is never left in
silence.

Also: use a fresh 10s context for the Telegram send (was reusing the
15s ctx already partly consumed by the hub POST), and drop the Markdown
parse mode to avoid parse edge cases.

Add isPublicHTTPS helper with tests covering https/http/localhost/
loopback. Add TestHandleRegister_LocalhostUsesPlainText verifying the
plain-text code path with no inline keyboard.
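
One plausible shape for the `isPublicHTTPS` helper, written from the cases the commit lists (https, http, localhost, loopback). The actual implementation may check more or differently; treat this as a sketch under those assumptions.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// isPublicHTTPS reports whether raw is an https URL whose host is neither
// "localhost" nor a loopback/unspecified IP literal. It does not resolve DNS.
func isPublicHTTPS(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme != "https" {
		return false
	}
	host := u.Hostname()
	if host == "" || strings.EqualFold(host, "localhost") {
		return false
	}
	if ip := net.ParseIP(host); ip != nil && (ip.IsLoopback() || ip.IsUnspecified()) {
		return false
	}
	return true
}

func main() {
	for _, raw := range []string{
		"https://hub.example.com",
		"http://hub.example.com",
		"https://localhost:8080",
		"https://127.0.0.1:8080",
		"https://[::1]:8443",
	} {
		fmt.Printf("%-28s public=%v\n", raw, isPublicHTTPS(raw))
	}
}
```

When the check fails, the handler described above would send the linking code as plain text rather than attaching an inline keyboard button.
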
The wizard resume logic only advanced past steps 0-3 using server-side
status flags (identitySet, runtimeOK, harnessesSeeded). Steps 4-6
(Images, Workspace, Telegram) had no resume path, and the
onboardingStarted flag used sessionStorage which is lost on browser
close.

Now currentStep is saved to localStorage on every change and restored
on load (capped by the saved value and gated by onboardingStarted).
Both onboardingStarted and onboardingStep are cleared when the wizard
completes.
Two issues caused the onboarding image step to hang:

1. Race condition: the pull handler started a goroutine that published
   SSE events immediately, but the frontend only opened its EventSource
   after receiving the HTTP response. When images already existed, all
   events fired before the subscriber connected and were lost — the
   frontend waited forever. Fix: pre-check image existence synchronously
   in the handler, return results inline (initialResults/needPull), and
   only start the SSE goroutine for images that actually need pulling.

2. PullImage on all runtimes (podman, docker, apple container) used
   runInteractiveCommand which attaches stdin/stdout/stderr — wrong
   when called from a headless server goroutine. Changed to
   runSimpleCommand which captures output properly.
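
The synchronous pre-check in item 1 amounts to partitioning the requested images before any goroutine starts. A minimal sketch, assuming an injected `exists` probe; `imageResult`, its `Status` values, and `precheckImages` are hypothetical, while the `initialResults`/`needPull` names follow the commit message:

```go
package main

import "fmt"

// imageResult is a hypothetical inline result returned in the HTTP response.
type imageResult struct {
	Image  string
	Status string
}

// precheckImages splits images into those already present (reported inline)
// and those that still need pulling (reported later over SSE).
func precheckImages(images []string, exists func(string) bool) (initialResults []imageResult, needPull []string) {
	for _, img := range images {
		if exists(img) {
			initialResults = append(initialResults, imageResult{Image: img, Status: "present"})
			continue
		}
		needPull = append(needPull, img)
	}
	return initialResults, needPull
}

func main() {
	local := map[string]bool{"scion/base": true}
	initial, pull := precheckImages(
		[]string{"scion/base", "scion/claude"},
		func(img string) bool { return local[img] },
	)
	fmt.Println("inline results:", initial)
	fmt.Println("need pull:", pull)
}
```

Because images that already exist never produce SSE events in this arrangement, a late EventSource subscriber cannot miss them.
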
The macOS auto-detection path in GetRuntime only checked for the Apple
'container' CLI and fell back directly to docker, skipping podman
entirely. This caused the runtime broker to select 'container' (or
docker) even when podman was the only installed/configured runtime on
macOS — breaking agent create/dispatch/list and heartbeat (#177).

Fix: check container → podman → docker on macOS, mirroring the Linux
path which already checks podman before docker. Each candidate is
verified via exec.LookPath so only an actually-present binary is
selected.

Add a test for settings-based podman resolution.
GetDefaultSettingsData and GetDefaultSettingsDataYAML hardcoded
"container" as the default runtime on macOS regardless of whether the
binary existed. These functions produce the BASE layer of the settings
merge chain (loaded before global/project settings), so even with
correct user settings the embedded "profiles.local.runtime: container"
would be used whenever profile resolution fell back to the "local"
profile.

Fix: call DetectLocalRuntime() (which probes podman → container →
docker by actual binary availability) instead of a bare OS check. The
OS-only fallback is retained as a last resort if no runtime is found.

This is the second part of the runtime-selection regression fix — the
first commit fixed the auto-detection in factory.go, but the embedded
defaults were a separate path that also hardcoded "container" on macOS.
Reorder macOS auto-detection in GetRuntime to match the priority used
by DetectLocalRuntime (podman → container → docker). This ensures
consistent behavior across both the settings defaults layer and the
factory auto-detection fallback.
…e of truth

Replace the inline LookPath-based auto-detection in factory.go with a
call to config.DetectLocalRuntime(), which is the authoritative runtime
probe (podman → container → docker, with --version functional checks).
This eliminates the duplicated detection logic and ensures consistent
behavior across all runtime selection paths.

Update tests to use expectedDefaultRuntime() helper instead of
hardcoded OS-based expectations, matching the probing behavior.
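
The probe order can be sketched as below. This is not `config.DetectLocalRuntime`: it only looks binaries up (the commit says the real probe also runs `--version` functional checks), and `detectRuntime`, `probeOrder`, and `fakeLookPath` are hypothetical names. The "docker" fallback mirrors what the PR describes for factory.go.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// probeOrder is the priority described in the commit messages.
var probeOrder = []string{"podman", "container", "docker"}

// detectRuntime returns the first runtime whose binary can be found, falling
// back to "docker" when none is present.
func detectRuntime(lookPath func(string) (string, error)) string {
	for _, name := range probeOrder {
		if _, err := lookPath(name); err == nil {
			return name
		}
	}
	return "docker"
}

// fakeLookPath simulates a host where only the named binaries are installed.
func fakeLookPath(present ...string) func(string) (string, error) {
	return func(name string) (string, error) {
		for _, p := range present {
			if p == name {
				return "/usr/local/bin/" + name, nil
			}
		}
		return "", errors.New(name + ": not found")
	}
}

func main() {
	fmt.Println("this host:", detectRuntime(exec.LookPath))
	fmt.Println("podman+docker host:", detectRuntime(fakeLookPath("docker", "podman")))
}
```

Injecting the lookup function is one way to let tests avoid OS-based expectations, similar in spirit to the `expectedDefaultRuntime()` helper mentioned above.
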
…nue UX

When no image_registry is configured, the Images step now shows an
editable input pre-filled with 'ghcr.io/homebrew-scion' and an
"Accept & Continue" button. On accept, it persists the value via a new
PUT /api/v1/system/image-registry endpoint (loopback-only), then
proceeds to the normal image pull flow.

This replaces the previous error-only block that required the user to
run a CLI command and restart the server.
Add FanOutEventBus.RemoveSpoke(name) to remove a spoke from the
fan-out at runtime, mirroring AddSpoke. Idempotent (no-op if the
name doesn't exist).

Add plugin.Manager.StopPlugin(type, name) to kill a single plugin
subprocess and remove it from the manager's maps. Idempotent (no-op
if already stopped). Config and external state (telegram_v2.db) are
preserved for re-enable.

These primitives enable the integration admin disable flow:
StopPlugin → RemoveSpoke → persist enabled:false.
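
A simplified model of the spoke primitives, assuming each spoke exposes a `Close` method. `FanOut`, `Bus`, and `countingBus` are hypothetical types; closing the replaced or removed spoke reflects the later review fix in this PR rather than this commit alone.

```go
package main

import (
	"fmt"
	"sync"
)

// Bus is a hypothetical minimal interface for a spoke.
type Bus interface{ Close() error }

type FanOut struct {
	mu     sync.Mutex
	spokes map[string]Bus
}

func NewFanOut() *FanOut { return &FanOut{spokes: map[string]Bus{}} }

// AddSpoke registers bus under name and closes any spoke it replaces.
func (f *FanOut) AddSpoke(name string, bus Bus) {
	f.mu.Lock()
	old := f.spokes[name]
	f.spokes[name] = bus
	f.mu.Unlock()
	if old != nil {
		_ = old.Close()
	}
}

// RemoveSpoke removes and closes the named spoke; it is a no-op if absent.
func (f *FanOut) RemoveSpoke(name string) {
	f.mu.Lock()
	old, ok := f.spokes[name]
	delete(f.spokes, name)
	f.mu.Unlock()
	if ok && old != nil {
		_ = old.Close()
	}
}

// countingBus records how many times Close was called.
type countingBus struct{ closed int }

func (c *countingBus) Close() error { c.closed++; return nil }

func main() {
	f := NewFanOut()
	first, second := &countingBus{}, &countingBus{}
	f.AddSpoke("telegram", first)
	f.AddSpoke("telegram", second) // replaces and closes first
	f.RemoveSpoke("telegram")      // closes second
	f.RemoveSpoke("telegram")      // no-op
	fmt.Println(first.closed, second.closed)
}
```
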
Add multi-integration admin API:
- GET /api/v1/admin/integrations — list all integrations with status
  (merges settings.yaml config + plugin HealthCheck + runtime state)
- POST /api/v1/admin/integrations/telegram/enable — hot-start via
  WireBrokerPlugin, persists telegram to message_broker.types
- POST /api/v1/admin/integrations/telegram/disable — StopPlugin +
  RemoveSpoke + removes from message_broker.types (preserves config)

Disable persistence survives restarts: startup's initPluginManager
loads the binary, but the broker wiring loop only wires types listed
in message_broker.types. Removing "telegram" from types means it
won't be wired or credentialed on restart.

Enable/disable are both idempotent. Auth: requireWorkstation (loopback)
for now, with comment noting future switch to requireAdmin.
Add /admin/integrations page with a multi-integration card layout:
- Shell renders a list of IntegrationStatus cards, each showing type
  icon, status badge, details grid, and an enable/disable toggle
- Telegram is the first integration panel (status + toggle)
- INTEGRATION_META registry makes adding future integrations
  (Google Chat, GitHub App) a single entry addition
- Cards are data-driven from GET /api/v1/admin/integrations

UI wiring:
- Route: /admin/integrations in main.ts ROUTES + ADMIN_ROUTES
- Nav: "Integrations" item in admin sidebar (nav.ts)
- Page title: "Integrations" in app-shell.ts PAGE_TITLES

The shell is deliberately generic — no telegram-specific logic in
the card renderer. Per-integration detail pages (groups, users) can
be added later as /admin/integrations/{type} routes.
…for local-dir projects

When a local-directory project's .scion marker resolves to an external
config path (~/.scion/project-configs/<slug>__<uuid>/.scion), the
fallback workspace source was computed as filepath.Dir(projectDir),
which gave the external config parent instead of the user's actual
project directory. Use the original projectPath input to derive the
correct workspace mount source.
GROUP 1 — Resume step init: updated() now triggers step loaders when
currentStep changes (loadSystemCheck for step 1, loadRuntime for step 2,
loadImagesStep for step 4). selectedHarnesses persisted to localStorage
in handleHarnessesNext and restored when entering step 4.

GROUP 2 — Stale error banner: this.error cleared on every step
transition in updated().

GROUP 3 — Render side-effects: storage cleanup moved from renderDone()
into updated() when currentStep===7. render() is now pure.

GROUP 4 — Small cleanups: SSE onerror sets user-facing error message;
removed unused runtimeAvailable state and its fetch; progress bar uses
(currentStep+1)/TOTAL_STEPS to reach 100% on the last visible step.
The image pull goroutine used s.ctx (server lifetime) with no per-image
timeout, so a hung PullImage would leak the goroutine forever. Now each
pull gets a 5-minute context.WithTimeout derived from s.ctx.
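
The per-pull deadline can be sketched as follows. `pullWithTimeout` and `hungPullTimesOut` are hypothetical helpers; the real code derives a 5-minute context from `s.ctx`, while the demo uses a short timeout so it finishes quickly. The deadline only helps if the pull itself honours the context (for example by running the CLI through `exec.CommandContext`), which is an assumption here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pullWithTimeout runs pull with a deadline derived from parent, so a hung
// pull is cancelled instead of living as long as the server.
func pullWithTimeout(parent context.Context, timeout time.Duration, pull func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return pull(ctx)
}

// hungPullTimesOut simulates a pull that never finishes on its own and
// reports whether the deadline ended it.
func hungPullTimesOut() bool {
	hung := func(ctx context.Context) error {
		<-ctx.Done() // blocks until the deadline fires
		return ctx.Err()
	}
	err := pullWithTimeout(context.Background(), 20*time.Millisecond, hung)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("hung pull cancelled by deadline:", hungPullTimesOut())
}
```
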
Adds a test that simulates the web-created local-dir project flow where
the .scion marker resolves to an external config dir without
workspace_path in settings. Verifies the /workspace volume mounts to
the project directory, not the config directory parent.
1. Cache DetectLocalRuntime result with sync.Once so repeated calls
   from GetDefaultSettingsData/YAML don't re-exec binaries each time.
   OverrideRuntimeDetection and mock helpers reset the cache so tests
   still get fresh probes per test case.

2. Align fallback when DetectLocalRuntime finds no runtime: both
   init.go and koanf.go now fall back to "docker" (matching
   factory.go), instead of diverging with "container" on macOS.
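
Memoizing with `sync.Once` while still letting tests reset the cache might look like the sketch below. The `detector` type and its methods are hypothetical; the real code uses package-level helpers such as OverrideRuntimeDetection. Resetting by assigning a fresh `sync.Once{}` is not safe concurrently with `Get`, so in this sketch it is intended for test setup only.

```go
package main

import (
	"fmt"
	"sync"
)

// detector caches the result of an expensive probe.
type detector struct {
	once   sync.Once
	cached string
	probe  func() string
}

// Get runs the probe at most once per reset and returns the cached value.
func (d *detector) Get() string {
	d.once.Do(func() { d.cached = d.probe() })
	return d.cached
}

// Override swaps the probe and resets the cache so the next Get re-probes.
// Not safe to call concurrently with Get; meant for test setup.
func (d *detector) Override(probe func() string) {
	d.probe = probe
	d.once = sync.Once{}
}

func main() {
	calls := 0
	d := &detector{probe: func() string { calls++; return "podman" }}
	d.Get()
	d.Get()
	fmt.Println("probe calls after two Gets:", calls)
	d.Override(func() string { calls++; return "docker" })
	fmt.Println("after override:", d.Get(), "probe calls:", calls)
}
```
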
After creating a new folder in the dir-browser component, set
selectedPath and emit path-selected so the parent form immediately
reflects the new folder without requiring an extra click.
MUST-FIX:
- WireBrokerPlugin: rollback (StopPlugin) if GetBroker fails after
  LoadOne, preventing orphaned plugin processes.
- broker_v2.go Configure(): make idempotent on re-call. Second call
  (with hub creds) skips store/api/sendQueue/getMe/inboundMode init,
  only updates hub client + component handlers + project slug map.
  Prevents leaking old SQLite connections and duplicate resources.
- fanout.go: AddSpoke closes the replaced spoke's Bus; RemoveSpoke
  closes the removed spoke's Bus. Prevents resource leaks on
  re-setup and disable.
- Unify SCION_TELEGRAM_V2: add Env to V1PluginEntry (settings_v1.go),
  persist it in PersistTelegramConfig, read it in cold-start conversion
  (server_foreground.go) and admin enable handler. Removes the
  hardcoded name=="telegram" check — single source of truth.
- system_telegram.go: return generic error to client on hot-start
  failure, log detailed error server-side.

SHOULD-FIX:
- CheckObserver: use strings.EqualFold for case-insensitive match;
  add warning log when GetInfo fails.
- ValidateTelegramToken: url.PathEscape the token in getMe URL.
- startPolling: add "caller must hold b.mu" comment.
- system_telegram.go: add http.MaxBytesReader on both endpoints.
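
As an illustration of the token-escaping item above, the getMe URL can be built with `url.PathEscape` so that a malformed token cannot alter the request path. `getMeURL` is a hypothetical helper; the `https://api.telegram.org/bot<token>/getMe` layout is the public Bot API convention.

```go
package main

import (
	"fmt"
	"net/url"
)

// getMeURL builds the Bot API getMe URL with the token escaped as a single
// path segment, so characters like "/" or "?" stay inside that segment.
func getMeURL(token string) string {
	return "https://api.telegram.org/bot" + url.PathEscape(token) + "/getMe"
}

func main() {
	fmt.Println(getMeURL("123456:ABC-def"))
	fmt.Println(getMeURL("bad/token?x=1"))
}
```
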
Add EnsureTelegramEnv(name, entry) as the SINGLE mechanism for
setting SCION_TELEGRAM_V2=1. It derives the env from the plugin
name (name=="telegram" && !selfManaged), not from persisted
settings. All three launch paths call it:
- Cold-start: server_foreground.go initPluginManager loop
- Hot-start: system_telegram.go setup handler
- Admin enable: system_integrations.go enable handler

This ensures existing settings.yaml files WITHOUT the Env entry
still launch telegram as v2 after a restart — no migration needed.

Tests:
- TestEnsureTelegramEnv_ExistingSettingsWithoutEnv: nil Env → v2 set
- TestEnsureTelegramEnv_SetsV2: basic functionality
- TestEnsureTelegramEnv_SkipsNonTelegram: other plugins untouched
- TestEnsureTelegramEnv_SkipsSelfManaged: self-managed untouched
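
A sketch of what the behaviour covered by these tests implies. The `PluginEntry` fields shown (`SelfManaged`, a map-typed `Env`) are assumptions for illustration; only the function name, the `name=="telegram" && !selfManaged` rule, and the variable name come from the commit message.

```go
package main

import "fmt"

// PluginEntry is a hypothetical, reduced version of the real config entry.
type PluginEntry struct {
	SelfManaged bool
	Env         map[string]string
}

// EnsureTelegramEnv sets SCION_TELEGRAM_V2=1 for the hub-managed telegram
// plugin, deriving the decision from the plugin name rather than from
// whatever happens to be persisted in settings.
func EnsureTelegramEnv(name string, entry *PluginEntry) {
	if name != "telegram" || entry == nil || entry.SelfManaged {
		return
	}
	if entry.Env == nil {
		entry.Env = map[string]string{}
	}
	entry.Env["SCION_TELEGRAM_V2"] = "1"
}

func main() {
	legacy := &PluginEntry{} // settings.yaml written before Env existed
	EnsureTelegramEnv("telegram", legacy)
	fmt.Println(legacy.Env)
}
```
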
- Add PublishRaw to PostgresEventPublisher (EventPublisher interface)
- Migrate system_handlers_test.go and fs_safety_test.go from removed
  sqlite.New to newTestStore (ent-backed composite store)
- Fix test data to use valid UUIDs (ent schema requires them)
- Add BrokerName to ProjectProvider test fixtures (ent validator)
- Remove stale conflict marker in server_foreground.go
- Restore encoding/json import
Run gofmt on files with formatting issues from rebase conflict
resolution (hub_config.go, server.go, fs_safety_test.go, root.go,
types.go) and pre-existing main issues (worktree_eligibility_test.go,
models.go).
This file uses newTestStore (from teststore_test.go which has the
!no_sqlite tag). Without the matching tag, go vet fails when the
sqlite driver is excluded.
The function gained a needsOnboarding bool parameter but the 3 test
call sites were not updated, causing go vet and golangci-lint to fail.
- system_integrations.go: check json.Encode, StopPlugin, SaveVersionedSettings
  return values; remove unused isTelegramInBrokerTypes
- telegram_setup.go: check resp.Body.Close and StopPlugin return values
- root.go: remove ineffectual name assignment in usesWorktrees
- system_telegram.go: check json.Encode return values
- daemon/ports.go: check ln.Close return value
- agent/provision_test.go: check os.Chdir/Setenv/MkdirAll/WriteFile
  return values (all test instances, consistent style)
@scion-gteam scion-gteam Bot force-pushed the scion/integrationsadmin branch from 5c2ee85 to 8fb003f Compare June 9, 2026 14:03
2 participants