Context
#53 added autoscaling based on polling GitHub Actions queue depth (src/lib/autoscale.ts). Polling is robust and stateless, but it adds up to one poll interval of latency and generates ongoing API load. GitHub's workflow_job webhook fires queued / in_progress / completed events as each job changes state and is the recommended low-latency input for runner autoscalers.
This issue is the natural follow-on to #53: add an event-driven path that complements the poller rather than replacing it.
Scope
- Add a small webhook receiver (CLI command + a thin HTTP entrypoint, e.g. tsx src/cli.ts autoscale-webhook --listen :8080) that:
  - Validates the X-Hub-Signature-256 HMAC using a shared secret from env.
  - Filters events to workflows targeting the fleet's runner labels (from config/*.yaml).
  - Translates each queued event into a scale-up signal and each completed event into a scale-down signal, deduped against the existing autoscaler state.
- Reuse the scaling primitives from src/lib/autoscale.ts so the polling and webhook paths share one controller.
- Keep the polling autoscaler as the safety net (events can be missed); the doctor should warn if the receiver hasn't seen events recently.
- Document deployment options: standalone Node process behind the existing Synology compose project, or as a small ephemeral function (whichever fits the operator's network policy).
- Tests: signature verification (good/bad/missing), label filtering, idempotency on event replay, integration with the existing autoscaler controller.
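
A minimal sketch of the receiver's core logic, covering signature validation, label filtering, and deduplicated translation into scale signals. Everything named here is hypothetical and for illustration only: `signBody`, `verifySignature`, `toSignal`, the `ScaleSignal` shape, and the in-memory `seen` set (a stand-in for whatever state the controller in src/lib/autoscale.ts actually keeps). It also assumes the fleet's labels are stored lower-cased and that a job belongs to the fleet only when every label it requests is one the fleet owns.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical shapes: only the webhook payload fields this sketch reads.
export interface WorkflowJobEvent {
  action: string; // e.g. "queued", "in_progress", "completed"
  workflow_job: { id: number; labels: string[] };
}

export interface ScaleSignal {
  kind: "scale-up" | "scale-down";
  jobId: number;
}

// Compute the X-Hub-Signature-256 value for a raw body and shared secret.
export function signBody(secret: string, rawBody: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Verify the header against the raw (unparsed) request body.
export function verifySignature(
  secret: string,
  rawBody: string,
  header: string | undefined,
): boolean {
  if (!header) return false; // missing header: reject
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Translate an event into a scale signal, or null if it should be ignored.
export function toSignal(
  event: WorkflowJobEvent,
  fleetLabels: ReadonlySet<string>, // assumed lower-cased, loaded from config/*.yaml
  seen: Set<string>, // assumed stand-in for the autoscaler's dedupe state
): ScaleSignal | null {
  const { action, workflow_job: job } = event;
  if (action !== "queued" && action !== "completed") return null;

  // Assumption: the job targets this fleet only if the fleet owns every requested label.
  const ownsAll =
    job.labels.length > 0 && job.labels.every((l) => fleetLabels.has(l.toLowerCase()));
  if (!ownsAll) return null;

  // Replayed deliveries of the same job/action pair are dropped.
  const key = `${job.id}:${action}`;
  if (seen.has(key)) return null;
  seen.add(key);

  return { kind: action === "queued" ? "scale-up" : "scale-down", jobId: job.id };
}
```

An HTTP handler built on this would call `verifySignature` on the raw body before parsing it, respond 401 on failure, and hand any non-null signal to the shared controller. Verifying against the raw bytes matters because re-serializing parsed JSON can change the byte sequence and invalidate the HMAC.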
Acceptance Criteria
- A workflow_job queued event for a label this fleet owns provisions a new slot within a few seconds (well under the poller interval).
- Bad-signature requests are rejected with 401 and logged via src/lib/logger.ts.
- pnpm doctor reports webhook freshness (last event timestamp) and warns when events have been silent longer than the poll interval.
- The polling autoscaler continues to function and converges to the same state when the receiver is offline.
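
The freshness criterion could be checked along these lines. This is a sketch under stated assumptions, not the doctor's actual interface: `checkWebhookFreshness` and `FreshnessReport` are hypothetical names, and it assumes the receiver persists the timestamp of the last accepted event somewhere the doctor can read it.

```typescript
// Hypothetical report shape for a doctor check.
export interface FreshnessReport {
  status: "ok" | "warn";
  message: string;
}

// Warn when no event has been seen, or when the last one is older than the poll interval.
export function checkWebhookFreshness(
  lastEventAt: Date | null, // assumed to be persisted by the receiver
  pollIntervalMs: number,
  now: Date = new Date(),
): FreshnessReport {
  if (lastEventAt === null) {
    return { status: "warn", message: "webhook receiver has not seen any events yet" };
  }
  const silentMs = now.getTime() - lastEventAt.getTime();
  const stamp = lastEventAt.toISOString();
  if (silentMs > pollIntervalMs) {
    const silentSec = Math.round(silentMs / 1000);
    return {
      status: "warn",
      message: `last webhook event at ${stamp}, silent for ${silentSec}s (longer than the poll interval)`,
    };
  }
  return { status: "ok", message: `last webhook event at ${stamp}` };
}
```

On a quiet repository, silence longer than the poll interval can be legitimate, which is one reason to report this as a warning rather than a failure.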
Related
src/lib/metrics.ts)