Event-driven autoscaling via workflow_job webhook #119

@jmcte

Context

#53 added autoscaling based on polling GitHub Actions queue depth (src/lib/autoscale.ts). Polling is robust and stateless, but it adds latency (up to one poll interval) and ongoing API load. GitHub's workflow_job webhook fires queued / in_progress / completed events as each job changes state and is the recommended low-latency input for runner autoscalers.

This issue is the natural follow-on to #53: add an event-driven path that complements the poller rather than replacing it.

Scope

  • Add a small webhook receiver (CLI command + a thin HTTP entrypoint, e.g. tsx src/cli.ts autoscale-webhook --listen :8080) that:
    • Validates the X-Hub-Signature-256 HMAC using a shared secret from env.
    • Filters events to workflows targeting the fleet's runner labels (from config/*.yaml).
    • Translates each queued event into a scale-up signal and each completed event into a scale-down signal, deduped against the existing autoscaler state.
  • Reuse the scaling primitives from src/lib/autoscale.ts so the polling and webhook paths share one controller.
  • Keep the polling autoscaler as the safety net (events can be missed); the doctor should warn if the receiver hasn't seen events recently.
  • Document deployment options: standalone Node process behind the existing Synology compose project, or as a small ephemeral function (whichever fits the operator's network policy).
  • Tests: signature verification (good/bad/missing), label filtering, idempotency on event replay, integration with the existing autoscaler controller.
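The receiver steps above (signature check, label filter, deduped translation) could be sketched roughly as below. This is a sketch only, not the proposed implementation: `verifySignature`, `targetsFleet`, `toSignal`, `ScaleSignal`, and the in-memory `seen` set are hypothetical names, the event type covers only the fields used here, and it assumes fleet labels are stored lowercase and that a job targets the fleet when every requested label is one the fleet provides.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical, trimmed-down shape; the real workflow_job payload has many more fields.
interface WorkflowJobEvent {
  action: "queued" | "in_progress" | "completed" | "waiting";
  workflow_job: { id: number; labels: string[] };
}

// Hypothetical signal type; the real controller in src/lib/autoscale.ts may differ.
type ScaleSignal = { kind: "up" | "down"; jobId: number };

// Verify the X-Hub-Signature-256 header ("sha256=<hex>") against the raw request body.
function verifySignature(secret: string, rawBody: string, header: string | undefined): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const digest = createHmac("sha256", secret).update(rawBody).digest("hex");
  const expected = Buffer.from("sha256=" + digest);
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Assumption: a job targets this fleet when every label it requests is a fleet label.
function targetsFleet(jobLabels: string[], fleetLabels: Set<string>): boolean {
  return jobLabels.length > 0 && jobLabels.every((l) => fleetLabels.has(l.toLowerCase()));
}

// Translate an event into a scale signal, or null when it should be ignored.
// `seen` stands in for the autoscaler state used to dedupe replayed deliveries.
function toSignal(
  event: WorkflowJobEvent,
  fleetLabels: Set<string>,
  seen: Set<string>,
): ScaleSignal | null {
  const { action, workflow_job: job } = event;
  if (action !== "queued" && action !== "completed") return null;
  if (!targetsFleet(job.labels, fleetLabels)) return null;
  const key = `${job.id}:${action}`;
  if (seen.has(key)) return null; // replayed delivery: already handled
  seen.add(key);
  return { kind: action === "queued" ? "up" : "down", jobId: job.id };
}
```

An HTTP handler built on this would respond 401 and log when `verifySignature` returns false, and otherwise pass any non-null signal to the shared controller. The signature must be computed over the raw request bytes, before any JSON parsing.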

Acceptance Criteria

  • A workflow_job queued event for a label this fleet owns provisions a new slot within a few seconds (well under the poller interval).
  • Bad-signature requests are rejected with 401 and logged via src/lib/logger.ts.
  • pnpm doctor reports webhook freshness (last event timestamp) and warns when events have been silent longer than the poll interval.
  • The polling autoscaler continues to function and converges to the same state when the receiver is offline.
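The freshness criterion for pnpm doctor might reduce to a check like the following. It is a minimal sketch under assumptions: `checkWebhookFreshness` and the `Freshness` shape are hypothetical, and it assumes the receiver persists the timestamp of the last accepted event somewhere the doctor can read it.

```typescript
// Hypothetical result shape for a doctor check.
interface Freshness {
  status: "ok" | "warn";
  message: string;
}

// Warn when no event has ever been seen, or when the receiver has been
// silent for longer than the poll interval. Times are epoch milliseconds.
function checkWebhookFreshness(
  lastEventAtMs: number | null,
  pollIntervalMs: number,
  nowMs: number = Date.now(),
): Freshness {
  if (lastEventAtMs === null) {
    return { status: "warn", message: "webhook receiver has not seen any events" };
  }
  const silentMs = nowMs - lastEventAtMs;
  const last = new Date(lastEventAtMs).toISOString();
  if (silentMs > pollIntervalMs) {
    const silentSec = Math.round(silentMs / 1000);
    return {
      status: "warn",
      message: `last webhook event at ${last}; silent for ${silentSec}s, longer than the poll interval`,
    };
  }
  return { status: "ok", message: `last webhook event at ${last}` };
}
```

Taking `nowMs` as a parameter keeps the check deterministic in tests.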

Related
