Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 11 additions & 8 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ This Rails app uses a small set of preferred libraries for common integration wo
- Mission Control Jobs remains available at `/ops/jobs` for queue inspection and operational actions.
- `bin/jobs-worker` and `bin/jobs-scheduler` set `R3X_RUNTIME_PROFILE=jobs` before boot. That internal profile keeps production eager load enabled for jobs pods, but trims boot to the Active Model / Active Job / Active Record runtime, excludes the dashboard web stack and web-only gems, removes the app `config/routes.rb` / `config/routes/**/*` files from Rails path registration, keeps `ActionController::Base.include_all_helpers = false` for headless boot, ignores `lib/r3x/workflow/cli.rb` so worker/scheduler processes do not eager-load the Thor wrapper, and leaves the operator-facing commands unchanged.
- `bin/workflow` sets `R3X_RUNTIME_PROFILE=workflow_cli` before boot. `workflow_cli` is also headless and skips the web stack, but unlike `jobs` it keeps `lib/r3x/workflow/cli.rb` available so the command can load `R3x::Workflow::Cli`. This profile is command-owned and is not a deployment env knob.
- The dashboard is DB-first: workflow pages and recent runs are derived from current `Solid Queue` recurring task and job tables, so they only show workflows and runs that have persisted runtime artifacts.
- The dashboard is DB-first: workflow pages and recent runs are derived from current `Solid Queue` tables plus `trigger_states`, so they only show workflows and runs that have persisted runtime artifacts.
- Dashboard-side queue rows and recurring-task rows are wrapped by app models `Dashboard::Run` and `Dashboard::RecurringTask`; treat those as the dashboard-facing read/write boundary over `solid_queue_jobs` and `solid_queue_recurring_tasks`.
- The dashboard can optionally query indexed application logs when `R3X_LOGS_PROVIDER` is configured. The current supported provider is `victorialogs`, which reads from `R3X_VICTORIA_LOGS_URL` via VictoriaLogs native query API.
- App log output format is controlled independently by `R3X_LOG_FORMAT`. Set it to `json` for structured logs (required by the dashboard log view), or `plain` for standard Rails flat text. Unsupported values raise on boot.
Expand All @@ -32,7 +32,7 @@ This Rails app uses a small set of preferred libraries for common integration wo
- `app/controllers/r3x/dashboard/overview_controller.rb`: root overview screen that surfaces attention items, recent runs, and workflow shortcuts from persisted runtime data.
- `app/views/r3x/dashboard/` + `app/views/layouts/r3x/dashboard.html.erb`: dashboard UI templates and layout.
- `app/models/dashboard/`: dashboard-facing models over persisted queue metadata, especially `Dashboard::Run` (`solid_queue_jobs` + execution-table relations), `Dashboard::RecurringTask` (`solid_queue_recurring_tasks` parsing and workflow-key lookups), and `Dashboard::DirectWorkflowEnqueuer` (the deliberate low-level Solid Queue enqueue boundary used when web pods should not load workflow classes).
- `app/lib/r3x/dashboard/`: read-only query objects that build workflow summaries, recent runs, and optional indexed log views from dashboard models and configured log providers.
- `app/lib/r3x/dashboard/`: read-only query objects that build workflow summaries, recent runs, and optional indexed log views from dashboard models, `TriggerState`, and configured log providers.
- `app/lib/r3x/dashboard/workflow/`: dashboard workflow composition/query namespace (`Catalog`, `Runs`, `Summaries`, `RunCounts`, `RunEnqueuer`) used by controllers and overview composition.
- `app/lib/r3x/dashboard/overview.rb`: assembles the root dashboard overview cards and sections from workflow summaries and persisted runs.
- `app/lib/r3x/client/hashi_corp_vault/`: Vault client helpers for config parsing and auth mode implementations (`Config`, `Auth::Token`, `Auth::Kubernetes`).
Expand All @@ -50,8 +50,8 @@ This Rails app uses a small set of preferred libraries for common integration wo
- `app/lib/r3x/client/google/translate.rb`: Google Translate client used by workflows via `ctx.client.google_translate(...)`.
- `lib/r3x/workflow/llm_schema.rb`: lazy wrapper around `RubyLLM::Schema` for workflows that need structured LLM output.
- `R3x::Client::Google` is a project namespace; when referencing the third-party Google gem namespace, use `::Google` to avoid constant collisions.
- `app/jobs/r3x/`: job entrypoints for workflow runtime support.
- `app/models/r3x/`: runtime support models under the `R3x` namespace.
- `app/jobs/r3x/`: job entrypoints, especially `R3x::ChangeDetectionJob`, which evaluates change-detecting triggers before enqueueing workflow runs directly on the workflow job class.
- `app/models/r3x/`: runtime support models such as `R3x::TriggerState` for per-trigger change-detection state.
- `workflows/`: user workflow packs. These are not the framework itself; they are loaded by the framework.
- `workflows/<pack>/test/`: self-contained tests for a specific workflow pack. Keep pack-local tests beside the workflow code, and use `test/fixtures/workflows/` for framework-level fixtures.
- `lib/r3x/workflow/boot.rb`: explicit workflow boot helper used by process entrypoints.
Expand All @@ -74,9 +74,12 @@ This Rails app uses a small set of preferred libraries for common integration wo
- `R3x::RecurringTasksConfig` turns schedulable workflow triggers into Solid Queue dynamic recurring tasks via `SolidQueue::RecurringTask`. All triggers have a `unique_key` (based on type + options hash) used for identification and duplicate detection. `schedule_all!` persists dynamic tasks and sweeps stale ones.
- Workflow packs are loaded explicitly by process entrypoints, not globally during Rails boot. `R3x::Workflow::Entrypoint` centralizes those boot decisions for the web server hook and `bin/jobs*`. In the current split-process setup, the generic `bin/jobs` entrypoint keeps the default Solid Queue behavior: it only loads workflow classes when `SOLID_QUEUE_IN_PUMA=true`, and otherwise it also schedules recurring tasks before starting the CLI. `bin/jobs-worker` only loads workflow classes, defaults `config/queue.worker.yml`, defaults `SOLID_QUEUE_SKIP_RECURRING=true`, and boots Rails under the internal `jobs` runtime profile. `bin/jobs-scheduler` defaults `config/queue.scheduler.yml`, schedules recurring tasks, and also boots under the `jobs` runtime profile. `bin/workflow` boots Rails under the internal `workflow_cli` runtime profile. Both `jobs` and `workflow_cli` are headless profiles selected by the command itself, not by deployment env: they remove the app route files from Rails path registration before initialization, keep `ActionController::Base.include_all_helpers = false` to avoid helper scans from framework eager-load, and deliberately exclude web-only gems. Only `jobs` also ignores `lib/r3x/workflow/cli.rb` to keep worker/scheduler boots slimmer. The web process also loads workflows on server boot so Mission Control Jobs can validate and enqueue workflow-backed recurring tasks; it only schedules recurring tasks when it is also hosting Solid Queue in-process, which currently means development or `SOLID_QUEUE_IN_PUMA=true`.
- Production deployments should prefer separate web and jobs controllers/processes over embedding Solid Queue into the Puma web process. Prefer separate worker and scheduler jobs controllers: worker pods can run `bin/jobs-worker`, while scheduler pods can run `bin/jobs-scheduler`. If `SOLID_QUEUE_IN_PUMA` is enabled, remember that the Puma plugin can fork an additional Solid Queue supervisor plus worker/dispatcher/scheduler processes inside the same pod, the web boot path will also load workflows and schedule recurring tasks, and separate jobs pods should not also try to own recurring scheduling in that mode.
- The dashboard does not require workflow packs to be loaded on the web pod. It reconstructs workflow pages from `solid_queue_recurring_tasks` and `solid_queue_jobs`, via `Dashboard::RecurringTask` and `Dashboard::Run`, and can enqueue `Run now` actions through `POST /workflows/:workflow_key/runs` without loading workflow classes on the web pod: the workflow-level action deliberately uses `Dashboard::DirectWorkflowEnqueuer` to insert the workflow job directly into `Solid Queue` with no trigger arguments so runtime resolves the implicit manual trigger, while trigger-specific scheduled actions enqueue the persisted workflow job class with the recurring task arguments.
- The dashboard does not require workflow packs to be loaded on the web pod. It reconstructs workflow pages from `solid_queue_recurring_tasks`, `solid_queue_jobs`, and `trigger_states`, via `Dashboard::RecurringTask`, `Dashboard::Run`, and `R3x::TriggerState`, and can enqueue `Run now` actions through `POST /workflows/:workflow_key/runs` without loading workflow classes on the web pod: the generic workflow-level `Run now` action deliberately uses `Dashboard::DirectWorkflowEnqueuer` to insert the workflow job directly into `Solid Queue` with no trigger arguments so runtime resolves the implicit manual trigger, while trigger-specific change-detecting actions still use `R3x::ChangeDetectionJob`.
- Change-detecting triggers are file-defined trigger objects that provide `cron`, `unique_key`, and `detect_changes(workflow_key:, state:)`. Their durable runtime state lives in `R3x::TriggerState`.
- Workflow entrypoints can be disabled without deletion by adding `# r3x:disable ...` near the top of `workflow.rb`. Disabled entrypoints are skipped by the pack loader and are not registered/scheduled.
- `ApplicationJob` and `R3x::Workflow::Base` add stable tagged log context so indexed logs can be correlated back to run pages. The workflow job itself keeps the per-run tags minimal (`r3x.run_active_job_id` and `r3x.trigger_key`).
- `R3x::ChangeDetectionJob` loads the trigger, fetches/updates `R3x::TriggerState`, and only enqueues the workflow job class itself when the trigger reports a change.
- Because the app currently uses `Solid Queue` as a database-backed backend on the same Active Record database connection, code may intentionally rely on a database transaction covering both `TriggerState` updates and `perform_later`. Do not assume those guarantees survive a future backend or database split.
- `ApplicationJob`, `R3x::Workflow::Base`, and `R3x::ChangeDetectionJob` add stable tagged log context so indexed logs can be correlated back to run pages. The workflow job itself keeps the per-run tags minimal (`r3x.run_active_job_id` and `r3x.trigger_key`), while change-detection orchestration still emits `r3x.workflow_key` for broader workflow-level correlation.
- When `R3X_LOG_FORMAT=json`, logs are emitted as structured JSON with explicit `level`, `message`, and tag data so the dashboard can read real log levels directly. When `R3X_LOG_FORMAT=plain` (or unset), logs use standard Rails flat text.
- Known limitation: because queued workflow runs persist the concrete workflow class name, renaming or removing a workflow class across deploys can strand older queued runs with job deserialization failures. This is currently an accepted tradeoff for preserving `ActiveJob::Continuable` on the workflow job itself.
- The dashboard's run history is DB-first and parses `Solid Queue` / `Active Job` payloads directly. It still accepts the underlying tradeoff that finished runs are retention-bound and that workflows with no persisted runtime artifacts are invisible to the dashboard.
Expand Down Expand Up @@ -132,13 +135,13 @@ bin/workflow [options] [command] [arguments]

### Operational note

- When refactoring workflow class names, remember that already queued scheduled runs may still point at the old concrete class name.
- When refactoring workflow class names, remember that already queued scheduled or change-detected runs may still point at the old concrete class name.
- If a workflow class is renamed or removed, consider cleaning up pending jobs and recurring tasks created under the old class, or accept that older queued runs may fail deserialization.

## Maintenance Warning

- Keep this file synchronized with the real codebase. If you change workflow loading, trigger discovery, scheduling flow, top-level directory structure, namespaces, or the framework/user-workflow boundary, update the relevant `AGENTS.md` sections in the same change.
- In particular, update examples and notes here when changing files such as `lib/r3x/workflow.rb`, `lib/r3x/workflow/pack_loader.rb`, `lib/r3x/workflow/registry.rb`, `lib/r3x/workflow/boot.rb`, `lib/r3x/recurring_tasks_config.rb`, `lib/r3x/triggers.rb`, `bin/workflow`, `bin/jobs`, or `config/application.rb`.
- In particular, update examples and notes here when changing files such as `lib/r3x/workflow.rb`, `lib/r3x/workflow/pack_loader.rb`, `lib/r3x/workflow/registry.rb`, `lib/r3x/workflow/boot.rb`, `lib/r3x/recurring_tasks_config.rb`, `lib/r3x/triggers.rb`, `app/jobs/r3x/change_detection_job.rb`, `bin/workflow`, `bin/jobs`, or `config/application.rb`.
- Also update this file when changing the shared DSL validation contract in files such as `lib/r3x/dsl/validatable.rb`, `lib/r3x/configuration_error.rb`, or the base classes for workflow-declared objects.
- Also update this file when changing Active Job backend semantics, `Solid Queue` database wiring, or any logic that depends on enqueueing being inside the same database transaction as app writes.
- When adding a new subsystem or moving code between `lib/r3x/`, `app/lib/r3x/`, `app/jobs/r3x/`, or `workflows/`, refresh the project overview and codebase map so future agents can still orient themselves quickly.
Expand Down
5 changes: 4 additions & 1 deletion app/helpers/r3x/dashboard/application_helper.rb
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,8 @@ def dashboard_tone_for(value)
"idle" => "muted",
"queued" => "info",
"running" => "info",
"scheduled" => "info"
"scheduled" => "info",
"trigger_error" => "danger"
}.fetch(value, "muted")
end

Expand Down Expand Up @@ -137,13 +138,15 @@ def dashboard_trigger_label(trigger_entry)
return "Schedule: #{cron}" if cron.present?

{
"change_detecting" => "Change detection",
"manual" => "Manual",
"observed" => "Observed trigger"
}.fetch(mode.to_s, mode.to_s.humanize)
end

def dashboard_trigger_kind(trigger_entry)
{
"change_detecting" => "Change detection",
"manual" => "Manual",
"observed" => "Observed",
"schedule" => "Schedule",
Expand Down
114 changes: 114 additions & 0 deletions app/jobs/r3x/change_detection_job.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
module R3x
class ChangeDetectionJob < ApplicationJob
queue_as :default

def perform(workflow_key, options = nil)
workflow_key, options = normalize_arguments(workflow_key, options)
trigger_key = options.fetch(:trigger_key)
trigger_state = nil

with_log_tags(
R3x::Log.tag(R3x::Log::WORKFLOW_KEY_TAG, workflow_key),
R3x::Log.tag(R3x::Log::TRIGGER_KEY_TAG, trigger_key)
) do
workflow_class = R3x::Workflow::Registry.fetch(workflow_key)
trigger = find_trigger(workflow_class: workflow_class, trigger_key: trigger_key)
logger.info "Checking change-detecting trigger type=#{trigger.type}"

trigger_state = load_trigger_state(workflow_key: workflow_key, trigger_key: trigger_key, trigger_type: trigger.type)
result = normalize_result(
trigger.detect_changes(
workflow_key: workflow_key,
state: trigger_state.state.deep_symbolize_keys
)
)
logger.info "Evaluated change-detecting trigger changed=#{result[:changed]}"

TriggerState.transaction do
if result[:changed]
with_log_tags(R3x::Log.tag(R3x::Log::JOB_OUTCOME_TAG, "changed")) do
logger.info "Change detected; enqueueing workflow class=#{workflow_class.name}"
end

workflow_class.perform_later(trigger_key, trigger_payload: result[:payload])
end

trigger_state.record_check!(result)
end
end
rescue => e
trigger_state.record_error!(e) if defined?(trigger_state) && trigger_state&.persisted?

with_log_tags(
R3x::Log.tag(R3x::Log::WORKFLOW_KEY_TAG, workflow_key),
R3x::Log.tag(R3x::Log::TRIGGER_KEY_TAG, defined?(trigger_key) ? trigger_key : nil),
R3x::Log.tag(R3x::Log::JOB_OUTCOME_TAG, "failed")
) do
structured_error(message: "Change detection failed", error: e)
end

raise
end

private

def find_trigger(workflow_class:, trigger_key:)
trigger = workflow_class.triggers_by_key[trigger_key]

if trigger.nil?
raise ArgumentError, "Unknown trigger key '#{trigger_key}' for workflow '#{workflow_class.workflow_key}'"
end

unless trigger.change_detecting?
raise ArgumentError, "Trigger '#{trigger_key}' is not change-detecting"
end

trigger
end

def load_trigger_state(workflow_key:, trigger_key:, trigger_type:)
TriggerState.find_or_create_by!(
workflow_key: workflow_key,
trigger_key: trigger_key
) do |state|
state.trigger_type = trigger_type.to_s
state.state = {}
end
end

def normalize_arguments(workflow_key, options)
if workflow_key.is_a?(Hash) && options.nil?
options = workflow_key
workflow_key = nil
end

options = normalize_options_hash(options)
workflow_key ||= options[:workflow_key]

[ workflow_key.presence || raise(ArgumentError, "Missing workflow_key"), options ]
end

def normalize_options_hash(options)
case options
when nil
{}
when Hash
options.deep_symbolize_keys
else
raise ArgumentError, "Expected options hash, got #{options.class.name}"
end
end

def normalize_result(result)
normalized = result.deep_symbolize_keys

unless normalized.key?(:changed) && normalized.key?(:state)
raise ArgumentError, "Change-detecting trigger must return a hash with :changed and :state"
end

normalized[:state] ||= {}
normalized[:payload] ||= nil
normalized
end
end
end
12 changes: 9 additions & 3 deletions app/lib/r3x/dashboard/overview.rb
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ def summary_cards

def needs_attention
@needs_attention ||= workflows
.select { |workflow| workflow.dig(:health, :status) == "failed" }
.select { |workflow| %w[failed trigger_error].include?(workflow.dig(:health, :status)) }
.map do |workflow|
workflow.merge(
attention_at: attention_time_for(workflow),
Expand Down Expand Up @@ -65,11 +65,17 @@ def workflows
end

def attention_time_for(workflow)
workflow.dig(:last_run, :recorded_at)
return workflow.dig(:last_run, :recorded_at) if workflow.dig(:health, :status) == "failed"

workflow[:trigger_entries]
.filter_map { |entry| entry[:trigger_state]&.last_error_at }
.max
end

def attention_label_for(workflow)
"Failed"
return "Failed" if workflow.dig(:health, :status) == "failed"

"Trigger error"
end
end
end
Expand Down
Loading