These notes apply to workflow code in general.
bin/workflow boots Rails through the internal workflow_cli runtime profile.
That profile is headless: it skips the dashboard/Mission Control web stack,
web-only gems, and app route registration, and it keeps
ActionController::Base.include_all_helpers = false so framework eager-load
does not scan app helpers. Unlike the slimmer jobs profile used by
bin/jobs-worker and bin/jobs-scheduler, workflow_cli still leaves
lib/r3x/workflow/cli.rb available for the Thor wrapper.
-
To disable a workflow without deleting it, add a top-of-file pragma near the start of
workflow.rb:# r3x:disable Reason for disabling -
R3x::Workflow::PackLoaderscans the first lines of each entrypoint and skips files with this pragma, so the workflow file is notrequired, not registered, and not scheduled. -
Keep the pragma near the top of the file (before constants/classes) so it is easy to spot during reviews and maintenance.
- Workflows inherit from
R3x::Workflow::Baseand implement#run. R3x::Workflow::Baseis anApplicationJoband includesActiveJob::Continuable.- The current execution context is available as
ctxon the workflow instance duringperform, so helper methods can use it without threading it through every call. - Use
steparound boundaries that should be resumable, such as external API calls, slow network work, or other distinct phases. - Keep steps small and meaningful. A step should mark a real unit of progress, not just wrap every line.
- Use
conditionfor high-level guards that must be true beforerunstarts. Conditions are evaluated only on the initial execution, not whenActiveJob::Continuableresumes an isolated or interrupted step.
-
stepis a resumable boundary, not a value-return helper. -
Use
isolated: truefor steps that should run in their own execution after prior progress is serialized. Pair it with workflow-specificresume_optionswhen the next step should run after a deliberate delay instead of blocking a worker withsleep. -
Do not assign the result of a
stepblock to a variable and assume it is the block result. -
Put data-fetching logic in a normal helper, then use
steparound the resumable work that consumes that data. -
If you see
true.selectornil.selectin a workflow crash, check whether astepblock was used as if it returned the fetched value.# Good raw_events = fetch_from_apify step :process_events do |step| process_events(Array.wrap(raw_events), step) end # Bad raw_events = step :fetch_events do fetch_from_apify end
Use condition for workflow-level preconditions that must be true before any work starts:
condition :nobody_home?, reason: "Somebody is home"The predicate is an instance method, so it can use ctx and workflow helpers. If the predicate
returns false, the workflow returns { "status" => "skipped", "reason" => reason } and logs the
skip instead of calling run. Conditions do not run during Continuable resumes, so they are safe to
combine with delayed or isolated steps.
Use on_complete for tiny side effects that should happen only after the whole workflow succeeds:
on_complete { ctx.client.healthchecks_io(HEALTHCHECK_UUID).ping }Completion callbacks run after run returns and before the workflow logs Workflow run completed.
Blocks execute on the workflow instance, so they can use ctx, constants, and private workflow
helpers. Multiple callbacks run in declaration order.
on_complete does not run when condition skips the workflow, when run raises, or when
ActiveJob::Continuable interrupts before the full workflow completes. If a completion callback
raises, the workflow fails instead of logging a false success.
ctx- The current workflow execution context.
- Available during
perform/run. - Use it to access clients and runtime data without passing context through every helper method.
step- Marks a resumable boundary using
ActiveJob::Continuable. - Use it around slow or externally dependent phases that should resume cleanly after interruption or retry.
- Use
isolated: trueplusresume_options = { wait: ... }when the next step should be queued for later instead of sleeping inside the current job.
- Marks a resumable boundary using
condition- Declares a workflow-level guard that must return true before
runon the initial execution. - Requires a positive predicate method and a human-readable reason for skipped runs.
- Declares a workflow-level guard that must return true before
on_complete- Runs a block after successful full workflow completion.
- Keep it for small final side effects, such as success healthcheck pings.
- A raised callback error fails the workflow.
with_cache- Wraps an expensive block in
Rails.cacheso repeated local runs can reuse the same result. - Good for slow, noisy, or hard-to-reproduce API calls while iterating on a workflow.
bin/workflow run --skip-cache <path>bypasses allwith_cacheblocks for that run without editing the workflow.R3X_SKIP_CACHE=truedoes the same override at the env level.- In production,
with_cachestill raises by default unlessR3X_SKIP_CACHE=trueis set. - Use
with_cache(force: true)when you need to refresh a stale cached value.
- Wraps an expensive block in
ctx.durable_set(name = :default, ttl: 90.days)- Returns a workflow-scoped durable set backed by
Rails.cache. - Good for remembering which items were already processed, sent, uploaded, or otherwise handled across workflow runs.
- Members are scoped by workflow key and set name, so different workflows and different sets do not collide.
- When the app uses
:solid_cache_store, customttl:values must not exceedconfig/cache.ymlstore_options.max_age. - Use
include?,add, anddeleteon the returned set.
- Returns a workflow-scoped durable set backed by
- Prefer this for best-effort dedup across runs; prefer a real table only when you need permanent history or hard uniqueness guarantees.
-
ctx.client.httpreturns a new instance of the HTTP client (R3x::Client::Http) on every call. -
When making requests in a loop, instantiate the client once outside the loop so timeout and SSL configuration is built once and the workflow code stays clear.
-
Good:
http = ctx.client.http urls.map do |url| response = http.get(url) # ... end
-
Bad:
urls.map do |url| response = ctx.client.http.get(url) # ... end
-
For ad hoc batch work that benefits from a persistent HTTPX session, use
ctx.client.persistent_httpso connection lifecycle is explicit:ctx.client.persistent_http(timeout: 30) do |http| urls.each { |url| http.get(url) } end
-
Do not add
persistent:options to thin clients such as Discord, Google Translate, Prometheus, or VictoriaLogs unless there is a measured hot path that performs many requests in one controlled scope.
See docs/workflows/example_multi_step_digest.md for a
full worked example that combines step, with_cache, ctx.durable_set, structured LLM output,
and multiple ctx.client.* integrations in one workflow.
Use bin/workflow for local workflow development. It boots Rails through the workflow_cli runtime
profile, loads the requested workflow file, and runs the workflow in-process.
export R3X_WORKFLOW_PATHS="$PWD/workflows"
bin/workflow list
bin/workflow info <workflow_key>
bin/workflow run workflows/<workflow_name>/workflow.rb
bin/workflow run --dry-run workflows/<workflow_name>/workflow.rb
bin/workflow run --skip-cache workflows/<workflow_name>/workflow.rbFor an included workflow in this checkout:
bin/workflow info porto_santo_news
bin/workflow run --dry-run workflows/porto_santo_news/workflow.rb-
listandinfoload workflow packs fromR3X_WORKFLOW_PATHS. -
runalways takes a direct path to aworkflow.rbfile. -
In
developmentandtest,bin/workflow rundefaults to dry-run, so dry-run-aware clients avoid real side effects. -
--dry-runexplicitly enables dry-run for that run (R3X_DRY_RUN=true). -
--no-dry-runexplicitly disables dry-run for that run (R3X_DRY_RUN=false), even indevelopment. -
--skip-cachesetsR3X_SKIP_CACHE=truefor that run and bypasseswith_cache. -
Use
--dry-run --skip-cachetogether when you want a fresh, low-risk local run:bin/workflow run --dry-run --skip-cache workflows/<workflow_name>/workflow.rb
-
To run with real delivery in
development, use--no-dry-runor setR3X_DRY_RUN=false:R3X_DRY_RUN=false bin/workflow run workflows/<workflow_name>/workflow.rb
Workflow packs can have their own Minitest tests next to the workflow code:
workflows/
example_digest/
workflow.rb
test/
workflow_test.rb
Pack-local tests should require the Rails environment and the workflow file, then instantiate the workflow directly. Use small fakes for workflow clients so tests verify behavior without calling external services.
#!/usr/bin/env ruby
require "bundler/setup"
require "minitest/autorun"
require_relative "../../../config/environment"
require_relative "../workflow"
class ExampleDigestTest < Minitest::Test
def test_declares_schedule_trigger
schedule = Workflows::ExampleDigest.triggers.find(&:cron_schedulable?)
assert schedule
assert_equal "0 8 * * *", schedule.cron
end
endRun a pack-local test directly:
ruby workflows/<workflow_name>/test/workflow_test.rbUse Minitest for workflow tests. Prefer real parsing and transformation code with fake client boundaries over stubbing the whole workflow. A useful test should fail if the workflow stops filtering, deduplicating, rendering, or delivering the expected thing.
For workflows that depend on real integration credentials, Vault gives local development strong
dev/prod parity. When local env points at the same Vault address and secret path as production, the
app can boot, log in to Vault, and hydrate ENV with the same secret names used by production pods.
That means you can test a workflow locally against the same credential contract that production uses:
export R3X_VAULT_ADDR=http://vault.example.internal:8200
export R3X_VAULT_SECRETS_PATH=secret/data/env/r3x
export R3X_VAULT_AUTH_METHOD=kubernetes
export R3X_VAULT_KUBERNETES_ROLE=r3x
bin/workflow run --dry-run workflows/<workflow_name>/workflow.rbThe exact auth mode depends on your environment. Token auth is also supported for local operator work:
export R3X_VAULT_ADDR=http://vault.example.internal:8200
export R3X_VAULT_TOKEN=<token>
export R3X_VAULT_SECRETS_PATH=secret/data/env/r3xKeep the distinction clear:
- Vault parity means the same secret names and values can be loaded locally and in production.
bin/workflow run --dry-runshould still be the default local command for workflows with side effects.- Dry run protects delivery behavior; Vault parity protects configuration drift.
- Use
just vault_checkto verify Vault connectivity and visible secret keys without printing secret values. - See docs/deployment.md#vault-secrets for the full Vault setup.
-
For small extraction chains, prefer
presenceand chained fallbacks over repeatedblank?branches. -
Keep simple parsing close to the data source unless the logic is genuinely reusable.
-
Good:
body = normalize_text(node.at_xpath("./description")&.inner_html).presence || normalize_text(node.at_xpath("./encoded")&.inner_html).presence || normalize_text(node.at_xpath("./title")&.text)
-
Bad:
body = normalize_text(node.at_xpath("./description")&.inner_html) body = normalize_text(node.at_xpath("./encoded")&.inner_html) if body.blank? body = normalize_text(node.at_xpath("./title")&.text) if body.blank?
- Prefer letting workflows fail loudly.
- Avoid broad
rescueblocks that hide the original problem. - Only rescue when translating a known boundary error into a clearer domain failure or cleanup.
- Prefer
with_cacheonly around clearly expensive or noisy calls, not around the whole workflow. - Prefer
ctx.durable_setfor cross-run item dedup, notwith_cache. - The normal workflow is:
- add
with_cachearound the slowest boundary while iterating - use
bin/workflow run --skip-cache <path>when you want a fresh uncached run - leave the helper in place if it remains useful for future debugging
- add
- For durable dedup, use a stable member key from the item itself, such as a URL digest or external post ID, and add it only after the relevant side effect succeeds.
- If a cached block becomes confusing or hides too much behavior, remove it instead of stacking more flags or conditions around it.
- When a workflow suddenly sees a boolean or
nilwhere an array should be, inspect the neareststepboundary first before blaming the external API.
-
trigger :scheduleaccepts an optionaltimezone:. -
Timezones may be IANA names like
Europe/Parisor Rails names likePacific Time (US & Canada). -
Rails-style names are normalized to canonical TZInfo names before scheduling.
-
If
timezone:is omitted,R3X_TIMEZONEis used when present. -
If the cron string already embeds a timezone, that embedded timezone wins over
R3X_TIMEZONE. -
Use one of these styles, not both:
trigger :schedule, cron: "every day at 9am Europe/Paris"
trigger :schedule, cron: "every day at 9am", timezone: "Europe/Paris"
-
If both
timezone:and the cron string specify timezones, configuration fails fast.
- Prefer
logger.debug { ... }for debug logs so the message is lazy evaluated. - Use block form when the log string is expensive to build or includes interpolated values.
- For
info,warn, anderror, use eager string logging when the message should always be emitted. - Workflow execution already carries tagged context such as
r3x.run_active_job_idandr3x.trigger_keyfor indexed log correlation in the dashboard. Orchestration jobs also tag lines withr3x.workflow_keywhere that broader workflow-level context is useful. Prefer logging through the existing Rails logger so those tags stay attached to emitted lines. - App logs are always emitted as structured JSON with explicit
level,message, andtags. - Dashboard run logs read the explicit
levelfrom structured log payloads. They do not infer levels from free-form message text.
-
When logging hashes or structured data, avoid manually interpolating individual fields.
-
amazing_printis preloaded globally — you can call.ai(...)on any object without addingrequire "amazing_print"yourself. -
Use
.ai(plain: true)so keys are aligned and output is readable, without ANSI colour codes that clutter log files.# Good logger.info("Camera check result:\n#{result.ai(plain: true)}") # Bad logger.info("Checked camera #{url}, result: #{result["status"]}, description: #{result["description"]}")
-
If the structure is large, consider logging it on
debuginstead ofinfo.
-
When a workflow expects structured LLM output, prefer
RubyLLMschema support. -
Use a schema when you want JSON-like data back instead of parsing free-form text by hand.
-
Keep the schema close to the prompt so the expected shape is obvious.
-
Define new workflow schemas with
R3x::Workflow::LlmSchema.define. -
This is the current convention because it keeps
ruby_llm-schemaoff the boot path for workflows that do not use structured LLM output. -
Older direct inheritance from
RubyLLM::Schemastill works, but treat it as legacy in new code. -
For nested JSON, define the shape with
arrayandobjectblocks inside the helper block, then pass that schema tomessage(...). -
Read the parsed structured result from
response.content; avoid manual JSON parsing when the schema already captures the shape.WeeklyDigestSchema = R3x::Workflow::LlmSchema.define do array :EN do object :entry do string :name string :location string :date_time end end array :PT do object :entry do string :name string :location string :date_time end end end
- Side-effecting workflow helpers should support
dry_runwhen it makes sense. - Resolve defaults with
R3x::Policy.dry_run_for(:key, dry_run). - Not every client supports dry run.
- If a client does not support it, say so clearly in the workflow or helper instead of assuming it will no-op.
-
Network calls and other flaky external interactions should be wrapped with the
retryablegem (already in the Gemfile). -
Prefer
Retryable.retryable(...)over manualbegin/rescue/retryloops in workflow code. -
Use it for HTTP requests, API calls, file downloads, or any operation where transient failures are expected and safe to retry.
-
Basic usage:
Retryable.retryable(tries: 3, on: [HTTPX::TimeoutError, HTTPX::ConnectionError]) do connection.get("/api/data").body end
-
Common options:
tries— total attempts (default 2). Set to3for two retries.on— exception class or array of classes to catch (defaultStandardError).sleep— seconds between retries (default 1). Use0to skip pauses, or a lambda for exponential backoff:lambda { |n| 4**n }.matching— retry based on exception message:matching: /timeout/i.not— exceptions that should never be retried, takes precedence overon.
-
Block receives two optional arguments: retry count so far and the last exception:
Retryable.retryable(tries: 4, on: HTTPX::HTTPError) do |retries, exception| logger.debug { "Attempt #{retries} failed: #{exception}" } if retries > 0 http.get("/endpoint") end
-
For logging retries, use
log_method:log_method = lambda do |retries, exception| logger.debug { "[Attempt ##{retries}] Retrying: #{exception.class} - #{exception.message}" } end Retryable.retryable(tries: 3, on: HTTPX::TimeoutError, log_method: log_method) do http.get("/endpoint") end
-
Avoid retrying operations that are not idempotent or that cause external side effects (e.g. sending emails, creating records) unless the remote API guarantees idempotency.
-
Full documentation: https://github.com/nfedyashev/retryable
ruby_llm has built-in automatic retry through its Faraday middleware. Defaults are applied
per RubyLLM::Context inside R3x::Client::Llm, so every workflow run gets an isolated copy.
Processes that never call ctx.client.llm do not load the gem at all.
The retry defaults are set in app/lib/r3x/client/llm.rb inside the RubyLLM.context block.
Only retry_interval has a project-level override (the gem default is 0.1); everything else uses the gem defaults.
This means the first retry waits 60 seconds, the second waits 120 seconds, then it gives up. The gem automatically retries on transient provider errors:
RubyLLM::RateLimitError(HTTP 429)RubyLLM::ServerError(HTTP 500)RubyLLM::ServiceUnavailableError(HTTP 502-504)RubyLLM::OverloadedError(HTTP 529)- Network timeouts and connection failures
If a particular workflow needs different retry behavior, pass overrides directly to
ctx.client.llm(...):
response = ctx.client.llm(
api_key_env: "GEMINI_API_KEY_MICHAL",
max_retries: 5,
retry_interval: 30.0
).message(
model: "gemini-3-flash-preview",
prompt: prompt
)Any option passed this way overrides the default for that single R3x::Client::Llm
instance. The rest of the call stays the same -- the retry is handled transparently by
ruby_llm.
OpenAI-compatible provider aliases can carry their own RubyLLM routing defaults and are automatically resolved based on the API key environment variable name:
response = ctx.client.llm(
api_key_env: "OPENCODE_GO_API_KEY"
).message(
model: "deepseek-chat",
prompt: prompt
)The provider is dynamically inferred from the environment variable name prefix (e.g. OPENCODE_GO to :opencode_go), routing the request through the lazy registered custom provider adapter while allowing the workflow to choose the exact token env variant (such as OPENCODE_GO_API_KEY, OPENCODE_GO_API_KEY_PROJECTA, or OPENCODE_GO_API_KEY_PROJECTB). Dynamic providers use RubyLLM's provider-level model assumption hook, so workflows can pass provider-specific model IDs without maintaining a static model registry.
- Do not design
#runto return a special metadata hash, status object, or summary structure. - If the last expression happens to return a value (for example an array from
filter_mapor the result of a helper), that is acceptable, but do not writerunspecifically to produce a return value unless the user explicitly asks for one. - Workflows are side-effect driven: their purpose is to fetch data, transform it, and deliver it. Prefer logging and monitoring over return-value contracts.
- If a caller needs to observe what a workflow did, inspect the durable set, the logs, or the downstream system (Discord, email, API) rather than relying on a return value from
run.