workr

A very simple R data pipeline framework. {workr} provides a minimal mental model for describing and executing step-by-step workflows. These simple workflows can be combined into configurable data pipelines that can automate large tasks.

What is {workr}?

{workr} was built to solve a specific problem: reusable, customizable data pipelines for complex clinical trial monitoring.

The core functions in {workr} were originally developed as part of the {gsm} framework for risk-based quality monitoring (RBQM). The {gsm} team developed a stable, reusable model for generating metrics to monitor clinical trials.

Our challenge was figuring out how to run those metrics across a large portfolio.

Take 30 studies with monthly snapshots, each needing 15 metrics computed in 5 steps, and you get 27,000 computations per year. Each study also has slightly different requirements, so maintaining individual scripts quickly becomes a massive pain.
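As a quick check of that arithmetic, here is the same calculation in R. The only assumption is that "monthly" means 12 snapshots per study per year; the variable names are illustrative.

```r
n_studies   <- 30
n_snapshots <- 12  # assumption: one snapshot per study per month
n_metrics   <- 15
n_steps     <- 5

n_computations <- n_studies * n_snapshots * n_metrics * n_steps
n_computations
# 27000
```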

{workr}'s solution: Define workflows once, track customizations in YAML files, and compose them into larger pipelines.

The original gsm::RunWorkflow functions were developed in a few hours and were intended as a stopgap until we picked a "real" pipeline tool.

The approach has proven surprisingly stable and flexible, so much so that we created {workr} and started using it outside of our {gsm} pipelines.

{workr} workflows

{workr} workflows are list objects that are typically defined in YAML files. Each workflow has the following components:

Steps are functions that accept data and parameters, producing output that gets added to the shared data list.
Meta is workflow-level configuration accessible to all steps.
Spec is an optional data specification defining the expected input data for the workflow.
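Because a workflow is just a list, it can also be written directly in R rather than loaded from YAML. The sketch below is illustrative only: the workflow ID, the n_rows meta value, and the choice of head() are made up for this example, and Spec is left out because its structure is not described here.

```r
# A hand-built workflow list with one step (hypothetical example).
wf_sketch <- list(
  meta = list(ID = "first_rows", n_rows = 3),
  steps = list(
    list(
      name = "head",       # function to call
      output = "top",      # name under which the result is stored
      params = list(
        x = "df",          # reference to an entry in the data list
        n = "n_rows"       # reference to a meta value
      )
    )
  )
)

# Inspect the top-level structure.
str(wf_sketch, max.level = 2)
```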

The package provides three core functions for running workflows:

workr::RunStep() - execute a single workflow step
workr::RunWorkflow() - execute every step in a single workflow (typically loaded from YAML)
workr::RunWorkflows() - run multiple workflows in sequence

Sample Workflow

Define a workflow in YAML:

# hello_cars.yaml
meta:
  ID: hello_cars
  col: speed
steps:
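  - name: dplyr::pull   # extract one column from df
    output: speed
    params:
      .data: df         # resolves to lData$df
      var: col          # resolves to meta$col ("speed")
  - name: mean
    output: result
    params:
      x: speed          # output of the previous step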

Run it from R:

wf <- yaml::read_yaml("hello_cars.yaml")
lData <- list(df = cars)

result <- workr::RunWorkflow(
  lWorkflow = wf,
  lData = lData
)

# result = 15.4 (mean of cars$speed)

Each step in a workflow:

Calls a function (specified by step$name)
Passes parameters from params (resolving references to lData, meta, or literal values)
Saves the result to lData using the output name
Makes it available for the next step

That's it! By chaining steps (and even whole workflows) together, you can build complex pipelines from simple, reusable components.
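To make the four step actions listed above concrete, here is a minimal base-R sketch of the same idea. It is not the {workr} implementation: resolve_param() and run_step_sketch() are hypothetical helpers, and the lookup order (lData first, then meta, then the literal value) is an assumption. It uses base::getElement() in place of dplyr::pull() so that it runs without extra packages.

```r
# Hypothetical helper: resolve one param value against lData, then meta,
# falling back to the literal value. The lookup order is an assumption.
resolve_param <- function(value, lData, lMeta) {
  if (is.character(value) && length(value) == 1) {
    if (value %in% names(lData)) return(lData[[value]])
    if (value %in% names(lMeta)) return(lMeta[[value]])
  }
  value
}

# Hypothetical stand-in for a single step; not workr::RunStep() itself.
# Note: do.call() with a plain string does not handle "pkg::fun" names.
run_step_sketch <- function(lStep, lData, lMeta) {
  params <- lapply(lStep$params, resolve_param, lData = lData, lMeta = lMeta)
  do.call(lStep$name, params)
}

lMeta <- list(col = "speed")
lData <- list(df = cars)
steps <- list(
  list(name = "getElement", output = "speed",
       params = list(object = "df", name = "col")),
  list(name = "mean", output = "result",
       params = list(x = "speed"))
)

# Run each step and store its result in lData under the step's output name,
# which makes it available to the steps that follow.
for (lStep in steps) {
  lData[[lStep$output]] <- run_step_sketch(lStep, lData, lMeta)
}

lData$result
# 15.4 (mean of cars$speed)
```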

Combining Workflows

{workr} workflows are designed to be chained together. The output of one workflow becomes the input for the next. {workr} provides several tools to support this functionality.

`workr::RunWorkflows` calls multiple workflows

While workr::RunWorkflow runs all the steps in a single workflow, workr::RunWorkflows (with an s) runs multiple workflows one after the other. Just pass a list of workflows. A few details:

workr::RunWorkflows() still takes a single lData object as input. Each workflow makes its updates, and the updated lData object is then passed along to the next workflow.
workr::MakeWorkflowList() is an easy way to read a whole folder of YAML workflows into the format expected by workr::RunWorkflows().
workr::MakeWorkflowList() reorders workflows based on meta$priority, so if you need things to run in a certain order, make sure to set that parameter. If nothing is provided, priority is set to 0.
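The ordering and hand-off described above can be sketched in base R. This is an illustration under assumptions, not the package's code: the toy workflows carry only meta, lower priority values are assumed to run first, and each "run" simply records the workflow ID instead of executing real steps.

```r
# Toy workflows; only meta$priority matters for this sketch.
workflows <- list(
  report  = list(meta = list(ID = "report", priority = 2)),
  mapping = list(meta = list(ID = "mapping", priority = 1)),
  setup   = list(meta = list(ID = "setup"))  # no priority set
)

# Missing priorities default to 0.
priority <- vapply(workflows, function(wf) {
  p <- wf$meta$priority
  if (is.null(p)) 0 else as.numeric(p)
}, numeric(1))

# Assumption: lower priority values run first.
ordered <- workflows[order(priority)]

# Thread one lData list through the ordered workflows. Each toy "run"
# appends the workflow ID, standing in for real step outputs.
lData <- Reduce(
  function(lData, wf) {
    lData$ran <- c(lData$ran, wf$meta$ID)
    lData
  },
  ordered,
  list(ran = character(0))
)

lData$ran
# "setup" "mapping" "report"
```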

`workr::RunProject` calls multiple sets of workflows

Last but not least, sometimes you want to chain multiple calls of workr::RunWorkflows(). workr::RunProject() calls workr::RunWorkflows() for every sub-directory (phase) in a given project directory, sharing one lData object across phases.

# Project directory structure:
# project/
#   01_mapping/
#     ae.yaml
#     lb.yaml
#   02_analysis/
#     kri.yaml

results <- workr::RunProject(
  strPath = "project",
  lData = list(raw_data = my_data)
)
# Runs 01_mapping workflows first, then 02_analysis
# Outputs from 01_mapping are available as inputs to 02_analysis

Key options:

strPhases — run a subset of phases, or control their order
bReturnResult / bKeepInputData — passed through to RunWorkflows()
bRecursive — passed through to MakeWorkflowList()

Phases are sorted alphabetically by default (use numeric prefixes like 01_, 02_ to control order).
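The effect of alphabetical phase ordering can be seen with a throwaway directory. This sketch assumes phases are simply the immediate sub-directories of the project folder sorted by name; the folder names are made up, and how RunProject() discovers phases internally is not shown here.

```r
# Build a temporary project with two phase folders, created out of order.
project <- file.path(tempdir(), "project_sketch")
dir.create(file.path(project, "02_analysis"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "01_mapping"), showWarnings = FALSE)

# Assumption: phases are the immediate sub-directories, sorted by name.
phases <- sort(list.dirs(project, full.names = FALSE, recursive = FALSE))
phases
# "01_mapping" "02_analysis"

# Without zero-padding, "10_..." sorts before "2_...".
unpadded <- sort(c("2_report", "10_final"))
unpadded
# "10_final" "2_report"
```

Zero-padded prefixes (01_, 02_, ..., 10_) keep the alphabetical order aligned with the intended run order.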

`workr::Manifest` — Reproducible Package Environments

One nice thing about {workr} workflows is that they can be combined across packages. To support this, {workr} includes tooling for creating reproducible manifests — versioned snapshots of packages and their workflows at a point in time.

pkgManifest() resolves a list of GitHub packages to specific versions and generates:

manifest.csv — pinned package versions with SHAs
rproject.toml — rv-compatible dependency file
workflows/ — merged workflow YAML files pulled from each package's inst/workflow/

Package manifests are stored on orphan branches (prefixed ss-* for "snapshot-source") and updated nightly via GitHub Actions. These branches serve as the source of truth for reproducible package environments.

📦 Demo snapshot (ss-demo) — gsm.core, gsm.mapping, gsm.kri, gsm.reporting

Visualizing Workflows

YAML workflows can be a little hard to follow, especially when you're running a few (or more than a few) in a row, so we've created some tools to help visualize and track workflows.

{workr} Shiny app

workr::DemoApp_init() launches a simple Shiny application that lets you explore and run workflows in real time. A hosted version is available at jwildfire.shinyapps.io/workr-demoapp.

Note: The hosted app includes only the basic workflow examples (01_RunWorkflow and 02_RunWorkflows) to stay within shinyapps.io memory limits. Run workr::DemoApp_init() locally to explore all examples, including the KRI and data transformation workflows.

open.gismo

open.gismo is an end-to-end platform for running {workr} projects on GitHub.

Automation via GitHub Actions

We provide several GitHub Actions to automate snapshot creation and site deployment.

| Workflow | Trigger | Purpose |
|---|---|---|
| `manifest.yaml` | Reusable / manual | Resolve packages and generate manifest artifacts on an orphan `ss-*` branch |
| `nightly-manifest.yaml` | Cron (2am UTC) / manual | Runs `manifest.yaml` for configured manifest branches |
| `pkgdown-with-examples.yaml` | Push to main/dev / PR / manual | Build pkgdown site with examples and slides |
| `pkgdown-cleanup.yaml` | PR close | Remove PR preview deployments from gh-pages |
| `R-CMD-check.yaml` | Push to main / PR | Standard R CMD check |
| `R-CMD-check-dev.yaml` | Push to dev / PR | R CMD check against dev dependencies |
| `qcthat.yaml` | PR / release / issue-close / manual | Generate issue-test coverage + UAT reports and fail on uncovered completed issues |
| `r-releaser-caller.yaml` | Manual | Release automation via r-releaser |
