This Issue was drafted by GitHub Copilot using Claude Opus 4.6 and reviewed by Jeremy Wildfire (@jwildfire)
Improve RunProject data handoff and per-folder configuration
Summary
`RunProject()` currently merges all workflow results flat into `current_data` and passes everything forward to the next phase. This does not match real-world usage, where each phase needs specific data in a specific shape. The goal is for sequential calls to `RunWorkflows()` (e.g. in the `open.gismo` demo) to be replaceable by a single call to `workr::RunProject()`.
Related: #10 (original RunProject implementation, now closed).
Problem
The demo runWorkflows.R script runs four phases with custom data plumbing between each:
| Phase | Input data | Notes |
| --- | --- | --- |
| 1_mappings | `lRaw` (CSV files) | Straightforward: raw data in, mapped data out |
| 2_metrics | `c(mapped, list(lWorkflows = metrics_wf))` | Needs mapped data plus the workflow definitions themselves |
| 3_reporting | `c(mapped, list(lAnalyzed = analyzed, lWorkflows = metrics_wf, dSnapshotDate = Sys.Date(), Reporting_Results_Longitudinal = NULL))` | Wraps phase 2 results inside `lAnalyzed`, carries forward `mapped` (not all accumulated results), injects non-workflow data (`dSnapshotDate`, `NULL` placeholder) |
| 4_modules | `reporting` | Only phase 3 results, not the full accumulated data |
Additionally, between phases 2 and 3, a data transformation coerces GroupID to character across all analyzed results.
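The coercion step can be sketched as a small recursive helper. This is a minimal sketch, not the demo script's actual code: `CoerceGroupID` is a hypothetical name, and it assumes the analyzed results are a (possibly nested) list of data frames, some of which have a `GroupID` column.

```r
# Hypothetical helper: coerce GroupID to character in every data frame
# found anywhere inside a (possibly nested) list of results.
CoerceGroupID <- function(x) {
  # Check data frames first, because a data frame is also a list.
  if (is.data.frame(x)) {
    if ("GroupID" %in% names(x)) {
      x$GroupID <- as.character(x$GroupID)
    }
    return(x)
  }
  # Recurse into plain lists; lapply() keeps element names.
  if (is.list(x)) {
    return(lapply(x, CoerceGroupID))
  }
  # Anything else (scalars, dates, ...) is returned unchanged.
  x
}

# Toy stand-in for phase 2 results (names and values are made up).
analyzed <- list(
  kri0001 = list(
    Analysis_Summary = data.frame(GroupID = 1:2, Score = c(0.1, 0.9))
  ),
  nWorkflows = 1
)
analyzed_chr <- CoerceGroupID(analyzed)
```

A transformation like this is the kind of step that needs a home between phases 2 and 3.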
Gaps in current RunProject()
- Naive data merging: All phase results are merged flat into `current_data`. There's no way to wrap results (e.g., `lAnalyzed = analyzed`), select a subset, or exclude prior phases.
- No per-folder configuration: Each phase may need custom input mappings, extra static data, or post-processing hooks. There's no mechanism for this.
- Can't pass workflow metadata as data: Phase 2 needs the workflow list passed alongside the actual data (`lWorkflows = metrics_wf`), and phase 3 passes it again. `RunProject` has no way to inject workflow definitions into `lData`.
- No inter-phase transformations: The GroupID coercion between phases 2 and 3 has no home in the current architecture.
- No control over what data carries forward: Phase 4 should only receive phase 3 results, but `RunProject` accumulates everything.
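The difference between flat accumulation and the shape the demo script builds for phase 3 can be illustrated with plain lists. This is only an illustration under assumptions: the element names (`Mapped_AE`, `Analysis_kri0001`) and values are made up, and `c()` stands in for whatever merge `RunProject()` performs internally.

```r
# Toy stand-ins for phase 1 and phase 2 results (hypothetical names).
mapped   <- list(Mapped_AE = data.frame(GroupID = 1:2))
analyzed <- list(Analysis_kri0001 = data.frame(GroupID = 1:2, Score = c(0.1, 0.9)))

# Flat accumulation: every result ends up at the top level.
flat <- c(mapped, analyzed)

# Shape used by the demo script for phase 3: phase 2 results are nested
# under lAnalyzed, and a NULL placeholder is carried as a named element.
shaped <- c(
  mapped,
  list(lAnalyzed = analyzed, Reporting_Results_Longitudinal = NULL)
)

names(flat)
names(shaped)
```

Note that `c()` and `list()` keep the `NULL` placeholder as a named element, whereas `modifyList()` treats `NULL` as "remove this element", which matters for how extra data is merged.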
Proposed Solution
Per-folder config file (_config.yaml)
Each phase folder can optionally include a _config.yaml file that controls data flow and phase behavior. Example:

    # workflows/2_metrics/_config.yaml
    input:
      from_phases: [1_mappings]     # Which prior phase results to include (default: all)
      include_workflows: true       # Inject lWorkflows for this phase into lData
      extra:                        # Additional static data to merge into lData
        some_param: "value"
    output:
      wrap_as: null                 # Optionally wrap all results under a named key (e.g., "lAnalyzed")
      transform: null               # Optional post-processing (see below)

    # workflows/3_reporting/_config.yaml
    input:
      from_phases: [1_mappings]     # Only mapped data, not metrics
      from_results:                 # Pull specific named results from prior phases
        lAnalyzed: 2_metrics        # Wrap all phase 2 results into lAnalyzed
      include_workflows:
        from_phase: 2_metrics       # Inject workflow definitions from phase 2 as lWorkflows
      extra:
        dSnapshotDate: "Sys.Date()"
        Reporting_Results_Longitudinal: null

    # workflows/4_modules/_config.yaml
    input:
      from_phases: [3_reporting]    # Only phase 3 results

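One way the `input` section could be interpreted is sketched below. Everything here is an assumption for illustration: `BuildPhaseInput` is a hypothetical helper, the config is passed in as an already-parsed plain list rather than read from disk, and the string `"Sys.Date()"` is carried through unevaluated because how (or whether) such expressions get evaluated is an open design question.

```r
# Hypothetical helper: assemble lData for one phase from a parsed config.
# lConfig           parsed _config.yaml, represented as a plain list
# lPhaseResults     named list of results, keyed by phase folder name
# lWorkflowsByPhase named list of workflow definitions, keyed by phase
# strPhase          name of the phase being prepared
BuildPhaseInput <- function(lConfig, lPhaseResults, lWorkflowsByPhase, strPhase) {
  input <- lConfig$input

  # Default: include results from every prior phase.
  from_phases <- if (is.null(input$from_phases)) {
    names(lPhaseResults)
  } else {
    unlist(input$from_phases)
  }

  # Flatten the selected phases into one list. unname() stops c() from
  # prefixing element names with the phase name.
  lData <- do.call(c, unname(lPhaseResults[from_phases]))
  if (is.null(lData)) lData <- list()

  # from_results: nest a whole phase's results under a single key.
  for (key in names(input$from_results)) {
    lData[[key]] <- lPhaseResults[[input$from_results[[key]]]]
  }

  # include_workflows: TRUE means this phase's workflows; a list with
  # from_phase means another phase's workflows.
  iw <- input$include_workflows
  if (isTRUE(iw)) {
    lData$lWorkflows <- lWorkflowsByPhase[[strPhase]]
  } else if (is.list(iw) && !is.null(iw$from_phase)) {
    lData$lWorkflows <- lWorkflowsByPhase[[iw$from_phase]]
  }

  # extra: append with c() so NULL placeholders survive.
  c(lData, input$extra)
}

# Usage with toy data mirroring the 3_reporting config above.
lConfig <- list(input = list(
  from_phases = list("1_mappings"),
  from_results = list(lAnalyzed = "2_metrics"),
  include_workflows = list(from_phase = "2_metrics"),
  extra = list(dSnapshotDate = "Sys.Date()", Reporting_Results_Longitudinal = NULL)
))
lPhaseResults <- list(
  `1_mappings` = list(Mapped_AE = data.frame(GroupID = 1:2)),
  `2_metrics`  = list(Analysis_kri0001 = data.frame(GroupID = 1:2))
)
lWorkflowsByPhase <- list(`2_metrics` = list(kri0001 = list(meta = list(ID = "kri0001"))))

lData <- BuildPhaseInput(lConfig, lPhaseResults, lWorkflowsByPhase, "3_reporting")
names(lData)
```

The resulting list has the same top-level shape as the phase 3 input in the demo script, apart from `dSnapshotDate` being a string rather than a date.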
Possible alternative: callback functions
Instead of (or in addition to) YAML config, support callback functions:

    RunProject(
      strPath = "workflows",
      lData = lRaw,
      fnPhaseInput = function(phase, lPhaseResults, lData, lWorkflowsByPhase) {
        # Custom logic to build lData for each phase
      },
      fnPhaseOutput = function(phase, result) {
        # Post-processing (e.g., GroupID coercion)
      }
    )

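To make the callback contract concrete, here is a minimal sketch of a driver loop that honors both hooks. It is not the `workr` implementation: `RunProjectSketch` and `fnRunPhase` are hypothetical, `fnRunPhase` stands in for whatever executes one folder's workflows, and the sketch assumes each callback returns the value to use (the input list, or the post-processed result).

```r
# Hypothetical driver: run phases in order, letting callbacks shape data flow.
RunProjectSketch <- function(lWorkflowsByPhase, lData, fnRunPhase,
                             fnPhaseInput = NULL, fnPhaseOutput = NULL) {
  lPhaseResults <- list()
  for (phase in names(lWorkflowsByPhase)) {
    lPhaseData <- if (is.null(fnPhaseInput)) {
      # Default: flat accumulation of the raw data and all prior results.
      c(lData, do.call(c, unname(lPhaseResults)))
    } else {
      fnPhaseInput(phase, lPhaseResults, lData, lWorkflowsByPhase)
    }
    result <- fnRunPhase(lWorkflowsByPhase[[phase]], lPhaseData)
    if (!is.null(fnPhaseOutput)) {
      result <- fnPhaseOutput(phase, result)
    }
    lPhaseResults[[phase]] <- result
  }
  lPhaseResults
}

# Toy usage: the runner just reports how many inputs it received.
res <- RunProjectSketch(
  lWorkflowsByPhase = list(`1_mappings` = list(), `2_metrics` = list()),
  lData = list(Raw_AE = 1),
  fnRunPhase = function(lWorkflows, lData) list(n = length(lData)),
  fnPhaseInput = function(phase, lPhaseResults, lData, lWorkflowsByPhase) {
    # Phase 1 sees the raw data; phase 2 sees only phase 1 results.
    if (phase == "1_mappings") lData else lPhaseResults[["1_mappings"]]
  },
  fnPhaseOutput = function(phase, result) {
    # Tag each result with the phase that produced it.
    result$phase <- phase
    result
  }
)
```

Keeping results keyed by phase (rather than merged flat) is what lets `fnPhaseInput` select, wrap, or exclude prior phases.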
Implementation Notes
- The `_config.yaml` approach is more declarative and aligns with the YAML-driven workflow philosophy, but may not handle arbitrary transformations (like GroupID coercion).
- The callback approach is more flexible but less portable.
- A hybrid approach (YAML for common patterns + optional callbacks for custom logic) may be best.
- Consider whether `RunWorkflows()` itself needs changes or if this is purely a `RunProject` concern.
Acceptance Criteria
- `runWorkflows.R` from the open.gismo demo branch can be fully replicated with a single `RunProject()` call (plus initial data loading)
- `RunProject()` calls without config files continue to work