fix: pin_module_config no longer silently destroys pipelines (audit wave 3) by JamesHWade · Pull Request #116 · JamesHWade/dsprrr

JamesHWade · 2026-06-14T22:20:46Z

Third wave from the codebase audit. Stacked on #115 (wave 2).

pin_module_config silently destroyed pipelines (`dsprrr-07u`)

module_kind() collapsed every unrecognised module class to "predict", so a PipelineModule passed the v2-persistence allow-list and serialised as a bare PredictModule. A pinned pipeline restored as a single empty PredictModule, silently dropping every step and its bootstrapped per-step demos. The same hole affected other unsupported types (Ensemble, RAG, …).

module_kind() now falls back to the actual class name instead of "predict", so the persistence allow-list correctly rejects unsupported types.
serialize_module_config_v2() detects PipelineModule explicitly and aborts with a clear "pipeline persistence not yet supported" message.

Regression test verified to reproduce the silent data loss before the fix. Full suite: 3016 passing, no new failures. air format --check . passes.

🤖 Generated with Claude Code

Highest-value, lowest-risk findings from the codebase audit. Release readiness (dsprrr-w0e): - Lower the ellmer requirement to the released >= 0.4.1 and drop the CRAN-forbidden "Remotes: tidyverse/ellmer" entry. - Remove unused Suggests: future and lifecycle. - Add NEWS.md. Correctness: - print.dsprrr_batch_result now counts failed items. It read the error from item$error / the output object, but create_error_result() stores it in metadata$error with output = NA, so failed batches always printed "All items completed successfully" (dsprrr-8l0). - Unknown or misspelled signature types now error with a closest-match suggestion instead of silently becoming string fields. validate_type_string() was fully written but never called (dsprrr-47p). - module_trials() warns and returns an inspectable summary when every trial fails, instead of a cryptic "attempt to select less than one element" (dsprrr-hew). - .cache is accepted and validated by all module types and pipelines, not only PredictModule, where it previously fell into ... and was dropped with a spurious "unknown input" warning (dsprrr-jup). Test isolation: - Add tests/testthat/setup.R plus a DSPRRR_CACHE_PATH env var so the suite no longer writes the disk cache into the package source tree (dsprrr-63v). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…mean (wave 2) BootstrapFewShot harvests demos with field-aware metrics (dsprrr-s3b): - The single-module compile path passed the bare cell value as `expected`, so a field-aware metric (e.g. metric_exact_match(field = "answer"), the documented default) errored inside extract_field(), was scored NA, and bootstrapped ZERO demos. It now passes the full training row when the metric declares a field, mirroring the pipeline path and evaluate(). Tracks metric_field separately from output_col so the distinction survives. evaluate() counts failed rows as 0 in mean_score (dsprrr-tn1): - mean_score used na.rm = TRUE, so failed rows dropped out of the number that drives all optimizer selection — a config correct on the 2 rows it answers could outrank one robust across 10. Failures now count as 0 (matching DSPy). The raw `scores` vector keeps NA and n_errors still reports failures. Both fixes have regression tests verified to fail without the change. Full suite: 3009 passing, no new failures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rollout_id was fully implemented in the cache layer but no caller ever passed it. With caching enabled (the default), BestOfN/Refine/Assert made byte-identical attempts: attempt 1 missed the cache, attempts 2..N hit it and returned the same response, so effective N was always 1. Thread rollout_id end to end: - call_llm_request() and PredictModule's private call_llm() / forward() now accept rollout_id and pass it to cached_chat_structured(), which already folds it into the cache key. - BestOfN, Refine, and Assert pass rollout_id = i on each attempt, so every attempt gets a distinct cache key. Regression tests (verified to fail without the fix): - forward() threads rollout_id into the cache key (repeat id hits cache, new id misses). - BestOfN and Refine pass 1,2,3 to successive attempts. Note: a per-attempt temperature override (also in dsprrr-pcd) and optimizer-side diversity (SIMBA/GEPA/COPRO/bootstrap teacher) remain follow-ups. Full suite: 3015 passing, no new failures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…07u) module_kind() collapsed every unrecognised module class to "predict", so a PipelineModule (and EnsembleModule, RAGModule, ...) passed the v2-persistence allow-list and serialised as a bare PredictModule. A pinned pipeline restored as a single empty PredictModule, silently dropping every step and its bootstrapped per-step demos. - module_kind() now falls back to the actual class name instead of "predict", so unsupported types are correctly rejected by the persistence allow-list. - serialize_module_config_v2() detects PipelineModule explicitly and aborts with a clear "pipeline persistence not yet supported" message pointing users to pin each step individually. Regression test: pin_module_config() on a pipeline now errors instead of silently losing data. Full suite: 3016 passing, no new failures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector · 2026-06-14T22:20:51Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Copilot

Pull request overview

This PR fixes a critical persistence bug where pin_module_config() could silently serialize unsupported module types (notably PipelineModule) as a bare PredictModule, causing pipelines (and their step demos) to be lost on restore. It tightens module-kind detection and adds an explicit guardrail to prevent pipeline persistence until it’s properly supported.

Changes:

Update module_kind() to fall back to the actual module class name (instead of "predict") for unrecognized module classes, preventing unsupported types from passing the v2 persistence allow-list.
Add an explicit PipelineModule check in serialize_module_config_v2() that aborts with a clear, user-actionable error message.
Add a regression test ensuring pipelines cannot be pinned silently.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
tests/testthat/test-orchestration.R	Adds a regression test asserting `pin_module_config()` errors on pipelines rather than silently degrading them.
R/run.R	Fixes `module_kind()` fallback behavior to avoid misclassifying unknown module classes as `"predict"`.
R/orchestration.R	Explicitly rejects `PipelineModule` persistence with a clear `cli_abort()` message before serialization proceeds.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Codex/Copilot review of PR #114: - Module$run() now validates .cache via validate_cache_arg(), so callers reaching the R6 method directly get the same loud error as the run() generic instead of a silently-forwarded malformed value. - default_disk_cache_path() treats an empty DSPRRR_CACHE_PATH as unset (nzchar guard) so the disk cache never resolves to "". - Test teardown restores the cache default with an explicit disk_path, independent of LIFO env-var cleanup ordering. - Reword the .cache NEWS bullet to scope it accurately: it is validated and plumbed everywhere, but direct-call modules don't cache yet (dsprrr-aa2). - Move the BootstrapFewShot (s3b) and evaluate (tn1) NEWS bullets out of wave 1 -- their code lands in wave 2 -- so each stacked PR's changelog matches its own diff. - Comment the batch-print test to explain why cli_fmt() is the correct capture (cli writes to stdout; capture.output(type="message") would miss it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Regression from dsprrr-pcd: BestOfN/Refine/Assert passed rollout_id = i explicitly while also spreading ..., so a nested wrapper (refine(best_of_n(mod)), best_of_n(refine(mod)), with_assertions(best_of_n(mod))) handed the inner forward() rollout_id twice -> "formal argument matched by multiple actual arguments". - Each wrapper forward() now takes an explicit rollout_id = NULL formal, so an inherited id is consumed out of ... instead of being re-spread. - New compose_rollout_id() helper (R/utils.R) folds the inherited id into the per-attempt index ("2" at top level, "2.1" when nested), keeping every (outer, inner) attempt's cache key unique. - Document @param rollout_id on the wrapper and PredictModule forward() methods (addresses Copilot review on PR #115). - Add regression tests for all three nesting combinations; update the two existing pcd tests for the now-character ids. - Add the s3b, tn1, and pcd/wx6 NEWS bullets here (their code lives in wave 2), so PR #115's changelog matches its diff (dsprrr-6qo). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The pipeline-persistence guard (dsprrr-07u) had no changelog entry. Add it here so PR #116's NEWS matches its own diff. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

JamesHWade · 2026-06-15T22:04:47Z

No actionable inline comments here (Copilot found none; Codex hit its usage limit). The wave-1 and wave-2 review fixes were merged forward into this branch, and this PR adds the pin_module_config (dsprrr-07u) NEWS bullet so its changelog matches its diff. CI is green.

…ave3 # Conflicts: # NEWS.md

JamesHWade and others added 4 commits June 14, 2026 18:00

Copilot AI review requested due to automatic review settings June 14, 2026 22:20

Copilot started reviewing on behalf of JamesHWade June 14, 2026 22:21 View session

Copilot AI reviewed Jun 14, 2026

View reviewed changes

JamesHWade and others added 5 commits June 15, 2026 17:55

Merge branch 'feature/audit-fixes-wave1' into feature/audit-fixes-wave2

e3db7a6

Merge branch 'feature/audit-fixes-wave2' into feature/audit-fixes-wave3

6a9b2ea

docs: add pin_module_config NEWS bullet for wave 3 (dsprrr-6qo)

43c4e11

The pipeline-persistence guard (dsprrr-07u) had no changelog entry. Add it here so PR #116's NEWS matches its own diff. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into feature/audit-fixes-w…

d8f9edf

…ave3 # Conflicts: # NEWS.md

JamesHWade changed the base branch from feature/audit-fixes-wave2 to main June 15, 2026 22:41

JamesHWade merged commit e0a91ed into main Jun 15, 2026
16 checks passed

JamesHWade deleted the feature/audit-fixes-wave3 branch June 15, 2026 22:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: pin_module_config no longer silently destroys pipelines (audit wave 3)#116

fix: pin_module_config no longer silently destroys pipelines (audit wave 3)#116
JamesHWade merged 10 commits into
mainfrom
feature/audit-fixes-wave3

JamesHWade commented Jun 14, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

JamesHWade commented Jun 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JamesHWade commented Jun 14, 2026

pin_module_config silently destroyed pipelines (dsprrr-07u)

Uh oh!

chatgpt-codex-connector Bot commented Jun 14, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

JamesHWade commented Jun 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pin_module_config silently destroyed pipelines (`dsprrr-07u`)