Skip to content

fix(preview): unpack all three values returned by _auto_compile_prompts#1721

Open
fmferrari wants to merge 2 commits into
microsoft:mainfrom
fmferrari:fix/preview-too-many-values-to-unpack
Open

fix(preview): unpack all three values returned by _auto_compile_prompts#1721
fmferrari wants to merge 2 commits into
microsoft:mainfrom
fmferrari:fix/preview-too-many-values-to-unpack

Conversation

@fmferrari

Copy link
Copy Markdown

Problem

apm preview <script> crashes with:

ValueError: too many values to unpack (expected 2)

This happens for every script definition regardless of content.

Root Cause

_auto_compile_prompts() in src/apm_cli/core/script_runner.py has return type tuple[str, list[str], str] and always returns three values:

return compiled_command, compiled_prompt_files, runtime_content

_execute_script_command() correctly unpacks all three:

compiled_command, compiled_prompt_files, runtime_content = self._auto_compile_prompts(command, params)

But preview() in src/apm_cli/commands/run.py unpacks only two values in both its code paths:

  1. The rich-UI try block (line ~142)
  2. The except (ImportError, NameError) fallback path (line ~182)

Fix

Add _ to capture the unused runtime_content in both call-sites, matching how _execute_script_command already handles the return value. Preview does not execute the script, so runtime_content is not needed.

Reproduction

Any apm.yml with a scripts: block:

scripts:
  start: "echo hello"
$ apm preview start
Error previewing script: too many values to unpack (expected 2)

Tested against APM 0.19.0.

_auto_compile_prompts() returns a 3-tuple:
  (compiled_command, compiled_prompt_files, runtime_content)

The preview() command was unpacking only 2 values in both its rich-UI
path and its fallback except (ImportError, NameError) path, causing
apm preview to always crash with:

  ValueError: too many values to unpack (expected 2)

Fix: add _ to capture the unused runtime_content in both call-sites.
_execute_script_command already unpacks all three values correctly.

Fixes: apm preview <script> crashes for any script definition
Copilot AI review requested due to automatic review settings June 9, 2026 21:07

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the preview command to match an updated return signature from script_runner._auto_compile_prompts, preventing tuple-unpacking errors during command preview.

Changes:

  • Adjusts tuple unpacking to ignore the new third return value from _auto_compile_prompts.
  • Adds an inline comment documenting the updated return shape and why only the first two items are used.

Comment thread src/apm_cli/commands/run.py Outdated
Comment thread src/apm_cli/commands/run.py Outdated
@sergio-sisternes-epam sergio-sisternes-epam added the panel-review Trigger the apm-review-panel gh-aw workflow label Jun 19, 2026
@github-actions

Copy link
Copy Markdown

APM Review Panel: ship_with_followups

Fix apm preview crash (ValueError: too many values to unpack) on .prompt.md scripts; prevents silent funnel exit at the dry-run surface new evaluators hit first.

cc @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

All six active panelists agree: the *_ unpack fix is minimal, correct, and applied symmetrically to both the rich-panel and plain-fallback branches of preview. The crash affected every script using .prompt.md prompt compilation -- which is the primary documented use case for apm preview -- meaning the command was broken for its most common invocation. The fix is safe to ship.

The most actionable open signal is two independent evidence-backed findings (both outcome: missing) from test-coverage-expert and python-architect: the 2-tuple mock in test_run_phase3.py:36 and test_run_command_surface.py:36 is a genuine regression-trap gap. Both a, b = (x, y) and a, b, *_ = (x, y) succeed against a 2-tuple mock, so the original bug could be silently re-introduced and all preview unit tests would still pass. These are load-bearing gaps, not opinion findings. Updating both mocks to (compiled_cmd, prompt_files, None) is the highest-priority follow-up, feasible in-PR before merge. Separately, no integration test invokes the preview CLI end-to-end against a real ScriptRunner -- the tier-floor for a CLI surface -- leaving a second regression-trap gap.

Three panelists (devx-ux, supply-chain-security, oss-growth) independently surface the runtime_content transparency gap from different angles: UX parity with run, secure-by-default inspection before sending to an external LLM, and adoption trust. This is a pre-existing gap that was previously unreachable because the command crashed first; this PR makes it reliably visible on every script using a runtime command. The convergence across three independent personas strengthens the case for a tracked follow-up to surface runtime_content in preview output, but it does not prevent merge.

Aligned with: Pragmatic as npm -- Preview is APM's --dry-run equivalent; the crash made it unreliable for the most common use case (.prompt.md scripts), breaking parity with the npm/cargo/pip dry-run mental model that evaluators bring to first contact. OSS community-driven -- A crash in the first command a cautious evaluator reaches for destroys trust before the user files an issue. This fix directly improves first-impression quality on the highest-stakes entry point into the tool.

Growth signal. The oss-growth-hacker flags a strategic framing worth internalizing: safe-mode surfaces (preview, --dry-run, compile --dry-run) are first-class demo hardening targets. A crash in preview at exactly the moment a new user is building confidence is a silent funnel exit with outsized adoption impact, because evaluators hit these paths before any other. This fix should be surfaced in release notes under a 'preview now works with prompt-compiled scripts' angle to reinforce the APM-is-safe-to-evaluate story.

Panel summary

Persona B R N Takeaway
Python Architect 0 2 1 Fix is correct; update 2 unit test mocks to 3-tuple and fix return annotation to str or None to prevent future arity drift.
CLI Logging Expert 0 0 1 Fix is output-neutral and correct; both branches updated symmetrically. One nit: comment could name the discarded field for future maintainers.
DevX UX Expert 0 1 1 Fix is correct and symmetric across both render paths. runtime_content (compiled AI prompt body) is silently dropped -- a pre-existing gap now reliably reachable after this fix.
Supply Chain Security Expert 0 0 1 Fix introduces no new attack surface. *_ discard of runtime_content is correct for a preview-only path; no execution side-effects added.
OSS Growth Hacker 0 1 1 preview crash on .prompt.md scripts is a silent first-run killer; fix is correct but the [Unreleased] CHANGELOG is empty and deserves a visible user-facing entry.
Doc Writer 0 1 0 Docs for preview command are accurate; CHANGELOG [Unreleased] is missing a Fixed entry for this user-visible crash fix.
Test Coverage Expert 0 2 0 Bug fix has no regression trap: 2-tuple mock makes all preview unit tests pass both before and after the fix. Update mock to 3-tuple; add integration test.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Python Architect + Test Coverage Expert] Update _auto_compile_prompts mock from 2-tuple to 3-tuple in test_run_phase3.py:36 and test_run_command_surface.py:36 -- Both panelists independently produced evidence-backed findings (outcome: missing) showing the 2-tuple mock makes all preview unit tests pass both before and after the fix; no CI signal guards against arity regression. Changing to (compiled_cmd, prompt_files, None) closes the regression trap with a one-line fix per file. Feasible in-PR before merge.
  2. [Test Coverage Expert] Add integration test TestPreviewCLIIntegration: CliRunner().invoke(preview, ['greet']) with a real ScriptRunner against an apm.yml fixture -- All existing preview tests are unit-tier with a mocked ScriptRunner. The original crash would only surface at integration tier. Without this, the CLI surface has no integration-tier regression trap.
  3. [Doc Writer + OSS Growth Hacker] Add [Unreleased] ### Fixed entry in CHANGELOG.md for the apm preview ValueError crash fix -- The [Unreleased] block is currently empty. This is a user-visible crash on the primary documented use case of apm preview. Omitting an entry makes the fix invisible to users scanning release notes deciding whether to upgrade.
  4. [Python Architect] Fix return annotation on _auto_compile_prompts in script_runner.py:226 from tuple[str, list[str], str] to tuple[str, list[str], str or None] -- runtime_content initializes to None unconditionally at line 239 and is only set inside the is_runtime_cmd branch. The declared str annotation is false; the docstring already acknowledges runtime_content_or_none.
  5. [DevX UX Expert + Supply Chain Security Expert] Surface runtime_content in preview output when non-None, labeled 'Compiled prompt content (sent to runtime)' -- Three panelists flag this pre-existing gap: for LLM/copilot runtime scripts, preview drops the compiled prompt body with *_ and users cannot inspect what text would be sent to the AI service before execution. Mirror script_runner.py:154's format_content_preview().

Architecture

classDiagram
    direction LR
    class run_py {
        <<Module>>
        +preview(ctx, script_name, param, verbose)
    }
    class ScriptRunner {
        +list_scripts() dict
        +_execute_script_command(command, params) bool
        +_auto_compile_prompts(command, params) tuple
    }
    class CompileResult {
        <<ValueObject>>
        compiled_command str
        compiled_prompt_files list
        runtime_content str
    }
    run_py ..> ScriptRunner : instantiates
    ScriptRunner ..> CompileResult : returns 3-tuple
    run_py ..> CompileResult : partial unpack star-discard
    note for CompileResult "annotation declares str but value initializes to None -- fix: str or None"
    class run_py:::touched
    class CompileResult:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A["apm run preview SCRIPT"] --> B["ScriptRunner.list_scripts()"]
    B --> C{script found?}
    C -- no --> D[sys.exit 1]
    C -- yes --> E["try: _rich_panel Original command"]
    E --> F["_auto_compile_prompts(command, params)"]
    F -->|3-tuple| G["compiled_command, compiled_prompt_files, star_ = ...\nrun.py:141 FIXED hunk-1"]
    G --> H{compiled_prompt_files?}
    H -- yes --> I["_rich_panel: Compiled command + files"]
    H -- no --> J["_rich_panel: No compilation + warning"]
    I --> K["logger.success()"]
    J --> K
    E -- ImportError or NameError --> M["fallback: click.echo Original"]
    M --> N["_auto_compile_prompts(command, params)"]
    N -->|3-tuple| O["compiled_command, compiled_prompt_files, star_ = ...\nrun.py:181 FIXED hunk-2"]
    O --> K
Loading

Recommendation

Ship this PR. The fix is correct, minimal, and unanimous across all active panelists -- no findings prevent merge. Track the 3-tuple mock update (test_run_phase3.py:36 + test_run_command_surface.py:36) as the highest-priority follow-up: it is a genuine regression trap that a one-line change per file closes and the only gap that would allow this exact class of arity bug to recur silently through CI. Add a CHANGELOG [Unreleased] Fixed entry before the next release cycle. The runtime_content preview transparency gap is a meaningful follow-up but is pre-existing and warrants its own tracked issue rather than delaying this crash fix.


Full per-persona findings

Python Architect

  • [recommended] Return annotation tuple[str, list[str], str] is wrong; runtime_content initializes to None at src/apm_cli/core/script_runner.py:226
    At script_runner.py:239, runtime_content = None is set unconditionally. It is only assigned a non-None value inside the if is_runtime_cmd: branch (line 263-264). The declared return type lies to the type checker and to callers. The docstring at line 234 already acknowledges this: runtime_content_or_none. The correct annotation is -> tuple[str, list[str], str | None].
    Suggested: ) -> tuple[str, list[str], str | None]: # aligns with docstring line 234

  • [recommended] Two unit test mocks set _auto_compile_prompts return to 2-tuple, hiding future arity regressions at tests/unit/commands/test_run_phase3.py:36
    test_run_phase3.py:36 and test_run_command_surface.py:36 both set runner._auto_compile_prompts.return_value = (compiled_cmd, prompt_files). This is the exact mechanism that let the original bug survive CI: the mock satisfied the old 2-var unpack and satisfies the fixed *_ unpack equally, so the tests are neutral to the regression. The mocks must reflect the real 3-slot contract.
    Suggested: runner._auto_compile_prompts.return_value = (compiled_cmd, prompt_files, None)
    Proof (missing at unit): tests/unit/commands/test_run_phase3.py::_make_script_runner -- proves: unit mock reflects the real 3-slot return contract so arity regressions are caught before runtime.

  • [nit] NamedTuple return type would make the 3-slot contract self-documenting and prevent positional drift
    Three callers each needing different slots (preview needs slots 0-1, _execute_script_command needs all 3) is the threshold where named access (result.runtime_content) is cleaner than positional unpacking. A NamedTuple (e.g. CompileResult) is backward-compatible with tuple unpacking and would have prevented both the original ValueError and the mock arity mismatch. The *_ fix is correct and idiomatic at this scope; NamedTuple refactor is deferrable.

CLI Logging Expert

  • [nit] Comment could name the discarded return value to aid future maintainers at src/apm_cli/commands/run.py:140
    The *_ discard is correct, but the comment does not name the thrown-away field. If _auto_compile_prompts gains a 4th return value later, a reader has no signal about what was already intentionally dropped. Naming runtime_content makes the contract self-documenting.
    Suggested: # preview only needs the command and compiled prompt file list; runtime_content (3rd return value) is intentionally discarded here.

DevX UX Expert

  • [recommended] runtime_content (compiled prompt body sent to AI runtime) is silently discarded in preview; users cannot see what text the AI would receive at src/apm_cli/commands/run.py:141
    apm preview's stated contract is 'show what would be executed'. For scripts using copilot/codex/llm/gemini runtimes, _auto_compile_prompts returns runtime_content -- the full compiled prompt text piped to the AI subprocess. The run command already surfaces this via format_content_preview(). preview drops it with *_ and only shows the file path list. This gap was pre-existing but unreachable (command crashed first); this PR makes it reliably visible on every runtime-command script.
    Suggested: After compiled_command, compiled_prompt_files, *rest = ..., extract runtime_content = rest[0] if rest else None and render a panel titled 'Compiled prompt content (sent to runtime)' when non-None. Mirror script_runner.py:154's format_content_preview().

  • [nit] help= text 'Preview a script's compiled prompt files' undersells the command at src/apm_cli/commands/run.py:95
    The current help string mentions only the file list. A developer scanning apm --help cannot infer the dry-run scope. Compare npm --dry-run, cargo --dry-run, pip install --dry-run.
    Suggested: Change to help="Dry-run a script: show original and compiled commands without executing".

Supply Chain Security Expert

  • [nit] runtime_content (compiled LLM prompt body) is discarded silently; preview never shows what content would be sent to external runtimes at src/apm_cli/commands/run.py:141
    For commands targeting copilot/codex/llm/gemini, runtime_content is the exact text that would be submitted to the external service. Being able to inspect what will be sent to an external LLM before execution is a meaningful trust signal and aligns with secure-by-default posture. The fix is correct (the crash was worse); surfacing runtime_content in preview output is a worthwhile follow-up.

OSS Growth Hacker

  • [recommended] CHANGELOG [Unreleased] is empty; this crash fix needs a user-facing Fixed entry before the next release at CHANGELOG.md:8
    apm run preview is documented as the canonical 'inspect before run' path. A ValueError: too many values to unpack crash on any .prompt.md script is a silent funnel exit -- the user sees a Python traceback, infers the tool is broken, and abandons before filing an issue. That regression belongs in the CHANGELOG under ### Fixed.
    Suggested: Add to [Unreleased] ### Fixed: apm run preview no longer crashes with ValueError: too many values to unpack when the script uses .prompt.md prompt compilation. (fix(preview): unpack all three values returned by _auto_compile_prompts #1721)

  • [nit] Release note angle: 'preview now works with prompt-compiled scripts' reinforces the preview-before-run mental model for social/newsletter copy.

Doc Writer

  • [recommended] CHANGELOG [Unreleased] has no entry for this apm preview crash fix at CHANGELOG.md
    The project follows Keep a Changelog strictly. This PR resolves a user-visible ValueError that breaks apm preview whenever _auto_compile_prompts returns more than 2 values -- the primary documented use-case. Omitting a CHANGELOG entry means the fix is invisible to users reading release notes.
    Suggested: Add to ## [Unreleased] / ### Fixed: apm preview no longer raises ValueError: too many values to unpack when _auto_compile_prompts returns additional values; previewing scripts that reference .prompt.md files now works as documented.

Test Coverage Expert

  • [recommended] Missing regression trap: _make_script_runner returns 2-tuple so no test fails if the arity unpack bug is re-introduced at tests/unit/commands/test_run_phase3.py:36
    The bug was ValueError: too many values to unpack. The fix uses *_ but the test mock returns a 2-tuple (compiled_cmd, prompt_files). Both a, b = (x, y) and a, b, *_ = (x, y) succeed with a 2-tuple mock -- all preview tests pass BOTH before and after the fix. Verified via grep: both test_run_phase3.py:36 and test_run_command_surface.py:36 set return_value = (compiled_cmd, prompt_files).
    Suggested: Change mock to: runner._auto_compile_prompts.return_value = (compiled_cmd, prompt_files, runtime_content) and update all callsites to pass 3-tuples, e.g. auto_compile_result=("make all", [], None).
    Proof (missing at unit): tests/unit/commands/test_run_phase3.py::_make_script_runner -- proves: apm preview raises ValueError if _auto_compile_prompts returns more values than the unpack target expects [devx]

  • [recommended] No integration test invokes the preview CLI command with a real ScriptRunner; CLI surface tier-floor is integration-with-fixtures at tests/integration/test_script_runner_phase3w4.py
    All existing preview tests are unit-tier with a mocked ScriptRunner. Grep confirmed: no integration test in tests/integration/ invokes preview CLI with a real ScriptRunner. The original crash would only surface in an integration test that calls CliRunner().invoke(preview, ['myscript']) with a real ScriptRunner and real .prompt.md fixture.
    Suggested: Add TestPreviewCLIIntegration to tests/integration/test_script_runner_phase3w4.py (or a new test_run_preview_e2e.py): CliRunner().invoke(preview, ['greet']) against a real apm.yml fixture with a real ScriptRunner instance.
    Proof (missing at integration-with-fixtures): tests/integration/test_script_runner_phase3w4.py::TestPreviewCLIIntegration::test_preview_with_real_script_runner_no_crash -- proves: apm preview executes end-to-end through real ScriptRunner without ValueError [devx]

Auth Expert -- inactive

Only src/apm_cli/commands/run.py is changed; no auth, token management, credential resolution, host classification, or remote-host fallback code is touched.

Performance Expert -- inactive

The only touched file is src/apm_cli/commands/run.py (preview command display path); no cache, transport, materialize, resolve, or install pipeline code is affected.

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

Generated by PR Review Panel for issue #1721 · sonnet46 10.3M ·

@github-actions github-actions Bot removed the panel-review Trigger the apm-review-panel gh-aw workflow label Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants