Harden A2A pipeline recovery, cleanup, and multimodal flows #129

Merged: guima-why merged 59 commits into main from fix_pipeline on Jun 24, 2026.

guima-why commented Jun 24, 2026

Summary

This PR is the full fix_pipeline branch relative to main. It hardens and expands pipeline mode across A2A, REPL, the selling pipeline workflow, rollback cleanup, multimodal input, durable state recovery, developer tooling, tests, website navigation, and i18n.

The branch is not limited to the final review-fix commits. The net diff against origin/main is large: 201 files changed, with roughly 46.9k insertions and 1.4k deletions.

Major Changes

A2A pipeline execution and recovery

  • Adds structured A2A pipeline input handling through PipelineUserInput, including text, JSON, raw/file URL parts, and supported image MIME types (image/png, image/jpeg, image/webp, image/gif).
  • Adds image resizing/downsampling for A2A pipeline image parts and validates unsupported image/media types before entering pipeline execution.
  • Adds A2A message validation in pipeline mode before dispatch so invalid pipeline requests fail early.
  • Preserves recoverable task ids and rich recoverable error data through JSON-RPC and A2A v0.3 compatibility paths.
  • Improves task/context ownership checks for active sidecars, terminal sidecars, waiting input, task aliasing, and recovery without an explicit taskId.
  • Hardens A2A journal, snapshot, and public recovery state handling, including schema catch-up/rebuild, replay filtering by task/context, repaired journal tails, and sanitized public recovery payloads.
  • Handles cancel handoff, active interrupts, pending input, ask_user_question resume, task resubscribe, and normal-chat handoff more consistently.
  • Keeps normal-chat A2A follow-ups after pipeline handoff in the same context while using the correct non-pipeline task flow.
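As a rough illustration of the early validation described above, the sketch below checks file parts against the four image MIME types the PR lists and computes a downscaled size that preserves aspect ratio. The Part shape, PipelineInputError, the rejection of audio/video parts, and the 1568 px edge limit are hypothetical assumptions for this example, not taken from the PR's PipelineUserInput code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Image MIME types the PR lists as supported for A2A pipeline image parts.
SUPPORTED_IMAGE_MIME_TYPES = frozenset({"image/png", "image/jpeg", "image/webp", "image/gif"})
# Assumption: other binary media families are rejected outright.
REJECTED_MEDIA_PREFIXES = ("audio/", "video/")
# Hypothetical bound; the PR does not state the real downsampling limit.
MAX_IMAGE_EDGE_PX = 1568


@dataclass(frozen=True)
class Part:
    """Hypothetical, simplified stand-in for an A2A message part."""
    kind: str  # "text", "data", or "file"
    mime_type: Optional[str] = None


class PipelineInputError(ValueError):
    """Raised before pipeline execution when a request cannot be accepted."""


def validate_pipeline_parts(parts: list[Part]) -> None:
    """Fail early, before dispatch, on empty requests or unsupported media."""
    if not parts:
        raise PipelineInputError("pipeline request has no parts")
    for index, part in enumerate(parts):
        if part.kind != "file":
            continue  # text and JSON/data parts pass through unchanged
        mime = (part.mime_type or "").lower()
        if mime.startswith("image/") and mime not in SUPPORTED_IMAGE_MIME_TYPES:
            raise PipelineInputError(f"part {index}: unsupported image type {mime!r}")
        if mime.startswith(REJECTED_MEDIA_PREFIXES):
            raise PipelineInputError(f"part {index}: unsupported media type {mime!r}")


def downscaled_size(width: int, height: int, max_edge: int = MAX_IMAGE_EDGE_PX) -> tuple[int, int]:
    """Return dimensions that fit within max_edge while keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # already small enough; never upscale
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```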

Durable state, sidecar, and persistence behavior

  • Adds shared durable state-file helpers for atomic writes, cross-device safe replace, locked JSONL append, parent-directory fsync where supported, Windows replace retries, and per-path process/thread locking.
  • Moves recovery-critical pipeline sidecar/session writes onto these durable helpers.
  • Makes pipeline state persistence fail closed in key paths instead of reporting successful progress when sidecar/session state could not be written.
  • Preserves candidate/sub-pipeline execution state, failed candidate details, pending ask_user_question resume data, active attempt ids, transcripts, and rollback metadata across restarts.
  • Reorders rollback persistence so rollback sidecar state is saved before rollback events are appended.
  • Marks terminal/cleared sidecars as non-resumable while preserving files for debugging.
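A minimal sketch of the durable-write pattern described above, assuming a same-directory temp file, fsync, os.replace with retries, and a best-effort directory fsync. The helper names atomic_write_text and append_jsonl, the in-process lock registry, and the retry timings are hypothetical; cross-process file locking is left out for brevity.

```python
from __future__ import annotations

import contextlib
import json
import os
import sys
import tempfile
import threading
import time
from pathlib import Path

# Hypothetical per-path lock registry (in-process only; a real helper would
# also need a cross-process lock such as fcntl/msvcrt file locking).
_PATH_LOCKS: dict[str, threading.Lock] = {}
_REGISTRY_LOCK = threading.Lock()


def _lock_for(path: Path) -> threading.Lock:
    key = os.path.normcase(str(path.resolve()))
    with _REGISTRY_LOCK:
        return _PATH_LOCKS.setdefault(key, threading.Lock())


def _fsync_dir(directory: Path) -> None:
    """Persist the directory entry after a rename; skipped on Windows."""
    if sys.platform == "win32":
        return
    fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def _replace_with_retries(src: str, dst: Path, retries: int) -> None:
    for attempt in range(retries):
        try:
            os.replace(src, dst)
            return
        except PermissionError:
            # Windows can refuse the replace while another handle is open.
            if attempt == retries - 1:
                raise
            time.sleep(0.05 * (attempt + 1))


def atomic_write_text(path: Path, data: str, retries: int = 5) -> None:
    """Write data so readers see either the old file or the new one, never a mix."""
    path = Path(path)
    with _lock_for(path):
        # Temp file in the target directory keeps os.replace on one filesystem.
        fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=f".{path.name}.", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.write(data)
                tmp.flush()
                os.fsync(tmp.fileno())
            _replace_with_retries(tmp_name, path, retries)
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp_name)
            raise
        _fsync_dir(path.parent)


def append_jsonl(path: Path, record: dict) -> None:
    """Append one JSON record per line under the same per-path lock."""
    path = Path(path)
    line = json.dumps(record, separators=(",", ":")) + "\n"
    with _lock_for(path):
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(line)
            fh.flush()
            os.fsync(fh.fileno())
```

Following the rollback ordering the PR describes, a caller would write the rollback sidecar with atomic_write_text before calling append_jsonl for the rollback event, so a crash between the two never leaves an event that points at unsaved state. Creating the temp file next to the target is one way to keep os.replace on a single filesystem; the PR's own cross-device handling may differ.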

Rollback cleanup and cloud-resource tracking

  • Adds CleanupLedger, CleanupObserver, CleanupResource, and cleanup prompt metadata for tracking resources created by pipeline steps and later requiring cleanup.
  • Emits ResourceObservedEvent when ROS stack creation reveals a stack id, both through ros_stack and generic aliyun_api CreateStack paths.
  • Adds deploying-step hooks that record observed ROS stacks and mark only the relevant deploying attempt's stack resources for cleanup after rollback.
  • Generates hidden cleanup prompts for normal-chat handoff and preserves them through session save, context compaction, resume, prompt inspection, and A2A normal-chat startup.
  • Publishes cleanup started/progress/failed/completed events and exposes sanitized cleanup state in A2A snapshots and recovery payloads.
  • Rejects or surfaces cleanup result mismatches instead of treating unrelated tool results as cleanup success.
  • Sanitizes cleanup mismatch/errors before exposing them to users or public A2A state.
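The cleanup bookkeeping above can be pictured with a small ledger. CleanupLedger and CleanupResource are names from the PR, but every field and method below (observe, mark_attempt_for_cleanup, apply_result, pending) is a hypothetical simplification. It shows only two ideas: scoping cleanup to the rolled-back deploying attempt, and refusing cleanup results that do not match a pending resource.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CleanupResource:
    """Simplified record; the real class's fields are not shown in the PR."""
    kind: str          # e.g. "ros_stack"
    resource_id: str   # e.g. the stack id revealed at creation time
    attempt_id: str    # deploying attempt that created the resource
    needs_cleanup: bool = False
    cleaned: bool = False


@dataclass
class CleanupLedger:
    resources: list[CleanupResource] = field(default_factory=list)

    def observe(self, kind: str, resource_id: str, attempt_id: str) -> None:
        """Record a resource once, e.g. when a ResourceObservedEvent arrives."""
        if not any(r.kind == kind and r.resource_id == resource_id for r in self.resources):
            self.resources.append(CleanupResource(kind, resource_id, attempt_id))

    def mark_attempt_for_cleanup(self, attempt_id: str) -> list[CleanupResource]:
        """After rollback, flag only resources from the rolled-back attempt."""
        marked = [r for r in self.resources if r.attempt_id == attempt_id]
        for r in marked:
            r.needs_cleanup = True
        return marked

    def apply_result(self, kind: str, resource_id: str, succeeded: bool) -> None:
        """Accept a cleanup result only if it matches a pending resource."""
        for r in self.resources:
            if r.kind == kind and r.resource_id == resource_id and r.needs_cleanup:
                r.cleaned = succeeded
                return
        # Internal error; callers would sanitize it before showing it to users
        # or exposing it in public A2A state.
        raise ValueError(f"cleanup result does not match a pending resource: {kind}/{resource_id}")

    def pending(self) -> list[CleanupResource]:
        return [r for r in self.resources if r.needs_cleanup and not r.cleaned]
```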

Pipeline engine behavior and completion guards

  • Adds PipelineUserInput and image-aware display text so pipeline internals can keep original content blocks while UI, telemetry, snapshots, and sidecars have stable text summaries.
  • Adds surface-specific step prompt overrides, including an A2A-specific confirm/select prompt.
  • Adds completion guard state rebuilt from tool results and resume messages, including support for guards that require a successful ros_stack result with matching stack id/status.
  • Adds pre-validation for complete_step inputs before accepting restored/precompleted tool state.
  • Adds pipeline hooks for on_resource_observed and on_rollback_cleanup_required.
  • Removes static rollback rule parsing from loaded steps and relies on explicit runtime rollback targets/state instead.
  • Passes pipeline-mode, trusted-read-directory, and relative-read-directory context into step agent loops and tool execution.
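One way to picture a completion guard that is rebuilt from tool results rather than cached: scan the results (for example, those recovered from resume messages) for a successful ros_stack call whose stack ID and status match. ToolResult, RosStackGuard, can_complete_step, the StackId/Status payload keys, and the accepted status set are all assumptions for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolResult:
    """Hypothetical, simplified tool-result record."""
    tool_name: str
    is_error: bool
    payload: dict[str, Any]


@dataclass(frozen=True)
class RosStackGuard:
    """Guard satisfied only by a successful ros_stack result for one stack."""
    expected_stack_id: str
    # Assumed set of terminal success statuses.
    required_statuses: frozenset[str] = frozenset({"CREATE_COMPLETE", "UPDATE_COMPLETE"})

    def is_satisfied(self, results: list[ToolResult]) -> bool:
        # Recompute from the full result list every time instead of trusting
        # a flag that might not have survived a restart.
        for result in results:
            if result.tool_name != "ros_stack" or result.is_error:
                continue
            if (
                result.payload.get("StackId") == self.expected_stack_id
                and result.payload.get("Status") in self.required_statuses
            ):
                return True
        return False


def can_complete_step(guard: Optional[RosStackGuard], results: list[ToolResult]) -> bool:
    """Pre-validate complete_step: steps without a guard may always complete."""
    return guard is None or guard.is_satisfied(results)
```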

Selling pipeline workflow and prompt/skill changes

  • Updates the selling pipeline configuration with stricter base prompt sections, surface overrides, completion guards, and deployment flow constraints.
  • Adds A2A-specific candidate confirmation/select prompt behavior.
  • Improves intent parsing for ambiguous app/deployment requests, non-Alibaba-Cloud requests, explicit StackName, region, naming constraints, VPC/Zone/CIDR, and network constraints.
  • Allows intent and architecture steps to consult memory when relevant while keeping current user input authoritative.
  • Changes cost estimation guidance to prefer PreviewStack-validated pricing parameter sets when possible, carry deployment_parameters, list missing deployment parameters, and avoid writing defaults into templates as cross-step state.
  • Adds selection parameter overrides so users or A2A clients can select a candidate and pass deployment parameter overrides structurally.
  • Tightens deployment behavior around user-specified stack names, delete confirmation, parameter assembly, failed deployment reporting, and successful stack-id requirements.
  • Restricts ROS template API calls in pipeline mode to TemplateURL instead of direct TemplateBody to avoid large inline state and improve recoverability.
  • Requires unique ROS stack names and enforces stack completion guards before declaring deployment success.
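The parameter-assembly and stack-name rules above might look roughly like this. The merge order (overrides win), the StackName pattern, the iac-code name prefix, and all function names are hypothetical; the sketch only mirrors behaviors the PR lists: structural parameter overrides, reporting missing deployment parameters, unique stack names, and TemplateURL-only template calls.

```python
from __future__ import annotations

import re
import uuid
from typing import Any, Mapping, Optional

# Assumed ROS StackName rule: starts with a letter or digit, then letters,
# digits, hyphens, or underscores, up to 255 characters in total.
STACK_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,254}")


def assemble_parameters(
    estimated: Mapping[str, Any],
    overrides: Mapping[str, Any],
    required: list[str],
) -> tuple[dict[str, Any], list[str]]:
    """Merge cost-step deployment_parameters with selection overrides; list gaps."""
    merged = {**estimated, **overrides}  # user/A2A overrides take precedence
    missing = [name for name in required if merged.get(name) in (None, "")]
    return merged, missing


def choose_stack_name(
    user_name: Optional[str] = None,
    existing_names: frozenset[str] = frozenset(),
    prefix: str = "iac-code",  # hypothetical prefix
) -> str:
    """Keep a valid, unused user-specified name; otherwise generate a unique one."""
    if user_name:
        if not STACK_NAME_RE.fullmatch(user_name):
            raise ValueError(f"invalid StackName: {user_name!r}")
        if user_name in existing_names:
            raise ValueError(f"StackName {user_name!r} is already in use")
        return user_name
    while True:
        candidate = f"{prefix}-{uuid.uuid4().hex[:12]}"
        if candidate not in existing_names:
            return candidate


def check_create_stack_request(request: Mapping[str, Any]) -> None:
    """In pipeline mode, accept TemplateURL and reject inline TemplateBody."""
    if "TemplateBody" in request:
        raise ValueError("pipeline mode requires TemplateURL, not TemplateBody")
    if not request.get("TemplateURL"):
        raise ValueError("TemplateURL is required")
```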

REPL, normal mode, and session behavior

  • Adds real pipeline-sidecar detection on REPL startup and /resume, including user choice to resume or discard resumable sidecars.
  • Preserves normal chat history when a session has terminal/non-resumable pipeline state.
  • Adds replay-only notices for user-aborted terminal pipelines when resuming as normal chat.
  • Persists the visible pipeline user turn into the root session so resumes do not look empty after terminal pipeline sessions.
  • Filters hidden recalled-memory and cleanup prompts from session index metadata, resume picker previews, transcript rendering, and prompt summaries where appropriate.
  • Adds /prompt visibility for cleanup prompts, including a dedicated cleanup prompt tab when present.
  • Extends REPL image paste/render behavior for pipeline-visible user turns and image block display.
  • Improves read-memory errors by listing available memories when a requested memory name is missing.
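A sketch of how hidden turns could be kept out of resume previews and index metadata. The SessionMessage shape and the recalled_memory/cleanup_prompt kind labels are hypothetical; the point is that filtering happens once, before any preview or title is derived, and that the visible user turn is what the preview shows.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical kind labels for hidden turns; the real session schema may differ.
HIDDEN_KINDS = frozenset({"recalled_memory", "cleanup_prompt"})


@dataclass(frozen=True)
class SessionMessage:
    role: str
    text: str
    kind: str = "normal"


def visible_messages(messages: Iterable[SessionMessage]) -> list[SessionMessage]:
    """Drop hidden recall/cleanup turns before building previews or transcripts."""
    return [m for m in messages if m.kind not in HIDDEN_KINDS]


def resume_preview(messages: Iterable[SessionMessage], width: int = 60) -> Optional[str]:
    """Use the first visible, non-empty user turn as the resume-picker preview."""
    for m in visible_messages(messages):
        if m.role == "user" and m.text.strip():
            text = " ".join(m.text.split())  # collapse whitespace for one-line display
            return text if len(text) <= width else text[: width - 1] + "…"
    return None
```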

Tooling, debugger, selling console, and website navigation

  • Adds a local A2A selling pipeline console server and static web UI under scripts/a2a/selling_console.py and scripts/a2a/selling_console_web/.
  • Extends the A2A debugger for pipeline state, task state, SSE proxy behavior, cwd validation notes, image uploads, and pipeline image-input limits.
  • Adds a real PTY-driven REPL pipeline E2E runner under scripts/repl/e2e/ and the pexpect dev dependency.
  • Extends A2A recovery E2E scenarios with static image fixtures, image-based waiting-input/selection/handoff/interrupt scenarios, and rollback cleanup recovery scenarios.
  • Adds website navigation/footer entries for the existing Pipeline Mode docs and test coverage for docs navigation labels.
  • Updates script READMEs for the new debugger, selling console, A2A E2E, and REPL E2E flows.

Cross-platform and path-safety fixes

  • Adds Windows-aware and case-insensitive path handling where needed for safe path checks.
  • Adds safer path-lock and state-write helpers for recovery-critical files.
  • Preserves trusted read directories for pipeline skills/tools so pipeline step file access works without broadening general read permissions.
  • Keeps normal non-pipeline ToolContext construction backward-compatible while adding pipeline-mode context fields.
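A minimal path-containment check in the spirit of the Windows-aware, case-insensitive handling above. is_within is a hypothetical helper: it resolves symlinks and .. with os.path.realpath, folds case and separators with os.path.normcase (which only has an effect on Windows), and compares with os.path.commonpath.

```python
from __future__ import annotations

import os


def _normalize(path: str) -> str:
    # realpath resolves symlinks and "..", normcase lowercases and unifies
    # separators on Windows (it is a no-op on POSIX).
    return os.path.normcase(os.path.realpath(path))


def is_within(root: str, candidate: str) -> bool:
    """True if candidate resolves to root itself or to a path inside it."""
    root_n = _normalize(root)
    cand_n = _normalize(candidate)
    try:
        return os.path.commonpath([root_n, cand_n]) == root_n
    except ValueError:
        # Raised for paths on different drives on Windows.
        return False
```

Comparing with commonpath rather than str.startswith avoids the classic prefix trap, where /data/root-other would otherwise pass as being inside /data/root.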

i18n and tests

  • Updates the six supported gettext locale files (de, es, fr, ja, pt, zh) for new and changed user-facing strings.
  • Adds or updates focused tests across A2A, pipeline engine, cleanup, REPL UI, selling pipeline skills/prompts, session storage/indexing, path safety, tool execution, image input, website navigation, and E2E helper scripts.

Rebase Details

  • Review scope (branch point requested by the user): 02f0a57b and later.
  • Current base after rebase: origin/main at 411e1baf.
  • Current branch head: a36fae75.
  • Post-rebase/follow-up commits include:
    • cdc03eea fix: address pipeline review findings
    • 83da92b3 fix: translate pipeline i18n messages
    • a36fae75 fix: complete translations after main rebase

Review Notes

  • This branch intentionally has a broad surface area. The main review focus should be A2A pipeline recovery, sidecar/snapshot/journal correctness, rollback cleanup safety, normal-mode compatibility, Windows/path behavior, i18n, and the selling pipeline prompt/skill contract.
  • Generated or temporary review docs under the repository-level docs/ directory and generated template artifacts were intentionally removed from the committed net diff earlier. The website Pipeline Mode doc already exists under website/docs/automation/pipeline-mode.md; this PR exposes it in website navigation.
  • Known non-goal: per-request Aliyun credentials from A2A metadata are preserved for normal-chat runtime paths, but A2A pipeline execution does not yet pass that credential override into IacCodeA2APipelineExecutor. If product requirements call for per-request cloud credentials in pipeline mode, that should be handled as a follow-up.
  • Known non-goal: the selling console web UI sends text-only requests. Image-part request coverage belongs to the A2A debugger and A2A E2E scenarios.
  • Known non-goal: the debugger/selling-console UI does not expose a per-request model selector, although iac-code a2a-client call --iac-code-model is already supported on main.

Validation

  • uv run pytest tests/a2a/test_executor.py tests/a2a/test_parts.py tests/a2a/test_client.py tests/cli/test_a2a_command.py tests/services/providers/test_aliyun.py tests/a2a/test_transport_dispatcher.py - 211 passed.
  • make translate && uv run pytest tests/test_i18n.py -q - 20 passed.
  • make test - 7010 passed, 268 warnings.
  • make lint - ruff and ty passed.
  • git diff --check - passed.
  • uv run pytest tests/a2a/test_executor.py::test_pipeline_mode_image_input_checks_provider_context -q - 1 passed.

guima-why added 30 commits June 24, 2026 13:52
Record rollback cleanup resources durably, resume cleanup through normal chat, surface cleanup progress in REPL/A2A/observability, and expose cleanup prompts in prompt snapshots.
- include available memory index when read_memory receives an unknown name
- stop injecting full auto-memory content into pipeline step prompts
- keep pipeline step AgentLoop free of memory side recall
- make selling pipeline memory tool policy explicit
- guide intent and architecture steps to read memory when relevant
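The read_memory change in this commit (listing available memories when a name is unknown) can be illustrated with a tiny lookup helper. lookup_memory and MemoryNotFoundError are hypothetical names, and the memory store is modeled as a plain mapping; the real tool's storage and message wording are not shown in the PR.

```python
from __future__ import annotations

from typing import Mapping


class MemoryNotFoundError(LookupError):
    """Hypothetical error type for an unknown memory name."""


def lookup_memory(memories: Mapping[str, str], name: str) -> str:
    """Return a memory by name, or fail with the names that do exist."""
    try:
        return memories[name]
    except KeyError:
        available = ", ".join(sorted(memories)) or "(none)"
        raise MemoryNotFoundError(
            f"memory {name!r} not found; available memories: {available}"
        ) from None
```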
Add a pipeline-only relative read root so step agents can read skill reference files by relative path without changing normal REPL trusted-read behavior. Preserve ToolContext compatibility and cover the pipeline/non-pipeline boundaries with regression tests.
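A sketch of the backward-compatible ToolContext extension this commit describes, assuming a dataclass-style context. Only pipeline_mode and relative_read_root are sketched, and both field names, like resolve_tool_read, are hypothetical. Because the new fields have defaults, existing ToolContext(cwd=...) calls keep working, and the relative root is honored only when pipeline mode is on.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolContext:
    """Simplified stand-in; only the new pipeline fields are sketched."""
    cwd: str
    # New fields default to "off", so normal REPL call sites that only pass
    # cwd keep working unchanged.
    pipeline_mode: bool = False
    relative_read_root: Optional[str] = None


def resolve_tool_read(ctx: ToolContext, requested: str) -> str:
    """Resolve a read path; the relative root applies only in pipeline mode."""
    if os.path.isabs(requested):
        return os.path.normpath(requested)
    if ctx.pipeline_mode and ctx.relative_read_root is not None:
        # Pipeline step agents read skill references relative to the skill root.
        return os.path.normpath(os.path.join(ctx.relative_read_root, requested))
    # Normal mode: relative paths still resolve against the working directory.
    return os.path.normpath(os.path.join(ctx.cwd, requested))
```

Resolution is only the first step; a containment check against the trusted read directories would still run afterwards, so the relative root changes where a relative path points without granting access to anything new.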
Three comment threads on scripts/a2a/selling_console.py were marked Fixed.
guima-why marked this pull request as ready for review on June 24, 2026 08:09
guima-why merged commit aca6861 into main on Jun 24, 2026
14 checks passed
guima-why deleted the fix_pipeline branch on June 24, 2026 08:09