feat(04-coding-agents): add sample 05 — autonomous coding agent with durable orchestration#1725

Open
smoell wants to merge 5 commits into awslabs:main from smoell:main

smoell commented Jun 19, 2026

Amazon Bedrock AgentCore Samples Pull Request

Important

  1. We strictly follow an issue-first approach. Please open an issue for this Pull Request first.
  2. Once this Pull Request is ready for review, attach the review ready label to it. Only PRs with the review ready label will be reviewed.

Issue number:

Concise description of the PR

Adds sample 05 (autonomous-coding-agent-durable) to 01-features/02-host-your-agent/01-runtime/04-coding-agents/,
because the existing coding agent samples cover interactive use cases (Claude Code, CMA) but not
event-driven headless operation with durable orchestration, cross-ticket memory, and multi-agent review.

User experience

Before: No sample for building a fully autonomous, event-driven coding backend on AgentCore Runtime. Users wanting headless ticket-to-code pipelines with retry loops, evaluator agents, and cross-ticket learning had no reference implementation.

After: Users can deploy a complete 5-stage pipeline (admission → hydrate → code loop → review → finalize) via cdk deploy --all. The sample demonstrates:

  • Lambda Durable Functions with zero-cost suspension (wait_for_callback)
  • 4 specialized runtimes (coding agent, sandbox, Swift sandbox, evaluator)
  • Cedar policy enforcement at the sandbox layer
  • AgentCore Memory for per-repo lesson recall/write
  • Deterministic test gates with orchestrator-level retry
  • A demo UI for submitting tickets and viewing results
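
The control flow of the five stages, including the test gate with orchestrator-level retry, can be sketched roughly as follows. This is an illustration only, not the sample's orchestrator: `Ticket`, `MAX_ATTEMPTS`, `run_pipeline`, and the injected stage callables are hypothetical stand-ins for the durable function steps and runtime invocations, and suspension via wait_for_callback is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

MAX_ATTEMPTS = 3  # hypothetical retry budget; the sample's real limit is not stated


@dataclass
class Ticket:
    ticket_id: str
    repo: str
    attempts: int = 0
    lessons: List[str] = field(default_factory=list)
    status: str = "new"


def run_pipeline(
    ticket: Ticket,
    recall_lessons: Callable[[str], List[str]],
    code_step: Callable[[Ticket, str], None],
    test_gate: Callable[[Ticket], Tuple[bool, str]],
    review: Callable[[Ticket], bool],
) -> Ticket:
    # Stage 1, admission: reject tickets that cannot be processed.
    if not ticket.repo:
        ticket.status = "rejected"
        return ticket

    # Stage 2, hydrate: load per-repo lessons before any coding starts.
    ticket.lessons = recall_lessons(ticket.repo)

    # Stage 3, code loop: the orchestrator, not the agent, decides whether
    # to retry, based only on the deterministic test gate result.
    feedback = ""
    passed = False
    while ticket.attempts < MAX_ATTEMPTS:
        ticket.attempts += 1
        code_step(ticket, feedback)
        passed, feedback = test_gate(ticket)
        if passed:
            break
    if not passed:
        ticket.status = "failed_tests"
        return ticket

    # Stage 4, review: a separate evaluator gives a verdict.
    approved = review(ticket)

    # Stage 5, finalize: record the outcome.
    ticket.status = "done" if approved else "changes_requested"
    return ticket
```

Passing the stages in as callables keeps the retry logic testable without any AWS resources; in a deployed pipeline each callable would presumably wrap a runtime invocation.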

Checklist

If an item doesn't seem to apply to your change, please leave it unchecked.

  • I have reviewed the contributing guidelines
  • Add your name to CONTRIBUTORS.md
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Are you uploading a dataset?
  • Have you documented Introduction, Architecture Diagram, Prerequisites, Usage, Sample Prompts, and Clean Up steps in your example README?
  • I agree to resolve any issues created for this example in the future.
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

smoell added 2 commits June 19, 2026 13:22
Event-driven headless coding backend on AgentCore Runtime with:
- Lambda Durable Function orchestrator (zero-cost suspension)
- 4 specialized runtimes (coding agent, sandbox, Swift sandbox, evaluator)
- Cedar policy enforcement at sandbox layer
- AgentCore Memory for cross-ticket learning
- Evaluator agent for read-only code review
- CDK deployment (8 stacks)

github-actions Bot commented Jun 22, 2026

Latest scan for commit: 05cd489 | Updated: 2026-06-23 14:59:03 UTC

Security Scan Results

Scan Metadata

  • Project: ASH
  • Scan executed: 2026-06-23T14:58:47+00:00
  • ASH version: 3.0.0

Summary

Scanner Results

The table below shows findings by scanner, with status based on severity thresholds and dependencies:

Column Explanations:

Severity Levels (S/C/H/M/L/I):

  • Suppressed (S): Security findings that have been explicitly suppressed/ignored and don't affect the scanner's pass/fail status
  • Critical (C): The most severe security vulnerabilities requiring immediate remediation (e.g., SQL injection, remote code execution)
  • High (H): Serious security vulnerabilities that should be addressed promptly (e.g., authentication bypasses, privilege escalation)
  • Medium (M): Moderate security risks that should be addressed in normal development cycles (e.g., weak encryption, input validation issues)
  • Low (L): Minor security concerns with limited impact (e.g., information disclosure, weak recommendations)
  • Info (I): Informational findings for awareness with minimal security risk (e.g., code quality suggestions, best practice recommendations)

Other Columns:

  • Time: Duration taken by each scanner to complete its analysis
  • Action: Total number of actionable findings at or above the configured severity threshold that require attention

Scanner Results:

  • PASSED: Scanner found no security issues at or above the configured severity threshold - code is clean for this scanner
  • FAILED: Scanner found security vulnerabilities at or above the threshold that require attention and remediation
  • MISSING: Scanner could not run because required dependencies/tools are not installed or available
  • SKIPPED: Scanner was intentionally disabled or excluded from this scan
  • ERROR: Scanner encountered an execution error and could not complete successfully

Severity Thresholds (Thresh Column):

  • CRITICAL: Only Critical severity findings cause scanner to fail
  • HIGH: High and Critical severity findings cause scanner to fail
  • MEDIUM (MED): Medium, High, and Critical severity findings cause scanner to fail
  • LOW: Low, Medium, High, and Critical severity findings cause scanner to fail
  • ALL: Any finding of any severity level causes scanner to fail

Threshold Source: Values in parentheses indicate where the threshold is configured:

  • (g) = global: Set in the global_settings section of ASH configuration
  • (c) = config: Set in the individual scanner configuration section
  • (s) = scanner: Default threshold built into the scanner itself

Statistics calculation:

  • All statistics are calculated from the final aggregated SARIF report
  • Suppressed findings are counted separately and do not contribute to actionable findings
  • Scanner status is determined by comparing actionable findings to the threshold
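
| Scanner | S | C | H | M | L | I | Time | Action | Result | Thresh |
|---|---|---|---|---|---|---|---|---|---|---|
| bandit | 0 | 7 | 0 | 0 | 139 | 0 | 1.8s | 7 | FAILED | MED (g) |
| cdk-nag | 0 | 0 | 0 | 0 | 0 | 0 | 6.8s | 0 | PASSED | MED (g) |
| cfn-nag | 0 | 0 | 0 | 0 | 0 | 0 | 38ms | 0 | PASSED | MED (g) |
| checkov | 0 | 1 | 0 | 0 | 0 | 0 | 5.2s | 1 | FAILED | MED (g) |
| detect-secre… | 0 | 0 | 0 | 0 | 0 | 0 | 1.3s | 0 | PASSED | MED (g) |
| grype | 0 | 0 | 0 | 0 | 0 | 0 | 52.0s | 0 | PASSED | MED (g) |
| npm-audit | 0 | 0 | 0 | 0 | 0 | 0 | 256ms | 0 | PASSED | MED (g) |
| opengrep | 0 | 0 | 0 | 0 | 0 | 0 | <1ms | 0 | SKIPPED | MED (g) |
| semgrep | 0 | 0 | 0 | 0 | 0 | 0 | <1ms | 0 | MISSING | MED (g) |
| syft | 0 | 0 | 0 | 0 | 0 | 0 | 2.8s | 0 | PASSED | MED (g) |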
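
The threshold rules described above can be worked through in a few lines. This is a sketch of the stated rules only, not ASH's actual implementation; `SEVERITY_ORDER`, `actionable_count`, and `scanner_result` are hypothetical names, and the SKIPPED, MISSING, and ERROR states are not modeled. Suppressed findings are simply left out of the input counts.

```python
from typing import Dict

# Most severe first, matching the C/H/M/L/I columns of the table.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]


def actionable_count(counts: Dict[str, int], threshold: str) -> int:
    """Count findings at or above the threshold severity."""
    if threshold == "ALL":
        return sum(counts.values())
    cutoff = SEVERITY_ORDER.index(threshold)
    return sum(counts.get(level, 0) for level in SEVERITY_ORDER[: cutoff + 1])


def scanner_result(counts: Dict[str, int], threshold: str) -> str:
    """A scanner fails as soon as one actionable finding exists."""
    return "FAILED" if actionable_count(counts, threshold) > 0 else "PASSED"


# The bandit row from the table: C=7, L=139, threshold MED (MEDIUM).
bandit = {"CRITICAL": 7, "HIGH": 0, "MEDIUM": 0, "LOW": 139, "INFO": 0}
print(actionable_count(bandit, "MEDIUM"), scanner_result(bandit, "MEDIUM"))
```

With the MEDIUM threshold, the 139 low-severity findings do not count toward the Action column, which is why the bandit row shows 7 rather than 146.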

Detailed Findings

8 actionable findings

Finding 1: B108

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B108
  • Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/coding-agent/app.py:139-141

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

region = os.environ.get("AWS_REGION", "us-east-1")
home = os.environ.get("HOME") or "/tmp/agenthome"
os.makedirs(os.path.join(home, ".claude"), exist_ok=True)
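
For context, B108 flags hard-coded paths under /tmp because a predictable name in a world-writable directory can be created in advance by another local user. One commonly suggested alternative is `tempfile.mkdtemp`, which creates a uniquely named directory accessible only to the creating user. The sketch below assumes the fallback only needs to be some writable directory; `resolve_agent_home` and the `agenthome_` prefix are hypothetical, and whether this is preferable to an annotated /tmp path inside an isolated container is a judgment call for the author.

```python
import os
import tempfile


def resolve_agent_home(prefix: str = "agenthome_") -> str:
    """Return HOME if set, otherwise a freshly created private temp directory."""
    # mkdtemp picks an unpredictable name and restricts access to the
    # creating user, which avoids the fixed-path pattern B108 reports.
    home = os.environ.get("HOME") or tempfile.mkdtemp(prefix=prefix)
    os.makedirs(os.path.join(home, ".claude"), exist_ok=True)
    return home
```

Note that a new directory is created on every call when HOME is unset, so a caller that needs a stable location would store the result once at startup.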

Finding 2: B108

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B108
  • Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/evaluator-agent/app.py:70-72

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

region = os.environ.get("AWS_REGION", "us-east-1")
home = os.environ.get("HOME") or "/tmp/evalhome"
os.makedirs(os.path.join(home, ".claude"), exist_ok=True)

Finding 3: B108

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B108
  • Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/orchestrator/handler.py:186-188

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

if runtime == "swift":
    scratch = f"/tmp/spmbuild_{ticket}"
    return (f'/bin/bash -c "cd /mnt/shared/{ticket} && '

Finding 4: B602

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B602
  • Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/sandbox/app.py:285-290

Description:
subprocess call with shell=True identified, security issue.

Code Snippet:

    proc = subprocess.Popen(
        argv, shell=popen_shell, cwd=popen_cwd, env=env,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
except FileNotFoundError:
    # `unshare` not present on this image → degrade to unjailed (still cwd-confined).
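
For comparison, the pattern B602 usually points toward is passing an argument list with `shell=False`, so that shell metacharacters in the input are never interpreted. The sketch below is a generic illustration and not the sandbox's executor: `run_command` is a hypothetical helper, and it deliberately gives up shell features such as pipes and `&&`, which the sandbox may need.

```python
import shlex
import subprocess
import sys
from typing import Optional, Tuple


def run_command(
    command: str, cwd: Optional[str] = None, timeout: float = 30.0
) -> Tuple[int, str, str]:
    """Run a command without a shell and return (exit code, stdout, stderr)."""
    # shlex.split tokenizes like a POSIX shell but executes nothing, so
    # characters such as ';' or '|' end up as plain arguments.
    argv = shlex.split(command)
    proc = subprocess.run(
        argv, shell=False, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    py = shlex.quote(sys.executable)
    print(run_command(f'{py} -c "print(1 + 1)"'))
```

When shell semantics are genuinely required, as the policy-enforced sandbox here may argue, the usual alternative is to keep `shell=True` and document the justification next to the call.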

Finding 5: B108

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B108
  • Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/tests/test_orchestrator.py:76-78

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

scratch = cmd.split("--scratch-path", 1)[1].split()[0]
assert scratch.startswith("/tmp/")        # microVM-local, NOT the shared mount
assert not scratch.startswith("/mnt/")

Finding 6: B108

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B108
  • Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/tests/test_poc_components.py:88-90

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

monkeypatch.setenv("SANDBOX_LANG", lang)
monkeypatch.setenv("WORKSPACE_PATH", "/tmp/ws")
import sandbox.app as app

Finding 7: B108

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B108
  • Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/tests/test_validation.py:144-146

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

link1 = os.path.join(inner_dir, "link1")
os.symlink("/tmp", link1)
with pytest.raises(ValidationError, match="escapes base"):
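
The test above exercises a symlink that points outside the allowed base directory. A check of that kind is often written by resolving symlinks first and then comparing against the base, as in the sketch below. Only the `ValidationError` name and the "escapes base" message come from the snippet; `validate_within_base` and its signature are hypothetical, and the sample's real validator may work differently.

```python
import os


class ValidationError(ValueError):
    """Raised when a path resolves outside the permitted base directory."""


def validate_within_base(base: str, candidate: str) -> str:
    """Return the resolved path, or raise if it leaves the base directory."""
    base_real = os.path.realpath(base)
    # realpath follows symlinks, so a link pointing at /tmp resolves to
    # /tmp here and is caught by the containment check below.
    target_real = os.path.realpath(os.path.join(base_real, candidate))
    if os.path.commonpath([base_real, target_real]) != base_real:
        raise ValidationError(f"path {candidate!r} escapes base {base!r}")
    return target_real
```

Comparing with `os.path.commonpath` rather than a string prefix avoids accepting sibling directories such as `/mnt/workspace-evil` when the base is `/mnt/workspace`.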

Finding 8: CKV_DOCKER_3

  • Severity: HIGH
  • Scanner: checkov
  • Rule ID: CKV_DOCKER_3
  • Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/sandbox/Dockerfile.swift:1-51

Description:
Ensure that a user for the container has been created

Code Snippet:

# Sandbox (Swift toolchain) — AgentCore Runtime data plane (ARM64-only). NOT an agent.
# Same command-executor contract as the Python sandbox, but carries the Swift
# toolchain so it can `swift build` / `swift test` repo code. One image per language;
# the business-logic service picks which sandbox runtime to invoke per ticket.
#
# Security: non-root execution (via entrypoint), pinned Python deps, HEALTHCHECK.
FROM --platform=linux/arm64 swift:6.1-jammy

WORKDIR /app

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    SANDBOX_LANG=swift

# The Swift image is Ubuntu-based; add Python 3 to run the same app.py executor.
# git + ca-certificates let SwiftPM resolve package dependencies.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python3-venv git ca-certificates \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
# swift:6.1-jammy ships Python 3.10 with an older pip (no PEP-668 / --break-system-packages).
RUN pip3 install --no-cache-dir -r requirements.txt

COPY app.py policy_engine.py entrypoint.sh .
COPY policies/ /app/policies/

# Copy shared security module (from repo root's shared/ directory)
COPY shared_libs/shared/ /app/shared_libs/shared/

ENV MOUNT_PATH=/mnt/shared \
    WORKSPACE_PATH=/mnt/workspace \
    HOME=/mnt/workspace \
    SWIFTPM_CACHE_DIR=/mnt/workspace/.spm-cache

# git refuses to operate on dependency checkouts under .build when they are owned by a
# different uid than the runner ("detected dubious ownership"). This also affects the
# test gate, which runs `swift test` via InvokeAgentRuntimeCommand (a plain shell, not our
# _run_command path), so the setting must live in the image, applied to ALL users.
RUN git config --system --add safe.directory '*'

RUN useradd -m -u 1000 -d /home/sbx sbx \
    && chmod +x /app/entrypoint.sh

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/ping')" || exit 1

EXPOSE 8080
# entrypoint.sh ensures /mnt/workspace is writable by sbx before starting the app
CMD ["/app/entrypoint.sh"]

Report generated by Automated Security Helper (ASH) at 2026-06-23T14:58:43+00:00

smoell added 2 commits June 23, 2026 11:59
- Rename ambiguous variable l to lesson in list comprehensions
  (shared/memory.py, orchestrator/handler.py)
- Split multi-import into separate statements (sandbox/app.py)
- Remove unnecessary f-string prefix (cdk/stacks/storage_stack.py)
- Add property-based tests verifying lint compliance and behavior preservation

smoell commented Jun 23, 2026

Author

fix: resolve ruff lint violations (E741, E401, F541) across four files

  • Rename ambiguous variable l to lesson in list comprehensions
    (shared/memory.py, orchestrator/handler.py)
  • Split multi-import into separate statements (sandbox/app.py)
  • Remove unnecessary f-string prefix (cdk/stacks/storage_stack.py)
  • Add property-based tests verifying lint compliance and behavior preservation
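
For readers unfamiliar with the three rule codes, the fixes look roughly like this. The data and names below are made up for illustration and are not taken from the changed files.

```python
# E401 (multiple imports on one line): was `import os, sys`
import os
import sys

# Made-up lesson records standing in for whatever the memory module returns.
lessons = [{"text": "pin dependencies"}, {"text": "run tests first"}]

# E741 (ambiguous variable name): was `[l["text"] for l in lessons]`
lesson_texts = [lesson["text"] for lesson in lessons]

# F541 (f-string without placeholders): was `f"storage-bucket"`
bucket_label = "storage-bucket"

# Use both imports so the example does not trip the unused-import rule.
platform_info = f"{os.name}/{sys.platform}"

print(lesson_texts, bucket_label, platform_info)
```

None of the three changes alters runtime behavior, which is consistent with the behavior-preservation tests mentioned in the last bullet.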

…tations

- Add # nosec B108 to intentional /tmp usage in isolated containers/microVMs
- Add # nosec B602 to sandboxed subprocess executor (sandbox/app.py)
- Add # nosec B108 to test files (assertions and fixtures, not real /tmp usage)
- Add #checkov:skip=CKV_DOCKER_3 to Dockerfile.swift (entrypoint.sh handles su)
- Each annotation includes justification for audit trail
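
A Bandit suppression of this kind typically looks like the sketch below. The function name and the wording of the justification are assumptions, not the text used in the PR; the justification is kept on its own comment line so that the `# nosec` marker carries only the rule ID.

```python
import os


def agent_home() -> str:
    """Return the agent's home directory, falling back to a fixed /tmp path."""
    # Justification (assumed wording): the fallback is only used inside an
    # isolated, single-tenant container, so no other local user can race
    # for the path.
    return os.environ.get("HOME") or "/tmp/agenthome"  # nosec B108
```

Scoping the marker to `B108` keeps other Bandit checks active on the same line, unlike a bare `# nosec`.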

evandrofranco left a comment

Contributor

ok
