`feat(04-coding-agents): add sample 05 — autonomous coding agent with durable orchestration` by smoell · Pull Request #1725 · awslabs/agentcore-samples

smoell · 2026-06-19T11:31:02Z

Amazon Bedrock AgentCore Samples Pull Request

Important

We strictly follow an issue-first approach: please open an issue for this Pull Request first.
Once this Pull Request is ready for review, please attach the review ready label to it. Only PRs with the review ready label will be reviewed.

Issue number:

Concise description of the PR

Adds sample 05 (autonomous-coding-agent-durable) to 01-features/02-host-your-agent/01-runtime/04-coding-agents/,
because the existing coding agent samples cover interactive use cases (Claude Code, CMA) but not
event-driven headless operation with durable orchestration, cross-ticket memory, and multi-agent review.

User experience

Before: No sample for building a fully autonomous, event-driven coding backend on AgentCore Runtime. Users wanting headless ticket-to-code pipelines with retry loops, evaluator agents, and cross-ticket learning had no reference implementation.

After: Users can deploy a complete 5-stage pipeline (admission → hydrate → code loop → review → finalize) via cdk deploy --all. The sample demonstrates:

Lambda Durable Functions with zero-cost suspension (wait_for_callback)
4 specialized runtimes (coding agent, sandbox, Swift sandbox, evaluator)
Cedar policy enforcement at the sandbox layer
AgentCore Memory for per-repo lesson recall/write
Deterministic test gates with orchestrator-level retry
A demo UI for submitting tickets and viewing results
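The five stages and the orchestrator-level retry above can be sketched as plain Python. This is a minimal, synchronous sketch under stated assumptions: `Ticket`, `Outcome`, `run_pipeline` and every injected callable are hypothetical stand-ins, not the sample's code. In the real sample the waiting stages would suspend durably (the Lambda Durable Functions feature listed above) and the callables would invoke the AgentCore runtimes and Memory; none of those APIs is modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types: the sample's real orchestrator state and runtime clients
# are not shown in this PR description.
@dataclass
class Ticket:
    ticket_id: str
    repo: str
    description: str

@dataclass
class Outcome:
    status: str  # "rejected" | "failed" | "approved" | "changes_requested"
    attempts: int = 0
    lessons_used: list[str] = field(default_factory=list)

def run_pipeline(
    ticket: Ticket,
    admit: Callable[[Ticket], bool],
    recall: Callable[[str], list[str]],
    code: Callable[[Ticket, list[str], str], str],
    test_gate: Callable[[str], tuple[bool, str]],
    review: Callable[[str], bool],
    remember: Callable[[str, str], None],
    max_attempts: int = 3,
) -> Outcome:
    # 1. Admission: reject tickets the pipeline should not take on.
    if not admit(ticket):
        return Outcome("rejected")
    # 2. Hydrate: copy the per-repo lessons written by earlier tickets.
    lessons = list(recall(ticket.repo))
    # 3. Code loop: the orchestrator, not the agent, decides whether to retry,
    #    based on a deterministic test gate; failing output becomes feedback.
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        patch = code(ticket, lessons, feedback)
        passed, log = test_gate(patch)
        if passed:
            break
        feedback = log
    else:
        remember(ticket.repo, f"{ticket.ticket_id}: tests failing after {max_attempts} attempts")
        return Outcome("failed", max_attempts, lessons)
    # 4. Review: a read-only evaluator approves or requests changes.
    approved = review(patch)
    # 5. Finalize: record a lesson for future tickets on the same repo.
    remember(ticket.repo, f"{ticket.ticket_id}: passed on attempt {attempt}")
    return Outcome("approved" if approved else "changes_requested", attempt, lessons)
```

Keeping the retry decision in the orchestrator means the coding agent never judges its own success; only the test gate's pass/fail result moves the ticket forward.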

Checklist

If your change doesn't seem to apply, please leave them unchecked.

I have reviewed the contributing guidelines
Add your name to CONTRIBUTORS.md
Have you checked to ensure there aren't other open Pull Requests for the same update/change?
Are you uploading a dataset?
Have you documented Introduction, Architecture Diagram, Prerequisites, Usage, Sample Prompts, and Clean Up steps in your example README?
I agree to resolve any issues created for this example in the future.
I have performed a self-review of this change
Changes have been tested
Changes are documented

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

Event-driven headless coding backend on AgentCore Runtime with:
- Lambda Durable Function orchestrator (zero-cost suspension)
- 4 specialized runtimes (coding agent, sandbox, Swift sandbox, evaluator)
- Cedar policy enforcement at sandbox layer
- AgentCore Memory for cross-ticket learning
- Evaluator agent for read-only code review
- CDK deployment (8 stacks)


github-actions · 2026-06-22T16:37:12Z

Latest scan for commit: 05cd489 | Updated: 2026-06-23 14:59:03 UTC

Security Scan Results

Scan Metadata

Project: ASH
Scan executed: 2026-06-23T14:58:47+00:00
ASH version: 3.0.0

Summary

Scanner Results

The table below shows findings by scanner, with status based on severity thresholds and dependencies:

Column Explanations:

Severity Levels (S/C/H/M/L/I):

Suppressed (S): Security findings that have been explicitly suppressed/ignored and don't affect the scanner's pass/fail status
Critical (C): The most severe security vulnerabilities requiring immediate remediation (e.g., SQL injection, remote code execution)
High (H): Serious security vulnerabilities that should be addressed promptly (e.g., authentication bypasses, privilege escalation)
Medium (M): Moderate security risks that should be addressed in normal development cycles (e.g., weak encryption, input validation issues)
Low (L): Minor security concerns with limited impact (e.g., information disclosure, weak recommendations)
Info (I): Informational findings for awareness with minimal security risk (e.g., code quality suggestions, best practice recommendations)

Other Columns:

Time: Duration taken by each scanner to complete its analysis
Action: Total number of actionable findings at or above the configured severity threshold that require attention

Scanner Results:

PASSED: Scanner found no security issues at or above the configured severity threshold - code is clean for this scanner
FAILED: Scanner found security vulnerabilities at or above the threshold that require attention and remediation
MISSING: Scanner could not run because required dependencies/tools are not installed or available
SKIPPED: Scanner was intentionally disabled or excluded from this scan
ERROR: Scanner encountered an execution error and could not complete successfully

Severity Thresholds (Thresh Column):

CRITICAL: Only Critical severity findings cause scanner to fail
HIGH: High and Critical severity findings cause scanner to fail
MEDIUM (MED): Medium, High, and Critical severity findings cause scanner to fail
LOW: Low, Medium, High, and Critical severity findings cause scanner to fail
ALL: Any finding of any severity level causes scanner to fail

Threshold Source: Values in parentheses indicate where the threshold is configured:

(g) = global: Set in the global_settings section of ASH configuration
(c) = config: Set in the individual scanner configuration section
(s) = scanner: Default threshold built into the scanner itself

Statistics calculation:

All statistics are calculated from the final aggregated SARIF report
Suppressed findings are counted separately and do not contribute to actionable findings
Scanner status is determined by comparing actionable findings to the threshold
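The threshold rules above can be expressed as a small function. This is a minimal sketch assuming each finding is reduced to a (severity, suppressed) pair; `scanner_status` and the mapping table are hypothetical and are not ASH's implementation.

```python
# Severity order from lowest to highest, matching the levels described above.
ORDER = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Lowest severity that makes a scanner fail, per threshold value (MED = MEDIUM).
THRESHOLDS = {
    "CRITICAL": "CRITICAL",
    "HIGH": "HIGH",
    "MEDIUM": "MEDIUM",
    "MED": "MEDIUM",
    "LOW": "LOW",
    "ALL": "INFO",
}

def scanner_status(findings: list[tuple[str, bool]], threshold: str) -> tuple[int, str]:
    """Return (actionable_count, result) for one scanner.

    Suppressed findings never count; unsuppressed findings count only when
    their severity is at or above the configured threshold.
    """
    floor = ORDER.index(THRESHOLDS[threshold])
    actionable = sum(
        1
        for severity, suppressed in findings
        if not suppressed and ORDER.index(severity) >= floor
    )
    return actionable, ("FAILED" if actionable else "PASSED")
```

With a MED threshold, seven HIGH findings plus any number of LOW findings yield seven actionable findings and a FAILED result, which is the pattern in the bandit row of the table.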

Scanner	H	L	Time	Action	Result	Thresh
bandit	7	139	1.8s	7	FAILED	MED (g)
cdk-nag	0	0	6.8s	0	PASSED	MED (g)
cfn-nag	0	0	38ms	0	PASSED	MED (g)
checkov	1	0	5.2s	1	FAILED	MED (g)
detect-secre…	0	0	1.3s	0	PASSED	MED (g)
grype	0	0	52.0s	0	PASSED	MED (g)
npm-audit	0	0	256ms	0	PASSED	MED (g)
opengrep	0	0	<1ms	0	SKIPPED	MED (g)
semgrep	0	0	<1ms	0	MISSING	MED (g)
syft	0	0	2.8s	0	PASSED	MED (g)

Detailed Findings

8 actionable findings

Finding 1: B108

Severity: HIGH
Scanner: bandit
Rule ID: B108
Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/coding-agent/app.py:139-141

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

region = os.environ.get("AWS_REGION", "us-east-1")
home = os.environ.get("HOME") or "/tmp/agenthome"
os.makedirs(os.path.join(home, ".claude"), exist_ok=True)

Finding 2: B108

Severity: HIGH
Scanner: bandit
Rule ID: B108
Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/evaluator-agent/app.py:70-72

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

region = os.environ.get("AWS_REGION", "us-east-1")
home = os.environ.get("HOME") or "/tmp/evalhome"
os.makedirs(os.path.join(home, ".claude"), exist_ok=True)
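Findings 1 and 2 flag the fixed fallback paths /tmp/agenthome and /tmp/evalhome. The PR keeps them because each runtime is an isolated container/microVM. If the same code ever ran where /tmp is shared, one option is a private, randomly named directory. The sketch below is a hypothetical helper (`resolve_home` is not in the sample) using the standard-library `tempfile.mkdtemp`, which creates the directory readable and writable only by the creating user.

```python
import os
import tempfile

def resolve_home(prefix: str = "agenthome-") -> str:
    """Return $HOME if set; otherwise create a private, unpredictable temp dir.

    tempfile.mkdtemp creates the directory securely with a random suffix, so
    another user cannot pre-create or guess the path (the concern behind B108).
    """
    home = os.environ.get("HOME") or tempfile.mkdtemp(prefix=prefix)
    os.makedirs(os.path.join(home, ".claude"), exist_ok=True)
    return home
```

The trade-off is that the path changes on every cold start, so anything that expects a stable home directory across invocations would need to look it up rather than hard-code it.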

Finding 3: B108

Severity: HIGH
Scanner: bandit
Rule ID: B108
Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/orchestrator/handler.py:186-188

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

if runtime == "swift":
    scratch = f"/tmp/spmbuild_{ticket}"
    return (f'/bin/bash -c "cd /mnt/shared/{ticket} && '
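The snippet interpolates the ticket ID into a `bash -c` string that later runs in a shell. A minimal sketch of building such a command defensively: validate the ticket ID against an allow-list pattern and quote every piece with `shlex.quote`. The ID format, the `build_test_command` name and the non-Swift `pytest` command are assumptions for illustration; only the `/mnt/shared/<ticket>` checkout, the `/tmp/spmbuild_<ticket>` scratch path and SwiftPM's `--scratch-path` option come from the finding and its test.

```python
import re
import shlex

# Hypothetical ticket-ID format; the sample's real validation rules are not shown.
TICKET_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")

def build_test_command(ticket: str, runtime: str) -> str:
    """Build the test-gate shell command for one ticket's checkout."""
    if not TICKET_RE.fullmatch(ticket):
        raise ValueError(f"invalid ticket id: {ticket!r}")
    workdir = f"/mnt/shared/{ticket}"
    if runtime == "swift":
        # Keep SwiftPM build products on microVM-local disk, not the shared mount.
        scratch = f"/tmp/spmbuild_{ticket}"  # nosec B108 # per-microVM /tmp
        inner = f"cd {shlex.quote(workdir)} && swift test --scratch-path {shlex.quote(scratch)}"
    else:
        # Hypothetical default for the Python sandbox.
        inner = f"cd {shlex.quote(workdir)} && python -m pytest -q"
    return f"/bin/bash -c {shlex.quote(inner)}"
```

Validating before quoting means a hostile ID such as `T-1; rm -rf /` is rejected outright instead of merely being escaped.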

Finding 4: B602

Severity: HIGH
Scanner: bandit
Rule ID: B602
Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/sandbox/app.py:285-290

Description:
subprocess call with shell=True identified, security issue.

Code Snippet:

    proc = subprocess.Popen(
        argv, shell=popen_shell, cwd=popen_cwd, env=env,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
except FileNotFoundError:
    # `unshare` not present on this image → degrade to unjailed (still cwd-confined).
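The snippet combines a jail wrapper with a fallback when `unshare` is missing. A minimal sketch of that pattern that passes an argument list instead of a shell string, so no shell is involved: `run_confined` and the default jail flags are assumptions, not the sample's code. Note that a jail binary that exists but lacks privileges exits non-zero rather than raising `FileNotFoundError`, so only a missing binary triggers the fallback.

```python
import subprocess

def run_confined(
    argv: list[str],
    cwd: str,
    jail_prefix: tuple[str, ...] = ("unshare", "--net", "--"),  # hypothetical flags
) -> subprocess.CompletedProcess:
    """Run argv (a list, never a shell string) behind an optional jail prefix.

    If the jail binary is not installed, fall back to running argv directly,
    still confined to cwd.
    """
    try:
        return subprocess.run(
            [*jail_prefix, *argv], cwd=cwd, capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # Jail binary missing on this image: degrade to unjailed execution.
        return subprocess.run(argv, cwd=cwd, capture_output=True, text=True, check=False)
```

Without `shell=True`, bandit no longer reports B602, although it may still report the low-severity B603 for any subprocess call, which stays below a MED threshold.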

Finding 5: B108

Severity: HIGH
Scanner: bandit
Rule ID: B108
Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/tests/test_orchestrator.py:76-78

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

scratch = cmd.split("--scratch-path", 1)[1].split()[0]
assert scratch.startswith("/tmp/")        # microVM-local, NOT the shared mount
assert not scratch.startswith("/mnt/")

Finding 6: B108

Severity: HIGH
Scanner: bandit
Rule ID: B108
Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/tests/test_poc_components.py:88-90

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

monkeypatch.setenv("SANDBOX_LANG", lang)
monkeypatch.setenv("WORKSPACE_PATH", "/tmp/ws")
import sandbox.app as app

Finding 7: B108

Severity: HIGH
Scanner: bandit
Rule ID: B108
Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/tests/test_validation.py:144-146

Description:
Probable insecure usage of temp file/directory.

Code Snippet:

link1 = os.path.join(inner_dir, "link1")
os.symlink("/tmp", link1)
with pytest.raises(ValidationError, match="escapes base"):
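The test expects a `ValidationError` mentioning "escapes base" when a symlink under the workspace points outside it. A minimal sketch of a validator that behaves that way: resolve symlinks with `os.path.realpath` first, then compare against the resolved base with `os.path.commonpath`. `resolve_within` is hypothetical and `ValidationError` here is a local stand-in for the sample's class.

```python
import os

class ValidationError(ValueError):
    """Hypothetical stand-in for the sample's ValidationError."""

def resolve_within(base: str, relative: str) -> str:
    """Resolve `relative` under `base`, following symlinks, and reject escapes."""
    real_base = os.path.realpath(base)
    target = os.path.realpath(os.path.join(real_base, relative))
    # After symlink resolution the target must still sit inside the base.
    if os.path.commonpath([real_base, target]) != real_base:
        raise ValidationError(f"path {relative!r} escapes base {real_base!r}")
    return target
```

Checking the resolved path rather than the string catches `..` segments, absolute paths and symlink chains with a single comparison.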

Finding 8: CKV_DOCKER_3

Severity: HIGH
Scanner: checkov
Rule ID: CKV_DOCKER_3
Location: 01-features/02-host-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/sandbox/Dockerfile.swift:1-51

Description:
Ensure that a user for the container has been created

Code Snippet:

# Sandbox (Swift toolchain) — AgentCore Runtime data plane (ARM64-only). NOT an agent.
# Same command-executor contract as the Python sandbox, but carries the Swift
# toolchain so it can `swift build` / `swift test` repo code. One image per language;
# the business-logic service picks which sandbox runtime to invoke per ticket.
#
# Security: non-root execution (via entrypoint), pinned Python deps, HEALTHCHECK.
FROM --platform=linux/arm64 swift:6.1-jammy

WORKDIR /app

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    SANDBOX_LANG=swift

# The Swift image is Ubuntu-based; add Python 3 to run the same app.py executor.
# git + ca-certificates let SwiftPM resolve package dependencies.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python3-venv git ca-certificates \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
# swift:6.1-jammy ships Python 3.10 with an older pip (no PEP-668 / --break-system-packages).
RUN pip3 install --no-cache-dir -r requirements.txt

COPY app.py policy_engine.py entrypoint.sh .
COPY policies/ /app/policies/

# Copy shared security module (from repo root's shared/ directory)
COPY shared_libs/shared/ /app/shared_libs/shared/

ENV MOUNT_PATH=/mnt/shared \
    WORKSPACE_PATH=/mnt/workspace \
    HOME=/mnt/workspace \
    SWIFTPM_CACHE_DIR=/mnt/workspace/.spm-cache

# git refuses to operate on dependency checkouts under .build when they are owned by a
# different uid than the runner ("detected dubious ownership"). This also affects the
# test gate, which runs `swift test` via InvokeAgentRuntimeCommand (a plain shell, not our
# _run_command path), so the setting must live in the image, applied to ALL users.
RUN git config --system --add safe.directory '*'

RUN useradd -m -u 1000 -d /home/sbx sbx \
    && chmod +x /app/entrypoint.sh

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/ping')" || exit 1

EXPOSE 8080
# entrypoint.sh ensures /mnt/workspace is writable by sbx before starting the app
CMD ["/app/entrypoint.sh"]
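The Dockerfile above creates the `sbx` user but never switches to it with a `USER` instruction; per its comments, entrypoint.sh makes /mnt/workspace writable for sbx and then drops privileges, which is why the PR suppresses the check instead. The hypothetical helper below only approximates the intent of CKV_DOCKER_3 (does the final build stage switch to a non-root user?) to show what triggers the finding; it is not checkov's implementation.

```python
def final_stage_sets_user(dockerfile_text: str) -> bool:
    """Approximate CKV_DOCKER_3's intent: does the final stage set a non-root USER?"""
    user = None
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        if keyword == "FROM":
            user = None  # each build stage starts again as root
        elif keyword == "USER" and len(parts) == 2:
            user = parts[1].split(":")[0].strip()  # drop an optional ":group"
    return user not in (None, "root", "0")
```

A `USER` set only in an earlier stage does not count, because each `FROM` starts a new stage whose default user is root.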

Report generated by Automated Security Helper (ASH) at 2026-06-23T14:58:43+00:00


smoell · 2026-06-23T15:19:38Z

fix: resolve ruff lint violations (E741, E401, F541) across four files

- Rename ambiguous variable l to lesson in list comprehensions (shared/memory.py, orchestrator/handler.py)
- Split multi-import into separate statements (sandbox/app.py)
- Remove unnecessary f-string prefix (cdk/stacks/storage_stack.py)
- Add property-based tests verifying lint compliance and behavior preservation
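The three ruff rules fixed in this commit are E741 (ambiguous variable name such as `l`), E401 (multiple imports on one line) and F541 (f-string without placeholders). A short illustration of the compliant forms; the function names, the `"text"` key and the `STAGE` variable are hypothetical and not taken from the sample.

```python
# E401: one import per line, instead of `import json, os`
import json
import os

def lesson_texts(lessons: list[dict]) -> list[str]:
    # E741: `lesson` instead of the ambiguous single-letter `l`
    return [lesson["text"] for lesson in lessons]

def dump_lessons(lessons: list[dict]) -> str:
    return json.dumps(lesson_texts(lessons))

def storage_stack_id() -> str:
    stage = os.environ.get("STAGE", "dev")  # hypothetical env var
    # F541: a constant suffix is a plain string; f"-storage" would be flagged
    return stage + "-storage"
```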

…tations
- Add # nosec B108 to intentional /tmp usage in isolated containers/microVMs
- Add # nosec B602 to sandboxed subprocess executor (sandbox/app.py)
- Add # nosec B108 to test files (assertions and fixtures, not real /tmp usage)
- Add #checkov:skip=CKV_DOCKER_3 to Dockerfile.swift (entrypoint.sh handles su)
- Each annotation includes justification for audit trail
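A sketch of what a justified, rule-specific bandit suppression looks like on the Finding 1 line. `agent_home` is a hypothetical wrapper; the `# nosec B108` form limits the suppression to that one test ID, and the justification sits in a second comment after it. For the Dockerfile, checkov's equivalent is an inline `#checkov:skip=CKV_DOCKER_3:<reason>` comment.

```python
import os

def agent_home() -> str:
    # Suppress only B108 on this line and record why, for the audit trail.
    home = os.environ.get("HOME") or "/tmp/agenthome"  # nosec B108 # isolated microVM, /tmp not shared
    os.makedirs(os.path.join(home, ".claude"), exist_ok=True)
    return home
```

Naming the test ID, rather than a bare `# nosec`, keeps other checks active on the same line.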

evandrofranco

ok

smoell added 2 commits June 19, 2026 13:22

docs: note that sandbox examples cover Python and Swift but are extendable to other frameworks

e634ebc

github-advanced-security AI found potential problems Jun 22, 2026


Comment thread on ...st-your-agent/01-runtime/04-coding-agents/05-autonomous-coding-agent-durable/demo/index.html (dismissed)

smoell added 2 commits June 23, 2026 11:59

Merge branch 'awslabs:main' into main

05cd489

evandrofranco approved these changes Jun 23, 2026


smoell wants to merge 5 commits into awslabs:main from smoell:main

3 participants
