78 changes: 78 additions & 0 deletions .github/prompts/adversarial-review.md
@@ -0,0 +1,78 @@
# Bash AST adversarial review

You are an adversarial reviewer for `bash-ast`, a Rust CLI/library that uses GNU Bash's real parser through FFI to parse shell scripts into a JSON AST and to convert a JSON AST back to bash.

Your job is to add value beyond ordinary CI. Do not simply rerun the full test suite as your main contribution; the workflow has already captured baseline build/test logs for you. Instead, inspect the repository and the supplied context, identify parser behaviors worth challenging, and run a small number of targeted probes.

## What to inspect first

- `README.md`, `Cargo.toml`, `src/`, and relevant tests under `tests/`.
- `review-artifacts/pr-context.json` if present.
- `review-artifacts/base-diff.stat` and `review-artifacts/base-diff.patch` if present.
- `review-artifacts/build.log`, `review-artifacts/baseline-tests.log`, and status files if present.

If this run is associated with a PR, extract 2-4 concrete, testable claims from the PR title/body/diff before running probes. If there is no PR context, pick high-risk parser/round-trip behaviors from the current checkout.

## Probe guidance

Prefer edge cases involving one or more of:

- nested quotes and escaped newlines;
- command substitution and arithmetic expansion;
- heredocs and here-strings;
- process substitution;
- pipelines, negated pipelines, and lists;
- arrays and parameter expansion;
- case/select/for/while/function syntax;
- malformed syntax and graceful error handling;
- parse-to-JSON then `--to-bash` round trips.

For each probe:

1. Create temporary scripts/data only under `/tmp` or `review-artifacts/agent-probes/`.
2. Use the repository's actual binary/library/test harness whenever practical. The built CLI is usually `target/debug/bash-ast` after `cargo build`.
3. Capture concise evidence. If output is long, write full logs to `review-artifacts/agent-probes/` and summarize the relevant lines.
4. Decide whether the observed behavior supports or refutes the hypothesis.
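
The four steps above can be sketched as a small Python helper. This is an illustrative sketch only: `round_trip_probe` is a hypothetical name, and the `parse` and `to_bash` callables are assumptions standing in for however the built CLI is actually invoked (its exact arguments are not specified here). The helper writes full evidence to an artifact directory and returns only a compact verdict.

```python
import json
from pathlib import Path
from typing import Callable


def round_trip_probe(
    script: str,
    parse: Callable[[str], dict],      # assumption: wraps "script -> JSON AST"
    to_bash: Callable[[dict], str],    # assumption: wraps "JSON AST -> bash"
    artifact_dir: Path,
    name: str,
) -> dict:
    """Parse, regenerate, and re-parse a script; record the evidence on disk."""
    artifact_dir.mkdir(parents=True, exist_ok=True)

    first_ast = parse(script)
    regenerated = to_bash(first_ast)
    second_ast = parse(regenerated)

    # Full evidence goes to a file so the final response can stay compact.
    log_path = artifact_dir / f"{name}.json"
    log_path.write_text(
        json.dumps(
            {
                "input": script,
                "regenerated": regenerated,
                "first_ast": first_ast,
                "second_ast": second_ast,
            },
            indent=2,
        ),
        encoding="utf-8",
    )

    # Hypothesis: regenerating and re-parsing yields the same AST.
    return {
        "title": name,
        "result": "PASS" if first_ast == second_ast else "FAIL",
        "output": str(log_path),
    }
```

Comparing the two ASTs rather than the two script texts tolerates harmless whitespace or quoting differences in the regenerated bash while still catching structural loss.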

## Constraints

- Do not modify repository source, tests, manifests, lockfiles, generated snapshots, or submodules.
- Do not install arbitrary dependencies.
- Do not run broad/unbounded commands that dump huge files or recursive listings.
- Do not use network access, except for the GitHub context already provided by the workflow.
- Keep shell commands and outputs in the final response compact.
- If setup/build failures prevent runtime probes, perform source-level inspection and report `NEEDS_MORE_INVESTIGATION` with the best concrete blocker evidence.

## Required final response format

Return a concise human-readable review followed by a machine-readable JSON block between exact markers:

`JSON_RESULT_START`

```json
{
  "recommendation": "PASS|FAIL|NEEDS_MORE_INVESTIGATION",
  "why": "One or two sentences explaining the recommendation and highest risk.",
  "tests": [
    {
      "title": "Short name",
      "hypothesis": "What behavior was being tested",
      "impact": "Why this matters if wrong",
      "command": "Short command summary, not a giant script",
      "output": "Concise observed output or pointer to artifact path",
      "result": "PASS|FAIL",
      "unitTestRecommendation": "What automated coverage should be added or why existing coverage is enough"
    }
  ],
  "finalMessage": "Brief operator-facing summary"
}
```

`JSON_RESULT_END`

Rules for the JSON block:

- `recommendation` must be exactly `PASS`, `FAIL`, or `NEEDS_MORE_INVESTIGATION`.
- `tests` must contain at least one substantive probe or one clearly labeled blocker probe.
- Every test object must have non-empty string fields: `title`, `hypothesis`, `impact`, `command`, `output`, `result`, and `unitTestRecommendation`.
- Per-test `result` must be exactly `PASS` or `FAIL`.
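
The rules above can be checked mechanically. The following is a minimal sketch, assuming the JSON block has already been parsed into a Python dict; `validate_review` and the constant names are hypothetical and are not part of the repository.

```python
from typing import Any

# Allowed values, taken from the rules for the JSON block.
RECOMMENDATIONS = ("PASS", "FAIL", "NEEDS_MORE_INVESTIGATION")
TEST_RESULTS = ("PASS", "FAIL")
REQUIRED_TEST_FIELDS = (
    "title",
    "hypothesis",
    "impact",
    "command",
    "output",
    "result",
    "unitTestRecommendation",
)


def validate_review(review: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means the block is valid."""
    errors: list[str] = []

    if review.get("recommendation") not in RECOMMENDATIONS:
        errors.append("recommendation must be PASS, FAIL, or NEEDS_MORE_INVESTIGATION")

    tests = review.get("tests")
    if not isinstance(tests, list) or not tests:
        errors.append("tests must contain at least one probe")
        return errors

    for index, test in enumerate(tests, start=1):
        if not isinstance(test, dict):
            errors.append(f"tests[{index}] must be an object")
            continue
        # Every required field must be a non-empty string.
        for field in REQUIRED_TEST_FIELDS:
            value = test.get(field)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"tests[{index}].{field} must be a non-empty string")
        if test.get("result") not in TEST_RESULTS:
            errors.append(f"tests[{index}].result must be PASS or FAIL")

    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every violation in a single pass.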
150 changes: 150 additions & 0 deletions .github/scripts/render-adversarial-review-summary.py
@@ -0,0 +1,150 @@
#!/usr/bin/env python3
"""Render a GitHub Actions summary from agent adversarial review output."""

from __future__ import annotations

import argparse
import json
import re
from pathlib import Path
from typing import Any


def read_text(path: Path | None) -> str:
    if path is None or not path.exists():
        return ""
    return path.read_text(encoding="utf-8", errors="replace")


def extract_json_blob(response: str) -> tuple[dict[str, Any] | None, str | None]:
    marker_match = re.search(
        r"JSON_RESULT_START\s*(?:```(?:json)?\s*)?(.*?)(?:```\s*)?JSON_RESULT_END",
        response,
        flags=re.IGNORECASE | re.DOTALL,
    )
    candidates: list[str] = []
    if marker_match:
        candidates.append(marker_match.group(1).strip())

    candidates.extend(match.group(1).strip() for match in re.finditer(r"```json\s*(.*?)```", response, re.DOTALL | re.IGNORECASE))

    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed, None

    return None, "Could not find a valid JSON review block between JSON_RESULT_START and JSON_RESULT_END."


def normalize_tests(value: Any) -> list[dict[str, Any]]:
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, dict)]


def append_log_tail(lines: list[str], title: str, text: str, max_lines: int = 60) -> None:
    if not text.strip():
        return
    tail = "\n".join(text.rstrip().splitlines()[-max_lines:])
    lines.extend([
        f"### {title}",
        "",
        "```text",
        tail,
        "```",
        "",
    ])


def render_summary(args: argparse.Namespace) -> str:
    response = read_text(args.response)
    build_log = read_text(args.build_log)
    baseline_log = read_text(args.baseline_log)
    build_status = read_text(args.build_status).strip() if args.build_status and args.build_status.exists() else "unknown"
    baseline_status = read_text(args.baseline_status).strip() if args.baseline_status and args.baseline_status.exists() else "unknown"
    review, warning = extract_json_blob(response)

    lines: list[str] = [
        "## Adversarial review",
        "",
        f"- **Build exit code:** `{build_status or 'unknown'}`",
        f"- **Baseline test exit code:** `{baseline_status or 'unknown'}`",
    ]

    if review is None:
        lines.extend([
            "- **Recommendation:** `UNKNOWN`",
            "",
            f"> ⚠️ {warning}",
            "",
        ])
    else:
        recommendation = str(review.get("recommendation", "UNKNOWN"))
        why = str(review.get("why", "No rationale supplied."))
        final_message = str(review.get("finalMessage", ""))
        tests = normalize_tests(review.get("tests"))

        lines.extend([
            f"- **Recommendation:** `{recommendation}`",
            f"- **Why:** {why}",
        ])
        if final_message:
            lines.append(f"- **Final message:** {final_message}")
        lines.extend(["", "### Structured probes", ""])

        if not tests:
            lines.extend(["No structured probes were parsed from the agent response.", ""])
        else:
            for index, test in enumerate(tests, start=1):
                title = str(test.get("title", f"Probe {index}"))
                result = str(test.get("result", "UNKNOWN"))
                lines.extend([
                    f"#### {index}. {title} — `{result}`",
                    "",
                    f"- **Hypothesis:** {test.get('hypothesis', '')}",
                    f"- **Impact:** {test.get('impact', '')}",
                    f"- **Command:** `{test.get('command', '')}`",
                    f"- **Output:** {test.get('output', '')}",
                    f"- **Coverage recommendation:** {test.get('unitTestRecommendation', '')}",
                    "",
                ])

    lines.extend([
        "### Full agent response",
        "",
        "<details>",
        "<summary>Expand raw response</summary>",
        "",
        "````text",
        response[-12000:] if response else "(no agent response captured)",
        "````",
        "",
        "</details>",
        "",
    ])

    append_log_tail(lines, "Build log tail", build_log)
    append_log_tail(lines, "Baseline test log tail", baseline_log)

    return "\n".join(lines)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--response", type=Path, required=True)
    parser.add_argument("--build-log", type=Path)
    parser.add_argument("--baseline-log", type=Path)
    parser.add_argument("--build-status", type=Path)
    parser.add_argument("--baseline-status", type=Path)
    parser.add_argument("--output", type=Path, required=True)
    args = parser.parse_args()

    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(render_summary(args), encoding="utf-8")


if __name__ == "__main__":
    main()