allenai · charliemcgrady · Jun 2, 2026 · Jun 12, 2026
diff --git a/.gitignore b/.gitignore
@@ -18,4 +18,7 @@ skills-lock.json
 .idea/
 *.iml
 
+# macOS
+.DS_Store
+
 .asta
diff --git a/plugins/asta-preview/skills/research-step/SKILL.md b/plugins/asta-preview/skills/research-step/SKILL.md
@@ -1,12 +1,12 @@
 ---
 name: research-step
 description: Plan and execute autonomous research as a graph of typed tasks tracked in beads. Use when working from a mission.md to drive multi-step research with explicit dependencies and structured outputs.
-allowed-tools: Bash(bd:*) Bash(date:*) Bash(scripts/*) Read(assets/**) Read(workflows/**) Read(scripts/**) Skill(asta:*) Skill(asta-preview:*) Skill(asta-plugins:*)
+allowed-tools: Bash(bd:*) Bash(date:*) Bash(scripts/*) Bash(asta:*) Read(assets/**) Read(workflows/**) Read(scripts/**) Skill(asta:*) Skill(asta-preview:*) Skill(asta-plugins:*)
 ---
 
 # Research Step
 
-Models a research session as a beads epic. Each unit of work is a typed sub-issue whose `metadata.research_step.output` matches a JSON schema in `assets/schemas.yaml`.
+Models a research session as a beads epic. A session runs one **flow**: the composed `data_and_literature_grounded_theory_generation` flow (which begins with `data_provenance`), one of its sub-flows (`reproduction`, `theorizer`), the standalone `hypothesis_driven_research` flow (literature → falsifiable hypotheses → one prespecified test per hypothesis), the standalone `auto_discovery` flow (source a cohort and run a fresh discovery), or a custom chain. Each flow's purpose is stated in its `mission` field in `assets/schemas.yaml`. Run `auto_discovery` as its own session in a **separate workspace** with its own `mission.md` and `.beads`, typically after a theory-generation run; a second epic root in the same workspace breaks `scripts/epic-root.sh`. `assets/schemas.yaml` defines the reusable `types` (immutable records; verdicts are `adjudication` records that reference their subject), the `tasks` (pure output contracts mapping each output key to its type), and the `flows` (each step carries its `mission`, its `input` steps, and its asta `chain`). Each unit of work is a typed sub-issue whose `metadata.research_step.output_json` matches its task's output in the schema; the issue envelope carries `flow` and `task_type`.
 
 This skill is a **router**. Inspect the working directory and the user's request, pick one workflow, then read its `.md` file in `workflows/` and follow it. Do not execute a workflow from memory — always open the file first.
 
@@ -23,7 +23,7 @@ Installing `bd` and `jq`, running `bd init`, and verifying `scripts/summary-chec
 | `mission.md` | Input. The research task. |
 | `.beads/` | Source of truth for state. |
 | `summary.md` | Derived view of the session, regenerated by **update-summary**. Beads is the source of truth; this file is just a digest for humans and for **brainstorm**. Frontmatter `beads_snapshot` records the state it was rendered from. |
-| `background_knowledge.txt` | Optional. Long-form context referenced from issue metadata via `summary_path`. |
+| `.asta/<agent>/<slug>/` | Heavy artifacts (raw agent JSON, datasets, reports), referenced from `output_json` by repo-root-relative `_path` fields. |
 
 ## Workflows
 
@@ -51,7 +51,7 @@ If the user did not name a workflow, run **brainstorm**. It inspects the working
 
 - **init** → always run **plan** afterwards (which then chains to **update-summary**).
 - **plan** → always run **update-summary** afterwards so the digest reflects the new graph.
-- **execute** → if the closed task type is `literature_review`, `hypothesis`, `analysis`, or `synthesis`, chain to **plan** (which chains to **update-summary**); otherwise chain directly to **update-summary**.
+- **execute** → chain to **plan** when the closed task type unlocks new structure for its flow (see the hand-off rule in `execute.md`, last step); otherwise chain directly to **update-summary**.
 - **update-summary** and **brainstorm** → never chain.
 
 ## Boundaries
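The chaining rules in the SKILL.md hunk above are transitive: **execute** can hand off to **plan**, which always hands off to **update-summary**. A minimal Python sketch of that routing, assuming workflows are identified by their names; `unlocks_structure` is a hypothetical flag standing in for the hand-off rule in `execute.md`, which this diff does not show.

```python
from typing import List, Optional


def next_workflow(current: str, unlocks_structure: bool = False) -> Optional[str]:
    """Return the workflow that must run after `current`, or None."""
    if current == "init":
        return "plan"
    if current == "plan":
        return "update-summary"
    if current == "execute":
        # `unlocks_structure` is a hypothetical stand-in for execute.md's hand-off rule.
        return "plan" if unlocks_structure else "update-summary"
    # update-summary and brainstorm never chain.
    return None


def chain(start: str, unlocks_structure: bool = False) -> List[str]:
    """Every workflow that runs, in order, when `start` is invoked."""
    order = [start]
    nxt = next_workflow(start, unlocks_structure)
    while nxt is not None:
        order.append(nxt)
        # Only execute reads the flag, and nothing chains to execute,
        # so later hops use the default.
        nxt = next_workflow(nxt)
    return order
```

For example, `chain("execute", unlocks_structure=True)` returns `["execute", "plan", "update-summary"]`, while `chain("brainstorm")` returns `["brainstorm"]`.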

diff --git a/plugins/asta-preview/skills/research-step/assets/schemas.yaml b/plugins/asta-preview/skills/research-step/assets/schemas.yaml
diff --git a/plugins/asta-preview/skills/research-step/scripts/close-task.sh b/plugins/asta-preview/skills/research-step/scripts/close-task.sh
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+# close-task.sh <issue-id> <output-json> <output-markdown>
+# Publish a task's output and finish it: write output_json + output_markdown into the issue
+# metadata, validate output_json against the schema, close the issue, assert it closed, then
+# close any ancestor group whose last child just closed.
+set -euo pipefail
+here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+[[ $# -eq 3 ]] || { echo "usage: close-task.sh <issue-id> <output-json> <output-markdown>" >&2; exit 1; }
+id="$1"; oj="$2"; om="$3"
+[[ -f "$oj" ]] || { echo "close-task: no output-json $oj" >&2; exit 1; }
+[[ -f "$om" ]] || { echo "close-task: no output-markdown $om" >&2; exit 1; }
+jq -e . "$oj" >/dev/null 2>&1 || { echo "close-task: $oj is not valid JSON" >&2; exit 1; }
+
+# 1. publish: merge output_json + output_markdown into the existing research_step metadata
+cur="$(bd show "$id" --json | jq -c '.[0].metadata')"
+merged="$(jq -c --slurpfile oj "$oj" --rawfile om "$om" \
+  '.research_step.output_json = $oj[0] | .research_step.output_markdown = $om' <<<"$cur")"
+tmp="$(mktemp)"; trap 'rm -f "$tmp"' EXIT
+printf '%s' "$merged" > "$tmp"
+bd update "$id" --metadata @"$tmp" >/dev/null
+
+# 2. validate structurally (reads the issue back; no style lint)
+bash "$here/validate-output.sh" "$id"
+
+# 3. close and 4. assert closure
+bd close "$id" >/dev/null
+[[ "$(bd show "$id" --json | jq -r '.[0].status')" == "closed" ]] \
+  || { echo "close-task: $id did not close" >&2; exit 2; }
+echo "closed $id"
+
+# 5. cascade: close each ancestor group whose direct children are all closed.
+# The epic root is never closed here — "root open, no open tasks" is the
+# session-complete state that epic-root.sh and the workflows rely on.
+cur_id="$id"
+while [[ "$cur_id" == *.* ]]; do
+  parent="${cur_id%.*}"
+  parent_json="$(bd show "$parent" --json 2>/dev/null)" || break
+  [[ "$(jq -r '.[0].metadata.research_step.epic_root // false' <<<"$parent_json")" == "true" ]] && break
+  open_kids="$(bd list --json --limit 0 | jq --arg p "$parent" '
+    [ .[]
+      | select(.id | startswith($p + "."))
+      | select((.id[($p|length)+1:] | contains(".")) | not)
+      | select(.status != "closed") ] | length')"
+  [[ "$open_kids" -eq 0 ]] || break
+  if bd close "$parent" >/dev/null 2>&1; then
+    echo "closed group $parent"
+  else
+    echo "close-task: warning: could not close group $parent (task $id is closed; close the group manually)" >&2
+    break
+  fi
+  cur_id="$parent"
+done
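Steps 1 and 5 of `close-task.sh` carry most of its logic. The sketch below models them in Python over an in-memory table; the `issues` layout (id mapped to `status` and `metadata`) is a hypothetical stand-in for `bd show --json` / `bd list --json` output, not the beads API, and the path where `bd close` fails on a group is not modeled.

```python
import copy
from typing import Any, Dict, List, Optional


def publish(metadata: Optional[dict], output_json: Any, output_markdown: str) -> dict:
    """Step 1: set both outputs, keeping every other metadata field."""
    merged = copy.deepcopy(metadata) if metadata else {}
    rs = merged.get("research_step") or {}
    rs["output_json"] = output_json
    rs["output_markdown"] = output_markdown
    merged["research_step"] = rs
    return merged


def direct_children(parent: str, ids) -> List[str]:
    """Ids exactly one level below `parent`, like the jq filter in step 5."""
    prefix = parent + "."
    return [i for i in ids if i.startswith(prefix) and "." not in i[len(prefix):]]


def cascade_close(task_id: str, issues: Dict[str, dict]) -> List[str]:
    """Step 5: close ancestor groups whose direct children are all closed.

    Stops at a missing parent, at the epic root (never closed here), or at
    the first ancestor that still has an open child.
    """
    closed = []
    cur = task_id
    while "." in cur:
        parent = cur.rsplit(".", 1)[0]  # same as ${cur_id%.*}
        issue = issues.get(parent)
        if issue is None:
            break
        rs = (issue.get("metadata") or {}).get("research_step") or {}
        if rs.get("epic_root") is True:
            break
        if any(issues[k]["status"] != "closed" for k in direct_children(parent, issues)):
            break
        issue["status"] = "closed"
        closed.append(parent)
        cur = parent
    return closed
```

Walking up one `.`-separated level at a time, and stopping at the first ancestor that still has open work, keeps the cascade local to the branch the task belongs to.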
diff --git a/plugins/asta-preview/skills/research-step/scripts/create-task.sh b/plugins/asta-preview/skills/research-step/scripts/create-task.sh
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+# create-task.sh <parent-id> <task_type> <flow> <title> <brief-description> [input-id ...]
+# Create a leaf task issue under <parent-id>: hierarchical id, a brief one-line description,
+# and initialized research_step metadata. output_json / output_markdown stay null until
+# execute publishes them via close-task.sh. Prints the new issue id.
+set -euo pipefail
+here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+[[ $# -ge 5 ]] || { echo "usage: create-task.sh <parent-id> <task_type> <flow> <title> <brief-desc> [input-id ...]" >&2; exit 1; }
+parent="$1"; task_type="$2"; flow="$3"; title="$4"; desc="$5"; shift 5
+
+# Validate the task_type against schemas.yaml. The helper exits 3 for an
+# unknown task_type (and prints the known ones) or 5 when the schema cannot
+# be read (e.g. PyYAML missing — run init); set -e propagates either.
+"$here/task-output-keys.sh" "$task_type" >/dev/null
+
+[[ -n "$desc" ]]            || { echo "create-task: a brief description is required" >&2; exit 4; }
+[[ "$desc" != *$'\n'* ]]    || { echo "create-task: description must be one line" >&2; exit 4; }
+[[ "${#desc}" -le 200 ]]    || { echo "create-task: description too long (${#desc} chars > 200) — keep it brief" >&2; exit 4; }
+
+if [[ $# -eq 0 ]]; then inputs_json="[]"; else inputs_json="$(printf '%s\n' "$@" | jq -R . | jq -cs .)"; fi
+meta="$(jq -nc --arg f "$flow" --arg tt "$task_type" --argjson inp "$inputs_json" \
+  '{research_step: {flow: $f, task_type: $tt, inputs: $inp, output_schema_version: 2, output_json: null, output_markdown: null}}')"
+tmp="$(mktemp)"; trap 'rm -f "$tmp"' EXIT
+printf '%s' "$meta" > "$tmp"
+bd create "$title" --parent "$parent" -d "$desc" --metadata @"$tmp" --silent
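A minimal Python sketch of what `create-task.sh` checks and writes before calling `bd create`: the three description rules and the initial `research_step` envelope. The function names are hypothetical, the `task_type` lookup against `schemas.yaml` is left out, and `json.dumps` with compact separators only approximates the `jq -nc` output that lands in the `--metadata` file.

```python
import json
from typing import List


def check_description(desc: str) -> None:
    """Apply create-task.sh's three description rules (it exits 4 on failure)."""
    if not desc:
        raise ValueError("a brief description is required")
    if "\n" in desc:
        raise ValueError("description must be one line")
    if len(desc) > 200:
        raise ValueError(f"description too long ({len(desc)} chars > 200)")


def initial_metadata(flow: str, task_type: str, inputs: List[str]) -> dict:
    """The research_step envelope a new task starts with; outputs stay null."""
    return {
        "research_step": {
            "flow": flow,
            "task_type": task_type,
            "inputs": list(inputs),
            "output_schema_version": 2,
            "output_json": None,
            "output_markdown": None,
        }
    }


def to_metadata_text(meta: dict) -> str:
    """Compact JSON in the same key order, roughly what `jq -nc` emits."""
    return json.dumps(meta, separators=(",", ":"), ensure_ascii=False)
```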
diff --git a/plugins/asta-preview/skills/research-step/scripts/epic-root.sh b/plugins/asta-preview/skills/research-step/scripts/epic-root.sh
@@ -33,7 +33,7 @@ if ! command -v jq >/dev/null 2>&1; then
   exit 3
 fi
 
-ids=$(bd list --json | jq -r '.[] | select(.metadata.research_step.epic_root == true) | .id')
+ids=$(bd list --json --limit 0 | jq -r '.[] | select(.metadata.research_step.epic_root == true) | .id')
 count=$(printf '%s' "$ids" | grep -c . || true)
 
 case "$count" in
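The `--limit 0` added here (and in `summary-check.sh` and the new scripts) presumably lifts a default cap on `bd list` output; if the listing were truncated, an epic root past the cap would be invisible to the count. A minimal Python sketch of the selection, assuming issues shaped like `bd list --json` entries (function name hypothetical). The `case` arms that act on the count are not shown in this hunk, so the sketch stops at the ids.

```python
from typing import Iterable, List


def epic_roots(issues: Iterable[dict]) -> List[str]:
    """Ids whose metadata.research_step.epic_root is exactly true."""
    roots = []
    for issue in issues:
        # A missing or null metadata / research_step behaves like jq's null path.
        rs = (issue.get("metadata") or {}).get("research_step") or {}
        if rs.get("epic_root") is True:
            roots.append(issue["id"])
    return roots
```

With the root listed third, slicing the list to its first two entries (a stand-in for a capped listing) makes `epic_roots` return `[]`.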

diff --git a/plugins/asta-preview/skills/research-step/scripts/next-task.sh b/plugins/asta-preview/skills/research-step/scripts/next-task.sh
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+# next-task.sh — the single definition of task ordering. Prints the open task
+# issues (status == open, metadata.research_step.task_type set), sorted
+# *numerically* by hierarchical id (wf.1.2 before wf.1.10 — a plain lexical
+# sort would get this wrong past 9 siblings). Groups (no task_type) are never
+# listed; there are no dependency edges, so this order is the ordering signal.
+#
+# Used by execute (pick the next task) and update-summary (render the queue),
+# so the two never disagree about what runs next.
+#
+# Output (stdout, key: value lines):
+#   next:  <bd-id> | none
+#   queue: <space-separated bd-ids>   (omitted when empty)
+# Exit: 0 (even when next: none) · 3 bd/jq missing
+set -euo pipefail
+
+command -v bd >/dev/null 2>&1 || { echo "next-task: 'bd' not found on PATH" >&2; exit 3; }
+command -v jq >/dev/null 2>&1 || { echo "next-task: 'jq' not found on PATH" >&2; exit 3; }
+
+ids="$(bd list --json --limit 0 | jq -r '
+  [ .[]
+    | select(.status == "open")
+    | select(.metadata.research_step.task_type != null) ]
+  | sort_by(.id | split(".") | map(tonumber? // .))
+  | .[].id')"
+
+if [[ -z "$ids" ]]; then
+  echo "next: none"
+  exit 0
+fi
+
+echo "next: $(head -n1 <<<"$ids")"
+rest="$(tail -n +2 <<<"$ids" | tr '\n' ' ' | sed 's/ $//')"
+[[ -n "$rest" ]] && echo "queue: $rest" || true
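The numeric ordering in `next-task.sh` hinges on its `sort_by` key. A Python sketch of the same key and of the `next:` / `queue:` output, assuming ids whose numeric segments are plain integers and that the input is already filtered to open task issues. It approximates jq's `tonumber? // .`; it is not a port of jq's comparator, and the function names are hypothetical.

```python
from typing import Iterable


def id_sort_key(issue_id: str) -> tuple:
    """Approximate jq's `split(".") | map(tonumber? // .)` as a sort key.

    jq orders numbers before strings, so each segment becomes (0, n) when it
    parses as an integer and (1, text) otherwise. Tuples compare element by
    element and a shorter prefix sorts first, as jq arrays do.
    """
    key = []
    for seg in issue_id.split("."):
        try:
            key.append((0, int(seg)))
        except ValueError:
            key.append((1, seg))
    return tuple(key)


def render_queue(open_task_ids: Iterable[str]) -> str:
    """Format the ordered ids the way next-task.sh prints them."""
    ordered = sorted(open_task_ids, key=id_sort_key)
    if not ordered:
        return "next: none"
    lines = [f"next: {ordered[0]}"]
    if len(ordered) > 1:
        lines.append("queue: " + " ".join(ordered[1:]))
    return "\n".join(lines)
```

For `["wf.1.10", "wf.2", "wf.1.2", "wf.1"]` this key yields `wf.1, wf.1.2, wf.1.10, wf.2`, whereas a plain string sort puts `wf.1.10` before `wf.1.2`.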
diff --git a/plugins/asta-preview/skills/research-step/scripts/summary-check.sh b/plugins/asta-preview/skills/research-step/scripts/summary-check.sh
@@ -30,7 +30,7 @@ if ! command -v jq >/dev/null 2>&1; then
   exit 3
 fi
 
-current=$(bd list --json \
+current=$(bd list --json --limit 0 \
   | jq -r '.[] | select(.status != "closed") | .id' \
   | sort \
   | shasum -a 256 \
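The hunk above is cut off after `shasum -a 256`, so the rest of the pipeline is not shown. Assuming the snapshot is the hex digest of the bytes `jq -r ... | sort` produces (each open id on its own newline-terminated line), it can be recomputed in Python as below; the function name is hypothetical. Python sorts by code point, which matches `LC_ALL=C sort` but can differ from a locale-aware `sort` when ids mix case or punctuation.

```python
import hashlib
from typing import Iterable


def open_ids_digest(open_ids: Iterable[str]) -> str:
    """SHA-256 hex digest of the sorted open ids, one per line."""
    # jq -r and sort both terminate every line with "\n".
    payload = "".join(i + "\n" for i in sorted(open_ids))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the digest covers the full set of open ids, a listing truncated by a default cap would hash a different set, which is presumably why this hunk adds `--limit 0`.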

diff --git a/plugins/asta-preview/skills/research-step/scripts/task-output-keys.sh b/plugins/asta-preview/skills/research-step/scripts/task-output-keys.sh
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+# task-output-keys.sh <task_type> — print the space-separated output keys for a
+# task from assets/schemas.yaml. The single schema reader for scripts:
+# create-task.sh uses it to validate a task_type, validate-output.sh to get the
+# expected output_json keys.
+# Exit: 0 ok · 1 usage · 3 unknown task_type · 5 cannot read schema
+#       (python3/PyYAML missing or schemas.yaml unreadable — run init)
+set -euo pipefail
+here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+schemas="$here/../assets/schemas.yaml"
+
+[[ $# -eq 1 ]] || { echo "usage: task-output-keys.sh <task_type>" >&2; exit 1; }
+
+python3 - "$schemas" "$1" <<'PY'
+import sys
+
+try:
+    import yaml
+except ImportError:
+    print("task-output-keys: python3 cannot import yaml (PyYAML) - run the init workflow", file=sys.stderr)
+    sys.exit(5)
+
+try:
+    with open(sys.argv[1]) as f:
+        d = yaml.safe_load(f)
+except Exception as e:
+    print(f"task-output-keys: cannot read {sys.argv[1]}: {e}", file=sys.stderr)
+    sys.exit(5)
+
+tasks = d.get("tasks") or {}
+t = tasks.get(sys.argv[2])
+if t is None:
+    print(f"task-output-keys: unknown task_type '{sys.argv[2]}'", file=sys.stderr)
+    print(f"task-output-keys: known: {' '.join(sorted(tasks))}", file=sys.stderr)
+    sys.exit(3)
+print(" ".join(t["output"]))
+PY
diff --git a/plugins/asta-preview/skills/research-step/scripts/validate-output.sh b/plugins/asta-preview/skills/research-step/scripts/validate-output.sh
@@ -1,102 +1,65 @@
 #!/usr/bin/env bash
-# validate-output.sh — structural validation of a research_step output JSON.
-#
-# Usage: validate-output.sh <task_type> <metadata-json-file>
-#
-# Verifies that the JSON file:
-#   1. parses
-#   2. carries the canonical metadata envelope
-#      ({research_step: {task_type, inputs, output_schema_version, output}})
-#   3. has every required `output.<key>` for the given <task_type> per
-#      assets/schemas.yaml (schema_version: 1)
-#
-# Exit codes:
-#   0  — valid
-#   2  — JSON parse error
-#   3  — unknown task_type
-#   4  — missing required field
-#   5  — task_type mismatch with envelope
-#
-# This is structural validation only. Quality validation (sound prediction,
-# sane confidence, valid citations) is out of scope per execute.md.
+# validate-output.sh <issue-id> — structural check of a task's stored output_json.
+# Reads the issue from beads and deep-validates metadata.research_step.output_json
+# against the compiled JSON Schema (assets/compiled/<task_type>.schema.json,
+# regenerated from schemas.yaml by scripts/compile-schemas.py at build time):
+# top-level keys closed, declared nested fields required, extra nested fields
+# permitted (payloads nest verbatim). No style or quality linting.
+# Exit: 0 ok · 1 usage · 2 bad issue/metadata · 3 unknown task
+#       · 4 schema violation
+#       · 5 schema unreadable (PyYAML/jsonschema missing or compiled schema
+#         absent — run the init workflow, or update the plugin)
 set -euo pipefail
+here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
-if [[ $# -ne 2 ]]; then
-  echo "usage: validate-output.sh <task_type> <metadata-json-file>" >&2
-  exit 1
-fi
+[[ $# -eq 1 ]] || { echo "usage: validate-output.sh <issue-id>" >&2; exit 1; }
+id="$1"
 
-task_type="$1"
-file="$2"
+rs="$(bd show "$id" --json 2>/dev/null | jq -c '.[0].metadata.research_step // empty' 2>/dev/null || true)"
+[[ -n "$rs" ]] || { echo "validate-output: $id has no metadata.research_step" >&2; exit 2; }
+task_type="$(jq -r '.task_type // empty' <<<"$rs")"
+[[ -n "$task_type" ]] || { echo "validate-output: $id has no task_type" >&2; exit 2; }
 
-if ! jq -e . "$file" > /dev/null 2>&1; then
-  echo "validate-output: $file is not valid JSON" >&2
-  exit 2
-fi
+# Exits 3 (unknown task_type) or 5 (schema unreadable) with its own message.
+"$here/task-output-keys.sh" "$task_type" >/dev/null
 
-# Required output fields, mirroring assets/schemas.yaml (schema_version: 1).
-case "$task_type" in
-  scope)              required="question boundaries success_criteria" ;;
-  definitions)        required="terms" ;;
-  literature_review)  required="summary_path key_findings gaps citations" ;;
-  hypothesis)         required="statement rationale falsifiable_prediction expected_evidence" ;;
-  experiment_design)  required="method procedure variables artifacts_expected" ;;
-  evidence_gathering) required="artifacts log_path deviations" ;;
-  analysis)           required="verdict confidence reasoning caveats" ;;
-  synthesis)          required="answer supporting_hypotheses refuted_hypotheses open_questions report_path" ;;
-  *)
-    echo "validate-output: unknown task_type '$task_type'" >&2
-    echo "validate-output: expected one of scope|definitions|literature_review|hypothesis|experiment_design|evidence_gathering|analysis|synthesis" >&2
-    exit 3
-    ;;
-esac
+got="$(jq -c '.output_json // empty' <<<"$rs")"
+[[ -n "$got" && "$got" != "null" ]] || { echo "validate-output: $id has no output_json" >&2; exit 4; }
 
-# Envelope must carry the matching task_type so we don't validate scope JSON
-# against an analysis schema by accident.
-envelope_type=$(jq -r '.research_step.task_type // empty' "$file")
-if [[ -z "$envelope_type" ]]; then
-  echo "validate-output: $file missing .research_step.task_type" >&2
+schema="$here/../assets/compiled/${task_type}.schema.json"
+[[ -r "$schema" ]] || {
+  echo "validate-output: compiled schema missing for '$task_type' ($schema) — update the plugin (it is regenerated at build time)" >&2
   exit 5
-fi
-if [[ "$envelope_type" != "$task_type" ]]; then
-  echo "validate-output: envelope task_type='$envelope_type' but expected '$task_type'" >&2
-  exit 5
-fi
+}
+OUTPUT_JSON="$got" python3 - "$schema" "$task_type" <<'PY'
+import json
+import os
+import sys
 
-# Envelope shape sanity.
-for key in inputs output_schema_version output; do
-  if ! jq -e ".research_step | has(\"$key\")" "$file" >/dev/null; then
-    echo "validate-output: $file missing .research_step.$key" >&2
-    exit 5
-  fi
-done
+try:
+    import jsonschema
+except ImportError:
+    print("validate-output: python3 cannot import jsonschema - run the init workflow", file=sys.stderr)
+    sys.exit(5)
 
-# Required output fields.
-for key in $required; do
-  if ! jq -e ".research_step.output | has(\"$key\")" "$file" >/dev/null; then
-    echo "validate-output: missing required field 'output.$key' for task_type '$task_type'" >&2
-    exit 4
-  fi
-done
+with open(sys.argv[1]) as f:
+    schema = json.load(f)
+data = json.loads(os.environ["OUTPUT_JSON"])
 
-# Type spot-checks for the high-leverage cases. Not exhaustive — just the
-# fields where a wrong type at this layer would silently break update-summary rendering
-# or downstream tasks.
-case "$task_type" in
-  literature_review)
-    jq -e '.research_step.output.key_findings | type == "array"' "$file" >/dev/null \
-      || { echo "validate-output: output.key_findings must be an array" >&2; exit 4; }
-    jq -e '.research_step.output.gaps | type == "array"' "$file" >/dev/null \
-      || { echo "validate-output: output.gaps must be an array" >&2; exit 4; }
-    jq -e '.research_step.output.citations | type == "array"' "$file" >/dev/null \
-      || { echo "validate-output: output.citations must be an array" >&2; exit 4; }
-    ;;
-  analysis)
-    jq -e '.research_step.output.verdict | IN("supported", "refuted", "inconclusive")' "$file" >/dev/null \
-      || { echo "validate-output: output.verdict must be one of supported|refuted|inconclusive" >&2; exit 4; }
-    jq -e '.research_step.output.confidence | type == "number" and . >= 0 and . <= 1' "$file" >/dev/null \
-      || { echo "validate-output: output.confidence must be a number in [0, 1]" >&2; exit 4; }
-    ;;
-esac
+validator = jsonschema.Draft202012Validator(schema)
+errors = sorted(validator.iter_errors(data), key=lambda e: list(map(str, e.absolute_path)))
+if errors:
+    for e in errors[:5]:
+        path = ".".join(str(p) for p in e.absolute_path)
+        where = f"output_json.{path}" if path else "output_json"
+        hint = ""
+        if e.validator == "additionalProperties" and not path:
+            hint = " - byproducts go in artifacts"
+        print(f"validate-output: {where}: {e.message}{hint}", file=sys.stderr)
+    if len(errors) > 5:
+        print(f"validate-output: ... and {len(errors) - 5} more schema violation(s)", file=sys.stderr)
+    print(f"validate-output: output_json does not satisfy the '{sys.argv[2]}' schema", file=sys.stderr)
+    sys.exit(4)
+PY
 
 echo "ok"
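`validate-output.sh` delegates to `jsonschema.Draft202012Validator` over the compiled schema. To show what "top-level keys closed, declared nested fields required, extra nested fields permitted" means in practice, here is a stdlib-only Python sketch that interprets just four keywords (`type: object`, `properties`, `required`, `additionalProperties: false`). The schema is a hypothetical example, not output of `compile-schemas.py`, and the error messages are only loosely modeled on the script's.

```python
from typing import Any, Dict, List, Optional

# Hypothetical compiled schema for an illustrative task whose only output key is `result`.
SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "result": {
            "type": "object",
            "properties": {"summary": {}, "report_path": {}},
            "required": ["summary", "report_path"],
            # No additionalProperties here, so extra nested fields pass.
        }
    },
    "required": ["result"],
    "additionalProperties": False,  # top-level keys are closed
}


def check(schema: Dict[str, Any], data: Any, path: Optional[List[str]] = None) -> List[str]:
    """Validate the type/properties/required/additionalProperties subset."""
    path = path or []
    where = ".".join(["output_json"] + path)
    if schema.get("type") == "object" and not isinstance(data, dict):
        return [f"{where}: expected an object"]
    if not isinstance(data, dict):
        return []  # this subset only constrains objects
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"{where}: '{key}' is a required property")
    props = schema.get("properties", {})
    if schema.get("additionalProperties") is False:
        hint = "" if path else " - byproducts go in artifacts"
        for key in data:
            if key not in props:
                errors.append(f"{where}: unexpected key '{key}'{hint}")
    for key, sub in props.items():
        if key in data:
            errors.extend(check(sub, data[key], path + [key]))
    return errors
```

An output with an extra `notes` field inside `result` passes, while an extra top-level `scratch` key fails with the artifacts hint, mirroring the split the script's header describes.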