allenai · charliemcgrady · May 20, 2026
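
The two path conventions this diff adds under "Conventions for `input.md` / `output.md` files" are easy to mix up, so here is a minimal sketch of how each kind of path would resolve. It is illustrative only: the helper names and the task id `bd-1` are hypothetical, and `is_rejected_link` mirrors just the rejection rule stated in the diff, not the actual logic of `scripts/validate-output.sh`, which is not shown here.

```python
import posixpath

# Hypothetical task id and file location, for illustration only.
OUTPUT_MD = ".asta/tasks/bd-1/output.md"


def resolve_markdown_link(md_file: str, link: str) -> str:
    """File-relative: resolve a markdown link against the directory
    of the markdown file that contains it (paths relative to run root)."""
    base_dir = posixpath.dirname(md_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


def resolve_json_path(path_value: str) -> str:
    """Run-root-relative: a path field inside a .json file is already
    relative to the run root, so it is only normalized."""
    return posixpath.normpath(path_value)


def is_rejected_link(link: str) -> bool:
    """Assumed reading of the stated rule: absolute paths under
    /Users/, /home/, /private/ and file:// URLs are refused."""
    return link.startswith(("/Users/", "/home/", "/private/", "file://"))


# Correct file-relative link from output.md to the AutoDS metadata.
good = resolve_markdown_link(OUTPUT_MD, "../../autods-run/metadata.json")

# Run-root-relative form used by mistake in a markdown link.
bad = resolve_markdown_link(OUTPUT_MD, ".asta/autods-run/metadata.json")

print(good)  # .asta/autods-run/metadata.json
print(bad)   # .asta/tasks/bd-1/.asta/autods-run/metadata.json
```

The same string `.asta/autods-run/metadata.json` is correct as a `metadata_path` value in `output.json` and wrong as a markdown link target in `output.md`, which is the distinction the two bullets draw.
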
diff --git a/plugins/asta-preview/skills/research-step/SKILL.md b/plugins/asta-preview/skills/research-step/SKILL.md
@@ -1,41 +1,42 @@
 ---
 name: research-step
-description: Plan and execute autonomous research as a graph of typed tasks tracked in beads. Use when working from a mission.md to drive multi-step research with explicit dependencies and structured outputs.
-allowed-tools: Bash(bd:*) Bash(date:*) Bash(scripts/*) Read(assets/**) Read(workflows/**) Read(scripts/**) Skill(asta:*) Skill(asta-preview:*) Skill(asta-plugins:*)
+description: Plan and execute autonomous research as a graph of typed tasks tracked in beads, driven by a YAML template (`hypothesis_driven_research` or `grounded_theory_generation`). Use when working from a `mission.md` to drive multi-step research with explicit dependencies and structured outputs.
+allowed-tools: Bash(bd:*) Bash(date:*) Bash(scripts/*) Bash(asta autodiscovery *) Bash(asta literature *) Bash(asta generate-theories *) Bash(jsonschema:*) Read(assets/**) Read(workflows/**) Read(scripts/**) Read(templates/**) Skill(asta:*) Skill(asta-preview:*) Skill(asta-plugins:*)
 ---
 
 # Research Step
 
-Models a research session as a beads epic. Each unit of work is a typed sub-issue whose `metadata.research_step.output` matches a JSON schema in `assets/schemas.yaml`.
+Models a research session as a beads epic. Each unit of work is a typed sub-issue whose structured output (`.asta/tasks/<bd-id>/output.json`) matches a JSON schema in `assets/schemas.yaml`.
 
 This skill is a **router**. Inspect the working directory and the user's request, pick one workflow, then read its `.md` file in `workflows/` and follow it. Do not execute a workflow from memory — always open the file first.
 
 ## Setup
 
-There are no hard preconditions. If `mission.md` does not exist, the **brainstorm** workflow will help the user draft one.
+There are no hard preconditions. If `mission.md` does not exist, the **brainstorm** workflow will help the user draft one and pick a template.
 
 Installing `bd` and `jq`, running `bd init`, and verifying `scripts/summary-check.sh` works are the responsibility of the **init** workflow. Once `init` has run, subsequent workflows assume the environment is ready.
 
 ## Files
 
 | Path | Role |
 |---|---|
-| `mission.md` | Input. The research task. |
+| `mission.md` | Input. The research task. May carry a `template:` hint in frontmatter (chosen by brainstorm). |
 | `.beads/` | Source of truth for state. |
 | `summary.md` | Derived view of the session, regenerated by **update-summary**. Beads is the source of truth; this file is just a digest for humans and for **brainstorm**. Frontmatter `beads_snapshot` records the state it was rendered from. |
-| `background_knowledge.txt` | Optional. Long-form context referenced from issue metadata via `summary_path`. |
+| `templates/` | YAML plan templates (`hypothesis_driven_research.yaml`, `grounded_theory_generation.yaml`, plus optional `strategies/`). The epic's `metadata.research_step.template` field names which one drives the session. |
+| `.asta/tasks/<bd-id>/` | Per-task working directory. Holds `input.md`, `input.json`, `output.md`, `output.json`, and any task-type-specific sidecar files (e.g., `extraction_schema.json`, `theories.json`). |
 
 ## Workflows
 
 | Name | Purpose | Detailed instructions |
 |---|---|---|
-| **brainstorm** | Default. Conversational exploration of current state; drafts/refines `mission.md`; hands off to other workflows when the user is ready to act. | `workflows/brainstorm.md` |
-| **init** | Set up the environment: install `bd`/`jq`, run `bd init`, verify `scripts/summary-check.sh`. Hands off to **plan**. | `workflows/init.md` |
-| **plan** | Create or extend the graph. Bootstraps the epic + initial frontier from `mission.md`, or replans downstream tasks after a closed task. | `workflows/plan.md` |
-| **execute** | Run one ready task end-to-end. Hands off to **plan** when the closed task type unlocks new structure; otherwise to **update-summary**. | `workflows/execute.md` |
+| **brainstorm** | Default. Conversational exploration of current state; drafts/refines `mission.md`; selects a template; hands off to other workflows when the user is ready to act. | `workflows/brainstorm.md` |
+| **init** | Set up the environment: install `bd`/`jq`, run `bd init`, create `.asta/` skeleton, verify `scripts/summary-check.sh`. Hands off to **plan**. | `workflows/init.md` |
+| **plan** | Create or extend the graph. Reads the epic's chosen template and walks its YAML to bootstrap or replan downstream tasks. The agent is the walker — no separate walker script. | `workflows/plan.md` |
+| **execute** | Run one ready task end-to-end. Renders `input.md` / `input.json` from the issue's metadata + upstream task outputs, invokes the agent, validates the result, and closes the issue. Hands off to **plan** when the closed task type unlocks new structure; otherwise to **update-summary**. | `workflows/execute.md` |
 | **update-summary** | Regenerate `summary.md` from beads. Idempotent — no-op when `scripts/summary-check.sh` reports `status: fresh`. | `workflows/update-summary.md` |
 
-Task-type schemas live in `assets/schemas.yaml`.
+Output schemas live in `assets/schemas.yaml`. Output schemas for theorizer-derived task types are referenced via `bash_ref:` (the validator runs the upstream CLI at validate time to fetch the canonical shape; see §6.3 of `spec.md`).
 
 ## Routing
 
@@ -45,14 +46,31 @@ If the user named a workflow ("init the research", "refresh the summary", "run t
 
 ### 2. Otherwise → brainstorm
 
-If the user did not name a workflow, run **brainstorm**. It inspects the working directory, answers the user's question, drafts or refines `mission.md` when appropriate, and hands off to `init` / `plan` / `execute` / `update-summary` once the user is ready to act.
+If the user did not name a workflow, run **brainstorm**. It inspects the working directory, answers the user's question, drafts or refines `mission.md` when appropriate, **selects a plan template** when one isn't yet chosen, and hands off to `init` / `plan` / `execute` / `update-summary` once the user is ready to act.
 
 ### 3. Chaining
 
 - **init** → always run **plan** afterwards (which then chains to **update-summary**).
 - **plan** → always run **update-summary** afterwards so the digest reflects the new graph.
-- **execute** → if the closed task type is `literature_review`, `hypothesis`, `analysis`, or `synthesis`, chain to **plan** (which chains to **update-summary**); otherwise chain directly to **update-summary**.
+- **execute** → if the closed task type is `literature_review`, `hypothesis`, `analysis`, `synthesis`, `auto_discovery`, `extraction_schema_design`, `theorizer_extraction`, `theory_generation`, `grounded_theory_generation`, or `novelty_assessment`, chain to **plan** (which chains to **update-summary**); for `scope`, `definitions`, `experiment_design`, or `evidence_gathering`, chain directly to **update-summary**.
 - **update-summary** and **brainstorm** → never chain.
+- **brainstorm** performs template selection via `AskUserQuestion` once a `mission.md` exists but no epic does; the chosen template name is stored on the epic at bootstrap time and read by `plan.md` on every invocation.
+
+## Conventions for `input.md` / `output.md` files
+
+Both files live in each task's working dir at `.asta/tasks/<bd-id>/`. They are the human-readable surface of the task; the structured surface lives in `input.json` / `output.json`. Five conventions apply to whoever writes them:
+
+- **`input.md` is a short brief** (~2-4 sentences). Plan writes it at task-create time, once the upstream `output.json` files are on disk and the task's own `input_instructions` have been interpolated. It says what this task is about and where its inputs come from. Not the full prompt — the full prompt lives in `metadata.research_step.input_instructions` and is what the executing agent works against.
+- **`output.md` is the narrative artifact** for the task. Written by the executing agent during the work step. Length depends on the task.
+- **Data-file references in markdown links are file-relative.** Any markdown link to a local data file (CSV, JSON, log, figure, notebook) in `input.md` or `output.md` must be relative to the file containing the link — the same convention standard markdown viewers and the asta-flows web UI use. From a task's `output.md` at `.asta/tasks/<bd-id>/output.md`, link the AutoDS metadata as `[label](../../autods-run/metadata.json)`, not `[label](.asta/autods-run/metadata.json)` (the second form resolves to `.asta/tasks/<bd-id>/.asta/autods-run/...` and breaks). Absolute paths under `/Users/`, `/home/`, or `/private/`, and `file://` URLs, are rejected by `scripts/validate-output.sh` (the asta-flows `/api/artifact` endpoint refuses anything outside the run dir). If an upstream tool emits a file outside the run dir, the importing task is responsible for copying or symlinking it under `.asta/` before referencing it.
+- **Path field values inside `.json` files are run-root-relative.** Fields like `metadata_path`, `nodes_path`, `log_path`, `schema_path`, `extraction_results_path`, etc. inside `output.json` (and any sidecar) take paths relative to the run root (e.g. `.asta/autods-run/metadata.json`). These are machine-read; the consumer joins them against the run root.
+- **Citation strategy (applies to both `input.md` and `output.md`).** Back up every non-obvious statement with a hyperlink to the file that grounds it. Two kinds:
+  - **Task-internal citations** — references to another task in this epic. Form: `` [`bd-id`](.asta/tasks/<bd-id>/output.md) ``. Prefer the task's `output.md`. Use a sidecar (`theories.json`, `novelty_results.json`, `extraction_schema.json`, `extraction_results.json`) only when the claim lives only in structured form.
+  - **Literature citations** — references to a published paper. Form: `[Author Year](url)` for single-author cites (`[Bahr 1997](https://…)`), `[Author1, Author2 & Author3 Year](url)` for two-or-three-author cites (`[Bahr, Pfeffer & Kaser 2015](https://…)`), and `[Author1, Author2 et al. Year](url)` for four-or-more-author cites (`[Christian, Whorton et al. 2022](https://…)`). The `url` comes from the citing task's `output.json` `citations[].url`; the literature_review task that surfaced the paper is the canonical source for that URL. If a local copy of the paper exists (e.g. under `.asta/literature/`), link to that path instead; otherwise use the DOI, arXiv URL, or Semantic Scholar URL recorded in the citation.
+
+  What counts as "non-obvious": any quantitative claim, comparison to a published result, methodological choice, or domain assertion that isn't shared common knowledge in the field. Framing sentences ("This section is about X") don't need citations; specific claims about how things work or what the data shows do.
+
+The citation convention is enforced only by example; `scripts/validate-output.sh` does not check it. The script **does** enforce the rule above against absolute paths and `file://` URLs in markdown links.
 
 ## Boundaries