44 changes: 31 additions & 13 deletions plugins/asta-preview/skills/research-step/SKILL.md
@@ -1,41 +1,42 @@
---
name: research-step
description: Plan and execute autonomous research as a graph of typed tasks tracked in beads. Use when working from a mission.md to drive multi-step research with explicit dependencies and structured outputs.
allowed-tools: Bash(bd:*) Bash(date:*) Bash(scripts/*) Read(assets/**) Read(workflows/**) Read(scripts/**) Skill(asta:*) Skill(asta-preview:*) Skill(asta-plugins:*)
description: Plan and execute autonomous research as a graph of typed tasks tracked in beads, driven by a YAML template (`hypothesis_driven_research` or `grounded_theory_generation`). Use when working from a `mission.md` to drive multi-step research with explicit dependencies and structured outputs.
allowed-tools: Bash(bd:*) Bash(date:*) Bash(scripts/*) Bash(asta autodiscovery *) Bash(asta literature *) Bash(asta generate-theories *) Bash(jsonschema:*) Read(assets/**) Read(workflows/**) Read(scripts/**) Read(templates/**) Skill(asta:*) Skill(asta-preview:*) Skill(asta-plugins:*)
---

# Research Step

Models a research session as a beads epic. Each unit of work is a typed sub-issue whose `metadata.research_step.output` matches a JSON schema in `assets/schemas.yaml`.
Models a research session as a beads epic. Each unit of work is a typed sub-issue whose structured output (`.asta/tasks/<bd-id>/output.json`) matches a JSON schema in `assets/schemas.yaml`.

This skill is a **router**. Inspect the working directory and the user's request, pick one workflow, then read its `.md` file in `workflows/` and follow it. Do not execute a workflow from memory — always open the file first.

## Setup

There are no hard preconditions. If `mission.md` does not exist, the **brainstorm** workflow will help the user draft one.
There are no hard preconditions. If `mission.md` does not exist, the **brainstorm** workflow will help the user draft one and pick a template.

Installing `bd` and `jq`, running `bd init`, and verifying `scripts/summary-check.sh` works are the responsibility of the **init** workflow. Once `init` has run, subsequent workflows assume the environment is ready.

## Files

| Path | Role |
|---|---|
| `mission.md` | Input. The research task. |
| `mission.md` | Input. The research task. May carry a `template:` hint in frontmatter (chosen by brainstorm). |
| `.beads/` | Source of truth for state. |
| `summary.md` | Derived view of the session, regenerated by **update-summary**. Beads is the source of truth; this file is just a digest for humans and for **brainstorm**. Frontmatter `beads_snapshot` records the state it was rendered from. |
| `background_knowledge.txt` | Optional. Long-form context referenced from issue metadata via `summary_path`. |
| `templates/` | YAML plan templates (`hypothesis_driven_research.yaml`, `grounded_theory_generation.yaml`, plus optional `strategies/`). The epic's `metadata.research_step.template` field names which one drives the session. |
| `.asta/tasks/<bd-id>/` | Per-task working directory. Holds `input.md`, `input.json`, `output.md`, `output.json`, and any task-type-specific sidecar files (e.g., `extraction_schema.json`, `theories.json`). |

## Workflows

| Name | Purpose | Detailed instructions |
|---|---|---|
| **brainstorm** | Default. Conversational exploration of current state; drafts/refines `mission.md`; hands off to other workflows when the user is ready to act. | `workflows/brainstorm.md` |
| **init** | Set up the environment: install `bd`/`jq`, run `bd init`, verify `scripts/summary-check.sh`. Hands off to **plan**. | `workflows/init.md` |
| **plan** | Create or extend the graph. Bootstraps the epic + initial frontier from `mission.md`, or replans downstream tasks after a closed task. | `workflows/plan.md` |
| **execute** | Run one ready task end-to-end. Hands off to **plan** when the closed task type unlocks new structure; otherwise to **update-summary**. | `workflows/execute.md` |
| **brainstorm** | Default. Conversational exploration of current state; drafts/refines `mission.md`; selects a template; hands off to other workflows when the user is ready to act. | `workflows/brainstorm.md` |
| **init** | Set up the environment: install `bd`/`jq`, run `bd init`, create `.asta/` skeleton, verify `scripts/summary-check.sh`. Hands off to **plan**. | `workflows/init.md` |
| **plan** | Create or extend the graph. Reads the epic's chosen template and walks its YAML to bootstrap or replan downstream tasks. The agent is the walker — no separate walker script. | `workflows/plan.md` |
| **execute** | Run one ready task end-to-end. Renders `input.md` / `input.json` from the issue's metadata + upstream task outputs, invokes the agent, validates the result, and closes the issue. Hands off to **plan** when the closed task type unlocks new structure; otherwise to **update-summary**. | `workflows/execute.md` |
| **update-summary** | Regenerate `summary.md` from beads. Idempotent — no-op when `scripts/summary-check.sh` reports `status: fresh`. | `workflows/update-summary.md` |

Task-type schemas live in `assets/schemas.yaml`.
Output schemas live in `assets/schemas.yaml`. Output schemas for theorizer-derived task types are referenced via `bash_ref:` (the validator runs the upstream CLI at validate time to fetch the canonical shape; see §6.3 of `spec.md`).

## Routing

@@ -45,14 +46,31 @@ If the user named a workflow ("init the research", "refresh the summary", "run t

### 2. Otherwise → brainstorm

If the user did not name a workflow, run **brainstorm**. It inspects the working directory, answers the user's question, drafts or refines `mission.md` when appropriate, and hands off to `init` / `plan` / `execute` / `update-summary` once the user is ready to act.
If the user did not name a workflow, run **brainstorm**. It inspects the working directory, answers the user's question, drafts or refines `mission.md` when appropriate, **selects a plan template** when one isn't yet chosen, and hands off to `init` / `plan` / `execute` / `update-summary` once the user is ready to act.

### 3. Chaining

- **init** → always run **plan** afterwards (which then chains to **update-summary**).
- **plan** → always run **update-summary** afterwards so the digest reflects the new graph.
- **execute** → if the closed task type is `literature_review`, `hypothesis`, `analysis`, or `synthesis`, chain to **plan** (which chains to **update-summary**); otherwise chain directly to **update-summary**.
- **execute** → if the closed task type is `literature_review`, `hypothesis`, `analysis`, `synthesis`, `auto_discovery`, `extraction_schema_design`, `theorizer_extraction`, `theory_generation`, `grounded_theory_generation`, or `novelty_assessment`, chain to **plan** (which chains to **update-summary**); for `scope`, `definitions`, `experiment_design`, or `evidence_gathering`, chain directly to **update-summary**.
- **update-summary** and **brainstorm** → never chain.
- **brainstorm** performs template selection via `AskUserQuestion` once a `mission.md` exists but no epic does; the chosen template name is stored on the epic at bootstrap time and read by `plan.md` on every invocation.
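The chaining rules above can be read as a small lookup. The following is a minimal sketch, not part of the skill: `next_workflow` and `chain` are hypothetical helper names, and routing any task type outside the replan list straight to **update-summary** is an assumption (the text only names `scope`, `definitions`, `experiment_design`, and `evidence_gathering` for that branch).

```python
# Task types after which execute chains to plan (copied from the list above).
REPLAN_TYPES = frozenset({
    "literature_review", "hypothesis", "analysis", "synthesis",
    "auto_discovery", "extraction_schema_design", "theorizer_extraction",
    "theory_generation", "grounded_theory_generation", "novelty_assessment",
})


def next_workflow(current, closed_task_type=None):
    """Return the workflow to chain to after `current`, or None to stop.

    Hypothetical helper for illustration; the skill itself has no such function.
    """
    if current == "init":
        return "plan"
    if current == "plan":
        return "update-summary"
    if current == "execute":
        # Assumption: anything not in REPLAN_TYPES goes directly to update-summary.
        return "plan" if closed_task_type in REPLAN_TYPES else "update-summary"
    # update-summary and brainstorm never chain.
    return None


def chain(start, closed_task_type=None):
    """Expand the full sequence of workflows that runs, beginning with `start`."""
    steps = [start]
    nxt = next_workflow(start, closed_task_type)
    while nxt is not None:
        steps.append(nxt)
        nxt = next_workflow(nxt)
    return steps
```

For example, `chain("execute", "hypothesis")` walks through **plan** before **update-summary**, while `chain("execute", "scope")` goes to **update-summary** directly.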

## Conventions for `input.md` / `output.md` files

Both files live in each task's working dir at `.asta/tasks/<bd-id>/`. They are the human-readable surface of the task; the structured surface lives in `input.json` / `output.json`. The following conventions apply to whoever writes them:

- **`input.md` is a short brief** (~2-4 sentences). Plan writes it at task-create time, once the upstream `output.json` files are on disk and the task's own `input_instructions` have been interpolated. It says what this task is about and where its inputs come from. Not the full prompt — the full prompt lives in `metadata.research_step.input_instructions` and is what the executing agent works against.
- **`output.md` is the narrative artifact** for the task. Written by the executing agent during the work step. Length depends on the task.
- **Data-file references in markdown links are file-relative.** Any markdown link to a local data file (CSV, JSON, log, figure, notebook) in `input.md` or `output.md` must be relative to the file containing the link — the same convention standard markdown viewers and the asta-flows web UI use. From a task's `output.md` at `.asta/tasks/<bd-id>/output.md`, link the AutoDS metadata as `[label](../../autods-run/metadata.json)`, not `[label](.asta/autods-run/metadata.json)` (the second form resolves to `.asta/tasks/<bd-id>/.asta/autods-run/...` and breaks). Absolute paths under `/Users/`, `/home/`, or `/private/`, and `file://` URLs, are rejected by `scripts/validate-output.sh` (the asta-flows `/api/artifact` endpoint refuses anything outside the run dir). If an upstream tool emits a file outside the run dir, the importing task is responsible for copying or symlinking it under `.asta/` before referencing it.
- **Path field values inside `.json` files are run-root-relative.** Fields like `metadata_path`, `nodes_path`, `log_path`, `schema_path`, `extraction_results_path`, etc. inside `output.json` (and any sidecar) take paths relative to the run root (e.g. `.asta/autods-run/metadata.json`). These are machine-read; the consumer joins them against the run root.
- **Citation strategy (applies to both `input.md` and `output.md`).** Back up every non-obvious statement with a hyperlink to the file that grounds it. Two kinds:
  - **Task-internal citations**: references to another task in this epic. Form: `` [`bd-id`](.asta/tasks/<bd-id>/output.md) ``. Prefer the task's `output.md`. Use a sidecar (`theories.json`, `novelty_results.json`, `extraction_schema.json`, `extraction_results.json`) only when the claim lives only in structured form.
  - **Literature citations**: references to a published paper. Form: `[Author Year](url)` for single-author cites (`[Bahr 1997](https://…)`), `[Author1, Author2 & Author3 Year](url)` for two-or-three-author cites (`[Bahr, Pfeffer & Kaser 2015](https://…)`), and `[Author1, Author2 et al. Year](url)` for four-or-more (`[Christian, Whorton et al. 2022](https://…)`). The `url` comes from the citing task's `output.json` `citations[].url`; the literature_review task that surfaced the paper is the canonical source for that URL. If a local copy of the paper exists (e.g. under `.asta/literature/`), link to that path instead; otherwise use the DOI, arXiv URL, or Semantic Scholar URL recorded in the citation.

What counts as "non-obvious": any quantitative claim, comparison to a published result, methodological choice, or domain assertion that isn't shared common knowledge in the field. Framing sentences ("This section is about X") don't need citations; specific claims about how things work or what the data shows do.
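The literature citation forms described above depend only on the number of authors, so they can be sketched as a small formatter. This is an illustrative sketch: `citation_label` and `citation_link` are hypothetical names, and rendering a two-author cite as `Author1 & Author2` is an assumption, since the text only shows a three-author example for that bracket.

```python
def citation_label(authors, year):
    """Build the link text for a literature citation from author surnames."""
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        names = authors[0]
    elif len(authors) <= 3:
        # Two or three authors: comma-separate all but the last, then "&".
        # The two-author rendering ("A & B") is an assumption.
        names = ", ".join(authors[:-1]) + " & " + authors[-1]
    else:
        # Four or more: first two surnames, then "et al."
        names = f"{authors[0]}, {authors[1]} et al."
    return f"{names} {year}"


def citation_link(authors, year, url):
    """Wrap the label in a markdown link to the recorded URL or local copy."""
    return f"[{citation_label(authors, year)}]({url})"
```

The `url` argument would be whatever the citing task recorded in `citations[].url`, or a local path when a copy of the paper exists.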

The citation convention itself is enforced by example, not by `validate-output.sh`; the run-relative-path rule above **is** enforced — see `scripts/validate-output.sh`.
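The two path conventions above (file-relative links in markdown, run-root-relative values in JSON) differ only in their base directory. The sketch below shows one way to convert between them and to pre-check a link against the rejection rules stated in the text. It is an illustration under assumptions: `file_relative_link` and `is_rejected` are hypothetical helpers, `bd-12` is a made-up task id, and none of this reproduces the actual logic of `scripts/validate-output.sh`.

```python
import posixpath
from urllib.parse import urlparse

# Absolute-path prefixes the text lists as rejected.
REJECTED_PREFIXES = ("/Users/", "/home/", "/private/")


def file_relative_link(target, md_file):
    """Convert a run-root-relative target into a link relative to md_file.

    Both arguments are POSIX-style paths relative to the run root.
    """
    return posixpath.relpath(target, start=posixpath.dirname(md_file))


def is_rejected(link):
    """Return True for link targets the text says are rejected."""
    if urlparse(link).scheme == "file":
        return True
    return link.startswith(REJECTED_PREFIXES)


# A run-root-relative value, as it would appear in output.json.
metadata_path = ".asta/autods-run/metadata.json"
# The same file, linked from a (hypothetical) task's output.md.
link = file_relative_link(metadata_path, ".asta/tasks/bd-12/output.md")
```

Here `link` comes out as `../../autods-run/metadata.json`, matching the form the convention asks for, and `is_rejected(link)` is false.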

## Boundaries
