Asta flows integration into research step #68
| rationale: string | ||
|
|
||
| literature_review: | ||
| inputs: [scope, definitions] |
There was a problem hiding this comment.
Strict inputs don't really make sense in a world where multiple templates exist for research step. e.g. synthesis can happen after literature review or analysis steps. Update schemas to just document output shapes, which can then be used by LLMs to glue into downstream tasks.
| # ({research_step: {task_type, inputs, output_schema_version, output}}) | ||
| # 3. has every required `output.<key>` for the given <task_type> per | ||
| # assets/schemas.yaml (schema_version: 1) | ||
| # If [task-dir] (e.g. .asta/tasks/<id>) is given, also runs document-quality |
| # 5 — task_type mismatch with envelope | ||
| # 6 — required output.md missing (only when [task-dir] supplied) | ||
| # 7 — output.md empty or a stub (only when [task-dir] supplied) | ||
| # 8 — output.md has no markdown links (only when [task-dir] supplied) |
There was a problem hiding this comment.
I explored a lot of different approaches to getting the output markdown to be human-understandable and to contain rich citations. Ultimately, prompting alone was insufficient and often ignored by the LLM. The validate-output "linting" approach seems to do the best job of steering the agent towards the quality we want in the outputs.
There was a problem hiding this comment.
That's pretty interesting. Effectively, the validation shell script is the documentation for the output format.
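To make the "script as documentation" idea concrete, here is a minimal sketch of what the three `output.md` checks might look like. It is not the PR's `validate-output.sh`: the function name `lint_output_md` and the 200-character stub threshold are assumptions, and only the return codes 6, 7, and 8 come from the script header quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the output.md lint checks; not the PR's validate-output.sh.
# Return codes mirror the codes listed in the script header (6, 7, 8).

lint_output_md() {
  local task_dir="$1"
  local md="$task_dir/output.md"

  # 6: required output.md missing
  [ -f "$md" ] || { echo "output.md missing" >&2; return 6; }

  # 7: empty or a stub (the 200-character floor is an assumed threshold)
  local chars
  chars=$(( $(tr -d '[:space:]' < "$md" | wc -c) ))
  [ "$chars" -ge 200 ] || { echo "output.md is empty or a stub" >&2; return 7; }

  # 8: no markdown links of the form [text](target)
  grep -Eq '\[[^]]+\]\([^)]+\)' "$md" || { echo "output.md has no markdown links" >&2; return 8; }

  return 0
}
```

Each failure prints a one-line reason to stderr, so the message the agent sees doubles as the instruction for fixing the output.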
| exit 8 | ||
| fi | ||
| # Strip links, then flag any named entity still bare in output.md / report.tex. | ||
| unlinked=$(for f in "$md" "$task_dir/artifacts/report.tex" "$task_dir/report.tex"; do |
There was a problem hiding this comment.
LLMs love to just refer to entities (e.g. files, literature review results) and not actually link to them. This guard makes sure that known entities are actually hyperlinked, which really helps a user navigate the output.
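The "strip links, then flag what is still bare" idea can be sketched roughly as follows. This is an illustration only: the function name `find_unlinked` and the one-entity-per-line input file are assumptions, and the actual guard in the PR may collect entities differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: strip markdown links, then report entities still mentioned bare.
# The entity list is assumed to be a file with one name per line.

find_unlinked() {
  local md="$1" entities="$2"
  local stripped entity
  # Remove every [text](target) span so that linked mentions disappear.
  stripped=$(sed -E 's/\[[^]]*\]\([^)]*\)//g' "$md")
  while IFS= read -r entity; do
    [ -n "$entity" ] || continue
    # -F matches the entity literally rather than as a regular expression.
    if grep -qF -- "$entity" <<< "$stripped"; then
      printf '%s\n' "$entity"
    fi
  done < "$entities"
}
```

Any name printed by the function is an entity that appears in the text outside a link, which is what the guard would report before failing.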
| exit 9 | ||
| fi | ||
|
|
||
| # The report's basics. Only the report node makes report.tex; when it exists, |
There was a problem hiding this comment.
I'm not sure if putting step-specific validations in the validate-output.sh script makes sense. It's confusing to the LLM to provide step-specific scripts to run, and the likelihood of it deciding not to run them is increased. Some sort of extension system where the template provides step-specific linters might be a good solution.
There was a problem hiding this comment.
I would have recommended making a different validation script for each output type. But I think you're saying that you tried this and it didn't work? Still, I think it's the only solution that scales. It seems straightforward to associate the script with the task type somewhere, like in the schemas.yaml or the template file
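One possible shape for associating a linter with a task type, sketched under assumptions: the `linters/<task_type>.sh` layout and the function name `run_task_linter` are hypothetical and not part of this PR. The point is that a single entry point stays the only script the agent runs, while step-specific checks live in separate files.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: dispatch to a per-task-type linter when one exists.
# The <linter_dir>/<task_type>.sh layout is an assumption, not something in this PR.

run_task_linter() {
  local linter_dir="$1" task_type="$2" task_dir="$3"
  local linter="$linter_dir/$task_type.sh"
  if [ -f "$linter" ]; then
    # Run the step-specific linter and propagate its exit status.
    bash "$linter" "$task_dir"
  else
    # No step-specific checks registered for this task type.
    return 0
  fi
}
```

The lookup could equally be driven by a field in schemas.yaml or the template file; a directory convention is just the simplest variant to show.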
| @@ -0,0 +1,78 @@ | |||
| # Example theorizer mission statement | |||
There was a problem hiding this comment.
I have found providing examples of inputs to agents really helpful in getting consistency across workflows.
| @@ -0,0 +1,118 @@ | |||
| --- | |||
| name: data_driven_theory_generation | |||
There was a problem hiding this comment.
This is the core template for the auto-ds => theorizer workflow. I explored expressing these workflows using a strongly-typed pydantic workflow engine. I had good results, and I think a more strictly typed system might be the best long-term approach. However, I ended up creating a pretty elaborate DSL, and the engine code itself was over 1000 lines.
Markdown templates are remarkably effective and seem like a good approach for now until we have a good sense of the patterns and number of templates we'd like to support.
There was a problem hiding this comment.
Those task types are pretty elaborate! Very cool, actually. I like the pattern of identifying the upstream inputs by just the type names.
I'm not clear on the node id vs the type. Why have both?
Many of these node types are not listed in schemas.yaml. Should they be?
I'm a little unclear about the role of schemas.yaml, actually. I think it's really useful to have an explicit description of the output format of each task, but it also looks like the validation script is serving this purpose?
Off the top of my head, what makes sense to me is for schemas.yaml to be the source of truth for the set of all possible nodes and what they produce. It certainly seems cleaner for the agent to consult schemas.yaml instead of the validation script. The template files can describe ways in which the nodes can be chained together to accomplish the research goal. That means it's unclear where to document the process for executing each of the task types. Many of them should be shared, I think. I'm shifting my view on whether a template should customize the instructions for executing a node. I think it's cleaner to just introduce a new task type if there's a variant (or maybe parameterize the task), instead of overriding the implementation in the template. Maybe task types that are unique to a template can be described local to it, and we have some way of promoting them to a shared space.
| |---|---| | ||
| | `literature_review`, `hypothesis`, `analysis`, `synthesis` | **plan** (with this issue as the source). `plan` then chains to **update-summary**. Note: `hypothesis` only reaches this branch in the rare case it was left open at creation; the normal path is plan→auto-resolve. | | ||
| | `scope`, `definitions`, `experiment_design`, `evidence_gathering` | **update-summary** directly. | | ||
| 5. **Do the work.** Produce all three task outputs under `.asta/tasks/<id>/` — see the skill's "Task outputs" table for their roles. **All three are mandatory:** `output.json` (matches the schema), `output.md` (the readable result, with links per the template's writing rules), and `artifacts/` (every other file produced). For schema fields ending in `_path`, write the file first and put the relative path in the JSON. |
There was a problem hiding this comment.
This is the core update to support templates. TLDR is instead of hardcoding the steps, point the agent to the template.
I also added instructions to persist the output of each step in a specific task directory, which enables us to build a really useful flow visualization on top of the workflow.
As a human, persisting the outputs into folders which I can navigate myself is invaluable.
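As a rough illustration of the three mandatory outputs and the `_path` rule (write the file first, then put the relative path in the JSON), a sketch might look like this. The function name, the `extractions_path` field, and the file contents are made up for the example, and the JSON is simplified rather than the full envelope.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the three task outputs; field names and contents are made up.

write_task_outputs() {
  local task_dir="$1"
  mkdir -p "$task_dir/artifacts"
  # Write the artifact first...
  printf 'paper_id,claim\n' > "$task_dir/artifacts/extractions.csv"
  # ...then reference it by a path relative to the task directory.
  printf '{"extractions_path": "artifacts/extractions.csv"}\n' > "$task_dir/output.json"
  # The readable result links to the same artifact.
  printf 'Extractions are in [extractions.csv](artifacts/extractions.csv).\n' > "$task_dir/output.md"
}
```

Because the path is relative to the task directory, the folder can be moved or published as a unit without breaking the link.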
There was a problem hiding this comment.
Love the workflow visualization. Love the accompanying output.md.
I notice you're not consulting the template for how to execute a task, just for preferences on output.md. That seems like a good trade-off. I'm unclear on how the agent knows which template is being used across sessions. Is that somewhere in the beads DB?
…hypothesis-driven flow schemas.yaml v2: tasks are pure output contracts (key -> type maps), one outcome verdict vocabulary, immutable adjudication records, A2A 1.0 artifact/part types, config block, and the hypothesis_driven_research flow. Ships the compiled assets (per-task JSON Schemas, flows.json, flow diagrams); validate-output.sh deep-validates against them. New next-task.sh (single ordering definition) and task-output-keys.sh (single schema reader); bd list --limit 0 throughout; close-task.sh never closes the epic root. Workflows updated to match; execute.md adds report conventions.
assets/compiled/ is generated from schemas.yaml by the schema compiler at build time; keep the source of truth only.
94fe2c4 to
efd94ea
…ow diagram, and compiler for review
efd94ea to
e596be5
| flows { | ||
| tool_name = "asta-flows" | ||
| install_type = "local" | ||
| install_source = "~/workspace/asta-flows" |
There was a problem hiding this comment.
You want to install from git, presumably. For local testing, you can set ASTA_CONFIG_FILE to a file that points to your local directory
|
Things that are great:
More observations:
The opening paragraph
In
Actually, "task" is confusing to me, because it just looks like a compound data type. It seems like each "task" could equally well be defined as a "type". The main purpose seems to be to attach a list of
Are flow step inputs earlier flow steps, or data types? I see that most flow step names have corresponding task names, but some don't (reproduction vs reproduction_synthesis). Maybe this is just an oversight, but if so it should be caught by
I don't see any description of what gets written into
I don't see where
I notice you abandoned beads' native way of tracking task dependencies in favor of an id-based hierarchy. Curious what drove this decision, since I thought agents were able to use it pretty well. Using literal beads task IDs seems like it would make it hard to do dynamic replanning. Do you need to change a task ID to change its order in the flow?
The |
|
Notes from a test run: I gave it the mission of generating theories to explain an AD result from Ai1 behavioral experiments. Agent presented a few different possible flows and recommended a stripped-down theorizer. Very cool to be able to customize a theorizer workflow! Agent successfully built an extraction schema and constructed novelty and accuracy-focused theories from the literature. I noticed a lot of one-off code generation, to produce step outputs I guess? I was surprised that it was needed, as I would have expected the
From Claude's self-reflection:
Help text claims it just builds the schema. The CLI call ran schema build + 56 paper extractions + 8 theory formations + 18 novelty assessments — ~38 min, $39.85. Three downstream tasks now
theory_formation close failed on the first try because 8 theories with full content trees (statements, evidence bullets, predictions, unaccounted) overran the cap. The execute workflow says "keep it slim" but
After evidence_extraction (the only child of bje.2) closed, close-task.sh cascade-closed bje.2. I had to reopen it manually before laying theory_generation under it. Happened a second time when bje.2.2.1.1
The schema is shaped for a single paper (paper_id: string), but find-and-extract returns extractions from many. Coerced with paper_id: "multi" and per-paper provenance via citation_title in each row. Validator |

This adds a `flows` section to the research-step schemas and expands the task
taxonomy behind it. It replaces the markdown plan templates from earlier in
this branch.
Markdown vs YAML
I tried markdown-based workflow definitions first. Rewriting the same
workflows as YAML in `schemas.yaml` worked much better: the structure holds up
across runs, and the definition doubles as something we can validate against.
Scripts vs Prose
Same lesson with the task graph. Prose descriptions of how beads should be
created and closed produced a slightly different graph shape every run. Task
creation and resolution are now deterministic scripts (`create-task.sh`,
`close-task.sh`): hierarchical ids, metadata initialized from the schema,
outputs validated and published on close, parent groups closed automatically
when their last child finishes.
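For readers unfamiliar with the id scheme, here is a minimal sketch of how a hierarchical id can determine its parent group and when that group becomes closable. It is not the code in `close-task.sh`: the helper names and the "id status" input format are assumptions, and the real scripts read state from beads.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real close-task.sh may work differently.

# Print the parent of a dotted hierarchical id, e.g. bje.2.2.1 -> bje.2.2.
parent_id() {
  local id="$1"
  case "$id" in
    *.*) printf '%s\n' "${id%.*}" ;;  # drop the last dotted segment
    *)   return 1 ;;                   # root id: no parent
  esac
}

# Reads "id status" lines on stdin; succeeds only when the given parent has
# at least one direct child and every direct child is closed.
children_all_closed() {
  local parent="$1" id status seen=1
  while read -r id status; do
    [ "$(parent_id "$id")" = "$parent" ] || continue
    seen=0
    [ "$status" = "closed" ] || return 1
  done
  return "$seen"
}
```

With this rule a group that has a single child closes as soon as that child does, so later children have to be created before the last open one is closed.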
Testing
To shake this out I ran two complete workflows end to end (theory generation
grounded in an auto-ds run, then a follow-up discovery run) and published them
with the asta workspace skill — each run page is the report series plus a
browsable task graph:
https://animated-couscous-7pqjqog.pages.github.io/
Common vs Custom taxonomy
I started out trying to reuse the existing schemas and repurpose them for
these workflows, and it was really hard — the generic task types never quite
fit what a step actually produces. I've come around to the opinion that we
should lean toward rich, workflow-specific taxonomies instead of a small
common one. Expanding the taxonomy did two jobs at once: it surfaced gaps in
asta's capabilities (task types we want but have no skill for yet), and it
let the workflows get rich without the long markdown agents drift away from.
The process was repetition: run the workflow, ask the agent to reflect on
where the schema was limiting, expand the schema, run again — until the
reports consistently came out in good shape.