Hyper: light/medium tiers, Run: auto autonomous-run engine, and supporting fixes#20

Open
galatanovidiu wants to merge 7 commits into main from feat/hyper-adaptive-improvements

@galatanovidiu

Summary

This branch grew well past its original hyper-light scope. It now lands two new workflow tiers, a governed autonomous-run engine for hyper, and supporting build/probe fixes.

What's in here (by commit)

  • refactor(skills) — every shipped skill is self-contained; shared content (state probe, references, templates) is authored once in shared/ and vendored by scripts/sync-shared.mjs. --check guards drift in CI.
  • refactor(hyper) — reduce loop ceremony without losing approval safety.
  • fix(state-probe) — the probe wrongly exited non-zero on a healthy "all loops done" project (done loops weren't counted as successful parses). Now counts parsedCount; a genuinely unparseable folder still exits 2. Covered by evals/harness/state-probe.test.mjs.
  • feat(hyper-light) — lightest tier: small single-session work, no .hyper/ state, no parts, no approval gates.
  • feat(hyper-medium) — middle tier: persisted OODA loops, one alignment gate + one verify gate, single track of cycles, no parts/proxies/dispatch.
  • feat(hyper) — Run: auto autonomous-run engine. A new Run: manual | auto axis (independent of Mode). In auto, a separate cheap bar-check evaluator runs after each cycle and returns continue | done | course-correct | stop-for-user; the loop drives itself until the bar is met or a stop boundary fires. Requires a machine-checkable bar — the Phase 2 auto-run gate refuses auto without one. Stop-for-user breaks the loop in every mode; checkpoints surface (interactive) or are proxy-resolved (delegated). Contract in skills/hyper/reference/autonomous-run.md.
  • docs(hyper) — wire the above into README, AGENTS.md, maintaining-hyper.md, the data model (re-synced), and CHANGELOG.
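
For reviewers, the Run: auto cycle described in the feat(hyper) item above can be sketched roughly as follows. This is a minimal illustration, not the engine itself (the real contract lives in skills/hyper/reference/autonomous-run.md). The names `runAuto`, `doCycle`, `evaluate`, `replan`, and the `maxCycles` stop boundary are all hypothetical, and the real evaluator is a separate model call rather than a synchronous function.

```javascript
// Hypothetical sketch of a Run: auto driver loop. Names are illustrative only.
const VERDICTS = ["continue", "done", "course-correct", "stop-for-user"];

function runAuto({ doCycle, evaluate, replan, maxCycles = 10 }) {
  const history = [];
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    const evidence = doCycle(cycle);    // the doer performs one cycle
    const verdict = evaluate(evidence); // a separate evaluator grades it
    if (!VERDICTS.includes(verdict)) {
      throw new Error(`unknown verdict: ${verdict}`);
    }
    history.push({ cycle, verdict });
    if (verdict === "done") return { outcome: "done", history };
    if (verdict === "stop-for-user") return { outcome: "stopped", reason: "stop-for-user", history };
    if (verdict === "course-correct") replan(evidence);
    // "continue" simply falls through to the next cycle
  }
  // Assumed stop boundary: a cycle ceiling, so the loop can never run unbounded.
  return { outcome: "stopped", reason: "cycle-limit", history };
}

// Usage: three cycles, the last of which meets the bar.
const scripted = ["continue", "course-correct", "done"];
const result = runAuto({
  doCycle: (n) => ({ cycle: n }),
  evaluate: (evidence) => scripted[evidence.cycle - 1],
  replan: () => {},
});
console.log(result.outcome, result.history.length);
```

The point of the shape is that `doCycle` and `evaluate` are distinct injected functions, which mirrors the "maker is not the checker" rule: the code doing the work never decides whether the loop is finished.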

Testing

  • Auto-run engine: validated end-to-end with live models. Happy path — a real /hyper loop drove three cycles to a passing bar, verified, and closed with zero per-cycle prompts; the done verdict correctly beat a 3-cycle checkpoint. Safety path — the gate refused a vague bar, a stop-for-user trigger broke the loop on an unplanned dependency, and a checkpoint surfaced (interactive) vs proxy-resolved (delegated). Every verdict came from a separate Haiku evaluator.
  • State probe: evals/harness/state-probe.test.mjs passes (4/4); sync-shared.mjs --check is clean.

⚠️ Reviewer note

hyper-light and hyper-medium are not yet exercised in a real session — they are committed here for review but should be dry-run in a throwaway project before relying on them. The Run: auto engine and the probe fix are validated.

…core build

WHAT:
- Add shared/ as the single authoring source for cross-skill content (state probe, state-root, data-model, memory, gates, archive, intake-triage, bootstrap, templates) and scripts/sync-shared.mjs, which vendors byte-identical copies into each consuming skill and guards drift with --check.
- Fold the 10 internal phase skills into hyper-build as flat reference/phase-*.md; rewire hyper-build dispatch to read its own phase files instead of invoking sibling skills.
- Make every standalone skill self-contained: no SKILL.md or reference file references a sibling skill via ../ paths (grep is empty repo-wide).
- Localize the loop's docs and code-review capabilities into reference/docs.md and the shared reference/change-review.md (used by hyper's verify and hyper-build's verify phase).
- Remove hyper-short-story, hyper-digest, hyper-code-review, and the old test/validation scripts.
- Rewrite README, AGENTS.md (reverse the suite-internal-reference stance), CHANGELOG, docs/maintaining-hyper.md, and the data-model inventory to the self-contained, build-process model.
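
The drift guard behind --check can be pictured with the sketch below. It is not the code in scripts/sync-shared.mjs; it assumes file contents have already been read into memory, and `sameBytes`, `findDrift`, and the map shapes are hypothetical names chosen for illustration.

```javascript
// Hypothetical drift check over in-memory file contents (not the real sync-shared.mjs).
function sameBytes(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// source:   Map<relativePath, Uint8Array>  (files authored in shared/)
// vendored: Map<skillName, Map<relativePath, Uint8Array>>  (copies inside each skill)
function findDrift(source, vendored) {
  const drift = [];
  for (const [skill, files] of vendored) {
    for (const [path, bytes] of files) {
      const original = source.get(path);
      if (original === undefined) {
        drift.push({ skill, path, reason: "no-source" });
      } else if (!sameBytes(original, bytes)) {
        drift.push({ skill, path, reason: "differs" });
      }
    }
  }
  return drift;
}

// Usage: one skill holds a faithful copy, another has been edited by hand.
const enc = new TextEncoder();
const shared = new Map([["reference/gates.md", enc.encode("gates v1")]]);
const copies = new Map([
  ["hyper", new Map([["reference/gates.md", enc.encode("gates v1")]])],
  ["hyper-build", new Map([["reference/gates.md", enc.encode("gates v1 (edited)")]])],
]);
const driftReport = findDrift(shared, copies);
console.log(driftReport);
```

A CI step built on this idea would fail whenever the report is non-empty. The sketch only checks copies that exist; detecting a copy that is missing from a consuming skill would need a list of which skills consume which files, which is not shown here.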

WHY:
- skills.sh installs each skill independently and does not copy sibling files, so cross-skill ../ references broke single-skill installs. Authoring shared content once and vendoring it at build time makes every skill installable standalone while keeping one source of truth, and shrinks the public skill list to the workflows users actually invoke.

WHAT: Add a bounded Phase-2 alignment-probe lane (read-only free; code
spikes scratch-only with a hard promotion boundary; loops start single-part
and decompose after approval). Derive single-part part approval from the
loop-plan approval, expiring on all four invalidation paths (split, reframe,
loop-plan rework, close). Delete the Current focus section in favor of a
single canonical Next atomic move. Make implement-cycle sub-agent dispatch
opt-in with recommendation guards and an explicit writes boundary.
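
The derived part approval described above could be modeled along these lines. This is a sketch under assumptions: the loop record shape (`partCount`, `planApproved`, `eventsSinceApproval`) and the event strings are hypothetical, chosen only to mirror the four invalidation paths named in this commit.

```javascript
// Hypothetical model of derived single-part approval; field names are illustrative.
const INVALIDATING_EVENTS = new Set(["split", "reframe", "loop-plan-rework", "close"]);

function derivedPartApproval(loop) {
  // The derivation only applies while the loop has exactly one part.
  if (loop.partCount !== 1) return false;
  // No loop-plan approval means there is nothing to derive from.
  if (!loop.planApproved) return false;
  // Any of the four invalidation paths expires the derived approval.
  return !loop.eventsSinceApproval.some((event) => INVALIDATING_EVENTS.has(event));
}

// Usage: the same approved loop before and after a reframe.
const fresh = { partCount: 1, planApproved: true, eventsSinceApproval: [] };
const reframed = { partCount: 1, planApproved: true, eventsSinceApproval: ["reframe"] };
console.log(derivedPartApproval(fresh), derivedPartApproval(reframed));
```

Because the result is computed from recorded state rather than stored as a second approval, there is no separate flag that can go stale, which is one way to keep the "no unapproved production change" property machine-checkable.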

WHY: The skill paid fixed ceremony regardless of task size and structurally
blocked its own "probe before committing" use case. These changes cut the
redundant second approval and duplicated next-move/context state while keeping
the safety property (no unapproved production change) machine-checkable.

The probe exited 2 ("every candidate task/loop folder failed to parse")
for a healthy project whose only loops were status: done and that had no
active tasks. The exit-2 accounting counted only active loops as
successful parses (loopResult.active.length), so a parsed-but-done loop
looked like a parse failure. Done tasks avoided this because archived
records count them; done loops had no equivalent term.

collectLoopFolders now returns parsedCount (loops whose loop.md yielded
valid frontmatter, any status), and the exit-2 check uses it. A project
where all folders parsed but none are active now exits 0; a project whose
folders genuinely all fail to parse still exits 2.
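
A reduced model of the corrected accounting, for illustration only: it covers loops but not tasks, takes already-parsed folder records instead of reading loop.md files, and uses hypothetical names (`collectLoopsSketch`, `exitCodeSketch`). Treating the mixed case as exit 0 is an assumption of this sketch.

```javascript
// Hypothetical reduction of the probe's exit-code accounting (loops only).
// Each folder record is { name, frontmatter }, where frontmatter is null
// when loop.md could not be parsed.
function collectLoopsSketch(folders) {
  const active = [];
  let parsedCount = 0;
  for (const folder of folders) {
    if (folder.frontmatter === null) continue; // genuinely unparseable
    parsedCount++;                              // counted regardless of status
    if (folder.frontmatter.status !== "done") active.push(folder.name);
  }
  return { active, parsedCount };
}

function exitCodeSketch(folders) {
  const { parsedCount } = collectLoopsSketch(folders);
  // Exit 2 only when there were candidates and none of them parsed.
  return folders.length > 0 && parsedCount === 0 ? 2 : 0;
}

// Usage: the three situations the new test file is described as covering.
const doneOnly = [{ name: "loop-a", frontmatter: { status: "done" } }];
const allBroken = [{ name: "loop-b", frontmatter: null }];
const mixed = [...doneOnly, ...allBroken];
console.log(exitCodeSketch(doneOnly), exitCodeSketch(allBroken), exitCodeSketch(mixed));
```

The earlier behavior corresponds to testing `active.length` instead of `parsedCount` in `exitCodeSketch`, which is why a done-only project was misreported as a parse failure.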

Adds evals/harness/state-probe.test.mjs covering done-only-exits-0,
all-unparseable-exits-2, and the mixed case.

WHAT: New hyper-light skill. Align on the goal and its done-conditions,
work in small evidence-backed moves, checkpoint only when the route
drifts, close with an honest check. No .hyper/ persistence, no parts, no
approval gates; keeps the one rail of pausing before irreversible or
outward-facing actions.

WHY: Fill the gap below hyper for small, single-session work that needs
no persistence or gates, so the full loop machinery is not paid for tiny
tasks. Escalate to hyper when work must persist or split into parts.

WHAT: New hyper-medium skill. Persisted observe-orient-decide-act loops
with one alignment gate and one verify gate, a single track of cycles —
no parts, authority proxies, or sub-agent dispatch.

WHY: Bridge hyper-light and hyper for adaptive work that must survive
across sessions but does not need multi-part decomposition or delegated
authority.

…check

WHAT: New Run: manual | auto axis in hyper's ## Authority, independent of
Mode. Run: auto drives its own cycles: after each cycle a separate, cheap
bar-check evaluator (not the doer) returns continue | done |
course-correct | stop-for-user, and the loop continues until the bar is
met or a stop boundary fires. Requires a machine-checkable bar — every
Definition-of-done line carries a check: predicate — and the Phase 2
auto-run gate refuses auto without one, falling back to manual.
Stop-for-user breaks the loop in every mode; zoom-out checkpoints surface
to the user (interactive) or are proxy-resolved (delegated). Full
contract in reference/autonomous-run.md; SKILL.md gains the Run field,
the bar-check capability, the auto-run gate, and the Phase 3 engine;
templates/loop.md gains the Run field and the bar Check.
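
The Phase 2 auto-run gate can be approximated as below. This is a sketch, not the skill's actual logic: the gate is described in prose in SKILL.md, and the assumption that a predicate appears inline as `check: <something>` on each Definition-of-done line, along with the names `autoRunGate` and `hasCheckPredicate`, is hypothetical.

```javascript
// Hypothetical auto-run gate: refuse Run: auto unless every done-line has a check.
function hasCheckPredicate(line) {
  // Assumed convention: the line contains "check:" followed by some predicate text.
  return /\bcheck:\s*\S/.test(line);
}

function autoRunGate(requestedRun, doneLines) {
  if (requestedRun !== "auto") return { run: "manual", missing: [] };
  const missing = doneLines.filter((line) => !hasCheckPredicate(line));
  const barIsCheckable = doneLines.length > 0 && missing.length === 0;
  // A vague or empty bar falls back to manual rather than failing outright.
  return { run: barIsCheckable ? "auto" : "manual", missing };
}

// Usage: one checkable bar and one vague bar.
const checkable = [
  "Probe exits 0 on a done-only project (check: state-probe test passes)",
  "Vendored copies match shared/ (check: sync-shared.mjs --check is clean)",
];
const vague = ["Code is cleaner", "Docs updated (check: CHANGELOG has an entry)"];
console.log(autoRunGate("auto", checkable).run, autoRunGate("auto", vague).run);
```

Returning the `missing` lines alongside the decision gives the user something concrete to fix before asking for auto again.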

WHY: Let a loop run cycle-after-cycle without a per-cycle prompt while
keeping approval safety — the governed form of "stop prompting, write
loops". Maker is not the checker: the agent doing the work never grades
its own completion.

…obe fix

WHAT: Wire the new surfaces into the human-facing docs and the canonical
data model. README gains three-tier guidance, the manual-vs-auto runs
section, and the hyper-light command/skill entries. AGENTS.md and
maintaining-hyper.md move to the ten-skill inventory and list
autonomous-run.md in the loop-contract fragile surfaces. data-model.md
documents the Run axis and the machine-checkable bar, re-synced to the
hyper-build and hyper-task copies. CHANGELOG records hyper-light, the
Run: auto engine, and the state-probe done-loops fix.

WHY: Keep the docs, changelog, and data model in step with the shipped
hyper-light tier and Run: auto engine, and record the probe fix that
landed in 1859e97 without a changelog entry.