Hyper: light/medium tiers, Run: auto autonomous-run engine, and supporting fixes #20
Open
galatanovidiu wants to merge 7 commits into
…core build

WHAT:
- Add shared/ as the single authoring source for cross-skill content (state probe, state-root, data-model, memory, gates, archive, intake-triage, bootstrap, templates) and scripts/sync-shared.mjs, which vendors byte-identical copies into each consuming skill and guards drift with --check.
- Fold the 10 internal phase skills into hyper-build as flat reference/phase-*.md; rewire hyper-build dispatch to read its own phase files instead of invoking sibling skills.
- Make every standalone skill self-contained: no SKILL.md or reference file references a sibling skill via ../ paths (grep is empty repo-wide).
- Localize the loop's docs and code-review capabilities into reference/docs.md and the shared reference/change-review.md (used by hyper's verify and hyper-build's verify phase).
- Remove hyper-short-story, hyper-digest, hyper-code-review, and the old test/validation scripts.
- Rewrite README, AGENTS.md (reverse the suite-internal-reference stance), CHANGELOG, docs/maintaining-hyper.md, and the data-model inventory to the self-contained, build-process model.

WHY:
- skills.sh installs each skill independently and does not copy sibling files, so cross-skill ../ references broke single-skill installs. Authoring shared content once and vendoring it at build time makes every skill installable standalone while keeping one source of truth, and shrinks the public skill list to the workflows users actually invoke.
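The drift guard described above can be pictured with a minimal sketch. It is not the code in scripts/sync-shared.mjs: the function name `findDrift`, the in-memory `Map` of path to bytes, and the `reason` labels are all assumptions made for illustration. It only shows what "byte-identical copies" means as a check.

```javascript
// Hypothetical model of a drift check: each argument maps a relative path
// to the file's bytes. The real script reads files from disk; this sketch
// works on in-memory Buffers so the comparison logic stands alone.
function findDrift(sharedFiles, vendoredFiles) {
  const drifted = [];
  for (const [path, source] of sharedFiles) {
    const copy = vendoredFiles.get(path);
    if (copy === undefined) {
      // The consuming skill never received this shared file.
      drifted.push({ path, reason: "missing" });
    } else if (!source.equals(copy)) {
      // Buffer#equals compares bytes, so any edit to the copy counts as drift.
      drifted.push({ path, reason: "differs" });
    }
  }
  return drifted;
}
```

An empty result means every vendored copy matches its source; a check mode would presumably fail the build on a non-empty result.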
WHAT: Add a bounded Phase-2 alignment-probe lane (read-only free; code spikes scratch-only with a hard promotion boundary; loops start single-part and decompose after approval). Derive single-part part approval from the loop-plan approval, expiring on all four invalidation paths (split, reframe, loop-plan rework, close). Delete the Current focus section in favor of a single canonical Next atomic move. Make implement-cycle sub-agent dispatch opt-in with recommendation guards and an explicit writes boundary.

WHY: The skill paid fixed ceremony regardless of task size and structurally blocked its own "probe before committing" use case. These changes cut the redundant second approval and duplicated next-move/context state while keeping the safety property (no unapproved production change) machine-checkable.
The probe exited 2 ("every candidate task/loop folder failed to parse")
for a healthy project whose only loops were status: done and that had no
active tasks. The exit-2 accounting counted only active loops as
successful parses (loopResult.active.length), so a parsed-but-done loop
looked like a parse failure. Done tasks avoided this because archived
records count them; done loops had no equivalent term.
collectLoopFolders now returns parsedCount (loops whose loop.md yielded
valid frontmatter, any status), and the exit-2 check uses it. A project
where all folders parsed but none are active now exits 0; a project whose
folders genuinely all fail to parse still exits 2.
Adds evals/harness/state-probe.test.mjs covering done-only-exits-0,
all-unparseable-exits-2, and the mixed case.
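The accounting change can be illustrated with a minimal sketch. Only `parsedCount` and `active` come from the commit message; the folder representation (`frontmatter` set to `null` on a parse failure), `candidateCount`, and both `exitCode` helpers are assumptions, and task folders and archived records are left out entirely.

```javascript
// Hypothetical model: each entry is the outcome of parsing one loop.md.
// `frontmatter` is null when parsing failed.
function collectLoopFolders(folders) {
  const parsed = folders.filter((f) => f.frontmatter !== null);
  return {
    candidateCount: folders.length,
    // Loops that parsed and are not finished.
    active: parsed.filter((f) => f.frontmatter.status !== "done"),
    // Loops that parsed at all, whatever their status.
    parsedCount: parsed.length,
  };
}

// Post-fix rule: exit 2 only when there were candidates and none parsed.
function exitCode(loopResult) {
  const allFailed = loopResult.candidateCount > 0 && loopResult.parsedCount === 0;
  return allFailed ? 2 : 0;
}

// Pre-fix rule, for comparison: a parsed-but-done loop is not in `active`,
// so a done-only project is mistaken for a project where nothing parsed.
function exitCodeBeforeFix(loopResult) {
  const allFailed = loopResult.candidateCount > 0 && loopResult.active.length === 0;
  return allFailed ? 2 : 0;
}
```

In this sketch a done-only project exits 0 under the new rule and 2 under the old one, while a project whose folders all fail to parse exits 2 under both.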
WHAT: New hyper-light skill. Align on the goal and its done-conditions, work in small evidence-backed moves, checkpoint only when the route drifts, close with an honest check. No .hyper/ persistence, no parts, no approval gates; keeps the one rail of pausing before irreversible or outward-facing actions.

WHY: Fill the gap below hyper for small, single-session work that needs no persistence or gates, so the full loop machinery is not paid for tiny tasks. Escalate to hyper when work must persist or split into parts.
WHAT: New hyper-medium skill. Persisted observe-orient-decide-act loops with one alignment gate and one verify gate, and a single track of cycles: no parts, authority proxies, or sub-agent dispatch.

WHY: Bridge hyper-light and hyper for adaptive work that must survive across sessions but does not need multi-part decomposition or delegated authority.
…check

WHAT: New Run: manual | auto axis in hyper's ## Authority, independent of Mode. Run: auto drives its own cycles: after each cycle a separate, cheap bar-check evaluator (not the doer) returns continue | done | course-correct | stop-for-user, and the loop continues until the bar is met or a stop boundary fires. Requires a machine-checkable bar (every Definition-of-done line carries a check: predicate), and the Phase 2 auto-run gate refuses auto without one, falling back to manual. Stop-for-user breaks the loop in every mode; zoom-out checkpoints surface to the user (interactive) or are proxy-resolved (delegated). Full contract in reference/autonomous-run.md; SKILL.md gains the Run field, the bar-check capability, the auto-run gate, and the Phase 3 engine; templates/loop.md gains the Run field and the bar Check.

WHY: Let a loop run cycle-after-cycle without a per-cycle prompt while keeping approval safety: the governed form of "stop prompting, write loops". Maker is not the checker: the agent doing the work never grades its own completion.
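The gate and the engine described above can be sketched as two small functions. This is an illustration only, not the contract in reference/autonomous-run.md. The names `resolveRun` and `runAuto`, the `check:` regular expression, the `maxCycles` stop boundary, and the treatment of `course-correct` (counted, then continue) are all assumptions; only the Run values and the four verdicts come from the commit message.

```javascript
// The four verdicts named in the commit message.
const VERDICTS = ["continue", "done", "course-correct", "stop-for-user"];

// Auto-run gate (sketch): auto is granted only when every Definition-of-done
// line carries a `check:` predicate; otherwise fall back to manual.
// The regex is a simplification of whatever parsing the skill really does.
function resolveRun(requestedRun, definitionOfDone) {
  if (requestedRun !== "auto") return "manual";
  const machineCheckable =
    definitionOfDone.length > 0 &&
    definitionOfDone.every((line) => /\bcheck:\s*\S/.test(line));
  return machineCheckable ? "auto" : "manual";
}

// Engine (sketch): the doer (`runCycle`) and the evaluator (`barCheck`) are
// separate callbacks, so the code doing the work never grades itself.
function runAuto({ runCycle, barCheck, maxCycles }) {
  let corrections = 0;
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    const evidence = runCycle(cycle);
    const verdict = barCheck(evidence);
    if (!VERDICTS.includes(verdict)) {
      throw new Error(`unknown verdict: ${verdict}`);
    }
    if (verdict === "done" || verdict === "stop-for-user") {
      return { verdict, cycles: cycle, corrections };
    }
    if (verdict === "course-correct") corrections += 1; // assumed handling
  }
  // Assumed stop boundary: running out of cycles hands control to the user.
  return { verdict: "stop-for-user", cycles: maxCycles, corrections };
}
```

Keeping `barCheck` as an injected callback mirrors the maker-is-not-the-checker rule: the loop only reads the verdict and never computes it from the doer's own output.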
…obe fix

WHAT: Wire the new surfaces into the human-facing docs and the canonical data model. README gains three-tier guidance, the manual-vs-auto runs section, and the hyper-light command/skill entries. AGENTS.md and maintaining-hyper.md move to the ten-skill inventory and list autonomous-run.md in the loop-contract fragile surfaces. data-model.md documents the Run axis and the machine-checkable bar, re-synced to the hyper-build and hyper-task copies. CHANGELOG records hyper-light, the Run: auto engine, and the state-probe done-loops fix.

WHY: Keep the docs, changelog, and data model in step with the shipped hyper-light tier and Run: auto engine, and record the probe fix that landed in 1859e97 without a changelog entry.
Summary
This branch grew well past its original `hyper-light` scope. It now lands two new workflow tiers, a governed autonomous-run engine for `hyper`, and supporting build/probe fixes.

What's in here (by commit)
- `refactor(skills)`: every shipped skill is self-contained; shared content (state probe, references, templates) is authored once in `shared/` and vendored by `scripts/sync-shared.mjs`. `--check` guards drift in CI.
- `refactor(hyper)`: reduce loop ceremony without losing approval safety.
- `fix(state-probe)`: the probe wrongly exited non-zero on a healthy "all loops done" project (done loops weren't counted as successful parses). It now counts `parsedCount`; a project whose folders all genuinely fail to parse still exits 2. Covered by `evals/harness/state-probe.test.mjs`.
- `feat(hyper-light)`: lightest tier. Small single-session work, no `.hyper/` state, no parts, no approval gates.
- `feat(hyper-medium)`: middle tier. Persisted OODA loops, one alignment gate + one verify gate, single track of cycles, no parts/proxies/dispatch.
- `feat(hyper)`: Run: auto autonomous-run engine. A new `Run: manual | auto` axis (independent of `Mode`). In `auto`, a separate cheap `bar-check` evaluator runs after each cycle and returns `continue | done | course-correct | stop-for-user`; the loop drives itself until the bar is met or a stop boundary fires. Requires a machine-checkable bar: the Phase 2 auto-run gate refuses `auto` without one. Stop-for-user breaks the loop in every mode; checkpoints surface (interactive) or are proxy-resolved (delegated). Contract in `skills/hyper/reference/autonomous-run.md`.
- `docs(hyper)`: wire the above into README, AGENTS.md, maintaining-hyper.md, the data model (re-synced), and CHANGELOG.

Testing
- A `/hyper` loop drove three cycles to a passing bar, verified, and closed with zero per-cycle prompts; the `done` verdict correctly beat a 3-cycle checkpoint.
- Safety path: the gate refused a vague bar, a stop-for-user trigger broke the loop on an unplanned dependency, and a checkpoint surfaced (interactive) or was proxy-resolved (delegated). Every verdict came from a separate Haiku evaluator.
- `evals/harness/state-probe.test.mjs` passes (4/4); `sync-shared.mjs --check` is clean.
- `hyper-light` and `hyper-medium` are not yet exercised in a real session. They are committed here for review but should be dry-run in a throwaway project before relying on them. The `Run: auto` engine and the probe fix are validated.