Skip to content

release: publish zeroshot under @the-open-engine scope#499

Merged
tomdps merged 189 commits into
mainfrom
dev
Jun 16, 2026
Merged

release: publish zeroshot under @the-open-engine scope#499
tomdps merged 189 commits into
mainfrom
dev

Conversation

@tomdps

@tomdps tomdps commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Promote dev to main after switching npm publishing to @the-open-engine/zeroshot.
  • Keep GitHub Actions trusted publishing as the primary path with NPM_TOKEN fallback.
  • Clear the production audit gate by replacing md-to-pdf with direct Puppeteer PDF rendering.

Verification

EivMeyer and others added 30 commits January 16, 2026 14:59
## Summary
- Add `eslint-plugin-security` and `eslint-plugin-sonarjs` for code
quality
- Set `noInlineConfig: true` - prevents eslint-disable (fix code, not
rules)
- Fix 26 warnings (92 → 66 remaining)

## Changes
- **no-param-reassign (9 → 0)**: Use local variables instead of mutating
params
- **no-nested-conditional (11 → 0)**: Extract nested ternaries to helper
functions
- **detect-unsafe-regex (1 fixed)**: Replace ReDoS-vulnerable nested
quantifier
- **no-invariant-returns (2 → 0)**: File override for message handler
pattern
- **Misc (3)**: Dead eslint-disable, collection size, character class

## Remaining Warnings (66)
Architectural (complexity, max-lines) or false positives for CLI
patterns.

Closes #100

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary

Reduces planner verbosity from 12KB to ~3KB, fixing "No messages
returned" Claude CLI bug.

## Problem

Planner generates verbose plans (12KB+) that bloat worker context:
- Paragraphs explaining WHY instead of WHAT
- Redundant file descriptions
- Code examples for trivial changes
- 200+ word acceptance criteria

This triggers Claude CLI bug: "No messages returned"

## Solution

Added **OUTPUT CONCISENESS (CRITICAL)** section to planner prompt:
- Target: <3000 words total
- Forbidden: paragraphs, background, redundant descriptions
- Required: bullet points, imperative commands, <50 word criteria
- Examples: verbose (bad) vs concise (good)

## Expected Impact

- PLAN_READY message: 12KB → 3-4KB (~70% reduction)
- Worker context stays under threshold
- No more "No messages returned" errors

## Testing

- ✅ Template parses correctly (no JSON errors)
- ✅ Pre-commit validation passes
- ✅ All emojis preserved (UTF-8 encoding correct)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fixes CI security audit failures by:

- Adding `--omit=dev` flag to `npm audit` command (only audit production
dependencies)
- Adding npm overrides for vulnerable dev dependencies (diff, tar,
undici)

These vulnerabilities are in dev dependencies (semantic-release, mocha)
and don't affect the published package. The `--omit=dev` flag allows CI
to pass while still catching real production security issues.

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
…r status reports (#107)

## Summary
- Remove phase-focused instructions from planner - output flat numbered
steps instead
- Tighten CRITICAL classification - only when DIRECTLY modifying
auth/billing/secrets/destructive ops
- Fix worker outputting status reports instead of doing work - add "DO
THE WORK. DON'T REPORT STATUS."

## Changes

### Planner
- Remove SCOPE REDUCTION, SILENT PHASE OMISSION sections
- Remove delegation/sub-agent schema
- Add simple flat numbered steps format
- Forbid: "Phase 1", "Phase 2", "Future work", delegation

### Worker  
- Add prominent "DO THE WORK. DON'T REPORT STATUS." warning
- Forbid outputs like "Infrastructure exists but 0% completed..."
- Require every response to include tool calls that MAKE CHANGES

### Conductor
- Tighten CRITICAL: only when code DIRECTLY modifies auth logic, payment
processing, secrets, destructive DB ops
- Reverse bias: "If unsure between STANDARD and CRITICAL, choose
STANDARD"
- Add cost context: CRITICAL uses Opus + 4 validators = expensive
- Add NOT CRITICAL examples: refactoring, types, tests, code
organization

## Test plan
- [x] JSON syntax valid
- [x] All tests pass
- [ ] Test planner produces flat steps (manual)
- [ ] Test conductor classifies refactors as STANDARD (manual)
- [ ] Test worker executes instead of reporting (manual)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Problem

The adversarial-tester validator caused 21-minute Claude API calls that
ended with `No messages returned` errors when validating large
implementations (197+ routes).

**Incident:** Cluster `rushing-sphinx-39` crashed at 15:41:57 with:
```
Error: No messages returned
    at aAB (/$bunfs/root/claude:5327:813)
```

**Root cause:** Prompt too large for Claude API context window.

## Solution

Removed adversarial-tester agent definition from full-workflow.json
(lines 580-705).

**New validator lineup:**
- validator-requirements (always)
- validator-code (validator_count >= 2)
- validator-security (validator_count >= 3)
- validator-tester (validator_count >= 4)

**With default validator_count=2:**
- 4 agents total: planner, worker, validator-requirements,
validator-code
- Down from 5 agents (removed adversarial-tester)

## Verification

- ✅ Template validation passed
- ✅ JSON structure valid
- ✅ All 6 remaining agents present
- ✅ Pre-commit hooks passed (prettier, typecheck, template validation)
Add single-session execution scope constraint to planner
Branch protection requires PRs, but semantic-release git plugin pushes
directly. Remove it - npm publish and GitHub release still work.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Eivind Meyer <eivind.meyer@ksat.no>
Co-authored-by: Michael Eichelbeck <141341133+mkceichelbeck@users.noreply.github.com>
Co-authored-by: tomdps <tom.dupuis24@gmail.com>
Co-authored-by: tomdps <60640908+tomdps@users.noreply.github.com>
Co-authored-by: Michael Eichelbeck <michael.eichelbeck.ext@wtsde.onmicrosoft.de>
…rt (#116)

Updates the README announcement to highlight:
- OpenCode CLI support
- Multi-platform issue backends (GitHub, GitLab, Jira, Azure DevOps)
Removes 'mix providers in multi-agent workflows' claim - technically
supported but not practically used or tested.
Resolves conflicts between main and dev, keeps fixed provider claim

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Eivind Meyer <eivind.meyer@ksat.no>
Co-authored-by: Michael Eichelbeck <141341133+mkceichelbeck@users.noreply.github.com>
Co-authored-by: tomdps <tom.dupuis24@gmail.com>
Co-authored-by: tomdps <60640908+tomdps@users.noreply.github.com>
Co-authored-by: Michael Eichelbeck <michael.eichelbeck.ext@wtsde.onmicrosoft.de>
## Summary
- add context metrics helper and emit JSON metrics with truncation
tracking
- refactor context builder to compute section breakdowns without
changing prompt output
- add unit + integration tests for context metrics and ledger emission

## Testing
- npm run lint (warnings only, existing)
- npm run typecheck
- npm run test:all (fails in this environment: existing suite failures +
env issues; ran before fixing the new integration test)
- npx mocha --no-config tests/integration/context-metrics.test.js
--timeout 180000
Closes #128.

## Summary
- Implement context source selection semantics (latest/oldest/all) with
amount/limit handling and updated validation.
- Apply the same selection logic to sub-cluster parent topic selection
and templates.
- Fix PR-mode completion for git-pusher, output extraction, and
orchestrator storage dir handling.
- Avoid blessed-contrib picture widget import in TUI layouts to prevent
Node 20 MemoryReadableStream crash.
- Add/adjust tests for new selection behavior and template expectations.

## Testing
- npm test
- npm run lint (warnings only)
## Summary
- replace legacy truncation with context pack budgeting
- add pack metrics + priority/compact validation
- update templates and add tests for pack behavior

## Testing
- npm run lint
- npm run test:all

Fixes #131
## Summary
- add state snapshot builder and publisher with durable STATE_SNAPSHOT
updates
- wire snapshotter into orchestrator start/load/resume and stop/kill
paths
- update base templates/context validation and add tests

## Testing
- npm run lint
- npm run test:all
## Summary
- document context selection, packs, state snapshot, and metrics with
Mermaid diagrams
- align contextStrategy sources to explicit latest semantics and add
STATE_SNAPSHOT for debug investigator
- update contributor example and link new docs

## Testing
- npm run lint (warnings only)
- npm run test:all
- npm run validate:templates
- npm run typecheck (pre-commit hook)
## Summary
- only apply provider override when explicitly set (CLI flag/env)
- add unit test covering provider override resolution

## Testing
- npx mocha tests/unit/cli-provider-override.test.js

Fixes #140
## Summary
- detect platform-mismatch CANNOT_VALIDATE results and retry validators
in docker isolation
- skip platform-mismatch reasons when validator runs in docker
- add platform mismatch detection tests

## Testing
- npm run lint
- npm run test

Fixes #142
)

## Summary
- Bump codex provider reasoning effort levels for better quality on
complex tasks
- level1: low → medium
- level2: medium → high  
- level3: high → xhigh

The `xhigh` reasoning effort allows the model to think longer for better
answers on complex tasks.

## Test plan
- [x] Existing tests pass (reasoning effort validation allows all four
values)
- [ ] Manual test with `zeroshot run --provider codex` to verify xhigh
is passed correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- fail fast on Windows in preflight with clear guidance
- document Windows deferral rationale in README

## Testing
- npm run typecheck
- npm run validate:templates
- npm run lint (warnings only)
## Summary\n- load persisted clusters before CLI resume fallback to
task\n- add unit test to guard resume loading\n\n## Testing\n- npx mocha
tests/unit/cli-resume-loads-clusters.test.js\n\nFixes #103
## Summary\n- detect "No messages returned" from Claude CLI in task
watchers, terminate the child, and mark the task failed\n- surface the
error in agent error context and grant a one-time retry for this
transient failure\n- add unit coverage for fatal error detection\n\n##
Testing\n- `npx mocha tests/unit/claude-fatal-error-detection.test.js`
(failed: mocha picked up the full suite due to .mocharc; failures
include missing better-sqlite3 and existing test failures in this
environment)
Adds PRD and multi-stage implementation plan for the Ink-based Zeroshot
TUI replacement.

Docs:
- docs/tui-v2/PRD.md
- docs/tui-v2/IMPLEMENTATION_PLAN.md
## Summary\n- log detailed diagnostics when Claude CLI returns "No
messages returned"\n- include latest Claude debug file path + tail,
status output tail, and task metadata\n\n## Testing\n- pre-commit hooks
(eslint/prettier, typecheck, template validation)\n- pre-push lint +
typecheck
…160)

## Summary

The `validator-requirements` agent was crashing with
`error_max_structured_output_retries` because its JSON schema was too
complex for Claude CLI's `--json-schema` structured output feature.

**Root cause:** The nested `criteriaResults` array (objects with nested
`evidence` object and enum constraints) was too hard for the model to
produce reliably. After 5 internal retries, the CLI threw the error.

**Fix:** Make `criteriaResults` optional (removed from `required` array)
in both:
- `full-workflow.json`
- `quick-validation.json`

This means:
- `approved` and `summary` are still required (model produces these
correctly)
- `criteriaResults` becomes best-effort (produced when model can,
gracefully omitted when not)
- Downstream consumers already handle missing/partial `criteriaResults`

Fixes #159

## Test plan
- [x] Validated templates pass (`npm run validate:templates`)
- [ ] Re-run a STANDARD task to verify validator-requirements completes
without crashing

---
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Provider-level retryable error detection (Anthropic, OpenAI, Google)
- Orchestrator robustness improvements with proper error propagation
- Agent lifecycle improvements with better state management
- TUI renderer enhancements for error visibility
- Settings handling improvements

## Test plan
- [x] Unit tests for provider retryable errors
- [x] Orchestrator tests for error scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- avoid treating "Task not found"/"Process terminated" substrings inside
JSON logs as fatal
- only treat standalone fatal lines as no-output
- add output-extraction unit tests for fatal-string handling

## Testing
- npm test

Fixes #165
EivMeyer and others added 23 commits March 5, 2026 13:27
## Summary
- protect the active cluster id (from `ZEROSHOT_CLUSTER_ID`) from orphan
GC decisions
- make GC default skip DB-file deletion while running in active cluster
context
- add unit coverage for extraKnownIds + env-based protection behavior

## Validation
- npx eslint src/lib/gc.js tests/unit/gc-orphan-protection.test.js
- npx mocha tests/unit/gc-orphan-protection.test.js --timeout 120000
- npx mocha tests/unit/detached-startup.test.js --timeout 120000
- npx mocha tests/orchestrator.test.js --grep "missing \\.db file for
cluster" --timeout 120000
## Summary
- add `zeroshot inspect <id>` for cluster/task process inspection
- include task staleness and missing-log diagnostics
- expose live process activity from the CLI and keep human/json output
paths

## Validation
- `npm test -- tests/unit/inspect-command.test.js
tests/unit/cli-inspect-command.test.js`
- `npm run typecheck`
- `npm run validate:templates`
- `node cli/index.js inspect turquoise-pulse-17 --json --sample-ms 200`

---------

Co-authored-by: CI Test <ci-test@covibes.ai>
Normalize orchestrator PR completion detection so equivalent completion
handlers resolve consistently.\n\nThis avoids divergent completion state
when the same handler is represented with different shapes across
cluster state and config.

Co-authored-by: CI Test <ci-test@covibes.ai>
Fixes the ship-mode failure where TRIVIAL TASK/DEBUG routes produced an
invalid topology after git-pusher injection. Adds router and
template-validation regression tests for PR-mode conductor routes.

Co-authored-by: CI Test <ci-test@covibes.ai>
Prevent auto-PR debug routes from wedging after validator approval.

- normalize debug workflow handoff onto IMPLEMENTATION_READY
- make template preflight validate real stage-start producers instead of
synthesizing them
- add regression coverage for the broken STANDARD DEBUG autoPr route

Co-authored-by: CI Test <ci-test@covibes.ai>
## Summary\n- await shutdown when replacing an existing agent with the
same id\n- add a regression covering async stop/start handoff
ordering\n- harden the stale-cluster replacement path observed after
validator rejection\n\n## Verification\n- npm test -- --grep "add_agents
duplicate ID handling"\n- npx mocha
tests/add-agents-trigger-merge.test.js
tests/message-buffering-while-busy.test.js
tests/two-stage-validation.test.js tests/cluster-operations.test.js\n-
npm run check\n- GIT_AUTHOR_NAME=Codex
GIT_AUTHOR_EMAIL=codex@example.com GIT_COMMITTER_NAME=Codex
GIT_COMMITTER_EMAIL=codex@example.com npm test

Co-authored-by: Codex <codex@example.com>
Clarify that development work integrates into dev and main is
release-only promotion from dev.\n\n- reinforce dev -> main release flow
in CLAUDE.md\n- add postmortem prevention note for detached PR-base
incident
## Summary
- Upgrade Codex provider default model from `gpt-5.3-codex` to
`gpt-5.4-codex` in model catalog and level mappings
- Update corresponding test assertions

## Test plan
- [x] `settings-providers.test.js` passes with updated model name

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#467)

## Summary

- Add npm overrides for `lodash@>=4.17.24` and `marked@>=15.0.7`
- `blessed-contrib` depends on vulnerable versions that can't be
upgraded without breaking changes
- Overrides force safe transitive dependency versions

## Context

`npm audit --audit-level=moderate --omit=dev` was failing on all PRs due
to lodash Code Injection (GHSA-r5fr-rjxr-66jc) and Prototype Pollution
(GHSA-f23m-r3pf-42rh). This blocked the merge queue for all PRs.

## Test plan

- [x] `npm audit --audit-level=moderate --omit=dev` returns 0
vulnerabilities
- [x] `npm test` passes
## Summary

- Add `worktree-tooling-env.js` — resolves worktree-local tool bin
directories from `.zeroshot/tooling-env.json` metadata
- Prepend resolved tool bins to PATH in `claude-task-runner.js` and
`agent-task-executor.js` before spawning agents
- Handles submodule `.git` traversal (prefers ancestor with tooling
metadata over nested submodule roots)
- Symlink escape protection (realpath validation against worktree root)

## Context

When zeroshot runs in `--worktree` mode, a bootstrap script may install
tool wrappers (e.g. `crg`, `rox`, `cix`) into `.zeroshot/bin/` and
persist metadata in `.zeroshot/tooling-env.json`. Without this change,
spawned agent tasks don't have that bin on PATH and can't find the
tools.

## Test plan

- [x] Unit tests for `worktree-tooling-env.js` (nested cwd, submodule
traversal, symlink escape)
- [x] Unit test for `ClaudeTaskRunner._buildSpawnEnv` worktree
forwarding
## Summary
- validate nested sub-cluster configs from the main config validator to
avoid circular loading through the schema module
- inject child orchestrator creation into the sub-cluster wrapper
instead of requiring the orchestrator directly
- keep sub-cluster schema focused on shape validation while preserving
recursive validation and wrapper behavior

## Verification
- npm run typecheck
- npx mocha tests/config-validator.test.js tests/nested-cluster.test.js
Curate the provider-helper, runtime reliability, and pusher repair work
onto public dev while preserving #470.

Remove the Rust TUI release surface. Keep package metadata limited to
registry dependencies, omit tarball inputs, and keep external tool
assumptions out of published artifacts.

BREAKING CHANGE: Rust TUI commands, binaries, release assets, and
install-time TUI download support are removed from this release.
Harden the CodeQL findings blocking the release PR without changing
provider or orchestration behavior.

This keeps config-script execution as an explicit sandboxed Zeroshot
contract, creates release helper files through safer filesystem
primitives, and preserves existing detached setup state when the cluster
registry already exists.

Validation:
- npm run check:agent-cli-provider:ci
- npx mocha tests/unit/detached-startup.test.js
tests/config-validator.test.js tests/substitute-template.test.js
--timeout 30000
- npx eslint lib/detached-startup.js src/agent-cli-provider/schema.ts
src/agent/agent-hook-executor.js src/config-validator.js
tests/agent-cli-provider/parity.test.js
tests/unit/detached-startup.test.js
Remove stale Rust/Node TUI architecture and debugging references from
public repo guidance now that the TUI is intentionally unavailable in
this release.

This keeps the explicit unavailable-command/user-facing note, but
removes references to deleted implementation paths and release assets.

Validation:
- npx prettier --check AGENTS.md CLAUDE.md CONTRIBUTING.md README.md
- npx mocha tests/unit/release-hygiene.test.js
tests/unit/cli-invalid-command.test.js --timeout 30000
- commit hook: npm run typecheck; npm run validate:templates
- push hook: npm run lint; npm run typecheck
## Summary
- run the release job on Node 24 so npm is new enough for Trusted
Publishing/OIDC
- set the npm registry URL in setup-node
- remove the stale NPM_TOKEN fallback from semantic-release so CI uses
trusted publishing

## Validation
- prior release rerun passed package verification before failing
OIDC/token auth
- workflow-only change; release job will validate package again before
publishing

## Release note
This is release infrastructure only and does not change package
contents.
## Summary
- make main merge-queue runs emit the same install-matrix contexts
required by main protection
- accept both `main` and `refs/heads/main` merge-group base refs
- also match the GitHub queue ref prefix for main

## Verification
- workflow-only change
- fixes the merge queue waiting for missing install-matrix contexts
Update repository metadata, docs, and merge-queue setup references after
the GitHub organization rename.\n\nKeeps the npm package scope
unchanged.\n\nVerification:\n- npm run lint\n- npm run typecheck\n- npm
run validate:templates
## Summary
- switch package metadata, docs, update checker, and provider helper
metadata to @the-open-engine/zeroshot
- update the release smoke cleanup to uninstall the new scoped package
- keep NPM_TOKEN as fallback when trusted publishing cannot publish the
first package version

## Verification
- npm run check:agent-cli-provider:ci
- npx eslint cli/lib/update-checker.js
- temp-prefix npm pack/install smoke: zeroshot --version, --help, list
- npm publish --dry-run
## Summary
- Record origin/main as merged into current dev so the release PR can
compare cleanly.
- No file content changes relative to dev; this is an ancestry-only sync
commit.

## Verification
- PR #498 CI passed and merged via merge queue.
- dev push CI passed.
- push hook lint and typecheck passed for this branch.

---------

Co-authored-by: Eivind Meyer <eiv.meyer@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Eivind Meyer <eivind.meyer@ksat.no>
Co-authored-by: Michael Eichelbeck <141341133+mkceichelbeck@users.noreply.github.com>
Co-authored-by: Michael Eichelbeck <michael.eichelbeck.ext@wtsde.onmicrosoft.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-38-53.eu-north-1.compute.internal>
Co-authored-by: Eivind <eivind@covibes.ai>
Co-authored-by: CI Test <ci-test@covibes.ai>
Co-authored-by: Codex <codex@example.com>
## Summary
- Preserve origin/main as actual ancestry of dev so the protected
dev-to-main release PR can merge cleanly.
- No file content changes relative to current dev; the resolved tree
keeps the @the-open-engine package metadata and the trusted-publishing
release workflow.

## Why
- The dev merge queue squash-merges PRs, which strips merge parentage.
- The release PR requires dev to be cleanly mergeable into main, so this
sync must land as a merge commit rather than a squash.

## Verification
- push hook lint and typecheck passed for this branch.
- origin/dev package metadata is @the-open-engine/zeroshot.
- origin/main currently still has @covibes/zeroshot, which this release
PR will replace.

---------

Co-authored-by: Eivind Meyer <eiv.meyer@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Eivind Meyer <eivind.meyer@ksat.no>
Co-authored-by: Michael Eichelbeck <141341133+mkceichelbeck@users.noreply.github.com>
Co-authored-by: Michael Eichelbeck <michael.eichelbeck.ext@wtsde.onmicrosoft.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-38-53.eu-north-1.compute.internal>
Co-authored-by: Eivind <eivind@covibes.ai>
Co-authored-by: CI Test <ci-test@covibes.ai>
Co-authored-by: Codex <codex@example.com>
Preserve origin/main ancestry on current dev so the protected dev-to-main release PR can merge cleanly.
…-merge-release-2

chore(release): merge main into dev
@tomdps tomdps added this pull request to the merge queue Jun 16, 2026
@tomdps tomdps removed this pull request from the merge queue due to a manual request Jun 16, 2026
@tomdps tomdps added this pull request to the merge queue Jun 16, 2026
Merged via the queue into main with commit 0b51c60 Jun 16, 2026
12 checks passed
@github-actions

Copy link
Copy Markdown

🎉 This PR is included in version 6.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants