| created | 2026-05-11 |
|---|---|
| last_modified | 2026-05-11 |
| revisions | 0 |
| doc_type | |
| supersedes | none |
| informs | m-cli-integration-research.md |

Status: design-approved, ready for Phase 1 implementation.
Scope: the m engine subcommand family in m-cli core, the
dist/m-test-engine.json manifest published by this repo, and the
companion changes to m-cli's m doctor that consume it.
Predecessor: docs/m-cli-integration-research.md
captures the rationale, ecosystem comparisons, and option analysis that
led to the decisions below. This plan does not restate that material; it
records the what and when, with the decisions section pinning every
open question.
Each entry below corresponds to a numbered open question in
m-cli-integration-research.md §5. Recording them here makes the plan
self-contained and audit-friendly for future contributors.
The manifest is authored and versioned in this repo (m-test-engine)
and vendored into m-cli/dist/m-test-engine.json at m-cli release time.
- Why: m-cli already has a vendoring discipline (dist/repo.meta.json, dist/commands.json, etc.); reusing the same pattern avoids inventing a second metadata-package release pipeline. m-cli releases ride on their own cadence and pull in the latest published manifest.
- Mechanic: m-cli's make manifest target gains a step that copies m-test-engine/dist/m-test-engine.json from a pinned tag (recorded in m-cli's lockfile or dist/repo.meta.json dependencies block). No network fetch at runtime; vendoring is a build-time artifact.
- Drift gate: m-cli's make check-manifest asserts the vendored copy byte-matches the pinned upstream tag.
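
The drift gate reduces to a byte-for-byte comparison between the vendored file and the upstream file at the pinned tag. The following is a minimal sketch, assuming the upstream bytes have already been obtained (for instance with git show <tag>:dist/m-test-engine.json in a local checkout). The function name manifest_drift is hypothetical and not part of m-cli.

```python
import hashlib
from typing import Optional


def manifest_drift(vendored: bytes, upstream: bytes) -> Optional[str]:
    """Return None when the two copies byte-match, else a short drift report.

    Hypothetical helper: illustrates the byte-match rule only.
    """
    if vendored == upstream:
        return None
    # Report digests rather than a diff: the gate only needs to fail loudly.
    v = hashlib.sha256(vendored).hexdigest()[:12]
    u = hashlib.sha256(upstream).hexdigest()[:12]
    return f"vendored manifest drifted: sha256 {v} != upstream {u}"


if __name__ == "__main__":
    pinned = b'{"protocol": 1}\n'
    report = manifest_drift(b'{"protocol": 1}\n', pinned)
    print("in sync" if report is None else report)
```

Comparing raw bytes rather than parsed JSON is deliberate in this sketch: even whitespace-only edits count as drift, which matches the "byte-matches" wording above.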
Confirmed. GHCR matches the org domain on GitHub, supports anonymous pulls, and inherits org-level access control. No separate Docker Hub account or rate-limit footprint to manage.
- First published tag: :r2.02 (matches the current ydb_version in the manifest sketch).
- Floating tag: :latest tracks the most recent stable r-release.
- Multi-arch: linux/amd64 only initially; arm64 added when Mac-on-arm consumers materialise (build matrix already supports it via docker buildx).
Decision: m engine start shells out to docker compose -f <discovered>
as the primary path. Plain docker run is the documented fallback for
hosts without the compose plugin, constructible deterministically from
the manifest fields.
Pros/cons that drove the call — the user's stated priorities were simplicity, maintainability, and minimal drift as Docker evolves:
| Aspect | docker compose | docker run |
|---|---|---|
| Declarativeness | One reviewable compose.yml file; diffs read like config | Configuration lives in code (flags assembled per call) |
| Drift over time | Compose v2 schema is stable; Docker has committed to it long-term (v1 was retired 2023) | CLI flags are the most stable Docker surface: ~10 years backward-compatible |
| Set-and-forget | Yes: edit the file, restart the container, done | Partially: flag changes touch m-cli code |
| Dependency surface | Requires docker compose plugin (bundled with modern Docker since 2022) | Only requires docker CLI |
| Multi-container readiness | Trivial (add a service) | Manual orchestration |
| Maintainability across the m-* repos | The compose file lives in m-test-engine and every consumer points at the same one | Every consumer assembles its own flag list, so there is divergence risk |
| Failure mode visibility | Compose surfaces healthchecks, depends_on, restart-policy out of the box | Has to be re-implemented per call site |

Why compose wins for this project: m-test-engine already ships
compose.yml as its canonical contract; pointing every consumer at that
same file means the configuration lives in one place and edits
propagate without coordinated multi-repo changes. Compose v2 is now
shipped as part of Docker Engine and Docker Desktop, so the
"prerequisite" cost is approximately zero on any host that has Docker
in the first place. Compose's schema has been remarkably stable since
the v2 rewrite (2022); the deprecations that do happen (e.g. the
version: top-level field) are non-breaking warnings.
Why docker run stays in the fallback slot: minimal CI runners and
older Linux distros sometimes ship Docker Engine without the compose
plugin. The manifest carries enough fields (image, container,
bind_mount, env vars) to reconstruct an equivalent docker run
invocation; m-cli detects compose-plugin absence and falls back
transparently.
Set-and-forget guarantee: the manifest declares both
compose_file: docker/compose.yml and a run_args block. m-cli prefers
compose; on docker compose version failure it constructs the
equivalent docker run from run_args and the rest of the manifest.
Either path produces an identically-named, identically-mounted
container.
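
The compose-first, run-fallback selection can be sketched as below. This is an illustration under stated assumptions, not m-cli's implementation: the function names are hypothetical, run_args is assumed to be a flat list of extra docker run flags (the plan does not pin its shape), and image is assumed to hold the repository name without a tag.

```python
import shutil
import subprocess
from typing import List


def compose_available() -> bool:
    """True when `docker compose version` exits 0, i.e. the plugin is present."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "compose", "version"], capture_output=True, check=False
        )
    except OSError:
        return False
    return result.returncode == 0


def start_argv(manifest: dict, host_dir: str, compose_path: str,
               use_compose: bool) -> List[str]:
    """Build the start command from manifest fields only (no hard-coded names)."""
    if use_compose:
        return ["docker", "compose", "-f", compose_path, "up", "-d"]
    mount = manifest["bind_mount"]
    image = f'{manifest["image"]}:{manifest["default_tag"]}'
    return [
        "docker", "run", "-d",
        "--name", manifest["container"],
        "-v", f'{host_dir}:{mount["container"]}:{mount["mode"]}',
        *manifest.get("run_args", []),  # assumption: list of extra flags
        image,
    ]


if __name__ == "__main__":
    sample = {  # placeholder values for illustration only
        "image": "ghcr.io/example/m-test-engine",
        "default_tag": "r2.02",
        "container": "m-test-engine",
        "bind_mount": {"host": "$HOME/m-work", "container": "/m-work", "mode": "rw"},
        "run_args": [],
    }
    print(start_argv(sample, "/home/dev/m-work", "docker/compose.yml",
                     use_compose=compose_available()))
```

Returning an argv list instead of a shell string keeps the shell-out free of quoting problems and makes both paths easy to assert on in tests.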
Decision: a single, shared host directory at $HOME/m-work is
bind-mounted into the container at /m-work. All m-* repos that
participate (m-cli, m-stdlib, m-test-engine itself, future m-*
projects) are checked out or symlinked under $HOME/m-work/, e.g.:
$HOME/m-work/
├── m-cli/
├── m-stdlib/
├── m-test-engine/
├── m-modern-corpus/
└── ...
The container sees the same layout under /m-work/. ydb_routines is
configured (inside the container) to include the relevant routine
subdirs across all participating repos, so routines from m-stdlib are
callable from m-cli tests without re-mounting or restarting.
- Why $HOME, not a root-peer path: this is a single-user home-server environment; rooting host paths under $HOME avoids sudo for directory creation and generalises across users. Container-side paths stay absolute (/m-work) because they're the public cross-repo contract; only the host side moves.
- Why this shape: every m-* repo provides distinct capabilities (m-stdlib provides ^STDASSERT / ^STDJSON / ^STDREGEX; m-cli provides linting/formatting/runner; m-modern-corpus provides calibration M source). They must coexist in the running engine to be useful. A per-cwd /work mount silos them and forces "one container per project", which contradicts the canonical-runtime model.
- Manifest field:

      "bind_mount": { "host": "$HOME/m-work", "container": "/m-work", "mode": "rw" }

  (was: "bind_mount": "/work", a single string. Promoted to an object to carry host/container/mode. Host side later moved from /m-work to $HOME/m-work per workspace convention; consumers expand $HOME at runtime.)
- m engine start precondition: $HOME/m-work must exist on the host. If absent, m-cli prints an actionable hint:

      ✗ host directory $HOME/m-work does not exist
        fix: mkdir -p "$HOME/m-work"
             cd "$HOME/m-work" && git clone https://github.com/m-dev-tools/m-cli
             cd "$HOME/m-work" && git clone https://github.com/m-dev-tools/m-stdlib

- m-cli implications: m-cli's engine.py DockerEngine constructor loses its per-instance bind_root arg in favour of the manifest's shared mount. Engine discovery (detect_engine) becomes a singleton per host, not per cwd.
- Migration note for existing dev setups: anyone with a working /work-mounted setup needs a one-time move to $HOME/m-work. m-cli's m doctor detects the legacy mount and emits a migration hint (✗ legacy /work mount detected — see docs/migration-to-m-work.md).
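
Runtime expansion of $HOME and the start precondition might look like the sketch below. Both function names are hypothetical, and the hint is abbreviated to the mkdir step; the env parameter exists only so the expansion can be exercised without touching the real environment.

```python
import os
from typing import Optional


def resolve_host_mount(bind_mount: dict, env: Optional[dict] = None) -> str:
    """Expand a leading $HOME in the manifest's host path."""
    env = dict(os.environ) if env is None else env
    home = env.get("HOME", os.path.expanduser("~"))
    return bind_mount["host"].replace("$HOME", home, 1)


def missing_mount_hint(host_dir: str) -> Optional[str]:
    """Return an actionable hint when the host directory is absent, else None."""
    if os.path.isdir(host_dir):
        return None
    return (
        f"✗ host directory {host_dir} does not exist\n"
        f'  fix: mkdir -p "{host_dir}"'
    )


if __name__ == "__main__":
    mount = {"host": "$HOME/m-work", "container": "/m-work", "mode": "rw"}
    host_dir = resolve_host_mount(mount)
    print(missing_mount_hint(host_dir) or f"ok: {host_dir} exists")
```

Keeping the manifest value unexpanded and resolving it per call means the same vendored file works for every user on the host.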
The protocol field in dist/m-test-engine.json is a single integer
that m-cli treats as a compatibility handshake. Question 5 in the
research doc was left open ("advise on impact"). Here is the
recommendation and the policy that follows from it.
Impact of getting the policy wrong:
- Bumping too aggressively: every minor manifest change forces every consumer to upgrade. m doctor starts firing "protocol mismatch" warnings during normal release cycles, users develop alarm-fatigue, and the field becomes ignored noise.
- Bumping too conservatively: silent contract drift. A field's semantics change but the protocol number doesn't move, so m-cli keeps using the old interpretation and behaves wrongly. This is the more dangerous failure mode because it manifests as inscrutable bugs rather than visible warnings.
Policy (additive-by-default, strict on semantics):
| Change | Bump protocol? |
|---|---|
| New optional field added | No |
| New required field added | Yes |
| Field renamed | Yes |
| Field removed | Yes |
| Field's type changes (string → object, etc.) | Yes |
| Field's semantics change (same name, new meaning) | Yes |
| Default value of an optional field changes | No (document in release notes) |
| New enum value added to an existing enum field | No, provided consumers tolerate unknown values |
| Enum value removed or repurposed | Yes |
| Documentation / comment / typo fix | No |
Consumer rules (m-cli, future drivers):
- m-cli must tolerate unknown fields in the manifest. Future additive evolution stays unblocked.
- m-cli must reject a manifest whose protocol is higher than the highest version it understands, with a clear "upgrade m-cli" hint.
- m-cli may warn when protocol is lower than expected (consumer is newer than the manifest); behaviour is best-effort.
Expected cadence: bumps are rare. Realistic expectation is one bump per 12–24 months. Most evolution will be additive.
Initial state: protocol: 1 ships with Phase 1.
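
The three consumer rules map onto a small handshake function. The sketch below is illustrative only: Compat, SUPPORTED_PROTOCOL and check_protocol are hypothetical names, and unknown manifest fields are tolerated simply by never being inspected.

```python
from enum import Enum


class Compat(Enum):
    OK = "ok"
    WARN_OLDER = "warn_older"      # manifest older than consumer: best-effort
    REJECT_NEWER = "reject_newer"  # manifest newer than consumer: upgrade m-cli


# Hypothetical constant: the highest protocol this consumer understands.
SUPPORTED_PROTOCOL = 1


def check_protocol(manifest: dict, supported: int = SUPPORTED_PROTOCOL) -> Compat:
    """Apply the reject-if-newer / warn-if-older handshake."""
    protocol = manifest.get("protocol")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(protocol, int) or isinstance(protocol, bool):
        raise ValueError("manifest 'protocol' must be an integer")
    if protocol > supported:
        return Compat.REJECT_NEWER
    if protocol < supported:
        return Compat.WARN_OLDER
    return Compat.OK


if __name__ == "__main__":
    # An extra, unknown field does not affect the outcome.
    print(check_protocol({"protocol": 1, "some_future_field": True}))
```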
Confirmed. Short, consistent with m_cli.plugins (the existing
entry-point group name), reads naturally as "m-cli engines".
- Underscore-separated to match Python entry-point conventions.
- Locked as part of PLUGIN_API_VERSION = 1 once Phase 2 ships.
The research doc proposed five phases; this plan keeps that shape but specifies the exit criteria, owners, and the cross-repo coordination required for each.
Goal: ship the manifest from this repo, vendor it into m-cli, and
rewrite m doctor's Docker-path hints to consume it. No new
subcommands, no Docker image changes.
Deliverables in m-test-engine:
- dist/m-test-engine.json: hand-authored, validated against a JSON Schema at dist/m-test-engine.schema.json. Fields exactly as decided above (image, default_tag, container, bind_mount object, compose_file, repo_url, min_docker, ydb_version, protocol, run_args).
- make check-manifest: schema-validates dist/m-test-engine.json, asserts verified_on is within 90 days, and asserts the referenced compose_file path exists.
- README pointer to the manifest as the public machine-readable contract.
Deliverables in m-cli:
- dist/m-test-engine.json vendored from this repo at a pinned tag.
- m doctor rewritten so every WARN in the Docker engine path quotes the exact docker pull / docker compose -f <path> up -d / docker exec m-test-engine ... command derived from the manifest.
- m doctor --json schema extended with fix.command: [...] and fix.destructive: bool per check (lays groundwork for autonomous agents).
- Root-cause grouping: prerequisite-failed checks downstream report SKIPPED rather than running and producing secondary failures.
Exit criteria:
- m doctor on a fresh Mac with Docker installed and no m-test-engine pulled prints a four-line fix recipe that, when run verbatim, resolves every WARN.
- m doctor --json validates against the new schema.
- m-cli's make check-manifest catches drift from the upstream manifest.
Duration: 1–2 days of focused work. Phase 1 is independent of every later phase and is the single highest-leverage delivery.
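
Root-cause grouping, as described in the m-cli deliverables for this phase, can be sketched as a single pass over checks that are already listed in dependency order. The Check dataclass, the check names and run_checks are hypothetical; the only behaviour taken from the plan is that a check whose prerequisite did not pass reports SKIPPED instead of running.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the check passes
    requires: List[str] = field(default_factory=list)


def run_checks(checks: List[Check]) -> Dict[str, str]:
    """Run checks in order; skip any whose prerequisite did not pass."""
    status: Dict[str, str] = {}
    for check in checks:
        # A prerequisite that warned, was skipped, or never ran blocks this check.
        if any(status.get(dep) != "OK" for dep in check.requires):
            status[check.name] = "SKIPPED"
            continue
        status[check.name] = "OK" if check.run() else "WARN"
    return status


if __name__ == "__main__":
    checks = [
        Check("docker_installed", lambda: False),
        Check("image_pulled", lambda: True, requires=["docker_installed"]),
        Check("container_running", lambda: True, requires=["image_pulled"]),
    ]
    for name, state in run_checks(checks).items():
        print(f"{state:8} {name}")
```

With this shape only the root cause surfaces as a WARN, so the fix recipe stays short and the downstream checks produce no secondary noise.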
Goal: turn the WARN hints from Phase 1 into commands that actually
exist. m doctor --fix becomes safe and idempotent.
Deliverables in m-cli:
- New subcommand tree under src/m_cli/engine/:
  - m engine status (text + --json)
  - m engine install
  - m engine start
  - m engine stop / restart
  - m engine logs [--follow]
  - m engine shell
  - m engine exec '<m-cmd>'
  - m engine version
  - m engine upgrade
  - m engine reset --confirm (destructive, opt-in)
  - m engine capabilities --json (mirrors top-level m capabilities)
- EngineDriver protocol exported as a public API; built-in DockerDriver is the only registered driver.
- m_cli_engines Python entry-point group declared and documented; no out-of-tree drivers yet but the seam exists.
- m doctor --fix delegates to m engine <verb> for every fixable WARN; refuses to run destructive verbs without explicit --confirm.
- dist/commands.json auto-grows to include the engine namespace (downstream agents pick it up for free).
Exit criteria:
- m engine status --json is the canonical health check; m doctor's runtime section becomes a thin facade over it.
- All m engine <verb> calls construct their docker / docker compose invocations from the manifest, with no hard-coded image names or paths in Python.
- m doctor --fix on a fresh Mac with Docker installed runs to a green state without manual intervention.
Duration: 3–5 days.
Goal: make the image self-describing once pulled, so m-cli can do
version-mismatch detection and m engine status can report real Docker
health.
Deliverables in m-test-engine:
- Dockerfile adds:

      LABEL org.m-dev-tools.m-test-engine.protocol="1"
      LABEL org.m-dev-tools.m-test-engine.bind-mount="/m-work"
      LABEL org.m-dev-tools.m-test-engine.ydb-version="r2.02"
      LABEL org.m-dev-tools.m-test-engine.image-rev="<git-sha>"
      HEALTHCHECK CMD $ydb_dist/mumps -run %XCMD 'write "ok",!' || exit 1

- make smoke extended to verify the label set and healthcheck presence.
- Release process documents the image-rev propagation (docker buildx build --build-arg GIT_SHA=$(git rev-parse HEAD)).
Deliverables in m-cli:
- m engine status reads docker image inspect and surfaces protocol_mismatch / image_outdated warnings derived from label comparisons against the vendored manifest.
- m engine version prints both the manifest-declared expectation and the image-reported actual.
Exit criteria:
- An intentionally-mismatched image (older tag pulled, newer manifest vendored) produces a clear "run m engine upgrade" WARN.
- docker inspect --format '{{.State.Health.Status}}' returns healthy after m engine start completes.
Duration: 1–2 days, mostly on the m-test-engine side; m-cli side is
small once Phase 2's status infrastructure is in place.
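
The label comparison on the m-cli side can be sketched as follows. docker image inspect prints a JSON array whose elements carry labels under Config.Labels; everything else here is an assumption for illustration: the function name, and the choice to map a protocol label difference to protocol_mismatch and a ydb-version label difference to image_outdated (the plan names the warnings but not which label drives which).

```python
import json
from typing import List

LABEL_PREFIX = "org.m-dev-tools.m-test-engine."


def label_warnings(inspect_json: str, manifest: dict) -> List[str]:
    """Compare image labels from `docker image inspect` with the manifest."""
    images = json.loads(inspect_json)  # inspect prints a JSON array
    labels = (images[0].get("Config") or {}).get("Labels") or {}
    warnings = []
    # Label values are always strings, so compare against str(protocol).
    if labels.get(LABEL_PREFIX + "protocol") != str(manifest["protocol"]):
        warnings.append("protocol_mismatch")
    # Assumption: a differing ydb-version label means the image is outdated.
    if labels.get(LABEL_PREFIX + "ydb-version") != manifest["ydb_version"]:
        warnings.append("image_outdated")
    return warnings


if __name__ == "__main__":
    inspected = json.dumps([{"Config": {"Labels": {
        LABEL_PREFIX + "protocol": "1",
        LABEL_PREFIX + "ydb-version": "r2.02",
    }}}])
    print(label_warnings(inspected, {"protocol": 1, "ydb_version": "r2.02"}))
```

An image with no labels at all (for example one built before this phase) yields both warnings, which is the conservative outcome.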
Goal: structured, rich introspection from inside the container, so
m engine status --verbose reports more than just "running / healthy".
Deliverables in m-test-engine:
- mte shell script (or compact M routine) on $PATH inside the container. mte status --json prints:

      {
        "ok": true,
        "ydb_dist": "/opt/yottadb/r2.02",
        "release": "r2.02",
        "uptime_s": 1234,
        "globals_count": 17,
        "routines_count": 412,
        "mounted_repos": ["m-cli", "m-stdlib", "m-modern-corpus"]
      }

- Tests in make smoke assert mte status --json produces valid JSON.
Deliverables in m-cli:
- m engine status --verbose runs docker exec m-test-engine mte status --json and folds the output into its report.
- m engine watch --interval 5s streams mte status --json lines for live monitoring (TAP-like format for CI; JSON-lines for tools).
Exit criteria:
- m engine status --verbose on a healthy container shows mounted repos, routine count, and uptime, answering "is the engine ready for my repo's tests?" not just "is it up?".
Duration: 2–3 days.
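
A consumer-side check of the mte status --json payload might pin the field set as sketched below. The field names and types are read off the sample payload in the deliverables above; the EXPECTED_FIELDS table and parse_mte_status are hypothetical and stand in for whatever schema Phase 4 eventually ships.

```python
import json

# Field names and types taken from the sample payload; the allowlist itself
# is an illustrative assumption, not a published schema.
EXPECTED_FIELDS = {
    "ok": bool,
    "ydb_dist": str,
    "release": str,
    "uptime_s": int,
    "globals_count": int,
    "routines_count": int,
    "mounted_repos": list,
}


def parse_mte_status(raw: str) -> dict:
    """Parse the payload and reject anything outside the pinned field set."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("status payload must be a JSON object")
    unknown = set(data) - set(EXPECTED_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for name, typ in EXPECTED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int, so reject booleans in integer fields.
        if not isinstance(value, typ) or (typ is int and isinstance(value, bool)):
            raise ValueError(f"field {name!r} has the wrong type")
    return data


if __name__ == "__main__":
    raw = ('{"ok": true, "ydb_dist": "/opt/yottadb/r2.02", "release": "r2.02", '
           '"uptime_s": 1234, "globals_count": 17, "routines_count": 412, '
           '"mounted_repos": ["m-cli", "m-stdlib", "m-modern-corpus"]}')
    print(parse_mte_status(raw)["mounted_repos"])
```

Unlike the manifest, where unknown fields are tolerated, this sketch fails on extras so that an accidental environment dump cannot slip into the report unnoticed.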
Goal: extend the existing manifest-driven AI-discoverability stance to the engine namespace, so Claude Code and other agents bootstrap m-* projects without bespoke instructions.
Deliverables:
- Auto-generated ~/.claude/skills/m-engine/SKILL.md driven by dist/m-test-engine.json + the engine slice of m-cli's dist/commands.json (make skill-install target in this repo, mirroring m-stdlib's existing pattern).
- dist/m-test-engine.json gains a verbs section declaring which m engine <verb> commands are safe for autonomous execution vs. require --confirm:

      "verbs": {
        "status": { "destructive": false, "read_only": true },
        "start":  { "destructive": false, "read_only": false },
        "reset":  { "destructive": true, "requires_confirm": true },
        ...
      }

- Optional: m-cli MCP server registers the safe verbs as MCP tools so Claude Code can drive the engine natively without shelling out.
Exit criteria:
- A fresh Claude Code session in any m-* repo auto-loads the m-engine skill and offers m engine install / start / status as actions.
- The verb-safety classification gates destructive operations at the agent-harness layer, not at human-prose-warning layer.
Duration: 2–3 days, parallelisable with Phase 4.
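
Gating at the agent-harness layer could be as small as the sketch below, which consumes a verbs section shaped like the one in the deliverables. The function name may_execute is hypothetical, and refusing verbs that are missing from the section (fail closed) is an assumption of this sketch rather than a decision recorded in the plan.

```python
def may_execute(verbs: dict, verb: str, confirmed: bool = False) -> bool:
    """Decide whether a harness may run `m engine <verb>` autonomously."""
    entry = verbs.get(verb)
    if entry is None:
        return False  # assumption: unclassified verbs are refused
    if entry.get("destructive", False) or entry.get("requires_confirm", False):
        return confirmed  # destructive verbs need an explicit confirmation
    return True


if __name__ == "__main__":
    verbs = {
        "status": {"destructive": False, "read_only": True},
        "start": {"destructive": False, "read_only": False},
        "reset": {"destructive": True, "requires_confirm": True},
    }
    for verb in ("status", "start", "reset"):
        print(verb, may_execute(verbs, verb))
```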
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Docker compose v2 schema deprecation breaks compose.yml mid-cycle | Low | Medium | The run_args fallback in the manifest reconstructs docker run. Schema deprecations in v2 have been non-breaking warnings; we'd notice via make smoke long before users do. |
| /m-work migration friction for existing /work-mounted devs | Medium | Low | m doctor detects legacy /work, prints a one-step mv / re-symlink hint. Document in docs/migration-to-m-work.md. Only relevant for current maintainers; new users land directly on /m-work. |
| Manifest drift between m-test-engine and m-cli's vendored copy | Medium | High | m-cli's make check-manifest byte-compares against the pinned upstream tag; CI gate. Vendoring pin recorded in m-cli's dist/repo.meta.json dependencies block. |
| GHCR rate-limits or outages | Low | Medium | Anonymous pulls are 1000/hr per IP, well above realistic dev usage. For CI, document GHCR token auth as an opt-in. |
| Protocol bump churn surprises users | Low | Medium | Policy in §1.5 is conservative-by-default; expected cadence is 12–24 months. Every bump documented in CHANGELOG.md with a migration recipe. |
| m-cli grows into "yet another Docker orchestrator" | Medium | Medium | Scope discipline: m engine shells out to docker / docker compose; it does not reimplement them. Anything beyond start/stop/exec/status belongs in compose, not in Python. The EngineDriver seam keeps the door open for non-Docker engines without bloating core. |
| mte introspection script leaks YDB internals or PII | Low | Low | Output is structured JSON with a fixed allowlist (no $ZGBLDIR, no env dump). Phase 4 ships with a schema for mte status --json and tests pin the field set. |
| Bind-mount of host $HOME/m-work exposes too much filesystem to the container | Low | Low | $HOME/m-work is a user-controlled directory containing only m-* repos. Mount mode is rw (consumers need to write build artifacts). Document the security model in README. |

Mapping back to the research doc's framing: what does each phase unblock, relative to the status quo, once it lands?
| Benefit | Phase that delivers it |
|---|---|
| m doctor produces actionable, copy-pasteable fix recipes | 1 |
| AI agents bootstrap m-* projects from dist/commands.json alone | 1 (manifest) + 2 (m engine in commands.json) |
| Version-mismatch detection between image and m-cli | 3 |
| Single shared engine across all m-* repos via $HOME/m-work (host) → /m-work (container) | 1 (manifest) + 2 (start command) |
| m doctor --fix autonomous-execution safe | 2 (typed fixes) + 5 (verb safety classes) |
| Continuous health monitoring (m engine watch) | 4 |
| Out-of-tree engines (IRIS, podman) without forking core | 2 (m_cli_engines entry point) |

Phase ordering reflects dependency between this repo and m-cli:
this repo (m-test-engine) m-cli
───────────────────────── ─────
Phase 1a: ship manifest ───► Phase 1b: vendor + rewrite doctor
Phase 2: m engine subcommand family
Phase 3a: labels + healthcheck ─► Phase 3b: status reads labels
Phase 4a: mte introspection ───► Phase 4b: status --verbose
Phase 5: skill + MCP
Phase 1a (this repo) is the only blocker for Phase 1b (m-cli). After that, m-cli can iterate independently through Phase 2 without further changes here. Phases 3 and 4 require small coordinated bumps but neither breaks any earlier deliverable.
Explicitly not part of this plan:
- IRIS engine support (the m_cli_engines entry-point seam admits it later, but no IRIS driver ships in core).
- Podman as a Docker drop-in (same: seam exists, driver doesn't).
- Multi-arch image (arm64): added when arm64 consumers materialise.
- SSH transport changes: SSHEngine remains the legacy maintainer path; not modified by this plan.
- VistA-specific extras inside the container (no FileMan, no Kernel; m-test-engine's existing guardrail stands).
- m-cli replacing docker / docker compose with a Python Docker SDK. Shell-out keeps the dependency surface minimal and the behaviour trivially auditable.
A new contributor on a fresh laptop, after git clone m-cli, runs:

    m doctor --fix
    m test

…and sees a green test suite without reading any documentation, without manually pulling images, and without setting environment variables. That is the bar Phase 2 must clear. Every phase before contributes to it; every phase after polishes it for agents.