`m engine` — implementation plan

Status: design-approved, ready for Phase 1 implementation.
Scope: the m engine subcommand family in m-cli core, the dist/m-test-engine.json manifest published by this repo, and the companion changes to m-cli's m doctor that consume it.
Predecessor: docs/m-cli-integration-research.md captures the rationale, ecosystem comparisons, and option analysis that led to the decisions below. This plan does not restate that material; it records the what and when, with the decisions section pinning every open question.

1. Decisions on open questions

Each entry below corresponds to a numbered open question in m-cli-integration-research.md §5. Recording them here makes the plan self-contained and audit-friendly for future contributors.

1.1 Source of truth for `dist/m-test-engine.json` — Option A

The manifest is authored and versioned in this repo (m-test-engine) and vendored into m-cli/dist/m-test-engine.json at m-cli release time.

Why: m-cli already has a vendoring discipline (dist/repo.meta.json, dist/commands.json, etc.); reusing the same pattern avoids inventing a second metadata-package release pipeline. m-cli releases ride on their own cadence and pull in the latest published manifest.
Mechanic: m-cli's make manifest target gains a step that copies m-test-engine/dist/m-test-engine.json from a pinned tag (recorded in m-cli's lockfile or dist/repo.meta.json dependencies block). No network fetch at runtime — vendoring is a build-time artifact.
Drift gate: m-cli's make check-manifest asserts the vendored copy byte-matches the pinned upstream tag.
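
The drift gate amounts to a byte-for-byte comparison. A minimal sketch, assuming the pinned upstream copy has already been checked out to a local path; the function name and both paths are hypothetical and not taken from m-cli's actual Makefile:

```python
import hashlib
from pathlib import Path


def manifests_match(vendored: Path, upstream: Path) -> bool:
    """Return True when both files contain identical bytes (compared via SHA-256)."""
    def digest(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    return digest(vendored) == digest(upstream)
```

A `make check-manifest` recipe could call this and exit non-zero on a mismatch, which is enough for a CI gate.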

1.2 Canonical image registry — `ghcr.io/m-dev-tools/m-test-engine`

Confirmed. GHCR matches the org domain on GitHub, supports anonymous pulls, and inherits org-level access control. No separate Docker Hub account or rate-limit footprint to manage.

First published tag: :r2.02 (matches the current ydb_version in the manifest sketch).
Floating tag: :latest tracks the most recent stable r-release.
Multi-arch: linux/amd64 only initially; arm64 added when Mac-on-arm consumers materialise (build matrix already supports it via docker buildx).

1.3 Compose vs. `docker run` — compose-first, run-fallback

Decision: m engine start shells out to docker compose -f <discovered> as the primary path. Plain docker run is the documented fallback for hosts without the compose plugin, constructible deterministically from the manifest fields.

Pros/cons that drove the call — the user's stated priorities were simplicity, maintainability, and minimal drift as Docker evolves:

| Aspect | `docker compose` | `docker run` |
|---|---|---|
| Declarativeness | One reviewable `compose.yml` file; diffs read like config | Configuration lives in code (flags assembled per call) |
| Drift over time | Compose v2 schema is stable; Docker has committed to it long-term (v1 was retired in 2023) | CLI flags are the most stable Docker surface, with roughly 10 years of backward compatibility |
| Set-and-forget | Yes: edit the file, restart the container, done | Partially: flag changes touch m-cli code |
| Dependency surface | Requires `docker compose` plugin (bundled with modern Docker since 2022) | Only requires `docker` CLI |
| Multi-container readiness | Trivial (add a service) | Manual orchestration |
| Maintainability across the m-* repos | The compose file lives in m-test-engine and every consumer points at the same one | Every consumer assembles its own flag list, so there is divergence risk |
| Failure mode visibility | Compose surfaces healthchecks, depends_on, restart-policy out of the box | Has to be re-implemented per call site |

Why compose wins for this project: m-test-engine already ships compose.yml as its canonical contract; pointing every consumer at that same file means the configuration lives in one place and edits propagate without coordinated multi-repo changes. Compose v2 ships with Docker Desktop and is installed alongside Docker Engine by Docker's official Linux packages, so the "prerequisite" cost is close to zero on most hosts that already have Docker. Compose's schema has been stable since the v2 rewrite (2022); the deprecations that do happen (e.g. the version: top-level field) are non-breaking warnings.

Why docker run stays in the fallback slot: minimal CI runners and older Linux distros sometimes ship Docker Engine without the compose plugin. The manifest carries enough fields (image, container, bind_mount, env vars) to reconstruct an equivalent docker run invocation; m-cli detects compose-plugin absence and falls back transparently.

Set-and-forget guarantee: the manifest declares both compose_file: docker/compose.yml and a run_args block. m-cli prefers compose; on docker compose version failure it constructs the equivalent docker run from run_args and the rest of the manifest. Either path produces an identically-named, identically-mounted container.
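
The compose-first, run-fallback selection can be sketched as follows. This is an illustration under stated assumptions, not m-cli's implementation: the function names are hypothetical, `run_args` is assumed to be a flat list of extra `docker run` flags (the plan does not fix its shape), and the host path is assumed to be already expanded.

```python
import subprocess


def compose_available() -> bool:
    """True when `docker compose version` exits 0, i.e. the plugin is installed."""
    try:
        result = subprocess.run(
            ["docker", "compose", "version"],
            capture_output=True,
            check=False,
        )
    except OSError:  # docker CLI itself is missing or not executable
        return False
    return result.returncode == 0


def start_command(manifest: dict, host_dir: str, use_compose: bool) -> list[str]:
    """Build the argv for starting the engine from manifest fields only."""
    if use_compose:
        return ["docker", "compose", "-f", manifest["compose_file"], "up", "-d"]
    mount = manifest["bind_mount"]
    image = f'{manifest["image"]}:{manifest["default_tag"]}'
    return [
        "docker", "run", "-d",
        "--name", manifest["container"],
        "-v", f'{host_dir}:{mount["container"]}:{mount["mode"]}',
        *manifest.get("run_args", []),  # assumption: flat list of extra flags
        image,
    ]
```

A caller would pass `use_compose=compose_available()`. Keeping the argv construction separate from the probe makes both paths testable without a Docker daemon.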

1.4 Bind-mount semantics — shared host `$HOME/m-work` directory

Decision: a single, shared host directory at $HOME/m-work is bind-mounted into the container at /m-work. All m-* repos that participate (m-cli, m-stdlib, m-test-engine itself, future m-* projects) are checked out or symlinked under $HOME/m-work/, e.g.:

$HOME/m-work/
├── m-cli/
├── m-stdlib/
├── m-test-engine/
├── m-modern-corpus/
└── ...

The container sees the same layout under /m-work/. ydb_routines is configured (inside the container) to include the relevant routine subdirs across all participating repos, so routines from m-stdlib are callable from m-cli tests without re-mounting or restarting.

Why $HOME, not a root-peer path: this is a single-user home-server environment; rooting host paths under $HOME avoids sudo for directory creation and generalises across users. Container-side paths stay absolute (/m-work) because they're the public cross-repo contract — only the host side moves.
Why this shape: every m-* repo provides distinct capabilities (m-stdlib provides ^STDASSERT / ^STDJSON / ^STDREGEX; m-cli provides linting/formatting/runner; m-modern-corpus provides calibration M source). They must coexist in the running engine to be useful. A per-cwd /work mount silos them and forces "one container per project" — which contradicts the canonical-runtime model.
Manifest field:
```
"bind_mount": {
  "host":      "$HOME/m-work",
  "container": "/m-work",
  "mode":      "rw"
}
```
(was: "bind_mount": "/work" — a single string. Promoted to an object to carry host/container/mode. Host side later moved from /m-work to $HOME/m-work per workspace convention; consumers expand $HOME at runtime.)
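
As a rough illustration of "consumers expand $HOME at runtime", the sketch below turns the bind_mount object into the value for a `docker run -v` flag. The function name is hypothetical; only the three field names come from the manifest.

```python
import os


def volume_flag(bind_mount: dict) -> str:
    """Expand $HOME in the host path and format a `docker run -v` value."""
    host = os.path.expandvars(bind_mount["host"])
    return f'{host}:{bind_mount["container"]}:{bind_mount["mode"]}'
```

With `HOME=/home/alice`, the manifest object shown earlier in this section yields `/home/alice/m-work:/m-work:rw`.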

m engine start precondition: $HOME/m-work must exist on the host. If absent, m-cli prints an actionable hint:

✗ host directory $HOME/m-work does not exist
    fix: mkdir -p "$HOME/m-work"
         cd "$HOME/m-work" && git clone https://github.com/m-dev-tools/m-cli
         cd "$HOME/m-work" && git clone https://github.com/m-dev-tools/m-stdlib

m-cli implications: m-cli's engine.py DockerEngine constructor loses its per-instance bind_root arg in favour of the manifest's shared mount. Engine discovery (detect_engine) becomes a singleton per host, not per cwd.
Migration note for existing dev setups: anyone with a working /work-mounted setup needs a one-time move to $HOME/m-work. m-cli's m doctor detects the legacy mount and emits a migration hint (✗ legacy /work mount detected — see docs/migration-to-m-work.md).

1.5 Protocol version bump policy — semver-style, with explicit rules

The protocol field in dist/m-test-engine.json is a single integer that m-cli treats as a compatibility handshake. Question 5 in the research doc was left open ("advise on impact"). Here is the recommendation and the policy that follows from it.

Impact of getting the policy wrong:

Bumping too aggressively — every minor manifest change forces every consumer to upgrade. m doctor starts firing "protocol mismatch" warnings during normal release cycles, users develop alarm-fatigue, and the field becomes ignored noise.
Bumping too conservatively — silent contract drift. A field's semantics change but the protocol number doesn't move, so m-cli keeps using the old interpretation and behaves wrongly. This is the more dangerous failure mode because it manifests as inscrutable bugs rather than visible warnings.

Policy (additive-by-default, strict on semantics):

| Change | Bump `protocol`? |
|---|---|
| New optional field added | No |
| New required field added | Yes |
| Field renamed | Yes |
| Field removed | Yes |
| Field's type changes (string → object, etc.) | Yes |
| Field's semantics change (same name, new meaning) | Yes |
| Default value of an optional field changes | No (document in release notes) |
| New enum value added to an existing enum field | No, provided consumers tolerate unknown values |
| Enum value removed or repurposed | Yes |
| Documentation / comment / typo fix | No |

Consumer rules (m-cli, future drivers):

m-cli must tolerate unknown fields in the manifest. Future additive evolution stays unblocked.
m-cli must reject a manifest whose protocol is higher than the highest version it understands, with a clear "upgrade m-cli" hint.
m-cli may warn when protocol is lower than expected (consumer is newer than the manifest); behaviour is best-effort.
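
The three consumer rules reduce to a small comparison. A minimal sketch, assuming the consumer tracks the highest protocol it understands in a constant; the constant, the function name, and the message wording are hypothetical:

```python
SUPPORTED_PROTOCOL = 1  # highest manifest protocol this consumer understands


def check_protocol(manifest: dict, supported: int = SUPPORTED_PROTOCOL) -> tuple[str, str]:
    """Classify a manifest as ("ok" | "warn" | "reject", message)."""
    protocol = manifest["protocol"]  # unknown extra fields are simply ignored
    if protocol > supported:
        return "reject", f"manifest protocol {protocol} > supported {supported}: upgrade m-cli"
    if protocol < supported:
        return "warn", f"manifest protocol {protocol} < supported {supported}: best-effort"
    return "ok", ""
```

Reading only the keys it needs is what lets the consumer tolerate fields added by later, additive manifest revisions.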

Expected cadence: bumps are rare. Realistic expectation is one bump per 12–24 months. Most evolution will be additive.

Initial state: protocol: 1 ships with Phase 1.

1.6 `EngineDriver` entry-point group name — `m_cli_engines`

Confirmed. Short, consistent with m_cli.plugins (the existing entry-point group name), reads naturally as "m-cli engines".

Underscore-separated to match Python entry-point conventions.
Locked as part of PLUGIN_API_VERSION = 1 once Phase 2 ships.

2. Phased rollout

The research doc proposed five phases; this plan keeps that shape but specifies the exit criteria, owners, and the cross-repo coordination required for each.

Phase 1 — vendored manifest + actionable `m doctor`

Goal: ship the manifest from this repo, vendor it into m-cli, and rewrite m doctor's Docker-path hints to consume it. No new subcommands, no Docker image changes.

Deliverables in m-test-engine:

dist/m-test-engine.json: hand-authored, validated against a JSON Schema at dist/m-test-engine.schema.json. Fields exactly as decided above (image, default_tag, container, bind_mount object, compose_file, repo_url, min_docker, ydb_version, protocol, run_args), plus the verified_on date that make check-manifest asserts on.
make check-manifest — schema-validates dist/m-test-engine.json, asserts verified_on is within 90 days, and asserts the referenced compose_file path exists.
README pointer to the manifest as the public machine-readable contract.
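
The two non-schema assertions in make check-manifest can be sketched in a few lines. This assumes verified_on is an ISO 8601 date string (the plan does not state its format) and leaves schema validation out, since that would need a third-party JSON Schema library; the function name is hypothetical.

```python
from datetime import date, timedelta
from pathlib import Path


def check_manifest(manifest: dict, repo_root: Path, today: date, max_age_days: int = 90) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    verified = date.fromisoformat(manifest["verified_on"])  # assumption: ISO date
    if today - verified > timedelta(days=max_age_days):
        problems.append(f"verified_on {verified} is older than {max_age_days} days")
    if not (repo_root / manifest["compose_file"]).is_file():
        problems.append(f"compose_file {manifest['compose_file']} does not exist")
    return problems
```

Passing `today` explicitly keeps the freshness check deterministic in tests.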

Deliverables in m-cli:

dist/m-test-engine.json vendored from this repo at a pinned tag.
m doctor rewritten so every WARN in the Docker engine path quotes the exact docker pull / docker compose -f <path> up -d / docker exec m-test-engine ... command derived from the manifest.
m doctor --json schema extended with fix.command: [...] and fix.destructive: bool per check (lays groundwork for autonomous agents).
Root-cause grouping: prerequisite-failed checks downstream report SKIPPED rather than running and producing secondary failures.
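
One way to picture the typed fixes and the root-cause grouping together is the sketch below. The check names, the `Check` dataclass, and the exact JSON layout are illustrative assumptions; only the fix.command and fix.destructive fields and the SKIPPED status come from the plan.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    name: str
    passed: bool
    requires: list = field(default_factory=list)     # names of prerequisite checks
    fix_command: list = field(default_factory=list)  # argv that would resolve a WARN
    destructive: bool = False


def report(checks: list) -> list:
    """Evaluate checks in order; skip any whose prerequisite did not pass."""
    status = {}
    out = []
    for c in checks:
        if any(status.get(r) != "OK" for r in c.requires):
            status[c.name] = "SKIPPED"
            out.append({"name": c.name, "status": "SKIPPED"})
            continue
        status[c.name] = "OK" if c.passed else "WARN"
        entry = {"name": c.name, "status": status[c.name]}
        if not c.passed:
            entry["fix"] = {"command": c.fix_command, "destructive": c.destructive}
        out.append(entry)
    return out
```

Because a skipped check never reaches its pass/fail evaluation, a missing image produces one WARN with one fix rather than a cascade of secondary failures.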

Exit criteria:

m doctor on a fresh Mac with Docker installed and no m-test-engine pulled prints a four-line fix recipe that, when run verbatim, resolves every WARN.
m doctor --json validates against the new schema.
m-cli's make check-manifest catches drift from the upstream manifest.

Duration: 1–2 days of focused work. Phase 1 is independent of every later phase and is the single highest-leverage delivery.

Phase 2 — `m engine` subcommand family in m-cli core

Goal: turn the WARN hints from Phase 1 into commands that actually exist. m doctor --fix becomes safe and idempotent.

Deliverables in m-cli:

New subcommand tree under src/m_cli/engine/:
- m engine status (text + --json)
- m engine install
- m engine start
- m engine stop / restart
- m engine logs [--follow]
- m engine shell
- m engine exec '<m-cmd>'
- m engine version
- m engine upgrade
- m engine reset --confirm (destructive, opt-in)
- m engine capabilities --json (mirrors top-level m capabilities)
EngineDriver protocol exported as a public API; built-in DockerDriver is the only registered driver.
m_cli_engines Python entry-point group declared and documented; no out-of-tree drivers yet but the seam exists.
m doctor --fix delegates to m engine <verb> for every fixable WARN; refuses to run destructive verbs without explicit --confirm.
dist/commands.json auto-grows to include the engine namespace (downstream agents pick it up for free).

Exit criteria:

m engine status --json is the canonical health check; m doctor's runtime section becomes a thin facade over it.
All m engine <verb> calls construct their docker / docker compose invocations from the manifest — no hard-coded image names or paths in Python.
m doctor --fix on a fresh Mac with Docker installed runs to a green state without manual intervention.

Duration: 3–5 days.

Phase 3 — OCI labels + `HEALTHCHECK` (m-test-engine side)

Goal: make the image self-describing once pulled, so m-cli can do version-mismatch detection and m engine status can report real Docker health.

Deliverables in m-test-engine:

Dockerfile adds:

LABEL org.m-dev-tools.m-test-engine.protocol="1"
LABEL org.m-dev-tools.m-test-engine.bind-mount="/m-work"
LABEL org.m-dev-tools.m-test-engine.ydb-version="r2.02"
LABEL org.m-dev-tools.m-test-engine.image-rev="<git-sha>"
HEALTHCHECK CMD $ydb_dist/mumps -run %XCMD 'write "ok",!' || exit 1

make smoke extended to verify the label set and healthcheck presence.
Release process documents the image-rev propagation (docker buildx build --build-arg GIT_SHA=$(git rev-parse HEAD)).

Deliverables in m-cli:

m engine status reads docker image inspect and surfaces protocol_mismatch / image_outdated warnings derived from label comparisons against the vendored manifest.
m engine version prints both the manifest-declared expectation and the image-reported actual.
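
The label comparison could be as small as the sketch below. It assumes the labels have already been read into a dict (for instance from `docker image inspect --format '{{json .Config.Labels}}' <image>`), and the mapping of a ydb-version difference to image_outdated is an assumption made for illustration; the plan does not specify which label drives which warning.

```python
PREFIX = "org.m-dev-tools.m-test-engine."


def label_warnings(labels: dict, manifest: dict) -> list[str]:
    """Compare image labels with the vendored manifest and name any mismatches."""
    warnings = []
    # Label values are always strings, so the integer protocol is stringified.
    if labels.get(PREFIX + "protocol") != str(manifest["protocol"]):
        warnings.append("protocol_mismatch")
    # Assumption: a differing ydb-version label means the pulled image is stale.
    if labels.get(PREFIX + "ydb-version") != manifest["ydb_version"]:
        warnings.append("image_outdated")
    return warnings
```

An image without any of the labels (built before Phase 3) reports both warnings, which is the conservative outcome.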

Exit criteria:

An intentionally-mismatched image (older tag pulled, newer manifest vendored) produces a clear "run m engine upgrade" WARN.
docker inspect --format '{{.State.Health.Status}}' m-test-engine returns healthy after m engine start completes.

Duration: 1–2 days, mostly on the m-test-engine side; m-cli side is small once Phase 2's status infrastructure is in place.

Phase 4 — `mte` container-side introspection

Goal: structured, rich introspection from inside the container, so m engine status --verbose reports more than just "running / healthy".

Deliverables in m-test-engine:

mte shell script (or compact M routine) on $PATH inside the container. mte status --json prints:

{
  "ok": true,
  "ydb_dist": "/opt/yottadb/r2.02",
  "release": "r2.02",
  "uptime_s": 1234,
  "globals_count": 17,
  "routines_count": 412,
  "mounted_repos": ["m-cli", "m-stdlib", "m-modern-corpus"]
}

Tests in make smoke assert mte status --json produces valid JSON.

Deliverables in m-cli:

m engine status --verbose runs docker exec m-test-engine mte status --json and folds the output into its report.
m engine watch --interval 5s streams mte status --json lines for live monitoring (TAP-like format for CI; JSON-lines for tools).
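
Folding the mte output into a readiness answer might look like this. It is a sketch only: the function name is hypothetical, and it relies on nothing beyond the ok and mounted_repos fields from the example payload above.

```python
import json


def repo_ready(mte_output: str, repo: str) -> bool:
    """True when the engine reports ok and the named repo is mounted."""
    status = json.loads(mte_output)
    return bool(status.get("ok")) and repo in status.get("mounted_repos", [])
```

The input string would be the stdout of `docker exec m-test-engine mte status --json`; parsing it on the m-cli side keeps the container-side script free of any consumer-specific logic.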

Exit criteria:

m engine status --verbose on a healthy container shows mounted repos, routine count, uptime — answering "is the engine ready for my repo's tests?" not just "is it up?".

Duration: 2–3 days.

Phase 5 — Skill / MCP integration

Goal: extend the existing manifest-driven AI-discoverability stance to the engine namespace, so Claude Code and other agents bootstrap m-* projects without bespoke instructions.

Deliverables:

Auto-generated ~/.claude/skills/m-engine/SKILL.md driven by dist/m-test-engine.json + the engine slice of m-cli's dist/commands.json (make skill-install target in this repo, mirroring m-stdlib's existing pattern).

dist/m-test-engine.json gains a verbs section declaring which m engine <verb> commands are safe for autonomous execution vs. require --confirm:

"verbs": {
  "status":  { "destructive": false, "read_only": true },
  "start":   { "destructive": false, "read_only": false },
  "reset":   { "destructive": true,  "requires_confirm": true },
  ...
}

Optional: m-cli MCP server registers the safe verbs as MCP tools so Claude Code can drive the engine natively without shelling out.
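
A harness-side gate over the verbs section could be sketched as follows. The function name and the policy of refusing unknown verbs are assumptions; the field names come from the manifest fragment above.

```python
def may_run(verb: str, verbs: dict, confirmed: bool = False) -> bool:
    """Decide whether an agent may run `m engine <verb>`.

    Unknown verbs are refused. Verbs that are destructive or marked
    requires_confirm run only when the caller supplied confirmation.
    """
    spec = verbs.get(verb)
    if spec is None:
        return False
    needs_confirm = spec.get("destructive", False) or spec.get("requires_confirm", False)
    return confirmed or not needs_confirm
```

With the three verbs listed above, status and start pass unconditionally, while reset passes only with `confirmed=True`.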

Exit criteria:

A fresh Claude Code session in any m-* repo auto-loads the m-engine skill and offers m engine install / start / status as actions.
The verb-safety classification gates destructive operations at the agent-harness layer, not at the level of prose warnings aimed at humans.

Duration: 2–3 days, parallelisable with Phase 4.

3. Risks and mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Docker compose v2 schema deprecation breaks `compose.yml` mid-cycle | Low | Medium | The `run_args` fallback in the manifest reconstructs `docker run`. Schema deprecations in v2 have been non-breaking warnings; we'd notice via `make smoke` long before users do. |
| `$HOME/m-work` migration friction for existing `/work`-mounted devs | Medium | Low | `m doctor` detects legacy `/work`, prints a one-step `mv` / re-symlink hint. Document in `docs/migration-to-m-work.md`. Only relevant for current maintainers; new users land directly on `$HOME/m-work`. |
| Manifest drift between m-test-engine and m-cli's vendored copy | Medium | High | m-cli's `make check-manifest` byte-compares against the pinned upstream tag; CI gate. Vendoring pin recorded in m-cli's `dist/repo.meta.json` `dependencies` block. |
| GHCR rate-limits or outages | Low | Medium | Anonymous pulls are 1000/hr per IP, well above realistic dev usage. For CI, document GHCR token auth as an opt-in. |
| Protocol bump churn surprises users | Low | Medium | Policy in §1.5 is conservative-by-default; expected cadence is 12–24 months. Every bump documented in `CHANGELOG.md` with a migration recipe. |
| m-cli grows into "yet another Docker orchestrator" | Medium | Medium | Scope discipline: `m engine` shells out to `docker` / `docker compose`; it does not reimplement them. Anything beyond start/stop/exec/status belongs in compose, not in Python. The `EngineDriver` seam keeps the door open for non-Docker engines without bloating core. |
| `mte` introspection script leaks YDB internals or PII | Low | Low | Output is structured JSON with a fixed allowlist (no `$ZGBLDIR`, no env dump). Phase 4 ships with a schema for `mte status --json` and tests pin the field set. |
| Bind-mount of host `$HOME/m-work` exposes too much filesystem to the container | Low | Low | `$HOME/m-work` is a user-controlled directory containing only m-* repos. Mount mode is `rw` (consumers need to write build artifacts). Document the security model in README. |

4. Benefits realised

Mapping back to the research doc's framing: what does each phase unblock, relative to the status quo, once it lands?

| Benefit | Phase that delivers it |
|---|---|
| `m doctor` produces actionable, copy-pasteable fix recipes | 1 |
| AI agents bootstrap m-* projects from `dist/commands.json` alone | 1 (manifest) + 2 (`m engine` in `commands.json`) |
| Version-mismatch detection between image and m-cli | 3 |
| Single shared engine across all m-* repos via `$HOME/m-work` (host) → `/m-work` (container) | 1 (manifest) + 2 (start command) |
| `m doctor --fix` autonomous-execution safe | 2 (typed fixes) + 5 (verb safety classes) |
| Continuous health monitoring (`m engine watch`) | 4 |
| Out-of-tree engines (IRIS, podman) without forking core | 2 (`m_cli_engines` entry point) |

5. Cross-repo coordination

Phase ordering reflects dependency between this repo and m-cli:

this repo (m-test-engine)        m-cli
─────────────────────────        ─────
Phase 1a: ship manifest    ───►  Phase 1b: vendor + rewrite doctor
                                  Phase 2:  m engine subcommand family
Phase 3a: labels + healthcheck ─► Phase 3b: status reads labels
Phase 4a: mte introspection  ───► Phase 4b: status --verbose
                                  Phase 5:  skill + MCP

Phase 1a (this repo) is the only blocker for Phase 1b (m-cli). After that, m-cli can iterate independently through Phase 2 without further changes here. Phases 3 and 4 require small coordinated bumps but neither breaks any earlier deliverable.

6. Out of scope

Explicitly not part of this plan:

IRIS engine support (the m_cli_engines entry-point seam admits it later, but no IRIS driver ships in core).
Podman as a Docker drop-in (same — seam exists, driver doesn't).
Multi-arch image (arm64) — added when arm64 consumers materialise.
SSH transport changes — SSHEngine remains the legacy maintainer path; not modified by this plan.
VistA-specific extras inside the container (no FileMan, no Kernel — m-test-engine's existing guardrail stands).
m-cli replacing docker / docker compose with a Python Docker SDK. Shell-out keeps the dependency surface minimal and the behaviour trivially auditable.

7. Success metric

A new contributor on a fresh laptop, after git clone m-cli, runs:

m doctor --fix
m test

…and sees a green test suite without reading any documentation, without manually pulling images, and without setting environment variables. That is the bar Phase 2 must clear. Every phase before contributes to it; every phase after polishes it for agents.
