created 2026-05-11
last_modified 2026-05-11
revisions 0
doc_type PLAN, DESIGN
supersedes none
informs m-cli-integration-research.md

m engine — implementation plan

Status: design-approved, ready for Phase 1 implementation.

Scope: the m engine subcommand family in m-cli core, the dist/m-test-engine.json manifest published by this repo, and the companion changes to m-cli's m doctor that consume it.

Predecessor: docs/m-cli-integration-research.md captures the rationale, ecosystem comparisons, and option analysis that led to the decisions below. This plan does not restate that material; it records the what and when, with the decisions section pinning every open question.


1. Decisions on open questions

Each entry below corresponds to a numbered open question in m-cli-integration-research.md §5. Recording them here makes the plan self-contained and audit-friendly for future contributors.

1.1 Source of truth for dist/m-test-engine.json: Option A

The manifest is authored and versioned in this repo (m-test-engine) and vendored into m-cli/dist/m-test-engine.json at m-cli release time.

  • Why: m-cli already has a vendoring discipline (dist/repo.meta.json, dist/commands.json, etc.); reusing the same pattern avoids inventing a second metadata-package release pipeline. m-cli releases ride on their own cadence and pull in the latest published manifest.
  • Mechanic: m-cli's make manifest target gains a step that copies m-test-engine/dist/m-test-engine.json from a pinned tag (recorded in m-cli's lockfile or dist/repo.meta.json dependencies block). No network fetch at runtime — vendoring is a build-time artifact.
  • Drift gate: m-cli's make check-manifest asserts the vendored copy byte-matches the pinned upstream tag.
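
The drift gate above amounts to a byte-for-byte comparison. The following is a minimal sketch, not m-cli's actual make check-manifest implementation: the function names are hypothetical, and it assumes the pinned upstream copy has already been materialised as a local file (for example by exporting it from the pinned tag).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the raw file bytes (no JSON parsing, no normalisation)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def manifests_match(vendored: Path, upstream: Path) -> bool:
    """Hypothetical helper: True only when both files are byte-identical.

    Comparing raw bytes means that whitespace or key-order changes count
    as drift, which is the stricter reading of "byte-matches".
    """
    return file_digest(vendored) == file_digest(upstream)
```

A Makefile target could call this and exit non-zero on a mismatch; hashing rather than comparing parsed JSON is a deliberate choice so that even cosmetic edits to the vendored copy are caught.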

1.2 Canonical image registry — ghcr.io/m-dev-tools/m-test-engine

Confirmed. GHCR matches the org domain on GitHub, supports anonymous pulls, and inherits org-level access control. No separate Docker Hub account or rate-limit footprint to manage.

  • First published tag: :r2.02 (matches the current ydb_version in the manifest sketch).
  • Floating tag: :latest tracks the most recent stable r-release.
  • Multi-arch: linux/amd64 only initially; arm64 added when Mac-on-arm consumers materialise (build matrix already supports it via docker buildx).

1.3 Compose vs. docker run: compose-first, run-fallback

Decision: m engine start shells out to docker compose -f <discovered> as the primary path. Plain docker run is the documented fallback for hosts without the compose plugin, constructible deterministically from the manifest fields.

Pros/cons that drove the call — the user's stated priorities were simplicity, maintainability, and minimal drift as Docker evolves:

| Aspect | docker compose | docker run |
|---|---|---|
| Declarativeness | One reviewable compose.yml file; diffs read like config | Configuration lives in code (flags assembled per call) |
| Drift over time | Compose v2 schema is stable; Docker has committed to it long-term (v1 was retired 2023) | CLI flags are the most stable Docker surface, ~10 years backward-compatible |
| Set-and-forget | Yes: edit the file, restart the container, done | Partially: flag changes touch m-cli code |
| Dependency surface | Requires docker compose plugin (bundled with modern Docker since 2022) | Only requires docker CLI |
| Multi-container readiness | Trivial (add a service) | Manual orchestration |
| Maintainability across the m-* repos | The compose file lives in m-test-engine and every consumer points at the same one | Every consumer assembles its own flag list (divergence risk) |
| Failure mode visibility | Compose surfaces healthchecks, depends_on, restart-policy out of the box | Has to be re-implemented per call site |

Why compose wins for this project: m-test-engine already ships compose.yml as its canonical contract; pointing every consumer at that same file means the configuration lives in one place and edits propagate without coordinated multi-repo changes. Compose v2 ships with Docker Desktop and is installed alongside Docker Engine by Docker's official Linux packages (as the docker-compose-plugin package), so the "prerequisite" cost is approximately zero on most hosts that have Docker in the first place. Compose's schema has been stable since the v2 rewrite (2022); the deprecations that do happen (e.g. the version: top-level field) are non-breaking warnings.

Why docker run stays in the fallback slot: minimal CI runners and older Linux distros sometimes ship Docker Engine without the compose plugin. The manifest carries enough fields (image, container, bind_mount, env vars) to reconstruct an equivalent docker run invocation; m-cli detects compose-plugin absence and falls back transparently.

Set-and-forget guarantee: the manifest declares both compose_file: docker/compose.yml and a run_args block. m-cli prefers compose; on docker compose version failure it constructs the equivalent docker run from run_args and the rest of the manifest. Either path produces an identically-named, identically-mounted container.
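
The compose-first, run-fallback selection can be sketched as follows. This is an illustration under assumptions, not m-cli's code: the function names are hypothetical, and run_args is assumed to be a flat list of extra docker run flags, since this plan does not specify its shape.

```python
import shutil
import subprocess


def compose_available() -> bool:
    """True when `docker compose version` exits 0 on this host."""
    if shutil.which("docker") is None:
        return False
    probe = subprocess.run(
        ["docker", "compose", "version"], capture_output=True, check=False
    )
    return probe.returncode == 0


def start_argv(manifest: dict, use_compose: bool, home: str) -> list[str]:
    """Build the argv for starting the engine from manifest fields only."""
    if use_compose:
        return ["docker", "compose", "-f", manifest["compose_file"], "up", "-d"]
    mount = manifest["bind_mount"]
    host = mount["host"].replace("$HOME", home)  # consumers expand $HOME at runtime
    return [
        "docker", "run", "-d",
        "--name", manifest["container"],
        "-v", f"{host}:{mount['container']}:{mount['mode']}",
        *manifest.get("run_args", []),  # assumed shape: list of flag strings
        f"{manifest['image']}:{manifest['default_tag']}",
    ]
```

A caller would pass `use_compose=compose_available()`. Because both branches read the container name and mount from the same manifest, either path should yield the identically-named, identically-mounted container described above, provided compose.yml and the manifest agree.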

1.4 Bind-mount semantics — shared host $HOME/m-work directory

Decision: a single, shared host directory at $HOME/m-work is bind-mounted into the container at /m-work. All m-* repos that participate (m-cli, m-stdlib, m-test-engine itself, future m-* projects) are checked out or symlinked under $HOME/m-work/, e.g.:

$HOME/m-work/
├── m-cli/
├── m-stdlib/
├── m-test-engine/
├── m-modern-corpus/
└── ...

The container sees the same layout under /m-work/. ydb_routines is configured (inside the container) to include the relevant routine subdirs across all participating repos, so routines from m-stdlib are callable from m-cli tests without re-mounting or restarting.

  • Why $HOME, not a root-peer path: this is a single-user home-server environment; rooting host paths under $HOME avoids sudo for directory creation and generalises across users. Container-side paths stay absolute (/m-work) because they're the public cross-repo contract — only the host side moves.
  • Why this shape: every m-* repo provides distinct capabilities (m-stdlib provides ^STDASSERT / ^STDJSON / ^STDREGEX; m-cli provides linting/formatting/runner; m-modern-corpus provides calibration M source). They must coexist in the running engine to be useful. A per-cwd /work mount silos them and forces "one container per project" — which contradicts the canonical-runtime model.
  • Manifest field:
    "bind_mount": {
      "host":      "$HOME/m-work",
      "container": "/m-work",
      "mode":      "rw"
    }
    (was: "bind_mount": "/work" — a single string. Promoted to an object to carry host/container/mode. Host side later moved from /m-work to $HOME/m-work per workspace convention; consumers expand $HOME at runtime.)
  • m engine start precondition: $HOME/m-work must exist on the host. If absent, m-cli prints an actionable hint:
    ✗ host directory $HOME/m-work does not exist
        fix: mkdir -p "$HOME/m-work"
             cd "$HOME/m-work" && git clone https://github.com/m-dev-tools/m-cli
             cd "$HOME/m-work" && git clone https://github.com/m-dev-tools/m-stdlib
    
  • m-cli implications: m-cli's engine.py DockerEngine constructor loses its per-instance bind_root arg in favour of the manifest's shared mount. Engine discovery (detect_engine) becomes a singleton per host, not per cwd.
  • Migration note for existing dev setups: anyone with a working /work-mounted setup needs a one-time move to $HOME/m-work. m-cli's m doctor detects the legacy mount and emits a migration hint (✗ legacy /work mount detected — see docs/migration-to-m-work.md).
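
The $HOME expansion and the m engine start precondition can be sketched like this. The helper names are hypothetical and the hint text is abbreviated from the example above; the environment is passed in explicitly so the logic is testable without touching the real home directory.

```python
from pathlib import Path
from string import Template
from typing import Optional


def resolve_host_mount(bind_mount: dict, env: dict) -> Path:
    """Expand $HOME (and any other $VAR) in the manifest's host path."""
    return Path(Template(bind_mount["host"]).substitute(env))


def mount_precondition_hint(bind_mount: dict, env: dict) -> Optional[str]:
    """Return None when the host directory exists, else an actionable hint."""
    if resolve_host_mount(bind_mount, env).is_dir():
        return None
    unexpanded = bind_mount["host"]  # keep $HOME literal so the hint is copy-pasteable
    return (
        f"✗ host directory {unexpanded} does not exist\n"
        f'    fix: mkdir -p "{unexpanded}"'
    )
```

In real use `env` would be `os.environ`; `Template.substitute` raises `KeyError` if a referenced variable is unset, which surfaces a missing HOME early instead of silently mounting a wrong path.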

1.5 Protocol version bump policy — semver-style, with explicit rules

The protocol field in dist/m-test-engine.json is a single integer that m-cli treats as a compatibility handshake. Question 5 in the research doc was left open ("advise on impact"). Here is the recommendation and the policy that follows from it.

Impact of getting the policy wrong:

  • Bumping too aggressively — every minor manifest change forces every consumer to upgrade. m doctor starts firing "protocol mismatch" warnings during normal release cycles, users develop alarm-fatigue, and the field becomes ignored noise.
  • Bumping too conservatively — silent contract drift. A field's semantics change but the protocol number doesn't move, so m-cli keeps using the old interpretation and behaves wrongly. This is the more dangerous failure mode because it manifests as inscrutable bugs rather than visible warnings.

Policy (additive-by-default, strict on semantics):

| Change | Bump protocol? |
|---|---|
| New optional field added | No |
| New required field added | Yes |
| Field renamed | Yes |
| Field removed | Yes |
| Field's type changes (string → object, etc.) | Yes |
| Field's semantics change (same name, new meaning) | Yes |
| Default value of an optional field changes | No (document in release notes) |
| New enum value added to an existing enum field | No, provided consumers tolerate unknown values |
| Enum value removed or repurposed | Yes |
| Documentation / comment / typo fix | No |

Consumer rules (m-cli, future drivers):

  • m-cli must tolerate unknown fields in the manifest. Future additive evolution stays unblocked.
  • m-cli must reject a manifest whose protocol is higher than the highest version it understands, with a clear "upgrade m-cli" hint.
  • m-cli may warn when protocol is lower than expected (consumer is newer than the manifest); behaviour is best-effort.
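
The three consumer rules translate into a small handshake check. This is a sketch under assumptions: `SUPPORTED_PROTOCOL`, `ProtocolTooNew`, and `check_protocol` are hypothetical names, not m-cli's API.

```python
SUPPORTED_PROTOCOL = 1  # assumption: highest protocol this consumer understands


class ProtocolTooNew(Exception):
    """The manifest speaks a newer protocol than this consumer."""


def check_protocol(manifest: dict, supported: int = SUPPORTED_PROTOCOL) -> list[str]:
    """Reject newer manifests, warn on older ones, ignore unknown fields."""
    protocol = manifest["protocol"]  # only the fields we need are read
    if protocol > supported:
        raise ProtocolTooNew(
            f"manifest protocol {protocol} > supported {supported}: upgrade m-cli"
        )
    warnings = []
    if protocol < supported:
        warnings.append(
            f"manifest protocol {protocol} is older than {supported}; "
            "continuing on a best-effort basis"
        )
    return warnings
```

Tolerance of unknown fields falls out of reading the manifest by key rather than validating it against a closed set of field names.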

Expected cadence: bumps are rare. Realistic expectation is one bump per 12–24 months. Most evolution will be additive.

Initial state: protocol: 1 ships with Phase 1.

1.6 EngineDriver entry-point group name — m_cli_engines

Confirmed. Short, consistent with m_cli.plugins (the existing entry-point group name), reads naturally as "m-cli engines".

  • Underscore-separated to match Python entry-point conventions.
  • Locked as part of PLUGIN_API_VERSION = 1 once Phase 2 ships.

2. Phased rollout

The research doc proposed five phases; this plan keeps that shape but specifies the exit criteria, owners, and the cross-repo coordination required for each.

Phase 1 — vendored manifest + actionable m doctor

Goal: ship the manifest from this repo, vendor it into m-cli, and rewrite m doctor's Docker-path hints to consume it. No new subcommands, no Docker image changes.

Deliverables in m-test-engine:

  • dist/m-test-engine.json, hand-authored and validated against a JSON Schema at dist/m-test-engine.schema.json. Fields exactly as decided above (image, default_tag, container, bind_mount object, compose_file, repo_url, min_docker, ydb_version, protocol, run_args), plus the verified_on date that make check-manifest checks.
  • make check-manifest — schema-validates dist/m-test-engine.json, asserts verified_on is within 90 days, and asserts the referenced compose_file path exists.
  • README pointer to the manifest as the public machine-readable contract.
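
The 90-day freshness assertion in make check-manifest can be sketched as below. It assumes verified_on is an ISO 8601 date string (YYYY-MM-DD), which this plan does not state, and the function name is hypothetical. Rejecting future dates is an additional assumption, added so a typo in the year cannot silently pass.

```python
from datetime import date, timedelta


def verified_on_is_fresh(manifest: dict, today: date, max_age_days: int = 90) -> bool:
    """True when verified_on lies within the last max_age_days days."""
    verified = date.fromisoformat(manifest["verified_on"])  # assumed ISO format
    age = today - verified
    # A negative age means the date is in the future, which is treated as invalid.
    return timedelta(0) <= age <= timedelta(days=max_age_days)
```

Passing `today` as a parameter rather than calling `date.today()` inside keeps the check deterministic in tests.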

Deliverables in m-cli:

  • dist/m-test-engine.json vendored from this repo at a pinned tag.
  • m doctor rewritten so every WARN in the Docker engine path quotes the exact docker pull / docker compose -f <path> up -d / docker exec m-test-engine ... command derived from the manifest.
  • m doctor --json schema extended with fix.command: [...] and fix.destructive: bool per check (lays groundwork for autonomous agents).
  • Root-cause grouping: prerequisite-failed checks downstream report SKIPPED rather than running and producing secondary failures.
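
Root-cause grouping and the per-check fix object can be illustrated together. The check-definition shape (`name`, `requires`, `run`, `fix`) and the `blocked_by` key are assumptions made for this sketch; only fix.command and fix.destructive come from the plan.

```python
def run_checks(checks: list[dict]) -> list[dict]:
    """Run checks in order; skip any check whose prerequisite is not OK."""
    results = []
    status_by_name = {}
    for check in checks:
        blocked = [
            dep for dep in check.get("requires", [])
            if status_by_name.get(dep) != "OK"
        ]
        if blocked:
            status = "SKIPPED"  # the probe is never executed
        else:
            status = "OK" if check["run"]() else "WARN"
        status_by_name[check["name"]] = status
        entry = {"name": check["name"], "status": status}
        if status == "WARN" and "fix" in check:
            entry["fix"] = check["fix"]  # {"command": [...], "destructive": bool}
        if blocked:
            entry["blocked_by"] = blocked
        results.append(entry)
    return results
```

Because a SKIPPED check also blocks its own dependents, a single missing prerequisite produces one WARN with one fix instead of a cascade of secondary failures.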

Exit criteria:

  • m doctor on a fresh Mac with Docker installed and no m-test-engine pulled prints a four-line fix recipe that, when run verbatim, resolves every WARN.
  • m doctor --json validates against the new schema.
  • m-cli's make check-manifest catches drift from the upstream manifest.

Duration: 1–2 days of focused work. Phase 1 is independent of every later phase and is the single highest-leverage delivery.


Phase 2 — m engine subcommand family in m-cli core

Goal: turn the WARN hints from Phase 1 into commands that actually exist. m doctor --fix becomes safe and idempotent.

Deliverables in m-cli:

  • New subcommand tree under src/m_cli/engine/:
    • m engine status (text + --json)
    • m engine install
    • m engine start
    • m engine stop / restart
    • m engine logs [--follow]
    • m engine shell
    • m engine exec '<m-cmd>'
    • m engine version
    • m engine upgrade
    • m engine reset --confirm (destructive, opt-in)
    • m engine capabilities --json (mirrors top-level m capabilities)
  • EngineDriver protocol exported as a public API; built-in DockerDriver is the only registered driver.
  • m_cli_engines Python entry-point group declared and documented; no out-of-tree drivers yet but the seam exists.
  • m doctor --fix delegates to m engine <verb> for every fixable WARN; refuses to run destructive verbs without explicit --confirm.
  • dist/commands.json auto-grows to include the engine namespace (downstream agents pick it up for free).
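
How m doctor --fix might turn check results into m engine invocations, while refusing destructive verbs, is sketched below. The result shape follows the fix.command / fix.destructive fields from Phase 1; the function name and the sample check names in the usage note are hypothetical.

```python
def plan_fixes(results: list[dict], confirm: bool = False) -> tuple[list, list]:
    """Split fixable WARNs into commands to run and checks that were refused.

    Destructive fixes are only scheduled when the caller passed confirm=True.
    """
    to_run, refused = [], []
    for result in results:
        fix = result.get("fix")
        if result.get("status") != "WARN" or not fix:
            continue  # OK and SKIPPED checks need no action
        if fix.get("destructive", False) and not confirm:
            refused.append(result["name"])
        else:
            to_run.append(fix["command"])
    return to_run, refused
```

For example, a WARN whose fix is `["m", "engine", "install"]` would be scheduled, whereas one whose fix is `["m", "engine", "reset", "--confirm"]` with `destructive: true` would land in the refused list unless the user opted in.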

Exit criteria:

  • m engine status --json is the canonical health check; m doctor's runtime section becomes a thin facade over it.
  • All m engine <verb> calls construct their docker / docker compose invocations from the manifest — no hard-coded image names or paths in Python.
  • m doctor --fix on a fresh Mac with Docker installed runs to a green state without manual intervention.

Duration: 3–5 days.


Phase 3 — OCI labels + HEALTHCHECK (m-test-engine side)

Goal: make the image self-describing once pulled, so m-cli can do version-mismatch detection and m engine status can report real Docker health.

Deliverables in m-test-engine:

  • Dockerfile adds:
    LABEL org.m-dev-tools.m-test-engine.protocol="1"
    LABEL org.m-dev-tools.m-test-engine.bind-mount="/m-work"
    LABEL org.m-dev-tools.m-test-engine.ydb-version="r2.02"
    LABEL org.m-dev-tools.m-test-engine.image-rev="<git-sha>"
    HEALTHCHECK CMD $ydb_dist/mumps -run %XCMD 'write "ok",!' || exit 1
  • make smoke extended to verify the label set and healthcheck presence.
  • Release process documents the image-rev propagation (docker buildx build --build-arg GIT_SHA=$(git rev-parse HEAD)).

Deliverables in m-cli:

  • m engine status reads docker image inspect and surfaces protocol_mismatch / image_outdated warnings derived from label comparisons against the vendored manifest.
  • m engine version prints both the manifest-declared expectation and the image-reported actual.
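
The label comparison behind protocol_mismatch and image_outdated might look like the following. Mapping a ydb-version difference to image_outdated is an assumption of this sketch (the plan does not say which label drives that warning), and both function names are hypothetical.

```python
import json
import subprocess

LABEL_PREFIX = "org.m-dev-tools.m-test-engine."


def read_image_labels(image_ref: str) -> dict:
    """Read OCI labels via `docker image inspect` (requires a pulled image)."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .Config.Labels}}", image_ref],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out) or {}  # an image without labels yields JSON null


def label_warnings(labels: dict, manifest: dict) -> list[str]:
    """Compare image labels against the vendored manifest."""
    warnings = []
    # Label values are always strings, so the integer protocol is stringified.
    if labels.get(LABEL_PREFIX + "protocol") != str(manifest["protocol"]):
        warnings.append("protocol_mismatch")
    if labels.get(LABEL_PREFIX + "ydb-version") != manifest["ydb_version"]:
        warnings.append("image_outdated")  # assumed mapping, see lead-in
    return warnings
```

An image built before Phase 3 carries none of these labels and therefore reports both warnings, which is arguably the right signal to suggest m engine upgrade.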

Exit criteria:

  • An intentionally-mismatched image (older tag pulled, newer manifest vendored) produces a clear "run m engine upgrade" WARN.
  • docker inspect --format '{{.State.Health.Status}}' m-test-engine returns healthy after m engine start completes.

Duration: 1–2 days, mostly on the m-test-engine side; m-cli side is small once Phase 2's status infrastructure is in place.


Phase 4 — mte container-side introspection

Goal: structured, rich introspection from inside the container, so m engine status --verbose reports more than just "running / healthy".

Deliverables in m-test-engine:

  • mte shell script (or compact M routine) on $PATH inside the container. mte status --json prints:
    {
      "ok": true,
      "ydb_dist": "/opt/yottadb/r2.02",
      "release": "r2.02",
      "uptime_s": 1234,
      "globals_count": 17,
      "routines_count": 412,
      "mounted_repos": ["m-cli", "m-stdlib", "m-modern-corpus"]
    }
  • Tests in make smoke assert mte status --json produces valid JSON.

Deliverables in m-cli:

  • m engine status --verbose runs docker exec m-test-engine mte status --json and folds the output into its report.
  • m engine watch --interval 5s streams mte status --json lines for live monitoring (TAP-like format for CI; JSON-lines for tools).
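
On the m-cli side, folding mte status --json into the report could be sketched as follows. The allowlist mirrors the sample payload above; the function names and the decision to drop (rather than reject) unexpected keys are assumptions.

```python
import json

# Field set taken from the sample `mte status --json` payload in this plan.
ALLOWED_FIELDS = {
    "ok", "ydb_dist", "release", "uptime_s",
    "globals_count", "routines_count", "mounted_repos",
}


def fold_mte_status(raw: str) -> dict:
    """Parse the container's JSON output, keeping only allowlisted fields."""
    data = json.loads(raw)
    return {key: value for key, value in data.items() if key in ALLOWED_FIELDS}


def engine_ready_for(status: dict, repo: str) -> bool:
    """Is the engine healthy and is the given repo visible under /m-work?"""
    return bool(status.get("ok")) and repo in status.get("mounted_repos", [])
```

`raw` would be the stdout of `docker exec m-test-engine mte status --json`; filtering on the consumer side adds a second line of defence should the container-side script ever emit more than intended.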

Exit criteria:

  • m engine status --verbose on a healthy container shows mounted repos, routine count, uptime — answering "is the engine ready for my repo's tests?" not just "is it up?".

Duration: 2–3 days.


Phase 5 — Skill / MCP integration

Goal: extend the existing manifest-driven AI-discoverability stance to the engine namespace, so Claude Code and other agents bootstrap m-* projects without bespoke instructions.

Deliverables:

  • Auto-generated ~/.claude/skills/m-engine/SKILL.md driven by dist/m-test-engine.json + the engine slice of m-cli's dist/commands.json (make skill-install target in this repo, mirroring m-stdlib's existing pattern).
  • dist/m-test-engine.json gains a verbs section declaring which m engine <verb> commands are safe for autonomous execution vs. require --confirm:
    "verbs": {
      "status":  { "destructive": false, "read_only": true },
      "start":   { "destructive": false, "read_only": false },
      "reset":   { "destructive": true,  "requires_confirm": true },
      ...
    }
  • Optional: m-cli MCP server registers the safe verbs as MCP tools so Claude Code can drive the engine natively without shelling out.
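
Selecting the verbs that are safe to expose to an agent from the verbs section can be sketched like this. The function name is hypothetical, and treating a verb with no explicit `destructive: false` as unsafe is a fail-closed assumption of this sketch, not something the plan specifies.

```python
def agent_safe_verbs(verbs: dict) -> list[str]:
    """Return verbs an agent may run without human confirmation, sorted."""
    safe = []
    for name, meta in verbs.items():
        explicitly_non_destructive = meta.get("destructive", True) is False
        needs_confirm = meta.get("requires_confirm", False)
        if explicitly_non_destructive and not needs_confirm:
            safe.append(name)
    return sorted(safe)
```

With the three sample entries above, status and start would be returned and reset would be withheld; an MCP server could register exactly the returned names as tools.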

Exit criteria:

  • A fresh Claude Code session in any m-* repo auto-loads the m-engine skill and offers m engine install / start / status as actions.
  • The verb-safety classification gates destructive operations at the agent-harness layer, not at the human-prose-warning layer.

Duration: 2–3 days, parallelisable with Phase 4.


3. Risks and mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Docker compose v2 schema deprecation breaks compose.yml mid-cycle | Low | Medium | The run_args fallback in the manifest reconstructs docker run. Schema deprecations in v2 have been non-breaking warnings; we'd notice via make smoke long before users do. |
| $HOME/m-work migration friction for existing /work-mounted devs | Medium | Low | m doctor detects legacy /work, prints a one-step mv / re-symlink hint. Document in docs/migration-to-m-work.md. Only relevant for current maintainers; new users land directly on $HOME/m-work. |
| Manifest drift between m-test-engine and m-cli's vendored copy | Medium | High | m-cli's make check-manifest byte-compares against the pinned upstream tag; CI gate. Vendoring pin recorded in m-cli's dist/repo.meta.json dependencies block. |
| GHCR rate-limits or outages | Low | Medium | Anonymous pulls are 1000/hr per IP, well above realistic dev usage. For CI, document GHCR token auth as an opt-in. |
| Protocol bump churn surprises users | Low | Medium | Policy in §1.5 is conservative-by-default; expected cadence is 12–24 months. Every bump documented in CHANGELOG.md with a migration recipe. |
| m-cli grows into "yet another Docker orchestrator" | Medium | Medium | Scope discipline: m engine shells out to docker / docker compose; it does not reimplement them. Anything beyond start/stop/exec/status belongs in compose, not in Python. The EngineDriver seam keeps the door open for non-Docker engines without bloating core. |
| mte introspection script leaks YDB internals or PII | Low | Low | Output is structured JSON with a fixed allowlist (no $ZGBLDIR, no env dump). Phase 4 ships with a schema for mte status --json and tests pin the field set. |
| Bind-mount of host $HOME/m-work exposes too much filesystem to the container | Low | Low | $HOME/m-work is a user-controlled directory containing only m-* repos. Mount mode is rw (consumers need to write build artifacts). Document the security model in README. |

4. Benefits realised

Mapping back to the research doc's framing: relative to the status quo, what does each phase unblock once it lands?

| Benefit | Phase that delivers it |
|---|---|
| m doctor produces actionable, copy-pasteable fix recipes | 1 |
| AI agents bootstrap m-* projects from dist/commands.json alone | 1 (manifest) + 2 (m engine in commands.json) |
| Version-mismatch detection between image and m-cli | 3 |
| Single shared engine across all m-* repos via $HOME/m-work (host) → /m-work (container) | 1 (manifest) + 2 (start command) |
| m doctor --fix is safe for autonomous execution | 2 (typed fixes) + 5 (verb safety classes) |
| Continuous health monitoring (m engine watch) | 4 |
| Out-of-tree engines (IRIS, podman) without forking core | 2 (m_cli_engines entry point) |

5. Cross-repo coordination

Phase ordering reflects dependency between this repo and m-cli:

this repo (m-test-engine)        m-cli
─────────────────────────        ─────
Phase 1a: ship manifest    ───►  Phase 1b: vendor + rewrite doctor
                                  Phase 2:  m engine subcommand family
Phase 3a: labels + healthcheck ─► Phase 3b: status reads labels
Phase 4a: mte introspection  ───► Phase 4b: status --verbose
                                  Phase 5:  skill + MCP

Phase 1a (this repo) is the only blocker for Phase 1b (m-cli). After that, m-cli can iterate independently through Phase 2 without further changes here. Phases 3 and 4 require small coordinated bumps but neither breaks any earlier deliverable.


6. Out of scope

Explicitly not part of this plan:

  • IRIS engine support (the m_cli_engines entry-point seam admits it later, but no IRIS driver ships in core).
  • Podman as a Docker drop-in (same — seam exists, driver doesn't).
  • Multi-arch image (arm64) — added when arm64 consumers materialise.
  • SSH transport changes — SSHEngine remains the legacy maintainer path; not modified by this plan.
  • VistA-specific extras inside the container (no FileMan, no Kernel — m-test-engine's existing guardrail stands).
  • m-cli replacing docker / docker compose with a Python Docker SDK. Shell-out keeps the dependency surface minimal and the behaviour trivially auditable.

7. Success metric

A new contributor on a fresh laptop, after git clone m-cli, runs:

m doctor --fix
m test

…and sees a green test suite without reading any documentation, without manually pulling images, and without setting environment variables. That is the bar Phase 2 must clear. Every phase before contributes to it; every phase after polishes it for agents.