Merged
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -29,5 +29,5 @@ jobs:
       - name: Test with coverage
         run: uv run pytest --cov --cov-report=term-missing

-      - name: Validate integrated layer (spec §7.4 — 7 gates)
-        run: uv run python -m m_standard.tools.validate --root .
+      - name: Manifest drift gate (Phase 0 / Track C)
+        run: make check-manifest
3 changes: 2 additions & 1 deletion .gitignore
@@ -6,7 +6,8 @@ __pycache__/
 .env
 !.env.example
 *.log
-dist/
+dist/*
+!dist/repo.meta.json
 build/
 *.egg-info/
 src/*.egg-info/
166 changes: 166 additions & 0 deletions AGENTS.md
@@ -0,0 +1,166 @@
---
# Machine-readable project descriptor — schema v1 (2026-05-05).
name: m-standard
kind: [data, library, reference]
status: active
languages: [python]

runtime:
  needs:
    - python>=3.10
    - uv
  optional: []
  excludes: [] # GT.M docs deliberately out of scope (project rule)

distribution:
  pypi: null
  github: rafael5/m-standard

location: ~/projects/m-standard

exposes:
  python_api: "src/m_standard/ — library + tools package"
  cli_modules:
    - "python -m m_standard.tools.crawl"
    - "python -m m_standard.tools.extract"
    - "python -m m_standard.tools.reconcile"
    - "python -m m_standard.tools.emit"
    - "python -m m_standard.tools.validate"
  formats_produced:
    - "per-source/<src>/*.tsv (per-source extracted)"
    - "integrated/*.tsv (reconciled, citable)"
    - "integrated/*.json (machine-emitted; consumed by tree-sitter-m + m-cli)"
    - "schemas/*.json (validation schemas)"
    - "docs/m-standards-guide.md (narrative)"
    - "docs/adr/* (decision records)"

consumes:
  formats: []
  services: []
  upstream_sources:
    - "Annotated M Standard (AnnoStd)"
    - "YottaDB documentation corpus"
    - "InterSystems IRIS docs (v2.0+)"

companions:
  - project: tree-sitter-m
    relation: "downstream consumer — `m-standard/integrated/grammar-surface.json` drives tree-sitter-m's grammar generator"
  - project: m-cli
    relation: "downstream consumer — m-cli loads commands/ISVs/functions from m-standard's TSVs"
  - project: m-tools
    relation: "the M toolchain hub references m-standard as the spec layer"
  - project: m-stdlib
    relation: "m-stdlib obeys m-standard's reconciled language definitions"

incompatibilities:
  - "GT.M permanently out of scope. Do not add GT.M sources."
  - "No live network at pipeline run time. `crawl`/`clone` populates `sources/`; downstream stages read disk only."
  - "Every integrated row needs `in_anno`/`in_ydb` provenance flags + at least one source ref."

docs:
  primary: README.md
  spec: docs/spec.md
  user_guide: docs/m-standards-guide.md
  adr: docs/adr/
---

# Claude project context — m-standard

## What this is
Reconciles the Annotated M Standard (AnnoStd) and the YottaDB
documentation corpus into a single citable, machine-readable reference
standard for the M (MUMPS) language. Outputs are TSV + JSON pairs under
`integrated/` plus a narrative under `docs/m-standards-guide.md`.

The full design and rationale are in `docs/spec.md`. ADRs in
`docs/adr/`.

## Where things live
- `src/m_standard/` — library + tools package. Anything importable.
- `src/m_standard/tools/` — pipeline stages (crawl, extract, reconcile,
emit, validate). Each is invokable via `python -m
m_standard.tools.<name>`.
- `tools/` — non-Python utilities (e.g. `clone-ydb.sh`).
- `sources/` — offline local replicas of the upstream sources. The
pipeline reads only from here, never the network at run time.
- `per-source/`, `integrated/`, `schemas/` — pipeline outputs (committed
artifacts).
- `tests/` — pytest, mirrors `src/m_standard/` structure.

## Pipeline (per spec §7)
```
sources/ ──extract──▶ per-source/<src>/*.tsv
                      └───reconcile──▶ integrated/*.tsv + conflicts.tsv
                                       └────emit──▶ integrated/*.json
                                                    └────validate──▶ CI gates pass
```

## Hard rules
- **TDD.** Test first, confirm failure, then implement. Always.
- **No live network at pipeline run time.** Crawl/clone populates
`sources/`; everything downstream reads from disk.
- **Reproducibility.** Every source file has a sha256 in
`sources/<src>/manifest.tsv`. Every YDB-derived row carries the
pinned commit SHA.
- **Provenance.** Every integrated row has `in_anno`/`in_ydb` flags +
source section refs. No integrated row exists without at least one
source attesting it.
- **Determinism.** `reconcile.py` is byte-deterministic — same inputs,
same outputs.
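
The reproducibility rule can be illustrated with a minimal sketch that
re-hashes each file and compares it against `manifest.tsv`. The column
names `path` and `sha256`, and the helper names `sha256_of` and
`find_drift`, are assumptions made for illustration; the real manifest
layout is defined by the project, not by this example.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex sha256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_drift(source_dir: Path) -> list[str]:
    """List manifest entries whose file is missing or hashes differently.

    Assumes (hypothetically) a header row with `path` and `sha256`
    columns, where `path` is relative to `source_dir`.
    """
    drifted = []
    manifest = source_dir / "manifest.tsv"
    with manifest.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            target = source_dir / row["path"]
            if not target.is_file() or sha256_of(target) != row["sha256"]:
                drifted.append(row["path"])
    return sorted(drifted)
```

An empty list from `find_drift(Path("sources/<src>"))` would mean the
local replica still matches its recorded hashes.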

## Toolchain
- Python ≥3.12, `uv`, ruff, mypy, pytest.
- `Makefile` uses `.venv/bin/` prefixes for every tool (parent direnv
hijacks bare names — see `docs/build-log.md` BL-001).

## Conventions
- No `print()` in library code — use `logging.getLogger(__name__)`.
- BeautifulSoup attr access: cast with `str()` (mypy strict).
- If Click is added later, pass group options before the subcommand.
- YAML frontmatter: quote any value containing a colon.
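
As a small illustration of the logging convention, library code can
report progress through a module-level logger rather than `print()`.
The function `count_rows` is a hypothetical example, not part of
`m_standard`.

```python
import logging

# Module-level logger, named after the importing module.
logger = logging.getLogger(__name__)


def count_rows(lines: list[str]) -> int:
    """Count non-empty rows, reporting the result via logging."""
    rows = [line for line in lines if line.strip()]
    logger.info("counted %d rows", len(rows))
    return len(rows)
```

Callers decide whether and where the message appears by configuring
logging handlers, which keeps library output out of stdout by default.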

## Setup
```bash
make install # uv sync --extra dev + install pre-commit hooks
```

## Test
```bash
make test # .venv/bin/pytest
make cov # pytest with coverage report
make check # lint + mypy + cov (run before push)
```

## Build / generate
The committed payloads under `docs/integrated/` are produced by the
pipeline. To validate them in place, or to rebuild them from `sources/`:

```bash
make integrated # validate only: schema + provenance checks on the committed payloads (no network fetch)
make all # sources (network fetch) → extract → reconcile → emit → validate; full rebuild
```

The `dist/repo.meta.json` manifest is hand-authored; CI verifies it
hasn't drifted from the committed integrated payloads.

## Verify
These are the `verification_commands` declared in `dist/repo.meta.json`:

```bash
make integrated # validates the committed docs/integrated/ payloads
make test # full test suite
```

A green `make check-manifest` shows that the committed integrated payloads
pass validation and that every `exposes.*` path in `dist/repo.meta.json`
exists on disk (and, for `.json`, parses).

## Guardrails
- **Do not hand-edit `docs/integrated/*` files.** They are pipeline
outputs; edit `sources/` + the extract/reconcile/emit stages instead.
- **Do not set `verified_on` in `dist/repo.meta.json` to a future
date.** The Phase 0 smoke test rejects manifests older than 90 days;
bump the date only when the manifest changes materially.
- **No GT.M sources.** Permanently out of scope.
- **No live network at pipeline run time.** `make sources` is the only
stage that touches the network; all downstream stages read disk.
- **Determinism.** Reconcile output must be byte-stable across runs.
120 changes: 0 additions & 120 deletions CLAUDE.md

This file was deleted.

1 change: 1 addition & 0 deletions CLAUDE.md
18 changes: 17 additions & 1 deletion Makefile
@@ -1,5 +1,6 @@
 .PHONY: install hooks test test-lf watch lint format mypy cov check push pull \
-	sources sources-anno sources-ydb serve-anno extract reconcile validate all clean
+	sources sources-anno sources-ydb serve-anno extract reconcile validate all clean \
+	integrated check-manifest

 PYTHON := .venv/bin/python
 PYTEST := .venv/bin/pytest
@@ -101,3 +102,18 @@ all: sources extract reconcile emit validate
 clean:
 	rm -rf .pytest_cache .mypy_cache .ruff_cache htmlcov .coverage
 	find . -type d -name __pycache__ -prune -exec rm -rf {} +
+
+# ----- Phase 0 manifest drift gate (per .github/docs/phase0-plan.md §4 / C4)
+
+# `integrated` is the named verification entry-point referenced by
+# `dist/repo.meta.json.verification_commands`. It re-runs the same
+# consistency check CI already enforces: every committed integrated
+# payload still matches its schema and has source provenance.
+integrated:
+	$(PYTHON) -m m_standard.tools.validate --root .
+
+# `check-manifest` is the per-repo Phase 0 gate. It (a) reruns
+# `integrated` and (b) verifies dist/repo.meta.json schema-validates and
+# every `exposes.*` payload still exists on disk and (for .json) parses.
+check-manifest: integrated
+	$(PYTHON) tools/check-repo-meta.py dist/repo.meta.json
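
`tools/check-repo-meta.py` itself is not shown in this diff. As a rough
sketch of part (b) of the comment above, the existence and parse checks
might look like the following. The function name
`missing_or_broken_payloads` is hypothetical, and schema validation is
left out because it would need a third-party validator.

```python
import json
from pathlib import Path


def missing_or_broken_payloads(manifest_path: Path, repo_root: Path) -> list[str]:
    """Return `exposes` keys whose payload is absent or, for .json, unparsable."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for key, rel_path in manifest.get("exposes", {}).items():
        payload = repo_root / rel_path
        if not payload.is_file():
            problems.append(key)  # declared in the manifest but not on disk
            continue
        if payload.suffix == ".json":
            try:
                json.loads(payload.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                problems.append(key)  # present but not valid JSON
    return sorted(problems)
```

A gate built on this sketch would exit non-zero whenever the returned
list is non-empty.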
22 changes: 22 additions & 0 deletions dist/repo.meta.json
@@ -0,0 +1,22 @@
{
  "$schema": "https://raw.githubusercontent.com/m-dev-tools/.github/main/profile/repo.meta.schema.json",
  "id": "tool:m-standard",
  "repo": "https://github.com/m-dev-tools/m-standard",
  "role": "Machine-readable M language reference",
  "language": ["python"],
  "license": "AGPL-3.0",
  "agent_instructions": "AGENTS.md",
  "verified_on": "2026-05-10",
  "exposes": {
    "grammar_surface": "docs/integrated/grammar-surface.json",
    "commands": "docs/integrated/commands.tsv",
    "intrinsic_functions": "docs/integrated/intrinsic-functions.tsv",
    "intrinsic_special_variables": "docs/integrated/intrinsic-special-variables.tsv",
    "operators": "docs/integrated/operators.tsv",
    "errors": "docs/integrated/errors.tsv",
    "pragmatic_standard": "docs/integrated/pragmatic-m-standard.json",
    "operational_standard": "docs/integrated/operational-m-standard.json",
    "va_sac_rules": "docs/integrated/va-sac-rules.tsv"
  },
  "verification_commands": ["make integrated", "make test"]
}