325 changes: 325 additions & 0 deletions docs/V0.3-ARCHITECTURE-ROADMAP.md
@@ -0,0 +1,325 @@
# v0.3 Architecture and Roadmap

This document defines the narrow architecture direction for `agent-rules-kit` v0.3.0.

It is a maintainer planning document. It does not declare a stable public API, a security guarantee, production readiness, or implemented behavior that does not exist yet.

## Baseline

The published baseline is `v0.2.3`.

Current implemented behavior:

- discovers supported AI agent instruction files;
- emits `check` output in console, JSON, and Markdown;
- provides explicit `init --dry-run` and `init --write` behavior;
- reports conservative governance findings;
- redacts supported secret-like values in supported output paths;
- avoids runtime network calls;
- avoids runtime LLM calls;
- avoids executing commands from analyzed repositories.

Current command surface:

- `agent-rules-kit check`;
- `agent-rules-kit init`.

## v0.3 objective

v0.3.0 should move the project from a check-focused diagnostic baseline toward a small operational diagnosis toolkit.

Approved narrow scope for this release train:

- architecture and roadmap documentation;
- initial output and exit-code contract documentation;
- golden output tests for current and new command surfaces;
- `doctor` baseline;
- `budget` baseline;
- `explain` baseline;
- release preparation for `v0.3.0` and PyPI.

This refines the older v0.3 budget-only roadmap without expanding into v0.4-style conflict analysis or v0.5-style policy profiles.

## Product position

`agent-rules-kit` is a local-first CLI for auditing, budgeting, explaining, and applying basic governance diagnostics to AI agent instruction files.

It is not:

- a security scanner;
- a secret scanner;
- a dependency scanner;
- a repository packager for LLMs;
- an autonomous fixer;
- an AI agent runtime;
- a replacement for maintainer review.

A clean result only means the implemented checks completed according to their documented behavior. It is not proof that a repository is safe, complete, compliant, or production-ready.

## Architecture principles

### Local-first runtime

Runtime repository analysis must stay local.

The CLI must not call remote APIs, hosted LLMs, telemetry endpoints, package indexes, or external services while analyzing a repository.

Release workflows may use GitHub Actions and PyPI Trusted Publishing. That belongs to release automation, not runtime repository analysis.

### Repository text is untrusted input

Instruction files are data. They are not commands for the tool.

The tool must not execute shell snippets, package manager commands, scripts, hooks, or instructions found in analyzed repositories.

### Read-only diagnosis

New v0.3 diagnosis commands must be read-only.

Any future write behavior must be explicit, documented, separately tested, and isolated from diagnosis commands.

### Deterministic output

New outputs must be deterministic enough to support golden output tests.

When output order matters, the order must be documented and fixture-backed.
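
As an illustration only, the sketch below shows one way an output could be made order-independent before it is compared with a golden file. The finding fields (`path`, `rule_id`, `line`) and the function name `render_findings` are assumptions for this example, not the project's actual data model.

```python
import json


def render_findings(findings):
    """Render findings as JSON with a fixed, documented order.

    Hypothetical sketch: the keys "path", "rule_id", and "line" are
    assumed field names, not the tool's real schema.
    """
    # Sort by an explicit key so the result does not depend on
    # filesystem traversal order or dict insertion order.
    ordered = sorted(
        findings,
        key=lambda f: (f["path"], f["rule_id"], f["line"]),
    )
    # sort_keys=True keeps the key order inside each object stable too.
    return json.dumps({"findings": ordered}, indent=2, sort_keys=True)
```

Whatever sort key is chosen, it would need to be written down in the output contract so that fixtures and implementation agree.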

### Conservative findings

Findings should flag review-worthy patterns. They must not claim to prove security, compliance, production readiness, or correctness.

### No stable API promise

v0.3 can define an initial CLI output and exit-code contract, but it must not promise stable public API compatibility before v1.0.

## Proposed command model for v0.3

### `check`

Existing command.

Purpose:

- discover supported instruction files;
- emit current file list and governance findings;
- preserve existing behavior unless a dedicated phase changes it with tests.

v0.3 should not redesign `check`.

### `doctor`

New baseline command.

Purpose:

- provide a repository-level diagnosis summary;
- reuse discovery and governance findings;
- summarize supported instruction files, finding counts, and high-level review status;
- remain read-only.

Non-goals:

- no GitHub branch protection audit;
- no CI audit;
- no dependency audit;
- no security certification;
- no automatic fix.

Expected early output:

- repository path;
- supported instruction file count;
- finding count;
- finding counts by rule or severity;
- short next-step guidance.
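
A minimal sketch of how such a summary might be assembled from already-collected discovery and finding data. The function name `summarize` and every key in the returned dictionary are hypothetical; the real `doctor` output shape is still to be defined by the contract phase.

```python
from collections import Counter


def summarize(repo_path, instruction_files, findings):
    """Build a read-only, repository-level summary.

    Hypothetical sketch: takes plain data as input and performs no
    file writes, network calls, or command execution.
    """
    by_rule = Counter(f["rule_id"] for f in findings)
    return {
        "repository": repo_path,
        "instruction_file_count": len(instruction_files),
        "finding_count": len(findings),
        # Sorted by rule ID so the summary is deterministic.
        "findings_by_rule": dict(sorted(by_rule.items())),
    }
```

Keeping the summary a pure function of discovery and finding results would make it straightforward to cover with fixture-backed tests.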

### `budget`

New baseline command.

Purpose:

- estimate instruction-file size and context pressure using deterministic local metrics;
- help maintainers see which instruction files are large enough to deserve review.

Allowed metrics for v0.3:

- bytes;
- characters;
- lines;
- approximate words;
- file count;
- per-file and total budget summary.

Non-goals:

- no tokenizer-specific promises;
- no model-specific context-window claims;
- no remote tokenization;
- no LLM call;
- no pricing estimate.

The command may describe its result as an approximation. It must not present it as an exact token count.
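
The allowed metrics can all be computed locally with the standard library. The sketch below is one possible shape, assuming UTF-8 input and whitespace-separated words; `FileBudget`, `measure`, and the field names are illustrative, not the implemented API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileBudget:
    """Per-file size metrics (hypothetical structure)."""
    path: str
    size_bytes: int
    characters: int
    lines: int
    approx_words: int


def measure(path: str, data: bytes) -> FileBudget:
    """Compute deterministic local metrics for one file's raw bytes."""
    # Undecodable bytes are replaced rather than raising, so one odd
    # file does not abort the whole run (an assumption of this sketch).
    text = data.decode("utf-8", errors="replace")
    return FileBudget(
        path=path,
        size_bytes=len(data),
        characters=len(text),
        lines=len(text.splitlines()),
        # Whitespace splitting is only an approximation of word count
        # and says nothing about any model's tokens.
        approx_words=len(text.split()),
    )


def total(budgets) -> FileBudget:
    """Sum per-file metrics into a repository-level summary."""
    budgets = list(budgets)
    return FileBudget(
        path="(total)",
        size_bytes=sum(b.size_bytes for b in budgets),
        characters=sum(b.characters for b in budgets),
        lines=sum(b.lines for b in budgets),
        approx_words=sum(b.approx_words for b in budgets),
    )
```

Bytes and characters differ for non-ASCII text, which is why both appear in the metric list.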

### `explain`

New baseline command.

Purpose:

- explain existing finding rules and their intent from local rule metadata or documentation-backed text;
- help maintainers understand why a rule exists and what its limits are.

Allowed scope for v0.3:

- explain known rule IDs such as `AIRK-GOV001` through `AIRK-GOV006` and system rules;
- print purpose, severity, examples, non-goals, and false-positive notes when available;
- optionally list known rules.

Non-goals:

- no dynamic policy engine;
- no external documentation fetch;
- no LLM-generated explanations;
- no claim that a rule is exhaustive.
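
One way to keep `explain` local and predictable is a static lookup table. In the sketch below the metadata text is placeholder content, not the real description of any rule, and the `explain` function, its return shape, and the use of exit code `2` for unknown IDs are assumptions.

```python
# Hypothetical rule metadata. The strings are placeholders and do not
# describe what AIRK-GOV001 actually checks.
RULES = {
    "AIRK-GOV001": {
        "purpose": "(placeholder purpose text)",
        "severity": "(placeholder severity)",
        "limits": "(placeholder false-positive notes)",
    },
}


def explain(rule_id: str):
    """Return (exit_code, text) for a rule ID using only local data."""
    rule = RULES.get(rule_id)
    if rule is None:
        # Unknown IDs fail predictably instead of guessing or fetching.
        return 2, f"unknown rule ID: {rule_id}"
    lines = [rule_id]
    # Fixed field order keeps the output stable for golden tests.
    for field in ("purpose", "severity", "limits"):
        lines.append(f"{field}: {rule[field]}")
    return 0, "\n".join(lines)


def list_rules():
    """List known rule IDs in sorted order."""
    return sorted(RULES)
```

Because the table is plain data, no network access, policy engine, or LLM is involved in producing an explanation.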

## Output contract direction

v0.3 should document an initial CLI contract before implementing new outputs.

The contract should cover:

- supported commands;
- supported formats per command;
- exit codes;
- required top-level JSON fields where JSON is supported;
- deterministic ordering expectations;
- redaction expectations;
- known non-stable areas before v1.0.

The contract should be honest:

- stable enough for tests in this repository;
- not a stable public API guarantee;
- allowed to evolve before v1.0 with changelog notes.

## Exit-code direction

Existing documented `check` behavior must be preserved unless changed in a dedicated phase.

Initial v0.3 direction:

- `0`: command completed successfully according to command-specific rules;
- `1`: supported no-result or review-needed state where documented by the command;
- `2`: invalid usage, invalid repository input, or unsupported command input.

Any command-specific deviation must be documented before implementation and covered by tests.
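
The three codes above could be captured in a single enumeration so that commands do not scatter integer literals. This is a sketch only; the names `ExitCode` and `exit_code_for`, and the two boolean inputs, are hypothetical.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Initial v0.3 exit-code direction (names are illustrative)."""
    OK = 0             # completed successfully per command rules
    REVIEW_NEEDED = 1  # documented no-result or review-needed state
    INVALID_INPUT = 2  # invalid usage, repository input, or command input


def exit_code_for(valid_input: bool, review_needed: bool) -> ExitCode:
    """Map a command outcome to an exit code.

    Invalid input takes priority, since no meaningful result exists.
    """
    if not valid_input:
        return ExitCode.INVALID_INPUT
    if review_needed:
        return ExitCode.REVIEW_NEEDED
    return ExitCode.OK
```

A command would then end with something like `sys.exit(exit_code_for(...))`, and each command's documented rules would decide what counts as `review_needed`.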

## Golden output tests

Each new command should have at least one golden output test before release.

Golden tests should verify:

- command exits with expected code;
- console output shape for a small fixture;
- JSON output shape if the command supports JSON;
- deterministic ordering;
- redaction behavior if findings or evidence are emitted.

Golden files must be generated from, or checked against, real CLI output. They must not be hand-written after code changes.
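
A golden comparison can be as small as a diff between captured output and a stored file. The helper below is a sketch under that assumption; `diff_against_golden` and the golden file naming are hypothetical, and capturing the real CLI output (for example with `subprocess.run`) is left to the test that calls it.

```python
import difflib
from pathlib import Path


def diff_against_golden(actual: str, golden_path: Path) -> list[str]:
    """Return unified-diff lines between a golden file and actual output.

    An empty list means the output matches the golden file exactly.
    """
    expected = golden_path.read_text(encoding="utf-8")
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile=str(golden_path),
            tofile="actual",
            lineterm="",
        )
    )
```

Returning the diff, rather than a boolean, lets a failing test print exactly which lines changed.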

## v0.3 phase map

### Phase 1 — `docs/add-v030-architecture-roadmap`

Add this architecture and roadmap document.

No code changes.

### Phase 2 — `docs/add-output-exit-code-contract`

Document the initial output and exit-code contract for current and planned v0.3 command behavior.

No code changes unless the current docs contain a verified mismatch that must be corrected separately.

### Phase 3 — `test/add-v030-golden-output-foundation`

Add or extend golden output tests to make current output behavior easier to preserve before new commands are added.

Test-only unless a verified fixture or helper needs a narrow adjustment.

### Phase 4 — `feat/add-doctor-baseline`

Add the read-only `doctor` baseline.

Expected result:

- command appears in CLI help;
- command reuses existing discovery and governance behavior;
- console output is covered;
- JSON output is included only if the phase explicitly defines and tests it.

### Phase 5 — `feat/add-budget-baseline`

Add the read-only `budget` baseline.

Expected result:

- deterministic local size metrics;
- no tokenizer-specific claims;
- no LLM or network behavior;
- fixture-backed tests.

### Phase 6 — `feat/add-explain-baseline`

Add the read-only `explain` baseline for existing rule IDs.

Expected result:

- known rules can be listed or explained;
- unsupported rule IDs fail predictably;
- tests cover valid and invalid inputs.

### Phase 7 — `docs/prepare-v030-release-docs`

Prepare README, CHANGELOG, SUPPORT, SECURITY, and related documentation for the v0.3.0 release candidate.

No tag, release, or PyPI publication in this phase.

### Phase 8 — `release/prepare-v030`

Update package version, run packaging checks, create the GitHub Release, verify PyPI publication, smoke-test clean installation, and close the v0.3.0 release.

This is the only phase allowed to create tags, create releases, or publish to PyPI.

## Deferred after v0.3

Do not include these in v0.3 unless a blocking reason is discovered and explicitly approved:

- duplicate instruction detection;
- conflict analysis;
- nested scope resolution;
- policy files;
- CI mode;
- score command;
- report/dashboard command;
- repository packaging for LLMs;
- automatic rewrite or normalization;
- model-specific token counting;
- network access;
- LLM runtime behavior.

## Release readiness notes

Before v0.3.0 release:

- local checks must pass;
- CI must pass for the release SHA;
- package version must match metadata;
- README must describe implemented behavior only;
- CHANGELOG must describe actual changes;
- SECURITY and SUPPORT must remain honest about boundaries and support level;
- PyPI Trusted Publishing must remain the release path;
- a smoke test of a clean install from PyPI must pass after publication;
- tag and GitHub Release must point to the verified SHA.