diff --git a/.agents/skills/autonomous-improve/kit.md b/.agents/skills/autonomous-improve/kit.md deleted file mode 100644 index 605ecf6..0000000 --- a/.agents/skills/autonomous-improve/kit.md +++ /dev/null @@ -1,321 +0,0 @@ ---- -schema: kit/1.0 -owner: citadel -slug: autonomous-improve -title: 'Autonomous Quality Loop — Score, Attack, Verify, Repeat' -summary: >- - Scores against a rubric, attacks the highest-leverage axis, verifies no - regressions, documents the learning, and loops. Re-scores from scratch each - iteration. -version: 1.0.0 -license: MIT -tags: - - quality - - autonomous - - improvement - - rubric - - iteration - - agents - - multi-agent -model: - provider: anthropic - name: claude-opus-4-6 - hosting: Anthropic API (api.anthropic.com) -tools: - - Read - - Write - - Edit - - Bash - - Glob - - Grep - - Agent -failures: - - problem: No rubric exists for the target. - resolution: >- - Phase 0 drafts one and halts for your approval. The rubric defines what - "better" means — it cannot be auto-approved. - scope: general - - problem: Fix attempt causes a regression in a previously-passing axis. - resolution: >- - Changes are reverted automatically. The loop logs the failed approach, - penalizes the axis in next-loop selection, and continues without - committing. - scope: general - - problem: No improvement in two consecutive loops (plateau). - resolution: >- - Level-Up Protocol triggers — halts, presents re-anchoring proposals for - the rubric at a higher quality tier. Resumes only after your approval. - scope: general - - problem: Security axis fails a programmatic check. - resolution: >- - Treated as a blocking issue — halts the entire loop. Security is the - floor, not one axis among equals. - scope: general -useCases: - - scenario: >- - Systematic quality improvement of a product, module, or codebase over - multiple sessions. 
- constraints: - - >- - Requires a rubric — Phase 0 creates one with your approval if none - exists - notFor: - - Quick bug fixes (use systematic-debugging) - - One-off code reviews (use code-review-5pass) - - scenario: >- - After shipping a fast MVP — identify and fix the highest-leverage rough - edges first. - constraints: [] - notFor: [] - - scenario: >- - Continuous improvement running overnight — pairs with the Daemon kit for - unattended operation. - constraints: - - Budget cap recommended for overnight runs - notFor: [] -inputs: - - name: target - description: >- - Slug identifying what to improve. Maps to .planning/rubrics/{target}.md. - If none exists, Phase 0 runs first. - - name: loops - description: >- - Number of improvement loops to run. If omitted, runs until all axes score - ≥8.0 or plateau is detected. - - name: score_only - description: >- - Run scoring only — produce the rubric scorecard without attacking any - axis. Use to get a baseline first. -outputs: - - name: scorecard - description: >- - Full rubric scorecard: scores from three evaluators, final (minimum), and - deltas from previous loop. - - name: loop_log - description: >- - .planning/improvement-logs/{target}/loop-{n}.md — what was attacked, - verification results, what changed, what was learned. - - name: handoff - description: >- - Summary: score movement per axis, outcome (improved/plateau/ceiling), next - recommended axis. ---- - -# Autonomous Quality Loop — Score, Attack, Verify, Repeat - -## Goal - -Quality doesn't improve through sprints. It improves through iteration — scoring -what exists, picking the worst thing, fixing it, and doing it again. The problem -is that iteration requires sustained attention. Sessions end. Context resets. -The loop stalls. - -This kit automates the iteration. 
It runs a tight cycle: score everything against -a rubric, select the single highest-leverage axis, attack it with a real fix, -verify the fix worked and nothing regressed, document what was learned, then loop. -Each iteration re-scores from scratch — because what you fixed this loop changes -the priority order for the next. - -The rubric is the key design choice. It's not a checklist. Each axis has three -anchors (what a 0 looks like, what a 5 looks like, what a 10 looks like), a weight, -and programmatic verification specs where possible. This makes scores arguable, -not vibes-based. Three independent evaluators score each axis and the minimum -score wins — because a low score from any evaluator is an unresolved problem, and -averaging would hide it. - -The loop stops when you tell it to, when a quality ceiling is reached (all axes >= 8.0), -or when the rubric has been extracted to its ceiling and needs to be re-anchored at -a higher standard. - -## When to Use - -- A product, module, or codebase that needs systematic quality improvement over time -- After shipping a fast MVP that you know has rough edges but aren't sure which to - address first -- When you want to improve a specific dimension (documentation, test coverage, - security posture) but aren't sure where the gaps actually are -- As a continuous improvement process across multiple sessions or overnight runs - (pairs with the Daemon kit for unattended operation) - -**Not for:** quick bug fixes (use Systematic Debugging), one-off code reviews -(use the Review kit), or simple refactors. The loop has overhead. It earns its -cost on targets where the improvement space is wide and the priority isn't obvious. - -**Good first run:** use `--score-only` to get your baseline before committing to -improvement loops. You'll see where you actually stand, which makes the case for -how many loops are warranted. 
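A baseline run reports one final score per axis, and per the Goal section that final is the minimum of the three evaluator scores. The sketch below is one way to express that rule, assuming three numeric scores per axis and an optional programmatic check result. The function name `final_axis_score` and the constant name are hypothetical; the cap value of 5 is taken from the Phase 1 description of failed programmatic checks.

```python
from typing import Optional, Sequence

# Phase 1: a failed programmatic check caps the axis at 5.
PROGRAMMATIC_FAIL_CAP = 5.0


def final_axis_score(evaluator_scores: Sequence[float],
                     programmatic_pass: Optional[bool] = None) -> float:
    """Return the minimum evaluator score, capped when a programmatic check fails.

    `programmatic_pass` is None when the axis has no programmatic check
    (an assumption of this sketch, not something the kit specifies).
    """
    if not evaluator_scores:
        raise ValueError("at least one evaluator score is required")
    score = float(min(evaluator_scores))
    if programmatic_pass is False:
        score = min(score, PROGRAMMATIC_FAIL_CAP)
    return score


# Two rows from the example scorecard, plus a hypothetical axis where the cap binds.
print(final_axis_score([7, 8, 6], True))    # security_posture
print(final_axis_score([4, 3, 5], False))   # test_coverage
print(final_axis_score([8, 9, 8], False))   # hypothetical: evaluators agree, check fails
```

Taking the minimum rather than the mean means a single dissenting evaluator is enough to keep an axis on the attack list, which is the behavior the Goal section argues for.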
- -## Setup - -**Create a rubric before running your first loop.** - -The rubric is a markdown file at `.planning/rubrics/{target}.md`. If it doesn't exist, -the kit's Phase 0 drafts one and asks for your approval. - -A good rubric has: -- 8-14 axes organized into 3-5 categories (don't go narrow, don't go sprawling) -- Each axis with a weight (0.0-1.0), a category, and three anchors (0/5/10) -- Programmatic verification specs for at least half the axes (things that can be - checked with a script, not just judged by eye) - -**Why approval is required:** the rubric defines what "better" means. If the axes are -wrong, the loop optimizes the wrong things. Phase 0 drafts; you decide. - -**Directory structure:** the kit uses `.planning/` for state. Create it at your project root -if it doesn't exist, or let Phase 0 create it on first run. - -## Steps - -### Phase 0: Rubric Bootstrap (first run only, requires your approval) - -Runs only when `.planning/rubrics/{target}.md` doesn't exist. - -1. If research on comparable products exists in `.planning/research/`, reads it -2. If not, runs a multi-scout research pass to understand the quality landscape -3. Drafts 8-14 axes with weights, categories, anchors, and verification specs -4. Presents the draft with rationale for each axis -5. **Stops. Waits for your approval.** You edit the rubric, confirm it, and the loop begins. - -This is the only mandatory human gate. Everything after this is autonomous. - ---- - -### Phase 1: Score (every loop, no exceptions, no caching) - -Three independent evaluator agents score every axis in the rubric simultaneously. -Each evaluator receives: -- The full rubric with axis definitions and anchors -- Read access to the target -- Their assigned persona (Experienced User, Newcomer, or Critical Reviewer) -- Instructions to score independently — they don't see each other's scores - -**For each axis, the final score is the minimum of the three evaluators.** Not the average. 
-A score of 6, 8, 7 produces a final score of 6. A low score from any evaluator
-means there's a real unresolved problem somewhere. The minimum surfaces it.
-
-Programmatic checks run in parallel with evaluator scoring. A failed programmatic
-check caps the axis at 5, regardless of evaluator scores.
-
-The scorecard:
-```
-Axis                    | A  | B  | C  | Prog | Final | Delta
-------------------------|----|----|----|------|-------|------
-security_posture        | 7  | 8  | 6  | PASS | 6.0   |
-test_coverage           | 4  | 3  | 5  | FAIL | 3.0   | cap
-documentation_accuracy  | 6  | 6  | 7  | PASS | 6.0   |
-```
-
----
-
-### Phase 2: Select
-
-Choose the single axis to attack this loop using a selection formula:
-
-```
-score(axis) = (10 - current_score) × weight × effort_multiplier × recency_penalty
-```
-
-- **effort_multiplier**: low-effort changes = 1.0, medium = 0.7, high-effort = 0.4
-  (lower-effort improvements get priority: more ground covered per loop)
-- **recency_penalty**: 0.5 if this axis was attacked in either of the last two loops,
-  1.0 otherwise (prevents fixation on one axis while others deteriorate)
-
-The selection and its rationale are shown before any attack begins. You can override
-with `--axis {name}` to force a specific axis regardless of scoring.
-
----
-
-### Phase 3: Attack
-
-Execute the improvement. 
Strategy depends on the axis category: - -- **Technical axes** (test coverage, reliability, API consistency): run targeted - experiments, measure before/after, commit only if improvement is verified -- **Documentation axes**: read current docs, identify specific gaps or inaccuracies, - rewrite the specific sections — not wholesale rewrites unless the score is below 3 -- **Experience axes** (onboarding, error recovery, discoverability): structural fixes - plus documentation updates plus verification of the actual user path -- **Security axes**: read the specific code involved, make targeted changes, run - programmatic checks before committing - -Each approach is tried in isolation before committing. If multiple approaches are -evaluated, the decision record is written to the loop log — future loops can read -why the winner won. - ---- - -### Phase 4: Verify - -After the attack, re-score only the targeted axis (not a full re-score — that's Phase 1 -of the next loop): - -1. **Programmatic**: run the axis's specific checks — do they now pass? -2. **Structural**: verify structural requirements are met -3. **Perceptual**: spawn a single Newcomer evaluator (the hardest-to-satisfy persona) - to score the targeted axis independently -4. **Behavioral simulation** (for onboarding/UX axes): follow the actual user path - in a clean environment with no prior knowledge — does it work? - -**Regression check**: re-run programmatic checks on every axis that shares files with -the changes. If anything that was passing now fails: **revert the changes**. Don't commit -improvements that break something else. - -A behavioral FAIL overrides a passing perceptual score. If the user path is broken, -the loop doesn't commit regardless of what the evaluator said. - ---- - -### Phase 5: Document - -Every loop gets a log at `.planning/improvement-logs/{target}/loop-{n}.md`. -Even loops that end in no-change or revert. Especially those. 
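The logs also hold the history that Phase 2 depends on, since the recency penalty needs to know which axes were attacked in the last two loops. A minimal sketch of the Phase 2 selection formula follows, assuming that history has already been recovered as a list of axis names (how the kit stores or parses it is not specified here). The `Axis` class, the function names, and the example weights and effort labels are hypothetical; the multipliers and the 0.5 penalty come from Phase 2.

```python
from dataclasses import dataclass
from typing import List

# Values from Phase 2.
EFFORT_MULTIPLIER = {"low": 1.0, "medium": 0.7, "high": 0.4}
RECENCY_PENALTY = 0.5  # applied if the axis was attacked in either of the last two loops


@dataclass
class Axis:
    name: str
    current_score: float  # final (minimum) score from Phase 1
    weight: float         # 0.0-1.0, from the rubric
    effort: str           # "low", "medium", or "high" (labels assumed for this sketch)


def selection_score(axis: Axis, recent_attacks: List[str]) -> float:
    """score = (10 - current_score) x weight x effort_multiplier x recency_penalty."""
    recency = RECENCY_PENALTY if axis.name in recent_attacks[-2:] else 1.0
    return (10 - axis.current_score) * axis.weight * EFFORT_MULTIPLIER[axis.effort] * recency


def select_axis(axes: List[Axis], recent_attacks: List[str]) -> Axis:
    """Pick the axis with the highest selection score."""
    return max(axes, key=lambda a: selection_score(a, recent_attacks))


# Hypothetical weights and effort estimates for the scorecard's three axes.
axes = [
    Axis("security_posture", 6.0, 1.0, "high"),
    Axis("test_coverage", 3.0, 0.8, "medium"),
    Axis("documentation_accuracy", 6.0, 0.5, "low"),
]

print(select_axis(axes, recent_attacks=[]).name)                 # test_coverage
print(select_axis(axes, recent_attacks=["test_coverage"]).name)  # documentation_accuracy
```

With these made-up inputs, a single recent attack on `test_coverage` is enough to hand the next loop to `documentation_accuracy`, which is the anti-fixation effect the penalty is meant to produce.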
- -The log records: -- The full scorecard with deltas from the previous loop -- What was changed and why that approach was chosen -- Verification results across all four tiers -- What was learned that future loops should know - -This institutional memory is what makes later loops smarter than earlier ones. - ---- - -### Phase 6: Loop or Exit - -The loop exits when: -1. A loop count (`--loops N`) is specified and N loops have completed -2. All axes score >= 8.0 (quality ceiling reached) -3. No axis improved > 0.5 in either of the last two loops and at least 3 loops - have completed (plateau — Level-Up Protocol is triggered) -4. You say stop - -**On plateau (Level-Up Protocol):** the current rubric has been extracted to its ceiling. -The kit freezes the current state, proposes re-anchored axes where the current 10 becomes -the new 5, and waits for your approval before continuing. This is how the quality bar -gets raised, not just approached. - ---- - -## Constraints - -- **Phase 0 requires your approval.** The rubric is the only thing that requires human - judgment. Once it's set, the loop runs autonomously. -- **The loop never writes to the live rubric.** Proposed axis additions or re-anchorings - go to `.planning/rubrics/{target}-proposals.md`. You approve before they take effect. -- **Level-Up Protocol requires your approval.** The loop halts and waits. It doesn't - self-upgrade the rubric. -- **Regressions block commits.** If the fix breaks something that was passing, the change - is reverted. The loop continues without a commit for that iteration. -- **Security axis failures are blocking.** A failed security programmatic check halts - the entire loop — it doesn't get treated as "just a low score." - -## Safety Notes - -- Use `--score-only` first to see your baseline before spending tokens on attack loops. - Knowing your scores in advance helps you calibrate how many loops are warranted. -- Each loop commits its changes separately. 
If a loop produces an unwanted result, - `git revert ` undoes that loop's changes without touching others. -- For overnight / unattended runs, pair this kit with the Daemon kit, which handles - session continuity and budget enforcement automatically. -- Set a loop count (`--loops 3`) for your first run rather than running until ceiling. - It's easier to add loops than to undo them. diff --git a/.agents/skills/code-review-5pass/kit.md b/.agents/skills/code-review-5pass/kit.md deleted file mode 100644 index 107654c..0000000 --- a/.agents/skills/code-review-5pass/kit.md +++ /dev/null @@ -1,329 +0,0 @@ ---- -schema: kit/1.0 -owner: citadel -slug: code-review-5pass -title: 5-Pass Code Review — What Linters Miss -summary: >- - 5 passes: correctness, security, performance, readability, consistency. Every - finding: file, line, severity, fix. Ends with PASS/CONDITIONAL/FAIL verdict. -version: 1.0.0 -license: MIT -tags: - - code-review - - security - - quality - - correctness - - performance - - developer-tools -model: - provider: anthropic - name: claude-opus-4-6 - hosting: Anthropic API (api.anthropic.com) -tools: - - Read - - Glob - - Grep - - Bash -failures: - - problem: 'No project conventions file found (CLAUDE.md, .eslintrc, etc.).' - resolution: >- - Skips convention-specific findings in Pass 5 but still flags internal - inconsistency within the reviewed code. - scope: general - - problem: 'Review target contains binary files, images, or lock files.' - resolution: Silently skipped. Only source files are reviewed. - scope: general - - problem: Diff range references commits that do not exist. - resolution: Reports the git error. Verify ref names and retry with a valid range. - scope: general -useCases: - - scenario: >- - Pre-merge review of a pull request — use --diff main..feature to focus on - changed lines. 
- constraints: [] - notFor: - - 'Generated files, lock files, configuration with no logic' - - scenario: >- - Security audit preparation — find obvious vulnerabilities before the - formal audit. - constraints: [] - notFor: [] - - scenario: Taking over code from another developer or after a fast-moving sprint. - constraints: [] - notFor: [] -inputs: - - name: review_target - description: >- - What to review: a file path, directory, git diff range (--diff HEAD~3 or - --diff main..feature), or nothing (defaults to staged + unstaged changes). -outputs: - - name: review_report - description: >- - Findings grouped by pass, sorted by severity. Each finding: file, line, - severity, problematic code, and specific fix. - - name: verdict - description: >- - PASS | CONDITIONAL | FAIL — with one-line rationale and finding counts by - severity. -selfContained: true ---- - -# 5-Pass Code Review — What Linters Miss - -## Goal - -Linters catch style. Formatters catch formatting. Type checkers catch type errors. -None of them catch the things that actually break production. - -This kit fills that gap. It runs five structured passes over your code to find -problems that require genuine comprehension to detect: a race condition between -two async operations, an unparameterized query that looks like string formatting, -an O(n²) loop that only matters when your dataset grows, a comment that hasn't -matched the code since the refactor six months ago. - -The output is a structured report with every finding numbered, located to the exact -line, and accompanied by a specific fix — not "consider improving this" but "change -line 47 from X to Y because Z." - -A verdict at the end (PASS / CONDITIONAL / FAIL) gives you the bottom line in one -word, with finding counts if you need to justify a review decision. 
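As an illustration of one finding type named above, the unparameterized query that looks like string formatting, here is a small self-contained sketch using Python's standard `sqlite3` module. The table, function names, and payload are hypothetical and exist only to show why such a line would be flagged and what a specific fix could look like.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # Vulnerable: user input is interpolated into the SQL text itself.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized: the driver binds `name` as a value, never as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# The quote in the payload closes the string literal and appends an always-true clause.
payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row matches
print(len(find_user_safe(conn, payload)))    # 0: no user has that literal name
```

A linter sees two valid string expressions here; the difference only shows up when the data flow from input to query is read and understood.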
- -## When to Use - -- Before merging any non-trivial pull request -- When taking over code written by someone else (or your past self) -- After a fast-moving sprint where correctness was deprioritized for speed -- Before a security audit — catch the obvious issues yourself first -- When a system is behaving unexpectedly and you suspect the bug is subtle - -**Skip this kit for:** generated files, lock files, configuration with no logic, -and code so simple the 5-pass structure would produce empty passes. - -**Use `/review --diff main..feature`** for PR reviews — it focuses on changed lines -and their context, distinguishing new-code findings from pre-existing issues. - -## Setup - -No setup required. The kit reads from your codebase directly. - -**Recommended:** have your project's config files at the repo root before running: -- TypeScript: `tsconfig.json` -- ESLint: `.eslintrc.*` or `eslint.config.*` -- Prettier: `.prettierrc.*` -- Python: `pyproject.toml` -- Rust: `Cargo.toml` - -These are loaded in Step 2 to establish what "correct" and "consistent" means for -your project specifically. Without them, the review still runs — it just uses -general best practices instead of your project's conventions. - -## Steps - -### Step 1: Resolve Scope - -Determine what's being reviewed based on the input: - -- **File path**: read that file -- **Directory**: glob for all source files recursively, skip generated files, - `node_modules`, lock files, and build artifacts -- **Diff range**: run `git diff ` to get changed files and hunks, then read - the full file for each changed file (context beyond the hunks matters) -- **No argument**: run `git diff HEAD` for staged + unstaged changes - -Read all files in scope before starting any pass. Do not re-read during individual passes. - ---- - -### Step 2: Load Project Conventions - -Read the project's config files to understand its standards before reviewing against them: -- What import style does it use? (Path aliases? 
Relative paths? Named vs. default exports?) -- What error handling pattern does it use? (throw vs. Result types vs. callbacks) -- What naming conventions does it follow? -- What test patterns exist? - -These become the baseline for Pass 5 (Consistency). Flag deviations from them, not from -generic "best practices" that may not apply to this project. - ---- - -### Pass 1: Correctness - -Scan every file for: - -- **Logic errors**: inverted conditions, wrong operators, incorrect boolean logic -- **Off-by-one errors**: in loops, slices, index access, range calculations -- **Null/undefined dereference**: accessing properties without guards when the value - could be null or undefined -- **Missing awaits**: async calls whose results are used without awaiting them -- **Race conditions**: shared mutable state accessed from concurrent async operations - without synchronization -- **Type coercion**: loose equality (`==`), implicit type conversions, surprising - JavaScript truthiness/falsiness -- **Resource leaks**: connections, handles, event listeners, or subscriptions opened - but never closed -- **Missing cleanup**: effects or lifecycle methods that set up state but don't - tear it down -- **Edge cases**: what happens with an empty array? A zero value? A negative number? - An extremely large input? 
-- **State mutations**: direct mutation of state that's supposed to be immutable - ---- - -### Pass 2: Security - -Scan for OWASP Top 10 and common vulnerabilities: - -- **Injection**: SQL, NoSQL, command, template, LDAP — any user input reaching a - query or command without parameterization -- **XSS**: user input rendered as HTML without sanitization, `dangerouslySetInnerHTML`, - `innerHTML`, unescaped template interpolation -- **Broken auth**: missing authentication checks on endpoints, broken access control, - privilege escalation paths, JWT validation gaps -- **Hardcoded secrets**: API keys, tokens, passwords, or connection strings in source - code instead of environment variables -- **Unsafe deserialization**: `eval()`, `Function()`, `JSON.parse` on untrusted input - without schema validation, `pickle.loads`, `yaml.load` without SafeLoader -- **SSRF**: user-controlled URLs passed to fetch/request without allowlist validation -- **Path traversal**: user input in file paths without sanitization -- **Insecure crypto**: MD5/SHA1 for passwords, ECB mode, hardcoded IVs, `Math.random()` - for security-sensitive values - ---- - -### Pass 3: Performance - -Scan for measurable performance problems — not micro-optimizations, but issues that -cause degradation at scale: - -- **Algorithmic**: O(n²) or worse in paths that scale with data size -- **Allocation in hot paths**: creating objects or arrays inside render functions, - animation loops, or frequently-called functions when they could be hoisted -- **Missing memoization**: expensive derivations recomputed on every call when inputs - haven't changed -- **N+1 queries**: database or API calls inside loops instead of batched operations -- **Unnecessary bundle weight**: importing entire libraries for one function -- **Render performance** (frontend): new object/array references causing unnecessary - re-renders, missing memoization on expensive derived values, inline function props - recreated on every render in 
high-frequency components -- **I/O in hot paths**: synchronous reads, blocking operations, or layout-forcing DOM - calls inside animation loops -- **Unbounded queries**: queries or list renders with no pagination or limits - ---- - -### Pass 4: Readability - -Scan for code that will cost the next developer (or you, in three months) extra time -to understand: - -- **Vague naming**: `data`, `info`, `result`, `handle`, `process` — names that describe - shape but not meaning -- **Misleading naming**: `isValid` returning a string, `getUser` having side effects -- **Function length**: functions over 50 lines doing multiple distinct things -- **Cognitive complexity**: three or more levels of nesting, complex boolean expressions - that could be extracted to named variables -- **Dead code**: unreachable branches, commented-out blocks, unused variables and imports -- **Stale comments**: comments that describe what the code did before a refactor -- **Magic values**: hardcoded numbers or strings without named constants -- **Mixed abstraction levels**: high-level orchestration and low-level implementation - interleaved in the same function - ---- - -### Pass 5: Consistency - -Scan against the project conventions loaded in Step 2: - -- Does the code follow the project's import ordering and alias usage? -- Does it use the project's error handling pattern? -- Does it follow the project's file organization conventions? -- Do new functions follow the existing signature and return type patterns? -- Do new identifiers follow the project's naming conventions? - -Also flag *internal inconsistency* — if some functions in a module throw and others -return null for errors, that's a finding even without an established convention. 
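A hypothetical module showing the kind of internal inconsistency described above: two lookups in the same file signal "not found" in different ways. The names and data are invented for illustration only.

```python
from typing import Optional

USERS = {1: "alice"}
ORDERS = {10: "book"}


def get_user(user_id: int) -> str:
    # Signals "not found" by raising.
    if user_id not in USERS:
        raise KeyError(f"user {user_id} not found")
    return USERS[user_id]


def get_order(order_id: int) -> Optional[str]:
    # Signals "not found" by returning None: a different convention, same module.
    return ORDERS.get(order_id)


# Callers now need two different failure-handling styles for two similar lookups.
try:
    get_user(99)
except KeyError as exc:
    print("raised:", exc)

print("returned:", get_order(99))
```

Under Pass 5 this would likely be reported as a single finding against the module, with the fix being to pick one convention and apply it to both functions.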
- ---- - -### Format Every Finding - -Each finding must include: -- **File**: the path -- **Line**: exact line number or range -- **Severity**: `CRITICAL`, `WARNING`, or `INFO` -- **Finding**: one sentence describing the problem -- **Code**: the specific problematic lines -- **Fix**: what to do about it — specific, not vague - -**Severity calibration:** -- `CRITICAL`: will cause bugs, security vulnerabilities, data loss, or crashes -- `WARNING`: causes problems under specific conditions, degrades performance, - or creates meaningful maintenance burden -- `INFO`: style improvement, minor clarity gain, preventive suggestion - ---- - -### Produce the Verdict - -``` -## Code Review: {target} - -Scope: {N files, M total lines} | Mode: {file | directory | diff} - ---- - -### Pass 1: Correctness -{findings or "No findings."} - -### Pass 2: Security -{findings or "No findings."} - -### Pass 3: Performance -{findings or "No findings."} - -### Pass 4: Readability -{findings or "No findings."} - -### Pass 5: Consistency -{findings or "No findings."} - ---- - -## Verdict: {PASS | CONDITIONAL | FAIL} -{one-line rationale} - -| Severity | Count | -|---|---| -| Critical | N | -| Warning | N | -| Info | N | -``` - -For diff reviews: distinguish findings in new/changed code from pre-existing issues -surfaced by context. Prioritize the new-code findings. - ---- - -## Constraints - -- **Don't duplicate linter output.** Don't flag things the project's linter or formatter - catches automatically (missing semicolons, indentation, unused vars that ESLint flags). - Focus on semantic issues that require comprehension to detect. -- **No false positives from skimming.** For every finding, verify you read the surrounding - code. A "missing null check" that's actually guarded by the caller is not a finding. - An "unused import" that's used in a type annotation is not a finding. -- **The review is the deliverable.** Don't offer to fix findings unless asked. 
The review - surfaces issues; the developer decides which ones to address and how. -- **Line numbers must be accurate.** A finding pointing to the wrong line is worse than - no finding. Verify each cited line against the actual file content. - -## Safety Notes - -- Pass 2 (Security) findings rated CRITICAL should be treated as blocking for production - deployments. Don't ship CRITICAL security findings. -- When reviewing authentication or authorization logic, read the entire auth flow, not - just the changed file — a bypass may be in an upstream guard, not the file being reviewed. -- If the review surfaces hardcoded credentials, do not include the actual credential value - in the report — just the location and the finding. - diff --git a/.agents/skills/codebase-map/kit.md b/.agents/skills/codebase-map/kit.md deleted file mode 100644 index 357ace2..0000000 --- a/.agents/skills/codebase-map/kit.md +++ /dev/null @@ -1,304 +0,0 @@ ---- -schema: kit/1.0 -owner: citadel -slug: codebase-map -title: Codebase Map — Structural Intelligence Index -summary: >- - Structural codebase index: files, exports, imports, dependency graph, roles. - Keyword search. Inject compact slices into agents to cut token usage 60-80%. -version: 1.0.0 -license: MIT -tags: - - codebase-intelligence - - context-optimization - - navigation - - agent-workflows - - developer-tools -model: - provider: anthropic - name: claude-sonnet-4-6 - hosting: Anthropic API (api.anthropic.com) -tools: - - Glob - - Read - - Bash - - Write -tech: - - typescript - - javascript - - python - - go - - rust -failures: - - problem: No source files found in target directory. - resolution: >- - Generates empty index (fileCount: 0). Not an error — verify the root path - points to the correct project directory. - scope: general - - problem: Index not found when running a query. - resolution: >- - Run the generation step first (without query terms), then re-run with - query terms. 
- scope: general - - problem: 'Very large repo (10,000+ files) causing performance concerns.' - resolution: >- - Walker is iterative, not recursive — no stack overflow risk. First run may - take 5-10 seconds; subsequent runs within the 5-minute cache window return - instantly. - scope: general -useCases: - - scenario: >- - Starting work on an unfamiliar codebase — build the index first to enable - fast, targeted navigation. - constraints: [] - notFor: - - Reading file contents (use Read) - - Searching string patterns inside files (use Grep/ripgrep) - - scenario: >- - Preparing context slices to inject into AI agents before a multi-file - task. - constraints: - - 15-file slice is typically 800-1200 tokens - notFor: [] - - scenario: >- - Getting a structural overview: file count, language breakdown, role - distribution, dependency topology. - constraints: [] - notFor: [] -inputs: - - name: project_root - description: >- - Root directory to index. Defaults to current working directory. Respects - .gitignore, skips node_modules, dist, build artifacts. - - name: query_terms - description: >- - Optional keywords to search after generation (e.g., "auth session token"). - If omitted, prints summary statistics. -outputs: - - name: index_file - description: >- - .planning/map/index.json — full structural index with paths, roles, - exports, imports, and dependency edges. - - name: stats_summary - description: >- - File count, line count, export count, dependency edge count, breakdown by - language and role. - - name: query_results - description: >- - If query terms provided: ranked list of matching files with relevance - scores and top exports. Capped at 20 results. - - name: context_slice - description: >- - Injection-ready block for agent prompts. 800-1200 tokens for 15 files — - replaces 2000-5000 tokens of exploratory file discovery. -prerequisites: - - name: Node.js 18+ for the map-index.js generator script. 
-selfContained: true ---- - -# Codebase Map — Structural Intelligence Index - -## Goal - -When an AI agent starts work on an unfamiliar codebase, it spends its first -few thousand tokens doing reconnaissance: globbing for files, reading directory -structures, searching for class definitions. This isn't wasted work — but it -*is* redundant work. The same questions get asked again and again across sessions, -across agents, across conversations. - -This kit solves that problem by building a persistent structural index on first run -and answering navigation questions from the index on every subsequent run. - -The output is a queryable map: every file in the project, its inferred role -(component, hook, store, route, service, test, config, etc.), its exports, its -imports, and its dependency edges. Query it with keywords, get ranked results -back in seconds without touching the filesystem. - -The secondary use case is agent context injection. A 15-file context slice from -the map is typically 800-1200 tokens. Without the map, an agent finding those -same 15 files through Glob and Read calls consumes 2000-5000 tokens and takes -multiple round-trips. For multi-agent workflows, that difference compounds fast. - -## When to Use - -**Generate the index when:** -- Starting work on any codebase you haven't mapped yet -- The codebase has changed significantly since the last map -- Onboarding a new AI agent or starting a new campaign - -**Query the index when:** -- Looking for files related to a feature, module, or concept -- Asking "where is the auth logic?" or "what files handle payments?" 
-- Preparing context to inject into a sub-agent prompt - -**Use the context slice format when:** -- Spawning agents that need to know "which files are relevant to X" -- Starting a fleet campaign where each agent covers a different domain -- Any workflow where reducing agent token usage matters - -## Setup - -**Prerequisites:** -- Node.js 18+ (for the index generator script) -- A project with TypeScript, JavaScript, Python, Go, or Rust source files - -**One-time initialization:** - -The `map-index.js` script lives in the Citadel harness at `scripts/map-index.js`. -If you're running this as a standalone kit, copy the script to your project's -`scripts/` directory, or run it from the Citadel harness pointed at your project root: - -```bash -node scripts/map-index.js --generate --root /path/to/your/project -``` - -The index is written to `.planning/map/index.json` inside the target project. -If `.planning/map/` doesn't exist, the script creates it. - -## Steps - -### Step 1: Generate the Index - -Run the index generator. For a first-time run or after significant codebase changes: - -```bash -node scripts/map-index.js --generate --root . -``` - -Add `--force` to bypass the 5-minute cache and rebuild from scratch: - -```bash -node scripts/map-index.js --generate --root . --force -``` - -**What happens during generation:** -1. The walker traverses the project tree, respecting `.gitignore` -2. Each source file is parsed for exports, imports, and top-level symbols -3. A role is inferred for each file based on naming patterns, directory placement, - and export shapes (e.g., a file exporting a function starting with `use` is a hook) -4. Import paths are resolved to build a dependency graph -5. 
The full index is written to `.planning/map/index.json` - -**Expected output:** -``` -Map index generated: 847 files, 2,341 dependency links -Languages: TypeScript (612), JavaScript (183), CSS (52) -Roles: component (201), hook (87), store (23), route (44), service (31), - test (156), config (89), utility (114), other (102) -Index written to .planning/map/index.json -``` - ---- - -### Step 2: Print Summary Statistics - -To get a structural overview of the codebase without querying anything specific: - -```bash -node scripts/map-index.js --stats -``` - -Outputs file count, line count, export count, dependency edge count, and -full breakdowns by language and by role. Useful for understanding the shape -of an unfamiliar project before diving in. - ---- - -### Step 3: Query the Index - -Search for files related to a concept, feature, or keyword: - -```bash -node scripts/map-index.js --query "auth session token" -node scripts/map-index.js --query "payment stripe checkout" -node scripts/map-index.js --query "user profile avatar upload" -``` - -**Scoring model:** -- Path match: +3 per matching term -- Export/symbol match: +5 per matching term -- Role match: +1 per matching term - -Results are sorted by score, capped at 20 files, and formatted for readability: - -``` -Results for "auth session token" (12 matches): - - Score Role Path - ------------------------------------------------------- - 18 service src/auth/session.service.ts - 15 hook src/auth/useSession.ts - 12 store src/stores/auth.store.ts - 10 route src/api/routes/auth.ts - 8 utility src/auth/token-utils.ts - ... -``` - ---- - -### Step 4: Generate a Context Slice (for Agent Injection) - -When you need to inject codebase context into an AI agent prompt, use the slice -format. It's compact by design — built to be injected, not displayed. 
- -```bash -node scripts/map-index.js --query "auth session" --max-files 15 -``` - -Output format: -``` -=== MAP SLICE: auth session === -18 service src/auth/session.service.ts [SessionService, createSession, invalidateSession] (187L) -15 hook src/auth/useSession.ts [useSession, useCurrentUser] (94L) -12 store src/stores/auth.store.ts [authStore, useAuthStore] (203L) -... -=== END MAP SLICE === -``` - -**Inject this block** at the start of any agent prompt that needs to find auth-related -files. The agent reads the slice instead of running its own file discovery — saving -tokens and time. - -**Token budget:** A 15-file slice is typically 800-1200 tokens. Equivalent exploratory -Glob + Read calls would consume 2000-5000 tokens across multiple round-trips. - ---- - -### Step 5: Refresh the Index - -The index is cached for 5 minutes. For active development sessions, this means -you only pay the generation cost once per session. After significant codebase -changes (a large refactor, adding a new module), force a rebuild: - -```bash -node scripts/map-index.js --generate --force -``` - -For CI or pre-session automation, add this to your session startup script. -The generation time on a 10K-file repo is typically 3-5 seconds. - ---- - -## Constraints - -- **Supported languages only:** TypeScript, JavaScript, Python, Go, Rust. Other - file types are skipped silently. The index will still work for mixed-language - repos — it just maps what it can parse. -- **Not a content search.** The map indexes structure (exports, imports, roles), - not file contents. For searching code patterns, use grep/ripgrep. The map tells - you *which files* to search; the search tool tells you *what's in them*. -- **Not a replacement for reading files.** A context slice tells an agent which - files are relevant. The agent still needs to read those files to understand - the actual code. -- **Cache TTL is 5 minutes.** Queries against a stale index may miss files added - since the last generation. 
Use `--force` if freshness is critical. - -## Safety Notes - -- The index contains file paths and exported symbol names — no file contents. It's - safe to share in agent prompts even if the codebase contains sensitive code. -- The `.planning/map/` directory should be added to `.gitignore`. The index is - generated from the codebase and doesn't need to be committed. -- On very large repos (50K+ files), first-run generation may take 30-60 seconds. - The cache makes subsequent runs fast. If generation time is a concern, scope - the `--root` to a subdirectory. diff --git a/.agents/skills/copilot-workflow/SKILL.md b/.agents/skills/copilot-workflow/SKILL.md deleted file mode 100644 index e7482c3..0000000 --- a/.agents/skills/copilot-workflow/SKILL.md +++ /dev/null @@ -1,38 +0,0 @@ ---- -name: copilot-workflow -description: Use when working on the `/copilot` experience in this repo, including `src/components/agent`, `src/app/copilot`, or `src/app/api/agent`. Do not use for market-only pages or provider/storage work unrelated to agent streaming. Expected output: code changes plus focused `/copilot` verification covering thread interactions and streaming behavior. -metadata: - short-description: Repo workflow for copilot features ---- - -# Copilot Workflow - -Use this skill for changes to agent UI state, streaming behavior, thread interactions, or the authenticated agent route. - -## Use When - -- the task changes `src/components/agent/*` -- the task changes `src/app/api/agent/route.ts` -- the task changes `/copilot` page behavior, model selection, or stream timelines - -## Do Not Use When - -- the task is market-only -- the task is generic styling or infrastructure work unrelated to agent flows - -## Workflow - -1. 
Read the nearest copilot AGENTS files for the touched code: - - `src/components/agent/AGENTS.md` for UI/thread-state work - - `src/app/api/agent/AGENTS.md` for streaming route work - Then read [docs/architecture/copilot.md](../../../docs/architecture/copilot.md). -2. Prefer extracting pure helpers from large client files instead of expanding hook/component files further. -3. Keep server-only prompt, auth, and rate-limit logic out of client modules. -4. Preserve thread rename/pin/delete/search behavior while refactoring. -5. Verify the full repo matrix and smoke `/copilot` interactions when affected. - -## Expected Output - -- summary of the user-visible `/copilot` impact -- note whether stream-state, route, or thread behavior changed -- exact verification commands run diff --git a/.agents/skills/markets-workflow/SKILL.md b/.agents/skills/markets-workflow/SKILL.md deleted file mode 100644 index fbda4e9..0000000 --- a/.agents/skills/markets-workflow/SKILL.md +++ /dev/null @@ -1,36 +0,0 @@ ---- -name: markets-workflow -description: Use when working on the market workspace, FMP integration, cache/storage behavior, market routes, or stock/watchlist/screener flows in this repo. Do not use for `/copilot`-only UI or generic Next.js work that does not touch market data. Expected output: code changes plus the relevant market verification commands and any migration or capability checks that were required. -metadata: - short-description: Repo workflow for market features ---- - -# Markets Workflow - -Use this skill for changes under `src/lib/server/markets`, `src/components/markets`, `src/app/(home)`, or `src/app/api/market*`, `watchlists`, and `screeners`. 
- -## Use When - -- the task changes FMP request behavior, cache policy, or market-time logic -- the task changes watchlists, screeners, symbol search, or stock dossier behavior -- the task changes market route error contracts or degraded states - -## Do Not Use When - -- the task is only about `/copilot`, thread state, or model routing -- the task is generic repo maintenance with no market feature impact - -## Workflow - -1. Read `src/lib/server/markets/AGENTS.md` and [docs/architecture/markets.md](../../../docs/architecture/markets.md). -2. When the change touches market pages or market UI, also check the root `AGENTS.md` guidance for the relevant market route/component subtree. -3. Check whether the change touches provider access, cache semantics, storage initialization, or market-time logic. -4. Keep route handlers thin and push market logic into focused server modules. -5. When splitting code, preserve stable facades so page and route imports stay readable. -6. Verify with the full repo matrix and add `pnpm markets:capabilities` when FMP capability assumptions changed. - -## Expected Output - -- concise summary of the market-facing change -- note whether migrations or capability checks were required -- exact verification commands run diff --git a/.agents/skills/project-scaffold/kit.md b/.agents/skills/project-scaffold/kit.md deleted file mode 100644 index bdc89d5..0000000 --- a/.agents/skills/project-scaffold/kit.md +++ /dev/null @@ -1,311 +0,0 @@ ---- -schema: kit/1.0 -owner: citadel -slug: project-scaffold -title: Project-Aware Scaffold — New Files That Fit -summary: >- - Reads your codebase, finds exemplars, generates new files matching your exact - conventions, wires into every registration point. No generic templates. 
-version: 1.0.0 -license: MIT -tags: - - scaffolding - - code-generation - - conventions - - developer-tools - - boilerplate -model: - provider: anthropic - name: claude-sonnet-4-6 - hosting: Anthropic API (api.anthropic.com) -tools: - - Read - - Write - - Edit - - Glob - - Grep - - Bash -tech: - - typescript - - javascript - - react -failures: - - problem: No exemplar files found for the requested type. - resolution: >- - Lists available file types in the codebase and asks which to use as the - exemplar. Never generates from memory when no precedent exists. - scope: general - - problem: Target file or directory already exists. - resolution: >- - Asks for confirmation before overwriting. Does not silently clobber - existing files. - scope: general - - problem: Typecheck fails after generation. - resolution: >- - Fixes the type errors before exiting. Does not leave the developer with - broken generated files. - scope: general - - problem: No wiring point found for the generated file. - resolution: >- - Notes the missing registration point explicitly in output rather than - silently leaving the file unwired. - scope: general -useCases: - - scenario: >- - Adding a new component, service, hook, or module to an established - project. - constraints: - - >- - Project must have at least 2 existing files of the same type to use as - exemplars - notFor: - - Files with no precedent in the project - - Modifying existing files - - scenario: >- - Onboarding to a new codebase and adding a feature without deeply studying - the conventions first. - constraints: [] - notFor: [] - - scenario: >- - Ensuring generated files are fully wired in — barrel exports, route - registration, etc. — on first write. - constraints: [] - notFor: [] -inputs: - - name: file_type - description: >- - Kind of file to generate: component, hook, service, route, module, domain, - utility. If ambiguous, asks one clarifying question. - - name: name - description: >- - What to call it. 
The kit infers correct casing (PascalCase, kebab-case, - etc.) from existing codebase conventions. - - name: description - description: >- - Optional: what the new file does. Used to generate meaningful internals - rather than empty stubs. -outputs: - - name: generated_files - description: >- - Main file, and optionally test file, types file, barrel update, story - file, style file — whichever the project conventions call for. - - name: wiring_summary - description: >- - List of registration points updated: barrel exports, route registrations, - module registries, nav configs, etc. - - name: conventions_source - description: Paths to the exemplar files used as the convention reference. -selfContained: true ---- - -# Project-Aware Scaffold — New Files That Fit - -## Goal - -Every project has conventions. Some are documented; most are implicit — visible -in the patterns of the files that already exist. When you add a new file manually, -you internalize those patterns automatically. When a generic AI template generates -one, the patterns are wrong: wrong import style, wrong naming, wrong structure, -missing the test file, not wired into the barrel export. - -This kit solves that by reading the codebase *before generating anything*. It finds -2-3 existing files of the same type as what you're creating — the project's own -exemplars — and replicates their exact patterns. Not "a React component pattern." -Your project's React component pattern. - -It also handles wiring: finding every registration point the exemplars use (barrel -exports, route configs, module registries, navigation arrays, type unions) and -adding the new file to each one, in the exact format the project already uses. - -The result: a file that passes code review as if a senior developer who knows this -codebase wrote it. Because it was written from that developer's prior work. 
- -## When to Use - -- Adding a new component, service, hook, route, module, or utility to an existing project -- Any time you want the generated file to match your conventions exactly -- When you're onboarding and want to add something without deeply studying the conventions first -- When you need the file wired in — not just created, but actually connected - -**Don't use for:** -- Files with no precedent in the project (work with the user to establish the pattern first) -- Modifying existing files (just edit them directly) -- Projects with no established conventions yet - -## Setup - -No setup required. The kit reads conventions from your existing codebase. - -One thing to have ready: know the type and name of what you want to create. -"A React component called UserProfile" or "a service for handling Stripe webhooks" -is enough context to start. - -## Steps - -### Step 1: Parse the Request - -From the user's input, extract: -- **type**: component | hook | service | route | module | domain | utility | custom -- **name**: what to call it (the kit normalizes casing to match the project convention) -- **description**: what it does, if provided (makes internals more meaningful) - -If the type is ambiguous, ask *one* clarifying question. Not a form — one question. - ---- - -### Step 2: Find Exemplars - -Search the codebase for 2-3 existing files of the same type. These are the -convention reference: - -| Type | Where to Look | -|---|---| -| component | `**/*.tsx` in the same directory or sibling directories | -| hook | `**/hooks/**`, `**/use*.ts` | -| service | `**/services/**`, `**/lib/**` | -| route | Router config files, `**/routes.*`, `**/pages/**` | -| module/domain | Top-level feature directories | -| utility | `**/utils/**`, `**/helpers/**` | - -**From each exemplar, extract:** -1. File naming convention (PascalCase? kebab-case? camelCase?) -2. Directory placement (co-located? separate directory?) -3. Import style (path aliases? relative? 
named or default exports?) -4. Export style (named exports? default? re-exported from barrel?) -5. Internal patterns (how state is managed, error handling, JSDoc or not) -6. Test co-location (`.test.ts` next to file? `__tests__/` directory?) -7. Types pattern (inline? separate `.types.ts`? shared types file?) - -Before generating anything, output a 3-5 line summary of the conventions found. -This confirms the kit is working from the right patterns before writing files. - ---- - -### Step 3: Determine the File Set - -Based on the exemplars, determine which files to generate. Only generate what -the project's conventions call for: - -| File | Generate if... | -|---|---| -| Main file | Always | -| Types file (`.types.ts`) | Project separates types for this type of file | -| Test file (`.test.ts`) | Project co-locates tests for this type of file | -| Barrel/index update | Project uses barrel exports and the directory has one | -| Style file | Project uses co-located styles for this type | -| Story file (`.stories.tsx`) | Project has Storybook stories for this type | - -**Do not generate:** -- Empty placeholder files with only a stub comment -- Test files containing only empty `describe` / `it` stubs with no assertions -- Types files that only re-export from elsewhere -- Any file type the project doesn't already use - ---- - -### Step 4: Generate Files - -For each file in the set, adapt the closest exemplar: - -1. Match the exemplar's structure exactly — same section order, same patterns -2. Replace names and specific logic, keep structural patterns -3. Every generated file must be syntactically valid and importable -4. No placeholder comments (`// implement me`, `// Add logic here`) -5. No empty function bodies unless the exemplar has them -6. Minimal but real — a component renders something, a service has at least one - real method, a hook returns a typed value - -**Specific rules by type:** - -For components: match props pattern (interface vs. type, inline vs. 
separate), -state management pattern, utility imports (cn, clsx, etc.), and forwardRef/memo -usage if the exemplar uses them. - -For services/modules: match initialization pattern (constructor vs. factory vs. -singleton), error handling (throw vs. Result type vs. callbacks), and async patterns. - -For hooks: match naming, parameter patterns, return types, and cleanup handling. - ---- - -### Step 5: Wire It In - -A new file that isn't connected to anything is a file waiting to cause a -"where did this come from?" moment three months later. Find every registration -point the exemplars use and add the new file to each one. - -**Common wiring points:** - -| Registration Point | How to Find It | What to Add | -|---|---|---| -| Barrel exports | `index.ts` in same or parent directory | `export { NewThing } from './NewThing'` | -| Route registration | Router config (search for exemplar's route) | New route entry | -| Module registry | Bootstrap/registration file | New registration call | -| Navigation/sidebar | Nav config array | New entry (if appropriate) | -| Lazy loading map | Dynamic import map | New lazy import | -| Type unions | Discriminated unions listing variants | New variant | - -**Rules for wiring:** -- Only wire into registration points the exemplars actually use -- Match the exact format — same spacing, trailing commas, comment style -- If alphabetical ordering is used, maintain it -- Never create new registration points — only add to existing ones - ---- - -### Step 6: Verify - -After all files are generated and wired: - -1. Run the project's typecheck command. Every generated file must pass. - Fix any type errors before exiting — do not leave the user with broken files. -2. Verify the main file is importable from outside its directory via the convention - (barrel export, direct import, or however the project imports this type). -3. Re-read the exemplars one more time and compare against the generated output. 
- Flag and fix any deviations noticed in this final pass. - -**Output the scaffold summary:** - -``` -SCAFFOLD COMPLETE - -Created: - - src/components/UserProfile/UserProfile.tsx (component) - - src/components/UserProfile/UserProfile.test.tsx (test) - - src/components/UserProfile/UserProfile.types.ts (types) - -Wired into: - - src/components/index.ts (barrel export) - - src/routes/routes.ts (route registration) - -Conventions from: - - src/components/PostCard/PostCard.tsx - - src/components/CommentList/CommentList.tsx - -Typecheck: PASS -``` - ---- - -## Constraints - -- **Exemplar-driven, not template-driven.** The kit generates nothing from memory or - generic templates. If it can't find 2+ exemplars of the requested type, it asks - rather than guessing. -- **No placeholders.** Generated code is functional on first write. Stub comments - and empty function bodies are not acceptable output. -- **No scope creep.** The kit creates what was asked for and wires it in. It doesn't - refactor nearby files, update documentation, or make "improvements" to exemplars - it reads. -- **Only files the project already uses.** No Storybook files if the project doesn't - use Storybook. No `.types.ts` files if the project puts types inline. Match the - project, not a preferred convention. - -## Safety Notes - -- Before generating, confirm the target directory is correct — it's faster to specify - the right location upfront than to move files after wiring. -- If the generated file replaces an existing one (confirmed by you), review the wiring - changes carefully — the old file may have been wired differently than the new one. -- The typecheck step catches most generated code issues. If your project's typecheck - is slow, the verification step still runs — the kit doesn't skip it. 
diff --git a/.agents/skills/repo-review/SKILL.md b/.agents/skills/repo-review/SKILL.md deleted file mode 100644 index 96111b0..0000000 --- a/.agents/skills/repo-review/SKILL.md +++ /dev/null @@ -1,42 +0,0 @@ ---- -name: repo-review -description: Use when reviewing or finishing changes in this repo and you need the standard Yurie Markets verification and review pass. Do not use for implementation-only work when no review or final verification is requested. Expected output: prioritized findings first if any exist, then the exact verification commands run or blocked. -metadata: - short-description: Repo review and verification pass ---- - -# Repo Review - -Use this skill for final review, stabilization passes, or when a task asks whether the repo change is safe to ship. - -## Use When - -- the user asks for a review -- the task is complete and needs the repo-standard verification pass -- a refactor changed multiple modules and needs a contract check - -## Do Not Use When - -- the task is still in active implementation and no review is requested -- the work is exploratory and not yet ready for verification - -## Workflow - -1. Read [docs/agents/code-review.md](../../../docs/agents/code-review.md) and [docs/agents/verification.md](../../../docs/agents/verification.md). -2. Review findings first, ordered by severity and contract risk. -3. Run: - -```bash -pnpm test -pnpm lint -pnpm typecheck -pnpm build -``` - -4. Add scoped checks when the touched area requires them. 
- -## Expected Output - -- findings first with file references when issues exist -- explicit statement when no findings were found -- exact verification commands run and any blocked checks diff --git a/.agents/skills/systematic-debugging/kit.md b/.agents/skills/systematic-debugging/kit.md deleted file mode 100644 index fa183fb..0000000 --- a/.agents/skills/systematic-debugging/kit.md +++ /dev/null @@ -1,250 +0,0 @@ ---- -schema: kit/1.0 -owner: citadel -slug: systematic-debugging -title: Systematic Debugging — Root Cause Before Fix -summary: >- - 4-phase debug protocol: observe → hypothesize → verify → fix. No code changes - without a confirmed root cause. Emergency stop after two failed attempts. -version: 1.0.0 -license: MIT -tags: - - debugging - - root-cause-analysis - - quality - - methodology - - developer-tools -model: - provider: anthropic - name: claude-sonnet-4-6 - hosting: Anthropic API (api.anthropic.com) -tools: - - Read - - Grep - - Bash -failures: - - problem: Bug cannot be reproduced. - resolution: >- - Stop at Phase 1. Document every known triggering condition precisely. Ask - for more context rather than forming hypotheses from irreproducible - symptoms. - scope: general - - problem: Two consecutive fix attempts have failed. - resolution: >- - Emergency Stop Rule: return to Phase 2 with entirely new hypotheses. - Re-read the relevant code from scratch first. Never attempt a third guess - from the same understanding. - scope: general - - problem: Bug is in a dependency or generated file that cannot be modified. - resolution: >- - Document the root cause but do not touch the dependency. Propose a - workaround in the consuming code and note the upstream issue. - scope: general -useCases: - - scenario: >- - A non-trivial bug where the cause is not immediately obvious from the - error. 
- constraints: [] - notFor: - - >- - Trivial typos or config mistakes where the error points directly to the - fix - - scenario: An intermittent failure that is hard to reproduce consistently. - constraints: - - Must be reproduced at least once before forming hypotheses - notFor: [] - - scenario: A debugging session where a previous fix attempt made things worse. - constraints: [] - notFor: [] -inputs: - - name: bug_description - description: >- - Error message, stack trace, or behavioral description. Can be a file path, - pasted error, or natural language description of wrong behavior. -outputs: - - name: problem_statement - description: >- - Structured statement: "{Component} does {X} when it should do {Y}, - triggered by {condition}, at {file}:{line}" - - name: root_cause - description: >- - Causal explanation: "The bug occurs because {cause}. This happens when - {trigger}." - - name: fix - description: >- - Minimal code change addressing the root cause, with notes on any related - occurrences found and fixed. -selfContained: true ---- - -# Systematic Debugging — Root Cause Before Fix - -## Goal - -Most bugs get fixed wrong the first time because they get fixed *before they get understood*. -A developer sees an error, makes a plausible guess, applies a change, and either gets lucky -or creates three new bugs in the process. - -This kit enforces a single rule: **no code changes without a confirmed root cause.** - -The four-phase protocol — observe, hypothesize, verify, fix — sounds obvious, but the -discipline of writing down a problem statement before forming any hypothesis, and -*verifying* the hypothesis before touching code, eliminates a category of debugging -failure that experience alone doesn't prevent. - -The emergency stop after two failed fixes is the other critical feature. If you've tried -twice and the bug is still there, you're guessing. The kit forces you to stop, re-read, -and form new hypotheses instead of escalating the damage. 
- -## When to Use - -- Any non-trivial bug where the cause isn't immediately obvious from the error -- Intermittent failures that are hard to pin down -- Bugs where a previous fix attempt made things worse -- Any debugging session that's already consumed more than 20 minutes without a fix - -**Skip this kit for:** trivial typos, obvious off-by-ones visible in the diff, and -configuration mistakes where the error message points directly to the fix. - -## Setup - -No setup required. This kit works on any codebase in any language. - -If the project has a typecheck command (`npm run typecheck`, `cargo check`, `mypy`, etc.), -have it ready — you'll run it after applying the fix to verify no regressions. - -If the project has a test suite, know how to run the relevant subset. A failing test -is the ideal reproduction vehicle. - -## Steps - -### Phase 1: Observation and Reproduction - -Before forming any theory about what's wrong, establish what *is* wrong — precisely. - -**1.1 Read the error fully.** Not the first line. The whole thing. Stack traces contain -the actual failure location. Error messages often contain the fix. Read before searching. - -**1.2 Reproduce the issue.** A bug you can't reproduce is a bug you can't verify you've -fixed: -- For type errors: run typecheck and read the full output -- For runtime errors: identify the exact inputs or user action that triggers it -- For behavioral bugs: document expected vs. actual behavior in one sentence each - -**1.3 Isolate to a specific location:** -- What file? What function or component? What line range? -- What are the inputs when it fails? -- Is this consistent or intermittent? 
(If intermittent, document every triggering condition - you can identify before proceeding) - -**Checkpoint — write this before continuing:** -``` -Problem statement: "{Component/function} does {X} when it should do {Y}, -triggered by {condition}, located at {file}:{approximate line}" -``` - -Do not proceed to Phase 2 until this statement is written. - ---- - -### Phase 2: Hypothesis and Verification - -**2.1 Form up to three hypotheses** for WHY the bug exists. Force yourself to rank them: -``` -H1: {most likely cause} — evidence: {what in the code suggests this} -H2: {second candidate} — evidence: {what makes this plausible} -H3: {third candidate} — evidence: {why this can't be ruled out} -``` - -**2.2 Design a verification step for each hypothesis.** This means adding a diagnostic -that *observes without changing*: -- A `console.log` or `print` at a specific point to check a value -- Reading a specific variable or state at the moment of failure -- Checking whether a condition is true when you expect it to be false - -**CRITICAL:** Do not change any logic during Phase 2. Only observe. - -**2.3 Run the verification.** Which hypothesis was confirmed? Which were eliminated? - -If all three hypotheses are eliminated: you have new information. That's progress. -Form a new set of hypotheses using what you just learned and repeat from 2.1. - ---- - -### Phase 3: Root Cause Analysis - -Once a hypothesis is confirmed, the real work begins: understanding *why*, not just *where*. - -**3.1 Trace the causal chain backward** from the symptom to the source: -- Who calls the failing function, and with what arguments? -- Where do those arguments come from? -- What assumption does the code make about the input that's being violated? - -**Write the root cause statement:** -``` -"The bug occurs because {cause}. This happens when {trigger condition}. -The incorrect assumption is {what the code expects vs. what it receives}." 
-``` - -**3.2 Check for siblings.** Is this pattern used elsewhere? -- Search the codebase for the same function signature, the same assumption, the same pattern -- A bug born from a wrong assumption is often present in 2-3 other places - ---- - -### Phase 4: Implementation - -**4.1 Write a failing test** that reproduces the bug before fixing it (if a test framework -exists). This is your proof that the bug was real and your verification that the fix works. - -**4.2 Apply the minimal fix.** Change only what's necessary to address the root cause: -- Fix the incorrect assumption at its source, not at every symptom -- Do not refactor surrounding code while fixing the bug -- Do not add "defensive" checks for conditions that shouldn't occur - -**4.3 Verify the fix:** -- The test case (if written) now passes -- Typecheck passes with no new errors -- The original reproduction scenario no longer triggers the bug -- Any related occurrences found in 3.2 are also fixed or documented - ---- - -### Emergency Stop Rule - -**If two fix attempts have failed: stop.** - -Do not try a third fix based on the same understanding of the problem. The root cause -analysis was wrong somewhere. Your options: - -1. Return to Phase 2 with fresh eyes and new hypotheses — read the failing code again - from scratch before forming them -2. Ask for more context about the system's intended behavior -3. Check whether there's a higher-level architectural issue (the bug is a symptom, - not the disease) - -Three failed fixes in a row is the definition of guessing. The kit protects against it. - ---- - -## Constraints - -- **No fixes in Phase 2.** Verification is diagnostic-only. Adding a `console.log` is fine. - Changing logic is not. -- **One hypothesis confirmed is enough to proceed.** You don't need to test all three. - But you must test at least one before writing the fix. 
-- **The fix addresses the root cause, not the symptom.** If the root cause is "caller - passes null when the function expects a number," the fix is at the call site, not adding - `if (value === null) return` inside the function. -- **No scope creep during the fix.** The bug fix PR contains only the fix. Related - improvements go in a separate commit. - -## Safety Notes - -- Before fixing a bug in a shared module used by many callers, search for all call sites - and verify the fix doesn't break any of them -- If the fix changes a public API (function signature, return type, event shape), treat - it as a breaking change and communicate it -- If Phase 2 reveals that the "bug" is actually the specified behavior and the spec is - wrong, stop and clarify the requirement before touching code — you may be about to - break something intentional diff --git a/AGENTS.md b/AGENTS.md index 67243e7..08c6063 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,9 +2,9 @@ ## Scope -This file applies repo-wide. Prefer the nearest nested `AGENTS.md` for subtree-specific rules, and keep long playbooks in `docs/agents/*`, `docs/architecture/*`, or `.agents/skills/*`. +This file applies repo-wide. Prefer the nearest nested `AGENTS.md` for subtree-specific rules, and keep long playbooks in `docs/agents/*` or `docs/architecture/*`. -If an agent keeps making the same mistake, update the closest relevant `AGENTS.md` or skill instead of repeating the instruction in prompts. +If an agent keeps making the same mistake, update the closest relevant `AGENTS.md` instead of repeating the instruction in prompts. ## Product Surfaces @@ -23,7 +23,6 @@ Keep behavior stable. Prefer incremental refactors over architecture rewrites. 
 - `src/lib/shared`: shared types and pure helpers; prefer narrow imports over broad barrels
 - `docs/agents`: durable agent-facing guidance
 - `docs/architecture`: system notes that are too detailed for AGENTS files
-- `.agents/skills`: repeatable repo workflows

 Read [docs/agents/repo-map.md](docs/agents/repo-map.md) for the expanded layout and [docs/agents/verification.md](docs/agents/verification.md) for scoped verification follow-ups.

@@ -59,7 +58,7 @@ pnpm build
 - Client components must not import server-only modules.
 - Keep tests in domain-local `__tests__` folders.
 - Do not add or reintroduce catch-all barrels under `src/lib/shared`.
-- Do not duplicate durable instructions in ad hoc prompts when they belong in AGENTS or a skill.
+- Do not duplicate durable instructions in ad hoc prompts when they belong in AGENTS.

 ## Done Checklist

@@ -76,12 +75,6 @@ Check the nearest nested `AGENTS.md` when working in specialized areas:
 - `src/lib/server/markets/AGENTS.md`
 - `src/components/agent/AGENTS.md`

-For longer workflows, use the matching skills:
-
-- `.agents/skills/markets-workflow/SKILL.md`
-- `.agents/skills/copilot-workflow/SKILL.md`
-- `.agents/skills/repo-review/SKILL.md`
-
 Only add another nested `AGENTS.md` when a subtree has durable rules that differ from its parent.

 ## Cursor Cloud Notes
diff --git a/README.md b/README.md
index ce4fa70..073ff13 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,6 @@ The app runs on [http://localhost:3000](http://localhost:3000).

 - Start with `AGENTS.md` for the repo contract.
 - Use the nearest nested `AGENTS.md` for specialized areas, including `src/app/api`, `src/app/api/agent`, `src/lib/server/markets`, and `src/components/agent`.
-- Repo-specific repeatable workflows live under `.agents/skills`.
 - Expanded repo guidance lives in `docs/agents/*` and architecture notes live in `docs/architecture/*`.

 ## Required environment
diff --git a/docs/agents/repo-map.md b/docs/agents/repo-map.md
index 19d86e2..d895bb0 100644
--- a/docs/agents/repo-map.md
+++ b/docs/agents/repo-map.md
@@ -32,6 +32,5 @@
 - `AGENTS.md`: root repo contract
 - `src/**/AGENTS.md`: local rules for specialized subtrees, with nearer files overriding broader ones
 - `src/app/api/agent/AGENTS.md`: streaming-specific contract for the authenticated agent route
-- `.agents/skills/*/SKILL.md`: repeatable repo workflows
 - `docs/references/*`: external reference docs vendored into the repo for offline/consistent agent access
 - `.codex/config.toml`: Codex discovery tuning
diff --git a/src/lib/server/markets/AGENTS.md b/src/lib/server/markets/AGENTS.md
index e507a04..a9a0f39 100644
--- a/src/lib/server/markets/AGENTS.md
+++ b/src/lib/server/markets/AGENTS.md
@@ -4,7 +4,7 @@

 Applies to `src/lib/server/markets/**`. Use this subtree for market-time rules, FMP access, cache policy, storage invariants, and stable market-service facades.

-For longer background, read [docs/architecture/markets.md](../../../docs/architecture/markets.md). For workflow and verification, use `.agents/skills/markets-workflow/SKILL.md`.
+For longer background, read [docs/architecture/markets.md](../../../docs/architecture/markets.md). For verification follow-ups, read [docs/agents/verification.md](../../../docs/agents/verification.md).

 ## Invariants