2 changes: 2 additions & 0 deletions .gitignore
@@ -2,6 +2,8 @@ node_modules/
dist/
.codebase-index/
.codebase-index.json
.codebase-context/
.codebase/
*.log
.DS_Store
.env
43 changes: 42 additions & 1 deletion CHANGELOG.md
@@ -1,32 +1,71 @@
# Changelog

## [Unreleased]

## [1.4.0] - 2026-01-28

### Added

- **Memory System**: New `remember` and `get_memory` tools capture team conventions, decisions, and gotchas
- **Types**: `convention` | `decision` | `gotcha`
- **Categories**: `tooling`, `architecture`, `testing`, `dependencies`, `conventions`
- **Storage**: `.codebase-context/memory.json` with content-based hash IDs (commit this)
- **Safety**: `get_memory` truncates unfiltered results to 20 most recent
- **Integration with `get_team_patterns`**: Appends relevant memories when category overlaps
- **Integration with `search_codebase`**: Surfaces `relatedMemories` via keyword match in search results
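
A minimal sketch of how the storage and safety bullets above might fit together. Everything here is illustrative: `contentHashId`, `getMemory`, the `date` field, and the filter shape are hypothetical names, and FNV-1a stands in for whatever hash the server actually uses.

```typescript
// Hypothetical shapes: only type/category/memory/reason appear in the PR itself.
type MemoryType = 'convention' | 'decision' | 'gotcha';
type MemoryCategory = 'tooling' | 'architecture' | 'testing' | 'dependencies' | 'conventions';

interface Memory {
  id: string;
  type: MemoryType;
  category: MemoryCategory;
  memory: string;
  reason: string;
  date: string; // ISO 8601 date (assumed field, used for "most recent")
}

// Content-based ID: the same type/category/memory text always yields the same ID.
// FNV-1a (32-bit, over UTF-16 code units) is a stand-in, not the server's real hash.
function contentHashId(type: MemoryType, category: MemoryCategory, memory: string): string {
  const input = `${type}|${category}|${memory}`;
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('00000000' + hash.toString(16)).slice(-8);
}

const UNFILTERED_LIMIT = 20;

// Unfiltered calls are capped at the 20 most recent entries; filtered calls are not.
function getMemory(
  all: Memory[],
  filter?: { category?: MemoryCategory; query?: string }
): Memory[] {
  const newestFirst = [...all].sort((a, b) => b.date.localeCompare(a.date));
  if (!filter || (filter.category === undefined && !filter.query)) {
    return newestFirst.slice(0, UNFILTERED_LIMIT);
  }
  const query = (filter.query ?? '').toLowerCase();
  return newestFirst.filter(
    (m) =>
      (filter.category === undefined || m.category === filter.category) &&
      (query === '' || `${m.memory} ${m.reason}`.toLowerCase().indexOf(query) !== -1)
  );
}
```

Deriving the ID from the content rather than from a counter or timestamp means two teammates recording the identical memory produce the same ID, which would make duplicates easy to detect.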

### Changed

- **File Structure**: All MCP files now organized in `.codebase-context/` folder for cleaner project root
- Vector DB: `.codebase-index/` → `.codebase-context/index/`
- Intelligence: `.codebase-intelligence.json` → `.codebase-context/intelligence.json`
- Keyword index: `.codebase-index.json` → `.codebase-context/index.json`
- **Migration**: Automatic on server startup (legacy JSON preserved; vector DB directory moved)

### Fixed

- **Startup safety**: Validates `ROOT_PATH` before running migration to avoid creating directories on typo paths
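
The layout change and the startup check can be sketched as a pure planning step. This is an assumption-laden illustration, not the server's code: `planMigration`, `FsProbe`, and `MigrationStep` are hypothetical names, and returning an empty plan for an invalid root is one possible behaviour (the real server may log or exit instead).

```typescript
interface MigrationStep {
  from: string;
  to: string;
  action: 'move' | 'copy'; // 'copy' leaves the legacy JSON in place; the vector DB directory is moved
}

// Legacy → new locations, as listed in the changelog entry above.
const LEGACY_LAYOUT: MigrationStep[] = [
  { from: '.codebase-index', to: '.codebase-context/index', action: 'move' },
  { from: '.codebase-intelligence.json', to: '.codebase-context/intelligence.json', action: 'copy' },
  { from: '.codebase-index.json', to: '.codebase-context/index.json', action: 'copy' }
];

// Hypothetical filesystem probe, injected so the planner stays side-effect free.
interface FsProbe {
  exists(path: string): boolean;
  isDirectory(path: string): boolean;
}

function planMigration(rootPath: string, fs: FsProbe): MigrationStep[] {
  // Validate the root first: a typo path must not lead to any directory creation.
  if (!fs.isDirectory(rootPath)) return [];
  return LEGACY_LAYOUT
    .map((step) => ({
      from: `${rootPath}/${step.from}`,
      to: `${rootPath}/${step.to}`,
      action: step.action
    }))
    // Only migrate what exists in the legacy location and is not already migrated.
    .filter((step) => fs.exists(step.from) && !fs.exists(step.to));
}
```

Separating planning from execution keeps the root check in one place and makes the migration testable without touching the disk.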

### Why This Feature

Patterns show "what" (97% use inject) but not "why" (standalone compatibility). AGENTS.md can't capture every hard-won lesson. Decision memory gives AI agents access to the team's battle-tested rationale.

**Design principle**: The tool must be self-evident without AGENTS.md rules. The test for an entry is: "Is this about HOW (record) or WHAT (don't record)?"

**Inspired by**: v1.1 Pattern Momentum (temporal dimension) + memory systems research (Copilot Memory, Gemini Memory)

## [1.3.3] - 2026-01-18

### Fixed

- **Security**: Resolve `pnpm audit` advisories by updating `hono` to 4.11.4 and removing the vulnerable `diff` transitive dependency (replaced `ts-node` with `tsx` for `pnpm dev`).

### Changed

- **Docs**: Clarify private `internal-docs/` submodule setup, add `npx --yes` tip, document `CODEBASE_ROOT`, and list `get_indexing_status` tool.
- **Submodule**: Disable automatic updates for `internal-docs` (`update = none`).

### Removed

- **Dev**: Remove local-only `test-context.cjs` helper script.

## [1.3.2] - 2026-01-16

### Changed

- **Embeddings**: Batch embedding now uses a single Transformers.js pipeline call per batch for higher throughput.
- **Dependencies**: Bump `@modelcontextprotocol/sdk` to 1.25.2.

## [1.3.1] - 2026-01-05

### Fixed

- **Auto-Heal Semantic Search**: Detects LanceDB schema corruption (missing `vector` column), triggers re-indexing, and retries search instead of silently falling back to keyword-only results.

## [1.3.0] - 2026-01-01

### Added

- **Workspace Detection**: Monorepo support for Nx, Turborepo, Lerna, and pnpm workspaces
- New utility: `src/utils/workspace-detection.ts`
- Functions: `scanWorkspacePackageJsons()`, `detectWorkspaceType()`, `aggregateWorkspaceDependencies()`
@@ -36,13 +75,16 @@
- **Dependency Detection**: Added `@nx/` and `@nrwl/` prefix matching for build tools

### Fixed

- **detectMetadata() bug**: All registered analyzers now contribute to codebase metadata (previously only the first analyzer was called)
- Added `mergeMetadata()` helper with proper array deduplication and layer merging
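
A rough sketch of what merging every analyzer's output with array deduplication and layer merging could look like. The metadata shape below is hypothetical (the real type is not shown in this diff), and summing per-layer counts is an assumed merge rule.

```typescript
// Hypothetical shape; field names are assumptions for illustration only.
interface PartialMetadata {
  languages?: string[];
  frameworks?: string[];
  layers?: Record<string, number>; // assumed: file count per architectural layer
}

interface MergedMetadata {
  languages: string[];
  frameworks: string[];
  layers: Record<string, number>;
}

// Keep the first occurrence of each value, preserving order.
function dedupe(values: string[]): string[] {
  return values.filter((value, index) => values.indexOf(value) === index);
}

// Fold the output of *every* analyzer into one result, not just the first one.
function mergeMetadata(parts: PartialMetadata[]): MergedMetadata {
  const merged: MergedMetadata = { languages: [], frameworks: [], layers: {} };
  for (const part of parts) {
    merged.languages = dedupe([...merged.languages, ...(part.languages ?? [])]);
    merged.frameworks = dedupe([...merged.frameworks, ...(part.frameworks ?? [])]);
    const layers = part.layers ?? {};
    for (const layer of Object.keys(layers)) {
      merged.layers[layer] = (merged.layers[layer] ?? 0) + layers[layer];
    }
  }
  return merged;
}
```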

### Changed

- Updated roadmap: v1.3 is now "Extensible Architecture Foundation"

### Acknowledgements

Thanks to [@aolin480](https://github.com/aolin480) for accelerating the workspace detection roadmap and identifying the detectMetadata() limitation in their fork.

## 1.2.2 (2025-12-31)
@@ -55,7 +97,6 @@ Thanks to [@aolin480](https://github.com/aolin480) for accelerating the workspac

## 1.2.1 (2025-12-31)


### Fixed

- **MCP Protocol Compatibility**: Fixed stderr output during MCP STDIO handshake for strict clients
99 changes: 76 additions & 23 deletions README.md
@@ -28,54 +28,107 @@ If your environment prompts on first run, use `npx --yes ...` (or `npx -y ...`)
- **Testing conventions** → `Jest`: 74%, `Playwright`: 6%
- **Framework patterns** → Angular signals, standalone components, etc.
- **Circular dependency detection** → Find toxic import cycles between files

- **Memory system** → Record the "why" behind choices so the AI doesn't repeat mistakes

## How It Works

When generating code, the agent checks your patterns first:

| Without MCP | With MCP |
| ---------------------------------------- | ------------------------------------ |
| Uses `constructor(private svc: Service)` | Uses `inject()` (97% team adoption) |
| Suggests `primeng/button` directly | Uses `@mycompany/ui-toolkit` wrapper |
| Generic Jest setup | Your team's actual test utilities |

### Tip: Auto-invoke in your rules

Add this to your `.cursorrules`, `CLAUDE.md`, or `AGENTS.md`:

```
When generating or reviewing code, use codebase-context tools to check team patterns first.
## Codebase Context

**At start of each task:** Call `get_memory` to load team conventions.

**CRITICAL:** When user says "remember this" or "record this":
- STOP immediately and call `remember` tool FIRST
- DO NOT proceed with other actions until memory is recorded
- This is a blocking requirement, not optional
```

Now the agent checks patterns automatically instead of waiting for you to ask.

## Tools

| Tool | Purpose |
| ------------------------------ | --------------------------------------------- |
| `search_codebase` | Semantic + keyword hybrid search |
| `get_component_usage` | Find where a library/component is used |
| `get_team_patterns` | Pattern frequencies + canonical examples |
| `get_codebase_metadata` | Project structure overview |
| `get_indexing_status` | Indexing progress + last stats |
| `get_style_guide` | Query style guide rules |
| `detect_circular_dependencies` | Find import cycles between files |
| `remember` | Record memory (conventions/decisions/gotchas) |
| `get_memory` | Query recorded memory by category/keyword |
| `refresh_index` | Re-index the codebase |

## File Structure

The MCP creates the following structure in your project:

```
.codebase-context/
├── memory.json # Team knowledge (commit this)
├── intelligence.json # Pattern analysis (generated)
├── index.json # Keyword index (generated)
└── index/ # Vector database (generated)
```

**Recommended `.gitignore`:** The vector database and generated files can be large. Add this to your `.gitignore` to keep them local while sharing team memory:

```gitignore
# Codebase Context MCP - ignore generated files, keep memory
.codebase-context/*
!.codebase-context/memory.json
```

### Memory System

Patterns tell you _what_ the team does ("97% use inject"), but not _why_ ("standalone compatibility"). Use `remember` to capture rationale that prevents repeated mistakes:

```typescript
// AI won't change this again after recording the decision
remember({
  type: 'decision',
  category: 'dependencies',
  memory: 'Use node-linker: hoisted, not isolated',
  reason:
    "Some packages don't declare transitive deps. Isolated forces manual package.json additions."
});
```

Memories surface automatically in `search_codebase` results and `get_team_patterns` responses.
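
How a keyword match might attach memories to a search is sketched below. This is a guess at the general technique only: `findRelatedMemories`, `tokenize`, the overlap-count scoring, and the default limit of 3 are all hypothetical and not taken from the server's code.

```typescript
// Minimal memory shape for this sketch; the stored entries carry more fields.
interface StoredMemory {
  memory: string;
  reason: string;
}

// Lowercase, split on non-alphanumerics, and drop very short tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 2);
}

// Rank memories by how many query terms appear in their text; drop non-matches.
function findRelatedMemories<T extends StoredMemory>(
  query: string,
  memories: T[],
  limit = 3 // assumed cap, for illustration
): T[] {
  const terms = tokenize(query);
  return memories
    .map((entry) => {
      const words = new Set(tokenize(`${entry.memory} ${entry.reason}`));
      const score = terms.filter((term) => words.has(term)).length;
      return { entry, score };
    })
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((scored) => scored.entry);
}
```

With the `node-linker` decision recorded above, a search such as `pnpm node-linker hoisted` would match on `node`, `linker`, and `hoisted`, so that memory would be returned alongside the code results.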

**Early baseline — known quirks:**

- Agents may bundle multiple things into one entry
- Duplicates can happen if you record the same thing twice
- Edit `.codebase-context/memory.json` directly to clean up
- Be explicit: "Remember this: use X not Y"

## Configuration

| Variable | Default | Description |
| ------------------------ | -------------- | ------------------------------------------------------------------------------ |
| `EMBEDDING_PROVIDER` | `transformers` | `openai` (fast, cloud) or `transformers` (local, private) |
| `OPENAI_API_KEY` | - | Required if provider is `openai` |
| `CODEBASE_ROOT` | - | Project root to index (CLI arg takes precedence) |
| `CODEBASE_CONTEXT_DEBUG` | - | Set to `1` to enable verbose logging (startup messages, analyzer registration) |

## Performance Note

This tool runs **locally** on your machine using your hardware.

- **Initial Indexing**: The first run is the expensive one. Computing embeddings for your entire codebase may take several minutes (e.g., ~2-5 minutes for 30k files).
- **Caching**: Subsequent queries return in milliseconds.
- **Updates**: Currently, `refresh_index` re-scans the codebase. True incremental indexing (processing only changed files) is on the roadmap.
4 changes: 2 additions & 2 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "codebase-context",
"version": "1.3.3",
"version": "1.4.0",
"description": "MCP server that helps AI agents understand your codebase - patterns, libraries, architecture, monorepo support",
"type": "module",
"main": "./dist/lib.js",
@@ -92,7 +92,7 @@
"@xenova/transformers": "^2.17.0",
"fuse.js": "^7.0.0",
"glob": "^10.3.10",
"hono": "4.11.4",
"hono": "4.11.7",
"ignore": "^5.3.1",
"typescript": "^5.3.3",
"uuid": "^9.0.1",
20 changes: 10 additions & 10 deletions pnpm-lock.yaml

Some generated files are not rendered by default.