calinfaja/KnowlinMCP

KnowlinMCP

Per-project knowledge database with hybrid semantic search, exposed as an MCP server. Captures insights, indexes docs and session transcripts, retrieves them with dense + sparse + reranking search.

                          KnowlinMCP
  +-----------+     +-------------------+     +---------------------+
  | Claude    |     |   MCP Server      |     | .knowledge-db/      |
  | Gemini    |<--->|   (stdio)         |<--->|   entries.jsonl     |
  | Codex     |     |  knowlin_search   |     |   embeddings.npy    |
  | Cursor    |     |  knowlin_get      |     |   sessions/         |
  | VS Code   |     |  knowlin_stats    |     |   docs/             |
  +-----------+     |  knowlin_ingest   |     +---------------------+
                    +-------------------+
                            |
  +-------------------------+-------------------------+
  |                         |                         |
  v                         v                         v
  Dense Search          Sparse Search           Cross-encoder
  (BGE-small 384d)      (SPLADE++ sparse)        Reranker
  |                         |                         |
  +------------+------------+                         |
               |                                      |
               v                                      |
         RRF Fusion -------> Intent Weighting ------->+
         (per source)        DEBUG -> sessions
                             HOWTO -> docs
                             RECALL -> sessions

Install

git clone <repo> && cd knowlin-mcp
./install.sh              # creates .venv, installs deps + MCP support
./install.sh --with-pdf   # also install PDF ingestion

Or manually:

pip install -e ".[mcp]"

Quick Start (30 seconds)

# 1. Initialize in your project
cd /your/project
knowlin init                    # creates .knowledge-db/ + .mcp.json

# 2. Index your docs
knowlin ingest all              # indexes docs/ and Claude sessions

# 3. Search
knowlin search "authentication"

That's it. Claude Code (and other MCP clients) can now use knowlin_search automatically via the .mcp.json created by init.

How It Works

Three sources, one search:

Source     What                         Auto-discovered from
kb         Manually captured insights   knowlin capture "..."
docs       Markdown, PDF, text files    docs/, doc/, or sources.yaml paths
sessions   Claude Code transcripts      ~/.claude/projects/

Search pipeline: Each query is first classified by intent (debug, howto, or recall). Each source is then searched with dense embeddings and sparse keywords, and the two rankings are merged with reciprocal rank fusion (RRF). The per-source results are fused across sources with intent-aware weights and reranked with a cross-encoder. Queries take ~30ms when served through the TCP server.
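The fusion steps can be sketched in a few lines of Python. This is an illustration only, not KnowlinMCP's actual code: the function names, the constant k=60 (a value commonly used for RRF), and the numeric intent weights are all assumptions. The diagram above only says which source each intent favors. The cross-encoder rerank step is left out.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, weights=None):
    """Reciprocal rank fusion: merge several ranked lists of ids into one.

    Each id scores weight / (k + rank) in every list it appears in;
    ties are broken by id so the output is deterministic.
    """
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical weights (assumption): only the favored source per intent
# comes from the README diagram, the numbers do not.
INTENT_WEIGHTS = {
    "debug": {"kb": 1.0, "docs": 1.0, "sessions": 2.0},
    "howto": {"kb": 1.0, "docs": 2.0, "sessions": 1.0},
    "recall": {"kb": 1.0, "docs": 1.0, "sessions": 2.0},
}


def fuse_sources(per_source, intent):
    """per_source maps a source name to (dense_ranking, sparse_ranking)."""
    names = sorted(per_source)
    # Step 1: fuse dense + sparse rankings inside each source.
    fused = [rrf_fuse(list(per_source[name])) for name in names]
    # Step 2: fuse across sources, weighting by the query intent.
    weights = [INTENT_WEIGHTS[intent][name] for name in names]
    return rrf_fuse(fused, weights=weights)


hits = {
    "docs": (["d1", "d2"], ["d2", "d1"]),
    "sessions": (["s1"], ["s1"]),
}
print(fuse_sources(hits, "debug"))  # ['s1', 'd1', 'd2']
print(fuse_sources(hits, "howto"))  # ['d1', 'd2', 's1']
```

The same candidates come back in a different order depending on the intent, which is the point of the weighting step.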

Incremental ingestion: SHA-256 file hashes track what has been processed, so only new or changed files are re-indexed. Run knowlin ingest all at any time; unchanged files are skipped, so repeat runs are fast.
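A minimal sketch of hash-based change detection, assuming a registry that is a flat JSON object mapping file paths to hex digests. The real format of doc-registry.json and session-registry.json is not documented here, and the helper names below are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, registry_path):
    """Return paths that are new or modified, then update the registry.

    Assumption: the registry is a JSON object of {path: sha256 hex digest}.
    """
    registry_path = Path(registry_path)
    registry = {}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text(encoding="utf-8"))
    changed = []
    for path in map(str, paths):
        digest = file_sha256(path)
        if registry.get(path) != digest:
            changed.append(path)
            registry[path] = digest
    registry_path.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return changed
```

Calling changed_files twice in a row on the same unmodified files returns an empty list the second time, which is what makes repeat ingestion cheap.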

CLI

# Search (default: compact format, all sources)
knowlin search "query"
knowlin search "query" -s kb -s docs       # specific sources
knowlin search "query" -f detailed         # verbose output
knowlin search "query" -f json             # machine-readable
knowlin search "query" --type warning      # filter by type
knowlin search "query" --since 2026-01-01  # date filter

# Capture knowledge
knowlin capture "JWT must be validated server-side" --type warning --tags "auth,jwt"

# Ingest
knowlin ingest all              # docs + sessions (incremental)
knowlin ingest docs             # docs only
knowlin ingest sessions         # sessions only
knowlin ingest all --full       # force re-process everything

# Manage
knowlin init                    # set up project (.knowledge-db/ + .mcp.json)
knowlin stats                   # entry counts per source
knowlin doctor --fix            # health check and auto-repair
knowlin sources --init          # create sources.yaml template
knowlin server start            # TCP server for ~30ms queries (foreground)

Entry types: finding, solution, pattern, warning, decision, discovery

Source Configuration

Without config, KnowlinMCP auto-discovers docs/, doc/, INFOS/ directories and Claude sessions from ~/.claude/projects/.

For explicit control, edit .knowledge-db/sources.yaml (created by knowlin init):

docs:
  paths:
    - docs/                       # relative to project root
    - ~/Desktop/INFOS/            # absolute path (~ expanded)
  # include: ["*.md", "*.txt", "*.pdf", "*.rst"]
  # exclude: ["drafts/**", "*.tmp"]

sessions:
  auto_discover: true             # scan ~/.claude/projects/
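
To show how include and exclude patterns like the commented ones above could interact, here is a small sketch using shell-style globs from Python's standard library. The function name is hypothetical, and so are the precedence rule (exclude wins) and the matching semantics; KnowlinMCP's actual matching may differ.

```python
from fnmatch import fnmatchcase

DEFAULT_INCLUDE = ("*.md", "*.txt", "*.pdf", "*.rst")


def is_selected(rel_path, include=DEFAULT_INCLUDE, exclude=()):
    """Decide whether a path (relative to a docs root) should be ingested.

    Assumption: exclude patterns take precedence over include patterns.
    Note that fnmatch's "*" also matches "/" characters.
    """
    if any(fnmatchcase(rel_path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(rel_path, pattern) for pattern in include)


exclude = ("drafts/**", "*.tmp")
print(is_selected("guide.md", exclude=exclude))            # True
print(is_selected("drafts/old/plan.md", exclude=exclude))  # False
print(is_selected("image.png", exclude=exclude))           # False
```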

MCP Server

knowlin init writes .mcp.json for Claude Code. For other clients:

Gemini CLI, Codex, Cursor, VS Code

Gemini CLI (~/.gemini/settings.json):

{ "mcpServers": { "knowlin-mcp": { "command": "knowlin-mcp" } } }

Codex CLI (~/.codex/config.toml):

[mcp_servers.knowlin-mcp]
command = "knowlin-mcp"

Cursor (.cursor/mcp.json):

{ "mcpServers": { "knowlin-mcp": { "command": "knowlin-mcp" } } }

VS Code (.vscode/mcp.json):

{ "servers": { "knowlin-mcp": { "type": "stdio", "command": "knowlin-mcp" } } }

Four tools are exposed: knowlin_search, knowlin_get, knowlin_stats, knowlin_ingest.

Python API

from knowlin_mcp import KnowledgeDB, MultiSourceSearch

db = KnowledgeDB("/path/to/project")
results = db.search("query", limit=5)

ms = MultiSourceSearch("/path/to/project")
results = ms.search("how to configure auth", sources=["kb", "docs"])

Storage

.knowledge-db/
  sources.yaml              # source config (optional)
  entries.jsonl              # curated KB (source of truth)
  embeddings.npy             # dense vectors (384-dim)
  sparse_index.json          # SPLADE++ sparse vectors
  sessions/                  # ingested session transcripts
    entries.jsonl, embeddings.npy, session-registry.json
  docs/                      # ingested documentation chunks
    entries.jsonl, embeddings.npy, doc-registry.json
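
Because entries.jsonl is a JSON Lines file, it can be inspected with the standard library alone. The sketch below counts entries per type. The field name "type" is an assumption based on the --type flag and the entry types listed above; the actual record schema is not documented here.

```python
import json
from collections import Counter
from pathlib import Path


def count_entry_types(jsonl_path):
    """Count entries per type in a JSON Lines file (one JSON object per line).

    Assumption: each record stores its entry type under the key "type".
    """
    counts = Counter()
    with Path(jsonl_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            counts[entry.get("type", "unknown")] += 1
    return counts
```

For example, count_entry_types(".knowledge-db/entries.jsonl") would summarize the curated KB, and the same function works on the entries.jsonl files under sessions/ and docs/ if they share that field.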

Development

git clone <repo> && cd knowlin-mcp
./install.sh
.venv/bin/pytest tests/ -v           # 266 tests
.venv/bin/ruff check src/ tests/     # lint
.venv/bin/black src/ tests/          # format

License

MIT
