Build persistent, LLM-maintained knowledge bases, based on Karpathy's LLM Wiki Principles
⚠️ Beta Release: you may encounter bugs or breaking changes. Please report issues on GitHub.
llmwikify is a general-purpose LLM-Wiki management tool that helps you build and maintain a persistent knowledge base. Unlike RAG systems that rediscover knowledge from scratch on every query, llmwikify incrementally builds and maintains a structured, interlinked wiki that compounds over time.
The wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read.
Based on Karpathy's LLM Wiki Principles:
- Raw sources: your immutable source documents in `raw/`
- The wiki: LLM-maintained markdown pages with cross-references
- The schema: `wiki.md`, which tells the LLM how to maintain the wiki
- SQLite FTS5 search: Porter stemmer, BM25 ranking, 0.06 s for 157 pages
- Bidirectional references: automatic `[[wikilink]]` detection with section-level granularity
- Query compounding: save query answers as persistent wiki pages (`wiki_synthesize`)
- Query sink: buffer pending updates for later review, with urgency tracking
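The search layer can be pictured with plain SQLite. Below is a minimal sketch of a Porter-stemmed, BM25-ranked FTS5 index; the table and column names are illustrative, not llmwikify's actual schema:

```python
import sqlite3

# Minimal sketch of an FTS5 index with Porter stemming and BM25
# ranking, using plain SQLite. Table/column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE pages USING fts5(title, body, tokenize='porter')"
)
conn.execute(
    "INSERT INTO pages VALUES ('Attention', 'Attention mechanisms weight tokens.')"
)
conn.execute(
    "INSERT INTO pages VALUES ('Embeddings', 'Vectors representing tokens.')"
)

# Porter stemming lets 'weighting' match 'weight'; bm25() returns a
# rank where lower is better, so we order ascending.
rows = conn.execute(
    "SELECT title, bm25(pages) FROM pages WHERE pages MATCH 'weighting' "
    "ORDER BY bm25(pages)"
).fetchall()
```

Only the stemmed match is returned, already ranked; this is what makes sub-second search over hundreds of pages cheap.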
- `analyze-source` CLI with `--all` and `--force` support
- Caches LLM extraction results (entities, relations, suggested pages)
- Powers schema-aware lint gap detection
- Detects reinforced claims, contradictions, and knowledge gaps across sources
- Returns suggestions only; the human decides what to do with them
- CLI: `llmwikify suggest-synthesis [source]`
- Detects broken links, orphan pages, contradictions, and data gaps
- New: outdated pages, knowledge gaps, redundancy alerts
- CLI: `llmwikify lint [--format=full|brief|recommendations|json]`
- CLI: `llmwikify knowledge-gaps`
- LLM auto-extracts concept relationships (8 relation types, 3 confidence levels)
- Graph queries: neighbors, shortest path, statistics, context
- Community detection via Leiden/Louvain algorithms
- Surprise Score reports for unexpected connections
- PageRank centrality scoring to identify core concepts
- Hub/authority analysis to find highly connected pages
- Community auto-labeling and bridge node detection
- Suggested page generation for orphan concepts
- CLI: `llmwikify graph-analyze [--json] [--report]`
- Exports: interactive HTML (pyvis), SVG (Graphviz), GraphML (Gephi)
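The centrality scoring above can be pictured as a plain power iteration. This is a toy sketch of the PageRank idea behind `graph-analyze`, not llmwikify's implementation; the concept graph is invented for illustration:

```python
# Toy PageRank power iteration over a tiny concept graph.
# Illustrates the centrality idea, not llmwikify's actual code.

def pagerank(edges, damping=0.85, iters=50):
    """edges: dict mapping node -> list of outgoing neighbor nodes."""
    nodes = set(edges) | {n for outs in edges.values() for n in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, outs in edges.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:  # dangling node: spread its mass evenly
                for n in nodes:
                    new[n] += damping * rank[src] / len(nodes)
        rank = new
    return rank

edges = {
    "Transformers": ["Attention", "Embeddings"],
    "Attention": ["Embeddings"],
    "Embeddings": ["Transformers"],
    "RAG": ["Embeddings"],
}
ranks = pagerank(edges)
core = max(ranks, key=ranks.get)  # the most-linked concept wins
```

Here "Embeddings" receives links from three pages and ends up the core concept, which is exactly the signal used to surface central wiki pages.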
The built-in Agent has moved to an independent project. Use external AI agents via the MCP protocol:
- `llmwikify mcp`: start the MCP server for agent integration
- All 20+ wiki tools are available via the standard MCP protocol

The legacy Agent is kept for backward compatibility only and will be removed in a future version.
- Autonomous Wiki Maintenance: 8 sub-systems (WikiAgent, AgentRunner, TaskScheduler, MemoryManager, NotificationManager, HooksSystem, ToolsRegistry, DreamEditor)
- Dream Confirmation Flow: the agent proposes changes, a human confirms (respects the "stay involved" principle)
- Scheduled Tasks: cron-based periodic lint, source analysis, and knowledge gap detection
- Hook System: pre/post operation callbacks for custom workflows
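The hook pattern can be sketched as a small callback registry. The class and event names below are hypothetical, not llmwikify's actual API:

```python
# Hypothetical sketch of a pre/post hook registry, illustrating the
# callback pattern described above. Names are illustrative only.
from collections import defaultdict


class Hooks:
    def __init__(self):
        self._hooks = defaultdict(list)

    def on(self, event, callback):
        """Register a callback for an event such as 'pre_ingest'."""
        self._hooks[event].append(callback)

    def fire(self, event, **context):
        """Invoke every callback registered for the event."""
        for cb in self._hooks[event]:
            cb(**context)


hooks = Hooks()
log = []
hooks.on("pre_ingest", lambda path: log.append(f"about to ingest {path}"))
hooks.on("post_ingest", lambda path: log.append(f"ingested {path}"))

hooks.fire("pre_ingest", path="raw/doc.pdf")
hooks.fire("post_ingest", path="raw/doc.pdf")
```

A custom workflow subscribes to the events it cares about and the core pipeline stays unaware of what the callbacks do.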
- React + TypeScript SPA: 18 components, tested with Vitest
- Markdown Editor: real-time preview, front matter panel
- Interactive Graph View: D3.js visualization with PageRank sizing, community coloring, bridge node highlighting
- Insights Dashboard: cross-source synthesis, knowledge gaps, graph analysis
- Agent Interface: ⚠️ DEPRECATED, legacy agent UI (removed; use external agents via MCP)
- Project Metadata: `llmwikify · project-name` display, version number indicator
- File extraction: PDF, Word, Excel, PowerPoint, images, audio, YouTube, and web URLs via MarkItDown
- File watcher: watch `raw/` for new files, with optional auto-ingest
- MCP server: 20 tools for LLM/agent integration
- Performance: batch inserts and PRAGMA optimizations, 10-20x faster than a naive implementation
```shell
# Basic (zero dependencies)
pip install llmwikify

# Full (all features)
pip install llmwikify[all]

# Development
git clone https://github.com/sn0wfree/llmwikify.git
cd llmwikify
pip install -e ".[dev]"
```

| Extra | Purpose |
|---|---|
| `extractors` | Enhanced file extraction (PDF, Office, images, audio) |
| `mcp` | MCP server support |
| `watch` | File system watching |
| `graph` | Graph visualization + community detection |
| `web` | Web UI support |
| `all` | Everything above |
```shell
llmwikify init
# Creates: raw/, wiki/, wiki.md, .llmwikify.db

llmwikify ingest document.pdf                 # Extract content
llmwikify ingest document.pdf --self-create   # Auto-create wiki pages
llmwikify ingest https://example.com/article
llmwikify ingest https://youtube.com/watch?v=abc123
llmwikify batch raw/pdfs/ --self-create       # Batch ingest

llmwikify search "topic" -l 10
llmwikify references "Page Name" --detail
llmwikify lint --format=brief

llmwikify graph-analyze           # PageRank, communities, suggestions
llmwikify graph-analyze --json    # Programmatic output
llmwikify graph-analyze --report  # Detailed suggested pages report
llmwikify suggest-synthesis       # Cross-source synthesis suggestions
llmwikify knowledge-gaps          # Knowledge gap analysis

llmwikify mcp                     # STDIO (default)
llmwikify mcp --transport http    # HTTP
llmwikify serve --web             # MCP + Web UI
```

```python
from llmwikify import Wiki
from pathlib import Path

wiki = Wiki(Path("/path/to/wiki"))
wiki.init()

# Ingest source
result = wiki.ingest_source("document.pdf")

# Create pages
wiki.write_page("Test Page", "# Title\n\nContent with [[Link]]", page_type="Concept")

# Search
results = wiki.search("topic", limit=10)

# Synthesize query answers (knowledge compounding)
wiki.synthesize_query(query="Q?", answer="A...", source_pages=["Page1", "Page2"])

# Knowledge graph
engine = wiki.get_relation_engine()
engine.get_neighbors("Concept")
engine.get_path("A", "B")

# Health check
lint_result = wiki.lint(generate_investigations=True)

# Cross-source synthesis
wiki.suggest_synthesis()

# Graph analysis
graph_result = wiki.graph_analyze()
```

| Tool | Description |
|---|---|
| `wiki_init` | Initialize wiki structure |
| `wiki_ingest` | Ingest a source file |
| `wiki_write_page` | Write/update a wiki page |
| `wiki_read_page` | Read a wiki page |
| `wiki_search` | Full-text search with snippets |
| `wiki_lint` | Health check |
| `wiki_status` | Status overview |
| `wiki_log` | Append log entry |
| `wiki_recommend` | Get recommendations |
| `wiki_build_index` | Build reference index |
| `wiki_read_schema` | Read wiki.md (schema) |
| `wiki_update_schema` | Update wiki.md |
| `wiki_synthesize` | Save query answer as wiki page |
| `wiki_sink_status` | Sink buffer overview |
| `wiki_references` | Page references |
| `wiki_graph` | Graph query/modify |
| `wiki_graph_analyze` | Graph export/detect/report/analyze |
| `wiki_analyze_source` | Analyze raw source file |
| `wiki_suggest_synthesis` | Cross-source synthesis suggestions |
| `wiki_knowledge_gaps` | Knowledge gap + outdated + redundancy |
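Any MCP client invokes these tools with a standard JSON-RPC 2.0 `tools/call` request. A sketch of the wire format for `wiki_search` follows; the argument names inside `"arguments"` are illustrative, not llmwikify's documented parameters:

```python
import json

# JSON-RPC 2.0 request an MCP client would send to invoke the
# wiki_search tool. The "arguments" keys are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "wiki_search",
        "arguments": {"query": "attention mechanisms", "limit": 5},
    },
}
payload = json.dumps(request)  # sent over STDIO or HTTP to the server
```

The server replies with a matching-`id` JSON-RPC response containing the tool's result content.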
For larger wikis (1000+ pages), enable QMD for semantic search with LLM reranking:
- Hybrid: BM25 keyword + vector embeddings
- Query Expansion: LLM generates semantic variants
- LLM Reranking: Cross-encoder reorders results
- Auto Recommendation: Prompts to enable at scale
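Conceptually, hybrid retrieval blends a lexical score with a semantic one before reranking. A minimal sketch of the blending step follows; the 50/50 weights and normalization are illustrative assumptions, not QMD's actual formula:

```python
# Minimal sketch of hybrid retrieval scoring: blend a normalized
# BM25 keyword score with an embedding cosine similarity.
# The 0.5/0.5 weights are illustrative, not QMD's actual formula.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def hybrid_score(bm25, max_bm25, query_vec, doc_vec, w_lex=0.5):
    lexical = bm25 / max_bm25 if max_bm25 else 0.0  # normalize to [0, 1]
    semantic = cosine(query_vec, doc_vec)
    return w_lex * lexical + (1 - w_lex) * semantic


score = hybrid_score(12.0, 20.0, [1.0, 0.0], [0.6, 0.8])
```

An LLM reranker would then reorder only the top candidates produced by this blended score, which is why the extra cost stays bounded even on large wikis.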
```shell
# Check status and recommendations
llmwikify qmd status

# Start QMD MCP server (separate process)
qmd mcp --http --port 8181

# Use QMD backend
llmwikify search "your query" --backend qmd
llmwikify qmd search "your query"
```

See the QMD Setup Guide for installation instructions.
Start the unified FastAPI web server:

```shell
llmwikify serve --web                  # Starts MCP + Web UI + REST API on http://localhost:8765
llmwikify serve --web --auth-token=key # With optional API key authentication
```

Architecture (FastAPI):
- MCP Protocol: `/mcp` endpoint for AI agent integration
- REST API: `/api/wiki/*` endpoints with auto-generated docs at `/docs`
- Web UI: React SPA static file serving
Features:
- Markdown Editor: live preview, front matter support, wikilink autocomplete
- Graph View: D3.js interactive visualization, PageRank sizing, community colors
- Insights Panel: cross-source synthesis, knowledge gaps, graph analysis
- Agent Console: chat interface, scheduled tasks, dream proposals & confirmations
- Health Dashboard: broken links, orphans, stale pages, knowledge growth
- Full-text Search: FTS5-powered search with snippets
- Optional Auth: API key authentication for production deployments
Create `.wiki-config.yaml` in your wiki root:

```yaml
orphan_detection:
  exclude_patterns:
    - '^\d{4}-\d{2}-\d{2}$'   # Date pages
    - '^meeting-.*'           # Meeting notes
  archive_directories:
    - 'archive'
    - 'logs'

llm:
  provider: "openai"
  model: "gpt-4o"
  api_key: "env:OPENAI_API_KEY"

mcp:
  host: "127.0.0.1"
  port: 8765
  transport: "stdio"
```

See the Configuration Guide for full options.
| Command | Description | Command | Description |
|---|---|---|---|
| `init` | Initialize wiki | `lint` | Health check |
| `ingest` | Ingest source | `status` | Status overview |
| `analyze-source` | Analyze source file | `log` | Record log |
| `write_page` | Create page | `references` | Show references |
| `read_page` | Read page | `build-index` | Build index |
| `search` | Full-text search | `batch` | Batch ingest |
| `synthesize` | Save query as page | `suggest-synthesis` | Cross-source analysis |
| `sink-status` | Sink overview | `knowledge-gaps` | Gap analysis |
| `watch` | Watch for files | `graph-query` | Graph queries |
| `graph-analyze` | Graph analysis | `export-graph` | Export visualization |
| `community-detect` | Detect communities | `report` | Surprise report |
| `mcp` | Start MCP server | `serve` | MCP + Web UI |
- Architecture: technical architecture, data flows, components
- Configuration Guide: detailed config options
- LLM Wiki Principles: Karpathy's original vision
- Migration Guide: version migration notes
- Contributing: development workflow
- Known Issues: known issues and planned fixes
```shell
pytest                            # All 879+ Python tests
pytest --cov=src/llmwikify        # With coverage
pytest tests/test_p1_features.py  # Specific module

# Frontend tests
cd src/llmwikify/web/webui && npm test
```

Contributions welcome! See CONTRIBUTING.md for development setup, coding standards, and contribution workflow.
- llm-wiki-kit: original inspiration
- Andrej Karpathy: LLM Wiki Principles
- Obsidian: markdown wiki platform
- MCP: Model Context Protocol

MIT License. See the LICENSE file.
- GitHub: @sn0wfree
- Email: linlu1234567@sina.com
- Discussions: GitHub Discussions