Build persistent, LLM-maintained knowledge bases, based on Karpathy's LLM Wiki Principles
⚠️ Beta Release: you may encounter bugs or breaking changes. Please report issues on GitHub.
llmwikify is a general-purpose LLM-Wiki management tool that helps you build and maintain a persistent knowledge base. Unlike RAG systems that rediscover knowledge from scratch on every query, llmwikify incrementally builds and maintains a structured, interlinked wiki that compounds over time.
The wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read.
Based on Karpathy's LLM Wiki Principles:
- Raw sources: your immutable source documents in `raw/`
- The wiki: LLM-maintained markdown pages with cross-references
- The schema: `wiki.md`, which tells the LLM how to maintain the wiki
- SQLite FTS5 search: Porter stemmer, BM25 ranking, 0.06 s for 157 pages
- Bidirectional references: automatic `[[wikilink]]` detection with section-level granularity
- Query compounding: save query answers as persistent wiki pages (`wiki_synthesize`)
- Query sink: buffer pending updates for later review, with urgency tracking
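The search layer can be pictured with plain SQLite. Below is a minimal sketch of a Porter-stemmed, BM25-ranked FTS5 index; the table and column names are illustrative, not llmwikify's actual schema:

```python
import sqlite3

# Minimal sketch of an FTS5 index with Porter stemming and BM25
# ranking, using plain SQLite. Table/column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE pages USING fts5(title, body, tokenize='porter')"
)
conn.execute(
    "INSERT INTO pages VALUES ('Attention', 'Attention mechanisms weight tokens.')"
)
conn.execute(
    "INSERT INTO pages VALUES ('Embeddings', 'Vectors representing tokens.')"
)

# Porter stemming lets 'weighting' match 'weight'; bm25() returns a
# rank where lower is better, so we order ascending.
rows = conn.execute(
    "SELECT title, bm25(pages) FROM pages WHERE pages MATCH 'weighting' "
    "ORDER BY bm25(pages)"
).fetchall()
```

Only the stemmed match is returned, already ranked; this is what makes sub-second search over hundreds of pages cheap.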
- `analyze-source` CLI with `--all` and `--force` support
- Caches LLM extraction results (entities, relations, suggested pages)
- Powers schema-aware lint gap detection
- Detects reinforced claims, contradictions, and knowledge gaps across sources
- Returns suggestions only; the human decides what to do with them
- CLI: `llmwikify suggest-synthesis [source]`
- Detects broken links, orphan pages, contradictions, and data gaps
- New: outdated pages, knowledge gaps, redundancy alerts
- CLI: `llmwikify lint [--format=full|brief|recommendations|json]`
- CLI: `llmwikify knowledge-gaps`
- LLM auto-extracts concept relationships (8 relation types, 3 confidence levels)
- Graph queries: neighbors, shortest path, statistics, context
- Community detection via Leiden/Louvain algorithms
- Surprise Score reports for unexpected connections
- PageRank centrality scoring to identify core concepts
- Hub/authority analysis to find highly connected pages
- Community auto-labeling and bridge node detection
- Suggested page generation for orphan concepts
- CLI: `llmwikify graph-analyze [--json] [--report]`
- Exports: interactive HTML (pyvis), SVG (Graphviz), GraphML (Gephi)
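The centrality scoring above can be pictured as a plain power iteration. This is a toy sketch of the PageRank idea behind `graph-analyze`, not llmwikify's implementation; the concept graph is invented for illustration:

```python
# Toy PageRank power iteration over a tiny concept graph.
# Illustrates the centrality idea, not llmwikify's actual code.

def pagerank(edges, damping=0.85, iters=50):
    """edges: dict mapping node -> list of outgoing neighbor nodes."""
    nodes = set(edges) | {n for outs in edges.values() for n in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, outs in edges.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:  # dangling node: spread its mass evenly
                for n in nodes:
                    new[n] += damping * rank[src] / len(nodes)
        rank = new
    return rank

edges = {
    "Transformers": ["Attention", "Embeddings"],
    "Attention": ["Embeddings"],
    "Embeddings": ["Transformers"],
    "RAG": ["Embeddings"],
}
ranks = pagerank(edges)
core = max(ranks, key=ranks.get)  # the most-linked concept wins
```

Here "Embeddings" receives links from three pages and ends up the core concept, which is exactly the signal used to surface central wiki pages.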
The built-in Agent has moved to an independent project. Use external AI agents via the MCP protocol:
- `llmwikify mcp`: start the MCP server for agent integration
- All 20+ wiki tools are available via the standard MCP protocol

The legacy Agent is kept for backward compatibility only and will be removed in a future version.
- Autonomous Wiki Maintenance: 8 sub-systems (WikiAgent, AgentRunner, TaskScheduler, MemoryManager, NotificationManager, HooksSystem, ToolsRegistry, DreamEditor)
- Dream Confirmation Flow: the agent proposes changes, a human confirms (respects the "stay involved" principle)
- Scheduled Tasks: cron-based periodic lint, source analysis, and knowledge gap detection
- Hook System: pre/post operation callbacks for custom workflows
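The hook pattern can be sketched as a small callback registry. The class and event names below are hypothetical, not llmwikify's actual API:

```python
# Hypothetical sketch of a pre/post hook registry, illustrating the
# callback pattern described above. Names are illustrative only.
from collections import defaultdict


class Hooks:
    def __init__(self):
        self._hooks = defaultdict(list)

    def on(self, event, callback):
        """Register a callback for an event such as 'pre_ingest'."""
        self._hooks[event].append(callback)

    def fire(self, event, **context):
        """Invoke every callback registered for the event."""
        for cb in self._hooks[event]:
            cb(**context)


hooks = Hooks()
log = []
hooks.on("pre_ingest", lambda path: log.append(f"about to ingest {path}"))
hooks.on("post_ingest", lambda path: log.append(f"ingested {path}"))

hooks.fire("pre_ingest", path="raw/doc.pdf")
hooks.fire("post_ingest", path="raw/doc.pdf")
```

A custom workflow subscribes to the events it cares about and the core pipeline stays unaware of what the callbacks do.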
- React + TypeScript SPA: 18 components, tested with Vitest
- Markdown Editor: real-time preview, front matter panel
- Interactive Graph View: D3.js visualization with PageRank sizing, community coloring, bridge node highlighting
- Insights Dashboard: cross-source synthesis, knowledge gaps, graph analysis
- Agent Interface: ⚠️ DEPRECATED, legacy agent UI (removed; use external agents via MCP)
- Project Metadata: `llmwikify · project-name` display, version number indicator
- File extraction: PDF, Word, Excel, PowerPoint, images, audio, YouTube, and web URLs via MarkItDown
- File watcher: watch `raw/` for new files, with optional auto-ingest
- MCP server: 20 tools for LLM/agent integration
- Performance: batch inserts and PRAGMA optimizations, 10-20x faster than a naive implementation
```shell
# Basic (zero dependencies)
pip install llmwikify

# Full (all features)
pip install llmwikify[all]

# Development
git clone https://github.com/sn0wfree/llmwikify.git
cd llmwikify
pip install -e ".[dev]"
```

| Extra | Purpose |
|---|---|
| `extractors` | Enhanced file extraction (PDF, Office, images, audio) |
| `mcp` | MCP server support |
| `watch` | File system watching |
| `graph` | Graph visualization + community detection |
| `web` | Web UI support |
| `all` | Everything above |
```shell
llmwikify init
# Creates: raw/, wiki/, wiki.md, .llmwikify.db

llmwikify ingest document.pdf                 # Extract content
llmwikify ingest document.pdf --self-create   # Auto-create wiki pages
llmwikify ingest https://example.com/article
llmwikify ingest https://youtube.com/watch?v=abc123
llmwikify batch raw/pdfs/ --self-create       # Batch ingest

llmwikify search "topic" -l 10
llmwikify references "Page Name" --detail
llmwikify lint --format=brief

llmwikify graph-analyze           # PageRank, communities, suggestions
llmwikify graph-analyze --json    # Programmatic output
llmwikify graph-analyze --report  # Detailed suggested pages report
llmwikify suggest-synthesis       # Cross-source synthesis suggestions
llmwikify knowledge-gaps          # Knowledge gap analysis

llmwikify mcp                     # STDIO (default)
llmwikify mcp --transport http    # HTTP
llmwikify serve --web             # MCP + Web UI
```

```python
from llmwikify import Wiki
from pathlib import Path

wiki = Wiki(Path("/path/to/wiki"))
wiki.init()

# Ingest source
result = wiki.ingest_source("document.pdf")

# Create pages
wiki.write_page("Test Page", "# Title\n\nContent with [[Link]]", page_type="Concept")

# Search
results = wiki.search("topic", limit=10)

# Synthesize query answers (knowledge compounding)
wiki.synthesize_query(query="Q?", answer="A...", source_pages=["Page1", "Page2"])

# Knowledge graph
engine = wiki.get_relation_engine()
engine.get_neighbors("Concept")
engine.get_path("A", "B")

# Health check
lint_result = wiki.lint(generate_investigations=True)

# Cross-source synthesis
wiki.suggest_synthesis()

# Graph analysis
graph_result = wiki.graph_analyze()
```

| Tool | Description |
|---|---|
| `wiki_init` | Initialize wiki structure |
| `wiki_ingest` | Ingest a source file |
| `wiki_write_page` | Write/update a wiki page |
| `wiki_read_page` | Read a wiki page |
| `wiki_search` | Full-text search with snippets |
| `wiki_lint` | Health check |
| `wiki_status` | Status overview |
| `wiki_log` | Append log entry |
| `wiki_recommend` | Get recommendations |
| `wiki_build_index` | Build reference index |
| `wiki_read_schema` | Read wiki.md (schema) |
| `wiki_update_schema` | Update wiki.md |
| `wiki_synthesize` | Save query answer as wiki page |
| `wiki_sink_status` | Sink buffer overview |
| `wiki_references` | Page references |
| `wiki_graph` | Graph query/modify |
| `wiki_graph_analyze` | Graph export/detect/report/analyze |
| `wiki_analyze_source` | Analyze raw source file |
| `wiki_suggest_synthesis` | Cross-source synthesis suggestions |
| `wiki_knowledge_gaps` | Knowledge gap + outdated + redundancy |
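Any MCP client invokes these tools with a standard JSON-RPC 2.0 `tools/call` request. A sketch of the wire format for `wiki_search` follows; the argument names inside `"arguments"` are illustrative, not llmwikify's documented parameters:

```python
import json

# JSON-RPC 2.0 request an MCP client would send to invoke the
# wiki_search tool. The "arguments" keys are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "wiki_search",
        "arguments": {"query": "attention mechanisms", "limit": 5},
    },
}
payload = json.dumps(request)  # sent over STDIO or HTTP to the server
```

The server replies with a matching-`id` JSON-RPC response containing the tool's result content.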
For larger wikis (1000+ pages), enable QMD for semantic search with LLM reranking:
- Hybrid: BM25 keyword + vector embeddings
- Query Expansion: LLM generates semantic variants
- LLM Reranking: Cross-encoder reorders results
- Auto Recommendation: Prompts to enable at scale
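Conceptually, hybrid retrieval blends a lexical score with a semantic one before reranking. A minimal sketch of the blending step follows; the 50/50 weights and normalization are illustrative assumptions, not QMD's actual formula:

```python
# Minimal sketch of hybrid retrieval scoring: blend a normalized
# BM25 keyword score with an embedding cosine similarity.
# The 0.5/0.5 weights are illustrative, not QMD's actual formula.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def hybrid_score(bm25, max_bm25, query_vec, doc_vec, w_lex=0.5):
    lexical = bm25 / max_bm25 if max_bm25 else 0.0  # normalize to [0, 1]
    semantic = cosine(query_vec, doc_vec)
    return w_lex * lexical + (1 - w_lex) * semantic


score = hybrid_score(12.0, 20.0, [1.0, 0.0], [0.6, 0.8])
```

An LLM reranker would then reorder only the top candidates produced by this blended score, which is why the extra cost stays bounded even on large wikis.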
```shell
# Check status and recommendations
llmwikify qmd status

# Start QMD MCP server (separate process)
qmd mcp --http --port 8181

# Use QMD backend
llmwikify search "your query" --backend qmd
llmwikify qmd search "your query"
```

See the QMD Setup Guide for installation instructions.
Start the unified FastAPI web server:

```shell
llmwikify serve --web                  # Starts MCP + Web UI + REST API on http://localhost:8765
llmwikify serve --web --auth-token=key # With optional API key authentication
```

Architecture (FastAPI):
- MCP Protocol: `/mcp` endpoint for AI agent integration
- REST API: `/api/wiki/*` endpoints with auto-generated docs at `/docs`
- Web UI: React SPA static file serving
Features:
- Markdown Editor: live preview, front matter support, wikilink autocomplete
- Graph View: D3.js interactive visualization, PageRank sizing, community colors
- Insights Panel: cross-source synthesis, knowledge gaps, graph analysis
- Agent Console: chat interface, scheduled tasks, dream proposals & confirmations
- Health Dashboard: broken links, orphans, stale pages, knowledge growth
- Full-text Search: FTS5-powered search with snippets
- Optional Auth: API key authentication for production deployments
Create `.wiki-config.yaml` in your wiki root:

```yaml
orphan_detection:
  exclude_patterns:
    - '^\d{4}-\d{2}-\d{2}$'   # Date pages
    - '^meeting-.*'           # Meeting notes
  archive_directories:
    - 'archive'
    - 'logs'

llm:
  provider: "openai"
  model: "gpt-4o"
  api_key: "env:OPENAI_API_KEY"

mcp:
  host: "127.0.0.1"
  port: 8765
  transport: "stdio"
```

See the Configuration Guide for full options.
| Command | Description | Command | Description |
|---|---|---|---|
| `init` | Initialize wiki | `lint` | Health check |
| `ingest` | Ingest source | `status` | Status overview |
| `analyze-source` | Analyze source file | `log` | Record log |
| `write_page` | Create page | `references` | Show references |
| `read_page` | Read page | `build-index` | Build index |
| `search` | Full-text search | `batch` | Batch ingest |
| `synthesize` | Save query as page | `suggest-synthesis` | Cross-source analysis |
| `sink-status` | Sink overview | `knowledge-gaps` | Gap analysis |
| `watch` | Watch for files | `graph-query` | Graph queries |
| `graph-analyze` | Graph analysis | `export-graph` | Export visualization |
| `community-detect` | Detect communities | `report` | Surprise report |
| `mcp` | Start MCP server | `serve` | MCP + Web UI |
- Architecture: technical architecture, data flows, components
- Configuration Guide: detailed config options
- LLM Wiki Principles: Karpathy's original vision
- Migration Guide: version migration notes
- Contributing: development workflow
- Known Issues: known issues and planned fixes
```shell
pytest                            # All 879+ Python tests
pytest --cov=src/llmwikify        # With coverage
pytest tests/test_p1_features.py  # Specific module

# Frontend tests
cd src/llmwikify/web/webui && npm test
```

Contributions welcome! See CONTRIBUTING.md for development setup, coding standards, and contribution workflow.
- llm-wiki-kit: original inspiration
- Andrej Karpathy: LLM Wiki Principles
- Obsidian: markdown wiki platform
- MCP: Model Context Protocol

MIT License. See the LICENSE file.
- GitHub: @sn0wfree
- Email: linlu1234567@sina.com
- Discussions: GitHub Discussions