llmwikify

Build persistent, LLM-maintained knowledge bases, based on Karpathy's LLM Wiki Principles

PyPI version | Python 3.10+ | License: MIT | Tests: 1008+ passing


⚠️ Beta Release: You may encounter bugs or breaking changes. Please report issues on GitHub.


🎯 What is llmwikify?

llmwikify is a general-purpose LLM-Wiki management tool that helps you build and maintain a persistent knowledge base. Unlike RAG systems that rediscover knowledge from scratch on every query, llmwikify incrementally builds and maintains a structured, interlinked wiki that compounds over time.

Core Philosophy

The wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read.

Based on Karpathy's LLM Wiki Principles:

  • 📚 Raw sources: Your immutable source documents in raw/
  • 📝 The wiki: LLM-maintained markdown pages with cross-references
  • ⚙️ The schema: wiki.md, which tells the LLM how to maintain the wiki

✨ Features

Core

  • SQLite FTS5 search: Porter stemmer, BM25 ranking, 0.06s for 157 pages
  • Bidirectional references: Automatic [[wikilink]] detection with section-level granularity
  • Query compounding: Save query answers as persistent wiki pages (wiki_synthesize)
  • Query sink: Buffer pending updates for later review with urgency tracking
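
To make the search feature above concrete, here is a minimal sketch of FTS5 search with Porter stemming and BM25 ranking, using only Python's standard sqlite3 module. The table name, columns, and sample rows are illustrative assumptions, not llmwikify's actual schema, and the sketch assumes your SQLite build includes FTS5.

```python
import sqlite3

# In-memory database for the demo; a real wiki would use a file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: FTS5 with the Porter stemmer wrapped around unicode61.
conn.execute(
    "CREATE VIRTUAL TABLE pages USING fts5(title, body, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO pages (title, body) VALUES (?, ?)",
    [
        ("Transformers", "Attention layers let models weigh tokens when training."),
        ("Optimizers", "Adam adapts the step size for each parameter."),
        ("Gardening", "Plants need regular watering and sunlight."),
    ],
)


def search(query: str, limit: int = 10) -> list[str]:
    """Return page titles ordered by BM25 relevance (lower score ranks first)."""
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY bm25(pages) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [title for (title,) in rows]


# Stemming lets the query "trained" match the indexed word "training".
hits = search("trained")
```

Because both "trained" and "training" reduce to the same stem, the query finds the first page even though the exact word never appears in it.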

Source Analysis (v0.26.0+)

  • analyze-source CLI with --all and --force support
  • Caches LLM extraction results (entities, relations, suggested pages)
  • Powers schema-aware lint gap detection

Cross-Source Synthesis (v0.28.0+)

  • Detects reinforced claims, contradictions, knowledge gaps across sources
  • Returns suggestions only: a human decides what to do with them
  • CLI: llmwikify suggest-synthesis [source]

Smart Lint 2.0 (v0.28.0+)

  • Detects broken links, orphan pages, contradictions, data gaps
  • New: outdated pages, knowledge gaps, redundancy alerts
  • CLI: llmwikify lint [--format=full|brief|recommendations|json]
  • CLI: llmwikify knowledge-gaps
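
As a rough illustration of two of the checks listed above (broken links and orphan pages), the sketch below scans [[wikilinks]] in a set of in-memory pages. The regular expression, the function name, and the sample pages are assumptions made for this example; the real linter may use different rules, for instance exempting index pages from orphan detection.

```python
import re

# Matches [[Target]], [[Target#Section]], and [[Target|label]]; captures Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_link_issues(pages: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Return (broken links per page, orphan pages) for a {name: markdown} mapping."""
    broken: dict[str, list[str]] = {}
    linked: set[str] = set()
    for name, text in pages.items():
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target in pages:
                if target != name:  # self-links do not rescue a page from orphan status
                    linked.add(target)
            else:
                broken.setdefault(name, []).append(target)
    orphans = sorted(name for name in pages if name not in linked)
    return broken, orphans


# Hypothetical sample wiki.
pages = {
    "Index": "See [[Attention]] and [[Optimizers#Adam]].",
    "Attention": "Related: [[Transformers]].",
    "Optimizers": "Adam and SGD.",
    "Scratch": "Unlinked notes.",
}
broken, orphans = find_link_issues(pages)
```

Here "Attention" points at a missing "Transformers" page, and "Index" and "Scratch" have no inbound links.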

Knowledge Graph (v0.22.0+)

  • LLM auto-extracts concept relationships (8 relation types, 3 confidence levels)
  • Graph queries: neighbors, shortest path, statistics, context
  • Community detection via Leiden/Louvain algorithms
  • Surprise Score reports for unexpected connections
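
The neighbor and shortest-path queries above can be pictured with a small breadth-first search over relation triples. This is a sketch under stated assumptions: the triples, relation names, and helper functions are hypothetical and do not reflect how the relation engine stores or queries its graph.

```python
from collections import deque

# Hypothetical relation triples: (source, relation type, target).
relations = [
    ("Attention", "part_of", "Transformer"),
    ("Transformer", "used_by", "GPT"),
    ("GPT", "trained_with", "Adam"),
    ("Attention", "related_to", "Memory"),
]


def build_adjacency(triples):
    """Build an undirected adjacency map so paths can follow edges either way."""
    adj: dict[str, set[str]] = {}
    for src, _rel, dst in triples:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns the list of nodes, or None if unreachable."""
    if start not in adj or goal not in adj:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent pointers back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


adj = build_adjacency(relations)
neighbors = sorted(adj["Attention"])
path = shortest_path(adj, "Memory", "Adam")
```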

Graph Analyzer (v0.28.0+)

  • PageRank centrality scoring: identify core concepts
  • Hub/Authority analysis: find highly connected pages
  • Community auto-labeling and bridge node detection
  • Suggested page generation for orphan concepts
  • CLI: llmwikify graph-analyze [--json] [--report]
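
For readers unfamiliar with PageRank, the following is a minimal power-iteration sketch over a wikilink graph. The damping factor, iteration count, and sample link map are conventional or illustrative choices assumed here; they are not taken from the Graph Analyzer's implementation.

```python
def pagerank(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Compute PageRank scores by power iteration; scores sum to 1."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link map: page name -> pages it links to.
links = {
    "Index": ["Attention", "Optimizers"],
    "Attention": ["Transformer"],
    "Optimizers": ["Transformer"],
    "Transformer": [],
}
scores = pagerank(links)
core = max(scores, key=scores.get)
```

In this toy graph the page that everything eventually points to ends up with the highest score, which is the sense in which PageRank surfaces core concepts.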

Graph Visualization (v0.23.0+)

  • Interactive HTML (pyvis), SVG (graphviz), GraphML (Gephi)

Agent Layer (v0.30.0+) ⚠️ DEPRECATED

The built-in Agent has moved to an independent project. Use external AI agents over MCP instead:

  • llmwikify mcp: Start the MCP server for agent integration
  • All 20+ wiki tools are available via the standard MCP protocol

The legacy Agent is kept for backward compatibility only and will be removed in a future version.

  • Autonomous Wiki Maintenance: 8 sub-systems (WikiAgent, AgentRunner, TaskScheduler, MemoryManager, NotificationManager, HooksSystem, ToolsRegistry, DreamEditor)
  • Dream Confirmation Flow: Agent proposes changes, human confirms (respects the "stay involved" principle)
  • Scheduled Tasks: Cron-based periodic lint, source analysis, knowledge gap detection
  • Hook System: Pre/post operation callbacks for custom workflows

Web UI (v0.30.0+)

  • React + TypeScript SPA: 18 components, tested with Vitest
  • Markdown Editor: Real-time preview, front matter panel
  • Interactive Graph View: D3.js visualization with PageRank sizing, community coloring, bridge node highlighting
  • Insights Dashboard: Cross-source synthesis, knowledge gaps, graph analysis
  • Agent Interface ⚠️ DEPRECATED: Legacy agent UI (removed; use external agents via MCP)
  • Project Metadata: llmwikify · project-name display, version number indicator

Additional

  • File extraction: PDF, Word, Excel, PowerPoint, images, audio, YouTube, web URLs via MarkItDown
  • File watcher: Watch raw/ for new files, optional auto-ingest
  • MCP server: 20 tools for LLM/Agent integration
  • Performance: Batch inserts, PRAGMA optimizations, 10-20x faster than a naive implementation
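
The batch-insert and PRAGMA techniques mentioned above can be sketched with the standard sqlite3 module as follows. The specific PRAGMA settings, table name, and columns are assumptions chosen for illustration; the source does not state which settings llmwikify actually applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Commonly used speed-oriented settings (assumed, not confirmed for llmwikify).
conn.execute("PRAGMA synchronous = NORMAL")
conn.execute("PRAGMA temp_store = MEMORY")

# Hypothetical table holding section-level link records.
conn.execute("CREATE TABLE page_links (source TEXT, target TEXT, section TEXT)")

rows = [(f"Page {i}", f"Page {i + 1}", "Overview") for i in range(1000)]

# One transaction plus executemany avoids paying for a commit on every row.
with conn:
    conn.executemany("INSERT INTO page_links VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM page_links").fetchone()[0]
```

The main saving comes from grouping all rows into a single transaction; on a file-backed database each separate commit would otherwise force its own disk sync.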

📦 Installation

# Basic (zero dependencies)
pip install llmwikify

# Full (all features)
pip install "llmwikify[all]"  # Quotes keep shells such as zsh from expanding the brackets

# Development
git clone https://github.com/sn0wfree/llmwikify.git
cd llmwikify
pip install -e ".[dev]"

Optional Extras

| Extra | Purpose |
|-------|---------|
| extractors | Enhanced file extraction (PDF, Office, images, audio) |
| mcp | MCP server support |
| watch | File system watching |
| graph | Graph visualization + community detection |
| web | Web UI support |
| all | Everything above |

🚀 Quick Start

1. Initialize

llmwikify init
# Creates: raw/, wiki/, wiki.md, .llmwikify.db

2. Ingest Sources

llmwikify ingest document.pdf           # Extract content
llmwikify ingest document.pdf --self-create  # Auto-create wiki pages
llmwikify ingest https://example.com/article
llmwikify ingest https://youtube.com/watch?v=abc123
llmwikify batch raw/pdfs/ --self-create  # Batch ingest

3. Search and Query

llmwikify search "topic" -l 10
llmwikify references "Page Name" --detail
llmwikify lint --format=brief

4. Analyze Knowledge Graph

llmwikify graph-analyze              # PageRank, communities, suggestions
llmwikify graph-analyze --json       # Programmatic output
llmwikify graph-analyze --report     # Detailed suggested pages report
llmwikify suggest-synthesis          # Cross-source synthesis suggestions
llmwikify knowledge-gaps             # Knowledge gap analysis

5. MCP Server for Agents

llmwikify mcp                        # STDIO (default)
llmwikify mcp --transport http       # HTTP
llmwikify serve --web                # MCP + Web UI

💻 Python API

from llmwikify import Wiki
from pathlib import Path

wiki = Wiki(Path("/path/to/wiki"))
wiki.init()

# Ingest source
result = wiki.ingest_source("document.pdf")

# Create pages
wiki.write_page("Test Page", "# Title\n\nContent with [[Link]]", page_type="Concept")

# Search
results = wiki.search("topic", limit=10)

# Synthesize query answers (knowledge compounding)
wiki.synthesize_query(query="Q?", answer="A...", source_pages=["Page1", "Page2"])

# Knowledge graph
engine = wiki.get_relation_engine()
engine.get_neighbors("Concept")
engine.get_path("A", "B")

# Health check
lint_result = wiki.lint(generate_investigations=True)

# Cross-source synthesis
wiki.suggest_synthesis()

# Graph analysis
graph_result = wiki.graph_analyze()

πŸ—„οΈ MCP Server (20 Tools)

| Tool | Description |
|------|-------------|
| wiki_init | Initialize wiki structure |
| wiki_ingest | Ingest a source file |
| wiki_write_page | Write/update a wiki page |
| wiki_read_page | Read a wiki page |
| wiki_search | Full-text search with snippets |
| wiki_lint | Health check |
| wiki_status | Status overview |
| wiki_log | Append log entry |
| wiki_recommend | Get recommendations |
| wiki_build_index | Build reference index |
| wiki_read_schema | Read wiki.md (schema) |
| wiki_update_schema | Update wiki.md |
| wiki_synthesize | Save query answer as wiki page |
| wiki_sink_status | Sink buffer overview |
| wiki_references | Page references |
| wiki_graph | Graph query/modify |
| wiki_graph_analyze | Graph export/detect/report/analyze |
| wiki_analyze_source | Analyze raw source file |
| wiki_suggest_synthesis | Cross-source synthesis suggestions |
| wiki_knowledge_gaps | Knowledge gap + outdated + redundancy |
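
For orientation, an MCP client invokes one of these tools by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds such a message; the argument names `query` and `limit` are assumptions, since the actual input schema of wiki_search is not shown here, and a real client would first complete the MCP initialization handshake (usually through an MCP client library).

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" and "limit" are assumed argument names, not the confirmed schema.
request = build_tool_call(1, "wiki_search", {"query": "attention", "limit": 5})
```

A client can discover the real argument names at runtime by issuing a `tools/list` request, which returns each tool's input schema.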

πŸ” QMD Hybrid Search (Optional)

For larger wikis (1000+ pages), enable QMD for semantic search with LLM reranking:

  • Hybrid: BM25 keyword + vector embeddings
  • Query Expansion: LLM generates semantic variants
  • LLM Reranking: Cross-encoder reorders results
  • Auto Recommendation: Prompts to enable at scale

# Check status and recommendations
llmwikify qmd status

# Start QMD MCP server (separate process)
qmd mcp --http --port 8181

# Use QMD backend
llmwikify search "your query" --backend qmd
llmwikify qmd search "your query"

See QMD Setup Guide for installation instructions.
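
To give a feel for how a keyword ranking and a vector ranking can be merged into one hybrid result list, here is a sketch of reciprocal rank fusion (RRF). RRF is one common fusion technique and is named here as an assumption; this README does not state which method QMD uses, and the page names and constant `k = 60` are illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores the sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for position, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + position)
    # Highest fused score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical results from the two retrieval paths.
bm25_hits = ["Attention", "Transformer", "Optimizers"]
vector_hits = ["Transformer", "Memory", "Attention"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Pages that appear near the top of both lists rise above pages that only one retriever found, which is the intuition behind combining BM25 with embeddings.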


🖥️ Web UI

Start the unified FastAPI web server:

llmwikify serve --web                  # Starts MCP + Web UI + REST API on http://localhost:8765
llmwikify serve --web --auth-token=key # With optional API key authentication

Architecture (FastAPI):

  • 🔄 MCP Protocol: /mcp endpoint for AI agent integration
  • 🌐 REST API: /api/wiki/* endpoints with auto-generated docs at /docs
  • 🖥️ Web UI: React SPA static file serving

Features:

  • πŸ“ Markdown Editor β€” Live preview, front matter support, wikilink autocomplete
  • 🌐 Graph View β€” D3.js interactive visualization, PageRank sizing, community colors
  • πŸ“Š Insights Panel β€” Cross-source synthesis, knowledge gaps, graph analysis
  • πŸ€– Agent Console β€” Chat interface, scheduled tasks, dream proposals & confirmations
  • πŸ“ˆ Health Dashboard β€” Broken links, orphans, stale pages, knowledge growth
  • πŸ” Full-text Search β€” FTS5-powered search with snippets
  • πŸ”‘ Optional Auth β€” API key authentication for production deployments

βš™οΈ Configuration

Create .wiki-config.yaml in your wiki root:

orphan_detection:
  exclude_patterns:
    - '^\d{4}-\d{2}-\d{2}$'  # Date pages
    - '^meeting-.*'           # Meeting notes
  archive_directories:
    - 'archive'
    - 'logs'

llm:
  provider: "openai"
  model: "gpt-4o"
  api_key: "env:OPENAI_API_KEY"

mcp:
  host: "127.0.0.1"
  port: 8765
  transport: "stdio"

See Configuration Guide for full options.
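
The sketch below shows how the sample values above could be interpreted: the exclude patterns as regular expressions tested against page names, and the `env:` prefix as a pointer to an environment variable. Both helper functions are hypothetical and the `env:` semantics are an assumption inferred from the sample; consult the Configuration Guide for the actual behavior.

```python
import os
import re
from typing import Optional

# Patterns copied from the sample config above.
EXCLUDE_PATTERNS = [r"^\d{4}-\d{2}-\d{2}$", r"^meeting-.*"]


def is_excluded(page_name: str, patterns: list[str] = EXCLUDE_PATTERNS) -> bool:
    """Return True if the page name matches any orphan-detection exclude pattern."""
    return any(re.search(pattern, page_name) for pattern in patterns)


def resolve_secret(value: str) -> Optional[str]:
    """Treat 'env:NAME' as a reference to an environment variable (assumed semantics)."""
    if value.startswith("env:"):
        return os.environ.get(value[len("env:"):])
    return value


date_page_skipped = is_excluded("2024-01-31")    # date pages are ignored
concept_page_skipped = is_excluded("Attention")  # ordinary pages are still checked
```

Keeping the API key as an `env:` reference means the secret itself never has to be written into the YAML file.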


📊 CLI Commands

| Command | Description | Command | Description |
|---------|-------------|---------|-------------|
| init | Initialize wiki | lint | Health check |
| ingest | Ingest source | status | Status overview |
| analyze-source | Analyze source file | log | Record log |
| write_page | Create page | references | Show references |
| read_page | Read page | build-index | Build index |
| search | Full-text search | batch | Batch ingest |
| synthesize | Save query as page | suggest-synthesis | Cross-source analysis |
| sink-status | Sink overview | knowledge-gaps | Gap analysis |
| watch | Watch for files | graph-query | Graph queries |
| graph-analyze | Graph analysis | export-graph | Export visualization |
| community-detect | Detect communities | report | Surprise report |
| mcp | Start MCP server | serve | MCP + Web UI |

📖 Documentation


🧪 Testing

pytest                           # All 879+ Python tests
pytest --cov=src/llmwikify       # With coverage
pytest tests/test_p1_features.py # Specific module

# Frontend tests
cd src/llmwikify/web/webui && npm test

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for development setup, coding standards, and contribution workflow.


πŸ™ Acknowledgments

  • llm-wiki-kit: Original inspiration
  • Andrej Karpathy: LLM Wiki Principles
  • Obsidian: Markdown wiki platform
  • MCP: Model Context Protocol

📄 License

MIT License. See the LICENSE file.

📬 Contact
