# refdocs

A local CLI tool that indexes markdown documentation and exposes fast fuzzy search with intelligent chunking. Designed to give LLM coding agents efficient, token-conscious access to project documentation without MCP servers, network calls, or full-file context dumps.

## Architecture

```
refdocs/
├── src/
│   ├── index.ts     # CLI entrypoint (commander)
│   ├── indexer.ts   # Walks target dir, chunks md files, builds search index
│   ├── chunker.ts   # Splits markdown by heading hierarchy into right-sized chunks
│   ├── search.ts    # MiniSearch wrapper, query + rank + format results
│   └── config.ts    # Reads .refdocs.json config
├── .refdocs.json    # Example config
├── package.json
├── tsconfig.json
└── README.md
```

## Tech Stack

- **Runtime**: Node/Bun (target `bun build --compile` for single binary)
- **Language**: TypeScript, strict mode
- **Search engine**: MiniSearch — pure JS, ~7kb, fuzzy matching, field boosting, prefix search
- **CLI framework**: Commander
- **Markdown parsing**: markdown-it or remark for heading extraction (evaluate which is lighter)
- **Zero external services** — no network calls, no API keys, everything local

## Config

`.refdocs.json` at project root:

```json
{
  "paths": ["ref-docs"],
  "index": ".refdocs-index.json",
  "chunkMaxTokens": 800,
  "chunkMinTokens": 100,
  "boostFields": {
    "title": 2,
    "headings": 1.5,
    "body": 1
  }
}
```

- `paths` — array of directories to index (relative to project root)
- `index` — where to persist the serialized search index (gitignored)
- `chunkMaxTokens` — upper bound for chunk size, rough estimate (chars / 4)
- `chunkMinTokens` — minimum chunk size; merge small sections with their parent
- `boostFields` — field relevance weights for search ranking
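
The config loader might be sketched as below. The field names and defaults follow the example config above; the function name and the shallow merge-with-defaults behavior are assumptions, not a committed API:

```typescript
// config.ts sketch — load .refdocs.json, falling back to defaults.
import { readFileSync } from "node:fs";

export interface RefdocsConfig {
  paths: string[];
  index: string;
  chunkMaxTokens: number;
  chunkMinTokens: number;
  boostFields: Record<string, number>;
}

const DEFAULTS: RefdocsConfig = {
  paths: ["ref-docs"],
  index: ".refdocs-index.json",
  chunkMaxTokens: 800,
  chunkMinTokens: 100,
  boostFields: { title: 2, headings: 1.5, body: 1 },
};

export function loadConfig(path = ".refdocs.json"): RefdocsConfig {
  let raw: string;
  try {
    raw = readFileSync(path, "utf8");
  } catch {
    return DEFAULTS; // no config file: run with defaults
  }
  // Shallow merge: a user-supplied boostFields replaces the default map wholesale.
  return { ...DEFAULTS, ...JSON.parse(raw) };
}
```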

## CLI Commands

### `refdocs index`

Walk all configured paths, chunk every `.md` file, build and persist the MiniSearch index.

- Parse each markdown file into chunks split by heading boundaries (h1 > h2 > h3)
- Each chunk gets metadata: `{ id, file, title, headings, body, startLine, endLine }`
- Small sections (below `chunkMinTokens`) merge into their parent heading's chunk
- Large sections (above `chunkMaxTokens`) split at paragraph boundaries
- Serialize index to `.refdocs-index.json`
- Print summary: files indexed, chunks created, index size

### `refdocs search <query>`

Fuzzy search the index and return the top chunks.

- Load persisted index (error if not built yet)
- Run MiniSearch with fuzzy matching (`fuzzy: 0.2`), prefix search enabled
- Return top 3 results by default
- Output format: each chunk preceded by a comment with source file and line range

**Flags:**
- `-n, --results <count>` — number of results (default: 3, max: 10)
- `-f, --file <pattern>` — filter results to files matching glob
- `--json` — output results as JSON array instead of formatted text
- `--raw` — output chunk body only, no metadata header (for piping)
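
A sketch of how the `-f` and `-n` flags might be applied after MiniSearch has ranked the chunks. The minimal `*`-only glob support here is an assumption (a real implementation might use a glob library), as are the type and function names:

```typescript
// search.ts sketch — post-process ranked results for the -f and -n flags.
export interface RankedChunk {
  file: string;
  score: number;
}

// Minimal glob: `*` matches any run of characters; everything else is literal.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

export function applyFlags(
  results: RankedChunk[],
  opts: { file?: string; results?: number } = {},
): RankedChunk[] {
  let out = results;
  if (opts.file) {
    const re = globToRegExp(opts.file);
    out = out.filter((r) => re.test(r.file));
  }
  const n = Math.min(opts.results ?? 3, 10); // default 3, hard cap 10
  return out.slice(0, n);
}
```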

### `refdocs list`

List all indexed files and their chunk counts. Useful for verifying what's in the index.

### `refdocs info <file>`

Show all chunks for a specific file with their headings and token estimates.

## Chunking Strategy

This is the core value of the tool. Chunks must be:

1. **Semantically coherent** — never split mid-section. Heading boundaries are the primary split points.
2. **Right-sized for LLM context** — 100-800 tokens. Big enough to be useful, small enough not to waste context.
3. **Hierarchical** — each chunk carries its full heading breadcrumb (e.g. `Configuration > Database > Connections`) so the LLM understands where the chunk fits.

Algorithm:

1. Parse markdown into AST
2. Walk AST and split at heading nodes (h1, h2, h3)
3. Each section becomes a candidate chunk with its heading breadcrumb
4. If a chunk is below `chunkMinTokens`, merge it with its previous sibling or parent
5. If a chunk is above `chunkMaxTokens`, split at paragraph boundaries (double newline)
6. Attach metadata: source file path, line range, heading trail
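
A condensed sketch of steps 1-3 and 6, with the chars / 4 token estimate from the config section: a line scanner stands in for the AST walk, and the min/max merge and split passes are elided. Names and shapes are illustrative:

```typescript
// chunker.ts sketch — split markdown at h1-h3 headings, carrying a breadcrumb.
export interface Chunk {
  headings: string[]; // breadcrumb trail, e.g. ["Configuration", "Database"]
  body: string;
  startLine: number;
  endLine: number;
}

// Rough token estimate per the config section: chars / 4.
export const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

export function chunkByHeadings(markdown: string): Chunk[] {
  const lines = markdown.split("\n");
  const chunks: Chunk[] = [];
  let trail: string[] = [];
  let current: Chunk | null = null;

  lines.forEach((line, i) => {
    const m = /^(#{1,3})\s+(.*)$/.exec(line); // h1-h3 are split points; h4+ stays in the body
    if (m) {
      if (current) chunks.push(current);
      const depth = m[1].length;
      trail = [...trail.slice(0, depth - 1), m[2]]; // truncate trail to parent depth
      current = { headings: [...trail], body: "", startLine: i + 1, endLine: i + 1 };
    } else if (current) {
      current.body += line + "\n";
      current.endLine = i + 1;
    }
  });
  if (current) chunks.push(current);
  return chunks;
}
```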

## Output Format

Default output for `refdocs search "data transformers"`:

```
# [1] spatie-laravel-data/transformers.md:15-48
# Transformers > Built-in Transformers

Transformers are used to convert data properties when...
<chunk body here>

---

# [2] spatie-laravel-data/creating-data-objects.md:72-95
# Creating Data Objects > Casting and Transforming

When creating a data object from a request...
<chunk body here>
```
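
A header layout like the one above could come from a small formatter along these lines; the exact spacing and the type and function names are guesses at the final shape:

```typescript
// Format ranked hits as numbered, breadcrumbed text blocks separated by ---.
export interface SearchHit {
  file: string;
  lines: [number, number];
  headings: string[];
  body: string;
}

export function formatResults(hits: SearchHit[]): string {
  return hits
    .map((hit, i) => {
      const header =
        `# [${i + 1}] ${hit.file}:${hit.lines[0]}-${hit.lines[1]}\n` +
        `# ${hit.headings.join(" > ")}`;
      return `${header}\n\n${hit.body.trim()}`;
    })
    .join("\n\n---\n\n");
}
```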
| 125 | + |
| 126 | +JSON output (`--json`) returns: |
| 127 | + |
| 128 | +```json |
| 129 | +[ |
| 130 | + { |
| 131 | + "score": 12.45, |
| 132 | + "file": "spatie-laravel-data/transformers.md", |
| 133 | + "lines": [15, 48], |
| 134 | + "headings": ["Transformers", "Built-in Transformers"], |
| 135 | + "body": "..." |
| 136 | + } |
| 137 | +] |
| 138 | +``` |
| 139 | + |
| 140 | +## Design Principles |
| 141 | + |
| 142 | +- **No runtime dependencies beyond the binary** — everything bundles into one file |
| 143 | +- **Fast** — indexing a typical ref-docs folder (50 files) should take <1s. Search should be <50ms. |
| 144 | +- **Deterministic** — same docs, same index. No embeddings, no ML, no probabilistic retrieval. |
| 145 | +- **Composable** — output is plain text or JSON. Pipe it wherever you want. |
| 146 | +- **Offline** — works air-gapped, on a plane, in a container with no egress |
| 147 | + |
| 148 | +## Code Style |
| 149 | + |
| 150 | +- TypeScript strict mode, no `any` |
| 151 | +- Pure functions where possible, side effects at the edges (CLI entrypoint, file I/O) |
| 152 | +- No classes unless genuinely needed — prefer modules with exported functions |
| 153 | +- Error messages should be actionable: "Index not found. Run `refdocs index` first." |
| 154 | +- Tests with Vitest, focus on chunker logic and search relevance |
| 155 | + |
| 156 | +## Future Considerations (not MVP) |
| 157 | + |
| 158 | +- `refdocs watch` — rebuild index on file change |
| 159 | +- `refdocs add <url>` — fetch a URL, convert to markdown, save to ref-docs |
| 160 | +- `refdocs update` — re-pull docs from configured upstream sources (git repos, URLs) |
| 161 | +- MCP server mode — expose search as an MCP tool for editors that prefer it |
| 162 | +- Token counting with tiktoken instead of chars/4 estimate |
| 163 | +- Embedding-based search as optional mode (would require onnxruntime or similar) |