MinSync

Manifest-based incremental vector DB indexing CLI. No git required. Written in Rust.

Install

git clone https://github.com/NomaDamas/MinSync.git
cd MinSync
cargo build --release
export OPENAI_API_KEY="sk-..."

Usage

minsync init                          # initialize .minsync/
minsync sync                          # index files (incremental)
minsync sync --full                   # rebuild from scratch
minsync query "search text" --k 5     # semantic search
minsync watch                         # watch .md/.txt files and incrementally re-index on change
minsync status                        # sync state
minsync check                         # health check
minsync verify --fix                  # consistency check + repair

How it works

MinSync scans your directory, detects file changes via manifest comparison (mtime + size + SHA-256 content hash), chunks changed files with a recursive chunker (built on chonkie-core's split/merge primitives) — paragraph→sentence→line boundaries merged to a size budget — embeds them via OpenAI, and stores vectors locally. Only changed content gets re-embedded. Stale chunks are automatically swept.

State lives in .minsync/. Delete it to start fresh.

Vector store

MinSync stores vectors in an embedded LanceDB database (vectorstore.id = "lancedb", the default). MinSync vendors protoc through protobuf-src for its own build script on non-Windows targets; Windows builds and LanceDB's dependency build scripts need a protoc binary available on PATH.

Set the embedding dimension to match your embedder in .minsync/config.toml:

[vectorstore]
id = "lancedb"
[vectorstore.options]
dimension = 1536   # openai:text-embedding-3-small; use 384 for e5-small, 1024 for bge-m3

ANN indexing. A fresh table is searched by exhaustive flat scan (exact, 100% recall, but O(n)). Once a table grows past index_build_threshold chunks (default 256), MinSync builds an IVF-HNSW-SQ approximate-nearest-neighbour index — pinned to cosine distance to match query time — on the next sync/flush, so similarity search is accelerated. Newly synced chunks are immediately searchable via LanceDB's combined indexed + flat-over-delta search; once the unindexed delta reaches index_optimize_delta_threshold (default 10,000) MinSync folds it into the existing index incrementally (no full rebuild, no k-means retrain). No manual step is required.

These two thresholds are independent tuning knobs, not a min/max pair and not capacity limits — any number of vectors can be stored and searched. index_build_threshold gates the one-time index build (compared against total rows); index_optimize_delta_threshold gates each incremental optimize (compared against the unindexed delta only). Override them in .minsync/config.toml:

[vectorstore.options]
dimension = 1536
index_build_threshold = 256              # build the ANN index once total rows hit this
index_optimize_delta_threshold = 50000   # raise to optimize less often; lower for tighter query latency

Local embeddings (no OpenAI) — Hugging Face TEI

You can run MinSync entirely offline using a local Text Embeddings Inference server. No OPENAI_API_KEY needed.

1. Install and launch TEI (macOS Apple Silicon)

brew install text-embeddings-inference

text-embeddings-router --model-id intfloat/multilingual-e5-small --port 8080 --dtype float32

The first run downloads the model (~470 MB) to ~/.cache/huggingface. Once it's ready:

curl http://localhost:8080/health   # should return 200

2. Configure MinSync

Either run minsync init --embedder tei:intfloat/multilingual-e5-small and then edit .minsync/config.toml, or set the [embedder] section directly:

[embedder]
id = "tei:intfloat/multilingual-e5-small"
base_url = "http://localhost:8080"
query_prefix = "query: "
passage_prefix = "passage: "

e5 models require input prefixes for best retrieval quality: query: is prepended to search queries, passage: to indexed documents. MinSync applies these automatically from the config.

3. LanceDB dimension note

intfloat/multilingual-e5-small produces 384-dimensional vectors. Set the dimension to match the embedder:

[vectorstore]
id = "lancedb"
[vectorstore.options]
dimension = 384

4. Run normally

minsync sync --full          # first sync (--full because init baselines the manifest)
minsync query "검색어" --k 5
minsync watch

Alternative models

Model	Dim	Notes
`intfloat/multilingual-e5-small`	384	Multilingual incl. Korean; recommended default
`dragonkue/multilingual-e5-small-ko-v2`	384	Korean-tuned variant of e5-small
`BAAI/bge-m3`	1024	No prefix needed; larger model

.minsyncignore

.gitignore syntax. Exclude files from indexing:

target/
*.png
*.pdf

Development

CI assumes the standard GitHub-hosted runners on Ubuntu, macOS, and Windows, with Rust 1.91 installed through rustup. The build also expects a working C compiler toolchain, vendored protoc from protobuf-src for MinSync's non-Windows build script, setup-protoc for Windows and LanceDB dependency build scripts, LanceDB native dependencies built on CI, and no secrets for normal CI runs.

cargo test            # full test suite
cargo clippy          # lint
cargo fmt             # format

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
.github/workflows		.github/workflows
src		src
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
build.rs		build.rs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MinSync

Install

Usage

How it works

Vector store

Local embeddings (no OpenAI) — Hugging Face TEI

1. Install and launch TEI (macOS Apple Silicon)

2. Configure MinSync

3. LanceDB dimension note

4. Run normally

Alternative models

.minsyncignore

Development

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MinSync

Install

Usage

How it works

Vector store

Local embeddings (no OpenAI) — Hugging Face TEI

1. Install and launch TEI (macOS Apple Silicon)

2. Configure MinSync

3. LanceDB dimension note

4. Run normally

Alternative models

.minsyncignore

Development

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages