A Go implementation of the Burkhard-Keller Tree (BK-Tree) for fast approximate string matching in metric spaces.
- Generic distance function — plug in any metric (Levenshtein, Hamming, or your own)
- Single tree + Forest —
BKTreefor general use;Forestpartitions by string length for natural Hamming support and Levenshtein length-based pruning - Exists shortcut —
Exists(word, maxDist)stops at the first match, faster thanQuerywhen you only need a boolean - Sorted results —
Queryreturns results ordered by distance (then word) - Unicode-aware — Levenshtein distance operates on runes, not raw bytes
- Zero dependencies — standard library only
go get github.com/IcyDesert/bktreepackage main
import (
"fmt"
"github.com/IcyDesert/bktree"
)
func main() {
tree := bktree.New(bktree.Levenshtein)
words := []string{"book", "books", "cake", "boo", "cape"}
for _, w := range words {
tree.Add(w)
}
// Find all words within edit distance 2 of "boak"
results := tree.Query("boak", 2)
for _, r := range results {
fmt.Printf("%s (distance: %d)\n", r.Word, r.Distance)
}
// Output:
// book (distance: 1)
// boo (distance: 2)
// books (distance: 2)
// Just check if anything is close enough
if tree.Exists("hallo", 1) {
fmt.Println("found a near-match")
}
}Forest groups words by length into separate trees. This makes Hamming distance trivial — only the same-length tree is ever queried.
forest := bktree.NewForest(bktree.Hamming)
words := []string{"000", "001", "010", "111"}
for _, w := range words {
forest.Add(w)
}
results := forest.Query("001", 1)
for _, r := range results {
fmt.Printf("%s (hamming: %d)\n", r.Word, r.Distance)
}
// Output:
// 001 (hamming: 0)
// 000 (hamming: 1)// Jaccard distance on character sets (example)
func jaccard(a, b string) int {
// ... compute distance ...
return distance
}
tree := bktree.New(jaccard)
tree.Add("foo")
tree.Add("bar")
results := tree.Query("baz", 2)Trees can be saved to disk with gob encoding. Only the topology is persisted — the distance function is not saved. You must provide the same function when loading.
import "os"
// Build
forest := bktree.NewForest(bktree.Levenshtein)
forest.Add("book")
forest.Add("books")
// Save
f, _ := os.Create(bktree.DefaultFilename("index.gob", bktree.Levenshtein))
// Creates "index_levenshtein.gob"
forest.Save(f)
f.Close()
// Load (elsewhere, later)
f, _ = os.Open("index_levenshtein.gob")
defer f.Close()
loaded, err := bktree.LoadForest(f, bktree.Levenshtein)
if err != nil {
log.Fatal(err)
}
// Use exactly like before
results := loaded.Query("boak", 2)BKTree uses the same pattern with Save and Load.
For Levenshtein and other reasonable metrics, if dist(a, b) <= d then |len(a) - len(b)| <= d. Forest.Query exploits this to skip entire length-buckets, reducing the search space.
For Hamming, the same logic collapses to checking only the exact-length bucket.