This project compares three storage and indexing strategies for log search. For context, the datasets are sized Small (~10 logs), Medium (500 logs), and Large (5000 logs).
Hybrid Index (Bucket + Inverted Index)
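An in-memory approach that combines a bucket index keyed on (category + level) with an inverted index over tokenized message text. Token lookup supports prefix matching (e.g. sess → session); the candidate sets from both indexes are then intersected and confirmed with a final substring check. Build time is O(n + totalTokens), where n is the number of logs and totalTokens is the total number of tokens across all messages. Typical searches are sub-linear; the worst case remains O(n), for example when a query matches most logs. This is the fastest option because everything lives in RAM, but the index is not persistent.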
flowchart LR
Q["Query (text/category/level)"] --> B["Bucket Index (cat+lvl)"]
Q --> T["Token Index (inverted)"]
B --> I["Intersect candidates"]
T --> I
I --> F["Final substring filter"]
F --> R["Results"]
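The flow above can be sketched in a few lines. This is a minimal, language-agnostic illustration written in Python, not the project's actual code: the `HybridIndex` class, the dictionary-based log shape, and the simplified `tokenize` helper are all assumptions made for the example, and the prefix lookup scans the vocabulary where a sorted list or trie would be used in practice.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Simplified stand-in tokenizer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class HybridIndex:
    """Hypothetical in-memory index: (category, level) buckets plus an inverted token index."""

    def __init__(self, logs):
        self.logs = logs
        self.buckets = defaultdict(set)  # (category, level) -> set of log positions
        self.tokens = defaultdict(set)   # token -> set of log positions
        for i, log in enumerate(logs):
            self.buckets[(log["category"], log["level"])].add(i)
            for tok in tokenize(log["message"]):
                self.tokens[tok].add(i)

    def search(self, text, category=None, level=None):
        candidates = None

        # Bucket index: collect every bucket compatible with the structured filters.
        if category is not None or level is not None:
            candidates = set()
            for (cat, lvl), ids in self.buckets.items():
                if category in (None, cat) and level in (None, lvl):
                    candidates |= ids

        # Inverted index: each query token must be a prefix of some indexed token.
        for q in tokenize(text):
            hits = set()
            for tok, ids in self.tokens.items():
                if tok.startswith(q):
                    hits |= ids
            candidates = hits if candidates is None else candidates & hits

        # No filters and no tokens: fall back to every log (the O(n) worst case).
        if candidates is None:
            candidates = set(range(len(self.logs)))

        # Final substring check on the surviving candidates.
        needle = text.lower()
        return [self.logs[i] for i in sorted(candidates)
                if needle in self.logs[i]["message"].lower()]
```

With this sketch, a query such as `HybridIndex(logs).search("sess", category="auth", level="error")` only runs the substring check on logs that sit in the matching bucket and contain a token starting with `sess`.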
SQLite FTS
An on-disk solution using a normal logs table plus an FTS5 virtual table for message text. Text queries use MATCH, while category, level, and date are handled by standard SQL filters. Build time is O(n + totalTokens). Search is sub-linear in practice, with a worst case of O(n). This is the most robust option for true full-text search at scale.
flowchart LR
Q["Query (text/category/level)"] --> M["FTS MATCH"]
Q --> S["SQL filters"]
M --> J["Join logs + fts"]
S --> J
J --> R["Results"]
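As a rough illustration of the table layout and join described above, the following sketch uses Python's built-in `sqlite3` module and assumes the underlying SQLite build has FTS5 enabled. The table names `logs` and `logs_fts`, the column set, and the helper functions `make_db`, `add_log`, and `search` are assumptions for the example; the project's real schema may differ.

```python
import sqlite3


def make_db():
    # In-memory database for the sketch; the project itself stores data on disk.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE logs (
            id INTEGER PRIMARY KEY,
            category TEXT NOT NULL,
            level TEXT NOT NULL,
            message TEXT NOT NULL
        );
        -- FTS5 virtual table holding only the message text; rowid mirrors logs.id.
        CREATE VIRTUAL TABLE logs_fts USING fts5(message);
    """)
    return conn


def add_log(conn, category, level, message):
    cur = conn.execute(
        "INSERT INTO logs (category, level, message) VALUES (?, ?, ?)",
        (category, level, message),
    )
    # Keep the FTS row aligned with the base row so the two can be joined.
    conn.execute(
        "INSERT INTO logs_fts (rowid, message) VALUES (?, ?)",
        (cur.lastrowid, message),
    )


def search(conn, fts_query, category=None, level=None):
    # fts_query uses FTS5 query syntax, e.g. 'sess*' for a prefix match.
    # Raw user input may need escaping before it is passed here.
    sql = (
        "SELECT logs.id, logs.category, logs.level, logs.message "
        "FROM logs_fts JOIN logs ON logs.id = logs_fts.rowid "
        "WHERE logs_fts MATCH ?"
    )
    params = [fts_query]
    if category is not None:
        sql += " AND logs.category = ?"
        params.append(category)
    if level is not None:
        sql += " AND logs.level = ?"
        params.append(level)
    sql += " ORDER BY logs.id"
    return conn.execute(sql, params).fetchall()
```

Calling `search(conn, "sess*", category="auth")` lets the FTS5 index narrow the text match while the ordinary SQL filter handles the structured field.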
SwiftData (Token-Aware)
A persistent SwiftData model that stores a precomputed tokens array per log. Structured fields are filtered with predicates; token prefix checks and substring validation are then performed in memory on the resulting candidates. Build time is O(n). Text search trends toward O(n), since every candidate is checked in memory, but it is faster than a pure contains scan.
flowchart LR
Q["Query (text/category/level)"] --> P["SwiftData predicates (structured)"]
P --> C["Candidate set"]
C --> T["Token prefix check (in memory)"]
T --> F["Substring check"]
F --> R["Results"]
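The three stages above can be mimicked without SwiftData. The sketch below is a Python stand-in, not SwiftData code: the `LogRecord` dataclass plays the role of the persisted model, the first list comprehension plays the role of the structured predicates, and the `tokenize` helper is a simplified assumption.

```python
import re
from dataclasses import dataclass, field


def tokenize(text):
    # Simplified stand-in tokenizer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


@dataclass
class LogRecord:
    """Hypothetical stand-in for the persisted model with a precomputed tokens array."""
    category: str
    level: str
    message: str
    tokens: list = field(default_factory=list)

    def __post_init__(self):
        # Tokens are computed once, when the record is created.
        if not self.tokens:
            self.tokens = tokenize(self.message)


def search(records, text, category=None, level=None):
    # Stage 1: structured filter (the part a store-level predicate would handle).
    candidates = [r for r in records
                  if category in (None, r.category) and level in (None, r.level)]

    # Stage 2: in-memory token prefix check against the stored tokens.
    query_tokens = tokenize(text)
    candidates = [r for r in candidates
                  if all(any(t.startswith(q) for t in r.tokens) for q in query_tokens)]

    # Stage 3: substring validation on the original message.
    needle = text.lower()
    return [r for r in candidates if needle in r.message.lower()]
```

Because stage 2 visits every candidate, the cost grows linearly with the candidate set, which matches the O(n) trend noted above.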
Tokenization lowercases the text, ignores diacritics, splits on non-alphanumeric characters, and drops a basic list of stop words (e.g. the, a, an, and, or, to, of, in, on, for, with, is, are, was, were, at, by).
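A tokenizer with these properties might look like the following Python sketch. The `tokenize` name and the `STOP_WORDS` set are assumptions: the set contains only the example words listed above, which may not be the project's full list, and restricting tokens to ASCII letters and digits is a simplification chosen for the example.

```python
import re
import unicodedata

# Only the example stop words from the description; the real list may be longer.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
    "with", "is", "are", "was", "were", "at", "by",
}


def tokenize(text):
    # Decompose accented characters, then drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Lowercase, split on runs of non-alphanumeric characters, drop stop words.
    parts = re.split(r"[^a-z0-9]+", stripped.lower())
    return [p for p in parts if p and p not in STOP_WORDS]
```

For example, `tokenize("The Café session was RESET at 10:42")` returns `["cafe", "session", "reset", "10", "42"]`.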
- Hybrid Index: fastest in RAM, not persistent. Sub-linear search in typical cases.
- SQLite FTS: best full-text performance and durability.
- SwiftData: easiest native persistence, but not a true FTS engine.