# Logger

This project compares three storage/indexing strategies for log search, benchmarked on three dataset sizes: Small (~10 entries), Medium (500), and Large (5000).

## Storage & Indexing Options

### Hybrid Index (Bucket + Inverted Index)

An in-memory approach that combines a bucket index keyed on (category + level) with an inverted index over tokenized message text. Token lookup supports prefix matching (e.g. `sess` matches `session`); candidates from both indexes are intersected, then finished with a substring check. Build time is O(n + totalTokens). Typical searches are sub-linear; the worst case remains O(n). Fastest option in RAM, but not persistent.

```mermaid
flowchart LR
  Q["Query (text/category/level)"] --> B["Bucket Index (cat+lvl)"]
  Q --> T["Token Index (inverted)"]
  B --> I["Intersect candidates"]
  T --> I
  I --> F["Final substring filter"]
  F --> R["Results"]
```
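The hybrid flow above can be sketched as follows. This is a minimal illustration in Python (the project itself is presumably Swift); the class and method names are invented for the sketch, and the tokenizer here is a plain whitespace split rather than the project's full tokenizer.

```python
from collections import defaultdict

class HybridLogIndex:
    """Illustrative hybrid index: bucket on (category, level) + inverted token index."""

    def __init__(self):
        self.logs = []                      # all log entries as (message, category, level)
        self.bucket = defaultdict(set)      # (category, level) -> log ids
        self.tokens = defaultdict(set)      # token -> log ids (inverted index)

    def add(self, message, category, level):
        log_id = len(self.logs)
        self.logs.append((message, category, level))
        self.bucket[(category, level)].add(log_id)
        for tok in message.lower().split():  # real tokenizer also strips punctuation etc.
            self.tokens[tok].add(log_id)

    def search(self, text=None, category=None, level=None):
        candidates = None
        if category is not None and level is not None:
            candidates = set(self.bucket[(category, level)])
        if text:
            prefix = text.lower()
            # Prefix matching: union the postings of every token starting with the query.
            hits = set()
            for tok, ids in self.tokens.items():
                if tok.startswith(prefix):
                    hits |= ids
            candidates = hits if candidates is None else candidates & hits
        if candidates is None:
            candidates = set(range(len(self.logs)))
        # Final substring check confirms the match inside the full message.
        return [self.logs[i] for i in sorted(candidates)
                if not text or text.lower() in self.logs[i][0].lower()]
```

For example, after `add("session expired", "auth", "warn")`, a `search(text="sess")` finds the entry via the `session` token's prefix, then the substring filter confirms it.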

### SQLite FTS

An on-disk solution using a plain logs table plus an FTS5 virtual table for message text. Text queries use MATCH, while category/level/date are ordinary SQL filters. Build time is O(n + totalTokens). Search is sub-linear in practice, O(n) in the worst case. This is the most robust option for true full-text search at scale.

```mermaid
flowchart LR
  Q["Query (text/category/level)"] --> M["FTS MATCH"]
  Q --> S["SQL filters"]
  M --> J["Join logs + fts"]
  S --> J
  J --> R["Results"]
```
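A minimal sketch of the logs-table + FTS5 layout, shown here with Python's built-in `sqlite3` (assuming FTS5 is compiled into the SQLite build, as it is in most modern distributions); the table and column names are illustrative, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT,
                       category TEXT, level TEXT, created_at TEXT);
    -- External-content FTS5 table indexing only the message text.
    CREATE VIRTUAL TABLE logs_fts USING fts5(message, content='logs', content_rowid='id');
""")

rows = [
    ("session expired for user", "auth",   "warn",  "2024-01-01"),
    ("disk quota exceeded",      "system", "error", "2024-01-02"),
]
for msg, cat, lvl, ts in rows:
    cur = conn.execute(
        "INSERT INTO logs (message, category, level, created_at) VALUES (?, ?, ?, ?)",
        (msg, cat, lvl, ts))
    conn.execute("INSERT INTO logs_fts (rowid, message) VALUES (?, ?)",
                 (cur.lastrowid, msg))

# Text goes through MATCH ('sess*' is an FTS5 prefix query);
# structured fields stay ordinary SQL filters on the logs table.
hits = conn.execute("""
    SELECT l.message
    FROM logs_fts f JOIN logs l ON l.id = f.rowid
    WHERE f.logs_fts MATCH 'sess*' AND l.level = 'warn'
""").fetchall()
```

The join mirrors the diagram above: the MATCH narrows candidates via the FTS index, the join brings back the full rows, and the SQL predicates finish the filtering.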

### SwiftData (Token-Aware)

A persistent SwiftData model that stores a precomputed tokens array per log. Structured fields are filtered with predicates; token prefix checks plus substring validation then run in memory. Build time is O(n). Text search trends toward O(n), but is faster than a pure `contains` scan.

```mermaid
flowchart LR
  Q["Query (text/category/level)"] --> P["SwiftData predicates (structured)"]
  P --> C["Candidate set"]
  C --> T["Token prefix check (in memory)"]
  T --> F["Substring check"]
  F --> R["Results"]
```
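The two-phase flow above (structured predicate fetch first, then in-memory token checks) can be mirrored in a few lines of Python. This is only an illustration of the filtering order, not the SwiftData API; the field names are invented for the sketch.

```python
logs = [
    {"message": "session expired", "tokens": ["session", "expired"],
     "category": "auth", "level": "warn"},
    {"message": "disk full", "tokens": ["disk", "full"],
     "category": "system", "level": "error"},
]

def search(logs, text, category=None, level=None):
    # Phase 1: structured fields (a predicate-backed fetch in the real model).
    candidates = [l for l in logs
                  if (category is None or l["category"] == category)
                  and (level is None or l["level"] == level)]
    # Phase 2: token prefix check, then substring validation, both in memory.
    q = text.lower()
    return [l for l in candidates
            if any(t.startswith(q) for t in l["tokens"])
            and q in l["message"].lower()]
```

The precomputed `tokens` array is what makes this faster than a pure substring scan: most non-matching rows fail the cheap prefix check without a full `contains` pass over the message.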

## Tokenization & Stop Words

Tokenization lowercases the input, strips diacritics, splits on non-alphanumeric characters, and drops a basic stop-word list (e.g. the, a, an, and, or, to, of, in, on, for, with, is, are, was, were, at, by).
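A minimal sketch of that tokenization pipeline, using the stop words listed above (the project's actual list and normalization order may differ):

```python
import re
import unicodedata

# Stop-word list as given in the README.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on",
              "for", "with", "is", "are", "was", "were", "at", "by"}

def tokenize(text: str) -> list[str]:
    """Lowercase, strip diacritics, split on non-alphanumerics, drop stop words."""
    # Decompose accented characters, then drop the combining marks (diacritics).
    normalized = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in normalized if not unicodedata.combining(c))
    tokens = re.split(r"[^a-z0-9]+", stripped.lower())
    return [t for t in tokens if t and t not in STOP_WORDS]

tokenize("Sessión expired for user-42 at the gateway")
# → ['session', 'expired', 'user', '42', 'gateway']
```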

## Performance Summary

- **Hybrid Index:** fastest in RAM, not persistent; sub-linear search in typical cases.
- **SQLite FTS:** best full-text performance and durability.
- **SwiftData:** easiest native persistence, but not a true FTS engine.
