Skip to content

chrismaximin/sqlitesweep

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQLiteSweep

Query millions of SQLite databases across remote hosts via SSH, aggregating results in real-time. Designed for multi-tenant apps where each tenant has their own SQLite database.

demo

Installation

gem install sqlitesweep

Or add to your Gemfile:

gem "sqlitesweep"

Usage

sqlitesweep \
  -q 'SELECT count(*) FROM products WHERE price > 0' \
  -a sum \
  -s 'rails runner "Account.active.find_each { |a| puts a.db_uri }"' \
  -c 16

The -s (source) command should output one database URI per line. URIs can be:

  • Local paths: /data/tenants/acme.sqlite3
  • File URIs: file:///data/tenants/acme.sqlite3
  • SSH URIs: ssh://deploy@web1.example.com/data/tenants/acme.sqlite3

Results go to stdout, progress goes to stderr — so it's pipe-friendly:

sqlitesweep -q "SELECT count(*) FROM users" -a sum -s "cat db_uris.txt" > result.txt

Actions

Action Description
sum Sum the first column across all databases (default)
average / avg Average the first column across all databases
list Write all rows to a JSONL file, print the file path

Options

Flag Long Default Description
-q --query (required) SQL query to execute on each database
-a --action sum sum, average/avg, or list
-s --source (required) Shell command that outputs database URIs
-c --concurrency 8 Max parallel query workers
--max-ssh 50 Max SSH master connections
--no-live false Disable live progress display
--batch-size 4 Databases to query per SSH call
--ssh-timeout 10 SSH connect timeout (seconds)
--query-timeout 30 Per-query timeout (seconds)

Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                              sqlitesweep                                     │
└──────────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────┐
  │  Source Command  (-s)       │   Any shell command that outputs database
  │                             │   URIs to stdout, one per line.
  │  rails runner "..."        │   e.g. Rails runner, cat, curl, script...
  └─────────────┬───────────────┘
                │
                │  streams URIs line by line
                ▼
  ┌─────────────────────────────┐
  │  URI Router                 │   Parses each URI and routes it:
  │                             │
  │  /local/path.sqlite3  ─────────▶  direct to Worker Pool
  │  ssh://user@host/path  ────────▶  to Host Batcher
  └─────────────────────────────┘
                │
       ┌────────┴────────┐
       ▼                 ▼
  ┌──────────┐    ┌──────────────────────────────────────────────────────┐
  │  Local   │    │  Host Batcher                                       │
  │  Query   │    │                                                     │
  │          │    │  Groups URIs by host. Flushes when a batch reaches  │
  │  Opens   │    │  --batch-size (default 4) or after 200ms timeout.   │
  │  SQLite  │    │                                                     │
  │  directly│    │  ┌──────────┐  ┌──────────┐  ┌──────────┐          │
  │  via gem │    │  │ @host-1  │  │ @host-2  │  │ @host-3  │  ...     │
  │          │    │  │ db1,db2  │  │ db4      │  │ db5,db6  │          │
  │          │    │  │ db3      │  │          │  │ db7,db8  │          │
  │          │    │  └────┬─────┘  └────┬─────┘  └────┬─────┘          │
  └────┬─────┘    └───────┼─────────────┼─────────────┼────────────────┘
       │                  │             │             │
       │                  ▼             ▼             ▼
       │           ┌─────────────────────────────────────────────────┐
       │           │  SSH Connection Manager                         │
       │           │                                                 │
       │           │  One ControlMaster per host (reused across all  │
       │           │  queries to that host). Multiplexed via Unix    │
       │           │  socket — no repeated TCP/auth handshakes.      │
       │           │                                                 │
       │           │  Semaphore caps total masters at --max-ssh.     │
       │           │  BatchMode=yes — no password prompts.           │
       │           └──────────────────────┬──────────────────────────┘
       │                                  │
       │                                  ▼
       │           ┌─────────────────────────────────────────────────┐
       │           │  Remote Query (one SSH call per batch)          │
       │           │                                                 │
       │           │  ssh user@host '                                │
       │           │    printf "%s\t" /path/db1;                     │
       │           │    sqlite3 -json /path/db1 "SELECT ..."; echo;  │
       │           │    printf "%s\t" /path/db2;                     │
       │           │    sqlite3 -json /path/db2 "SELECT ..."; echo;  │
       │           │  '                                              │
       │           │                                                 │
       │           │  Multiple databases queried in a single SSH     │
       │           │  round-trip. Results parsed from tab-delimited  │
       │           │  output back into individual results.           │
       │           └──────────────────────┬──────────────────────────┘
       │                                  │
       ▼                                  ▼
  ┌──────────────────────────────────────────────────────────────────────┐
  │  Worker Pool  (-c, default 8 threads)                                │
  │                                                                      │
  │  Fixed thread pool runs queries in parallel. Back-pressure via       │
  │  caller-runs policy: when all threads are busy, the submitting       │
  │  thread executes the work itself instead of queuing unboundedly.     │
  └───────────────────────────────┬──────────────────────────────────────┘
                                  │
                                  │  results (thread-safe)
                                  ▼
  ┌──────────────────────────────────────────────────────────────────────┐
  │  Aggregator                                                          │
  │                                                                      │
  │  Mutex-protected accumulator. Mode depends on --action:              │
  │                                                                      │
  │    sum      Running total of first column value from each database.  │
  │    average  Running total / number of databases queried.             │
  │    list     Streams rows to a JSONL temp file (bounded memory).      │
  └───────────────────────────────┬──────────────────────────────────────┘
                                  │
                ┌─────────────────┴─────────────────┐
                ▼                                   ▼
  ┌──────────────────────────┐        ┌──────────────────────────┐
  │  Live Display (stderr)   │        │  Final Result (stdout)   │
  │                          │        │                          │
  │  Queried: 8421           │        │  sum/average ▶ number    │
  │  Errors: 2               │        │  list ▶ /tmp/file.jsonl  │
  │  Rate: 1204/s            │        │                          │
  │  Elapsed: 7.0s           │        │  Pipe-friendly: stderr   │
  │  Result: 421052          │        │  for progress, stdout    │
  │                          │        │  for the answer.         │
  └──────────────────────────┘        └──────────────────────────┘

Requirements

  • Ruby >= 4.0
  • sqlite3 available on remote hosts (for SSH queries)
  • SSH agent or key-based auth configured (BatchMode is enforced — no password prompts)

Development

bundle install
bundle exec rake test

Run the integration test:

bundle exec ruby test/integration/test_local_sweep.rb

Watch the live progress display with a slow-drip demo:

ruby test/integration/harness/live_demo.rb
ruby test/integration/harness/live_demo.rb --count 50 --delay 0.5
ruby test/integration/harness/live_demo.rb --action list

Run the benchmark:

ruby test/integration/harness/sweep_bench.rb --db-count 1000

Docker-based SSH testing

docker compose -f test/integration/harness/docker-compose.yml up -d
ruby test/integration/harness/sweep_bench.rb --docker --db-count 500

License

MIT

About

Query millions of SQLite databases across remote hosts via SSH, aggregating results in real-time. Designed for multi-tenant apps where each tenant has their own SQLite database.

Topics

Resources

Stars

Watchers

Forks

Contributors

Languages