
[RFC] Resolving App Cold Starts and Idle RAM Bloat via SQLite-First, Container Tuning, and Host-Level zswap #109


@BeSovereign

RFC: Reducing App Cold Starts and Idle RAM Bloat via SQLite-First, Container Tuning, and Host-Level zswap

1. The Problem

Currently, the Shard Core manages resource consumption on low-end VPS hosts by stopping inactive containers (docker stop) via app_lifecycle.py. While this frees up RAM, it introduces severe UX drawbacks:

  • High Latency (Cold Starts): Re-activating heavy apps takes 15–30+ seconds because database and application runtimes must boot from scratch.
  • Bad UX Feedback: Traefik's app-error middleware catches the resulting 502 Bad Gateway and displays an unstyled splash page that reads "Unknown Status..." and hard-reloads every 2 seconds, causing a visible flash.

2. Architectural Design Decision: Rejecting Shared DB/Redis Engines

We explicitly reject the approach of using a single shared database or Redis instance across all applications. Doing so introduces severe architectural anti-patterns:

  • Security & Isolation: Shared databases break isolation; credential leaks or SQL injections in one app could expose all other apps.
  • Single Point of Failure: If the shared DB crashes due to memory limits, all apps on the shard go offline.
  • Version Lock-in: Apps requiring different DB versions (e.g., PostgreSQL 15 vs. 17) would block each other from updating.

Instead, we propose keeping strictly isolated containers but optimizing their idle footprint down to a minimum using three technical levers: SQLite-First (A), Aggressive Database Tuning (B), and Host-Level Swap/zswap (C).


3. Concrete Application Implementations

Here is how we will apply these rules to three target applications on the Shard:

| Application | Database Engine | Sidecars | Optimization Target |
| --- | --- | --- | --- |
| Vaultwarden | SQLite (Embedded) | None | Limit total RAM to ~15MB RSS. |
| Paperless-ngx | SQLite (Embedded) | Redis (Single task broker) | Cap Redis memory via command flags. |
| Immich | PostgreSQL (Tuned) | Redis, ML Container | Restrict PG buffers & limit ML workers. |

1. Vaultwarden (Rust Bitwarden Clone)

Vaultwarden is highly optimized and runs SQLite by default without external dependencies.

  • Database: SQLite. No dedicated database container.
  • Optimizations: Set the container memory limit to 30MB (it idles at ~15MB RSS). No further tuning required.

2. Paperless-ngx

Paperless-ngx supports SQLite natively for single-user scenarios (our primary target).

  • Database: Enforce SQLite in our templates (avoid Postgres).
  • Sidecars: Requires Redis for Celery task queuing.
  • Optimizations: Tune the paperless-redis container by passing memory-cap flags directly to the start command:
    command: ["redis-server", "--maxmemory", "50mb", "--maxmemory-policy", "allkeys-lru"]
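    An idle broker stays at around 3MB, and the flag caps worst-case usage at 50MB while fully isolating the queue. Note that allkeys-lru allows Redis to evict queued Celery tasks once the cap is reached, so noeviction may be the safer policy for a task broker.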
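To keep the cap and eviction policy configurable per template, the command list could be generated rather than hard-coded. The following is a minimal sketch under assumptions: `parse_redis_size` and `redis_command` are hypothetical helper names, not part of the existing code base. The size suffixes follow the convention documented in redis.conf, where "mb" means 1024*1024 bytes and "m" means 1,000,000 bytes.

```python
import re

# Size suffixes as documented in redis.conf: "k"/"m"/"g" are powers of 1000,
# "kb"/"mb"/"gb" are powers of 1024. Suffixes are case-insensitive.
_UNITS = {
    "": 1,
    "k": 1000, "kb": 1024,
    "m": 1000**2, "mb": 1024**2,
    "g": 1000**3, "gb": 1024**3,
}

# Eviction policies accepted by --maxmemory-policy.
_POLICIES = {
    "noeviction", "allkeys-lru", "allkeys-lfu", "allkeys-random",
    "volatile-lru", "volatile-lfu", "volatile-random", "volatile-ttl",
}


def parse_redis_size(text: str) -> int:
    """Convert a Redis size string such as '50mb' into bytes."""
    match = re.fullmatch(r"(\d+)([a-z]*)", text.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"not a valid Redis size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def redis_command(maxmemory: str = "50mb", policy: str = "allkeys-lru") -> list[str]:
    """Build the compose `command:` list for the paperless-redis container."""
    parse_redis_size(maxmemory)  # raises ValueError on a malformed cap
    if policy not in _POLICIES:
        raise ValueError(f"unknown eviction policy: {policy!r}")
    return ["redis-server", "--maxmemory", maxmemory, "--maxmemory-policy", policy]


print(redis_command())
print(parse_redis_size("50mb"))
```

Validating the values before they reach the template means a typo in a cap or policy name fails at generation time instead of when the container starts.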

3. Immich

Immich is heavy and strictly requires PostgreSQL. It cannot run on SQLite.

  • Database: PostgreSQL.
  • Sidecars: Redis, Machine Learning (ML) container, Server.
  • Optimizations:
    1. Tune PostgreSQL: Limit database caches and connections in the compose template:
      command: ["postgres", "-c", "shared_buffers=16MB", "-c", "max_connections=10", "-c", "work_mem=1MB"]
      This drops the Postgres idle overhead from 45MB to ~12MB RSS.
    2. Tune ML Container: Set MACHINE_LEARNING_WORKERS=1 and set strict CPU/RAM limits (e.g. mem_limit: 150m) to prevent RAM spikes during photo uploads.
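As a rough sanity check on the PostgreSQL flags, the sketch below builds the same command list and estimates an upper memory budget as shared_buffers plus max_connections times work_mem. The helper names are hypothetical, and the estimate is only a first approximation: work_mem applies per sort or hash operation, so a single complex query can use several multiples of it, and per-backend process overhead is not counted.

```python
def postgres_command(shared_buffers_mb: int = 16,
                     max_connections: int = 10,
                     work_mem_mb: int = 1) -> list[str]:
    """Build the compose `command:` list with the tuning flags from the RFC."""
    return [
        "postgres",
        "-c", f"shared_buffers={shared_buffers_mb}MB",
        "-c", f"max_connections={max_connections}",
        "-c", f"work_mem={work_mem_mb}MB",
    ]


def rough_budget_mb(shared_buffers_mb: int = 16,
                    max_connections: int = 10,
                    work_mem_mb: int = 1) -> int:
    """Estimate memory use if every connection runs one work_mem-sized operation.

    This ignores per-backend overhead and queries that use several
    work_mem allocations at once, so treat it as a ballpark figure.
    """
    return shared_buffers_mb + max_connections * work_mem_mb


print(postgres_command())
print(rough_budget_mb())  # 16 + 10 * 1 = 26
```

With the values proposed above, the budget under load is about 26MB, which leaves the idle figure quoted in the RFC as the floor rather than the ceiling.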

4. The Host-Level Mitigation: Active zswap to Prevent I/O Lockups

Even with tuned containers, running 5+ apps, each with its own mini-Postgres/Redis instance, adds up to ~100–150MB of idle memory.
To keep this memory from clogging physical RAM on XS/S instances without causing I/O lockups (thrashing), the host OS must be prepared accordingly:

  • Host Setup: Configure a 4–8 GB swapfile on the host VPS.
  • zswap Pool: Enable zswap (a compressed in-RAM cache in front of the swap device) with a fast compressor (e.g., lz4 or zstd) during OS provisioning.
  • Swappiness: Set sysctl vm.swappiness to a value between 80 and 100. The host kernel then swaps out the idle pages of PostgreSQL/Redis early, and zswap keeps them compressed in RAM (typically around a 3:1 ratio). When accessed, pages are decompressed in microseconds instead of being read back from slow disk, which avoids the I/O freezes common on cheap VPS providers like Netcup.
  • Docker safety margins: Every container must have a strict memswap_limit (the combined RAM plus swap budget). If a container exceeds that budget, the OOM-killer terminates it instead of letting the host freeze in %iowait.
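The host setup above could be expressed as a list of shell commands for the provisioning code to run over SSH. This is a sketch under assumptions: `host_provisioning_commands` is a hypothetical helper, the swapfile path /swapfile is an illustrative choice, and how ssh.py actually executes commands is not shown in this RFC. The zswap paths under /sys/module/zswap/parameters/ are the standard kernel module parameters.

```python
def host_provisioning_commands(swap_gib: int = 4,
                               compressor: str = "lz4",
                               swappiness: int = 80,
                               swapfile: str = "/swapfile") -> list[str]:
    """Return shell commands (to be run as root) for swapfile, zswap and swappiness."""
    if not 4 <= swap_gib <= 8:
        raise ValueError("the RFC proposes a 4-8 GB swapfile")
    if compressor not in ("lz4", "zstd"):
        raise ValueError("the RFC proposes lz4 or zstd")
    if not 0 <= swappiness <= 100:
        raise ValueError("swappiness is expected in the range 0-100 here")
    return [
        # Create and activate the swapfile. fallocate is not suitable on every
        # filesystem; dd is the usual fallback in that case.
        f"fallocate -l {swap_gib}G {swapfile}",
        f"chmod 600 {swapfile}",
        f"mkswap {swapfile}",
        f"swapon {swapfile}",
        # Select the compressor first, then switch zswap on. The chosen
        # compressor must be available in the running kernel.
        f"echo {compressor} > /sys/module/zswap/parameters/compressor",
        "echo 1 > /sys/module/zswap/parameters/enabled",
        # Make the kernel more willing to swap out idle anonymous pages.
        f"sysctl -w vm.swappiness={swappiness}",
    ]


for command in host_provisioning_commands():
    print(command)
```

These commands only change the running system. Surviving a reboot would additionally need an /etc/fstab entry for the swapfile, a file under /etc/sysctl.d/ for vm.swappiness, and zswap.enabled=1 on the kernel command line or an equivalent boot-time step.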

5. Action Items

  • [freeshard-controller] Add swapfile, zswap, and vm.swappiness configuration to the VM provisioning steps in ssh.py.
  • [freeshard] Implement a client-side fetch-polling script in splash.html to query app status asynchronously and prevent the white page reload flash.
  • [app-repository] Apply tuning parameters (shared buffers, Redis maxmemory, etc.) and SQLite default setups to Vaultwarden, Paperless-ngx, and Immich templates.
  • [freeshard] Update app_lifecycle.py to monitor %iowait from /proc/stat and dynamically adjust container states if I/O bottlenecks occur.
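The %iowait monitoring proposed for app_lifecycle.py could start from a sketch like the following. It relies on the documented layout of the aggregate "cpu" line in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice, all in clock ticks since boot) and computes the share of time spent in iowait between two samples. The names `CpuTimes`, `parse_cpu_line`, `iowait_percent` and `sample_iowait` are hypothetical, and any threshold for reacting to the value is left open.

```python
import time
from typing import NamedTuple


class CpuTimes(NamedTuple):
    total: int   # sum of all accounted CPU ticks
    iowait: int  # ticks spent waiting for I/O


def parse_cpu_line(stat_text: str) -> CpuTimes:
    """Extract total and iowait ticks from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # guest and guest_nice are already included in user and nice,
            # so only the first eight columns are summed.
            return CpuTimes(total=sum(values[:8]), iowait=values[4])
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(prev: CpuTimes, cur: CpuTimes) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    delta_total = cur.total - prev.total
    if delta_total <= 0:
        return 0.0
    return 100.0 * (cur.iowait - prev.iowait) / delta_total


def sample_iowait(interval_s: float = 1.0, path: str = "/proc/stat") -> float:
    """Read /proc/stat twice, interval_s apart, and return %iowait (Linux only)."""
    with open(path) as f:
        prev = parse_cpu_line(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        cur = parse_cpu_line(f.read())
    return iowait_percent(prev, cur)
```

Because the counters are cumulative since boot, a single read is not meaningful on its own; only the difference between two samples reflects the current I/O pressure.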
