
[Bug]: Docker v0.9.0 read-only compose breaks boot — ~/.crawl4ai not mounted & writable tmpfs are root-owned, so the worker crash-loops and nothing binds :11235 #2027

Description

@SetagGnaw

crawl4ai version

0.9.0 — Docker server image unclecode/crawl4ai:latest (the secure-by-default Compose stack).

Expected Behavior

docker compose up (with CRAWL4AI_API_TOKEN set so gunicorn binds a non-loopback interface) should start the FastAPI/gunicorn server and bind :11235. curl http://localhost:11235/health should return {"status":"ok"} and the crawl endpoints should work.

Current Behavior

The gunicorn worker crash-loops on boot and nothing ever binds :11235; curl http://localhost:11235/health returns curl: (56) Recv failure: Connection reset by peer (the host port-proxy accepts the TCP connection, but there is no upstream listener).

Root causes (all verified on unclecode/crawl4ai:latest, Docker Desktop, container running as appuser, uid/gid 999):

  1. ~/.crawl4ai is never mounted — the primary failure. crawl4ai creates its home dir /home/appuser/.crawl4ai (DB, logs, seeder index) via os.makedirs(...) at import time (crawl4ai/async_database.py). No tmpfs covers it and read_only: true makes the rootfs read-only, so import raises OSError: [Errno 30] Read-only file system: '/home/appuser/.crawl4ai'. The worker fails to boot, respawns, and supervisord gives up (gunicorn entered FATAL state, too many start retries).

  2. The "writable" tmpfs mounts come up root:root and are unwritable by appuser. Observed inside the running container:

    /tmp                        1777 root:root  WRITABLE
    /home/appuser/.cache         755 root:root  denied
    /var/lib/crawl4ai/outputs    700 root:root  denied   (mode=0700 in compose)
    /var/lib/redis               750 root:root  denied
    

    A bare tmpfs mounted over a directory that already exists in the image takes the mountpoint's mode (.cache 755, redis 750) with root:root ownership, so only /tmp (underlying mode 1777) is writable. Both gunicorn and redis run as appuser (supervisord.conf: user=appuser), so redis can't persist to /var/lib/redis and the server can't write artifacts to /var/lib/crawl4ai/outputs. (Note: this corrects an earlier claim that a bare tmpfs is always 1777/writable. That holds for a non-existent mountpoint like --tmpfs /foo, but not for these pre-existing image dirs.)

  3. The ~/.cache tmpfs shadows the baked-in Chromium. The Dockerfile bakes Playwright's Chromium into /home/appuser/.cache/ms-playwright and PLAYWRIGHT_BROWSERS_PATH is unset. A tmpfs over the whole ~/.cache hides it, so even after (1) and (2) are fixed, crawling fails because Playwright can't find the browser. Verified: with --tmpfs /home/appuser/.cache, ls ~/.cache/ms-playwright  ->  No such file or directory.

  4. gunicorn ≥26 control socket (non-fatal, but noisy). gunicorn opens a control socket under $HOME/.gunicorn by default; $HOME (/home/appuser) is on the read-only rootfs with no tmpfs, so every boot logs [ERROR] Control server error: [Errno 30] Read-only file system: '/home/appuser/.gunicorn'.

Net effect: import-time / worker failure → no listener on :11235 → connection reset from the host.
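
The root causes above all come down to which directories appuser can actually write. A minimal diagnostic sketch, assuming python3 is run inside the container as appuser (for example through docker exec); `probe` and `PATHS` are illustrative names, and the paths are the ones named in this report:

```python
import os
import stat

# Paths named in this report; adjust if your image or compose file differs.
PATHS = [
    "/tmp",
    "/home/appuser/.cache",
    "/home/appuser/.crawl4ai",
    "/home/appuser/.gunicorn",
    "/var/lib/crawl4ai/outputs",
    "/var/lib/redis",
]


def probe(path):
    """Return (mode, owner, verdict) for path without modifying anything."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return ("-", "-", "missing")
    mode = format(stat.S_IMODE(st.st_mode), "o")
    owner = f"{st.st_uid}:{st.st_gid}"
    # os.access checks the calling user's permissions and also fails
    # when the path sits on a read-only filesystem.
    writable = os.access(path, os.W_OK | os.X_OK)
    return (mode, owner, "WRITABLE" if writable else "denied")


if __name__ == "__main__":
    for p in PATHS:
        mode, owner, verdict = probe(p)
        print(f"{p:<28} {mode:>4} {owner:<9} {verdict}")
```

Running it as root would report almost everything as writable, so the result is only meaningful when executed as the runtime user.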

Is this reproducible?

Yes

Inputs Causing the Bug

# Image:  unclecode/crawl4ai:latest  (v0.9.0)
# Config: the secure-by-default docker-compose.yml shipped on the 0.9.0 branch
# Relevant settings:
read_only: true
user: "appuser"
tmpfs:
  - /tmp
  - /var/lib/redis
  - /var/lib/crawl4ai/outputs:mode=0700
  - /home/appuser/.cache

Steps to Reproduce

1. Use the v0.9.0 docker-compose.yml (read_only: true + user: appuser + the tmpfs list above).
2. Set CRAWL4AI_API_TOKEN (so gunicorn binds [::]:11235 instead of loopback) and run:
     docker compose up -d
3. docker logs <container>  ->  OSError: [Errno 30] Read-only file system: '/home/appuser/.crawl4ai';
   the worker exits and respawns until supervisord reports "gunicorn entered FATAL state".
4. curl http://localhost:11235/health  ->  curl: (56) Recv failure: Connection reset by peer
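
Step 4 can also be scripted, so that a connection reset is told apart from a slow start. A rough sketch using only the standard library; `wait_for_health` is a hypothetical helper, the retry count and delay are arbitrary, and bypassing any configured HTTP proxy is an assumption that suits a localhost check:

```python
import http.client
import json
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:11235/health", attempts=10, delay=2.0):
    """Poll url until it answers 200 with JSON; return the body, or None."""
    # An empty ProxyHandler disables proxy auto-detection for this opener.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for _ in range(attempts):
        try:
            with opener.open(url, timeout=5) as resp:
                if resp.status == 200:
                    return json.loads(resp.read().decode("utf-8"))
        except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
            # Connection refused or reset, a non-2xx status, or a non-JSON body.
            pass
        time.sleep(delay)
    return None


if __name__ == "__main__":
    print(wait_for_health() or "no healthy listener on :11235")
```

With the unmodified v0.9.0 compose file, this should end in the "no healthy listener" message, matching the curl result above.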

Code snippets

# Proposed fix: give the non-root runtime user (uid/gid 999 = appuser) ownership of
# the writable tmpfs, add the missing ~/.crawl4ai mount, and stop shadowing the
# baked Chromium by scoping the cache tmpfs to just the writable subdir.
read_only: true
tmpfs:
  - /tmp
  - /var/lib/redis:uid=999,gid=999,mode=0700
  - /var/lib/crawl4ai/outputs:uid=999,gid=999,mode=0700
  - /home/appuser/.crawl4ai:uid=999,gid=999,mode=0700
  - /home/appuser/.cache/url_seeder:uid=999,gid=999,mode=0700   # NOT all of ~/.cache (that shadows ms-playwright)
  - /home/appuser/.gunicorn:uid=999,gid=999,mode=0700           # or pass gunicorn --no-control-socket

Verified with this change: the container boots healthy, /health returns 200, a live crawl of https://example.com returns success: true with rendered markdown, and no read-only / permission errors remain in the logs.
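
The uid/gid 999 in the fix is specific to this image. For a rebuilt image, the entries could be regenerated rather than hand-edited. A small sketch, where `tmpfs_entries` is a hypothetical helper and the uid/gid would need confirming first (for example with id appuser inside the image):

```python
# Directories the runtime user must own, as listed in the proposed fix.
OWNED_PATHS = [
    "/var/lib/redis",
    "/var/lib/crawl4ai/outputs",
    "/home/appuser/.crawl4ai",
    "/home/appuser/.cache/url_seeder",
    "/home/appuser/.gunicorn",
]


def tmpfs_entries(paths, uid, gid, mode="0700"):
    """Render compose tmpfs entries in the path:options form used above."""
    return [f"{path}:uid={uid},gid={gid},mode={mode}" for path in paths]


if __name__ == "__main__":
    print("tmpfs:")
    print("  - /tmp")
    for entry in tmpfs_entries(OWNED_PATHS, uid=999, gid=999):
        print(f"  - {entry}")
```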

Supporting Information

OS

macOS (Docker Desktop); the container itself is Linux (python:3.12-slim-bookworm).

Python version

3.12 (inside the unclecode/crawl4ai:latest image)

Browser

Chromium (Playwright, baked into the image)

Browser version

Playwright Chromium build chromium-1223

Error logs & Screenshots (if applicable)

OSError: [Errno 30] Read-only file system: '/home/appuser/.crawl4ai'
  File ".../crawl4ai/async_database.py", line 20, in <module>
    os.makedirs(DB_PATH, exist_ok=True)
[ERROR] Worker (pid:68) exited with code 3.
[ERROR] Shutting down: Master
[ERROR] Reason: Worker failed to boot.
WARN exited: gunicorn (exit status 3; not expected)
INFO gave up: gunicorn entered FATAL state, too many start retries too quickly

# non-fatal, every boot:
[ERROR] Control server error: [Errno 30] Read-only file system: '/home/appuser/.gunicorn'
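
To triage logs like these, the Errno 30 lines can be grouped by path and split into fatal and non-fatal hits. A minimal sketch, assuming the log text was captured from docker logs; `read_only_paths` is a hypothetical helper, and treating only "Control server error" lines as non-fatal follows the observation in this report rather than any gunicorn guarantee:

```python
import re

ERRNO30 = re.compile(r"\[Errno 30\] Read-only file system: '([^']+)'")


def read_only_paths(log_text):
    """Map each path hit by Errno 30 to True if any hit was fatal."""
    result = {}
    for line in log_text.splitlines():
        match = ERRNO30.search(line)
        if not match:
            continue
        path = match.group(1)
        # Assumption from this report: control-socket errors do not stop the boot.
        fatal = "Control server error" not in line
        result[path] = result.get(path, False) or fatal
    return result


if __name__ == "__main__":
    import sys

    for path, fatal in read_only_paths(sys.stdin.read()).items():
        print(f"{'FATAL    ' if fatal else 'non-fatal'} {path}")
```

Every path it prints is a candidate for a tmpfs entry owned by the runtime user.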

Metadata

Assignees

No one assigned

    Labels

    ⚙ Done: Bug fix, enhancement, FR that's completed pending release · 🐞 Bug: Something isn't working · 🐳 Docker: Docker issue

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
