crawl4ai version
0.9.0 — Docker server image unclecode/crawl4ai:latest (the secure-by-default Compose stack).
Expected Behavior
docker compose up (with CRAWL4AI_API_TOKEN set so gunicorn binds a non-loopback interface) should start the FastAPI/gunicorn server and bind :11235. curl http://localhost:11235/health should return {"status":"ok"} and the crawl endpoints should work.
Current Behavior
The gunicorn worker crash-loops on boot and nothing ever binds :11235; curl http://localhost:11235/health returns curl: (56) Recv failure: Connection reset by peer (the host port-proxy accepts the TCP connection, but there is no upstream listener).
Root causes (all verified on unclecode/crawl4ai:latest, Docker Desktop, container running as appuser, uid/gid 999):
1. ~/.crawl4ai is never mounted (the primary failure). crawl4ai creates its home dir /home/appuser/.crawl4ai (DB, logs, seeder index) via os.makedirs(...) at import time (crawl4ai/async_database.py). No tmpfs covers it, and read_only: true makes the rootfs read-only, so the import raises OSError: [Errno 30] Read-only file system: '/home/appuser/.crawl4ai'. The worker fails to boot and respawns until supervisord gives up (gunicorn entered FATAL state, too many start retries).
2. The "writable" tmpfs mounts come up root:root and are unwritable by appuser. Observed inside the running container:
   path                        mode  owner      write as appuser
   /tmp                        1777  root:root  WRITABLE
   /home/appuser/.cache        755   root:root  denied
   /var/lib/crawl4ai/outputs   700   root:root  denied (mode=0700 in compose)
   /var/lib/redis              750   root:root  denied
A bare tmpfs mounted over a directory that already exists in the image takes the mountpoint's mode (.cache 755, redis 750) with root:root ownership — only /tmp (underlying mode 1777) is writable. Since both gunicorn and redis run as appuser (supervisord.conf → user=appuser), redis can't persist to /var/lib/redis and the server can't write artifacts to /var/lib/crawl4ai/outputs. (Note: this corrects an earlier claim that bare tmpfs are always 1777/writable — that holds for a non-existent mountpoint like --tmpfs /foo, but not for these pre-existing image dirs.)
3. The ~/.cache tmpfs shadows the baked-in Chromium. The Dockerfile bakes Playwright's Chromium into /home/appuser/.cache/ms-playwright, and PLAYWRIGHT_BROWSERS_PATH is unset. A tmpfs over the whole ~/.cache hides it, so even after (1) and (2) are fixed, crawling fails because Playwright can't find the browser. Verified: with --tmpfs /home/appuser/.cache, ls ~/.cache/ms-playwright → No such file or directory.
4. gunicorn ≥26 control socket (non-fatal, but noisy). gunicorn opens a control socket under $HOME/.gunicorn by default. $HOME (/home/appuser) is on the read-only rootfs with no tmpfs, so every boot logs [ERROR] Control server error: [Errno 30] Read-only file system: '/home/appuser/.gunicorn'.
Net effect: import-time / worker failure → no listener on :11235 → connection reset from the host.
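A preflight script like the one below can confirm root causes 1 to 4 from inside the container. It is a minimal sketch and is not part of crawl4ai. The path list is copied from this report. The browser-path fallback assumes Playwright's documented Linux default of ~/.cache/ms-playwright when PLAYWRIGHT_BROWSERS_PATH is unset. Running it as appuser (for example through docker compose exec) is an assumption about how it would be invoked.

```python
# Preflight sketch (not part of crawl4ai): run inside the container as appuser.
import os
import stat
from pathlib import Path

# Paths this report says must be writable at runtime under read_only: true.
REQUIRED_WRITABLE = (
    "/tmp",
    "/var/lib/redis",
    "/var/lib/crawl4ai/outputs",
    "/home/appuser/.crawl4ai",
    "/home/appuser/.gunicorn",
)


def check_writable(path):
    """One-line status for `path` as seen by the current process."""
    p = Path(path)
    if not p.exists():
        # Nothing is mounted here; creating it would hit the read-only rootfs.
        return f"{path}: missing"
    st = p.stat()
    # access(2) uses the real uid/gid and also fails W_OK on a read-only filesystem.
    ok = os.access(p, os.W_OK | os.X_OK)
    return (f"{path}: {stat.filemode(st.st_mode)} uid={st.st_uid} "
            f"gid={st.st_gid} {'WRITABLE' if ok else 'denied'}")


def browsers_path(env=None, home=None):
    """Where Playwright should find browsers: PLAYWRIGHT_BROWSERS_PATH if set,
    else ~/.cache/ms-playwright (assumed Linux default). The special value "0"
    (browsers installed inside the package) is not handled in this sketch."""
    env = os.environ if env is None else env
    override = env.get("PLAYWRIGHT_BROWSERS_PATH")
    if override:
        return Path(override)
    return Path(home if home is not None else Path.home()) / ".cache" / "ms-playwright"


if __name__ == "__main__":
    for path in REQUIRED_WRITABLE:
        print(check_writable(path))
    bp = browsers_path()
    print(f"{bp}: {'present' if bp.is_dir() else 'MISSING (shadowed by a tmpfs?)'}")
```

On the broken stack, ~/.crawl4ai and ~/.gunicorn should report missing, redis and outputs should report denied, and the browser directory should report MISSING when ~/.cache is covered by a tmpfs.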
Is this reproducible?
Yes
Inputs Causing the Bug
# Image: unclecode/crawl4ai:latest (v0.9.0)
# Config: the secure-by-default docker-compose.yml shipped on the 0.9.0 branch
# Relevant settings:
read_only: true
user: "appuser"
tmpfs:
- /tmp
- /var/lib/redis
- /var/lib/crawl4ai/outputs:mode=0700
- /home/appuser/.cache
Steps to Reproduce
1. Use the v0.9.0 docker-compose.yml (read_only: true + user: appuser + the tmpfs list above).
2. Set CRAWL4AI_API_TOKEN (so gunicorn binds [::]:11235 instead of loopback) and run:
docker compose up -d
3. docker logs <container> -> OSError: Read-only file system: '/home/appuser/.crawl4ai';
the worker exits and respawns; "gunicorn entered FATAL state".
4. curl http://localhost:11235/health -> curl: (56) Recv failure: Connection reset by peer
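Step 4 can also be scripted, for example when checking whether a candidate compose file fixes the boot loop. The sketch below is a standard-library probe that distinguishes "nothing listening behind the port proxy" from a server that answers /health. The URL comes from this report. The function names and the retry policy (attempts, delay) are illustrative assumptions.

```python
# Health-probe sketch: separates "no upstream listener" from a healthy server.
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:11235/health"


def probe_health(url=HEALTH_URL, timeout=2.0):
    """Return "ok", "unhealthy: ...", or "unreachable: ..." for one attempt."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except ConnectionResetError as exc:
        # Docker Desktop's port proxy accepts the TCP connection, then resets it
        # when there is no upstream listener (curl's "Recv failure").
        return f"unreachable: {type(exc).__name__}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return f"unhealthy: HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Connection refused, DNS failure, etc. while connecting.
        return f"unreachable: {exc.reason}"
    except (ValueError, OSError) as exc:
        # Non-JSON body, read timeout, and similar.
        return f"unhealthy: {exc}"
    if isinstance(body, dict) and body.get("status") == "ok":
        return "ok"
    return f"unhealthy: {body!r}"


def wait_for_health(url=HEALTH_URL, attempts=30, delay=1.0):
    """Poll until the probe reports "ok" or attempts run out; return the last status."""
    status = "unreachable: not probed"
    for attempt in range(attempts):
        status = probe_health(url)
        if status == "ok":
            break
        if attempt + 1 < attempts:
            time.sleep(delay)
    return status
```

Against the v0.9.0 stack, wait_for_health() should end with an "unreachable: ..." status. With the tmpfs changes below it should return "ok".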
Code snippets
# Proposed fix: give the non-root runtime user (uid/gid 999 = appuser) ownership of
# the writable tmpfs, add the missing ~/.crawl4ai mount, and stop shadowing the
# baked Chromium by scoping the cache tmpfs to just the writable subdir.
read_only: true
tmpfs:
- /tmp
- /var/lib/redis:uid=999,gid=999,mode=0700
- /var/lib/crawl4ai/outputs:uid=999,gid=999,mode=0700
- /home/appuser/.crawl4ai:uid=999,gid=999,mode=0700
- /home/appuser/.cache/url_seeder:uid=999,gid=999,mode=0700 # NOT all of ~/.cache (that shadows ms-playwright)
- /home/appuser/.gunicorn:uid=999,gid=999,mode=0700 # or pass gunicorn --no-control-socket
Verified with this change: the container boots healthy, /health returns 200, a live crawl of https://example.com returns success: true with rendered markdown, and no read-only / permission errors remain in the logs.
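The proposed fix rests on three rules:

- every runtime-writable tmpfs other than /tmp is owned by uid/gid 999;
- ~/.crawl4ai gets its own tmpfs;
- no tmpfs covers all of ~/.cache.

These rules can be expressed as a small check over the tmpfs list. The helper below is hypothetical and is not something crawl4ai or Docker Compose ships. It only parses the short "path:opt=val,..." strings used above, hard-codes uid/gid 999 and the paths from this report, and does not check mode.

```python
# Hypothetical lint for the Compose tmpfs list; not part of crawl4ai or Docker.
HOME = "/home/appuser"
RUNTIME_ID = "999"  # appuser uid/gid as observed in this report

# Mounts that must exist for the server to boot and write its state.
REQUIRED = {f"{HOME}/.crawl4ai", "/var/lib/redis", "/var/lib/crawl4ai/outputs"}
# A tmpfs at either of these hides the baked-in ~/.cache/ms-playwright.
SHADOWING = {HOME, f"{HOME}/.cache"}


def parse_entry(entry):
    """Split a short-syntax entry like "/path:uid=999,mode=0700" into (path, options)."""
    path, _, opts = entry.partition(":")
    options = {}
    for item in filter(None, opts.split(",")):
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return path.strip(), options


def lint_tmpfs(entries):
    """Return human-readable problems; an empty list means the tmpfs list passes."""
    problems, mounted = [], set()
    for entry in entries:
        path, opts = parse_entry(entry)
        mounted.add(path)
        if path in SHADOWING:
            problems.append(f"{path}: shadows {HOME}/.cache/ms-playwright")
        # /tmp is exempt: it comes up 1777 and is already writable by appuser.
        if path != "/tmp" and not (opts.get("uid") == opts.get("gid") == RUNTIME_ID):
            problems.append(f"{path}: not owned by uid/gid {RUNTIME_ID}")
    for path in sorted(REQUIRED - mounted):
        problems.append(f"{path}: no tmpfs, so it stays on the read-only rootfs")
    return problems
```

Fed the v0.9.0 list from "Inputs Causing the Bug", the check reports five problems:

- the missing ~/.crawl4ai mount;
- the root-owned redis mount;
- the root-owned outputs mount;
- ~/.cache shadowing ms-playwright;
- ~/.cache being root-owned.

The list under "Code snippets" passes with no problems.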
Supporting Information
OS
macOS (Docker Desktop); the container itself is Linux (python:3.12-slim-bookworm).
Python version
3.12 (inside the unclecode/crawl4ai:latest image)
Browser
Chromium (Playwright, baked into the image)
Browser version
Playwright Chromium build chromium-1223
Error logs & Screenshots (if applicable)
OSError: [Errno 30] Read-only file system: '/home/appuser/.crawl4ai'
File ".../crawl4ai/async_database.py", line 20, in <module>
os.makedirs(DB_PATH, exist_ok=True)
[ERROR] Worker (pid:68) exited with code 3.
[ERROR] Shutting down: Master
[ERROR] Reason: Worker failed to boot.
WARN exited: gunicorn (exit status 3; not expected)
INFO gave up: gunicorn entered FATAL state, too many start retries too quickly
# non-fatal, every boot:
[ERROR] Control server error: [Errno 30] Read-only file system: '/home/appuser/.gunicorn'