[BUG] Docker Compose: SIGILL (exit 132) on startup #38

@bmurrtech

Description

Through some debugging, I managed to fix the source code and get things running locally via Docker. Here's the bug report I had AI generate, with the fix that worked for me:

Docker Compose: SIGILL (exit 132) on startup and misleading FastAPI auth warning

Area: Docker / docker-compose / server.py / video_generation_agent
Severity: High — container could crash immediately (SIGILL); Medium — confusing auth warning when env vars were set


Problem

Running OpenSwarm with docker-compose up --build could fail in two independent ways:

  1. Crash: The process exited with code 132 (SIGILL — illegal instruction), often after logs made it look like configuration was fine (including model API keys loaded from .env).
  2. Misleading auth: Users who set API_TOKEN still saw
    WARNING:agency_swarm.integrations.fastapi:App token is not set. Authentication will be disabled.
    because Agency Swarm’s FastAPI integration only reads APP_TOKEN by default (app_token_env="APP_TOKEN").

Together, the auth warning could be mistaken for the cause of the subsequent crash; in practice the SIGILL was tied to native libraries loaded during Video Agent / tool import, not to the bearer token.


Root cause

1. SIGILL (exit 132)

Agency Swarm builds a preview Agency during FastAPI startup (run_fastapi → create_agency()). That constructs every agent, including the video generation agent, which registers all modules under video_generation_agent/tools/.

Several tools imported cv2 (OpenCV) and google.genai / google.genai.types at module import time. Those stacks ship native code (SIMD, gRPC) that can raise SIGILL under:

  • Docker Desktop QEMU user emulation (e.g. linux/amd64 image on Apple Silicon), or
  • CPUs without the features the wheels assume.

So the interpreter died during tool discovery, not necessarily on the first line of server.py.
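One way to confirm which import is responsible is to import each suspect module in a throwaway child interpreter and inspect how the child exits. The sketch below is a diagnostic aid, not part of the fix described in this report; probe_import is a hypothetical helper name, and the classification relies on the POSIX behavior where subprocess reports a signal-killed child as a negative return code.

```python
import signal
import subprocess
import sys


def probe_import(module_name: str) -> str:
    """Import module_name in a child interpreter and classify the outcome.

    Hypothetical diagnostic helper: the parent process survives even if the
    child is killed by SIGILL while loading native code.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
    )
    code = result.returncode
    if code == 0:
        return "ok"
    if code < 0:
        # On POSIX, a negative return code means "killed by signal -code".
        return signal.Signals(-code).name
    # A positive code is an ordinary failure, e.g. an uncaught ImportError.
    return f"failed (exit {code})"
```

Run inside the container against names such as cv2 or google.genai, an affected host would be expected to report SIGILL for the offending module while the others report ok.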

2. “App token is not set” with API_TOKEN only

run_fastapi uses os.getenv("APP_TOKEN"). API_TOKEN was never read by the framework, so a valid HTTP secret stored under the wrong name left auth disabled and triggered the warning.

Security note: Do not map OPENAI_API_KEY → APP_TOKEN; model keys must stay separate from HTTP bearer secrets. The implemented behavior is API_TOKEN → APP_TOKEN only when APP_TOKEN is unset.
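The alias rule above can be sketched as a small function over an environment mapping. This is a minimal illustration of the described behavior, not the actual server.py code; alias_app_token is a hypothetical name, and treating a whitespace-only value as unset is taken from the "missing/whitespace" wording later in this report.

```python
from typing import MutableMapping


def alias_app_token(env: MutableMapping[str, str]) -> None:
    """Copy API_TOKEN into APP_TOKEN only when APP_TOKEN is effectively unset.

    Hypothetical helper. It never reads model keys such as OPENAI_API_KEY,
    so model credentials stay separate from the HTTP bearer secret.
    """
    app_token = env.get("APP_TOKEN", "").strip()
    api_token = env.get("API_TOKEN", "").strip()
    if not app_token and api_token:
        env["APP_TOKEN"] = api_token
```

In a server entry point this would presumably be called as alias_app_token(os.environ) after load_dotenv() and before run_fastapi reads the environment.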

3. Contributing factor: BLAS / base image (mitigation)

NumPy / SciPy / related stacks can also contribute to SIGILL in bad CPU/QEMU combinations. Mitigations applied: Python 3.12 slim base and OpenBLAS / thread caps in the Dockerfile.
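For runs outside Docker (for example python server.py directly), the same caps could be applied from Python before any BLAS-backed library is imported. This is an assumption-laden sketch, not what the fix does: the report sets these variables in the Dockerfile, and apply_blas_caps is a hypothetical helper. The variable names and values are the ones listed in this report.

```python
from typing import MutableMapping

# Values mirror the Dockerfile settings described in this report.
BLAS_CAPS = {
    "OPENBLAS_CORETYPE": "Haswell",
    "OPENBLAS_NUM_THREADS": "1",
    "OMP_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
}


def apply_blas_caps(env: MutableMapping[str, str]) -> None:
    """Set each cap only if the variable is not already defined.

    Hypothetical helper. It has to run before NumPy or similar libraries
    are imported, because the native libraries read these at load time.
    """
    for name, value in BLAS_CAPS.items():
        env.setdefault(name, value)
```

Using setdefault keeps any value the operator has already exported, so the caps act as defaults rather than overrides.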


Steps to reproduce (pre-fix behavior)

On a revision before the fixes below:

  1. Clone the repo and cd into the project root.
  2. cp .env.example .env — set OPENAI_API_KEY (and optionally other model keys). Either omit APP_TOKEN or set only API_TOKEN.
  3. Run docker-compose up --build.
  4. Observe: FastAPI logs App token is not set (if no effective APP_TOKEN); then SIGILL / exit 132 during agency construction, often when the Video Agent’s tools are imported.

Post-fix: The same flow should reach Uvicorn listening on 0.0.0.0:8080 without SIGILL during import.


Expected vs actual

| | Expected | Actual (before fix) |
| --- | --- | --- |
| Startup | Server listens; tools load without pulling heavy natives until used | SIGILL during import-time native loads |
| HTTP auth | A single documented env name (or alias) enables Bearer auth | API_TOKEN ignored → warning + disabled auth |
| Logs | Failures point at the real fault | Warning suggested “token” failure ahead of native crash |

Resolution (for maintainers)

What was implemented:

| Change | Purpose |
| --- | --- |
| Lazy imports in video tools and video_utils | Defer cv2, google.genai, moviepy / heavy paths until run() or the specific helper, which avoids SIGILL at agency construction / tool registration. |
| server.py | If APP_TOKEN is empty and API_TOKEN is set, set APP_TOKEN from API_TOKEN so FastAPI auth can enable without renaming env vars. |
| Dockerfile | python:3.12-slim-bookworm; OPENBLAS_CORETYPE=Haswell, OPENBLAS_NUM_THREADS=1, OMP_NUM_THREADS=1, MKL_NUM_THREADS=1 to reduce BLAS-related SIGILL risk in containers. |
| .env.example, README.md | Document APP_TOKEN, API_TOKEN, and Docker env_file behavior. |

Checklist:

  • Lazy cv2 where used by video tools / video_utils.
  • Lazy google.genai / google.genai.types (and client creation) until needed.
  • Lazy moviepy / numpy in AddSubtitles.run().
  • API_TOKEN → APP_TOKEN when APP_TOKEN unset (server.py).
  • Docs: .env.example, README.md (Docker + tokens).
  • Dockerfile: Python 3.12 + OpenBLAS/thread env.
  • Remove temporary debug NDJSON / compose bind mounts used during investigation (not part of the product fix).
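The lazy-import change listed above follows a simple pattern: move the heavy import from module level into the function that needs it, so tool registration never loads the native library. The sketch below illustrates the pattern only; extract_first_frame_sketch is a hypothetical stand-in and not the project's actual _extract_first_frame.

```python
from typing import Any, Optional


def extract_first_frame_sketch(video_path: str) -> Optional[Any]:
    """Return the first frame of a video, or None if it cannot be read.

    Hypothetical example of the lazy-import pattern: cv2 is imported here,
    on first use, instead of at the top of the module.
    """
    import cv2  # deferred: native code loads only when this function runs

    capture = cv2.VideoCapture(video_path)
    try:
        ok, frame = capture.read()
    finally:
        capture.release()
    return frame if ok else None
```

Importing a module that contains this function does not touch OpenCV, so a SIGILL from cv2 could then only occur when a video tool actually runs, not during agency construction.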

Files affected

| File | Remedy |
| --- | --- |
| Dockerfile | Base python:3.12-slim-bookworm; env vars to cap OpenBLAS/OpenMP/MKL threads and OPENBLAS_CORETYPE=Haswell for safer SIMD profile in containers. |
| server.py | After load_dotenv(), copy API_TOKEN → APP_TOKEN when APP_TOKEN is missing/whitespace. |
| .env.example | Document APP_TOKEN, API_TOKEN, alias behavior for Docker / python server.py. |
| README.md | Docker section: env_file; clarify APP_TOKEN vs API_TOKEN. |
| docker-compose.yml | No functional change required for this bug; ensure no debug-only bind mounts remain in production-oriented compose. |
| video_generation_agent/tools/utils/video_utils.py | Remove top-level import cv2; import inside functions. Remove top-level from google import genai; import inside get_gemini_client(). |
| video_generation_agent/tools/TrimVideo.py | Lazy import cv2 inside _trim_video_blocking / _extract_first_frame. |
| video_generation_agent/tools/EditAudio.py | Lazy import cv2 inside _mix_audio_video_blocking / _extract_first_frame. |
| video_generation_agent/tools/EditVideoContent.py | Remove top-level from google.genai import types; import inside _run_extend. Lazy import cv2 in _extract_first_frame. |
| video_generation_agent/tools/GenerateVideo.py | Remove top-level from google.genai.types import ...; import inside _generate_with_veo. |
| video_generation_agent/tools/AddSubtitles.py | Move moviepy / numpy imports into run(). |
| video_generation_agent/tools/CombineImages.py | Lazy from google import genai at start of run(). |
| video_generation_agent/tools/EditImage.py | Lazy from google import genai at start of run(). |
| video_generation_agent/tools/GenerateImage.py | Lazy from google import genai at start of run(). |

Verification (post-fix)

  1. From repo root: docker-compose up --build with .env containing at least one provider key.
  2. Confirm logs: agency initialization completes, endpoints created, Uvicorn running on http://0.0.0.0:8080.
  3. Open http://localhost:8080/docs (or agency routes under /open-swarm/).
  4. For HTTP auth: set APP_TOKEN or API_TOKEN; confirm the “App token is not set” warning does not appear when a non-empty secret is provided (via either name).

Follow-ups (optional backlog)

  • CORS: With APP_TOKEN set and permissive CORS defaults, Agency Swarm may log CORS/credentials warnings; consider documenting or wrapping cors_origins for stricter browser clients.
  • Platform: If SIGILL reappears on unusual hosts, document platform: linux/arm64 vs linux/amd64 tradeoffs for Apple Silicon vs Linux CI.

References

  • Agency Swarm FastAPI: run_fastapi(..., app_token_env="APP_TOKEN") — bearer token from environment at runtime.
  • Exit code 132: Unix convention of 128 + signal number; 128 + 4 → SIGILL.
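The exit-code arithmetic above can be checked with Python's standard signal module. This is a small illustrative helper (describe_exit_code is a hypothetical name) that assumes the usual shell convention of reporting 128 plus the signal number.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit code into a signal name where possible."""
    if code > 128:
        number = code - 128
        try:
            return signal.Signals(number).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"unknown signal {number}"
    return f"exit status {code}"
```

For the container in this report, describe_exit_code(132) returns "SIGILL".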

Labels: bug
