Skip to content

bug(db): asyncpg connections broken by task cancellation hang the next request (no pool_pre_ping) #77

@pushiwuhua7

Description

@pushiwuhua7

Summary

backend/src/analytics_agent/db/base.py creates the async engine with only echo and SQLite's check_same_thread — no pool parameters at all. For MySQL / PostgreSQL
(asyncpg), this causes two problems:

  1. Critical bug: pool_pre_ping defaults to False. Under concurrent load, when a request is cancelled (client disconnect, upstream timeout, asyncio.CancelledError,
    etc.), asyncpg leaves that connection in a broken state and returns it to the pool. The next checkout that grabs it hangs forever — no exception, no log line, just a
    silent stall. Symptom: after a burst of load the backend stops responding to any DB-backed request, but the process is still alive and the logs look clean.
  2. Secondary problem: pool_size / max_overflow / pool_recycle / pool_timeout all fall back to SQLAlchemy defaults (5 / 10 / -1 / 30). That's too small for
    production deployments and there's no way to tune it without code changes.

Reproduction

Requires Postgres + asyncpg:

  1. Point DATABASE_URL at Postgres (postgresql+asyncpg://...).
  2. Start the backend and hit any DB-backed endpoint with concurrency ≥ pool_size. Cancel a portion of those requests mid-flight (curl Ctrl-C, or just close the browser
    tab).
  3. After a few rounds, subsequent requests block forever on session checkout. Nothing short of restarting the process recovers.

SQLite is unaffected, so this is invisible in the default local-dev configuration.

Proposed fix

Expose the standard SQLAlchemy pool knobs as environment variables with sane defaults, applied only to non-SQLite engines. pool_pre_ping=True is the key bug fix.

Env Default
DB_POOL_SIZE 10
DB_MAX_OVERFLOW 20
DB_POOL_RECYCLE 1800
DB_POOL_PRE_PING true
DB_POOL_TIMEOUT 10

Touches three files: .env.example, backend/src/analytics_agent/config.py, backend/src/analytics_agent/db/base.py — +33 / -5 lines total.

Reference

I implemented this fix on a fork and have been running it for a while without issues:

Happy to clean this up into a PR if maintainers are open to the approach.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions