bug(db): asyncpg connections broken by task cancellation hang the next request (no pool_pre_ping)

  ## Summary

  `backend/src/analytics_agent/db/base.py` creates the async engine with only `echo` and SQLite's `check_same_thread` — no pool parameters at all. For MySQL / PostgreSQL
  (asyncpg), this causes two problems:

  1. **Critical bug**: `pool_pre_ping` defaults to `False`. Under concurrent load, when a request is cancelled (client disconnect, upstream timeout, `asyncio.CancelledError`,
  etc.), asyncpg leaves that connection in a broken state and returns it to the pool. The next checkout that grabs it **hangs forever** — no exception, no log line, just a
  silent stall. Symptom: after a burst of load the backend stops responding to any DB-backed request, but the process is still alive and the logs look clean.
  2. **Secondary problem**: `pool_size` / `max_overflow` / `pool_recycle` / `pool_timeout` all fall back to SQLAlchemy defaults (5 / 10 / -1 / 30). That's too small for
  production deployments and there's no way to tune it without code changes.

  ## Reproduction
  
  Requires Postgres + asyncpg:

  1. Point `DATABASE_URL` at Postgres (`postgresql+asyncpg://...`).
  2. Start the backend and hit any DB-backed endpoint with concurrency ≥ `pool_size`. Cancel a portion of those requests mid-flight (curl Ctrl-C, or just close the browser
  tab).
  3. After a few rounds, subsequent requests block forever on session checkout. Nothing short of restarting the process recovers.

  SQLite is unaffected, so this is invisible in the default local-dev configuration.

  ## Proposed fix

  Expose the standard SQLAlchemy pool knobs as environment variables with sane defaults, **applied only to non-SQLite engines**. `pool_pre_ping=True` is the key bug fix.

  | Env | Default |
  | --- | --- |
  | `DB_POOL_SIZE` | `10` |
  | `DB_MAX_OVERFLOW` | `20` |
  | `DB_POOL_RECYCLE` | `1800` |
  | `DB_POOL_PRE_PING` | `true` |
  | `DB_POOL_TIMEOUT` | `10` |

  Touches three files: `.env.example`, `backend/src/analytics_agent/config.py`, `backend/src/analytics_agent/db/base.py` — +33 / -5 lines total.

  ## Reference
  
  I implemented this fix on a fork and have been running it for a while without issues:

  - pushiwuhua7/analytics-agent@6996d50 `feat(db): configurable connection pool tuning for MySQL / PostgreSQL`
  - branch: `feat/db-pool-config`

  Happy to clean this up into a PR if maintainers are open to the approach.


Env	Default
`DB_POOL_SIZE`	`10`
`DB_MAX_OVERFLOW`	`20`
`DB_POOL_RECYCLE`	`1800`
`DB_POOL_PRE_PING`	`true`
`DB_POOL_TIMEOUT`	`10`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug(db): asyncpg connections broken by task cancellation hang the next request (no pool_pre_ping) #77

Summary

Reproduction

Proposed fix

Reference

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

bug(db): asyncpg connections broken by task cancellation hang the next request (no pool_pre_ping) #77

Description

Summary

Reproduction

Proposed fix

Reference

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions