fix(celery): cache Celery apps to stop per-send memory leak in CeleryExecutor#68905
Open
arkadiuszbach wants to merge 1 commit into
Open
fix(celery): cache Celery apps to stop per-send memory leak in CeleryExecutor#68905arkadiuszbach wants to merge 1 commit into
arkadiuszbach wants to merge 1 commit into
Conversation
create_celery_app() rebuilt a brand-new Celery app on every workload send. When sends run in-process (a single task per heartbeat, or sync_parallelism == 1) this happens in the long-lived scheduler/executor process, so each rebuilt app — and its broker/result-backend connection pools — accumulates and is never released, leaking memory over time. PR apache#60675 assumed sends always run in throwaway ProcessPoolExecutor subprocesses, where a fresh app per send is harmless. The in-process hot path in _send_workloads_to_celery() breaks that assumption. Cache apps by their resolved (team-aware) name so each distinct app is built once per process and reused. Config is static per app-name per process, so this is safe for both call sites: subprocesses still build their own app once, and the in-process path stops recreating it on every send. Add regression tests for app caching/reuse and per-team app isolation.
e51915a to
b1dab47
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
create_celery_app()is now cached by the resolved (team-aware) app name, so aCelery app is built once per process and reused on subsequent sends instead of
being rebuilt on every workload send.
Why / Root cause
CeleryExecutor._send_workloads_to_celery()has two paths:ProcessPoolExecutor. Eachsubprocess builds its own Celery app and is then torn down, so a fresh app per
send is harmless.
sync_parallelism == 1,send_workload_to_executor()runs directly in thelong-lived scheduler/executor process.
send_workload_to_executor()callscreate_celery_app()on every invocation.On the in-process path this meant a brand-new Celery app (with new broker /
result-backend connection pools) was constructed for every single send and
never released — a steady memory leak in a long-running scheduler process.
PR #60675 introduced the team-aware app construction under the assumption that
sends always happen in throwaway subprocesses. That assumption does not hold for
the in-process path.
Why this matters in practice (Helm / Kubernetes)
Since #58733, the official Helm chart derives
celery.sync_parallelismfrom the scheduler's CPU limit (millicores rounded upto whole cores). Any scheduler with a CPU limit of ≤ 1000m (≤ 1 core) —
a very common configuration — therefore renders
sync_parallelism = 1, whichmakes
_send_workloads_to_celery()take the in-process hot path on everysend. These deployments hit the leak continuously.
Fix
_celery_app_cache: dict[str, Celery]keyed by theresolved (team-suffixed) app name.
create_celery_app(), return the cached app if one exists for that name,otherwise build it and store it before returning.
Celery config is static per app-name per process, so reuse is safe for both call
sites:
Testing
test_app_is_cached_and_reused—create_celery_app(conf)returns the sameinstance on repeated calls.
test_distinct_teams_get_distinct_apps(3.2+) — different teams get distinctapps, same team reuses one app.
_clear_celery_app_cacheautouse fixture isolates the cache withinTestCreateCeleryAppTeamIsolation, andTestMultiTeamCeleryExecutorsetup/teardown clear it too (the cache, being keyed by app name, otherwise
leaks built apps across tests). Both clears are required — verified empirically:
removing them makes
test_team_specific_broker_not_overwrittenfail.Notes
The regression tests fail when the production cache change is reverted.
Reproducing
details
Was generative AI tooling used to co-author this PR?
Generated-by: [Claude Opus 4.8] following the guidelines