Skip to content

Conversation

@drazisil-codecov
Copy link
Contributor

@drazisil-codecov drazisil-codecov commented Jan 14, 2026

Summary

Adds a feature-flagged option to use per-task database sessions instead of shared worker-level sessions. This prevents session contamination when a task crashes or times out mid-transaction.

Problem

When tasks share DB sessions at the worker level, a crash or timeout can leave the session in an invalid state ('prepared' or pending rollback). Subsequent tasks fail with SQLAlchemy transaction errors.

Impact: ~10,000 errors/month across Upload, UploadProcessor, UploadFinisher, and Notify tasks.

Sentry Issues

Issue Occurrences Task
WORKER-Q8N 5,164 UploadFinisher
WORKER-PZ7 1,851 UploadFinisher
WORKER-PHS 1,183 UploadProcessor
WORKER-W68 892 Upload
WORKER-WC7 266 Notify

Solution

When setup.tasks.per_task_db_sessions.enabled is true:

  1. At task start: Call get_db_session.remove() to clear any stale session
  2. Generate session ID: Unique ID for logging correlation
  3. At task end: Commit/close and call get_db_session.remove() to prevent contamination

Configuration

setup:
  tasks:
    per_task_db_sessions:
      enabled: true           # Master switch (default: false)
      queues:                 # Optional: only enable for these queues
        - timeseries
      tasks:                  # Optional: only enable for these task names
        - app.tasks.upload_finisher.UploadFinisherTask

Rollout behavior:

  • If queues and tasks are both empty → enabled for ALL tasks
  • If queues or tasks are specified → only enabled for matching tasks
  • enabled: false → disabled for everyone (instant rollback)

Changes

  • Add use_per_task_db_sessions(task_name, queue_name) config helper function
  • Modify BaseCodecovTask.run() to manage session lifecycle when enabled
  • Update wrap_up_dbsession() to accept per_task_sessions and session_id params
  • Add comprehensive logging with session IDs and queue names for debugging
  • Add unit tests for feature flag logic and session lifecycle

Rollout Plan

  1. Deploy with feature flag OFF (default)
  2. Enable for single low-risk queue:
    enabled: true
    queues: ["timeseries"]
  3. Monitor Sentry for 24-48 hours
  4. Add high-error tasks:
    tasks: ["app.tasks.upload_finisher.UploadFinisherTask"]
  5. Full rollout (empty queues/tasks lists)

Risk Assessment

High risk — This touches core task infrastructure.

Mitigations:

  • Feature flag for instant rollback (just change config, no deploy needed)
  • Granular queue-by-queue or task-by-task rollout
  • Extensive logging for debugging
  • Legacy code path remains until fully validated

Linear Issue

CCMRG-2009


Note

Introduces an opt-in per-task SQLAlchemy session lifecycle to prevent cross-task session contamination and enable gradual rollout via config.

  • New use_per_task_db_sessions(task_name, queue_name) reads setup.tasks.per_task_db_sessions (enabled + optional queues/tasks filters)
  • BaseCodecovTask.run() optionally clears any stale session at start, generates a short session_id for logs, and passes flags to cleanup
  • wrap_up_dbsession(db_session, per_task_sessions, session_id) enhanced to always remove the session when enabled, and improve timeout/invalid-state recovery and logging
  • Adds targeted logging for session start/commit/remove and includes queue/task context
  • Extends unit tests to cover config gating, run-time session removal, and wrap-up behavior; updates existing tests for new method signature

Written by Cursor Bugbot for commit 7074895. This will update automatically on new commits. Configure here.

Adds a feature-flagged option to use per-task database sessions instead
of shared worker-level sessions. This prevents session contamination
when a task crashes or times out mid-transaction.

## Problem

When tasks share DB sessions at the worker level, a crash or timeout
can leave the session in an invalid state ('prepared' or pending
rollback). Subsequent tasks fail with SQLAlchemy transaction errors.
This causes ~10,000 errors/month across Upload, UploadProcessor,
UploadFinisher, and Notify tasks.

## Solution

When `setup.tasks.use_per_task_db_sessions` is enabled:
- Call `get_db_session.remove()` at task start to clear any stale session
- Generate unique session ID for logging correlation
- Log session lifecycle (creation, commit/close, removal)
- Call `get_db_session.remove()` at task end to prevent contamination

The feature flag defaults to False, allowing gradual rollout.

## Changes

- Add `use_per_task_db_sessions()` config helper function
- Modify `BaseCodecovTask.run()` to manage session lifecycle
- Update `wrap_up_dbsession()` to accept per_task_sessions param
- Add comprehensive logging with session IDs
- Add unit tests for new behavior

Fixes: WORKER-Q8N, WORKER-PZ7, WORKER-PHS, WORKER-W68, WORKER-WC7
Linear: CCMRG-2009
@sentry
Copy link

sentry bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.35%. Comparing base (6e3603b) to head (7074895).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #654   +/-   ##
=======================================
  Coverage   93.35%   93.35%           
=======================================
  Files        1292     1292           
  Lines       47128    47161   +33     
  Branches     1567     1567           
=======================================
+ Hits        43996    44029   +33     
  Misses       2823     2823           
  Partials      309      309           
Flag Coverage Δ
workerintegration 59.09% <42.85%> (-0.05%) ⬇️
workerunit 91.28% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@codecov-notifications
Copy link

codecov-notifications bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

Enhances the per-task DB sessions feature with granular rollout control:

Config structure:
  setup.tasks.per_task_db_sessions:
    enabled: true          # Master switch
    queues: ["timeseries"]  # Only enable for these queues
    tasks: []              # Or specific task names

Rollout examples:
1. Start with single queue:
   queues: ["timeseries"]

2. Add specific high-error tasks:
   tasks: ["app.tasks.upload_finisher.UploadFinisherTask"]

3. Full rollout (empty lists = all):
   enabled: true
   queues: []
   tasks: []

Also logs queue_name in session lifecycle messages for debugging.
@drazisil-codecov drazisil-codecov marked this pull request as ready for review January 16, 2026 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants