Description
When `LangfuseTraceSyncService` triggers a refresh for a per-user Langfuse repo and the submission fails with `DuplicateJobError` (stale RUNNING job from Bug #372), the error is caught and logged as a warning, but there is no circuit breaker, backoff, or escalation. The sync service retries on every sync cycle (typically every 60 seconds) indefinitely, generating the same warning log entry forever.
Reproduction Steps
Expected Behavior
After N consecutive `DuplicateJobError` failures for the same repo, the sync service should either:
`cidx_index_timeout` + buffer)
Actual Behavior
Same warning, same stale job ID, every sync cycle, forever.
Root Cause
In `langfuse_trace_sync_service.py:253-259`, the refresh trigger is wrapped in a broad `except Exception` that logs a warning and continues. There is no retry tracking, no backoff, and no staleness detection.
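One way to add the missing retry tracking is a small per-alias counter of consecutive failures. The following is a minimal sketch, not the service's actual code: `RefreshFailureTracker`, `trigger_refresh`, and the `submit_refresh` callable are hypothetical names, the `DuplicateJobError` class is a stand-in for the server's exception, and the threshold of 3 mirrors the optional fix proposed below.

```python
import logging
from collections import defaultdict
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


class DuplicateJobError(Exception):
    """Stand-in for the server's DuplicateJobError (assumed shape)."""


class RefreshFailureTracker:
    """Hypothetical helper: counts consecutive refresh failures per repo alias."""

    def __init__(self, escalate_after: int = 3) -> None:
        self.escalate_after = escalate_after
        self._failures: Dict[str, int] = defaultdict(int)

    def failures(self, alias: str) -> int:
        return self._failures.get(alias, 0)

    def record_success(self, alias: str) -> None:
        # A successful submission resets the streak for this alias.
        self._failures.pop(alias, None)

    def record_failure(self, alias: str) -> int:
        """Increment the streak and return the log level to use."""
        self._failures[alias] += 1
        if self._failures[alias] >= self.escalate_after:
            return logging.ERROR
        return logging.WARNING


def trigger_refresh(
    alias: str,
    submit_refresh: Callable[[str], None],
    tracker: RefreshFailureTracker,
) -> Optional[int]:
    """Submit a refresh; return the log level used on failure, or None on success."""
    try:
        submit_refresh(alias)
    except DuplicateJobError as exc:
        level = tracker.record_failure(alias)
        logger.log(
            level,
            "Refresh for %s failed (%d consecutive): %s. Check job status.",
            alias,
            tracker.failures(alias),
            exc,
        )
        return level
    tracker.record_success(alias)
    return None
```

Catching `DuplicateJobError` specifically, rather than every `Exception`, keeps unrelated failures from being counted toward the same streak. Whether the real service can narrow the exception type this way is an assumption.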
Affected Files
`src/code_indexer/server/services/langfuse_trace_sync_service.py:253-259` - Missing circuit breaker
`src/code_indexer/server/repositories/background_jobs.py` - No API to detect stale jobs
Proposed Fix
Primary (depends on Bug #372 fix): Once stale jobs are cleaned on startup, this bug becomes moot for the restart scenario.
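Defense-in-depth: Add a staleness check before raising `DuplicateJobError`. In `_check_operation_conflict()`, check whether the conflicting job's `started_at` is older than `cidx_index_timeout + 300s` (buffer). If so, transition it to FAILED automatically and allow the new submission.
Optional: Add a per-repo retry counter in `LangfuseTraceSyncService` that escalates from WARNING to ERROR after 3 consecutive failures for the same alias, with a suggestion to check job status.
Impact
Medium: Log noise (thousands of identical warnings per day), but the actual data continues syncing to disk correctly.
The warnings obscure real errors in the log stream.
Secondary: Each failed submission attempt adds SQLite read pressure from `_check_operation_conflict()` scanning all jobs.
This contributes to the "database is locked" contention observed in logs.
Evidence (Staging Logs 2026-03-07)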
Over 15 identical "already running" warnings across a 6-hour window for just two repos:
`langfuse_Claude_Code_unknown-global` - 5+ warnings with rotating stale job IDs
`langfuse_Claude_Code_seba.battig_lightspeeddms.com-global` - 10+ warnings with rotating stale job IDs
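The defense-in-depth staleness check from the Proposed Fix could be sketched roughly as follows. This is an illustration under stated assumptions: the `Job` dataclass, its field names, and the standalone `check_operation_conflict` function are hypothetical simplifications over an in-memory list. The real `_check_operation_conflict()` reads from SQLite and its signature is not shown in this issue.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

STALE_BUFFER_SECONDS = 300  # buffer value taken from the proposed fix


class DuplicateJobError(Exception):
    """Stand-in for the server's DuplicateJobError (assumed shape)."""


@dataclass
class Job:
    """Hypothetical, simplified job record."""
    job_id: str
    repo_alias: str
    status: str  # e.g. "RUNNING" or "FAILED"
    started_at: datetime


def check_operation_conflict(
    jobs: Iterable[Job],
    repo_alias: str,
    cidx_index_timeout: int,
    now: Optional[datetime] = None,
) -> None:
    """Raise DuplicateJobError only if a non-stale RUNNING job exists.

    RUNNING jobs older than cidx_index_timeout + buffer are treated as
    stale, marked FAILED, and no longer block the new submission.
    """
    now = now or datetime.now(timezone.utc)
    max_age = timedelta(seconds=cidx_index_timeout + STALE_BUFFER_SECONDS)
    for job in jobs:
        if job.repo_alias != repo_alias or job.status != "RUNNING":
            continue
        if now - job.started_at > max_age:
            job.status = "FAILED"  # stale: clear it instead of blocking
            continue
        raise DuplicateJobError(
            f"job {job.job_id} already running for {repo_alias}"
        )
```

With a one-hour `cidx_index_timeout`, a RUNNING job started more than 3900 seconds ago would be failed automatically, while a job started ten seconds ago would still raise `DuplicateJobError`.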