@SgtPooki
Collaborator

Summary

  • Replace the per-session advisory lock with a DB-backed per‑SP mutex to prevent concurrent deal/retrieval jobs for the same SP across all workers.
  • Record the worker hostname and job ID with the mutex for observability and stale-lock recovery.
  • Add runbook guidance for capacity and backlog limits.

Problem

Staging showed spikes (e.g., 72 deals in flight) despite a “per‑SP” lock. The advisory lock was session-scoped, so with pooled connections and per-worker concurrency (10 deals + 10 retrievals), multiple jobs could run against the same SP on a single worker. With 5 workers, this allowed many concurrent deal/retrieval jobs for the same SP.

Fix

  • Introduced a job_mutex table with a unique constraint on sp_address.
  • Replaced advisory lock acquisition/release with mutex insert/delete using job ID and hostname.
  • Added stale-lock replacement via JOB_LOCK_STALE_SECONDS.
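The acquire/release semantics described above can be modelled in a few lines. This is a minimal in-memory sketch, not the repository code from this PR: the class name, field names, and the "age >= threshold" staleness rule are assumptions made for illustration, and a `Map` keyed by SP address stands in for the unique constraint on sp_address. In PostgreSQL the same idea is often expressed with a conditional `INSERT ... ON CONFLICT`, but the actual SQL is not shown in this description.

```typescript
// Hypothetical row shape; the real job_mutex columns are not shown in this PR description.
interface MutexRow {
  spAddress: string;
  jobId: string;
  hostname: string;
  acquiredAt: number; // epoch milliseconds
}

// In-memory stand-in for the job_mutex table. The Map key plays the role of
// the unique constraint on sp_address: at most one row per SP.
class InMemoryJobMutex {
  private rows = new Map<string, MutexRow>();
  private staleSeconds: number;

  constructor(staleSeconds: number) {
    this.staleSeconds = staleSeconds; // plays the role of JOB_LOCK_STALE_SECONDS
  }

  // Returns true if the caller now holds the mutex for spAddress.
  acquire(spAddress: string, jobId: string, hostname: string, now: number = Date.now()): boolean {
    const existing = this.rows.get(spAddress);
    const isStale =
      existing !== undefined && now - existing.acquiredAt >= this.staleSeconds * 1000;
    if (existing !== undefined && !isStale) {
      return false; // a live holder exists: contention
    }
    // Either no row exists, or the old row is stale and gets replaced.
    this.rows.set(spAddress, { spAddress, jobId, hostname, acquiredAt: now });
    return true;
  }

  // Deletes the row only if it is still owned by jobId, so a job whose stale
  // lock was taken over cannot release the new holder's lock.
  release(spAddress: string, jobId: string): boolean {
    const existing = this.rows.get(spAddress);
    if (existing === undefined || existing.jobId !== jobId) {
      return false;
    }
    this.rows.delete(spAddress);
    return true;
  }

  // Who holds the mutex right now (useful for observability).
  holder(spAddress: string): MutexRow | undefined {
    return this.rows.get(spAddress);
  }
}
```

Checking the owner's job ID on release is a design choice in this sketch; it keeps a slow job from deleting a lock that was already replaced after going stale.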

Notes

  • Lock remains global per SP (deal + retrieval share the same mutex).
  • Concurrency is now limited by SP count rather than per-worker sessions.
  • See docs/runbooks/jobs.md for capacity formulas and example limits.
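As a rough illustration of the second note, the sketch below compares the worst-case number of in-flight jobs before and after the change, using the staging figures from the Problem section (5 workers, 10 deal slots and 10 retrieval slots each). The function names and the SP count of 12 are hypothetical, and these are not the formulas from docs/runbooks/jobs.md, which are not reproduced here.

```typescript
// Total job slots across all workers. Before the fix, nothing reliably stopped
// several of these slots from working on the same SP at once.
function slotsAcrossWorkers(workers: number, dealSlots: number, retrievalSlots: number): number {
  return workers * (dealSlots + retrievalSlots);
}

// With one global mutex per SP, at most one job per SP runs at a time,
// and never more jobs than there are slots.
function maxInFlightWithPerSpMutex(spCount: number, totalSlots: number): number {
  return Math.min(spCount, totalSlots);
}

const totalSlots = slotsAcrossWorkers(5, 10, 10); // 100
const bounded = maxInFlightWithPerSpMutex(12, totalSlots); // 12 (assumes 12 SPs)
```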

@FilOzzy FilOzzy added this to FOC Feb 10, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Feb 10, 2026
@SgtPooki SgtPooki marked this pull request as ready for review February 10, 2026 14:25
Copilot AI review requested due to automatic review settings February 10, 2026 14:25
@SgtPooki SgtPooki moved this from 📌 Triage to 🔎 Awaiting review in FOC Feb 10, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR replaces the previous session-scoped per‑SP advisory lock with a DB-backed per‑SP mutex to prevent concurrent deal/retrieval execution for the same provider across all pg-boss workers, and updates operational docs accordingly.

Changes:

  • Add job_mutex table + repository methods to acquire/release a per‑SP mutex with stale-lock takeover.
  • Update pg-boss workers to use the new mutex (recording job ID + worker identity) and requeue on contention.
  • Document capacity/backlog sizing guidance and introduce JOB_LOCK_STALE_SECONDS configuration/docs.
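The "requeue on contention" behaviour in the second bullet might look roughly like the wrapper below. This is a sketch under assumptions, not the code in jobs.service.ts: the `SpMutex` interface, the `runWithSpMutex` name, and the `requeue` callback are hypothetical, and no pg-boss API is called here.

```typescript
// Hypothetical minimal interface over the per-SP mutex repository.
interface SpMutex {
  acquire(spAddress: string, jobId: string, hostname: string): Promise<boolean>;
  release(spAddress: string, jobId: string): Promise<void>;
}

type Outcome = "completed" | "requeued";

// Runs `work` only while holding the SP's mutex. If another job holds it,
// the job is handed back to the queue instead of running concurrently.
async function runWithSpMutex(
  mutex: SpMutex,
  job: { id: string; spAddress: string },
  hostname: string,
  work: () => Promise<void>,
  requeue: () => Promise<void>,
): Promise<Outcome> {
  const acquired = await mutex.acquire(job.spAddress, job.id, hostname);
  if (!acquired) {
    await requeue();
    return "requeued";
  }
  try {
    await work();
    return "completed";
  } finally {
    // Release even when `work` throws, so failures do not leave a lock
    // behind that only the stale-lock takeover could clear.
    await mutex.release(job.spAddress, job.id);
  }
}
```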

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| docs/runbooks/jobs.md | Adds operational guidance and capacity/backlog formulas; notes global per‑SP mutex behavior. |
| docs/infra.md | Updates infra notes to reflect DB-backed per‑SP mutex with multi-replica workers. |
| docs/environment-variables.md | Documents new JOB_LOCK_STALE_SECONDS env var. |
| apps/backend/src/jobs/repositories/job-schedule.repository.ts | Implements SQL-based acquire/release for per‑SP mutex via job_mutex. |
| apps/backend/src/jobs/jobs.service.ts | Switches worker handlers from advisory locks to DB mutex; adds worker identity + stale-lock config usage. |
| apps/backend/src/jobs/jobs.service.spec.ts | Updates unit tests to mock/expect mutex acquisition calls. |
| apps/backend/src/database/migrations/1760000000003-AddJobMutex.ts | Adds the job_mutex table and indexes. |
| apps/backend/src/config/app.config.ts | Adds config validation/loading for JOB_LOCK_STALE_SECONDS. |

SgtPooki and others added 4 commits February 10, 2026 09:40
* docs: document how our job system works

* fix: log full pieceCid, CID, and providerAddresses (#244)

* fix: log full pieceCid, CID, and providerAddresses

* chore: fix lint

* docs: add architecture diagram for jobs

* chore(deps): bump tailwind-merge from 2.6.0 to 3.4.0 (#202)

Bumps [tailwind-merge](https://github.com/dcastil/tailwind-merge) from 2.6.0 to 3.4.0.
- [Release notes](https://github.com/dcastil/tailwind-merge/releases)
- [Commits](dcastil/tailwind-merge@v2.6.0...v3.4.0)

---
updated-dependencies:
- dependency-name: tailwind-merge
  dependency-version: 3.4.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump the npm-dependencies group in /apps/backend with 9 updates (#255)

* chore(deps): bump the npm-dependencies group

Bumps the npm-dependencies group in /apps/backend with 9 updates:

| Package | From | To |
| --- | --- | --- |
| [@nestjs/common](https://github.com/nestjs/nest/tree/HEAD/packages/common) | `11.1.6` | `11.1.13` |
| [@nestjs/config](https://github.com/nestjs/config) | `4.0.2` | `4.0.3` |
| [@nestjs/core](https://github.com/nestjs/nest/tree/HEAD/packages/core) | `11.1.6` | `11.1.13` |
| [@nestjs/platform-express](https://github.com/nestjs/nest/tree/HEAD/packages/platform-express) | `11.1.6` | `11.1.13` |
| [@nestjs/schedule](https://github.com/nestjs/schedule) | `6.0.0` | `6.1.1` |
| [@nestjs/swagger](https://github.com/nestjs/swagger) | `11.2.0` | `11.2.6` |
| [axios](https://github.com/axios/axios) | `1.11.0` | `1.13.5` |
| [undici](https://github.com/nodejs/undici) | `7.16.0` | `7.21.0` |
| [@nestjs/testing](https://github.com/nestjs/nest/tree/HEAD/packages/testing) | `11.1.6` | `11.1.13` |


Updates `@nestjs/common` from 11.1.6 to 11.1.13
- [Release notes](https://github.com/nestjs/nest/releases)
- [Commits](https://github.com/nestjs/nest/commits/v11.1.13/packages/common)

Updates `@nestjs/config` from 4.0.2 to 4.0.3
- [Release notes](https://github.com/nestjs/config/releases)
- [Commits](nestjs/config@4.0.2...4.0.3)

Updates `@nestjs/core` from 11.1.6 to 11.1.13
- [Release notes](https://github.com/nestjs/nest/releases)
- [Commits](https://github.com/nestjs/nest/commits/v11.1.13/packages/core)

Updates `@nestjs/platform-express` from 11.1.6 to 11.1.13
- [Release notes](https://github.com/nestjs/nest/releases)
- [Commits](https://github.com/nestjs/nest/commits/v11.1.13/packages/platform-express)

Updates `@nestjs/schedule` from 6.0.0 to 6.1.1
- [Release notes](https://github.com/nestjs/schedule/releases)
- [Commits](nestjs/schedule@6.0.0...6.1.1)

Updates `@nestjs/swagger` from 11.2.0 to 11.2.6
- [Release notes](https://github.com/nestjs/swagger/releases)
- [Commits](nestjs/swagger@11.2.0...11.2.6)

Updates `axios` from 1.11.0 to 1.13.5
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](axios/axios@v1.11.0...v1.13.5)

Updates `undici` from 7.16.0 to 7.21.0
- [Release notes](https://github.com/nodejs/undici/releases)
- [Commits](nodejs/undici@v7.16.0...v7.21.0)

Updates `@nestjs/testing` from 11.1.6 to 11.1.13
- [Release notes](https://github.com/nestjs/nest/releases)
- [Commits](https://github.com/nestjs/nest/commits/v11.1.13/packages/testing)

---
updated-dependencies:
- dependency-name: "@nestjs/common"
  dependency-version: 11.1.13
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: npm-dependencies
- dependency-name: "@nestjs/config"
  dependency-version: 4.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: npm-dependencies
- dependency-name: "@nestjs/core"
  dependency-version: 11.1.13
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: npm-dependencies
- dependency-name: "@nestjs/platform-express"
  dependency-version: 11.1.13
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: npm-dependencies
- dependency-name: "@nestjs/schedule"
  dependency-version: 6.1.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-dependencies
- dependency-name: "@nestjs/swagger"
  dependency-version: 11.2.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: npm-dependencies
- dependency-name: axios
  dependency-version: 1.13.5
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-dependencies
- dependency-name: undici
  dependency-version: 7.21.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-dependencies
- dependency-name: "@nestjs/testing"
  dependency-version: 11.1.13
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: npm-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update pnpm lock file

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>

* chore(deps-dev): bump @types/node from 22.19.3 to 25.2.2 (#253)

Bumps [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) from 22.19.3 to 25.2.2.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node)

---
updated-dependencies:
- dependency-name: "@types/node"
  dependency-version: 25.2.2
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps-dev): bump jsdom from 27.4.0 to 28.0.0 (#254)

Bumps [jsdom](https://github.com/jsdom/jsdom) from 27.4.0 to 28.0.0.
- [Release notes](https://github.com/jsdom/jsdom/releases)
- [Changelog](https://github.com/jsdom/jsdom/blob/main/Changelog.md)
- [Commits](jsdom/jsdom@27.4.0...28.0.0)

---
updated-dependencies:
- dependency-name: jsdom
  dependency-version: 28.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump pg-boss from 6.2.2 to 12.9.0 in /apps/backend (#256)

* chore(deps): bump pg-boss from 6.2.2 to 12.9.0 in /apps/backend

Bumps [pg-boss](https://github.com/timgit/pg-boss) from 6.2.2 to 12.9.0.
- [Release notes](https://github.com/timgit/pg-boss/releases)
- [Commits](timgit/pg-boss@6.2.2...12.9.0)

---
updated-dependencies:
- dependency-name: pg-boss
  dependency-version: 12.9.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: use new pg-boss version

* chore: rename batchSize vars

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>

* docs: fix docs after upgrading pg-boss

* fix: merge from main issues

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>