fix: workers use mutex table for SP lock #247
Pull request overview
This PR replaces the previous session-scoped per‑SP advisory lock with a DB-backed per‑SP mutex to prevent concurrent deal/retrieval execution for the same provider across all pg-boss workers, and updates operational docs accordingly.
Changes:
- Add `job_mutex` table + repository methods to acquire/release a per‑SP mutex with stale-lock takeover.
- Update pg-boss workers to use the new mutex (recording job ID + worker identity) and requeue on contention.
- Document capacity/backlog sizing guidance and introduce `JOB_LOCK_STALE_SECONDS` configuration/docs.
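To make the acquire/release semantics with stale-lock takeover concrete, here is a minimal in-memory sketch. It is not the repository code from this PR: the class name `InMemoryJobMutex`, its method names, the row shape, and the rule that only the current holder's job ID can release are all assumptions made for illustration. The real implementation is SQL against `job_mutex`.

```typescript
// Hypothetical in-memory model of a per-SP mutex with stale-lock takeover.
// Names and row shape are illustrative assumptions, not the PR's schema.
interface MutexRow {
  spAddress: string;
  jobId: string;
  workerId: string;
  lockedAtMs: number; // epoch milliseconds when the lock was taken
}

class InMemoryJobMutex {
  // One row per SP address, mirroring a unique constraint on the SP address.
  private readonly rows = new Map<string, MutexRow>();
  private readonly staleMs: number;

  constructor(staleSeconds: number) {
    this.staleMs = staleSeconds * 1000;
  }

  // Returns true if the caller now holds the mutex for this SP.
  tryAcquire(spAddress: string, jobId: string, workerId: string, nowMs: number): boolean {
    const existing = this.rows.get(spAddress);
    const isStale = existing !== undefined && nowMs - existing.lockedAtMs >= this.staleMs;
    if (existing !== undefined && !isStale) {
      return false; // held by a live job: the caller should requeue
    }
    // Either the SP is free or the previous holder is stale: take over.
    this.rows.set(spAddress, { spAddress, jobId, workerId, lockedAtMs: nowMs });
    return true;
  }

  // Assumption: only the job that currently holds the lock may release it,
  // so a job whose lock was taken over cannot free the new holder's lock.
  release(spAddress: string, jobId: string): boolean {
    const existing = this.rows.get(spAddress);
    if (existing === undefined || existing.jobId !== jobId) {
      return false;
    }
    this.rows.delete(spAddress);
    return true;
  }
}
```

Passing the clock in as `nowMs` keeps the sketch deterministic. In a database-backed version the comparison would more likely use the database's own clock so that workers with skewed clocks agree on staleness.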
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/runbooks/jobs.md | Adds operational guidance and capacity/backlog formulas; notes global per‑SP mutex behavior. |
| docs/infra.md | Updates infra notes to reflect DB-backed per‑SP mutex with multi-replica workers. |
| docs/environment-variables.md | Documents new JOB_LOCK_STALE_SECONDS env var. |
| apps/backend/src/jobs/repositories/job-schedule.repository.ts | Implements SQL-based acquire/release for per‑SP mutex via job_mutex. |
| apps/backend/src/jobs/jobs.service.ts | Switches worker handlers from advisory locks to DB mutex; adds worker identity + stale-lock config usage. |
| apps/backend/src/jobs/jobs.service.spec.ts | Updates unit tests to mock/expect mutex acquisition calls. |
| apps/backend/src/database/migrations/1760000000003-AddJobMutex.ts | Adds the job_mutex table and indexes. |
| apps/backend/src/config/app.config.ts | Adds config validation/loading for JOB_LOCK_STALE_SECONDS. |
* docs: document how our job system works * fix: log full pieceCid, CID, and providerAddresses (#244) * fix: log full pieceCid, CID, and providerAddresses * chore: fix lint * docs: add architecture diagram for jobs * chore(deps): bump tailwind-merge from 2.6.0 to 3.4.0 (#202) Bumps [tailwind-merge](https://github.com/dcastil/tailwind-merge) from 2.6.0 to 3.4.0. - [Release notes](https://github.com/dcastil/tailwind-merge/releases) - [Commits](dcastil/tailwind-merge@v2.6.0...v3.4.0) --- updated-dependencies: - dependency-name: tailwind-merge dependency-version: 3.4.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump the npm-dependencies group in /apps/backend with 9 updates (#255) * chore(deps): bump the npm-dependencies group Bumps the npm-dependencies group in /apps/backend with 9 updates: | Package | From | To | | --- | --- | --- | | [@nestjs/common](https://github.com/nestjs/nest/tree/HEAD/packages/common) | `11.1.6` | `11.1.13` | | [@nestjs/config](https://github.com/nestjs/config) | `4.0.2` | `4.0.3` | | [@nestjs/core](https://github.com/nestjs/nest/tree/HEAD/packages/core) | `11.1.6` | `11.1.13` | | [@nestjs/platform-express](https://github.com/nestjs/nest/tree/HEAD/packages/platform-express) | `11.1.6` | `11.1.13` | | [@nestjs/schedule](https://github.com/nestjs/schedule) | `6.0.0` | `6.1.1` | | [@nestjs/swagger](https://github.com/nestjs/swagger) | `11.2.0` | `11.2.6` | | [axios](https://github.com/axios/axios) | `1.11.0` | `1.13.5` | | [undici](https://github.com/nodejs/undici) | `7.16.0` | `7.21.0` | | [@nestjs/testing](https://github.com/nestjs/nest/tree/HEAD/packages/testing) | `11.1.6` | `11.1.13` | Updates `@nestjs/common` from 11.1.6 to 11.1.13 - [Release notes](https://github.com/nestjs/nest/releases) - 
[Commits](https://github.com/nestjs/nest/commits/v11.1.13/packages/common) Updates `@nestjs/config` from 4.0.2 to 4.0.3 - [Release notes](https://github.com/nestjs/config/releases) - [Commits](nestjs/config@4.0.2...4.0.3) Updates `@nestjs/core` from 11.1.6 to 11.1.13 - [Release notes](https://github.com/nestjs/nest/releases) - [Commits](https://github.com/nestjs/nest/commits/v11.1.13/packages/core) Updates `@nestjs/platform-express` from 11.1.6 to 11.1.13 - [Release notes](https://github.com/nestjs/nest/releases) - [Commits](https://github.com/nestjs/nest/commits/v11.1.13/packages/platform-express) Updates `@nestjs/schedule` from 6.0.0 to 6.1.1 - [Release notes](https://github.com/nestjs/schedule/releases) - [Commits](nestjs/schedule@6.0.0...6.1.1) Updates `@nestjs/swagger` from 11.2.0 to 11.2.6 - [Release notes](https://github.com/nestjs/swagger/releases) - [Commits](nestjs/swagger@11.2.0...11.2.6) Updates `axios` from 1.11.0 to 1.13.5 - [Release notes](https://github.com/axios/axios/releases) - [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md) - [Commits](axios/axios@v1.11.0...v1.13.5) Updates `undici` from 7.16.0 to 7.21.0 - [Release notes](https://github.com/nodejs/undici/releases) - [Commits](nodejs/undici@v7.16.0...v7.21.0) Updates `@nestjs/testing` from 11.1.6 to 11.1.13 - [Release notes](https://github.com/nestjs/nest/releases) - [Commits](https://github.com/nestjs/nest/commits/v11.1.13/packages/testing) --- updated-dependencies: - dependency-name: "@nestjs/common" dependency-version: 11.1.13 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: npm-dependencies - dependency-name: "@nestjs/config" dependency-version: 4.0.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: npm-dependencies - dependency-name: "@nestjs/core" dependency-version: 11.1.13 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: npm-dependencies 
- dependency-name: "@nestjs/platform-express" dependency-version: 11.1.13 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: npm-dependencies - dependency-name: "@nestjs/schedule" dependency-version: 6.1.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: npm-dependencies - dependency-name: "@nestjs/swagger" dependency-version: 11.2.6 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: npm-dependencies - dependency-name: axios dependency-version: 1.13.5 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: npm-dependencies - dependency-name: undici dependency-version: 7.21.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: npm-dependencies - dependency-name: "@nestjs/testing" dependency-version: 11.1.13 dependency-type: direct:development update-type: version-update:semver-patch dependency-group: npm-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update pnpm lock file --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com> * chore(deps-dev): bump @types/node from 22.19.3 to 25.2.2 (#253) Bumps [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) from 22.19.3 to 25.2.2. - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) --- updated-dependencies: - dependency-name: "@types/node" dependency-version: 25.2.2 dependency-type: direct:development update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps-dev): bump jsdom from 27.4.0 to 28.0.0 (#254) Bumps [jsdom](https://github.com/jsdom/jsdom) from 27.4.0 to 28.0.0. - [Release notes](https://github.com/jsdom/jsdom/releases) - [Changelog](https://github.com/jsdom/jsdom/blob/main/Changelog.md) - [Commits](jsdom/jsdom@27.4.0...28.0.0) --- updated-dependencies: - dependency-name: jsdom dependency-version: 28.0.0 dependency-type: direct:development update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump pg-boss from 6.2.2 to 12.9.0 in /apps/backend (#256) * chore(deps): bump pg-boss from 6.2.2 to 12.9.0 in /apps/backend Bumps [pg-boss](https://github.com/timgit/pg-boss) from 6.2.2 to 12.9.0. - [Release notes](https://github.com/timgit/pg-boss/releases) - [Commits](timgit/pg-boss@6.2.2...12.9.0) --- updated-dependencies: - dependency-name: pg-boss dependency-version: 12.9.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * fix: use new pg-boss version * chore: rename batchSize vars --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com> * docs: fix docs after upgrading pg-boss * fix: merge from main issues --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Summary
Problem
Staging showed spikes (e.g., 72 deals in flight) despite a “per‑SP” lock. The advisory lock was session-scoped, so with pooled connections and per-worker concurrency (10 deals + 10 retrievals), multiple jobs could run against the same SP on a single worker. With 5 workers, this allowed many concurrent deal/retrieval jobs for the same SP.
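One way a session-scoped lock can fail to serialize jobs is that PostgreSQL session-level advisory locks are re-entrant: a second request for the same key from the same session succeeds and stacks. The sketch below is a simplified model of that behavior, not the project's code. The `AdvisoryLocks` class, the session IDs, and the lock key are hypothetical, and whether jobs here actually shared a pooled session is an assumption.

```typescript
// Simplified model of session-level advisory locks (illustrative only).
class AdvisoryLocks {
  // key -> owning session and how many times that session has locked it
  private readonly held = new Map<string, { sessionId: number; count: number }>();

  tryLock(sessionId: number, key: string): boolean {
    const entry = this.held.get(key);
    if (entry === undefined) {
      this.held.set(key, { sessionId, count: 1 });
      return true;
    }
    if (entry.sessionId === sessionId) {
      entry.count += 1; // same session: the request stacks and succeeds
      return true;
    }
    return false; // a different session holds the lock
  }
}

const locks = new AdvisoryLocks();
const pooledSession = 1; // two jobs that happen to share one pooled connection
const otherSession = 2;

const jobA = locks.tryLock(pooledSession, "sp:0xabc");
const jobB = locks.tryLock(pooledSession, "sp:0xabc"); // not blocked
const jobC = locks.tryLock(otherSession, "sp:0xabc"); // blocked

console.log({ jobA, jobB, jobC });
// prints { jobA: true, jobB: true, jobC: false }
```

A row keyed by SP address avoids this because ownership is tied to the job rather than to whichever database session ran the query.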
Fix
- `job_mutex` table with a unique constraint on `sp_address`.
- Stale locks can be taken over after `JOB_LOCK_STALE_SECONDS`.
Notes
- See `docs/runbooks/jobs.md` for capacity formulas and example limits.
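The worker-side flow this PR describes (acquire the per-SP mutex, requeue on contention, release when done) could look roughly like the sketch below. The `SpMutex` interface, `runWithSpMutex`, and the `requeue` callback are hypothetical names chosen for illustration. They are not taken from `jobs.service.ts` and do not represent the pg-boss API.

```typescript
// Hypothetical interface; the real repository methods may differ.
interface SpMutex {
  tryAcquire(spAddress: string, jobId: string, workerId: string): Promise<boolean>;
  release(spAddress: string, jobId: string): Promise<void>;
}

type Outcome = "completed" | "requeued";

// Runs `work` only while holding the per-SP mutex. On contention the job is
// handed back to the queue instead of waiting, so the worker slot is freed.
async function runWithSpMutex(
  mutex: SpMutex,
  job: { id: string; spAddress: string },
  workerId: string,
  work: () => Promise<void>,
  requeue: (jobId: string) => Promise<void>,
): Promise<Outcome> {
  const acquired = await mutex.tryAcquire(job.spAddress, job.id, workerId);
  if (!acquired) {
    await requeue(job.id);
    return "requeued";
  }
  try {
    await work();
    return "completed";
  } finally {
    // Release even if `work` throws, so a failed job does not hold the SP
    // until the stale-lock timeout expires.
    await mutex.release(job.spAddress, job.id);
  }
}
```

Releasing in `finally` covers ordinary failures. The stale-lock timeout remains the fallback for cases where the process dies before release can run.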