GitHub Issue #783: Server lockup when updating data class domain design #7332

XingY · 2026-01-15T17:58:54Z

Rationale

Lock provisioned table on domain update
Operations such as adding or dropping a column require an ACCESS EXCLUSIVE lock on the table. If another transaction has performed a SELECT on a provisioned table, adding or dropping columns on that table must wait until the other transaction completes. If the other transaction is in turn waiting for the add/drop column transaction (in this case, for its update to the exp.dataclass table), the two deadlock.

This PR adds an ACCESS EXCLUSIVE MODE lock on the provisioned table when it is queried in the transaction.
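To make the conflict concrete: PostgreSQL's documented table-level lock conflict table shows that the ACCESS SHARE lock taken by a plain SELECT (held until the end of its transaction) conflicts only with ACCESS EXCLUSIVE, which is what ALTER TABLE ... ADD COLUMN acquires. The following is a minimal Java sketch that encodes that table; the enum `PgTableLock` is a hypothetical name for illustration and is not LabKey or JDBC API.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of PostgreSQL's table-level lock modes and their documented conflicts. */
enum PgTableLock {
    ACCESS_SHARE, ROW_SHARE, ROW_EXCLUSIVE, SHARE_UPDATE_EXCLUSIVE,
    SHARE, SHARE_ROW_EXCLUSIVE, EXCLUSIVE, ACCESS_EXCLUSIVE;

    /** Modes that conflict with this one, per the "Conflicting Lock Modes" table. */
    Set<PgTableLock> conflictsWith() {
        switch (this) {
            case ACCESS_SHARE:           return EnumSet.of(ACCESS_EXCLUSIVE);
            case ROW_SHARE:              return EnumSet.of(EXCLUSIVE, ACCESS_EXCLUSIVE);
            case ROW_EXCLUSIVE:          return EnumSet.of(SHARE, SHARE_ROW_EXCLUSIVE, EXCLUSIVE, ACCESS_EXCLUSIVE);
            case SHARE_UPDATE_EXCLUSIVE: return EnumSet.range(SHARE_UPDATE_EXCLUSIVE, ACCESS_EXCLUSIVE);
            case SHARE:                  return EnumSet.of(ROW_EXCLUSIVE, SHARE_UPDATE_EXCLUSIVE,
                                                           SHARE_ROW_EXCLUSIVE, EXCLUSIVE, ACCESS_EXCLUSIVE);
            case SHARE_ROW_EXCLUSIVE:    return EnumSet.range(ROW_EXCLUSIVE, ACCESS_EXCLUSIVE);
            case EXCLUSIVE:              return EnumSet.range(ROW_SHARE, ACCESS_EXCLUSIVE);
            default:                     return EnumSet.allOf(PgTableLock.class); // ACCESS_EXCLUSIVE
        }
    }

    boolean conflicts(PgTableLock other) {
        return conflictsWith().contains(other);
    }

    public static void main(String[] args) {
        // A plain SELECT takes ACCESS SHARE; ALTER TABLE ... ADD COLUMN takes ACCESS EXCLUSIVE.
        System.out.println("SELECT vs ADD COLUMN conflict: " + ACCESS_SHARE.conflicts(ACCESS_EXCLUSIVE));
        // UPDATE takes ROW EXCLUSIVE at the table level, which does not conflict with SELECT.
        System.out.println("SELECT vs UPDATE conflict: " + ACCESS_SHARE.conflicts(ROW_EXCLUSIVE));
    }
}
```

Running `main` prints `true` for SELECT vs ADD COLUMN and `false` for SELECT vs UPDATE, which is why only the schema change gets stuck behind an open reader.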

Related Pull Requests

Changes

Move exp.dataclass and exp.materialsource indexing to after exp.data and exp.material indexing, to avoid holding on to locks.

labkey-jeckels · 2026-01-16T17:43:45Z

I see the search indexer activity in later dumps but it looks to be idle in the first dumps in the log. Thus, I don't think it's causing the deadlock here, though it is piling on.

In the first dump, I see two problems, which may or may not be connected.

One is synchronization in PipelineQueueImpl. There's a job cancellation attempt from https-jsse-nio-443-exec-1 that's holding the lock but apparently not actually closing the DB connection. Other threads like https-jsse-nio-443-exec-10 and https-jsse-nio-443-exec-16 are trying to get the lock.

The second is related to the construct domain. https-jsse-nio-443-exec-2 and https-jsse-nio-443-exec-11 are both trying to update the domain. One is trying to update the exp.dataclass row while the other is trying to add the column to the provisioned table. https-jsse-nio-443-exec-24 and other threads are blocked trying to query that domain. JobThread-2.1 is running a domain validation job that's also blocked trying to query it.

Can you take another look at the earliest dump in the log and see if you agree with my assessment? And if so, do you think your patch will help, or should we pursue a different fix? Your patch seems OK to me, but I'm worried it won't address the root problem.

XingY · 2026-01-20T02:27:57Z

I made another change to lock the provisioned table IN ACCESS EXCLUSIVE MODE. I don't think it hurts to keep the search indexer change, so I'll leave that in.

labkey-jeckels

Would be good for @labkey-matthewb to review as well. He added the delete-specific locking in June. This adds locking to the update scenario, and makes the lock even more restrictive.

#6775

https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES

labkey-jeckels · 2026-01-21T21:05:56Z

api/src/org/labkey/api/exp/property/Domain.java

     */
    Lock getDatabaseLock();
-    void lockForDelete(DbSchema expSchema);
+    void lockForUpdateDelete(DbSchema lockSchema);


I know this isn't from your change, but I don't understand why we're passing this as an argument. It looks like all callers are using the exp schema.

labkey-jeckels · 2026-01-21T21:10:21Z

experiment/src/org/labkey/experiment/api/SampleTypeServiceImpl.java

+            indexSampleTypeMaterials(sampleType, q);
+
+            // GitHub Issue 783: Server lockup when updating data class domain design
+            // Index MaterialSource after materials indexing, to avoid holding locks on exp.MaterialSource table for too long


I'm fine with the reordering, but I'm confused by the "holding locks on exp.MaterialSource table for too long" comment. Are we somehow holding locks after fetching the rows and closing the ResultSet?

XingY

We are not holding locks after reading and closing the ResultSet. However, since the index task also updates exp.materialsource.lastindex, it holds a lock on that exp.materialsource row that blocks other transactions from updating it. In this case, it blocks updateDomain.api from updating exp.materialsource.modified (after updateDomain has already run ADD COLUMN on the provisioned table).

labkey-jeckels

Then I don't understand how the ordering makes a difference. For deadlocks, it shouldn't matter if you lock X and then Y, or Y and then X, as long as you're not holding them both at the same time, correct? And the indexing part shouldn't keep a lock open.

Let me know if it would be easier to talk through the scenario. I'm probably missing something.

XingY

So the indexer thread locked the exp.materialsource row (by updating it early in the transaction, it holds a row lock until the end of the transaction), and then wants to read the provisioned table, which is currently ACCESS EXCLUSIVE locked by the saveDomain thread, preventing reads.
The saveDomain thread locked the provisioned table via ADD COLUMN (held until the end of its transaction), and then wants to update exp.materialsource, whose row is locked by the indexer thread.
The two threads are each waiting for the other to release a lock and cannot proceed, hence the deadlock.
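The lock cycle in that explanation can be modeled outside the database. The sketch below uses two `java.util.concurrent` locks as stand-ins for the exp.materialsource row lock and the provisioned table's ACCESS EXCLUSIVE lock; it is a toy model, not LabKey code, and a timed `tryLock` plays the role of PostgreSQL's deadlock detector. Taking the two locks in opposite orders while holding the first makes at least one thread time out; taking them in the same order lets both finish, which is the lock-ordering point raised in the review.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderDemo {
    // How long a thread waits for its second lock before giving up (stands in for deadlock detection).
    static final long WAIT_MS = 300;

    // One "transaction": hold `first` until the end, and try to take `second` with a timeout.
    // If `bothHoldFirst` is non-null, wait until the other thread also holds its first lock.
    private static void transaction(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirst, AtomicInteger timeouts) {
        first.lock();
        try {
            if (bothHoldFirst != null) {
                bothHoldFirst.countDown();
                bothHoldFirst.await();
            } else {
                Thread.sleep(20); // simulate work while holding the first lock
            }
            if (second.tryLock(WAIT_MS, TimeUnit.MILLISECONDS)) {
                second.unlock(); // got both locks: the "transaction" completes
            } else {
                timeouts.incrementAndGet(); // a real database would abort one side as the deadlock victim
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock(); // "commit": release the lock held since the start
        }
    }

    /** Returns how many of the two threads timed out waiting for their second lock. */
    static int run(boolean sameOrder) throws InterruptedException {
        ReentrantLock materialSourceRow = new ReentrantLock();
        ReentrantLock provisionedTable = new ReentrantLock();
        AtomicInteger timeouts = new AtomicInteger();
        // Only force "both hold their first lock" in the opposite-order case; with a shared order
        // the second thread blocks on its first lock and could never count down.
        CountDownLatch latch = sameOrder ? null : new CountDownLatch(2);

        // Indexer: row lock on exp.materialsource, then read the provisioned table.
        Thread indexer = new Thread(() ->
                transaction(materialSourceRow, provisionedTable, latch, timeouts));
        // saveDomain: ADD COLUMN on the provisioned table, then update exp.materialsource
        // (or, when sameOrder is true, the same order as the indexer).
        ReentrantLock saveFirst = sameOrder ? materialSourceRow : provisionedTable;
        ReentrantLock saveSecond = sameOrder ? provisionedTable : materialSourceRow;
        Thread saveDomain = new Thread(() -> transaction(saveFirst, saveSecond, latch, timeouts));

        indexer.start();
        saveDomain.start();
        indexer.join();
        saveDomain.join();
        return timeouts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("opposite order, timeouts = " + run(false));
        System.out.println("same order, timeouts = " + run(true));
    }
}
```

In the opposite-order run the timeout count is 1 or 2, depending on which wait expires first; in the same-order run it is always 0, because the second lock is only ever requested by the thread already holding the first.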

Perhaps the confusion (on my part) is that the indexer task is not wrapped in a transaction, and hence should not hold locks.

Update: in the case of query.saveRows, for example, indexing is done inside a transaction.

Based on our discussion, the indexing-order change has been reverted.

labkey-matthewb · 2026-01-21T21:29:13Z

api/src/org/labkey/api/exp/property/DomainUtil.java


+        var lockSchema = ExperimentService.get().getSchema();
+        if (lockSchema.getScope().isTransactionActive())
+            d.lockForUpdateDelete(lockSchema);


This lock is very heavy-handed. Is it possible to only acquire this if/when we know we have actual changes to the provisioned table?

XingY

That might be too late, though. In this case, the other transaction acquired a shared lock on the provisioned table before the current transaction performed ADD COLUMN, so the ADD COLUMN could not proceed; meanwhile, the other transaction was itself waiting for the current transaction to release its write lock on exp.dataclass.

What we really need, I think, is a READ "nolock", which Postgres doesn't seem to support.
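The `isTransactionActive()` guard in the DomainUtil change matters because PostgreSQL rejects `LOCK TABLE` outside a transaction block, and once taken the lock is held until the transaction commits or rolls back. A minimal JDBC sketch of that pattern follows. `ProvisionedTableLock`, its methods, and the schema/table names are hypothetical and do not describe how `Domain.lockForUpdateDelete` is actually implemented; treating `Connection.getAutoCommit()` as "no transaction active" is an assumption standing in for the `getScope().isTransactionActive()` check.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper, for illustration only; not LabKey's Domain.lockForUpdateDelete. */
final class ProvisionedTableLock {
    private ProvisionedTableLock() {}

    /** Quotes an identifier for PostgreSQL: wrap in double quotes and double any embedded quotes. */
    static String quoteIdent(String ident) {
        return "\"" + ident.replace("\"", "\"\"") + "\"";
    }

    /** Builds the LOCK TABLE statement for a provisioned table. */
    static String lockSql(String schema, String table) {
        return "LOCK TABLE " + quoteIdent(schema) + "." + quoteIdent(table) + " IN ACCESS EXCLUSIVE MODE";
    }

    /**
     * Issues the lock only when an explicit transaction is open (autocommit off), since
     * PostgreSQL reports an error for LOCK TABLE outside a transaction block.
     * Returns true if the LOCK statement was executed.
     */
    static boolean lockIfInTransaction(Connection conn, String schema, String table) throws SQLException {
        if (conn.getAutoCommit()) {
            return false; // assumption: autocommit means no surrounding transaction, so skip the lock
        }
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(lockSql(schema, table)); // held until COMMIT or ROLLBACK
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical schema and table names.
        System.out.println(lockSql("myschema", "construct"));
    }
}
```

Acquiring the lock before touching the exp.dataclass row means the updater waits for existing readers without yet holding the row lock they need, at the cost of blocking all readers of that table for the rest of the transaction, which is the trade-off behind the "heavy-handed" concern.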

GitHub Issue #783: Server lockup when updating data class domain design (4586f8f)

XingY requested a review from labkey-jeckels January 15, 2026 17:58

XingY added 7 commits January 19, 2026 12:43

lock provisioned table on domain update (4b6a6d2)
Merge branch 'develop' into fb_dataclassLock (9e42f38)
lock (d1da5c7)
transaction (2e3dab8)
revert transaction (10e4700)
revert transaction (5e8f69e)
Merge branch 'develop' into fb_dataclassLock (c5017f3)

XingY requested review from labkey-jeckels and removed request for labkey-jeckels January 21, 2026 18:16

XingY self-assigned this Jan 21, 2026

labkey-jeckels approved these changes Jan 21, 2026


labkey-jeckels requested a review from labkey-matthewb January 21, 2026 21:13

labkey-matthewb reviewed Jan 21, 2026


XingY added 2 commits January 22, 2026 09:52

Merge branch 'develop' into fb_dataclassLock (075770b)
revert indexing order (bbf426a)
