-
Notifications
You must be signed in to change notification settings - Fork 7
GitHub Issue #783: Server lockup when updating data class domain design #7332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
I see the search indexer activity in later dumps but it looks to be idle in the first dumps in the log. Thus, I don't think it's causing the deadlock here, though it is piling on. In the first dump, I see two problems, which may or may not be connected. One is synchronization in PipelineQueueImpl. There's a job cancellation attempt from The second is related to the construct domain. Can you take another look at the earliest dump in the log and see if you agree with my assessment? And if so, do you think your patch will help, or should we pursue a different fix? Your patch seems OK to me, but I'm worried it won't address the root problem. |
Another change made to lock provisioned table IN ACCESS EXCLUSIVE MODE. I don't think it hurts to keep the search indexer change so I'll leave that in. |
labkey-jeckels
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good for @labkey-matthewb to review as well. He added the delete-specific locking in June. This adds locking to the update scenario, and makes the lock even more restrictive.
https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES
| */ | ||
| Lock getDatabaseLock(); | ||
| void lockForDelete(DbSchema expSchema); | ||
| void lockForUpdateDelete(DbSchema lockSchema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this isn't from your change, but I don't understand why we're passing this as an argument. It looks like all callers are using the exp schema.
| indexSampleTypeMaterials(sampleType, q); | ||
|
|
||
| // GitHub Issue 783: Server lockup when updating data class domain design | ||
| // Index MaterialSource after materials indexing, to avoid holding locks on exp.MaterialSource table for too long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the reordering, but I'm confused by the "holding locks on exp.MaterialSource table for too long" comment. Are we somehow holding locks after fetching the rows and closing the ResultSet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not holding locks after reading and closing the resultset. However, since the index task also updates exp.materialsource.lastindex, it locks this exp.materialsource row for other transactions to update. In this case, it blocks the updateDomain.api from updating exp.materialsource.modified (after it had already ADD COLUMN on provisioned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I'm not understanding how the ordering makes a difference. For deadlocks, it shouldn't matter if you lock X and then Y, or Y and then X, as long as you're not holding them both at the same time, correct? And the indexing part shouldn't keep a lock open.
Let me know if it would be easier to talk through the scenario. I'm probably missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So indexer thread locked exp.materialsource (by updating it early in the transaction, it holds edit lock to that row till the end of transaction), indexer thread wants to read provisioned (which is currently access locked by saveDomain thread, which prevents reading).
saveDomain thread locked provisioned by Add Column (until end of transaction), saveDomain wants to update exp.materialsource, which is locked by indexer thread.
The 2 threads are waiting for each other to release the lock and cannot proceed, hence the deadlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the confusion (on my part) is the fact that indexer task is not wrapped in a transaction, and hence should not lock.
Update: in the case of query.saveRows, for example, indexing is done inside a transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our discussion, the change of ordering has been reverted.
|
|
||
| var lockSchema = ExperimentService.get().getSchema(); | ||
| if (lockSchema.getScope().isTransactionActive()) | ||
| d.lockForUpdateDelete(lockSchema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lock is very heavy-handed. Is it possible to only acquire this if/when we know we have actual changes to the provisioned table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be too late though. In this case the other transaction acquired shared lock before the current transaction perform Add column, resulting in failure to Add column, because the other transaction is also waiting for the current transaction to release write lock on exp.dataclass.
What we really need I think is READ "nolock", which postgres doesn't seem to support.
Rationale
Lock provisioned table on domain update
Operations such as add/drop column requires ACCESS EXCLUSIVE lock on the table. If another transaction performed a SELECT on a provisioned table, adding/dropping columns from the provisioned table would have to wait until the other transaction to complete. If the other transaction happened to be waiting for the add/drop column transaction (in this case, updating exp.dataclass table), the two would dead lock.
This PR adds a ACCESS EXCLUSIVE MODE LOCK on the provisioned table when it's queried in the transaction.
Related Pull Requests
Changes