This repository was archived by the owner on Jan 14, 2026. It is now read-only.
ADBDEV-3685 Error handling for diskquota worker startup stage #20
Merged
4451314 to
2bb759b
Compare
b5bf85b to
5748cba
Compare
During the diskquota worker's first run, the initial set of active tables with their sizes is loaded from the diskquota.table_size table in order to warm up the diskquota rejectmap and other shared memory objects. If an error occurs during this initialization, it is silently ignored in the PG_CATCH() block. As a result, local_active_table_stat_map is not filled properly, and at the next loop iteration the tables that are not in the active table list are marked as irrelevant and scheduled for deletion from both table_size_map and the table_size table in the flush_to_table_size function. When the initial set of active tables is huge (thousands of tables), ignoring the error could lead to a delete statement that is too long for the SPI executor to process within its memory limits. This can cause a segmentation fault in the worker or other erroneous behaviour of the whole extension.

This commit proposes handling of the initialization errors that occur during the worker's first run. A bool variable "corrupted" is added to the DiskquotaDBEntry structure to indicate that the worker was not able to initialize itself on the given database. DiskquotaDBEntry is now also passed to the refresh_disk_quota_model function from the worker main loop, because the state of dbEntry needs to be changed. The state is changed when the refresh_disk_quota_usage function catches, in its PG_CATCH() block, an error that occurred during the initialization step. After the error is caught, the "corrupted" flag is set in the given dbEntry and the error is rethrown, which terminates the worker process. The launcher will not be able to start the worker again, because the flag is set in the database structure and is checked inside the disk_quota_launcher_main function. The flag can be reset by calling the resetBackgroundWorkerCorruption function, which is currently called in the SIGHUP handler.
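The failure mode described above can be illustrated with a small standalone sketch. It is not the extension's code: `TableSizeEntry`, `is_active` and `mark_irrelevant` are hypothetical names, and plain arrays stand in for `table_size_map` and the active table list. It only shows why an active list left empty by a swallowed initialization error causes every known table to be marked for deletion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of table_size_map. */
typedef struct
{
    unsigned int reloid;      /* table identifier */
    long         size;        /* last known size in bytes */
    bool         need_delete; /* set when the table is considered irrelevant */
} TableSizeEntry;

/* Returns true if reloid is present in the active table list. */
static bool
is_active(const unsigned int *active, size_t n_active, unsigned int reloid)
{
    for (size_t i = 0; i < n_active; i++)
        if (active[i] == reloid)
            return true;
    return false;
}

/*
 * Marks every entry that is missing from the active list for deletion
 * and returns how many entries were marked.
 */
static size_t
mark_irrelevant(TableSizeEntry *entries, size_t n_entries,
                const unsigned int *active, size_t n_active)
{
    size_t marked = 0;

    assert(entries != NULL || n_entries == 0);
    for (size_t i = 0; i < n_entries; i++)
    {
        entries[i].need_delete =
            !is_active(active, n_active, entries[i].reloid);
        if (entries[i].need_delete)
            marked++;
    }
    return marked;
}
```

With a correctly loaded active list nothing is marked. With an empty list, which is what a silently ignored initialization error leaves behind in this simplified model, the number of marked entries equals the total number of known tables.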
5748cba to
215f525
Compare
Could you get the patch from the PR and run the upgrade test with your changes?
Member
Author
I have run the upgrade tests in Docker with my changes and Eugene's changes. All tests passed.
red1452
reviewed
Jun 20, 2023
red1452
previously approved these changes
Jun 20, 2023
red1452
previously approved these changes
Jun 23, 2023
Member
I think that in the latest version this is no longer true and another rationale for this patch should be found.
RekGRpth
reviewed
Jun 27, 2023
18dbe0d to
21525c3
Compare
Member
Author
I have rewritten the PR description.
RekGRpth
approved these changes
Jun 27, 2023
red1452
approved these changes
Jun 27, 2023
Stolb27
added a commit
that referenced
this pull request
Jul 25, 2023
Stolb27
pushed a commit
that referenced
this pull request
Jul 25, 2023
Cherry-picked-from: 3b06e37 to reapply above c2686c9
During the diskquota worker's first run, the initial set of active tables with their sizes is loaded from the `diskquota.table_size` table in order to warm up the diskquota rejectmap and other shared memory objects. If an error occurs during this initialization, it is silently ignored in the `PG_CATCH()` block. Ignoring it is potentially harmful and can lead to undesired behaviour of the whole extension. For example, if an error occurs during initialization, `local_active_table_stat_map` will not be filled properly. At the next loop iteration, the tables that are not in the active table list will then be marked as irrelevant and scheduled for deletion from both `table_size_map` and the `table_size` table in the `flush_to_table_size` function. This produces extra performance load and is not guaranteed to be safe.

This commit proposes handling of the initialization errors that occur during the worker's first run. A bool variable `corrupted` is added to the `DiskquotaDBEntry` structure to indicate that the worker was not able to initialize itself on the given database. `DiskquotaDBEntry` is now also passed to the `refresh_disk_quota_model` function from the worker main loop, because the state of `dbEntry` needs to be changed. The state is changed when the `refresh_disk_quota_usage` function catches, in its `PG_CATCH()` block, an error that occurred during the initialization step. After the error is caught, the `corrupted` flag is set in the given `dbEntry` and the error is rethrown, which terminates the worker process. The launcher will not be able to start the worker again, because the flag is set in the database structure and is checked inside the `disk_quota_launcher_main` function. The flag can be reset by calling the `resetBackgroundWorkerCorruption` function, which is currently called in the SIGHUP handler.
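The control flow described above (catch the error, set the flag, propagate the failure, refuse to restart, reset on request) can be sketched as a standalone simulation. This is not the patch itself. It assumes `setjmp`/`longjmp` as a stand-in for `PG_TRY()`/`PG_CATCH()` so that it compiles without PostgreSQL headers, models only the new flag of `DiskquotaDBEntry`, and uses the hypothetical helpers `load_initial_table_sizes`, `refresh_usage`, `launcher_may_start_worker` and `reset_corruption`. A `false` return value stands in for rethrowing the error and terminating the worker.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for DiskquotaDBEntry; only the new flag is modeled. */
typedef struct
{
    unsigned int dbid;
    bool         corrupted; /* worker failed to initialize on this database */
} DiskquotaDBEntry;

/* Plays the role of the error stack behind PG_TRY()/PG_CATCH(). */
static jmp_buf error_ctx;

/* Hypothetical initialization step: "throws" when fail is true. */
static void
load_initial_table_sizes(bool fail)
{
    if (fail)
        longjmp(error_ctx, 1);
}

/*
 * Sketch of the described behaviour: on an error during the first run,
 * record the failure in the database entry, then report it to the caller
 * instead of swallowing it.
 */
static bool
refresh_usage(DiskquotaDBEntry *entry, bool is_init, bool fail)
{
    assert(entry != NULL);

    if (setjmp(error_ctx) == 0)
    {
        load_initial_table_sizes(fail); /* "try" branch */
        return true;
    }

    /* "catch" branch: only initialization errors mark the entry */
    if (is_init)
        entry->corrupted = true;
    return false;
}

/* Launcher-side check: a corrupted database gets no new worker. */
static bool
launcher_may_start_worker(const DiskquotaDBEntry *entry)
{
    return !entry->corrupted;
}

/* Stand-in for the reset that the description ties to the SIGHUP handler. */
static void
reset_corruption(DiskquotaDBEntry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++)
        entries[i].corrupted = false;
}
```

Keeping the flag in the per-database entry rather than in the worker means it outlives the terminated worker process, which is what lets the launcher see it; in the real extension that entry lives in shared memory, whereas here it is an ordinary struct.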