Skip to content

PG-2395 Fetch datname and username from syscache in pgsm_store()#658

Open
AndersAstrand wants to merge 1 commit into
percona:mainfrom
AndersAstrand:defer-syscache-to-store
Open

PG-2395 Fetch datname and username from syscache in pgsm_store()#658
AndersAstrand wants to merge 1 commit into
percona:mainfrom
AndersAstrand:defer-syscache-to-store

Conversation

@AndersAstrand
Copy link
Copy Markdown
Contributor

@AndersAstrand AndersAstrand commented May 25, 2026

Resolve them from the catalog in pgsm_store() rather than in pgsm_create_hash_entry(). We have a bug report we have not been able to reproduce where queryDesc->totaltime is freed mid-call in pgsm_ExecutorEnd(), most likely by cache invalidation processing during these syscache lookups when they ran inside pgsm_create_hash_entry(). The misbehaving callback is most likely from another extension or the RDS fork of PostgreSQL, since standard invalidation callbacks does not seem to touch the relevant memory context.

I didn't change the undefined behavior of passing NULL to snprintf() in this PR, but we should fix it in the future (i believe glibc inserts a placeholder like (null) for us which is the existing behavior before this PR.)

Resolve them from the catalog in pgsm_store() rather than in
pgsm_create_hash_entry(). We have a bug report we have not been
able to reproduce where queryDesc->totaltime is freed mid-call in
pgsm_ExecutorEnd(), most likely by cache invalidation processing
during these syscache lookups when they ran inside
pgsm_create_hash_entry(). The misbehaving callback is most likely
from another extension or the RDS fork of PostgreSQL, since
standard invalidation callbacks does not seem to touch the relevant
memory context.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 25, 2026

Codecov Report

❌ Patch coverage is 60.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.17%. Comparing base (5950916) to head (4ff6890).

Files with missing lines Patch % Lines
pg_stat_monitor.c 60.00% 0 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #658      +/-   ##
==========================================
- Coverage   86.32%   86.17%   -0.16%     
==========================================
  Files           3        3              
  Lines        1346     1345       -1     
  Branches      217      218       +1     
==========================================
- Hits         1162     1159       -3     
  Misses         87       87              
- Partials       97       99       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@AndersAstrand AndersAstrand changed the title Fetch datname and username from syscache in pgsm_store() PG-2395 Fetch datname and username from syscache in pgsm_store() May 25, 2026
Copy link
Copy Markdown
Contributor

@jeltz jeltz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel the commit message actually explains why this fix makes sense. Is it because it is now allocated in another memory context? If so that was not clear from the commit message.

Comment thread pg_stat_monitor.c
extract_query_comments(query, comments, sizeof(comments));
comments_len = strlen(comments);

if (IsTransactionState() && strlen(entry->datname) == 0)
Copy link
Copy Markdown
Contributor

@jeltz jeltz May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I am a fan pf this kind of trying to set multiple times. When can it already be set? At least I think we should document that.

@AndersAstrand
Copy link
Copy Markdown
Contributor Author

AndersAstrand commented May 26, 2026

I don't feel the commit message actually explains why this fix makes sense. Is it because it is now allocated in another memory context? If so that was not clear from the commit message.

That's great feedback! It's hard to write with the right level of detail when you have all the context in your head that the reader probably doesn't :). Here is a full explanation of my reasoning. Do you think all of this should go in the commit message?

The bug report #651 includes a stack trace that shows that by the time we read queryDesc->totaltime, the pointer isn't valid.

A lot of my analysis was based on what we do differently compared to pg_stat_statements which our use of queryDesc->totaltime is based on, and nothing related to that pointer specifically differed. We allocate and set the pointer at the same time in the same way in the same relation to hook chaining. And we read the pointer at the same time in the same way and with the same relation to hook chaining.

Now the issue could also be present in pg_stat_statements and the crash would happen there if the two were swapped in the shared_preload_libraries, but this felt unlikely to me.

One of the things we do, that pg_stat_statements does not, is doing syscache lookups before we read the contents of queryDesc->totaltime. And since the reporter of the bug saw a correlation with aggressive autovacuum I started to suspect that maybe the callback chains triggered by the syscache lookups processing the invalidation messages produced by autovacuum could have something to do with it. Possibly because some other extension either ruined the memory context we had allocated totaltime in, or maybe overwrote the pointer with something that was just garbage after they were done with it.

I had no way of verifying this suspicion however since I was unable to reproduce the bug, so I simply moved the syscache lookups to somewhere after we're done with queryDesc->totaltime. pgsm_store() seemed reasonable.

@jeltz
Copy link
Copy Markdown
Contributor

jeltz commented May 28, 2026

But if the issue is totaltime why would you think changing how we work with datname and username would help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants