You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
PostgreSQL 18.3 backends terminate with SIGSEGV in pgsm_ExecutorEnd at pg_stat_monitor.c:781 during commit-time PortalCleanup. Crashes correlate tightly with autovacuum completion on the relation the failing query targets: the postmaster logs an autovacuum-done line for a table, and within 1–2 seconds a backend running a query against that same table segfaults. The postmaster then terminates all other backends, replays WAL, and the database returns in ~1 second.
This is reproducible across four independent RDS PostgreSQL 18.3 instances sharing the same shared_preload_libraries setting. A control instance in the same account with pg_stat_monitor absent from shared_preload_libraries has zero crashes over the same window. Removing pg_stat_monitor from shared_preload_libraries is a confirmed workaround — all four instances have been crash-free since the extension was removed on 2026-04-30.
Note: This report was prepared with AI assistance. Some information is inferred rather than directly observable because the affected instances run on AWS RDS (a managed service), which does not expose the unsanitized stack trace, the exact binary build, or /rdsdbdata/log/error/ core dumps to customers. AWS Support extracted the sanitized stack trace below on our behalf. We have provided everything we were able to obtain.
Expected Results
pgsm_ExecutorEnd should safely no-op or skip its stats update when the executor state it relies on has been invalidated during portal cleanup, rather than dereferencing a NULL/freed pointer and crashing the backend.
#0 pgsm_ExecutorEnd (queryDesc=...) at pg_stat_monitor.c:781
#1 PortalCleanup (portal=...) at portalcmds.c:309
#2 PortalDrop (portal=..., isTopCommit=true) at portalmem.c:502
#3 PreCommit_Portals (isPrepare=false) at portalmem.c:756
#4 CommitTransaction () at xact.c:2269
#5 CommitTransactionCommandInternal () at xact.c:3202
#6 CommitTransactionCommand () at xact.c:3165
#7 finish_xact_command () at postgres.c:2832
#8 PostgresMain (...) at postgres.c:4969
#9 BackendMain (...) at backend_startup.c:129
#10 postmaster_child_launch (...) at launch_backend.c:290
#11 BackendStartup (...) at postmaster.c:3601
#12 ServerLoop () at postmaster.c:1716
#13 PostmasterMain (...) at postmaster.c:1414
#14 main (...) at main.c:227
There is a NULL-check on queryDesc->totaltime at line 742, but line 781 later dereferences queryDesc->totaltime->total without re-checking. If the structure pointed to by queryDesc->totaltime becomes invalid (freed / replaced / its contents invalidated) between 742 and 781, the dereference at 781 faults.
The fact that the crash is reached specifically via PortalCleanup → PortalDrop in the commit path, and is tightly correlated with autovacuum completing on the queried relation, points at executor-state invalidation concurrent with pgsm_ExecutorEnd's read.
Representative crash log (one of ~18 across 4 instances in 3 days)
2026-04-30 16:56:56 UTC LOG: automatic vacuum of table "postgres.public.tool_execution": index scans: 1
pages: 0 removed, 6 remain, 6 scanned (100.00% of total), 0 eagerly scanned
tuples: 4 removed, 52 remain, 0 are dead but not yet removable
...
2026-04-30 16:56:58 UTC LOG: client backend (PID 12539) was terminated by signal 11: Segmentation fault
2026-04-30 16:56:58 UTC DETAIL: Failed process was running:
SELECT * FROM tool_execution WHERE task_id = $1 ORDER BY started_at
2026-04-30 16:56:58 UTC LOG: terminating any other active server processes
2026-04-30 16:56:58 UTC LOG: [pg_stat_monitor] pgsm_shmem_shutdown: Shutdown initiated.
2026-04-30 16:56:59 UTC LOG: database system was interrupted; last known up at 2026-04-30 16:53:15 UTC
2026-04-30 16:56:59 UTC LOG: database system was not properly shut down; automatic recovery in progress
2026-04-30 16:56:59 UTC LOG: redo starts at 48/64000750
2026-04-30 16:56:59 UTC LOG: redo done at 48/6800BA70
2026-04-30 16:56:59 UTC LOG: database system is ready to accept connections
Observed autovacuum → crash correlation on one instance
The rds_heartbeat2 bias reflects the RDS-internal monitor polling that table every few seconds, which makes it statistically the most likely query to fall in the post-autovacuum window — it is not a rdsadmin-specific pattern. The tool_execution crash is an application query on an application table.
Version
PostgreSQL 18.3 (AWS RDS managed, default build)
pg_stat_monitor extversion = 2.3 (per pg_available_extensions). The .control file ships version 2.3; PG18 support was introduced in release 2.3.1 (2025-11-28, issue support for postgres 18 #566), so the loaded binary is 2.3.1 or newer. We have asked AWS Support whether they apply downstream patches and will update this issue if we receive details.
Affected instances: 4/4 with pg_stat_monitor in shared_preload_libraries. Control instance in same account/region without pg_stat_monitor: 0 crashes in the same window.
Steps to reproduce
We have not produced a minimal reproducer. Observed environment for what is a reliable reproduction in our setup:
RDS PostgreSQL 18.3 with shared_preload_libraries = pg_stat_statements,pg_hint_plan,auto_explain,pg_stat_monitor,pg_cron
Any table receiving regular-but-small write activity so autovacuum runs frequently (we observed it on both a 52-row application table and on rdsadmin.pg_catalog.rds_heartbeat2 which is polled constantly by RDS's heartbeat monitor)
Continuous SELECT workload against the same relation
Over a 3-day window we observed ~5 crashes per instance (18 total across 4 instances). Every crash occurred within 1–2s of an autovacuum-done log entry for the relation being queried.
Relevant logs
See the representative crash log above. Every recovery sequence explicitly logs [pg_stat_monitor] pgsm_shmem_shutdown: Shutdown initiated.
Workaround
Removing pg_stat_monitor from shared_preload_libraries (requires reboot) has eliminated crashes on all four affected instances since 2026-04-30. AWS Support independently recommended the same workaround.
Related issues reviewed
Confirmed distinct from existing reports:
memory leak in pg_stat_monitor #628 (OPEN, memory leak on PG18) — different failure mode (leak vs. SIGSEGV), different code path (shmem shutdown vs. executor hook).
Description
PostgreSQL 18.3 backends terminate with
SIGSEGVinpgsm_ExecutorEndatpg_stat_monitor.c:781during commit-timePortalCleanup. Crashes correlate tightly with autovacuum completion on the relation the failing query targets: the postmaster logs an autovacuum-done line for a table, and within 1–2 seconds a backend running a query against that same table segfaults. The postmaster then terminates all other backends, replays WAL, and the database returns in ~1 second.This is reproducible across four independent RDS PostgreSQL 18.3 instances sharing the same
shared_preload_librariessetting. A control instance in the same account with pg_stat_monitor absent fromshared_preload_librarieshas zero crashes over the same window. Removingpg_stat_monitorfromshared_preload_librariesis a confirmed workaround — all four instances have been crash-free since the extension was removed on 2026-04-30.Expected Results
pgsm_ExecutorEndshould safely no-op or skip its stats update when the executor state it relies on has been invalidated during portal cleanup, rather than dereferencing a NULL/freed pointer and crashing the backend.Actual Results
Sanitized stack trace (from AWS Support, crash on 2026-04-30 16:56:58 UTC, backend PID 12539)
Crash site in upstream 2.3.1
pg_stat_monitor.c:781at tag2.3.1:There is a NULL-check on
queryDesc->totaltimeat line 742, but line 781 later dereferencesqueryDesc->totaltime->totalwithout re-checking. If the structure pointed to byqueryDesc->totaltimebecomes invalid (freed / replaced / its contents invalidated) between 742 and 781, the dereference at 781 faults.The fact that the crash is reached specifically via
PortalCleanup → PortalDropin the commit path, and is tightly correlated with autovacuum completing on the queried relation, points at executor-state invalidation concurrent withpgsm_ExecutorEnd's read.Representative crash log (one of ~18 across 4 instances in 3 days)
Observed autovacuum → crash correlation on one instance
SELECT value FROM rds_heartbeat2rdsadmin.pg_catalog.rds_heartbeat2(completed 08:27:17)SELECT value FROM rds_heartbeat2rdsadmin.pg_catalog.rds_heartbeat2(completed 01:31:37)SELECT value FROM rds_heartbeat2rdsadmin.pg_catalog.rds_heartbeat2(completed 22:56:28)SELECT value FROM rds_heartbeat2rdsadmin.pg_catalog.rds_heartbeat2(completed 16:03:07)SELECT * FROM tool_execution WHERE task_id = $1 ORDER BY started_atpostgres.public.tool_execution(completed 16:56:56)The
rds_heartbeat2bias reflects the RDS-internal monitor polling that table every few seconds, which makes it statistically the most likely query to fall in the post-autovacuum window — it is not ardsadmin-specific pattern. Thetool_executioncrash is an application query on an application table.Version
extversion = 2.3(perpg_available_extensions). The.controlfile ships version2.3; PG18 support was introduced in release 2.3.1 (2025-11-28, issue support for postgres 18 #566), so the loaded binary is 2.3.1 or newer. We have asked AWS Support whether they apply downstream patches and will update this issue if we receive details.us-east-1shared_preload_libraries = pg_stat_statements,pg_hint_plan,auto_explain,pg_stat_monitor,pg_cronAffected instances: 4/4 with pg_stat_monitor in
shared_preload_libraries. Control instance in same account/region without pg_stat_monitor: 0 crashes in the same window.Steps to reproduce
We have not produced a minimal reproducer. Observed environment for what is a reliable reproduction in our setup:
shared_preload_libraries = pg_stat_statements,pg_hint_plan,auto_explain,pg_stat_monitor,pg_cronrdsadmin.pg_catalog.rds_heartbeat2which is polled constantly by RDS's heartbeat monitor)SELECTworkload against the same relationRelevant logs
See the representative crash log above. Every recovery sequence explicitly logs
[pg_stat_monitor] pgsm_shmem_shutdown: Shutdown initiated.Workaround
Removing
pg_stat_monitorfromshared_preload_libraries(requires reboot) has eliminated crashes on all four affected instances since 2026-04-30. AWS Support independently recommended the same workaround.Related issues reviewed
Confirmed distinct from existing reports:
pg_stat_monitor_internalat line 2569 (the stats reporting path, viatuplestore_putvalues). Our crash is inpgsm_ExecutorEndat line 781 (the executor hook path). Different code path and different engine major.pgsm_ExecutorEnd → PortalCleanup → PortalDropprefix, but an LWLock hang rather than a SIGSEGV, on pgsm 2.1.0. Related enough that the hook-during-portal-cleanup pattern has history.Code of Conduct