Re-implement Doublewrite buffer encryption#3
Open
satya-bodapati wants to merge 21 commits into
…ace encryption) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-3822 "InnoDB system tablespace encryption" https://jira.percona.com/browse/PS-3822 (commit 78b6114) to make parallel doublewrite part of the upstream 8.0.20 merge easier. Temporarily disabled the following MTR test cases: - 'innodb.percona_parallel_dblwr_encrypt' - 'innodb.percona_sys_tablespace_encrypt' - 'innodb.percona_sys_tablespace_encrypt_dblwr' - 'sys_vars.innodb_parallel_dblwr_encrypt_basic' - 'sys_vars.innodb_sys_tablespace_encrypt_basic'
…b_doublewrite file when innodb_doublewrite is disabled) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-3411 "LP #1570682: Parallel doublewrite buffer file created when skip-innodb_doublewrite is set" https://jira.percona.com/browse/PS-3411 (commit 14318e4) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…must crash server on I/O error) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-5678 "Parallel doublewrite must crash server on I/O error" https://jira.percona.com/browse/PS-5678 (commit 0f810d7) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…rotation. ALPHA) https://jira.percona.com/browse/PS-6789 Temporarily reverted 'buf0dblwr.cc' part of the PS-3829 "Innodb key rotation. ALPHA" https://jira.percona.com/browse/PS-3829 (commit c7f44ee) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…d to set O_DIRECT on xb_doublewrite when running MTR test cases) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-1068 "Fix bug 1669414 (Failed to set O_DIRECT on xb_doublewrite when running MTR test cases)" https://jira.percona.com/browse/PS-1068 (commit 7f41824) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…lel doublewrite memory not freed with innodb_fast_shutdown=2) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-1707 "LP #1578139: Parallel doublewrite memory not freed with innodb_fast_shutdown=2" https://jira.percona.com/browse/PS-1707 (commit 8a53ed7) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
… implementation (Implement parallel doublewrite) https://jira.percona.com/browse/PS-6789 Reverted 'parallel-doublewrite' blueprint implementation "Implement parallel doublewrite" https://blueprints.launchpad.net/percona-server/+spec/parallel-doublewrite (commit 4596aaa) to make parallel doublewrite part of the upstream 8.0.20 merge easier. Temporarily disabled the following MTR test cases: - 'sys_vars.innodb_parallel_doublewrite_path_basic' - 'innodb.percona_doublewrite'
https://jira.percona.com/browse/PS-6789 *** Updated man pages from MySQL Server 8.0.20 source tarball. *** Updated 'scripts/fill_help_tables.sql' from MySQL Server 8.0.20 source tarball.
https://jira.percona.com/browse/PS-6789 *** Reverted our fix for PS-6094 "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception" (https://jira.percona.com/browse/PS-6094) (commit 31b5c73) in favor of the upstream fix for the Bug #30561920 / #97682 "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception" (https://bugs.mysql.com/bug.php?id=97682) (commit mysql/mysql-server@72c6171). *** Reverted our fix for PS-3630 "LP #1660255: Test innodb.innodb_mysql is unstable" (https://jira.percona.com/browse/PS-3630) (commit e0b5050) in favor of the upstream fix for the Bug #30810572 "FIX INNODB-MYSQL TEST" (commit mysql/mysql-server@2692669). *** Reverted our 8.0.17 merge postfix "PS-5363 (Merge MySQL 8.0.17): fixed regexps in the rpl.rpl_perfschema_threads_processlist_status MTR test case" (https://jira.percona.com/browse/PS-5363) (commit 8d7dd4a) affecting 'rpl.rpl_perfschema_threads_processlist_status' MTR test case in favor of the changes made by upstream in WL#3549 "Binlog Compression" (commit mysql/mysql-server@1e5ae34). *** Reverted our 8.0.18 merge postfix "PS-5674: gen_lex_token generator reworked" (https://jira.percona.com/browse/PS-5674) (commit 214212a) in favor of the changes made by upstream Bug #30765691 "FREE TOKEN SLOTS ARE EXHAUSTED IN GEN_LEX_TOKEN.CC" (commit mysql/mysql-server@17ca03f). 'SYM_PERCONA()' macro preserved and made a synonym for upstream's 'SYM()'. Percona Server 5.7-specific tokens - CHANGED_PAGE_BITMAPS_SYM - CLIENT_STATS_SYM - CLUSTERING_SYM - COMPRESSION_DICTIONARY_SYM - INDEX_STATS_SYM - TABLE_STATS_SYM - THREAD_STATS_SYM - USER_STATS_SYM - ENCRYPTION_KEY_ID_SYM explicitly assigned values starting from 1300. The same values were assigned to them implicitly in Percona Server 8.0.19. Percona Server 8.0-specific tokens - EFFECTIVE_SYM - SEQUENCE_TABLE_SYM explicitly assigned values starting from 1350. This group has different values than in Percona Server 8.0.19. 
*** Similarly to other 'innodb.log_encrypt_<n>' MTR test cases 'innodb.log_encrypt_7' coming from upstream 8.0.20 cloned into two 'innodb.log_encrypt_7_mk' and 'innodb.log_encrypt_7_rk'. *** Similarly to other 'innodb.table_encrypt_<n>' MTR test cases 'innodb.table_encrypt_6' coming from upstream 8.0.20 cloned into three 'innodb.table_encrypt_6', 'keyring_vault.table_encrypt_6' and 'keyring_vault.table_encrypt_6_directory'. *** VERSION raised to "8.0.20-11". univ.i version raised to "11".
https://jira.percona.com/browse/PS-6789 In the fix for Bug #30508721 "MTR DOESN'T KEEP TRACK OF THE STATE OF INNODB MONITORS" (commit mysql/mysql-server@abd33c2) Oracle extended MTR 'check-testcase' procedure with additional comparison of data from InnoDB metrics state. They also introduced 'mysql-test/include/innodb_monitor_restore.inc' MTR include file that is supposed to reset InnoDB monitors to their default state. 'mysql-test/include/innodb_monitor_restore.inc' extended with enabling Percona-specific monitors, those that are enabled (defined with 'MONITOR_DEFAULT_ON' flag) by default. Similarly to what was done in the upstream patch "SET GLOBAL innodb_monitor_enable=default;" "SET GLOBAL innodb_monitor_disable=default;" "SET GLOBAL innodb_monitor_reset_all=default;" statement sequences were substituted with '--source include/innodb_monitor_restore.inc' all over the test code. As the result, fixed the following MTR test cases: - 'innodb.innodb_idle_flush_pct' - 'innodb.lock_contention_big' - 'innodb.monitor' - 'innodb.percona_ahi_partitions' - 'innodb.percona_changed_page_bmp_flush_5446' - 'innodb.transportable_tbsp-debug' - 'innodb_zip.transportable_tbsp_debug_zip' - 'sys_vars.innodb_monitor_disable_basic' - 'sys_vars.innodb_monitor_enable_basic' - 'sys_vars.innodb_monitor_reset_all_basic' - 'sys_vars.innodb_monitor_reset_basic' - 'sys_vars.innodb_purge_run_now_basic' - 'sys_vars.innodb_purge_stop_now_basic'
…ated MTR test cases https://jira.percona.com/browse/PS-6789 The following MTR test cases re-recorded because of the 'filesort' improvements introduced in the fix for Oracle's Bug #30776132 "MAKE FILESORT KEYS CONSISTENT BETWEEN FIELDS AND ITEMS" (commit mysql/mysql-server@6d587a6) - 'main.pool_of_threads' - 'main.pool_of_threads_high_prio_tickets'. The following MTR test cases re-recorded because of the changed execution plan (more hash joins instead of block nested loops) introduced in these improvements: Bug #30528604 "DELETE THE PRE-ITERATOR EXECUTOR" (commit mysql/mysql-server@ef166f8), Bug #30473261 "CONVERT THE INDEX SUBQUERY ENGINES INTO USING THE ITERATOR EXECUTOR" (commit mysql/mysql-server@cb4116e) (commit mysql/mysql-server@629b549) (commit mysql/mysql-server@5a41fba) (commit mysql/mysql-server@31bd903) (commit mysql/mysql-server@75bbe1b) (commit mysql/mysql-server@6226c1a) (commit mysql/mysql-server@0b45e96) (commit mysql/mysql-server@8e45d7e) (commit mysql/mysql-server@7493ae4) (commit mysql/mysql-server@a5f60bf) (commit mysql/mysql-server@609b86e), Bug #30912972 "ASSERTION `KEYLEN == M_START_KEY.LENGTH' FAILED" (commit mysql/mysql-server@b28bea5) - 'audit_log.audit_log_filter_db' - 'main.pool_of_threads' - 'main.pool_of_threads_high_prio_tickets' - 'main.percona_expand_fast_index_creation' - 'main.percona_sequence_table'
https://jira.percona.com/browse/PS-6789 Re-recorded 'main.bug74778' MTR test case because of the new 'SHOW_ROUTINE' privilege implemented by Oracle in WL #9049 "Add a dynamic privilege for stored routine backup" (https://dev.mysql.com/worklog/task/?id=9049) (commit mysql/mysql-server@3e41e44)
… MTR test case https://jira.percona.com/browse/PS-6789 Re-recorded 'main.backup_locks_mysqldump' MTR test case because of the new default 'mysqldump' network timeout introduced in the fix for Oracle Bug #30755992 / #98203 "mysql dump sufficiently long network timeout too short" (https://bugs.mysql.com/bug.php?id=98203) (commit mysql/mysql-server@1f90fad)
https://jira.percona.com/browse/PS-6789 Re-recorded 'main.bug88797' MTR test case because of the new deprecation warning introduced in the implementation of WL #13325 "Deprecate VALUES syntax in INSERT ... ON DUPLICATE KEY UPDATE" (https://dev.mysql.com/worklog/task/?id=13325) (commit mysql/mysql-server@6f3b9df)
…test cases with explicit binlog positions https://jira.percona.com/browse/PS-6789 Fixed/re-recorded the following MTR test cases because of the changes in the implementation of WL #3549 "Binlog: compression" (https://dev.mysql.com/worklog/task/?id=3549) (commit mysql/mysql-server@1e5ae34) that increased the 'Format_description_event' binlog event size and therefore shifted some pre-recorded binary log positions in the '.result' files. - 'main.backup_safe_binlog_info' - 'main.mysqldump-max' - 'binlog.percona_binlog_consistent_mixed' - 'binlog.percona_binlog_consistent_row' - 'binlog.percona_binlog_consistent_stmt' - 'binlog.percona_binlog_consistent_debug'
…space encryption) https://jira.percona.com/browse/PS-6789 1. Re-enable system tablespace encryption again after 8.0.20 upstream merge that has new parallel doublewrite implementation (https://jira.percona.com/browse/PS-3822). 2. Removed 'innodb.percona_sys_tablespace_encrypt_dblwr' MTR test case as there is no doublewrite buffer in system tablespace anymore.
…29 (Innodb key rotation. ALPHA) https://jira.percona.com/browse/PS-6789 Restored 'buf0dblwr.cc' part of the PS-3829 "Innodb key rotation. ALPHA" https://jira.percona.com/browse/PS-3829 (commit c7f44ee) after upstream 8.0.20 merge. The following MTR test cases do not crash anymore - 'encryption.upgrade_crypt_data_57_v1' - 'encryption.upgrade_crypt_data_v1' - 'innodb.innodb_scrub' - 'main.percona_dd_upgrade_encrypted'
…in.percona_signal_handling_threadpool MTR test cases https://jira.percona.com/browse/PS-6789 Fixed and re-recorded 'main.percona_signal_handling' and 'main.percona_signal_handling_threadpool' MTR test in response to the changes in the Bug #30578923 "SENDING SIGHUP CAUSES A LOT OF GARBAGE TO BE PRINTED" (commit mysql/mysql-server@b90a1b3). Removed "Status information:" log section is now simulated via 'DBUG_EXECUTE_IF()'. MTR test cases made debug-only.
percona-ysorokin
requested changes
Jun 11, 2020
./mtr --mem innodb.percona_parallel_dblwr_encrypt{,,,} --parallel=4 --repeat=20 [ 93%] innodb.percona_parallel_dblwr_encrypt w4 [ pass ] 11147
3708733 to f6f0876
81e3869 to 6c5dc40
percona-ysorokin
pushed a commit
that referenced
this pull request
Sep 17, 2020
…o: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded
Problem
=======
Running mtr with an ASAN build on Gentoo fails since the path to
libtirpc is not /lib64/libtirpc.so, which is the path mtr uses for
preloading the library.
Furthermore, the libasan path on Gentoo may also contain underscores and
minus signs, which mtr safe_process does not recognize.
Fails on Gentoo since /lib64/libtirpc.so does not exist
+ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Fails on Gentoo since /usr/lib64/libtirpc.so is a GNU LD script
+ERROR: ld.so: object '/usr/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (invalid ELF header): ignored.
Need to preload /lib64/libtirpc.so.3 on Gentoo.
When compiling with GNU C++, the libasan path also includes minus signs and underscores:
$ less mysql-test/lib/My/SafeProcess/ldd_asan_test_result
linux-vdso.so.1 (0x00007ffeba962000)
libasan.so.4 => /usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4 (0x00007f3c2e827000)
Tests that have been affected in different ways are, for example:
$ ./mtr group_replication.gr_clone_integration_clone_not_installed
[100%] group_replication.gr_clone_integration_clone_not_installed w3 [ fail ]
...
ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
mysqltest: At line 21: Query 'START GROUP_REPLICATION' failed.
ERROR 2013 (HY000): Lost connection to MySQL server during query
...
ASAN:DEADLYSIGNAL
=================================================================
==11970==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc
0x7f0e5cecfb8c bp 0x7f0e340f1650 sp 0x7f0e340f0dc8 T44)
==11970==The signal is caused by a READ memory access.
==11970==Hint: address points to the zero page.
#0 0x7f0e5cecfb8b in xdr_uint32_t (/lib64/libc.so.6+0x13cb8b)
#1 0x7f0e5fbe6d43
(/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4+0x87d43)
#2 0x7f0e3c675e59 in xdr_node_no
plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:88
#3 0x7f0e3c67744d in xdr_pax_msg_1_6
plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:852
...
$ ./mtr ndb.ndb_config
[100%] ndb.ndb_config [ fail ]
...
--- /.../src/mysql-test/suite/ndb/r/ndb_config.result 2019-06-25
21:19:08.308997942 +0300
+++ /.../bld/mysql-test/var/log/ndb_config.reject 2019-06-26
11:58:11.718512944 +0300
@@ -30,16 +30,22 @@
== 16 == bug44689
192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.1 192.168.0.1
== 17 == bug49400
+ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
+ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be
preloaded (cannot open shared object file): ignored.
ERROR -- at line 25: TCP connection is a duplicate of the existing TCP
link from line 14
ERROR -- at line 25: Could not store section of configuration file.
$ ./mtr ndb.ndb_basic
[100%] ndb.ndb_basic [ pass ] 34706
ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
Solution
========
In safe_process, use the same trick for libtirpc as for libasan to determine
the path to the library for preloading.
Also allow underscores and minus signs in paths.
In addition, add some memory leak suppressions for perl.
Change-Id: Ia02e354a20cf8b279eb2573f3f8c2c39776343dc
(cherry picked from commit e88706d)
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 18, 2021
To call a service implementation one needs to:
1. query the registry to get a reference to the service needed
2. call the service via the reference
3. call the registry to release the reference
While #2 is very fast (just a function pointer call), #1 and #3 can be expensive since they need to interact with the registry's global structure in a read/write fashion. Hence, if the above sequence is to be repeated in quick succession, it is beneficial to do steps #1 and #3 just once and aggregate as many #2 steps as possible into a single sequence. This usually means caching the service reference received in #1 and delaying #3 for as long as possible.
But since an active reference to the service implementation is held until #3 is taken, special handling is needed to make sure that:
- The references are released at regular intervals so changes in the registry can become effective.
- There is a way to mark a service implementation as "inactive" ("dying") so that until all of the active references to it are released no new ones are possible.
All of the above is part of the current audit API machinery, but needs to be isolated into a separate service suite and made generally available to all services. This is what this worklog aims to implement.
RB#24806
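The acquire-once, call-many, release-periodically pattern described above can be sketched roughly as follows. This is only an illustration: `ToyRegistry`, `CachedServiceRef` and `max_calls_per_ref` are hypothetical names, not the actual registry or reference-caching service API, and the "inactive"/"dying" marking is not modelled.

```cpp
#include <cstddef>

// The "service" is just a function pointer, as in step #2 above.
using ServiceFn = int (*)(int);

// Hypothetical stand-in for the registry: acquire() and release() play the
// role of the expensive steps #1 and #3 and are merely counted here.
struct ToyRegistry {
  std::size_t acquires = 0;
  std::size_t releases = 0;
  ServiceFn acquire() {
    ++acquires;
    return [](int x) { return x + 1; };
  }
  void release() { ++releases; }
};

// Caches one reference and gives it back after max_calls_per_ref calls so
// that registry changes can become effective at regular intervals.
class CachedServiceRef {
 public:
  CachedServiceRef(ToyRegistry &registry, std::size_t max_calls_per_ref)
      : registry_(registry), max_calls_(max_calls_per_ref) {}
  CachedServiceRef(const CachedServiceRef &) = delete;
  CachedServiceRef &operator=(const CachedServiceRef &) = delete;
  ~CachedServiceRef() { drop(); }

  int call(int arg) {
    if (fn_ == nullptr) {  // step #1, done only when no reference is cached
      fn_ = registry_.acquire();
      calls_ = 0;
    }
    const int result = fn_(arg);       // step #2, the cheap part
    if (++calls_ >= max_calls_) drop();  // step #3, delayed but bounded
    return result;
  }

  void drop() {
    if (fn_ != nullptr) {
      registry_.release();
      fn_ = nullptr;
    }
  }

 private:
  ToyRegistry &registry_;
  std::size_t max_calls_;
  std::size_t calls_ = 0;
  ServiceFn fn_ = nullptr;
};
```

With a limit of 4 calls per reference, ten consecutive calls touch the toy registry three times for acquisition instead of ten.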
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 18, 2021
TABLESPACE STATE DOES NOT CHANGE THE SPACE TO EMPTY
After the commit for Bug#31991688, it was found that an idle system may not ever get around to truncating an undo tablespace when it is SET INACTIVE. Actually, it takes about 128 seconds before the undo tablespace is finally truncated.
There are three main tasks for the function trx_purge().
1) Process the undo logs and apply changes to the data files. (May be multiple threads)
2) Clean up the history list by freeing old undo logs and rollback segments.
3) Truncate undo tablespaces that have grown too big or are SET INACTIVE explicitly.
Bug#31991688 made sure that steps 2 & 3 are not done too often. Concentrating this effort keeps the purge lag from growing too large. By default, trx_purge() does step #1 128 times before attempting steps #2 & #3, which are called 'truncate' steps. This is set by the setting innodb_purge_rseg_truncate_frequency.
On an idle system, trx_purge() is called once per second if it has nothing to do in step 1. After 128 seconds, it will finally do step 2 (truncating the undo logs and rollback segments, which reduces the history list to zero) and step 3 (truncating any undo tablespaces that need it).
The function that the purge coordinator thread uses to make these repeated calls to trx_purge() is called srv_do_purge(). When trx_purge() returns having done nothing, srv_do_purge() returns to srv_purge_coordinator_thread(), which will put the purge thread to sleep. It is woken up again once per second by the master thread in srv_master_do_idle_tasks(), if not sooner by any of several other threads and activities. This is how an idle system can wait 128 seconds before the truncate steps are done and an undo tablespace that was SET INACTIVE can finally become 'empty'.
The solution in this patch is to modify srv_do_purge() so that if trx_purge() did nothing and there is an undo space that was explicitly set to inactive, it will immediately call trx_purge again with do_truncate=true so that steps #2 and #3 will be done. This does not affect the effort by Bug#31991688 to keep the purge lag from growing too big on sysbench UPDATE NO_KEY. With this change, the purge lag has to be zero and there must be a pending explicit undo space truncate before this extra call to trx_purge is done. Approved by Sunny in RB#25311
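The decision described above can be modelled as a small predicate. This is a simplified sketch, not the actual srv_do_purge() code: the `PurgeRound` struct, its field names and `want_truncate()` are hypothetical, and the regular cadence check is an assumption about how the frequency setting is applied.

```cpp
#include <cstdint>

// Hypothetical snapshot of what the purge coordinator knows after one
// trx_purge() call.
struct PurgeRound {
  std::uint64_t pages_handled;           // 0 means the call did nothing
  bool explicit_inactive_undo;           // an undo space was SET INACTIVE
  std::uint64_t batches_since_truncate;  // calls since the last truncate steps
  std::uint64_t truncate_frequency;      // innodb_purge_rseg_truncate_frequency
};

// Returns true when the next trx_purge() call should run with
// do_truncate=true (steps #2 and #3).
bool want_truncate(const PurgeRound &r) {
  // Regular cadence: truncate once every truncate_frequency batches.
  if (r.truncate_frequency != 0 &&
      r.batches_since_truncate >= r.truncate_frequency) {
    return true;
  }
  // Extra case from the patch description: nothing was purged (purge lag is
  // zero) and an explicit undo space truncate is pending, so do not wait.
  return r.pages_handled == 0 && r.explicit_inactive_undo;
}
```

A busy system (non-zero `pages_handled`) still follows the regular cadence, which matches the statement that the sysbench UPDATE NO_KEY behaviour is unaffected.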
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 18, 2025
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803
Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816
To reproduce this bug, follow the steps below.
client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;
client 2:
FLUSH TABLES WITH READ LOCK;
client 1:
INSERT INTO t2 VALUES (1);
So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as below
```
#0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
#1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
#2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
#3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
#4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
#5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
#6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```
Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row()
Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.
Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871
Differential Revision: D9417382
Pulled By: lth
fbshipit-source-id: c36c164e06c
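The shape of the fix can be illustrated with a toy model. The enums and `ToyHandler` below are hypothetical stand-ins that only mirror the names `TL_IGNORE` and `RDB_LOCK_*`; the mapping of the other lock types is an assumption made for the sketch, not the actual ha_rocksdb::store_lock() logic.

```cpp
// Hypothetical lock types; kIgnore plays the role of TL_IGNORE.
enum class LockType { kIgnore, kRead, kReadWithSharedLocks, kWrite };

// Hypothetical row lock modes mirroring RDB_LOCK_NONE/READ/WRITE.
enum class RowLock { kNone, kRead, kWrite };

struct ToyHandler {
  RowLock lock_rows = RowLock::kNone;

  void store_lock(LockType type) {
    // The fix: a kIgnore call comes from another session probing the lock,
    // so the row lock mode chosen by the owning statement must stay intact.
    if (type == LockType::kIgnore) return;

    if (type == LockType::kWrite) {
      lock_rows = RowLock::kWrite;
    } else if (type == LockType::kReadWithSharedLocks) {
      lock_rows = RowLock::kRead;
    } else {
      lock_rows = RowLock::kNone;
    }
  }
};
```

Without the early return, the probe from client 2 would fall through to the last branch and reset the mode that client 1 relies on in write_row().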
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 18, 2025
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903
Summary:
Currently, during a primary key's value encode, its ttl value can come from one of these 3 cases:
1. ttl column in primary key
2. non-ttl column
   a. old record (update case)
   b. current timestamp
3. ttl column in non-key field
Workflow #1: first, in Rdb_key_def::pack_record(), find and store pk_offset, then in value encode try to parse the key slice to fetch the ttl value by using pk_offset.
Workflow #3: fetch the ttl value from the ttl column.
The change is to merge #1 and #3 by always fetching the TTL value from the ttl column, no matter whether the ttl column is in the primary key or not. Of course, remove pk_offset, since it isn't used.
BTW, for secondary keys, the ttl value always comes from m_ttl_bytes, which is stored by primary value encoding.
Reviewed By: yizhang82
Differential Revision: D14662716
fbshipit-source-id: 6b4e5f044fd
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 18, 2025
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102
Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:
```
// bitmap is cleared on index merge, but it still needs to decode columns
bool field_requested =
decode_all_fields || m_verify_row_debug_checksums ||
bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.
Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*).
We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Overriding `column_bitmaps_signal` and setting up decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually uses handler::column_bitmaps_signal, this path hasn't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB delays this work: it actually initializes its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly.
In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init.
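The delayed decoder creation can be sketched as a flag that is set at init time and consumed on the first read. This is a toy model under stated assumptions: `ToyScan`, `set_read_set()` and the counters are hypothetical and do not reproduce the Rdb_converter interface.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct ToyScan {
  std::vector<bool> read_set;         // one flag per column
  std::vector<std::size_t> decoders;  // indexes of columns to decode
  bool decoders_pending = false;
  int decoder_setups = 0;

  // index_init / rnd_init only raise the flag; no decoders are built yet,
  // because the read set may still be fixed up afterwards.
  void index_init() { decoders_pending = true; }

  // Stand-in for the optimizer adjusting the column bitmap after init.
  void set_read_set(std::vector<bool> columns) { read_set = std::move(columns); }

  // The first read builds the decoders from the read set as it is now.
  // Returns the number of columns that would be decoded for this row.
  std::size_t index_read() {
    if (decoders_pending) {
      decoders.clear();
      for (std::size_t i = 0; i < read_set.size(); ++i) {
        if (read_set[i]) decoders.push_back(i);
      }
      decoders_pending = false;
      ++decoder_setups;
    }
    return decoders.size();
  }
};
```

In this model an empty read set (the `SELECT count(*)` case) yields zero decoders, while a read set that is filled in after init is still honoured.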
Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.
Reference Patch: facebook/mysql-5.6@44d6a8d
---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.
Reviewed By: Pushapgl
Differential Revision: D26265470
fbshipit-source-id: f142be681ab
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 18, 2025
Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2
PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395)
Summary:
Some MTR tests failed under UBSan because the number of rows/records calculated in records_in_range() is negative, which is caused by m_actual_disk_size being negative.
```
storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long'
#0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16
#1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3
#2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26
#3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18
#4 check_quick_select(THD*, RANGE_OPT_P
```
Since m_actual_disk_size is an estimated value, always reset it to 0 if it becomes negative.
Differential Revision: D50531919
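Converting a negative floating-point value such as -56.8272 to an unsigned integer type is undefined behaviour in C++, which is what UBSan reports above. A minimal sketch of the clamping idea follows; `ToyIndexStats`, `add_disk_size_delta()` and `rows_in_range()` are hypothetical names, and the integer type chosen for the size is an assumption.

```cpp
#include <cstdint>

struct ToyIndexStats {
  std::int64_t actual_disk_size = 0;  // an estimate, may drift

  // Apply a size change; because the value is only an estimate, a negative
  // result carries no meaning and is reset to 0.
  void add_disk_size_delta(std::int64_t delta) {
    actual_disk_size += delta;
    if (actual_disk_size < 0) actual_disk_size = 0;
  }
};

// Scales the size estimate by a non-negative fraction. Since the estimate is
// never negative, the final conversion to an unsigned type is well defined.
std::uint64_t rows_in_range(const ToyIndexStats &stats, double fraction) {
  const double estimate =
      static_cast<double>(stats.actual_disk_size) * fraction;
  return static_cast<std::uint64_t>(estimate);
}
```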
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 18, 2025
PS-5741: Incorrect use of memset_s in keyring_vault.
Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd
argument is the character to fill.
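To make the argument order concrete, here is a small stand-in with the same parameter order as the signature quoted above. `toy_memset_s` is a hypothetical helper written for this illustration (it returns an int status, unlike the quoted signature); it is not the function used in keyring_vault.

```cpp
#include <cstddef>

// Hypothetical stand-in using the order (dest, dest_max, c, n).
// Returns 0 on success and -1 if dest is null or n exceeds dest_max.
int toy_memset_s(void *dest, std::size_t dest_max, int c, std::size_t n) {
  if (dest == nullptr) return -1;
  // Writing through a volatile pointer keeps the compiler from discarding
  // the stores as dead, which matters when wiping secrets.
  volatile unsigned char *p = static_cast<volatile unsigned char *>(dest);
  const std::size_t count = (n <= dest_max) ? n : dest_max;
  for (std::size_t i = 0; i < count; ++i) {
    p[i] = static_cast<unsigned char>(c);
  }
  return (n <= dest_max) ? 0 : -1;
}
```

A correct call wipes a buffer with `toy_memset_s(buf, sizeof(buf), 0, sizeof(buf))`: buffer size second, fill character third, byte count last.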
---------------------------------------------------------------------------
PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate
---
*Problem:*
`st_mysql_value::val_str` might return a pointer to `buf`, which is destroyed
when the function returns. Therefore the value in `save` is invalid after
returning from the function.
In this particular case, the error does not manifest, as `val_str`
returns memory allocated with `thd_strmake` and does not use `buf`.
*Solution:*
Allocate memory with `thd_strmake` so the memory in `save` is not local.
---------------------------------------------------------------------------
Fix test main.bug12969156 when WITH_ASAN=ON
*Problem:*
ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:
```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215
Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
#0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62
This frame has 4 object(s):
[48, 56) 'result' (line 66)
[80, 112) '_db_stack_frame_' (line 63)
[144, 200) 'tm_tmp' (line 67)
[240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
#0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
#2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
#3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
#4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
#5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
#6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
#7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
#8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```
The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above.
This is a benign error as all the variables are on the stack.
*Solution*:
Finish the thread in an orderly way by using a signalling variable.
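A signalling variable of this kind can be sketched with standard C++ threads. This is an illustration only: `ToyDaemon` and its members are hypothetical, and the actual plugin uses the server's own thread wrappers rather than `std::thread`.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

struct ToyDaemon {
  std::atomic<bool> stop_requested{false};
  std::atomic<int> heartbeats{0};
  std::thread worker;

  void start() {
    worker = std::thread([this] {
      // The loop polls the flag, so the thread leaves its own stack frame
      // normally instead of being cancelled from outside.
      while (!stop_requested.load()) {
        ++heartbeats;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
  }

  void stop() {
    stop_requested.store(true);            // signal the worker
    if (worker.joinable()) worker.join();  // wait until it has returned
  }
};
```

Because the worker returns from its function on its own, its stack variables go out of scope in the normal way before the thread ends.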
---------------------------------------------------------------------------
PS-8204: Fix XML escape rules for audit plugin
https://jira.percona.com/browse/PS-8204
There was a wrong length specified for some XML
escape rules. As a result, the terminating null symbol from the
replacement rule was copied into the resulting string. This led to
query text truncation in the audit log file.
In addition, added empty replacement rules for the '\b' and '\f' symbols,
which just remove them from the resulting string. These symbols are
not supported in XML 1.0.
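The length pitfall can be shown with a toy rule table: `sizeof` on a string literal counts the terminating null, so a rule length has to be one less. The `EscapeRule` struct, `kRules` and `escape()` below are hypothetical and only illustrate the idea; they are not the audit_log plugin's tables.

```cpp
#include <cstddef>
#include <string>

struct EscapeRule {
  char ch;                  // character to replace
  const char *replacement;  // replacement text
  std::size_t length;       // replacement length WITHOUT the trailing '\0'
};

// sizeof("&lt;") is 5 because it includes the terminating null;
// the rule length must therefore be sizeof(...) - 1.
static const EscapeRule kRules[] = {
    {'<', "&lt;", sizeof("&lt;") - 1},
    {'>', "&gt;", sizeof("&gt;") - 1},
    {'&', "&amp;", sizeof("&amp;") - 1},
    {'\b', "", 0},  // empty rule: the character is simply dropped
    {'\f', "", 0},
};

std::string escape(const std::string &in) {
  std::string out;
  for (char c : in) {
    const EscapeRule *hit = nullptr;
    for (const EscapeRule &rule : kRules) {
      if (rule.ch == c) {
        hit = &rule;
        break;
      }
    }
    if (hit != nullptr) {
      out.append(hit->replacement, hit->length);
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```

If a rule used `sizeof("&lt;")` as its length, a null byte would follow every replacement, and any consumer treating the result as a C string would see it cut off there.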
---------------------------------------------------------------------------
PS-8854: Add main.percona_udf MTR test
Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
---------------------------------------------------------------------------
PS-9369: Fix currently processed query comparison in audit_log
https://perconadev.atlassian.net/browse/PS-9369
The audit_log uses stack to keep track of table access operations being
performed in scope of one query. It compares last known table access query
string stored on top of this stack with actual query in audit event being
processed at the moment to decide if new record should be pushed to stack
or it is time to clean records from the stack.
Currently audit_log simply compares char* variables to decide if this is
the same query string. This approach doesn't work. As a result, the plugin loses
control of the stack size and the stack keeps growing over time, consuming
memory. This issue is not noticeable on short-lived server connections
as memory is freed once the connection is closed. At the same time this
leads to extra memory consumption for long-running server connections.
The following is done to fix the issue:
- Query is sent along with audit event as MYSQL_LEX_CSTRING structure.
It is not correct to ignore MYSQL_LEX_CSTRING.length comparison as
sometimes MYSQL_LEX_CSTRING.str pointer may not be initialised
properly. Added string length check to make sure structure contains
any valid string.
- Used strncmp to compare actual strings instead of comparing char*
variables.
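A minimal sketch of the comparison described above, under stated assumptions: `LexCString` is a hypothetical stand-in for MYSQL_LEX_CSTRING, and `same_query` is an illustrative helper, not the plugin's function. Treating a zero length as "no valid string" is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for MYSQL_LEX_CSTRING: pointer plus length.
struct LexCString {
  const char *str;
  std::size_t length;
};

// Compares string contents rather than pointer values.
bool same_query(const LexCString &a, const LexCString &b) {
  // With length 0 the str pointer may not be initialised, so do not touch it.
  if (a.length == 0 || b.length == 0) return false;
  if (a.str == nullptr || b.str == nullptr) return false;
  if (a.length != b.length) return false;
  return std::strncmp(a.str, b.str, a.length) == 0;
}
```

Two copies of the same query text stored at different addresses compare equal here, whereas a plain `char*` comparison would report them as different and keep pushing records onto the stack.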
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 18, 2025
…n read() syscall over network
https://jira.percona.com/browse/PS-8592
Description
-----------
GR suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This usually is not a problem, but it poses a serious threat when another member tries to join the cluster by initiating a connection to a member that is affected by external processes using the port dedicated to group communication for longer durations.
On such activities by external processes, the SSL-enabled server stalled forever on the SSL_accept() call waiting for handshake data. Below is the stack trace:
Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
#0 in read ()
#1 in sock_read ()
#2 in BIO_read ()
#3 in ssl23_read_bytes ()
#4 in ssl23_get_client_hello ()
#5 in ssl23_accept ()
#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()
When the server stalled in the above path forever, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs.
[ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'
Solution
--------
This patch adds two new variables
1. group_replication_xcom_ssl_socket_timeout
It is a file-descriptor-level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.
2. group_replication_xcom_ssl_accept_retries
It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configured with SSL.
Note:
- Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.
-------------------------------------------------------------------------------
PS-8844: Fix the failing main.mysqldump_gtid_purged
https://jira.percona.com/browse/PS-8844
This patch fixes the test failure of main.mysqldump_gtid_purged that failed due to the uninitialized variable $redirect_stderr in the start_proc_in_background.inc.
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 18, 2025
…ocal DDL
executed
https://perconadev.atlassian.net/browse/PS-9018
Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.
Analysis
--------
It needs at least 3 threads for this deadlock to happen
1. One client thread
2. Two replica applier threads
How does this deadlock happen?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.
1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the
"Commit Order" queue since `log-replica-updates` is disabled on the replica
node.
3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
thread 1
3.1. Becomes leader (In Commit_stage_manager::enroll_for()).
3.2. Registers in the commit order queue.
3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
not yet released.
NOTE: SE commit for applier thread is already done by the time it reaches
here.
4. Replica applier thread 2 enters the group commit pipeline to register in the
"Commit Order" queue since `log-replica-updates` is disabled on the replica
node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
applier thread 2
5.1. Becomes leader (In Commit_stage_manager::enroll_for())
5.2. Registers in the commit order queue.
5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
thread 1 it will wait until the lock is released.
6. Client thread enters the group commit pipeline to register in the
"Binlog Flush" queue.
7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
queue), it enters the conditional wait `m_stage_cond_leader` with an
intention to become the leader for both the "Binlog Flush" and
"Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
the GTID by calling gtid_state->update_commit_group() from
Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
to become the leader. Here it finds the client thread waiting to be
the leader.
9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
cond_var `m_stage_cond_leader` and enters a conditional wait until the
thread's `tx_commit_pending` is set to false by the client thread
(will be done in the
Commit_stage_manager::process_final_stage_for_ordered_commit_group()
called by client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The
thread has now become a leader and it is its responsibility to update GTID
of applier thread 2.
10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
10.2. Returns from `enroll_for()` and proceeds to process the
"Commit Order" and "Binlog Flush" queues.
10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
10.4. Performs the storage engine flush by calling ha_flush_logs() from
fetch_and_process_flush_stage_queue().
10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
calling gtid_state->update_commit_group() from
Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we will have
- Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
- Applier thread 1 performing GTID update for itself (from step 8).
Due to the lack of proper synchronization between the above two threads,
there exists a time window where both threads can call
gtid_state->update_commit_group() concurrently.
In subsequent steps, both threads simultaneously try to modify the contents
of the array `commit_group_sidnos` which is used to track the lock status of
sidnos. This concurrent access to `update_commit_group()` can cause a
lock-leak resulting in one thread acquiring the sidno lock and not
releasing at all.
-----------------------------------------------------------------------------------------------------------
Client thread                                          Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();    update_commit_group() => global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()                  calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true                      set commit_group_sidnos[2] = true
                                                       lock_sidno(2) -> successful
lock_sidno(2) -> waits
                                                       update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
                                                       if (commit_group_sidnos[2]) {
                                                         unlock_sidno(2);
                                                         commit_group_sidnos[2] = false;
                                                       }
                                                       Applier thread continues..
lock_sidno(2) -> successful
update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------
12. As the above lock-leak can also happen the other way, i.e., the applier
thread fails to unlock, there can be different consequences hereafter.
13. If the client thread continues without releasing the lock, then at a later
stage, it can enter into a deadlock with the applier thread performing a
GTID update, with the following stack traces.
Client_thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command
Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker
14. If the applier thread continues without releasing the lock, then at a later
stage, it can perform recursive locking while setting the GTID for the next
transaction (in set_gtid_next()).
In debug builds the above case hits the assertion
`safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
replica applier thread when it tries to re-acquire the lock.
Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.
However, the root cause of this problem is that multiple threads can
concurrently access the array `Gtid_state::commit_group_sidnos`.
In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But this
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).
With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
by the client thread (binlog flush leader) when it tries to perform the GTID
update on behalf of threads waiting in the "Commit Order" queue, thus providing a
guarantee that the `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
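The lock-leak interleaving from step 11 can be replayed in a small single-threaded model. Everything below is hypothetical and heavily simplified (one flag standing in for `commit_group_sidnos[2]`, an owner id standing in for the sidno lock); it is a sketch of the reasoning, not the server's code.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical single-threaded model of the shared state.
struct SidnoModel {
  bool commit_group_flag = false;  // stands in for commit_group_sidnos[2]
  int sidno_owner = 0;             // 0 = sidno lock is free, otherwise owner id

  void mark() { commit_group_flag = true; }
  bool try_lock(int who) {
    if (sidno_owner != 0) return false;  // a real thread would wait here
    sidno_owner = who;
    return true;
  }
  void unlock_if_marked() {
    if (commit_group_flag) {
      sidno_owner = 0;
      commit_group_flag = false;
    }
  }
};

constexpr int kClient = 1;
constexpr int kApplier = 2;

// Replays the unsynchronized interleaving described in step 11.
// Returns the id of the thread still holding the sidno lock (0 if none).
int replay_unsynchronized() {
  SidnoModel m;
  m.mark();              // client sets the flag
  m.mark();              // applier sets the same flag
  m.try_lock(kApplier);  // applier: successful
  m.try_lock(kClient);   // client: has to wait
  m.unlock_if_marked();  // applier: unlocks and clears the shared flag
  m.try_lock(kClient);   // client: successful now
  m.unlock_if_marked();  // client: flag is already false, lock is leaked
  return m.sidno_owner;
}

// With a commit lock held around the whole update, the two sequences
// cannot interleave; each thread runs mark/lock/unlock as one unit.
int replay_serialized() {
  SidnoModel m;
  for (int who : {kApplier, kClient}) {
    m.mark();
    m.try_lock(who);
    m.unlock_if_marked();
  }
  return m.sidno_owner;
}
```

In this model `replay_unsynchronized()` ends with the client still owning the sidno lock, while `replay_serialized()` ends with the lock free, which mirrors why serializing access to the array removes the leak without changing the lock order.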
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 28, 2025
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803
Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816
To reproduce this bug, follow the steps below:
client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;
client 2:
FLUSH TABLES WITH READ LOCK;
client 1:
INSERT INTO t2 VALUES (1);
So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as shown below:
```
#0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
#1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
#2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
#3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
#4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
#5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
#6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```
Finally, client 1's "INSERT INTO..." hits the assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row().
Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.
Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871
Differential Revision: D9417382
Pulled By: lth
fbshipit-source-id: c36c164e06c
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 28, 2025
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903
Summary:
Currently, during encoding of the primary key's value, its ttl value can come from one of these 3 cases:
1. ttl column in primary key
2. non-ttl column
a. old record (update case)
b. current timestamp
3. ttl column in non-key field
Workflow #1: first, in Rdb_key_def::pack_record(), find and store pk_offset; then, in value encode, try to parse the key slice to fetch the ttl value by using pk_offset.
Workflow #3: fetch the ttl value from the ttl column.
The change is to merge #1 and #3 by always fetching the TTL value from the ttl column, no matter whether the ttl column is in the primary key or not. Accordingly, pk_offset is removed, since it is no longer used.
BTW, for secondary keys, the ttl value always comes from m_ttl_bytes, which is stored by primary value encoding.
Reviewed By: yizhang82
Differential Revision: D14662716
fbshipit-source-id: 6b4e5f044fd
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 28, 2025
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102
Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:
```
// bitmap is cleared on index merge, but it still needs to decode columns
bool field_requested =
decode_all_fields || m_verify_row_debug_checksums ||
bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.
Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*).
We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Overriding `column_bitmaps_signal` and setting up decoders whenever the bitmap changes - however, this doesn't work by itself. Because no storage engine today actually uses handler::column_bitmaps_signal, this path hasn't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head, since head is used as the first handler (reuse_handler=true) and as a placeholder for subsequent read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB delays the work: it initializes its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly.
In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init.
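The delayed decoder creation chosen above can be sketched as a lazy-initialization flag. The class below is hypothetical and does not use MyRocks names or the real handler API; it only assumes that the init call marks the decoders as stale and that the first read builds them from whatever read set is current at that moment.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified handler; not the MyRocks implementation.
class LazyDecoderHandler {
 public:
  // index_init only records that decoders must be rebuilt.
  void index_init() { m_need_setup = true; }

  // The optimizer may still fix up the read set after init.
  void set_read_set(std::vector<bool> read_set) {
    m_read_set = std::move(read_set);
  }

  // The first read after init builds the decoders from the current read set.
  // Returns how many fields would be decoded for this row.
  int index_read() {
    if (m_need_setup) {
      m_decoded_fields = 0;
      for (bool wanted : m_read_set) {
        if (wanted) ++m_decoded_fields;
      }
      m_need_setup = false;
      ++m_setups;
    }
    return m_decoded_fields;
  }

  int setups() const { return m_setups; }

 private:
  std::vector<bool> m_read_set;
  bool m_need_setup = false;
  int m_decoded_fields = 0;
  int m_setups = 0;
};
```

Because setup happens at the first read, an empty read set (as in `SELECT count(*)`) yields zero decoded fields, while a read set that is fixed up after init is still honoured.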
Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.
Reference Patch: facebook/mysql-5.6@44d6a8d
---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.
Reviewed By: Pushapgl
Differential Revision: D26265470
fbshipit-source-id: f142be681ab
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 28, 2025
Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2
PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395)
Summary:
Some MTR tests failed under UBSan because the number of rows/records calculated in records_in_range() is a negative value, which is caused by m_actual_disk_size being negative.
```
storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long'
#0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16
#1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3
#2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26
#3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18
#4 check_quick_select(THD*, RANGE_OPT_P
```
Since m_actual_disk_size is an estimated value, always reset it to 0 if it becomes negative.
Differential Revision: D50531919
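The UBSan report above comes from converting a negative floating-point estimate to an unsigned integer, which is undefined behaviour in C++. A minimal sketch of a guarded conversion follows; `estimate_to_unsigned` is a hypothetical helper, and clamping at the conversion site is an assumption of this sketch (the commit itself resets m_actual_disk_size to 0 instead).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts an estimated size or row count to unsigned
// without leaving the representable range of the target type.
std::uint64_t estimate_to_unsigned(double estimate) {
  // Negative values, zero and NaN all map to 0.
  if (!(estimate > 0.0)) return 0;
  // 2^64 is exactly representable as a double; anything at or above it saturates.
  if (estimate >= 18446744073709551616.0)
    return std::numeric_limits<std::uint64_t>::max();
  return static_cast<std::uint64_t>(estimate);  // truncates toward zero
}
```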
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 28, 2025
PS-5741: Incorrect use of memset_s in keyring_vault.
Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd
argument is the character to fill.
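To make the argument order concrete, here is a small sketch using a hypothetical `fill_s` function with the same parameter order as the prototype above. It is not the `memset_s` used by keyring_vault, and refusing to write when `n` exceeds `dest_max` is a simplification chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function with the parameter order (dest, dest_max, c, n).
// Returns the number of bytes written; writes nothing if n exceeds dest_max.
std::size_t fill_s(void *dest, std::size_t dest_max, int c, std::size_t n) {
  if (dest == nullptr || n > dest_max) return 0;
  // volatile keeps the compiler from optimizing the wipe away.
  volatile unsigned char *p = static_cast<volatile unsigned char *>(dest);
  for (std::size_t i = 0; i < n; ++i) p[i] = static_cast<unsigned char>(c);
  return n;
}
```

A correct call is `fill_s(buf, sizeof(buf), 0, sizeof(buf))`. Swapping the 2nd and 3rd arguments, as in `fill_s(buf, 0, sizeof(buf), sizeof(buf))`, passes a buffer size of 0, so in this sketch nothing is wiped.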
---------------------------------------------------------------------------
PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate
---
*Problem:*
`st_mysql_value::val_str` might return a pointer to `buf`, which is
destroyed after the function returns. Therefore the value in `save`, after
returning from the function, is invalid.
In this particular case, the error does not manifest, as `val_str`
returns memory allocated with `thd_strmake` and does not use `buf`.
*Solution:*
Allocate memory with `thd_strmake` so the memory in `save` is not local.
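The pattern can be sketched as follows. Both functions are hypothetical models: `val_str_model` plays the role of an accessor that may return a pointer into the caller's buffer, and `std::string` plays the role of memory with a longer lifetime (what `thd_strmake` provides in the server).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical accessor: writes into the caller's buffer and returns it.
const char *val_str_model(char *buf, int *len) {
  std::strcpy(buf, "user@host");
  *len = 9;
  return buf;
}

// Safe variant: copy the result into storage that outlives the local buffer
// before returning. Storing `res` itself in *save would leave a dangling
// pointer once `buf` goes out of scope.
bool validate_model(std::string *save) {
  char buf[80];
  int len = static_cast<int>(sizeof(buf));
  const char *res = val_str_model(buf, &len);
  if (res == nullptr) return false;
  save->assign(res, static_cast<std::size_t>(len));
  return true;
}
```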
---------------------------------------------------------------------------
Fix test main.bug12969156 when WITH_ASAN=ON
*Problem:*
ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:
```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215
Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
#0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62
This frame has 4 object(s):
[48, 56) 'result' (line 66)
[80, 112) '_db_stack_frame_' (line 63)
[144, 200) 'tm_tmp' (line 67)
[240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
#0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
#2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
#3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
#4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
#5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
#6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
#7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
#8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```
The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above.
This is a benign error as all the variables are on the stack.
*Solution*:
Finish the thread in an orderly way by using a signalling variable.
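A minimal sketch of stopping a worker through a signalling variable instead of cancelling it. It uses standard C++ threads rather than the server's `my_thread_*` wrappers, and the names `heartbeat_loop` and `run_and_stop` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Worker body: checks the flag between units of work and then returns
// normally, so its stack frame is unwound in the usual way.
int heartbeat_loop(const std::atomic<bool> &stop_requested, int max_beats) {
  int beats = 0;
  while (!stop_requested.load(std::memory_order_acquire) && beats < max_beats) {
    ++beats;  // placeholder for writing one heartbeat record
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return beats;
}

// Owner side: request the stop, then join instead of cancelling the thread.
int run_and_stop() {
  std::atomic<bool> stop_requested{false};
  int beats = 0;
  std::thread worker(
      [&] { beats = heartbeat_loop(stop_requested, 1000000); });
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  stop_requested.store(true, std::memory_order_release);
  worker.join();  // the worker has fully returned once join() completes
  return beats;
}
```

Since the worker leaves its function through a normal return, tools such as ASAN see the stack frame end as expected, which is the effect the solution above aims for.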
---------------------------------------------------------------------------
PS-8204: Fix XML escape rules for audit plugin
https://jira.percona.com/browse/PS-8204
A wrong length was specified for some XML
escape rules. As a result, the terminating null symbol from the
replacement rule was copied into the resulting string. This led to
query text truncation in the audit log file.
In addition, added empty replacement rules for the '\b' and '\f' symbols,
which simply remove them from the resulting string. These symbols are
not supported in XML 1.0.
---------------------------------------------------------------------------
PS-8854: Add main.percona_udf MTR test
Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
---------------------------------------------------------------------------
PS-9369: Fix currently processed query comparison in audit_log
https://perconadev.atlassian.net/browse/PS-9369
The audit_log plugin uses a stack to keep track of table access operations being
performed in the scope of one query. It compares the last known table access query
string stored on top of this stack with the actual query in the audit event being
processed at the moment to decide whether a new record should be pushed onto the stack
or whether it is time to clean records from the stack.
Currently audit_log simply compares char* variables to decide if this is
the same query string. This approach doesn't work. As a result, the plugin loses
control of the stack size, and the stack keeps growing over time, consuming
memory. This issue is not noticeable on short-lived server connections
as memory is freed once the connection is closed. However, it
leads to extra memory consumption for long-running server connections.
The following is done to fix the issue:
- The query is sent along with the audit event as a MYSQL_LEX_CSTRING structure.
It is not correct to ignore the MYSQL_LEX_CSTRING.length comparison, as
sometimes the MYSQL_LEX_CSTRING.str pointer may not be initialised
properly. Added a string length check to make sure the structure contains
a valid string.
- Used strncmp to compare the actual strings instead of comparing char*
variables.
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 28, 2025
…n read() syscall over network
https://jira.percona.com/browse/PS-8592
Description
-----------
GR suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This usually is not a problem, but it poses a serious threat when another member tries to join the cluster by initiating a connection to a member that is affected by external processes using the port dedicated to group communication for longer durations.
On such activities by external processes, the SSL-enabled server stalled forever on the SSL_accept() call waiting for handshake data. Below is the stack trace:
Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
#0 in read ()
#1 in sock_read ()
#2 in BIO_read ()
#3 in ssl23_read_bytes ()
#4 in ssl23_get_client_hello ()
#5 in ssl23_accept ()
#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()
When the server stalled in the above path forever, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs.
[ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'
Solution
--------
This patch adds two new variables
1. group_replication_xcom_ssl_socket_timeout
It is a file-descriptor-level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.
2. group_replication_xcom_ssl_accept_retries
It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configured with SSL.
Note:
- Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.
-------------------------------------------------------------------------------
PS-8844: Fix the failing main.mysqldump_gtid_purged
https://jira.percona.com/browse/PS-8844
This patch fixes the test failure of main.mysqldump_gtid_purged that failed due to the uninitialized variable $redirect_stderr in the start_proc_in_background.inc.
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 28, 2025
…ocal DDL
executed
https://perconadev.atlassian.net/browse/PS-9018
Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.
Analysis
--------
It needs at least 3 threads for this deadlock to happen
1. One client thread
2. Two replica applier threads
How does this deadlock happen?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.
1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the
"Commit Order" queue since `log-replica-updates` is disabled on the replica
node.
3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
thread 1
3.1. Becomes leader (In Commit_stage_manager::enroll_for()).
3.2. Registers in the commit order queue.
3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
not yet released.
NOTE: SE commit for applier thread is already done by the time it reaches
here.
4. Replica applier thread 2 enters the group commit pipeline to register in the
"Commit Order" queue since `log-replica-updates` is disabled on the replica
node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
applier thread 2
5.1. Becomes leader (In Commit_stage_manager::enroll_for())
5.2. Registers in the commit order queue.
5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
thread 1 it will wait until the lock is released.
6. Client thread enters the group commit pipeline to register in the
"Binlog Flush" queue.
7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
queue), it enters the conditional wait `m_stage_cond_leader` with an
intention to become the leader for both the "Binlog Flush" and
"Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
the GTID by calling gtid_state->update_commit_group() from
Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
to become the leader. Here it finds the client thread waiting to be
the leader.
9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
cond_var `m_stage_cond_leader` and enters a conditional wait until the
thread's `tx_commit_pending` is set to false by the client thread
(will be done in the
Commit_stage_manager::process_final_stage_for_ordered_commit_group()
called by client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The
thread has now become a leader and it is its responsibility to update GTID
of applier thread 2.
10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
10.2. Returns from `enroll_for()` and proceeds to process the
"Commit Order" and "Binlog Flush" queues.
10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
10.4. Performs the storage engine flush by calling ha_flush_logs() from
fetch_and_process_flush_stage_queue().
10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
calling gtid_state->update_commit_group() from
Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we will have
- Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
- Applier thread 1 performing GTID update for itself (from step 8).
Due to the lack of proper synchronization between the above two threads,
there exists a time window where both threads can call
gtid_state->update_commit_group() concurrently.
In subsequent steps, both threads simultaneously try to modify the contents
of the array `commit_group_sidnos` which is used to track the lock status of
sidnos. This concurrent access to `update_commit_group()` can cause a
lock-leak resulting in one thread acquiring the sidno lock and not
releasing at all.
-----------------------------------------------------------------------------------------------------------
Client thread                                          Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();    update_commit_group() => global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()                  calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true                      set commit_group_sidnos[2] = true
                                                       lock_sidno(2) -> successful
lock_sidno(2) -> waits
                                                       update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
                                                       if (commit_group_sidnos[2]) {
                                                         unlock_sidno(2);
                                                         commit_group_sidnos[2] = false;
                                                       }
                                                       Applier thread continues..
lock_sidno(2) -> successful
update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------
12. As the above lock-leak can also happen the other way, i.e., the applier
thread fails to unlock, there can be different consequences hereafter.
13. If the client thread continues without releasing the lock, then at a later
stage, it can enter into a deadlock with the applier thread performing a
GTID update, with the following stack traces.
Client_thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command
Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker
14. If the applier thread continues without releasing the lock, then at a later
stage, it can perform recursive locking while setting the GTID for the next
transaction (in set_gtid_next()).
In debug builds the above case hits the assertion
`safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
replica applier thread when it tries to re-acquire the lock.
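The interleaving from step 11 can be replayed deterministically in a single thread. The following is a minimal sketch, not server code: `SidnoLock`, `commit_group_flag`, `kClient`, `kApplier` and `replay_lock_leak()` are hypothetical names, with `commit_group_flag` standing in for `commit_group_sidnos[2]` and `SidnoLock` standing in for the sidno mutex.

```cpp
#include <cassert>

// Hypothetical stand-in for one sidno mutex: it only records who holds it.
struct SidnoLock {
  int owner = 0;  // 0 = free, otherwise the id of the holding thread
  bool try_lock(int who) {
    if (owner != 0) return false;  // a real thread would block here
    owner = who;
    return true;
  }
  void unlock() { owner = 0; }
};

constexpr int kClient = 1;
constexpr int kApplier = 2;

// Replays the schedule from step 11 and returns the id of the thread that
// still owns the sidno lock at the end (0 would mean nothing leaked).
int replay_lock_leak() {
  SidnoLock sidno_lock;
  bool commit_group_flag = false;  // shared and unprotected, like commit_group_sidnos[2]

  // Both threads mark the sidno as "locked by the commit group".
  commit_group_flag = true;  // client thread
  commit_group_flag = true;  // applier thread 1

  bool applier_got = sidno_lock.try_lock(kApplier);  // succeeds
  bool client_got = sidno_lock.try_lock(kClient);    // fails: client waits
  assert(applier_got && !client_got);

  // Applier finishes: the flag is set, so it unlocks and clears the shared flag.
  if (commit_group_flag) {
    sidno_lock.unlock();
    commit_group_flag = false;
  }

  // Client wakes up and now obtains the lock.
  client_got = sidno_lock.try_lock(kClient);
  assert(client_got);

  // Client's unlock is skipped because the applier already cleared the flag.
  if (commit_group_flag) {
    sidno_lock.unlock();
    commit_group_flag = false;
  }
  return sidno_lock.owner;
}
```

In this replay the lock ends up owned by the client thread, which corresponds to the case described in step 13.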
Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.
However, the root cause of this problem is that multiple threads can
concurrently access the array `Gtid_state::commit_group_sidnos`.
In its initial implementation, it was expected that threads would
hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But this
was not taken into account when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).
With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
by the client thread (binlog flush leader) when it performs the GTID
update on behalf of threads waiting in the "Commit Order" queue, thus
guaranteeing that the `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
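The idea of the fix can be sketched with two real threads. This is an illustrative model under stated assumptions, not the actual patch: `GtidStateSketch`, `lock_commit`, `sidno_lock`, `commit_group_flag` and `run_two_updaters()` are hypothetical names, where `lock_commit` plays the role of `MYSQL_BIN_LOG::LOCK_commit` and every updater takes it before touching the shared flag.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct GtidStateSketch {
  std::mutex lock_commit;          // models MYSQL_BIN_LOG::LOCK_commit
  std::mutex sidno_lock;           // models the lock of a single sidno
  bool commit_group_flag = false;  // models commit_group_sidnos[sidno]
  bool sidno_held = false;         // bookkeeping so the result can be checked
  int updates = 0;

  // Every caller serializes on lock_commit, so the flag is never seen or
  // cleared by another thread between the lock and the unlock of the sidno.
  void update_commit_group() {
    std::lock_guard<std::mutex> commit_guard(lock_commit);
    commit_group_flag = true;
    sidno_lock.lock();
    sidno_held = true;
    ++updates;  // stands in for adding the owned GTID to the executed set
    if (commit_group_flag) {
      sidno_held = false;
      sidno_lock.unlock();
      commit_group_flag = false;
    }
  }
};

// Runs a "client" and an "applier" updater concurrently and reports whether
// both finished with the sidno lock released.
bool run_two_updaters() {
  GtidStateSketch state;
  std::thread client([&state] { state.update_commit_group(); });
  std::thread applier([&state] { state.update_commit_group(); });
  client.join();
  applier.join();
  return !state.sidno_held && state.updates == 2;
}
```

Because the flag is written and read only while `lock_commit` is held, the second thread cannot clear it between the first thread's lock and unlock, so the leak from step 11 cannot occur in this model.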
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 28, 2025
Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2
PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395)
Summary:
Some MTR tests failed under UBSan because the number of rows/records calculated in records_in_range() was negative, which was caused by m_actual_disk_size being negative:
```
storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long'
#0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16
#1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3
#2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26
#3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18
#4 check_quick_select(THD*, RANGE_OPT_P
```
Because m_actual_disk_size is an estimated value, always reset it to 0 if it becomes negative.
Differential Revision: D50531919
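The UBSan report above comes from converting a negative floating-point estimate to an unsigned integer type, which is undefined behaviour in C++. A minimal sketch of the clamping idea follows; `clamp_estimate()` is a hypothetical helper written for illustration and is not the function changed by the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: turns an estimate that may have drifted below zero
// into an unsigned count without an out-of-range conversion.
inline std::uint64_t clamp_estimate(double estimate) {
  // Negative values, zero and NaN all collapse to 0.
  if (!(estimate > 0.0)) return 0;
  // 2^64 is exactly representable as a double; anything at or above it
  // cannot be stored in a 64-bit unsigned integer.
  constexpr double kTwoPow64 = 18446744073709551616.0;
  if (estimate >= kTwoPow64) return std::numeric_limits<std::uint64_t>::max();
  return static_cast<std::uint64_t>(estimate);  // truncates toward zero
}
```

With this guard, the value `-56.8272` from the report maps to 0 instead of triggering the runtime error.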
percona-ysorokin
pushed a commit
that referenced
this pull request
Jul 31, 2025
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803
Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816
To reproduce this bug, follow the steps below:
client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;
client 2:
FLUSH TABLES WITH READ LOCK;
client 1:
INSERT INTO t2 VALUES (1);
So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as below
```
#0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
#1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
#2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
#3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
#4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
#5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
#6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```
Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row()
Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.
Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871
Differential Revision: D9417382
Pulled By: lth
fbshipit-source-id: c36c164e06c
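The shape of that fix can be sketched as follows. This is a heavily simplified model, assuming made-up enums `LockType` and `RowLock` in place of the server's `thr_lock_type` and the `RDB_LOCK_*` constants; the real `ha_rocksdb::store_lock()` has more cases and a different signature.

```cpp
#include <cassert>

enum class LockType { Ignore, Read, Write };  // simplified stand-in for thr_lock_type
enum class RowLock { None, Read, Write };     // simplified stand-in for RDB_LOCK_*

// Hypothetical handler that keeps only the row-lock mode.
struct HandlerSketch {
  RowLock lock_rows = RowLock::None;

  void store_lock(LockType lock_type) {
    // TL_IGNORE-style calls only inspect lock data, so the mode chosen by
    // the owning statement must be left untouched.
    if (lock_type == LockType::Ignore) return;
    lock_rows = (lock_type == LockType::Write) ? RowLock::Write : RowLock::Read;
  }
};
```

In this model, a write lock taken by one call survives a later call with `LockType::Ignore`, which is the property the assertion in `write_row()` relies on.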
percona-ysorokin
pushed a commit
that referenced
this pull request
Jul 31, 2025
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903
Summary:
Currently, during primary key value encoding, the ttl value can come from one of these 3 cases:
1. ttl column in primary key
2. non-ttl column
a. old record (update case)
b. current timestamp
3. ttl column in non-key field
Workflow #1: first find and store pk_offset in Rdb_key_def::pack_record(), then during value encoding parse the key slice to fetch the ttl value by using pk_offset.
Workflow #3: fetch the ttl value from the ttl column.
The change is to merge #1 and #3 by always fetching the TTL value from the ttl column, no matter whether the ttl column is in the primary key or not. Of course, remove pk_offset, since it is no longer used.
BTW, for secondary keys, the ttl value always comes from m_ttl_bytes, which is stored by primary value encoding.
Reviewed By: yizhang82
Differential Revision: D14662716
fbshipit-source-id: 6b4e5f044fd
percona-ysorokin
pushed a commit
that referenced
this pull request
Jul 31, 2025
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102
Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:
```
// bitmap is cleared on index merge, but it still needs to decode columns
bool field_requested =
decode_all_fields || m_verify_row_debug_checksums ||
bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.
Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*).
We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Override `column_bitmaps_signal` and set up decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually uses handler::column_bitmaps_signal, this path hasn't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head, since head is used as the first handler (reuse_handler=true) and as a placeholder for subsequent read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB actually initializes its template again, for the 2nd time, in index_read (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly.
In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init.
Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.
Reference Patch: facebook/mysql-5.6@44d6a8d
---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.
Reviewed By: Pushapgl
Differential Revision: D26265470
fbshipit-source-id: f142be681ab
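The delayed decoder creation described in this commit message can be illustrated with a small model. It is a sketch under assumptions: `LazyDecoderHandler` and its members are hypothetical and only mimic the idea of setting a flag in `index_init` and building decoders on the first read, once the read set is final.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified handler; not the MyRocks classes.
class LazyDecoderHandler {
 public:
  explicit LazyDecoderHandler(std::size_t field_count)
      : read_set_(field_count, false) {}

  void set_read_set(const std::vector<bool>& bits) { read_set_ = bits; }

  // Init only records that decoders must be (re)built later.
  void index_init() { decoders_pending_ = true; }

  // The first read after init builds decoders from the current read set.
  const std::vector<std::size_t>& index_read() {
    if (decoders_pending_) {
      build_decoders();
      decoders_pending_ = false;
    }
    return decoded_fields_;
  }

  int build_count() const { return build_count_; }

 private:
  void build_decoders() {
    decoded_fields_.clear();
    for (std::size_t i = 0; i < read_set_.size(); ++i) {
      if (read_set_[i]) decoded_fields_.push_back(i);
    }
    ++build_count_;
  }

  std::vector<bool> read_set_;
  std::vector<std::size_t> decoded_fields_;
  bool decoders_pending_ = false;
  int build_count_ = 0;
};
```

In this model, a read set that is fixed up after `index_init()` (as in index merge) is still honoured, and an empty read set (as in `SELECT count(*)`) decodes no fields rather than all of them.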
percona-ysorokin
pushed a commit
that referenced
this pull request
Jul 31, 2025
Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2
PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395)
Summary:
Some MTR tests failed under UBSan because the number of rows/records calculated in records_in_range() was negative, which was caused by m_actual_disk_size being negative:
```
storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long'
#0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16
#1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3
#2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26
#3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18
#4 check_quick_select(THD*, RANGE_OPT_P
```
Because m_actual_disk_size is an estimated value, always reset it to 0 if it becomes negative.
Differential Revision: D50531919
percona-ysorokin
pushed a commit
that referenced
this pull request
Jul 31, 2025
PS-5741: Incorrect use of memset_s in keyring_vault.
Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd
argument is the character to fill.
---------------------------------------------------------------------------
PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate
---
*Problem:*
`st_mysql_value::val_str` might return a pointer to `buf`, which is
destroyed after the function returns. Therefore the value in `save` is
invalid after returning from the function.
In this particular case, the error does not manifest, as `val_str`
returns memory allocated with `thd_strmake` and does not use `buf`.
*Solution:*
Allocate memory with `thd_strmake` so the memory in `save` is not local.
---------------------------------------------------------------------------
Fix test main.bug12969156 when WITH_ASAN=ON
*Problem:*
ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:
```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215
Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
#0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62
This frame has 4 object(s):
[48, 56) 'result' (line 66)
[80, 112) '_db_stack_frame_' (line 63)
[144, 200) 'tm_tmp' (line 67)
[240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
#0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
#2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
#3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
#4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
#5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
#6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
#7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
#8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```
The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above.
This is a benign error as all the variables are on the stack.
*Solution*:
Finish the thread in an orderly way by using a signalling variable.
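A signalling variable of this kind might look like the sketch below. It assumes nothing about the actual `daemon_example` plugin code: `HeartbeatSketch` and its members are hypothetical, and `std::thread` with `std::atomic<bool>` is used here in place of the server's own thread wrappers.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker that is stopped by a flag instead of being cancelled.
class HeartbeatSketch {
 public:
  void start() {
    worker_ = std::thread([this] { run(); });
  }

  // Asks the worker to leave its loop, then waits until it has returned, so
  // its stack frame is unwound normally.
  void stop() {
    stop_requested_.store(true);
    if (worker_.joinable()) worker_.join();
  }

  bool finished() const { return finished_.load(); }

 private:
  void run() {
    while (!stop_requested_.load()) {
      // A real heartbeat would write a log line here.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    finished_.store(true);
  }

  std::atomic<bool> stop_requested_{false};
  std::atomic<bool> finished_{false};
  std::thread worker_;  // declared last so the flags exist before it starts
};
```

Because the thread function returns on its own, sanitizers see the frame being released, unlike with asynchronous cancellation.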
---------------------------------------------------------------------------
PS-8204: Fix XML escape rules for audit plugin
https://jira.percona.com/browse/PS-8204
There was a wrong length specified for some XML
escape rules. As a result, the terminating null symbol from the
replacement rule was copied into the resulting string. This led to
query text truncation in the audit log file.
In addition, added empty replacement rules for the '\b' and '\f' symbols,
which just remove them from the resulting string. These symbols are
not supported in XML 1.0.
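The length issue can be illustrated with a small rule table. This is a sketch with assumed names (`EscapeRule`, `rule()`, `kRules`, `escape_xml()`) and an illustrative rule set; it is not the audit plugin's table. The point is that the replacement length is derived from the literal and excludes the terminating null.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct EscapeRule {
  char from;
  const char* to;
  std::size_t to_len;  // length of the replacement WITHOUT the trailing '\0'
};

// Derives the length from the literal's array size, so it cannot drift.
template <std::size_t N>
constexpr EscapeRule rule(char from, const char (&to)[N]) {
  return {from, to, N - 1};
}

// Illustrative rule set; '\b' and '\f' map to an empty replacement.
static const EscapeRule kRules[] = {
    rule('<', "&lt;"),  rule('>', "&gt;"), rule('&', "&amp;"),
    rule('"', "&quot;"), rule('\b', ""),   rule('\f', ""),
};

inline std::string escape_xml(const std::string& in) {
  std::string out;
  for (char ch : in) {
    const EscapeRule* hit = nullptr;
    for (const EscapeRule& r : kRules) {
      if (r.from == ch) {
        hit = &r;
        break;
      }
    }
    if (hit != nullptr) {
      out.append(hit->to, hit->to_len);  // never copies the '\0'
    } else {
      out.push_back(ch);
    }
  }
  return out;
}
```

If `to_len` were `N` instead of `N - 1`, each replacement would embed a null byte, and any consumer treating the result as a C string would see the text cut off at that point.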
---------------------------------------------------------------------------
PS-8854: Add main.percona_udf MTR test
Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
---------------------------------------------------------------------------
PS-9369: Fix currently processed query comparison in audit_log
https://perconadev.atlassian.net/browse/PS-9369
The audit_log uses a stack to keep track of table access operations being
performed in the scope of one query. It compares the last known table access query
string stored on top of this stack with the actual query in the audit event being
processed at the moment to decide whether a new record should be pushed to the stack
or it is time to clean records from the stack.
Currently audit_log simply compares char* variables to decide if this is
the same query string. This approach doesn't work. As a result, the plugin loses
control of the stack size and it starts growing over time, consuming
memory. This issue is not noticeable on short-lived server connections,
as memory is freed once the connection is closed. At the same time, this
leads to extra memory consumption for long-running server connections.
The following is done to fix the issue:
- Query is sent along with the audit event as a MYSQL_LEX_CSTRING structure.
It is not correct to ignore the MYSQL_LEX_CSTRING.length comparison, as
sometimes the MYSQL_LEX_CSTRING.str pointer may not be initialised
properly. Added a string length check to make sure the structure contains
a valid string.
- Used strncmp to compare the actual strings instead of comparing char*
variables.
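The comparison described in the two bullets can be sketched as below. `LexCString` is a stand-in with the same two fields as `MYSQL_LEX_CSTRING`, and `same_query()` is a hypothetical helper; treating empty or null strings as "not the same query" is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for MYSQL_LEX_CSTRING: a pointer plus an explicit length.
struct LexCString {
  const char* str;
  std::size_t length;
};

// Compares query text by content. Pointer equality is not enough, because
// the same text may live in different buffers and a buffer may be reused.
inline bool same_query(const LexCString& a, const LexCString& b) {
  // Assumption: a missing or empty string never matches anything.
  if (a.str == nullptr || b.str == nullptr) return false;
  if (a.length == 0 || b.length == 0) return false;
  if (a.length != b.length) return false;
  return std::strncmp(a.str, b.str, a.length) == 0;
}
```

Two distinct buffers holding the same text compare equal here, while the same pointer with different lengths does not.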
percona-ysorokin
pushed a commit
that referenced
this pull request
Jul 31, 2025
…n read() syscall over network
https://jira.percona.com/browse/PS-8592
Description
-----------
GR suffered from problems caused by security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initiating a connection to a member that is affected by
external processes using the port dedicated for group communication for longer
durations.
On such activities by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:
Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
#0 in read ()
#1 in sock_read ()
#2 in BIO_read ()
#3 in ssl23_read_bytes ()
#4 in ssl23_get_client_hello ()
#5 in ssl23_accept ()
#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()
When the server stalled in the above path forever, it prohibited other members
from joining the cluster, resulting in the following messages in the joiner
server's logs.
[ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'
Solution
--------
This patch adds two new variables
1. group_replication_xcom_ssl_socket_timeout
It is a file-descriptor level timeout in seconds for both accept() and
SSL_accept() calls when group replication is listening on the xcom port.
When set to a valid value, say for example 5 seconds, both accept() and
SSL_accept() return after 5 seconds. The default value has been set to 0
(waits infinitely) for backward compatibility. This variable is effective
only when GR is configured with SSL.
2. group_replication_xcom_ssl_accept_retries
It defines the number of retries to be performed before closing the socket.
For each retry the server thread calls SSL_accept() with the timeout defined by
group_replication_xcom_ssl_socket_timeout for the SSL handshake process
once the connection has been accepted by the first accept() call. The
default value has been set to 10. This variable is effective only when GR is
configured with SSL.
Note:
- Both of the above variables are dynamically configurable, but will become
effective only on START GROUP_REPLICATION.
-------------------------------------------------------------------------------
PS-8844: Fix the failing main.mysqldump_gtid_purged
https://jira.percona.com/browse/PS-8844
This patch fixes the test failure of main.mysqldump_gtid_purged that failed due
to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.
percona-ysorokin
pushed a commit
that referenced
this pull request
Jul 31, 2025
…ocal DDL executed
https://perconadev.atlassian.net/browse/PS-9018
Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.
Analysis
--------
It needs at least 3 threads for this deadlock to happen
1. One client thread
2. Two replica applier threads
How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.
1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the
"Commit Order" queue since `log-replica-updates` is disabled on the replica
node.
3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
thread 1
3.1. Becomes leader (In Commit_stage_manager::enroll_for()).
3.2. Registers in the commit order queue.
3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
not yet released.
NOTE: SE commit for applier thread is already done by the time it reaches
here.
4. Replica applier thread 2 enters the group commit pipeline to register in the
"Commit Order" queue since `log-replica-updates` is disabled on the replica
node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4),
the applier thread 2
5.1. Becomes leader (In Commit_stage_manager::enroll_for())
5.2. Registers in the commit order queue.
5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by
applier thread 1 it will wait until the lock is released.
6. Client thread enters the group commit pipeline to register in the
"Binlog Flush" queue.
7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
queue), it enters the conditional wait `m_stage_cond_leader` with an
intention to become the leader for both the "Binlog Flush" and
"Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to
update the GTID by calling gtid_state->update_commit_group() from
Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
to become the leader. Here it finds the client thread waiting to be
the leader.
9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
cond_var `m_stage_cond_leader` and enters a conditional wait until the
thread's `tx_commit_pending` is set to false by the client thread
(will be done in the
Commit_stage_manager::process_final_stage_for_ordered_commit_group()
called by client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The
thread has now become a leader and it is its responsibility to update GTID
of applier thread 2.
10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
10.2. Returns from `enroll_for()` and proceeds to process the
"Commit Order" and "Binlog Flush" queues.
10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
10.4. Performs the storage engine flush by calling ha_flush_logs() from
fetch_and_process_flush_stage_queue().
10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
calling gtid_state->update_commit_group() from
Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we will have
- Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
- Applier thread 1 performing GTID update for itself (from step 8).
Due to the lack of proper synchronization between the above two threads,
there exists a time window where both threads can call
gtid_state->update_commit_group() concurrently.
In subsequent steps, both threads simultaneously try to modify the contents
of the array `commit_group_sidnos` which is used to track the lock status of
sidnos.
This concurrent access to `update_commit_group()` can cause a
lock-leak in which one thread acquires the sidno lock and never
releases it.
-----------------------------------------------------------------------------------------------------------
Client thread                                          Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();    update_commit_group() => global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()                  calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true                      set commit_group_sidnos[2] = true
                                                       lock_sidno(2) -> successful
lock_sidno(2) -> waits
                                                       update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
                                                       if (commit_group_sidnos[2]) {
                                                         unlock_sidno(2);
                                                         commit_group_sidnos[2] = false;
                                                       }
                                                       Applier thread continues..
lock_sidno(2) -> successful
update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------
12. As the above lock-leak can also happen the other way, i.e., the applier
thread fails to unlock, there can be different consequences hereafter.
13. If the client thread continues without releasing the lock, then at a later
stage, it can enter into a deadlock with the applier thread performing a
GTID update, with the following stack traces.
Client_thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command
Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker
14. If the applier thread continues without releasing the lock, then at a later
stage, it can perform recursive locking while setting the GTID for the next
transaction (in set_gtid_next()).
In debug builds the above case hits the assertion
`safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
replica applier thread when it tries to re-acquire the lock.
Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.
However, the root cause of this problem is that multiple threads can
concurrently access the array `Gtid_state::commit_group_sidnos`.
In its initial implementation, it was expected that threads would
hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But this
was not taken into account when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).
With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
by the client thread (binlog flush leader) when it performs the GTID
update on behalf of threads waiting in the "Commit Order" queue, thus
guaranteeing that the `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
percona-ysorokin
pushed a commit
that referenced
this pull request
Aug 2, 2025
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803
Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816
To reproduce this bug, follow the steps below:
client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;
client 2:
FLUSH TABLES WITH READ LOCK;
client 1:
INSERT INTO t2 VALUES (1);
So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as below
```
#0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
#1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
#2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
#3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
#4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
#5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
#6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```
Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row()
Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.
Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871
Differential Revision: D9417382
Pulled By: lth
fbshipit-source-id: c36c164e06c
percona-ysorokin
pushed a commit
that referenced
this pull request
Aug 2, 2025
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903
Summary:
Currently, during primary key value encoding, the ttl value can come from one of these 3 cases:
1. ttl column in primary key
2. non-ttl column
a. old record (update case)
b. current timestamp
3. ttl column in non-key field
Workflow #1: first find and store pk_offset in Rdb_key_def::pack_record(), then during value encoding parse the key slice to fetch the ttl value by using pk_offset.
Workflow #3: fetch the ttl value from the ttl column.
The change is to merge #1 and #3 by always fetching the TTL value from the ttl column, no matter whether the ttl column is in the primary key or not. Of course, remove pk_offset, since it is no longer used.
BTW, for secondary keys, the ttl value always comes from m_ttl_bytes, which is stored by primary value encoding.
Reviewed By: yizhang82
Differential Revision: D14662716
fbshipit-source-id: 6b4e5f044fd
percona-ysorokin pushed a commit that referenced this pull request on Aug 2, 2025
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102
Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:
```
// bitmap is cleared on index merge, but it still needs to decode columns
bool field_requested =
decode_all_fields || m_verify_row_debug_checksums ||
bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.
It turns out that when index merge resets, it calls ha_index_init with an empty column_bitmap, so our field decoders did not know they needed to decode anything and the entire query returned nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). The workaround at that time was to treat an empty map as implicitly meaning every column, and the side effect is a massively slowed down count(*).
We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Override `column_bitmaps_signal` and set up decoders whenever the bitmap changes - however, this doesn't work by itself. Because no storage engine today actually uses handler::column_bitmaps_signal, this path hasn't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head, since head is used as the first handler (reuse_handler=true) and as a placeholder for subsequent read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB actually delays re-initializing its template until index_read is called for the 2nd time (relying on `prebuilt->sql_stat_start`). During index_read, `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template is built correctly.
In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*) and sets the delay flag in index_init / rnd_init / multi_range_read_init.
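The delayed decoder creation described above can be illustrated with a small model. `LazyDecoderScan` and its members are hypothetical names, and a plain `std::vector<bool>` stands in for the read set; this is a sketch of the idea under those assumptions, not the Rdb_converter implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a scan whose column decoders are built lazily.
class LazyDecoderScan {
 public:
  // read_set[i] == true means column i was requested by the query.
  explicit LazyDecoderScan(const std::vector<bool>* read_set)
      : read_set_(read_set) {
    assert(read_set != nullptr);
  }

  // Stands in for index_init / rnd_init: only record that decoders are stale.
  void init() { decoders_stale_ = true; }

  // Stands in for the first read call: build the decoder list from the read
  // set as it looks now, after the optimizer has finished fixing it up.
  const std::vector<std::size_t>& read() {
    if (decoders_stale_) {
      columns_to_decode_.clear();
      for (std::size_t i = 0; i < read_set_->size(); ++i) {
        if ((*read_set_)[i]) columns_to_decode_.push_back(i);
      }
      decoders_stale_ = false;
    }
    return columns_to_decode_;
  }

 private:
  const std::vector<bool>* read_set_;
  std::vector<std::size_t> columns_to_decode_;
  bool decoders_stale_ = true;
};
```

With this shape, a read set that is still empty at `init()` but filled in before the first `read()` is honoured, and a read set that stays empty (the `count(*)` case) yields no columns to decode.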
Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.
Reference Patch: facebook/mysql-5.6@44d6a8d
---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.
Reviewed By: Pushapgl
Differential Revision: D26265470
fbshipit-source-id: f142be681ab
percona-ysorokin pushed a commit that referenced this pull request on Aug 2, 2025
Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2
PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395)
Summary:
Some MTR tests failed under UBSan because the number of rows/records calculated in records_in_range() was negative, which was caused by m_actual_disk_size being negative:
```
storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long'
#0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16
#1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3
#2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26
#3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18
#4 check_quick_select(THD*, RANGE_OPT_P
```
Because m_actual_disk_size is an estimated value, always reset it to 0 if it becomes negative.
Differential Revision: D50531919
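The UBSan report above comes from converting a negative floating-point estimate to an unsigned integer, which is undefined behaviour in C++. A minimal sketch of a guarded conversion follows; `estimate_to_count` is a hypothetical helper, not a function from ha_rocksdb.cc, and the upper-bound check is an extra precaution that the commit message does not mention.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts a floating-point estimate to an unsigned
// count, treating negative (or NaN) estimates as zero.
inline std::uint64_t estimate_to_count(double estimate) {
  if (!(estimate > 0.0)) return 0;  // negative, zero, or NaN
  // 2^64 as a double; values at or above it are not representable either.
  constexpr double kTwoPow64 = 18446744073709551616.0;
  if (estimate >= kTwoPow64) return std::numeric_limits<std::uint64_t>::max();
  return static_cast<std::uint64_t>(estimate);
}
```

Clamping before the cast keeps the conversion inside the representable range, so the sanitizer has nothing to report for inputs such as `-56.8272`.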
percona-ysorokin pushed a commit that referenced this pull request on Aug 2, 2025
PS-5741: Incorrect use of memset_s in keyring_vault.
Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is the size of the buffer and the 3rd
argument is the character to fill.
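To make the argument order concrete, here is a small sketch with the same parameter order as the signature quoted above. `memset_s_sketch` is a hypothetical re-implementation written for illustration (it is neither the keyring_vault function nor the C11 Annex K one); the `volatile` pointer is one common way to discourage the compiler from eliding the wipe.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper with the parameter order (dest, dest_max, c, n):
// fills at most dest_max bytes, so a too-large n cannot overrun the buffer.
inline void memset_s_sketch(void* dest, std::size_t dest_max, int c,
                            std::size_t n) {
  if (dest == nullptr) return;
  volatile unsigned char* p = static_cast<volatile unsigned char*>(dest);
  const std::size_t count = n < dest_max ? n : dest_max;
  for (std::size_t i = 0; i < count; ++i) {
    p[i] = static_cast<unsigned char>(c);
  }
}
```

A typical call wipes a whole buffer: `memset_s_sketch(buf, sizeof(buf), 0, sizeof(buf));`. Swapping the 2nd and 3rd arguments would pass the fill character as the buffer size, which is the kind of mistake the commit fixes.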
---------------------------------------------------------------------------
PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate
---
*Problem:*
`st_mysql_value::val_str` might return a pointer to `buf`, which is
destroyed once the function returns. Therefore the value in `save` is
invalid after returning from the function.
In this particular case, the error does not manifest, as `val_str`
returns memory allocated with `thd_strmake` and does not use `buf`.
*Solution:*
Allocate memory with `thd_strmake` so the memory in `save` is not local.
---------------------------------------------------------------------------
Fix test main.bug12969156 when WITH_ASAN=ON
*Problem:*
ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:
```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215
Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
#0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62
This frame has 4 object(s):
[48, 56) 'result' (line 66)
[80, 112) '_db_stack_frame_' (line 63)
[144, 200) 'tm_tmp' (line 67)
[240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
#0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
#2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
#3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
#4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
#5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
#6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
#7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
#8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```
The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above.
This is a benign error as all the variables are on the stack.
*Solution*:
Finish the thread in an orderly way by using a signalling variable.
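A minimal sketch of finishing a worker thread through a signalling variable instead of cancelling it. It uses `std::thread` and `std::atomic` rather than the server's `my_thread_*` wrappers, and `HeartbeatSketch` is a hypothetical class, so this shows only the general pattern and not the daemon_example plugin code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical heartbeat worker that exits when asked to, so its stack
// frame is unwound normally instead of being abandoned by a cancel.
class HeartbeatSketch {
 public:
  ~HeartbeatSketch() { stop(); }

  void start() {
    worker_ = std::thread([this] { run(); });
  }

  // Sets the signalling variable and waits for the worker to return.
  void stop() {
    stop_requested_.store(true);
    if (worker_.joinable()) worker_.join();
  }

  int beats() const { return beats_.load(); }

 private:
  void run() {
    while (!stop_requested_.load()) {
      beats_.fetch_add(1);
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::atomic<bool> stop_requested_{false};
  std::atomic<int> beats_{0};
  std::thread worker_;
};
```

Because `stop()` joins the thread, the worker has left `run()` by the time `stop()` returns and no further heartbeats are produced.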
---------------------------------------------------------------------------
PS-8204: Fix XML escape rules for audit plugin
https://jira.percona.com/browse/PS-8204
The wrong length was specified for some XML
escape rules. As a result, the terminating null character of the
replacement rule was copied into the resulting string. This led to
query text truncation in the audit log file.
In addition, added empty replacement rules for the '\b' and '\f' characters,
which simply remove them from the resulting string. These characters are
not supported in XML 1.0.
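The length problem can be illustrated with a small escaping routine in which each replacement carries a length that excludes the terminating null. `EscapeRule` and `escape_xml_sketch` are hypothetical names, and the rule table is an assumption apart from the empty '\b' and '\f' rules; this is not the audit_log plugin's rule set.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical escape rule: std::string_view built from a literal has a
// length that excludes the terminating null, so no '\0' is ever copied.
struct EscapeRule {
  char from;
  std::string_view to;
};

inline std::string escape_xml_sketch(std::string_view in) {
  static constexpr EscapeRule kRules[] = {
      {'<', "&lt;"}, {'>', "&gt;"}, {'&', "&amp;"}, {'"', "&quot;"},
      {'\b', ""},    {'\f', ""},  // empty rules drop the character
  };
  std::string out;
  out.reserve(in.size());
  for (char ch : in) {
    const EscapeRule* match = nullptr;
    for (const EscapeRule& rule : kRules) {
      if (rule.from == ch) {
        match = &rule;
        break;
      }
    }
    if (match != nullptr) {
      out.append(match->to.data(), match->to.size());
    } else {
      out.push_back(ch);
    }
  }
  return out;
}
```

If a rule's length were one byte too long, an embedded null would end up in the output and any consumer treating it as a C string would see the text cut off at that point.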
---------------------------------------------------------------------------
PS-8854: Add main.percona_udf MTR test
Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
---------------------------------------------------------------------------
PS-9369: Fix currently processed query comparison in audit_log
https://perconadev.atlassian.net/browse/PS-9369
The audit_log uses stack to keep track of table access operations being
performed in scope of one query. It compares last known table access query
string stored on top of this stack with actual query in audit event being
processed at the moment to decide if new record should be pushed to stack
or it is time to clean records from the stack.
Currently audit_log simply compares char* variables to decide if this is
the same query string. This approach doesn't work. As a result, the plugin loses
control of the stack size and the stack keeps growing over time, consuming
memory. This issue is not noticeable on short-lived server connections
as memory is freed once the connection is closed. At the same time this
leads to extra memory consumption for long-running server connections.
The following is done to fix the issue:
- Query is sent along with audit event as MYSQL_LEX_CSTRING structure.
It is not correct to ignore MYSQL_LEX_CSTRING.length comparison as
sometimes the MYSQL_LEX_CSTRING.str pointer may not be initialised
properly. Added string length check to make sure the structure contains
a valid string.
- Used strncmp to compare actual strings instead of comparing char*
variables.
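The difference between comparing pointers and comparing contents can be sketched as follows. `LexCStringSketch` is a hypothetical mirror of a pointer-plus-length string such as MYSQL_LEX_CSTRING, and `same_query` is an illustrative helper, not the function used in the plugin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mirror of a {pointer, length} string.
struct LexCStringSketch {
  const char* str;
  std::size_t length;
};

// Compares query texts by content. A zero length is treated as "no valid
// string", because str may be uninitialised in that case.
inline bool same_query(const LexCStringSketch& a, const LexCStringSketch& b) {
  if (a.length == 0 || b.length == 0) return false;
  if (a.length != b.length) return false;
  return std::strncmp(a.str, b.str, a.length) == 0;
}
```

Two copies of the same query text stored at different addresses compare equal here, whereas a raw `a.str == b.str` check would report them as different.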
percona-ysorokin pushed a commit that referenced this pull request on Aug 2, 2025
…n read() syscall over network
https://jira.percona.com/browse/PS-8592
Description
-----------
GR suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This usually is not a problem, but it poses a serious threat when another member tries to join the cluster by initiating a connection to a member whose group communication port is being held by external processes for long durations.
On such activity by external processes, the SSL-enabled server stalled forever in the SSL_accept() call, waiting for handshake data. Below is the stack trace:
Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
#0 in read ()
#1 in sock_read ()
#2 in BIO_read ()
#3 in ssl23_read_bytes ()
#4 in ssl23_get_client_hello ()
#5 in ssl23_accept ()
#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()
When the server stalled in the above path forever, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs:
[ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'
Solution
--------
This patch adds two new variables:
1. group_replication_xcom_ssl_socket_timeout
It is a file-descriptor level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.
2. group_replication_xcom_ssl_accept_retries
It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configured with SSL.
Note:
- Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.
-------------------------------------------------------------------------------
PS-8844: Fix the failing main.mysqldump_gtid_purged
https://jira.percona.com/browse/PS-8844
This patch fixes the test failure of main.mysqldump_gtid_purged that failed due to the uninitialized variable $redirect_stderr in the start_proc_in_background.inc.
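For the PS-8592 socket timeout and retry count described above, the general pattern of a bounded wait on a silent peer can be sketched with plain POSIX sockets. The commit message does not say how the timeout is applied, so the use of `SO_RCVTIMEO` here is an assumption, `recv()` with `MSG_PEEK` stands in for `SSL_accept()`, and `wait_for_peer_sketch` is a hypothetical helper (with the timeout in milliseconds to keep the example quick).

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: waits for the peer's first bytes, giving up after
// `retries` attempts that each time out after `timeout_ms` milliseconds.
// Returns true if data arrived; on false the caller should close the socket.
inline bool wait_for_peer_sketch(int fd, int timeout_ms, int retries) {
  timeval tv{};
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  // Assumption: a receive timeout on the descriptor bounds each attempt.
  if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) {
    return false;
  }
  char byte = 0;
  for (int attempt = 0; attempt < retries; ++attempt) {
    const ssize_t n = recv(fd, &byte, 1, MSG_PEEK);
    if (n > 0) return true;    // handshake data is available
    if (n == 0) return false;  // peer closed the connection
    if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
      return false;            // hard error, do not retry
    }
  }
  return false;  // every attempt timed out
}
```

With a connected socket whose peer never sends anything, the helper returns `false` after roughly `timeout_ms * retries` instead of blocking forever, which is the behaviour the two variables are meant to provide for scanner connections.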
percona-ysorokin pushed a commit that referenced this pull request on Aug 2, 2025
…ocal DDL executed
https://perconadev.atlassian.net/browse/PS-9018
Problem
-------
In high concurrency scenarios, a MySQL replica can enter a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit.
Analysis
--------
It needs at least 3 threads for this deadlock to happen:
1. One client thread
2. Two replica applier threads
How does this deadlock happen?
------------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.
1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node.
3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1
3.1. Becomes leader (In Commit_stage_manager::enroll_for()).
3.2. Registers in the commit order queue.
3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.
NOTE: SE commit for applier thread is already done by the time it reaches here.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2
5.1. Becomes leader (In Commit_stage_manager::enroll_for())
5.2. Registers in the commit order queue.
5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released.
6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.
9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2.
10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we will have
- Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
- Applier thread 1 performing GTID update for itself (from step 8).
Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently.
In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and not releasing it at all.
-------------------------------------------------------------------------------
Client thread                          Applier thread 1
-------------------------------------------------------------------------------
update_commit_group()                  update_commit_group()
  => global_sid_lock->rdlock();          => global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()  calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true      set commit_group_sidnos[2] = true
                                       lock_sidno(2) -> successful
lock_sidno(2) -> waits
                                       update_gtids_impl_own_gtid()
                                         -> Add the thd->owned_gtid in `executed_gtids()`
                                       if (commit_group_sidnos[2]) {
                                         unlock_sidno(2);
                                         commit_group_sidnos[2] = false;
                                       }
                                       Applier thread continues..
lock_sidno(2) -> successful
update_gtids_impl_own_gtid()
  -> Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) {   <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-------------------------------------------------------------------------------
12. As the above lock-leak can also happen the other way, i.e. the applier thread fails to unlock, there can be different consequences hereafter.
13. If the client thread continues without releasing the lock, then at a later stage, it can enter a deadlock with the applier thread performing a GTID update, with the following stack traces.
Client_thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command
Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker
14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()).
In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()`, meaning the lock is already held by the replica applier thread when it tries to re-acquire the lock.
Solution
--------
In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order.
However, the root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`.
In its initial implementation, it was expected that threads should hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But this was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).
With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired by the client thread (binlog flush leader) when it tries to perform the GTID update on behalf of threads waiting in the "Commit Order" queue, thus guaranteeing that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
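The effect of serialising every GTID update behind one commit lock can be sketched with standard C++ primitives. `GtidStateSketch` is a hypothetical model with a single flag and a single sidno lock; it only illustrates why holding an outer mutex prevents the flag from being cleared by another thread, and it is not the server's `Gtid_state`.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical model: commit_lock_ plays the role of LOCK_commit, and
// sidno_flag_ plays the role of one entry of commit_group_sidnos.
class GtidStateSketch {
 public:
  // Every caller takes commit_lock_ first, so the flag and the sidno lock
  // are always set and cleared by the same thread.
  void update_commit_group() {
    std::lock_guard<std::mutex> commit_guard(commit_lock_);
    sidno_flag_ = true;
    sidno_lock_.lock();
    ++updates_;  // stands in for recording the owned GTID
    if (sidno_flag_) {
      sidno_lock_.unlock();
      sidno_flag_ = false;
    }
  }

  // True if no thread walked away still holding the sidno lock.
  bool sidno_lock_is_free() {
    if (!sidno_lock_.try_lock()) return false;
    sidno_lock_.unlock();
    return true;
  }

  int updates() {
    std::lock_guard<std::mutex> commit_guard(commit_lock_);
    return updates_;
  }

 private:
  std::mutex commit_lock_;
  std::mutex sidno_lock_;
  bool sidno_flag_ = false;
  int updates_ = 0;
};
```

Without `commit_lock_`, one thread could reset `sidno_flag_` between another thread's `lock()` and its `if (sidno_flag_)` check, which reproduces the leak shown in the timeline; with it, the two threads cannot interleave inside `update_commit_group()`.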
percona-ysorokin added a commit that referenced this pull request on Aug 12, 2025
https://perconadev.atlassian.net/browse/PS-10049
Fixed memory leak in 'NdbTimestamp-t' unit test detected by ASan.
=================================================================
==3324130==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 67 byte(s) in 3 object(s) allocated from:
#0 0x5555ab78bc5a in strdup (/home/yura/ws/percona-server-8.4-build-asan_clang20/runtime_output_directory/NdbTimestamp-t+0xbdc5a) (BuildId: 9c5a51c07dd925cf13b2f9408597355a524793bc)
#1 0x5555ab7eb685 in test_TZ(int) /home/yura/ws/percona-server-8.4/storage/ndb/src/common/portlib/NdbTimestamp.cpp:516:10
#2 0x5555ab7eb126 in main /home/yura/ws/percona-server-8.4/storage/ndb/src/common/portlib/NdbTimestamp.cpp:576:53
#3 0x7f0596ae4d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
SUMMARY: AddressSanitizer: 67 byte(s) leaked in 3 allocation(s).