Re-implement Doublewrite buffer encryption by satya-bodapati · Pull Request #3 · percona-ysorokin/percona-server

satya-bodapati · 2020-06-11T08:55:15Z

No description provided.

…ace encryption) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-3822 "InnoDB system tablespace encryption" https://jira.percona.com/browse/PS-3822 (commit 78b6114) to make parallel doublewrite part of the upstream 8.0.20 merge easier. Temporarily disabled the following MTR test cases: - 'innodb.percona_parallel_dblwr_encrypt' - 'innodb.percona_sys_tablespace_encrypt' - 'innodb.percona_sys_tablespace_encrypt_dblwr' - 'sys_vars.innodb_parallel_dblwr_encrypt_basic' - 'sys_vars.innodb_sys_tablespace_encrypt_basic'

…b_doublewrite file when innodb_doublewrite is disabled) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-3411 "LP #1570682: Parallel doublewrite buffer file created when skip-innodb_doublewrite is set" https://jira.percona.com/browse/PS-3411 (commit 14318e4) to make parallel doublewrite part of the upstream 8.0.20 merge easier.

…must crash server on I/O error) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-5678 "Parallel doublewrite must crash server on I/O error" https://jira.percona.com/browse/PS-5678 (commit 0f810d7) to make parallel doublewrite part of the upstream 8.0.20 merge easier.

…rotation. ALPHA) https://jira.percona.com/browse/PS-6789 Temporarily reverted 'buf0dblwr.cc' part of the PS-3829 "Innodb key rotation. ALPHA" https://jira.percona.com/browse/PS-3829 (commit c7f44ee) to make parallel doublewrite part of the upstream 8.0.20 merge easier.

…d to set O_DIRECT on xb_doublewrite when running MTR test cases) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-1068 "Fix bug 1669414 (Failed to set O_DIRECT on xb_doublewrite when running MTR test cases)" https://jira.percona.com/browse/PS-1068 (commit 7f41824) to make parallel doublewrite part of the upstream 8.0.20 merge easier.

…lel doublewrite memory not freed with innodb_fast_shutdown=2) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-1707 "LP #1578139: Parallel doublewrite memory not freed with innodb_fast_shutdown=2" https://jira.percona.com/browse/PS-1707 (commit 8a53ed7) to make parallel doublewrite part of the upstream 8.0.20 merge easier.

… implementation (Implement parallel doublewrite) https://jira.percona.com/browse/PS-6789 Reverted 'parallel-doublewrite' blueprint implementation "Implement parallel doublewrite" https://blueprints.launchpad.net/percona-server/+spec/parallel-doublewrite (commit 4596aaa) to make parallel doublewrite part of the upstream 8.0.20 merge easier. Temporarily disabled the following MTR test cases: - 'sys_vars.innodb_parallel_doublewrite_path_basic' - 'innodb.percona_doublewrite'

https://jira.percona.com/browse/PS-6789 *** Updated man pages from MySQL Server 8.0.20 source tarball. *** Updated 'scripts/fill_help_tables.sql' from MySQL Server 8.0.20 source tarball.

https://jira.percona.com/browse/PS-6789

https://jira.percona.com/browse/PS-6789 *** Reverted our fix for PS-6094 "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception" (https://jira.percona.com/browse/PS-6094) (commit 31b5c73) in favor of the upstream fix for the Bug #30561920 / #97682 "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception" (https://bugs.mysql.com/bug.php?id=97682) (commit mysql/mysql-server@72c6171). *** Reverted our fix for PS-3630 "LP #1660255: Test innodb.innodb_mysql is unstable" (https://jira.percona.com/browse/PS-3630) (commit e0b5050) in favor of the upstream fix for the Bug #30810572 "FIX INNODB-MYSQL TEST" (commit mysql/mysql-server@2692669). *** Reverted our 8.0.17 merge postfix "PS-5363 (Merge MySQL 8.0.17): fixed regexps in the rpl.rpl_perfschema_threads_processlist_status MTR test case" (https://jira.percona.com/browse/PS-5363) (commit 8d7dd4a) affecting 'rpl.rpl_perfschema_threads_processlist_status' MTR test case in favor of the changes made by upstream in WL#3549 "Binlog Compression" (commit mysql/mysql-server@1e5ae34). *** Reverted our 8.0.18 merge postfix "PS-5674: gen_lex_token generator reworked" (https://jira.percona.com/browse/PS-5674) (commit 214212a) in favor of the changes made by upstream Bug #30765691 "FREE TOKEN SLOTS ARE EXHAUSTED IN GEN_LEX_TOKEN.CC" (commit mysql/mysql-server@17ca03f). 'SYM_PERCONA()' macro preserved and made a synonym for upstream's 'SYM()'. Percona Server 5.7-specific tokens - CHANGED_PAGE_BITMAPS_SYM - CLIENT_STATS_SYM - CLUSTERING_SYM - COMPRESSION_DICTIONARY_SYM - INDEX_STATS_SYM - TABLE_STATS_SYM - THREAD_STATS_SYM - USER_STATS_SYM - ENCRYPTION_KEY_ID_SYM explicitly assigned values starting from 1300. The same values were assigned to them implicitly in Percona Server 8.0.19. Percona Server 8.0-specific tokens - EFFECTIVE_SYM - SEQUENCE_TABLE_SYM explicitly assigned values starting from 1350. This group has different values than in Percona Server 8.0.19. 
*** Similarly to other 'innodb.log_encrypt_<n>' MTR test cases 'innodb.log_encrypt_7' coming from upstream 8.0.20 cloned into two 'innodb.log_encrypt_7_mk' and 'innodb.log_encrypt_7_rk'. *** Similarly to other 'innodb.table_encrypt_<n>' MTR test cases 'innodb.table_encrypt_6' coming from upstream 8.0.20 cloned into three 'innodb.table_encrypt_6', 'keyring_vault.table_encrypt_6' and 'keyring_vault.table_encrypt_6_directory'. *** VERSION raised to "8.0.20-11". univ.i version raised to "11".

https://jira.percona.com/browse/PS-6789 In the fix for Bug #30508721 "MTR DOESN'T KEEP TRACK OF THE STATE OF INNODB MONITORS" (commit mysql/mysql-server@abd33c2) Oracle extended MTR 'check-testcase' procedure with additional comparison of data from InnoDB metrics state. They also introduced 'mysql-test/include/innodb_monitor_restore.inc' MTR include file that is supposed to reset InnoDB monitors to their default state. 'mysql-test/include/innodb_monitor_restore.inc' extended with enabling Percona-specific monitors, those that are enabled (defined with 'MONITOR_DEFAULT_ON' flag) by default. Similarly to what was done in the upstream patch "SET GLOBAL innodb_monitor_enable=default;" "SET GLOBAL innodb_monitor_disable=default;" "SET GLOBAL innodb_monitor_reset_all=default;" statement sequences were substituted with '--source include/innodb_monitor_restore.inc' all over the test code. As the result, fixed the following MTR test cases: - 'innodb.innodb_idle_flush_pct' - 'innodb.lock_contention_big' - 'innodb.monitor' - 'innodb.percona_ahi_partitions' - 'innodb.percona_changed_page_bmp_flush_5446' - 'innodb.transportable_tbsp-debug' - 'innodb_zip.transportable_tbsp_debug_zip' - 'sys_vars.innodb_monitor_disable_basic' - 'sys_vars.innodb_monitor_enable_basic' - 'sys_vars.innodb_monitor_reset_all_basic' - 'sys_vars.innodb_monitor_reset_basic' - 'sys_vars.innodb_purge_run_now_basic' - 'sys_vars.innodb_purge_stop_now_basic'

…ated MTR test cases https://jira.percona.com/browse/PS-6789 The following MTR test cases re-recorded because of the 'filesort' improvements introduced in the fix for Oracle's Bug #30776132 "MAKE FILESORT KEYS CONSISTENT BETWEEN FIELDS AND ITEMS" (commit mysql/mysql-server@6d587a6) - 'main.pool_of_threads' - 'main.pool_of_threads_high_prio_tickets'. The following MTR test cases re-recorded because of the changed execution plan (more hash joins instead of block nested loops) introduced in these improvements Bug #30528604 "DELETE THE PRE-ITERATOR EXECUTOR" (commit mysql/mysql-server@ef166f8), Bug #30473261 "CONVERT THE INDEX SUBQUERY ENGINES INTO USING THE ITERATOR EXECUTOR" (commit mysql/mysql-server@cb4116e) (commit mysql/mysql-server@629b549) (commit mysql/mysql-server@5a41fba) (commit mysql/mysql-server@31bd903) (commit mysql/mysql-server@75bbe1b) (commit mysql/mysql-server@6226c1a) (commit mysql/mysql-server@0b45e96) (commit mysql/mysql-server@8e45d7e) (commit mysql/mysql-server@7493ae4) (commit mysql/mysql-server@a5f60bf) (commit mysql/mysql-server@609b86e), Bug #30912972 "ASSERTION `KEYLEN == M_START_KEY.LENGTH' FAILED" (commit mysql/mysql-server@b28bea5) - 'audit_log.audit_log_filter_db' - 'main.pool_of_threads' - 'main.pool_of_threads_high_prio_tickets' - 'main.percona_expand_fast_index_creation' - 'main.percona_sequence_table'

https://jira.percona.com/browse/PS-6789 Re-recorded 'main.bug74778' MTR test case because of the new 'SHOW_ROUTINE' privilege implemented by Oracle in WL #9049 "Add a dynamic privilege for stored routine backup" (https://dev.mysql.com/worklog/task/?id=9049) (commit mysql/mysql-server@3e41e44)

… MTR test case https://jira.percona.com/browse/PS-6789 Re-recorded 'main.backup_locks_mysqldump' MTR test case because of the new default 'mysqldump' network timeout introduced in the fix for Oracle Bug #30755992 / #98203 "mysql dump sufficiently long network timeout too short" (https://bugs.mysql.com/bug.php?id=98203) (commit mysql/mysql-server@1f90fad)

https://jira.percona.com/browse/PS-6789 Re-recorded 'main.bug88797' MTR test case because of the new deprecation warning introduced in the implementation of WL #13325 "Deprecate VALUES syntax in INSERT ... ON DUPLICATE KEY UPDATE" (https://dev.mysql.com/worklog/task/?id=13325) (commit mysql/mysql-server@6f3b9df)

…test cases with explicit binlog positions https://jira.percona.com/browse/PS-6789 Fixed/re-recorded the following MTR test cases because of the changes in the implementation of WL #3549 "Binlog: compression" (https://dev.mysql.com/worklog/task/?id=3549) (commit mysql/mysql-server@1e5ae34) that increased the 'Format_description_event' binlog event size and therefore shifted some pre-recorded binary log positions in the '.result' files. - 'main.backup_safe_binlog_info' - 'main.mysqldump-max' - 'binlog.percona_binlog_consistent_mixed' - 'binlog.percona_binlog_consistent_row' - 'binlog.percona_binlog_consistent_stmt' - 'binlog.percona_binlog_consistent_debug'

…space encryption) https://jira.percona.com/browse/PS-6789 1. Re-enable system tablespace encryption again after 8.0.20 upstream merge that has new parallel doublewrite implementation (https://jira.percona.com/browse/PS-3822). 2. Removed 'innodb.percona_sys_tablespace_encrypt_dblwr' MTR test case as there is no doublewrite buffer in system tablespace anymore.

…29 (Innodb key rotation. ALPHA) https://jira.percona.com/browse/PS-6789 Restored 'buf0dblwr.cc' part of the PS-3829 "Innodb key rotation. ALPHA" https://jira.percona.com/browse/PS-3829 (commit c7f44ee) after upstream 8.0.20 merge. The following MTR test cases do not crash anymore - 'encryption.upgrade_crypt_data_57_v1' - 'encryption.upgrade_crypt_data_v1' - 'innodb.innodb_scrub' - 'main.percona_dd_upgrade_encrypted'

…in.percona_signal_handling_threadpool MTR test cases https://jira.percona.com/browse/PS-6789 Fixed and re-recorded 'main.percona_signal_handling' and 'main.percona_signal_handling_threadpool' MTR test in response to the changes in the Bug #30578923 "SENDING SIGHUP CAUSES A LOT OF GARBAGE TO BE PRINTED" (commit mysql/mysql-server@b90a1b3). Removed "Status information:" log section is now simulated via 'DBUG_EXECUTE_IF()'. MTR test cases made debug-only.

satya-bodapati · 2020-06-11T14:29:40Z

./mtr --mem innodb.percona_parallel_dblwr_encrypt{,,,} --parallel=4 --repeat=20
Logging: /home/satya/WORK/ps-8.0.20-merge/mysql-test/mysql-test-run.pl --mem innodb.percona_parallel_dblwr_encrypt innodb.percona_parallel_dblwr_encrypt innodb.percona_parallel_dblwr_encrypt innodb.percona_parallel_dblwr_encrypt --parallel=4 --repeat=20
MySQL Version 8.0.20

[ 93%] innodb.percona_parallel_dblwr_encrypt w4 [ pass ] 11147
[ 95%] innodb.percona_parallel_dblwr_encrypt w2 [ pass ] 11241
[ 96%] innodb.percona_parallel_dblwr_encrypt w3 [ pass ] 10978
[ 97%] innodb.percona_parallel_dblwr_encrypt w1 [ pass ] 10974
[ 98%] innodb.percona_parallel_dblwr_encrypt w4 [ pass ] 10765
[100%] innodb.percona_parallel_dblwr_encrypt w2 [ pass ] 10931

The servers were restarted 76 times
The servers were reinitialized 0 times
Spent 1837.889 of 794 seconds executing testcases

Completed: All 80 tests were successful.

…o: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded Problem ======= Running mtr with an ASAN build on Gentoo fails since the path to libtirpc is not /lib64/libtirpc.so, which is the path mtr uses for preloading the library. Furthermore, the libasan path on Gentoo may also contain underscores and minus signs, which mtr safe_process does not recognize. Fails on Gentoo since /lib64/libtirpc.so does not exist +ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. Fails on Gentoo since /usr/lib64/libtirpc.so is a GNU LD script +ERROR: ld.so: object '/usr/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (invalid ELF header): ignored. Need to preload /lib64/libtirpc.so.3 on Gentoo. When compiling with GNU C++, the libasan path also includes minus signs and underscores: $ less mysql-test/lib/My/SafeProcess/ldd_asan_test_result linux-vdso.so.1 (0x00007ffeba962000) libasan.so.4 => /usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4 (0x00007f3c2e827000) Tests that have been affected in different ways are for example: $ ./mtr group_replication.gr_clone_integration_clone_not_installed [100%] group_replication.gr_clone_integration_clone_not_installed w3 [ fail ] ... ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. mysqltest: At line 21: Query 'START GROUP_REPLICATION' failed. ERROR 2013 (HY000): Lost connection to MySQL server during query ... ASAN:DEADLYSIGNAL ================================================================= ==11970==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0e5cecfb8c bp 0x7f0e340f1650 sp 0x7f0e340f0dc8 T44) ==11970==The signal is caused by a READ memory access. ==11970==Hint: address points to the zero page. 
#0 0x7f0e5cecfb8b in xdr_uint32_t (/lib64/libc.so.6+0x13cb8b) #1 0x7f0e5fbe6d43 (/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4+0x87d43) #2 0x7f0e3c675e59 in xdr_node_no plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:88 #3 0x7f0e3c67744d in xdr_pax_msg_1_6 plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:852 ... $ ./mtr ndb.ndb_config [100%] ndb.ndb_config [ fail ] ... --- /.../src/mysql-test/suite/ndb/r/ndb_config.result 2019-06-25 21:19:08.308997942 +0300 +++ /.../bld/mysql-test/var/log/ndb_config.reject 2019-06-26 11:58:11.718512944 +0300 @@ -30,16 +30,22 @@ == 16 == bug44689 192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.1 192.168.0.1 == 17 == bug49400 +ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. +ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR -- at line 25: TCP connection is a duplicate of the existing TCP link from line 14 ERROR -- at line 25: Could not store section of configuration file. $ ./mtr ndb.ndb_basic [100%] ndb.ndb_basic [ pass ] 34706 ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. Solution ======== In safe_process use same trick for libtirpc as for libasan to determine path to library for pre loading. Also allow underscores and minus in paths. In addition also add some memory leak suppressions for perl. Change-Id: Ia02e354a20cf8b279eb2573f3f8c2c39776343dc (cherry picked from commit e88706d)

To call a service implementation one needs to: 1. query the registry to get a reference to the service needed 2. call the service via the reference 3. call the registry to release the reference While #2 is very fast (just a function pointer call) #1 and #3 can be expensive since they'd need to interact with the registry's global structure in a read/write fashion. Hence if the above sequence is to be repeated in a quick succession it'd be beneficial to do steps #1 and #3 just once and aggregate as many #2 steps in a single sequence. This will usually mean to cache the service reference received in #1 and delay 3 for as much as possible. But since there's an active reference held to the service implementation until 3 is taken special handling is needed to make sure that: The references are released at regular intervals so changes in the registry can become effective. There is a way to mark a service implementation as "inactive" ("dying") so that until all of the active references to it are released no new ones are possible. All of the above is part of the current audit API machinery, but needs to be isolated into a separate service suite and made generally available to all services. This is what this worklog aims to implement. RB#24806
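The acquire/call/release sequence and the idea of caching the reference can be sketched as follows. This is a minimal illustration only: `ToyRegistry`, `CachedServiceRef`, and the `max_uses` release policy are hypothetical names invented here, not the registry or the reference-caching service API from the worklog, and the "dying" marking of a service implementation is not modeled.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the component registry (not the real MySQL API).
struct ToyRegistry {
  using Service = std::function<int(int)>;
  std::unordered_map<std::string, Service> services;
  int acquire_calls = 0;  // counts the expensive step #1
  int release_calls = 0;  // counts the expensive step #3

  const Service *acquire(const std::string &name) {
    ++acquire_calls;
    auto it = services.find(name);
    return it == services.end() ? nullptr : &it->second;
  }
  void release(const Service *) { ++release_calls; }
};

// Caches one service reference and releases it after `max_uses` calls, so
// that changes in the registry can become effective at regular intervals.
class CachedServiceRef {
 public:
  CachedServiceRef(ToyRegistry &reg, std::string name, int max_uses)
      : reg_(reg), name_(std::move(name)), max_uses_(max_uses) {}
  ~CachedServiceRef() { drop(); }

  // Step #2: the cheap call through the cached reference.
  // Returns false if the service is not registered.
  bool call(int arg, int *result) {
    if (ref_ == nullptr) {
      ref_ = reg_.acquire(name_);  // step #1, done only when needed
      if (ref_ == nullptr) return false;
      uses_ = 0;
    }
    *result = (*ref_)(arg);
    if (++uses_ >= max_uses_) drop();  // step #3, delayed but bounded
    return true;
  }

 private:
  void drop() {
    if (ref_ != nullptr) {
      reg_.release(ref_);
      ref_ = nullptr;
    }
  }

  ToyRegistry &reg_;
  std::string name_;
  int max_uses_;
  int uses_ = 0;
  const ToyRegistry::Service *ref_ = nullptr;
};
```

With `max_uses` set to 3, six consecutive calls cost two acquire/release pairs instead of six, while the reference is still given back often enough for a registry update to be picked up.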

TABLESPACE STATE DOES NOT CHANGE THE SPACE TO EMPTY After the commit for Bug#31991688, it was found that an idle system may not ever get around to truncating an undo tablespace when it is SET INACTIVE. Actually, it takes about 128 seconds before the undo tablespace is finally truncated. There are three main tasks for the function trx_purge(). 1) Process the undo logs and apply changes to the data files. (May be multiple threads) 2) Clean up the history list by freeing old undo logs and rollback segments. 3) Truncate undo tablespaces that have grown too big or are SET INACTIVE explicitly. Bug#31991688 made sure that steps 2 & 3 are not done too often. Concentrating this effort keeps the purge lag from growing too large. By default, trx_purge() does step #1 128 times before attempting steps #2 & #3 which are called 'truncate' steps. This is set by the setting innodb_purge_rseg_truncate_frequency. On an idle system, trx_purge() is called once per second if it has nothing to do in step 1. After 128 seconds, it will finally do step 2 (truncating the undo logs and rollback segments which reduces the history list to zero) and step 3 (truncating any undo tablespaces that need it). The function that the purge coordinator thread uses to make these repeated calls to trx_purge() is called srv_do_purge(). When trx_purge() returns having done nothing, srv_do_purge() returns to srv_purge_coordinator_thread() which will put the purge thread to sleep. It is woken up again once per second by the master thread in srv_master_do_idle_tasks() if not sooner by any of several other threads and activities. This is how an idle system can wait 128 seconds before the truncate steps are done and an undo tablespace that was SET INACTIVE can finally become 'empty'. 
The solution in this patch is to modify srv_do_purge() so that if trx_purge() did nothing and there is an undo space that was explicitly set to inactive, it will immediately call trx_purge again with do_truncate=true so that steps #2 and #3 will be done. This does not affect the effort by Bug#31991688 to keep the purge lag from growing too big on sysbench UPDATE NO_KEY. With this change, the purge lag has to be zero and there must be a pending explicit undo space truncate before this extra call to trx_purge is done. Approved by Sunny in RB#25311
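The control flow described above can be modeled with a small sketch. Everything in it is an assumption made for illustration: `ToyPurge`, `purge()`, and `do_purge_once()` are hypothetical stand-ins and do not reproduce the real trx_purge() or srv_do_purge() signatures or internals.

```cpp
// Hypothetical model of the purge loop; not the InnoDB implementation.
struct ToyPurge {
  unsigned truncate_frequency = 128;  // models innodb_purge_rseg_truncate_frequency
  unsigned calls_since_truncate = 0;
  unsigned truncates = 0;  // how many times the 'truncate' steps #2 and #3 ran

  // Models trx_purge(): step #1 handles `pending_pages`; the truncate steps
  // run only every `truncate_frequency` calls or when forced by do_truncate.
  unsigned purge(unsigned pending_pages, bool do_truncate) {
    ++calls_since_truncate;
    if (do_truncate || calls_since_truncate >= truncate_frequency) {
      ++truncates;
      calls_since_truncate = 0;
    }
    return pending_pages;
  }
};

// Models the patched tail of srv_do_purge(): if the regular call did nothing
// and an undo space was explicitly set inactive, call again with
// do_truncate=true instead of waiting for the frequency counter.
unsigned do_purge_once(ToyPurge &p, unsigned pending_pages,
                       bool explicit_inactive_undo_space) {
  unsigned n_purged = p.purge(pending_pages, /*do_truncate=*/false);
  if (n_purged == 0 && explicit_inactive_undo_space) {
    p.purge(0, /*do_truncate=*/true);
  }
  return n_purged;
}
```

In this model an idle call with a pending explicit truncate triggers the truncate steps immediately, while a call that still has purge work to do is left to the regular frequency counter, which matches the stated goal of not disturbing the purge-lag behavior.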

Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3 PS-5217 : Merge fb-prod201803 Summary: Original report: https://jira.mariadb.org/browse/MDEV-15816 To reproduce this bug, follow the steps below. client 1: USE test; CREATE TABLE t1 (i INT) ENGINE=MyISAM; HANDLER t1 OPEN h; CREATE TABLE t2 (i INT) ENGINE=RocksDB; LOCK TABLES t2 WRITE; client 2: FLUSH TABLES WITH READ LOCK; client 1: INSERT INTO t2 VALUES (1); So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as below ``` #0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE) #1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2) #2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0) #3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true) #4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8) #5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2) #6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0) ``` Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row() Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE. Closes facebook/mysql-5.6#838 Pull Request resolved: facebook/mysql-5.6#871 Differential Revision: D9417382 Pulled By: lth fbshipit-source-id: c36c164e06c
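The shape of the fix can be sketched as below. This is a heavily simplified model, assuming hypothetical `ToyThrLock`, `ToyRowLock`, and `ToyHandler` types; the real ha_rocksdb::store_lock() takes more arguments and derives the row lock mode from more conditions than the two shown here.

```cpp
// Hypothetical, simplified stand-ins for the thr_lock type and the
// MyRocks row lock mode.
enum class ToyThrLock { Ignore, Read, Write };
enum class ToyRowLock { None, Read, Write };

struct ToyHandler {
  ToyRowLock m_lock_rows = ToyRowLock::None;

  void store_lock(ToyThrLock lock_type) {
    // The fix: a store_lock(TL_IGNORE) call issued on behalf of another
    // connection must not overwrite the row lock mode already chosen.
    if (lock_type == ToyThrLock::Ignore) return;

    m_lock_rows = (lock_type == ToyThrLock::Write) ? ToyRowLock::Write
                                                   : ToyRowLock::Read;
  }
};
```

After `LOCK TABLES ... WRITE` selects the write mode, a later `Ignore` call leaves `m_lock_rows` untouched, so the assertion in the subsequent write path still sees the write mode.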

Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6 PS-6865 : Merge fb-prod201903 Summary: Currently, during primary key value encoding, the ttl value can come from one of these 3 cases 1. ttl column in primary key 2. non-ttl column a. old record (update case) b. current timestamp 3. ttl column in non-key field Workflow #1: first in Rdb_key_def::pack_record() find and store pk_offset, then in value encode try to parse the key slice to fetch the ttl value by using pk_offset. Workflow #3: fetch the ttl value from the ttl column. The change is to merge #1 and #3 by always fetching the TTL value from the ttl column, no matter whether the ttl column is in the primary key or not. Of course, remove pk_offset, since it isn't used. BTW, for secondary keys, the ttl value always comes from m_ttl_bytes, which is stored by primary value encoding. Reviewed By: yizhang82 Differential Revision: D14662716 fbshipit-source-id: 6b4e5f044fd

Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. 
In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuse_handler=true) and subsequent placeholders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays this: it actually initializes its template again in index_read the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and sets the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it was fine because it was not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab

Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2 PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395) Summary: Some MTR tests failed under UBSan because the number of rows/records calculated in records_in_range() is negative, which is caused by m_actual_disk_size being negative ``` storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long' #0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16 #1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3 #2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26 #3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18 #4 check_quick_select(THD*, RANGE_OPT_P ``` Because m_actual_disk_size is an estimated value, always reset it to 0 if it becomes negative. Differential Revision: D50531919
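The clamping described above can be sketched as follows. This is an illustration under stated assumptions: `ToyIndexStats`, `adjust()`, and `scaled_estimate()` are hypothetical names, and the arithmetic is not the actual records_in_range_internal() computation.

```cpp
#include <cstdint>

// Hypothetical stand-in for per-index statistics; only the estimated disk
// size field is modeled.
struct ToyIndexStats {
  int64_t actual_disk_size = 0;

  // Applies a size delta. Because the value is only an estimate, repeated
  // adjustments can undershoot; reset to 0 instead of keeping a negative.
  void adjust(int64_t delta) {
    actual_disk_size += delta;
    if (actual_disk_size < 0) actual_disk_size = 0;
  }
};

// Converting a negative floating-point value to an unsigned integer type is
// undefined behavior, which is what UBSan reported. With the clamp in
// adjust(), the product is never negative for a non-negative fraction.
uint64_t scaled_estimate(const ToyIndexStats &stats, double fraction) {
  return static_cast<uint64_t>(
      static_cast<double>(stats.actual_disk_size) * fraction);
}
```

Clamping at the point where the estimate is updated keeps every later consumer of the value safe, rather than guarding each conversion site separately.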

PS-5741: Incorrect use of memset_s in keyring_vault. Fixed the usage of memset_s. The arguments should be: void memset_s(void *dest, size_t dest_max, int c, size_t n) where the 2nd argument is the size of the buffer and the 3rd argument is the character to fill. --------------------------------------------------------------------------- PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate --- *Problem:* `st_mysql_value::val_str` might return a pointer to `buf`, which is deleted after the function returns. Therefore the value in `save`, after returning from the function, is invalid. In this particular case, the error is not manifesting as `val_str` returns memory allocated with `thd_strmake` and it does not use `buf`. *Solution:* Allocate memory with `thd_strmake` so the memory in `save` is not local. --------------------------------------------------------------------------- Fix test main.bug12969156 when WITH_ASAN=ON *Problem:* ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`: ``` ==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478 WRITE of size 24 at 0x7fe746d06d14 thread T16777215 Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62 This frame has 4 object(s): [48, 56) 'result' (line 66) [80, 112) '_db_stack_frame_' (line 63) [144, 200) 'tm_tmp' (line 67) [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions *are* supported) Thread T26 created by T25 here: #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216 #1 0x557ccbbcb857 in my_thread_create 
/home/yura/ws/percona-server/mysys/my_thread.c:104 #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148 #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279 #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279 #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664 #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160 #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952 #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544 #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065 #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325 #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198 #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473 ``` The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer used, which generates the error above. This is a benign error as all the variables are on the stack. *Solution*: Finish the thread in an orderly way by using a signalling variable. --------------------------------------------------------------------------- PS-8204: Fix XML escape rules for audit plugin https://jira.percona.com/browse/PS-8204 There was a wrong length specified for some XML escape rules. As a result, the terminating null symbol from the replacement rule was copied into the resulting string. This led to query text truncation in the audit log file. 
In addition, added empty replacement rules for the '\b' and '\f' symbols which just remove them from the resulting string. These symbols are not supported in XML 1.0. --------------------------------------------------------------------------- PS-8854: Add main.percona_udf MTR test Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions. --------------------------------------------------------------------------- PS-9369: Fix currently processed query comparison in audit_log https://perconadev.atlassian.net/browse/PS-9369 The audit_log uses a stack to keep track of table access operations being performed in the scope of one query. It compares the last known table access query string stored on top of this stack with the actual query in the audit event being processed at the moment to decide if a new record should be pushed to the stack or it is time to clean records from the stack. Currently audit_log simply compares char* variables to decide if this is the same query string. This approach doesn't work. As a result, the plugin loses control of the stack size, and the stack keeps growing over time, consuming memory. This issue is not noticeable on short-lived server connections as memory is freed once the connection is closed. At the same time, this leads to extra memory consumption for long-running server connections. The following is done to fix the issue: - Query is sent along with the audit event as a MYSQL_LEX_CSTRING structure. It is not correct to ignore the MYSQL_LEX_CSTRING.length comparison as sometimes the MYSQL_LEX_CSTRING.str pointer may not be initialised properly. Added a string length check to make sure the structure contains a valid string. - Used strncmp to compare actual strings instead of comparing char* variables.
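The difference between comparing pointers and comparing string contents, as described for PS-9369, can be sketched like this. `ToyLexCString` and `same_query()` are hypothetical names; the struct only mirrors the pointer-plus-length shape of MYSQL_LEX_CSTRING and is not the plugin's actual code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical struct with the same pointer-plus-length shape as
// MYSQL_LEX_CSTRING.
struct ToyLexCString {
  const char *str;
  std::size_t length;
};

// Returns true only if both values hold a non-empty string with identical
// contents. The length check comes first, so a possibly uninitialised
// pointer paired with a zero length is never dereferenced.
bool same_query(const ToyLexCString &a, const ToyLexCString &b) {
  if (a.length == 0 || a.length != b.length) return false;
  if (a.str == nullptr || b.str == nullptr) return false;
  return std::strncmp(a.str, b.str, a.length) == 0;
}
```

Two copies of the same query text stored in different buffers compare equal here, whereas a plain `a.str == b.str` check would report them as different and keep pushing records onto the stack.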

…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This is usually not a problem, but it poses a serious threat when another member tries to join the cluster by initiating a connection to a member whose group communication port is being used by such external processes for long periods.

During such activity by external processes, the SSL-enabled server stalled forever in the SSL_accept() call, waiting for handshake data. Below is the stack trace:

Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
#0 in read ()
#1 in sock_read ()
#2 in BIO_read ()
#3 in ssl23_read_bytes ()
#4 in ssl23_get_client_hello ()
#5 in ssl23_accept ()
#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled forever in the above path, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs:

[ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables:

1. group_replication_xcom_ssl_socket_timeout
   It is a file-descriptor-level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.

2. group_replication_xcom_ssl_accept_retries
   It defines the number of retries to be performed before closing the socket.
   For each retry, the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake process, once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configured with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the failure of the main.mysqldump_gtid_purged test, which was caused by the uninitialized variable $redirect_stderr in start_proc_in_background.inc.
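Returning to the PS-8592 change described above, the interplay between the socket timeout and the retry count can be sketched roughly as follows. This is not the xcom code: `HandshakeResult` and `accept_with_retries` are hypothetical names, the `attempt` callback stands in for one SSL_accept() call bounded by the socket timeout, and the sketch assumes the retry count is the total number of attempts.

```cpp
#include <cassert>
#include <functional>

// Outcome of one handshake attempt (hypothetical type).
enum class HandshakeResult { Done, TimedOut, Failed };

// `attempt` models one SSL_accept() call that returns after at most
// the configured socket timeout. Returns true if the handshake
// completed; on false the caller is expected to close the socket.
inline bool accept_with_retries(
    const std::function<HandshakeResult()> &attempt, int max_retries) {
  assert(max_retries >= 0);
  for (int i = 0; i < max_retries; ++i) {
    switch (attempt()) {
      case HandshakeResult::Done:
        return true;   // peer finished the handshake
      case HandshakeResult::Failed:
        return false;  // hard error, no point in retrying
      case HandshakeResult::TimedOut:
        break;         // silent peer so far, try again
    }
  }
  return false;  // retries exhausted by a silent peer
}
```

Under these assumptions, a port scanner that connects and never sends handshake data occupies the listener for at most roughly timeout times retries seconds instead of forever.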

…ocal DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, a MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1
   3.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the commit order queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. The Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: SE commit for the applier thread is already done by the time it reaches here.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), applier thread 2
   5.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the commit order queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1, it will wait until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in the queue), it enters the conditional wait `m_stage_cond_leader` with the intention of becoming the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.
   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log, signals on the cond_var `m_stage_cond_leader`, and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (this is done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update the GTID of applier thread 2.
   10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Proceeds to update the GTID of threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we will have
   - the client thread performing the GTID update on behalf of applier thread 2 (from step 10.5), and
   - applier thread 1 performing the GTID update for itself (from step 8).

   Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently.

   In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos.
   This concurrent access to `update_commit_group()` can cause a lock leak, resulting in one thread acquiring the sidno lock and never releasing it.

-----------------------------------------------------------------------------------------------------------
Client thread                                            Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();      update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                    calls update_gtids_impl_lock_sidnos()

set commit_group_sidnos[2] = true                        set commit_group_sidnos[2] = true

                                                         lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                         update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                         if (commit_group_sidnos[2]) {
                                                           unlock_sidno(2);
                                                           commit_group_sidnos[2] = false;
                                                         }

                                                         Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) {   <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock leak can also happen the other way around, i.e., the applier thread fails to unlock, there can be different consequences hereafter.
13. If the client thread continues without releasing the lock, then at a later stage it can enter into a deadlock with the applier thread performing a GTID update, with the stack traces below.
Client_thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command

Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later stage it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds this case hits the assertion `safe_mutex_assert_not_owner()`, meaning the lock is already held by the replica applier thread when it tries to re-acquire it.

Solution
--------
In the above problematic example, when seen from each thread individually, there is no problem in the order of lock acquisition, so there is no need to change the lock order. The root cause of the problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`.
In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. This was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired by the client thread (binlog flush leader) when it performs the GTID update on behalf of threads waiting in the "Commit Order" queue, thus guaranteeing that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
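The lock leak in the PS-9018 analysis, and the effect of serializing the two callers, can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: `GtidStateModel`, `replay_unserialized` and `replay_serialized` are hypothetical names, the sidno lock is modelled as an owner id rather than a real mutex, and "serialized" simply means one caller finishes before the other starts, which is what holding an outer lock such as `MYSQL_BIN_LOG::LOCK_commit` is meant to guarantee.

```cpp
#include <array>
#include <cassert>

// Hypothetical, simplified model; names do not match the server sources.
constexpr int kNoOwner = -1;
constexpr int kClient = 0;
constexpr int kApplier = 1;

struct GtidStateModel {
  // Models the shared commit_group_sidnos flags.
  std::array<bool, 4> commit_group_sidnos{};
  // Owner of each sidno lock, or kNoOwner when the lock is free.
  std::array<int, 4> sidno_owner{kNoOwner, kNoOwner, kNoOwner, kNoOwner};

  void mark(int sidno) { commit_group_sidnos[sidno] = true; }

  // Returns false when the lock is taken (a real thread would block).
  bool try_lock_sidno(int sidno, int thread) {
    if (sidno_owner[sidno] != kNoOwner) return false;
    sidno_owner[sidno] = thread;
    return true;
  }

  // Unlocks only if the shared flag is still set.
  void unlock_if_marked(int sidno) {
    if (commit_group_sidnos[sidno]) {
      sidno_owner[sidno] = kNoOwner;
      commit_group_sidnos[sidno] = false;
    }
  }

  bool leaked(int sidno) const { return sidno_owner[sidno] != kNoOwner; }
};

// Replays the interleaving where both threads run concurrently.
inline bool replay_unserialized() {
  GtidStateModel s;
  const int sidno = 2;
  s.mark(sidno);                                // client sets the flag
  s.mark(sidno);                                // applier sets the flag
  bool ok = s.try_lock_sidno(sidno, kApplier);  // applier gets the lock
  assert(ok);
  ok = s.try_lock_sidno(sidno, kClient);        // client would wait here
  assert(!ok);
  s.unlock_if_marked(sidno);                    // applier unlocks, clears flag
  ok = s.try_lock_sidno(sidno, kClient);        // client now gets the lock
  assert(ok);
  (void)ok;
  s.unlock_if_marked(sidno);                    // flag is false: no unlock
  return s.leaked(sidno);
}

// Replays the same work with one caller finishing before the other.
inline bool replay_serialized() {
  GtidStateModel s;
  const int sidno = 2;
  for (int thread : {kApplier, kClient}) {
    s.mark(sidno);
    bool ok = s.try_lock_sidno(sidno, thread);
    assert(ok);
    (void)ok;
    s.unlock_if_marked(sidno);
  }
  return s.leaked(sidno);
}
```

In this model `replay_unserialized()` ends with the sidno lock still owned by the client, while `replay_serialized()` leaves it free, which mirrors why the patch protects the shared array with a single outer lock instead of changing the lock order.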