[pull] main from llvm:main by pull[bot] · Pull Request #1148 · MPACT-ORG/llvm-project

pull · 2026-03-20T23:55:04Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


This fixes a few places where MissingFeatures asserts were incorrect, extends the text of two errorNYI diagnostics to disambiguate them, and fixes a typo in an adjacent comment.

Related: #179278, #160386 Extends cir.global to accept address space attributes. Globals can now specify either `target_address_space(N)` or `lang_address_space(offload_*)`. Address spaces are also preserved throughout get_global ops.

#186261) Summary: This PR changes how kernels are emitted when targeting a CPU so that they take a pointer to a struct. The old handling emitted a standard function prototype, which necessitated a target-specific ABI to call it because the signature differed with the number of arguments. Instead, this PR emits a void pointer to a naturally aligned struct, which is what APIs like `pthreads` expect. This allows us to remove all the complexity around launching host kernels and just pass the argument list.
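The calling convention described above can be sketched roughly as follows. This is only an illustration, not the code the PR emits: `SaxpyArgs`, `saxpy_kernel`, `KernelFn`, and `launch` are hypothetical names chosen for the example.

```cpp
#include <cstddef>

// Hypothetical argument pack: all kernel arguments live in one naturally
// aligned struct instead of appearing in the function signature.
struct SaxpyArgs {
  float a;
  const float *x;
  float *y;
  std::size_t n;
};

// Every kernel shares this one signature, regardless of how many
// arguments it logically takes.
using KernelFn = void (*)(void *);

// Hypothetical kernel: unpacks its arguments from the opaque pointer.
void saxpy_kernel(void *raw) {
  auto *args = static_cast<SaxpyArgs *>(raw);
  for (std::size_t i = 0; i < args->n; ++i)
    args->y[i] = args->a * args->x[i] + args->y[i];
}

// The launcher needs no per-kernel ABI knowledge: it forwards one pointer.
void launch(KernelFn fn, void *args) { fn(args); }
```

Because the launcher only ever sees `void (*)(void *)`, adding or removing kernel arguments changes the struct layout but not the call site.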

…ots temporarily (#187753) Windows bots are still failing after a3db68a and d7dbba5. This test is new, let's take it off while we investigate.

Test case for #187728

…el (#187740) These FP <-> Integer conversion instructions should use PipeA instead.

#187563) This helper class contains an optional value and a "reason" message. It replaces the uses of std::pair<optional<...>, Reason>. Issue: #185287
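A helper of this shape might look like the sketch below. It is an assumption-laden illustration, not the flang class: the member names, the `failure` factory, the use of `std::string` for the reason, and the `parseDepth` caller are all hypothetical.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch: bundles an optional value with a human-readable
// reason, replacing an ad hoc std::pair<std::optional<T>, Reason>.
template <typename T> struct WithReason {
  std::optional<T> value;
  std::string reason; // assumed to be a plain string in this sketch

  WithReason() = default;
  explicit WithReason(T v, std::string r = {})
      : value(std::move(v)), reason(std::move(r)) {}

  // Builds an empty result that only carries an explanation.
  static WithReason failure(std::string r) {
    WithReason w;
    w.reason = std::move(r);
    return w;
  }

  explicit operator bool() const { return value.has_value(); }
};

// Hypothetical caller: returns a depth, or the reason it was rejected.
WithReason<int> parseDepth(int requested) {
  if (requested <= 0)
    return WithReason<int>::failure("depth must be positive");
  return WithReason<int>(requested);
}
```

Compared with a bare pair, the named members make it clear at the call site which half is the value and which is the explanation.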

…n ambiguity notes (#187750) In SemaLookup.cpp, `UnqualUsingDirectiveSet::done()` uses `llvm::sort` with a comparator that only checks the ancestor relationships. So, if there are multiple "neighbor" namespaces, they are considered equal, and thus `llvm::sort` may return the using directives in a non-deterministic order. This was observed as a test failure on clang/test/CXX/drs/cwg0xx.cpp at line 220 after PR #187219 started verifying the diagnostics ordering. The two "candidate found by name lookup" notes were emitted in the opposite order from the test's expectations -- in some builds of Clang, but not others. Switching to `llvm::stable_sort` ensures that using-directives are always traversed in a deterministic order, and thus the notes are emitted deterministically.
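The effect of a stable sort with a comparator that treats siblings as equivalent can be illustrated with a small standalone sketch. It uses `std::stable_sort` as a stand-in for `llvm::stable_sort`, and the `Directive` struct and depth-only comparator are hypothetical simplifications of the ancestor-relationship check.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a using-directive: a name plus a nesting depth.
struct Directive {
  std::string name;
  int depth;
};

// Only compares depth, so two "neighbor" entries at the same depth are
// equivalent as far as the sort is concerned.
inline bool byDepth(const Directive &a, const Directive &b) {
  return a.depth < b.depth;
}

// std::stable_sort keeps equivalent elements in their input order, so the
// result is fully determined by the input; an unstable sort makes no such
// promise for equivalent elements.
std::vector<std::string> orderedNames(std::vector<Directive> ds) {
  std::stable_sort(ds.begin(), ds.end(), byDepth);
  std::vector<std::string> names;
  for (const auto &d : ds)
    names.push_back(d.name);
  return names;
}
```

With input order `B`, `A` at the same depth, the stable sort always yields `B` before `A`, which is the kind of guarantee a test that checks note ordering relies on.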

Co-authored-by: Jay Foad <jay.foad@amd.com>

Source modifiers (input modifiers) should always be immediates. This commit makes the machine verifier reject non-immediate source modifiers. Closes #182243

…es.mir (#186504) This patch adds two more functions for exercising the target-cpu attribute.

Large memcopies are pretty rare, but are more common in ML workloads (copying large matrices/tensors, often to/from the CPU host). For large copies, NTA stores can provide performance advantages for both memcpy itself and the rest of the workload (by reducing cache pollution). Other runtimes already have an NTA path for large copies, so add one to llvm-libc. Internal whole-program load tests show a small but statistically significant improvement of 0.1%. ML-specific benchmarks showed a 10-20% performance gain, and fleetbench (https://github.com/google/fleetbench, which has a more up-to-date version of the libc benchmarks) shows a ~3% gain (ns/byte for distributions taken from various applications).

```
[Memcpy_0]_L1     0.01950n ± 3%   0.01900n ± 5%        ~ (p=0.390 n=20)
[Memcpy_0]_L2     0.02300n ± 0%   0.02300n ± 0%        ~ (p=0.256 n=20)
[Memcpy_0]_LLC     0.1335n ± 1%    0.1310n ± 1%   -1.87% (p=0.000 n=20)
[Memcpy_0]_Cold    0.1540n ± 2%    0.1520n ± 1%   -1.30% (p=0.021 n=20)
[Memcpy_1]_L1     0.04300n ± 5%   0.04200n ± 2%   -2.33% (p=0.000 n=20)
[Memcpy_1]_L2     0.05000n ± 2%   0.04800n ± 0%   -4.00% (p=0.000 n=20)
[Memcpy_1]_LLC     0.2500n ± 2%    0.2390n ± 1%   -4.40% (p=0.000 n=20)
[Memcpy_1]_Cold    0.2750n ± 1%    0.2640n ± 1%   -4.00% (p=0.000 n=20)
[Memcpy_2]_L1     0.03800n ± 3%   0.03800n ± 3%        ~ (p=0.420 n=20)
[Memcpy_2]_L2     0.04400n ± 2%   0.04300n ± 0%   -2.27% (p=0.000 n=20)
[Memcpy_2]_LLC     0.2320n ± 1%    0.2220n ± 1%   -4.31% (p=0.000 n=20)
[Memcpy_2]_Cold    0.2565n ± 1%    0.2460n ± 1%   -4.09% (p=0.000 n=20)
[Memcpy_3]_L1      0.1380n ± 1%    0.1355n ± 2%        ~ (p=0.095 n=20)
[Memcpy_3]_L2      0.1490n ± 1%    0.1430n ± 1%   -4.03% (p=0.000 n=20)
[Memcpy_3]_LLC     0.7955n ± 1%    0.7450n ± 0%   -6.35% (p=0.000 n=20)
[Memcpy_3]_Cold    0.8495n ± 1%    0.7935n ± 0%   -6.59% (p=0.000 n=20)
[Memcpy_4]_L1     0.04000n ± 3%   0.03900n ± 3%        ~ (p=0.466 n=20)
[Memcpy_4]_L2     0.04500n ± 2%   0.04400n ± 2%        ~ (p=0.130 n=20)
[Memcpy_4]_LLC     0.2040n ± 1%    0.1950n ± 1%   -4.41% (p=0.000 n=20)
[Memcpy_4]_Cold    0.2240n ± 1%    0.2150n ± 1%   -4.02% (p=0.000 n=20)
[Memcpy_5]_L1     0.05800n ± 3%   0.06050n ± 1%   +4.31% (p=0.000 n=20)
[Memcpy_5]_L2     0.06400n ± 0%   0.06400n ± 2%    0.00% (p=0.004 n=20)
[Memcpy_5]_LLC     0.3320n ± 1%    0.3140n ± 1%   -5.42% (p=0.000 n=20)
[Memcpy_5]_Cold    0.3620n ± 1%    0.3430n ± 0%   -5.25% (p=0.000 n=20)
[Memcpy_6]_L1     0.05700n ± 2%   0.05750n ± 3%        ~ (p=0.403 n=20)
[Memcpy_6]_L2     0.06500n ± 0%   0.06250n ± 1%   -3.85% (p=0.000 n=20)
[Memcpy_6]_LLC     0.3410n ± 1%    0.3205n ± 1%   -6.01% (p=0.000 n=20)
[Memcpy_6]_Cold    0.3670n ± 1%    0.3470n ± 1%   -5.45% (p=0.000 n=20)
[Memcpy_7]_L1     0.05900n ± 2%   0.05900n ± 2%        ~ (p=0.296 n=20)
[Memcpy_7]_L2     0.06400n ± 2%   0.06400n ± 0%        ~ (p=0.327 n=20)
[Memcpy_7]_LLC     0.3145n ± 1%    0.2965n ± 1%   -5.72% (p=0.000 n=20)
[Memcpy_7]_Cold    0.3410n ± 1%    0.3220n ± 0%   -5.57% (p=0.000 n=20)
[Memcpy_8]_L1     0.03600n ± 3%   0.03600n ± 3%        ~ (p=0.804 n=20)
[Memcpy_8]_L2     0.04200n ± 0%   0.04100n ± 2%   -2.38% (p=0.000 n=20)
[Memcpy_8]_LLC     0.2210n ± 1%    0.2090n ± 1%   -5.43% (p=0.000 n=20)
[Memcpy_8]_Cold    0.2415n ± 1%    0.2300n ± 1%   -4.76% (p=0.000 n=20)
geomean            0.1184n         0.1148n        -3.03%
```

)

…7777) When converting OpenACC compute constructs to acc.compute_region, also sink constants inside so they do not become live-ins.

Add one new flag, dealloc_align_mismatch, that turns alignment checks on or off. Add three new config parameters: one for deallocation type mismatch (such as aborting on new/free if true), one for checking that the size parameter matches on dealloc, and one for checking that the alignment is correct on dealloc. Add extra flags that can be passed to request an align/size check. Update report functions to better describe the errors. Add unit tests for all of these. This is based on these upstream CLs by jcking: #147735 #146556

The layout propagation fails if dpas has an f16 accumulator. This fix resolves the issue by removing the packingSize argument, which does not seem valid here.

Need to update matching between the original reduced values and their vectorized matches after ordered reduction vectorization to avoid a compiler crash

Assembly files compiled with debug info generate `DW_TAG_label` entries with `DW_AT_low_pc` but no `DW_AT_high_pc` attributes. Without address range information, `dsymutil` would call `addLabelLowPc()`, which only records the start address, making the compilation unit appear "empty" with no ranges. This caused dsymutil to discard all debug information, including line tables. This patch adds infrastructure to query symbol sizes from the debug map and use them to reconstruct address ranges for assembly labels. rdar://166225328 --------- Co-authored-by: Ryan Mansfield <ryan_mansfield@apple.com>

…ns (#174528) OffloadArch uses an enumerator named `UNUSED`, which is a very common macro name in external codebases (e.g. Mesa defines UNUSED as an attribute helper). If such a macro is visible when including clang/Basic/OffloadArch.h, the preprocessor expands the token inside the enum and breaks compilation of the installed Clang headers. Rename the enumerator to `UNUSED_` and update all in-tree references. This is a spelling-only change (no behavioral impact) and mirrors the existing approach used for SM_32_ to avoid macro clashes.
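The clash can be reproduced in miniature with the sketch below. The enum is a hypothetical stand-in, not the real `OffloadArch`, and the macro body is only an assumption about what an external codebase might define.

```cpp
// Simulates an external codebase that defines UNUSED as an attribute helper.
#define UNUSED [[maybe_unused]]

// Hypothetical stand-in enum. If the first enumerator were spelled UNUSED,
// the macro above would expand inside the enum body and break compilation.
// The trailing underscore makes it a different token, so it is left alone.
enum class HypotheticalArch { UNUSED_, SM_32_, Generic };

inline bool isUnused(HypotheticalArch a) {
  return a == HypotheticalArch::UNUSED_;
}

// The external macro keeps working for its intended purpose.
inline int firstOnly(int a, UNUSED int b) { return a; }
```

Renaming the enumerator rather than undefining the macro keeps the header usable regardless of include order.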

) Enable and test PointerAuthAuthTraps, which ensures that we trap after an authentication failure.

Add binding attributes to global variables that were created for resources embedded in structs. The binding values are based on `register` annotations and `[[vk::binding]]` attribute on the struct instance. Fixes #182992

This should fix the test on all the builders that don't enable this backend.

…zation out of BuildUDIVPattern. (#187739) Check the type before we call getOperationAction. Give BuildUDIVPattern only AllowWiden and a WideSVT. Update variable names and comments to avoid spreading "64" to too many places.

…ing of BuildSDIV/UDIV. NFCI (#187780) This groups the type and operation legality checks to the beginning. The rest of the code can focus on the transformation.

Missed this when reviewing #186986. This fixes the warnings to follow the [LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html#error-and-warning-messages).

…187562) When targeting arm64e, we enable `-fptrauth-indirect-gotos` by default, which signs label addresses and authenticates indirect branches. Add support (and a test) for this in the LLDB expression evaluator.

…atrix memory layout transformations (#186898) Fixes #184906 The SPIRV and DXIL backends assume matrices are provided in column-major order when lowering matrix transpose and matrix multiplication intrinsics. To support row-major order matrices from Clang/HLSL, we therefore need to convert row-major order matrices into column-major order matrices before applying matrix transpose and multiplication. A conversion from column-major order back to row-major order is also required for correctness after a matrix transpose or matrix multiply. For the matrix transpose case on row-major order matrices, the last two matrix memory layout transforms cancel each other out. So a row-major order matrix transpose is simply a column-major order transpose with the row and column dimensions swapped. For the matrix multiply case, this PR adds helper functions to the MatrixBuilder to convert a NxM row-/column-major order matrix into a NxM column-/row-major order matrix by applying a matrix transpose. These transformations take advantage of the fact that a row-major order matrix of NxM dimensions `rNxM` interpreted in column-major order is equivalent to its transpose in column-major order. Example: Let `r3x2 = [ 0, 1, 2, 3, 4, 5 ]`. The 3x2 row-major order matrix is visualized as

```
0 1
2 3
4 5
```

When `r3x2`, or `[ 0, 1, 2, 3, 4, 5 ]` is interpreted as a 2x3 column-major order matrix, it is visualized as:

```
0 2 4
1 3 5
```

which is equal to the transpose of `r3x2` but in column-major order. These matrix memory layout transformations are inserted before and after the matrix multiply and transpose intrinsics when lowering HLSL mul and transpose. We don't simplify the matrix multiply case because HLSL in Clang will eventually need to support the `row_major` and `column_major` keywords that allow matrices to independently be row-major or column-major regardless of the default matrix memory layout.
While this method of supporting row-major order matrices is not performant, it is correct and will suffice for now until benchmarks are created and performance becomes a primary concern. Assisted-by: GitHub Copilot (powered by Claude Opus 4.6)
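The layout identity used in the `r3x2` example can be checked with a few lines of standalone code. This is a sketch for intuition only; the helper names are hypothetical and unrelated to the MatrixBuilder API.

```cpp
#include <cstddef>

// Element (r, c) of a matrix with `cols` columns stored in row-major order.
inline int atRowMajor(const int *m, std::size_t cols, std::size_t r, std::size_t c) {
  return m[r * cols + c];
}

// Element (r, c) of a matrix with `rows` rows stored in column-major order.
inline int atColMajor(const int *m, std::size_t rows, std::size_t r, std::size_t c) {
  return m[c * rows + r];
}

// Checks that a buffer read as a rows x cols row-major matrix equals the
// transpose of the same buffer read as a cols x rows column-major matrix.
inline bool rowMajorIsTransposedColMajor(const int *m, std::size_t rows, std::size_t cols) {
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      if (atRowMajor(m, cols, r, c) != atColMajor(m, /*rows=*/cols, c, r))
        return false;
  return true;
}
```

Both accessors reduce to the same index `r * cols + c`, which is why reinterpreting the buffer costs nothing and only the logical dimensions swap.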

RFC https://discourse.llvm.org/t/rfc-bounds-checking-interfaces-for-llvm-libc/87685 Add `constraint_handler_t` type required by Annex K interface in LLVM libc.

…terable (#187339) A crash was encountered in the slice op folder when the input was a constant with dense resource values. The folder was trying to iterate over the input values, which is not possible for resource values. This change fixes the crash and adds a test.

Fixes a breakage from #182155

This patch fixes the update of the DAGNode UnscheduledSucc counter when a use edge is modified. This is the result of a setOperand() or a RAUW (and friends) operation. Before this patch we would not check if the User (i.e., the consumer of the use-def edge) is scheduled and we would update the definition's UnscheduledSucc counter, resulting in counting errors. For example, consider the following IR:

```
%A = ...
%B = ...
%U = %A ; scheduled
```

Note that %U's DAGNode is marked as "scheduled" while %A and %B are not. If we change %U's operand from %A to %B then we should not attempt to update %A's or %B's UnscheduledSuccs because %U is scheduled so it should not get counted as an "unscheduled" successor.
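The bookkeeping rule described above can be sketched as follows. The struct and function are hypothetical simplifications for illustration and do not mirror the actual DAGNode interface.

```cpp
// Hypothetical, simplified node: tracks only what the rule needs.
struct HypotheticalNode {
  bool scheduled = false;
  unsigned unscheduledSuccs = 0;
};

// Called when `user` switches one operand from `oldDef` to `newDef`.
// A scheduled user was never counted as an unscheduled successor, so the
// counters of both definitions must stay untouched in that case.
inline void onUseEdgeChanged(const HypotheticalNode &user,
                             HypotheticalNode &oldDef,
                             HypotheticalNode &newDef) {
  if (user.scheduled)
    return;
  // Assumes the invariant that an unscheduled user was counted on oldDef.
  --oldDef.unscheduledSuccs;
  ++newDef.unscheduledSuccs;
}
```

In the IR example, `%U` is scheduled, so rewriting its operand from `%A` to `%B` leaves both counters as they were.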

…7761) Function template specialization arguments were incorrectly rendered without a comma. This was due to the "End" JSON property also being used in the levels above. Mustache looks for missing properties in parent contexts, see #174359.

Fix return types and/or function arguments of several functions:
* mtx_destroy
* tss_delete
* thrd_exit

…closer (#186518) Selecting the score in SGPRInfo used to require an index, which you would get by calling getSgprScoresIdx(), a function defined in a different class. This patch moves the score selection logic into SGPRInfo, which makes the interface simpler and more intuitive. Since SGPRInfo contains only two scores, this patch also replaces the score array with individual score variables. Should be NFC.

…ARGETS (#187634) In our downstream we have a non-runtime target depending on a libclc EXTRA_TARGET, and we observe a race condition in parallel builds: both runtimes-build (full build, no lock) and the libclc EXTRA_TARGET (triggered by the non-runtime target, FileLock) build concurrently, leading to a corrupt libclc library. This exposes a limitation in the ExternalProject EXTRA_TARGET design: EXTRA_TARGETS in llvm_ExternalProject_Add only depend on ${name}-configure, not ${name}-build. This makes EXTRA_TARGETS unsafe as dependencies of a non-runtime target. Fix: Add a locked BUILD_COMMAND to ExternalProject_Add for the Unix Makefiles generator, using the same cmake.lock as EXTRA_TARGETS. This serializes runtimes-build with all EXTRA_TARGETS under one lock. With this PR, a non-runtime target can depend on a specific EXTRA_TARGET, rather than needing to depend on the umbrella runtimes target. This is an improvement: a non-runtime target can start as soon as the specific EXTRA_TARGET it depends on finishes building, without waiting for other runtimes. In addition, the non-runtime target may only need a specific EXTRA_TARGET, so a minimal dependency on it is more accurate. Assisted-by: Claude Sonnet 4.6 --------- Co-authored-by: Petr Hosek <phosek@google.com>

After #186881 was merged the gcc libc bots started complaining about the conversion from u8 to 2 bit integer being unsafe (see: https://lab.llvm.org/buildbot/#/builders/131/builds/42788). This PR adds a bitmask that fixes the warning.
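The general pattern behind such a fix can be sketched as below. The struct, field, and function names are hypothetical, and the exact warning and code in compiler-rt may differ.

```cpp
#include <cstdint>

// Hypothetical struct with a 2-bit field, which can hold values 0..3.
struct HypotheticalFlags {
  unsigned kind : 2;
};

// Assigning a full 8-bit value to a 2-bit field can trigger a conversion
// warning on some compilers. Masking with 0x3 makes the truncation explicit
// and lets the compiler see that the value fits.
inline void setKind(HypotheticalFlags &f, std::uint8_t raw) {
  f.kind = raw & 0x3u;
}
```

The mask does not change which bits end up stored; it only documents the truncation that the bit-field assignment would perform anyway.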

ro-i and others added 30 commits March 20, 2026 18:59

[offload] Use flang-rt for test feature requirements (#187733)

bc6a265

[CIR][NFC] Minor cleanups to missing feature markers (#187754)

4a5da64

This fixes a few places where MissingFeatures asserts were incorrect, extends the text of two errorNYI diagnostics to disambiguate them, and fixes a typo in an adjacent comment.

[CIR] Address Space support for GlobalOps (#179082)

0ec6e1d

Related: #179278, #160386 Extends cir.global to accept address space attributes. Globals can now specify either `target_address_space(N)` or `lang_address_space(offload_*)`. Address spaces are also preserved throughout get_global ops.

[Bazel] Port a2c0c43

60db764

[llvm] Silence llvm-debuginfod-find/headers-winhttp.test on Windows b…

9431920

…ots temporarily (#187753) Windows bots are still failing after a3db68a and d7dbba5. This test is new, let's take it off while we investigate.

[LSR] Add regression test for unnecessary phi introduction (#187751)

63c9573

Test case for #187728

[RISCV] Fix the pipe used by fmv.x.<fp>/<fp>.x in SiFive7 sched mod…

78b651a

…el (#187740) These FP <-> Integer conversion instructions should use PipeA instead.

[flang][OpenMP] Introduce WithReason<T> for nest/sequence properties (

cfc94a6

#187563) This helper class contains an optional value and a "reason" message. It replaces the uses of std::pair<optional<...>, Reason>. Issue: #185287

Add VDS encoding for gfx13 (#187693)

498dd13

Co-authored-by: Jay Foad <jay.foad@amd.com>

[AMDGPU] Add basic verification for source modifiers (#186733)

dd30239

Source modifiers (input modifiers) should always be immediates. This commit makes the machine verifier reject non-immediate source modifiers. Closes #182243

[AMDGPU][SIInsertWaitcnts] Add test functions in waitcnt-wcg-attribut…

827ddb2

…es.mir (#186504) This patch adds two more functions for exercising the target-cpu attribute.

[AMDGPU][GlobalISel] Add RegBankLegalize rules for amdgcn.class (#178827

bd3b06b

)

[mlir][acc] Sink constants into acc.compute_region when creating (#18…

66f06f5

…7777) When converting OpenACC compute constructs to acc.compute_region, also sink constants inside so they do not become live-ins.

[MLIR][XeGPU] Fix dpas f16 output layout (#184419)

44c6a0a

The layout propagation fails if dpas has an f16 accumulator. This fix resolves the issue by removing the packingSize argument, which does not seem valid here.

[SLP]Update values after ordered vectorization

b260861

Need to update matching between the original reduced values and their vectorized matches after ordered reduction vectorization to avoid a compiler crash

[lldb] Support PointerAuthAuthTraps in the expression evaluator (#187612

9b30151

) Enable and test PointerAuthAuthTraps, which ensures that we trap after an authentication failure.

[HLSL] Add binding attributes to resources from structs (#184731)

a99dbc5

Add binding attributes to global variables that were created for resources embedded in structs. The binding values are based on `register` annotations and `[[vk::binding]]` attribute on the struct instance. Fixes #182992

[dsymutil] Require AArch64 backend in asm-line-tables.test (#187797)

2d3b8ce

This should fix the test on all the builders that don't enable this backend.

[TargetLowering] Move the MULH/MUL_LOHI legality checks to the beginn…

343b566

…ing of BuildSDIV/UDIV. NFCI (#187780) This groups the type and operation legality checks to the beginning. The rest of the code can focus on the transformation.

[lldb] Fix warning style for SymStore symbol locator (#187776)

79f3104

Missed this when reviewing #186986. This fixes the warnings to follow the [LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html#error-and-warning-messages).

[libc][annex_k] Add constraint_handler_t. (#163239)

a261548

RFC https://discourse.llvm.org/t/rfc-bounds-checking-interfaces-for-llvm-libc/87685 Add `constraint_handler_t` type required by Annex K interface in LLVM libc.

lhutton1 and others added 10 commits March 20, 2026 22:27

[lldb][bytecode] Fix Update() and failing test (#187795)

7a5431e

Fixes a breakage from #182155

[scudo] Make the default for size/align checks to not die. (#187799)

8cc0124

[libc] Fix function prototypes for <threads.h> C11 header. (#187808)

a60b3a8

Fix return types and/or function arguments of several functions:
* mtx_destroy
* tss_delete
* thrd_exit

[clang] fix error: cannot compile this l-value expression yet (#187755)

335a2d0

[compiler-rt] Add bitmask to fix warning (#187812)

6891a6e

After #186881 was merged the gcc libc bots started complaining about the conversion from u8 to 2 bit integer being unsafe (see: https://lab.llvm.org/buildbot/#/builders/131/builds/42788). This PR adds a bitmask that fixes the warning.

pull bot locked and limited conversation to collaborators Mar 20, 2026

pull bot added the ⤵️ pull label Mar 20, 2026

pull bot merged commit 6891a6e into MPACT-ORG:main Mar 20, 2026
1 check failed

pull[bot] merged 40 commits into MPACT-ORG:main from llvm:main


20 participants
