[pull] main from llvm:main #1148
Merged
pull[bot] merged 40 commits into MPACT-ORG:main on Mar 20, 2026
This fixes a few places where MissingFeatures asserts were incorrect, extends the text of two errorNYI diagnostics to disambiguate them, and fixes a typo in an adjacent comment.
#186261) Summary: This PR changes the handling of kernels emitted for a CPU target so that they take a pointer to a struct. The old handling emitted a standard function prototype, which necessitated a target-specific ABI to call it because the signature differed with the number of arguments. Instead, this PR emits a void pointer to a naturally aligned struct, which is what APIs like `pthreads` expect. This allows us to remove all the complexity around launching host kernels and just pass the argument list.
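A minimal sketch of the idea, not the PR's actual code: every host kernel shares one `void (*)(void *)` signature and reads its arguments from a struct. The names `ScaleArgs`, `HostKernel`, `scaleKernel`, and `launch` are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical argument block for a kernel that scales `n` floats.
// All kernel arguments are packed into one naturally aligned struct.
struct ScaleArgs {
  float *out;
  const float *in;
  float scale;
  std::int32_t n;
};

// Every host kernel has the same signature, whatever its argument count,
// so the launcher needs no per-kernel calling convention.
using HostKernel = void (*)(void *);

// The kernel unpacks its own arguments from the opaque pointer.
void scaleKernel(void *raw) {
  auto *args = static_cast<ScaleArgs *>(raw);
  for (std::int32_t i = 0; i < args->n; ++i)
    args->out[i] = args->scale * args->in[i];
}

// The launcher only forwards the pointer to the argument struct.
void launch(HostKernel kernel, void *args) { kernel(args); }
```

With this shape, `launch(scaleKernel, &args)` works for any kernel, because the launcher never has to know how many arguments the kernel takes.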
…el (#187740) These FP <-> Integer conversion instructions should use PipeA instead.
…n ambiguity notes (#187750) In SemaLookup.cpp, `UnqualUsingDirectiveSet::done()` uses `llvm::sort` with a comparator that only checks the ancestor relationships. So, if there are multiple "neighbor" namespaces, they are considered equal, and thus `llvm::sort` may return the using directives in a non-deterministic order. This was observed as a test failure on clang/test/CXX/drs/cwg0xx.cpp at line 220 after PR #187219 started verifying the diagnostics ordering. The two "candidate found by name lookup" notes were emitted in the opposite order from the test's expectations -- in some builds of Clang, but not others. Switching to `llvm::stable_sort` ensures that using-directives are always traversed in a deterministic order, and thus the notes are emitted deterministically.
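The effect can be illustrated with the standard library alone. The sketch below is an assumption-laden stand-in, not the Sema code: `Directive`, `byDepth`, and `orderedNames` are hypothetical, and an integer `depth` plays the role of the ancestor relationship. Because the comparator treats equal-depth entries as equivalent, only a stable sort guarantees that they keep their input order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a using-directive entry.
struct Directive {
  std::string name;
  int depth; // stand-in for the ancestor relationship being compared
};

// Compares only depth, so entries with equal depth are "equivalent".
bool byDepth(const Directive &a, const Directive &b) {
  return a.depth < b.depth;
}

// std::stable_sort keeps equivalent entries in their original order.
// std::sort makes no such promise, so their order could differ by build.
std::vector<std::string> orderedNames(std::vector<Directive> ds) {
  std::stable_sort(ds.begin(), ds.end(), byDepth);
  std::vector<std::string> names;
  for (const Directive &d : ds)
    names.push_back(d.name);
  return names;
}
```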
Co-authored-by: Jay Foad <jay.foad@amd.com>
Source modifiers (input modifiers) should always be immediates. This commit makes the machine verifier reject non-immediate source modifiers. Closes #182243
…es.mir (#186504) This patch adds two more functions for exercising the target-cpu attribute.
Large memcopies are pretty rare, but are more common in ML workloads (copying large matrices/tensors, often to/from the CPU host). For large copies, NTA stores can provide performance advantages for both memcpy itself and the rest of the workload (by reducing cache pollution). Other runtimes already have an NTA path for large copies, so add one to llvm-libc. Internal whole-program loadtests show a small but statistically significant improvement of 0.1%. ML-specific benchmarks showed a 10-20% performance gain, and fleetbench (https://github.com/google/fleetbench, which has a more up-to-date version of the libc benchmarks) shows a ~3% gain (ns/byte for distributions taken from various applications).
```
[Memcpy_0]_L1     0.01950n ± 3%   0.01900n ± 5%        ~ (p=0.390 n=20)
[Memcpy_0]_L2     0.02300n ± 0%   0.02300n ± 0%        ~ (p=0.256 n=20)
[Memcpy_0]_LLC    0.1335n ± 1%    0.1310n ± 1%    -1.87% (p=0.000 n=20)
[Memcpy_0]_Cold   0.1540n ± 2%    0.1520n ± 1%    -1.30% (p=0.021 n=20)
[Memcpy_1]_L1     0.04300n ± 5%   0.04200n ± 2%   -2.33% (p=0.000 n=20)
[Memcpy_1]_L2     0.05000n ± 2%   0.04800n ± 0%   -4.00% (p=0.000 n=20)
[Memcpy_1]_LLC    0.2500n ± 2%    0.2390n ± 1%    -4.40% (p=0.000 n=20)
[Memcpy_1]_Cold   0.2750n ± 1%    0.2640n ± 1%    -4.00% (p=0.000 n=20)
[Memcpy_2]_L1     0.03800n ± 3%   0.03800n ± 3%        ~ (p=0.420 n=20)
[Memcpy_2]_L2     0.04400n ± 2%   0.04300n ± 0%   -2.27% (p=0.000 n=20)
[Memcpy_2]_LLC    0.2320n ± 1%    0.2220n ± 1%    -4.31% (p=0.000 n=20)
[Memcpy_2]_Cold   0.2565n ± 1%    0.2460n ± 1%    -4.09% (p=0.000 n=20)
[Memcpy_3]_L1     0.1380n ± 1%    0.1355n ± 2%         ~ (p=0.095 n=20)
[Memcpy_3]_L2     0.1490n ± 1%    0.1430n ± 1%    -4.03% (p=0.000 n=20)
[Memcpy_3]_LLC    0.7955n ± 1%    0.7450n ± 0%    -6.35% (p=0.000 n=20)
[Memcpy_3]_Cold   0.8495n ± 1%    0.7935n ± 0%    -6.59% (p=0.000 n=20)
[Memcpy_4]_L1     0.04000n ± 3%   0.03900n ± 3%        ~ (p=0.466 n=20)
[Memcpy_4]_L2     0.04500n ± 2%   0.04400n ± 2%        ~ (p=0.130 n=20)
[Memcpy_4]_LLC    0.2040n ± 1%    0.1950n ± 1%    -4.41% (p=0.000 n=20)
[Memcpy_4]_Cold   0.2240n ± 1%    0.2150n ± 1%    -4.02% (p=0.000 n=20)
[Memcpy_5]_L1     0.05800n ± 3%   0.06050n ± 1%   +4.31% (p=0.000 n=20)
[Memcpy_5]_L2     0.06400n ± 0%   0.06400n ± 2%    0.00% (p=0.004 n=20)
[Memcpy_5]_LLC    0.3320n ± 1%    0.3140n ± 1%    -5.42% (p=0.000 n=20)
[Memcpy_5]_Cold   0.3620n ± 1%    0.3430n ± 0%    -5.25% (p=0.000 n=20)
[Memcpy_6]_L1     0.05700n ± 2%   0.05750n ± 3%        ~ (p=0.403 n=20)
[Memcpy_6]_L2     0.06500n ± 0%   0.06250n ± 1%   -3.85% (p=0.000 n=20)
[Memcpy_6]_LLC    0.3410n ± 1%    0.3205n ± 1%    -6.01% (p=0.000 n=20)
[Memcpy_6]_Cold   0.3670n ± 1%    0.3470n ± 1%    -5.45% (p=0.000 n=20)
[Memcpy_7]_L1     0.05900n ± 2%   0.05900n ± 2%        ~ (p=0.296 n=20)
[Memcpy_7]_L2     0.06400n ± 2%   0.06400n ± 0%        ~ (p=0.327 n=20)
[Memcpy_7]_LLC    0.3145n ± 1%    0.2965n ± 1%    -5.72% (p=0.000 n=20)
[Memcpy_7]_Cold   0.3410n ± 1%    0.3220n ± 0%    -5.57% (p=0.000 n=20)
[Memcpy_8]_L1     0.03600n ± 3%   0.03600n ± 3%        ~ (p=0.804 n=20)
[Memcpy_8]_L2     0.04200n ± 0%   0.04100n ± 2%   -2.38% (p=0.000 n=20)
[Memcpy_8]_LLC    0.2210n ± 1%    0.2090n ± 1%    -5.43% (p=0.000 n=20)
[Memcpy_8]_Cold   0.2415n ± 1%    0.2300n ± 1%    -4.76% (p=0.000 n=20)
geomean           0.1184n         0.1148n         -3.03%
```
…7777) When converting OpenACC compute constructs to acc.compute_region, also sink constants inside so they do not become live-ins.
Add one new flag, dealloc_align_mismatch, that turns alignment checks on or off. Add three new config parameters: one for deallocation type mismatch (such as aborting on new/free if true), one for checking whether the size parameter matches on dealloc, and one for checking whether the alignment is correct on dealloc. Add extra flags that indicate whether to perform an alignment/size check. Update report functions to better indicate the errors. Add unit tests for all of these. This is based on these upstream CLs by jcking: #147735 #146556
The layout propagation fails if dpas has an f16 accumulator. This fix resolves the issue by removing the packingSize argument, which does not seem valid here.
Need to update matching between the original reduced values and their vectorized matches after ordered reduction vectorization to avoid a compiler crash
Assembly files compiled with debug info generate `DW_TAG_label` entries with `DW_AT_low_pc` but no `DW_AT_high_pc` attributes. Without address range information, `dsymutil` would call `addLabelLowPc()`, which only records the start address, making the compilation unit appear "empty" with no ranges. This caused dsymutil to discard all debug information, including line tables. This patch adds infrastructure to query symbol sizes from the debug map and use them to reconstruct address ranges for assembly labels. rdar://166225328 --------- Co-authored-by: Ryan Mansfield <ryan_mansfield@apple.com>
…ns (#174528) OffloadArch uses an enumerator named `UNUSED`, which is a very common macro name in external codebases (e.g. Mesa defines UNUSED as an attribute helper). If such a macro is visible when including clang/Basic/OffloadArch.h, the preprocessor expands the token inside the enum and breaks compilation of the installed Clang headers. Rename the enumerator to `UNUSED_` and update all in-tree references. This is a spelling-only change (no behavioral impact) and mirrors the existing approach used for SM_32_ to avoid macro clashes.
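The clash is easy to reproduce in isolation. In this sketch the macro definition stands in for whatever an external codebase might define, and `OffloadArchSketch` with its `Generic` enumerator is a hypothetical cut-down enum, not the real `OffloadArch`. The trailing underscore makes `UNUSED_` a different preprocessor token, so the macro no longer touches it.

```cpp
// Stand-in for a macro that an external codebase might define before
// including the Clang header (the exact definition is an assumption).
#define UNUSED [[maybe_unused]]

// Hypothetical cut-down enum. An enumerator spelled `UNUSED` would be
// expanded by the preprocessor to `[[maybe_unused]]` and fail to parse.
enum class OffloadArchSketch {
  UNUSED_, // trailing underscore: not the macro's token, so no expansion
  SM_32_,  // existing precedent for the same workaround
  Generic,
};

constexpr bool isPlaceholder(OffloadArchSketch arch) {
  return arch == OffloadArchSketch::UNUSED_;
}
```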
Add binding attributes to global variables that were created for resources embedded in structs. The binding values are based on `register` annotations and `[[vk::binding]]` attribute on the struct instance. Fixes #182992
This should fix the test on all the builders that don't enable this backend.
…zation out of BuildUDIVPattern. (#187739) Check the type before we call getOperationAction. Give BuildUDIVPattern only AllowWiden and a WideSVT. Update variable names and comments to avoid spreading "64" to too many places.
…ing of BuildSDIV/UDIV. NFCI (#187780) This groups the type and operation legality checks to the beginning. The rest of the code can focus on the transformation.
Missed this when reviewing #186986. This fixes the warnings to follow the [LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html#error-and-warning-messages).
…187562) When targeting arm64e, we enable `-fptrauth-indirect-gotos` by default, which signs label addresses and authenticates indirect branches. Add support (and a test) for this in the LLDB expression evaluator.
…atrix memory layout transformations (#186898) Fixes #184906

The SPIRV and DXIL backends assume matrices are provided in column-major order when lowering matrix transpose and matrix multiplication intrinsics. To support row-major order matrices from Clang/HLSL, we therefore need to convert row-major order matrices into column-major order matrices before applying matrix transpose and multiplication. A conversion from column-major order back to row-major order is also required for correctness after a matrix transpose or matrix multiply.

For the matrix transpose case on row-major order matrices, the last two matrix memory layout transforms cancel each other out. So a row-major order matrix transpose is simply a column-major order transpose with the row and column dimensions swapped.

For the matrix multiply case, this PR adds helper functions to the MatrixBuilder to convert a NxM row-/column-major order matrix into a NxM column-/row-major order matrix by applying a matrix transpose. These transformations take advantage of the fact that a row-major order matrix of NxM dimensions `rNxM` interpreted in column-major order is equivalent to its transpose in column-major order.

Example: Let `r3x2 = [ 0, 1, 2, 3, 4, 5 ]`. The 3x2 row-major order matrix is visualized as
```
0 1
2 3
4 5
```
When `r3x2`, or `[ 0, 1, 2, 3, 4, 5 ]` is interpreted as a 2x3 column-major order matrix, it is visualized as:
```
0 2 4
1 3 5
```
which is equal to the transpose of `r3x2` but in column-major order.

These matrix memory layout transformations are inserted before and after the matrix multiply and transpose intrinsics when lowering HLSL mul and transpose. We don't simplify the matrix multiply case because HLSL in Clang will eventually need to support the `row_major` and `column_major` keywords that allow matrices to independently be row-major or column-major regardless of the default matrix memory layout.

While this method of supporting row-major order matrices is not performant, it is correct and will suffice for now until benchmarks are created and performance becomes a primary concern.

Assisted-by: GitHub Copilot (powered by Claude Opus 4.6)
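The `r3x2` example can be checked on plain arrays. This is a standalone sketch of the layout identity, not the MatrixBuilder helpers from the PR; `atRowMajor`, `atColMajor`, and `transposeColMajor` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Element (r, c) of a matrix with `cols` columns, stored row-major.
int atRowMajor(const std::vector<int> &m, std::size_t cols, std::size_t r,
               std::size_t c) {
  return m[r * cols + c];
}

// Element (r, c) of a matrix with `rows` rows, stored column-major.
int atColMajor(const std::vector<int> &m, std::size_t rows, std::size_t r,
               std::size_t c) {
  return m[c * rows + r];
}

// Transposes a rows x cols column-major matrix. The result is a
// cols x rows matrix, also stored column-major.
std::vector<int> transposeColMajor(const std::vector<int> &m, std::size_t rows,
                                   std::size_t cols) {
  std::vector<int> out(m.size());
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[r * cols + c] = m[c * rows + r];
  return out;
}
```

Reading the same buffer `{0, 1, 2, 3, 4, 5}` as 3x2 row-major or as 2x3 column-major yields transposed matrices. Applying `transposeColMajor` to the 2x3 column-major view therefore produces the original 3x2 matrix in column-major storage, which is the row-major to column-major conversion the description relies on.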
RFC https://discourse.llvm.org/t/rfc-bounds-checking-interfaces-for-llvm-libc/87685 Add `constraint_handler_t` type required by Annex K interface in LLVM libc.
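For orientation, C11 Annex K declares the handler type as a pointer to a function taking a message, an opaque pointer, and an `errno_t` error code. The sketch below mirrors that shape in C++ inside a hypothetical `sketch` namespace (without the C `restrict` qualifiers); it is not the LLVM libc header, and `recordingHandler` and `lastError` are illustrative only.

```cpp
namespace sketch {

// Annex K uses errno_t for error codes; assumed here to be plain int.
using errno_t = int;

// Same shape as the Annex K handler type, minus the `restrict` qualifiers.
using constraint_handler_t = void (*)(const char *msg, void *ptr,
                                      errno_t error);

// Illustrative state: remembers the last reported error code.
int lastError = 0;

// Illustrative handler that records the error instead of aborting.
void recordingHandler(const char *, void *, errno_t error) {
  lastError = error;
}

} // namespace sketch
```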
…terable (#187339) A crash was encountered in the slice op folder when the input was a constant with dense resource values. The folder was trying to iterate over the input values, which is not possible for resource values. This change fixes the crash and adds a test.
Fixes a breakage from #182155
This patch fixes the update of the DAGNode UnscheduledSucc counter when a use edge is modified. This is the result of a setOperand() or a RAUW (and friends) operation. Before this patch we would not check if the User (i.e., the consumer of the use-def edge) is scheduled and we would update the definition's UnscheduledSucc counter, resulting in counting errors. For example, consider the following IR:
```
%A = ...
%B = ...
%U = %A ; scheduled
```
Note that %U's DAGNode is marked as "scheduled" while %A and %B are not. If we change %U's operand from %A to %B then we should not attempt to update %A's or %B's UnscheduledSuccs because %U is scheduled so it should not get counted as an "unscheduled" successor.
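The rule can be sketched with a toy node type. `Node` and `onUseChanged` are hypothetical and far simpler than the real DAGNode bookkeeping; the sketch only shows the guard described above, where a scheduled user leaves both definitions' counters untouched.

```cpp
// Toy stand-in for a DAG node (hypothetical, not the real DAGNode).
struct Node {
  bool scheduled = false;
  int unscheduledSuccs = 0; // number of users that are not yet scheduled
};

// Called when `user` switches one operand from `oldDef` to `newDef`.
void onUseChanged(const Node &user, Node &oldDef, Node &newDef) {
  // A scheduled user was never counted as an unscheduled successor,
  // so neither definition's counter may change.
  if (user.scheduled)
    return;
  --oldDef.unscheduledSuccs;
  ++newDef.unscheduledSuccs;
}
```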
Fix return types and/or function arguments of several functions:
* mtx_destroy
* tss_delete
* thrd_exit
…closer (#186518) Selecting the score in SGPRInfo used to require an index which you would get by calling a getSgprScoresIdx(), which is defined in a different class. This patch moves the score selection logic into SGPRInfo. This makes the interface simpler and more intuitive. Also, given that SGPRInfo contains only two scores, this patch replaces the score array with individual score variables. Should be NFC.
…ARGETS (#187634) In our downstream we have a non-runtime target depending on libclc EXTRA_TARGET and then observe a race condition in parallel build: both runtimes-build (full build, no lock) and libclc EXTRA_TARGET (triggered by non-runtime target, FileLock) build concurrently, leading to a corrupt libclc library. This exposes a limitation in the ExternalProject EXTRA_TARGET design: EXTRA_TARGETS in llvm_ExternalProject_Add only depend on ${name}-configure, not ${name}-build. This makes EXTRA_TARGETS unsafe as dependencies of a non-runtime target. Fix: Add a locked BUILD_COMMAND to ExternalProject_Add for the Unix Makefiles generator, using the same cmake.lock as EXTRA_TARGETS. This serializes runtimes-build with all EXTRA_TARGETS under one lock. With this PR, a non-runtime target can depend on a specific EXTRA_TARGET, rather than needing to depend on the umbrella runtimes target. This is an improvement. A non-runtime target can start as soon as the specific EXTRA_TARGET it depends on finishes building, without waiting for other runtimes. In addition, the non-runtime target may only need a specific EXTRA_TARGET, so a minimal dependency on it is more accurate. Assisted-by: Claude Sonnet 4.6 --------- Co-authored-by: Petr Hosek <phosek@google.com>
After #186881 was merged, the gcc libc bots started complaining about the conversion from u8 to a 2-bit integer being unsafe (see: https://lab.llvm.org/buildbot/#/builders/131/builds/42788). This PR adds a bitmask that fixes the warning.
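The general pattern looks like the sketch below, which is illustrative rather than the code touched by the PR; `Flags` and `withMode` are hypothetical names. Masking to the low two bits before the store makes the narrowing explicit, so the value always fits the bit-field.

```cpp
#include <cstdint>

// Hypothetical struct with a 2-bit field packed next to other bits.
struct Flags {
  std::uint8_t mode : 2;
  std::uint8_t rest : 6;
};

// Stores an 8-bit value into the 2-bit field. The `& 0x3` mask keeps only
// the low two bits, so the assigned value is always within range.
Flags withMode(std::uint8_t raw) {
  Flags f{};
  f.mode = raw & 0x3;
  return f;
}
```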
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)