
[pull] main from llvm:main #1148

Merged
pull[bot] merged 40 commits into MPACT-ORG:main from llvm:main on
Mar 20, 2026


pull bot commented Mar 20, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


ro-i and others added 30 commits March 20, 2026 18:59
This fixes a few places where MissingFeatures asserts were incorrect,
extends the text of two errorNYI diagnostics to disambiguate them, and
fixes a typo in an adjacent comment.
Related: #179278,
#160386

Extends cir.global to accept address space attributes. Globals can now
specify either `target_address_space(N)` or
`lang_address_space(offload_*)`. Address spaces are also preserved
throughout get_global ops.
#186261)

Summary:
This PR changes how kernels are emitted when targeting a CPU: they now
take a pointer to an argument struct.

The old handling emitted a standard function prototype. This
necessitated a target-specific ABI to call it, because the signature
varied with the number of arguments. Instead, this PR emits kernels that
take a void pointer to a naturally aligned struct, which is the
convention that APIs like `pthreads` expect.

This allows us to remove all the complexity around launching host
kernels and just pass the argument list.
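A minimal sketch of why a single pointer-to-struct signature removes the per-arity ABI problem. It assumes "naturally aligned" means each field is aligned to its own size, and it uses a Python `bytearray` as a stand-in for the `void *`; `layout`, `pack_args`, `launch`, and `scale_kernel` are hypothetical names, not the offload runtime's API.

```python
import struct

# Standard sizes for the struct format characters used below ("<" = no padding
# added by the struct module; we insert alignment padding ourselves).
FIELD_SIZES = {"b": 1, "h": 2, "i": 4, "q": 8, "d": 8}


def layout(fmts):
    """Return (offsets, total_size) for a naturally aligned struct."""
    offsets, offset, max_align = [], 0, 1
    for f in fmts:
        size = FIELD_SIZES[f]
        offset = (offset + size - 1) // size * size  # round up to the field's alignment
        offsets.append(offset)
        offset += size
        max_align = max(max_align, size)
    total = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, total


def pack_args(fmts, values):
    """Write the argument values into one buffer using the aligned layout."""
    offsets, total = layout(fmts)
    buf = bytearray(total)
    for f, off, v in zip(fmts, offsets, values):
        struct.pack_into("<" + f, buf, off, v)
    return buf


def launch(kernel, fmts, values):
    # Every kernel has the same signature (one buffer), so the launcher does
    # not need to know how many arguments the kernel takes.
    return kernel(pack_args(fmts, values))


def scale_kernel(args):
    """Hypothetical kernel taking (int8 n, double x) and returning n * x."""
    offsets, _ = layout("bd")
    (n,) = struct.unpack_from("<b", args, offsets[0])
    (x,) = struct.unpack_from("<d", args, offsets[1])
    return n * x
```

For example, `layout("bdi")` places the fields at offsets 0, 8, and 16 with a total size of 24, and `launch(scale_kernel, "bd", [3, 2.5])` returns 7.5.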
…ots temporarily (#187753)

Windows bots are still failing after a3db68a and
d7dbba5. This test is new, so let's disable it while
we investigate.
…el (#187740)

These FP <-> Integer conversion instructions should use PipeA instead.
#187563)

This helper class contains an optional value and a "reason" message. It
replaces the uses of std::pair<optional<...>, Reason>.

Issue: #185287
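The commit does not name the helper class here, so the following is a hypothetical Python analogue (`OptionalWithReason`, `parse_positive`) sketching the idea: one object carries either a value or an explanation, instead of a `(optional, reason)` pair.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class OptionalWithReason(Generic[T]):
    """Hypothetical analogue: an optional value plus a "reason" message."""
    value: Optional[T] = None
    reason: str = ""

    def has_value(self) -> bool:
        return self.value is not None

    @classmethod
    def some(cls, value: T, reason: str = "") -> "OptionalWithReason[T]":
        return cls(value, reason)

    @classmethod
    def none(cls, reason: str) -> "OptionalWithReason[T]":
        return cls(None, reason)


def parse_positive(text: str) -> OptionalWithReason[int]:
    """Example producer: explain why no value could be returned."""
    if not text.isdigit():
        return OptionalWithReason.none(f"'{text}' is not a number")
    value = int(text)
    if value == 0:
        return OptionalWithReason.none("value must be positive")
    return OptionalWithReason.some(value)
```

Callers check `has_value()` and, when it is false, can report `reason` directly instead of unpacking a pair.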
…n ambiguity notes (#187750)

In SemaLookup.cpp, `UnqualUsingDirectiveSet::done()` uses `llvm::sort`
with a comparator that only checks the ancestor relationships. So, if
there are multiple "neighbor" namespaces, they are considered equal, and
thus `llvm::sort` may return the using directives in a non-deterministic
order.

This was observed as a test failure on clang/test/CXX/drs/cwg0xx.cpp at
line 220 after PR #187219 started verifying the diagnostics ordering.
The two "candidate found by name lookup" notes were emitted in the
opposite order from the test's expectations -- in some builds of Clang,
but not others.

Switching to `llvm::stable_sort` ensures that using-directives are
always traversed in a deterministic order, and thus that the notes are
emitted deterministically.
Co-authored-by: Jay Foad <jay.foad@amd.com>
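A minimal sketch of the ordering problem described above, using hypothetical stand-ins for using-directives. The "unstable" sort is modeled by shuffling before sorting, so the relative order of elements that compare equal varies from run to run (an assumption standing in for build-dependent behavior). The stable sort relies on Python's `sorted`, which is guaranteed stable.

```python
import random

# Hypothetical directives: (namespace name, depth of the common ancestor).
# Only the depth is compared, so the "neighbor" namespaces A and B are equal.
directives = [("A", 1), ("B", 1), ("Outer", 0)]


def depth(directive):
    return directive[1]


def unstable_sort(items, seed):
    """Model an unstable sort: equal elements may come out in any order."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return sorted(items, key=depth)


def stable_sort(items):
    """Equal elements keep their original relative order."""
    return sorted(items, key=depth)
```

`stable_sort(directives)` always yields Outer, A, B, while `unstable_sort` can emit A and B in either order depending on the seed, which mirrors the notes flipping between builds.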
Source modifiers (input modifiers) should always be immediates.
This commit makes the machine verifier reject non-immediate source modifiers.

Closes #182243
…es.mir (#186504)

This patch adds two more functions for exercising the target-cpu
attribute.
Large memcpys are pretty rare, but they are more common in ML workloads
(copying large matrices/tensors, often to/from the CPU host).

For large copies, non-temporal (NTA) stores can provide performance
advantages for both memcpy itself and the rest of the workload (by
reducing cache pollution). Other runtimes already have an NTA path for
large copies, so add one to llvm-libc.

Internal whole-program loadtests show a small but statistically
significant improvement of 0.1%. ML-specific benchmarks showed a 10-20%
performance gain, and fleetbench (https://github.com/google/fleetbench,
which has a more up-to-date version of the libc benchmarks) shows a ~3%
gain (ns/byte for distributions taken from various applications).

```
                   before          after          delta
[Memcpy_0]_L1      0.01950n ± 3%   0.01900n ± 5%       ~ (p=0.390 n=20)
[Memcpy_0]_L2      0.02300n ± 0%   0.02300n ± 0%       ~ (p=0.256 n=20)
[Memcpy_0]_LLC     0.1335n ± 1%    0.1310n ± 1%   -1.87% (p=0.000 n=20)
[Memcpy_0]_Cold    0.1540n ± 2%    0.1520n ± 1%   -1.30% (p=0.021 n=20)
[Memcpy_1]_L1      0.04300n ± 5%   0.04200n ± 2%  -2.33% (p=0.000 n=20)
[Memcpy_1]_L2      0.05000n ± 2%   0.04800n ± 0%  -4.00% (p=0.000 n=20)
[Memcpy_1]_LLC     0.2500n ± 2%    0.2390n ± 1%   -4.40% (p=0.000 n=20)
[Memcpy_1]_Cold    0.2750n ± 1%    0.2640n ± 1%   -4.00% (p=0.000 n=20)
[Memcpy_2]_L1      0.03800n ± 3%   0.03800n ± 3%       ~ (p=0.420 n=20)
[Memcpy_2]_L2      0.04400n ± 2%   0.04300n ± 0%  -2.27% (p=0.000 n=20)
[Memcpy_2]_LLC     0.2320n ± 1%    0.2220n ± 1%   -4.31% (p=0.000 n=20)
[Memcpy_2]_Cold    0.2565n ± 1%    0.2460n ± 1%   -4.09% (p=0.000 n=20)
[Memcpy_3]_L1      0.1380n ± 1%    0.1355n ± 2%        ~ (p=0.095 n=20)
[Memcpy_3]_L2      0.1490n ± 1%    0.1430n ± 1%   -4.03% (p=0.000 n=20)
[Memcpy_3]_LLC     0.7955n ± 1%    0.7450n ± 0%   -6.35% (p=0.000 n=20)
[Memcpy_3]_Cold    0.8495n ± 1%    0.7935n ± 0%   -6.59% (p=0.000 n=20)
[Memcpy_4]_L1      0.04000n ± 3%   0.03900n ± 3%       ~ (p=0.466 n=20)
[Memcpy_4]_L2      0.04500n ± 2%   0.04400n ± 2%       ~ (p=0.130 n=20)
[Memcpy_4]_LLC     0.2040n ± 1%    0.1950n ± 1%   -4.41% (p=0.000 n=20)
[Memcpy_4]_Cold    0.2240n ± 1%    0.2150n ± 1%   -4.02% (p=0.000 n=20)
[Memcpy_5]_L1      0.05800n ± 3%   0.06050n ± 1%  +4.31% (p=0.000 n=20)
[Memcpy_5]_L2      0.06400n ± 0%   0.06400n ± 2%   0.00% (p=0.004 n=20)
[Memcpy_5]_LLC     0.3320n ± 1%    0.3140n ± 1%   -5.42% (p=0.000 n=20)
[Memcpy_5]_Cold    0.3620n ± 1%    0.3430n ± 0%   -5.25% (p=0.000 n=20)
[Memcpy_6]_L1      0.05700n ± 2%   0.05750n ± 3%       ~ (p=0.403 n=20)
[Memcpy_6]_L2      0.06500n ± 0%   0.06250n ± 1%  -3.85% (p=0.000 n=20)
[Memcpy_6]_LLC     0.3410n ± 1%    0.3205n ± 1%   -6.01% (p=0.000 n=20)
[Memcpy_6]_Cold    0.3670n ± 1%    0.3470n ± 1%   -5.45% (p=0.000 n=20)
[Memcpy_7]_L1      0.05900n ± 2%   0.05900n ± 2%       ~ (p=0.296 n=20)
[Memcpy_7]_L2      0.06400n ± 2%   0.06400n ± 0%       ~ (p=0.327 n=20)
[Memcpy_7]_LLC     0.3145n ± 1%    0.2965n ± 1%   -5.72% (p=0.000 n=20)
[Memcpy_7]_Cold    0.3410n ± 1%    0.3220n ± 0%   -5.57% (p=0.000 n=20)
[Memcpy_8]_L1      0.03600n ± 3%   0.03600n ± 3%       ~ (p=0.804 n=20)
[Memcpy_8]_L2      0.04200n ± 0%   0.04100n ± 2%  -2.38% (p=0.000 n=20)
[Memcpy_8]_LLC     0.2210n ± 1%    0.2090n ± 1%   -5.43% (p=0.000 n=20)
[Memcpy_8]_Cold    0.2415n ± 1%    0.2300n ± 1%   -4.76% (p=0.000 n=20)
geomean            0.1184n         0.1148n        -3.03%
```
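The commit message does not show the dispatch logic, so this is only a sketch of one common way to structure a large-copy path, under stated assumptions: the size threshold and the 64-byte alignment are hypothetical values, and the split into an unaligned head (regular stores), an aligned body (non-temporal stores), and a tail (regular stores) is an assumed design rather than llvm-libc's code. Python cannot issue streaming stores, so the function only plans the chunks.

```python
# Hypothetical parameters; the real threshold and alignment are not stated.
NTA_THRESHOLD = 256 * 1024
ALIGN = 64  # assumed cache-line size


def plan_copy(dst_addr, size, threshold=NTA_THRESHOLD, align=ALIGN):
    """Return (store_kind, offset, length) chunks that cover [0, size)."""
    if size < threshold:
        return [("regular", 0, size)]
    head = min((-dst_addr) % align, size)  # bytes until the destination is aligned
    body = (size - head) // align * align  # whole aligned blocks for streaming stores
    tail = size - head - body              # leftover bytes after the last block
    plan = []
    if head:
        plan.append(("regular", 0, head))
    if body:
        plan.append(("non_temporal", head, body))
    if tail:
        plan.append(("regular", head + body, tail))
    return plan
```

On x86, a real implementation would also need a store fence after the streaming stores so that they are ordered with later stores.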
…7777)

When converting OpenACC compute constructs to acc.compute_region, also
sink constants inside so they do not become live-ins.
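A toy model of the transformation, using a hypothetical `Op` record instead of MLIR's API: a region's live-ins are the operands defined outside it, and cloning the used constants into the region removes them from that set. For simplicity the clone reuses the constant's SSA name rather than remapping uses.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str                                     # result name, e.g. "%c1"
    kind: str                                     # "constant" or any other op kind
    operands: list = field(default_factory=list)  # names of values used


def live_ins(region_ops):
    """Values used inside the region but not defined inside it."""
    defined = {op.name for op in region_ops}
    return sorted({v for op in region_ops for v in op.operands if v not in defined})


def sink_constants(outer_ops, region_ops):
    """Clone the outer constants that the region uses into the region."""
    used = {v for op in region_ops for v in op.operands}
    sunk = [Op(op.name, op.kind, list(op.operands))
            for op in outer_ops if op.kind == "constant" and op.name in used]
    return sunk + region_ops
```

With a constant `%c1` and a load `%x` defined outside, and `%y = add %x, %c1` inside, the live-ins shrink from `%c1, %x` to just `%x` after sinking.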
Add one new flag, dealloc_align_mismatch, that turns alignment checks
on or off. Add three new config parameters: one for deallocation type
mismatch (such as aborting on new/free if true), one for checking
whether the size parameter matches on dealloc, and one for checking
whether the alignment is correct on dealloc.

Add extra flags to be passed to indicate whether to do an align/size
check.

Update report functions to better indicate the errors. Add unit tests
for all of these.

This is based on these upstream CLs by jcking:

#147735
#146556
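A hypothetical sketch of how the three checks described above could combine. `DeallocConfig`, `Chunk`, and `check_dealloc` are illustrative names, not the allocator's real options or functions. Size and alignment are only compared when the caller passes them, as a sized or aligned deallocation would.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeallocConfig:
    # Hypothetical switches for the three checks.
    type_mismatch: bool = True
    size_mismatch: bool = True
    align_mismatch: bool = True


@dataclass
class Chunk:
    alloc_kind: str  # "malloc", "new", or "new[]"
    size: int
    alignment: int


EXPECTED_DEALLOC = {"malloc": "free", "new": "delete", "new[]": "delete[]"}


def check_dealloc(chunk: Chunk, dealloc_kind: str, cfg: DeallocConfig,
                  size: Optional[int] = None,
                  alignment: Optional[int] = None) -> List[str]:
    """Return error messages; an empty list means the deallocation is fine."""
    errors = []
    if cfg.type_mismatch and dealloc_kind != EXPECTED_DEALLOC[chunk.alloc_kind]:
        errors.append(f"type mismatch: {chunk.alloc_kind} released with {dealloc_kind}")
    if cfg.size_mismatch and size is not None and size != chunk.size:
        errors.append(f"size mismatch: allocated {chunk.size}, freed {size}")
    if cfg.align_mismatch and alignment is not None and alignment != chunk.alignment:
        errors.append(f"alignment mismatch: allocated {chunk.alignment}, freed {alignment}")
    return errors
```

Disabling a switch in the config silences only that check, which is the kind of per-check control the new flag and parameters are described as providing.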
The layout propagation fails if dpas has an f16 accumulator. This fix
resolves the issue by removing the packingSize argument, which does not
appear to be valid here.
The matching between the original reduced values and their vectorized
counterparts needs to be updated after ordered reduction vectorization
to avoid a compiler crash.
Assembly files compiled with debug info generate `DW_TAG_label` entries
with `DW_AT_low_pc` but no `DW_AT_high_pc` attributes. Without address
range information, `dsymutil` would call `addLabelLowPc()`, which only
records the start address, making the compilation unit appear "empty"
with no ranges. This caused dsymutil to discard all debug information,
including line tables.

This patch adds infrastructure to query symbol sizes from the debug map
and use them to reconstruct address ranges for assembly labels.
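A minimal sketch of the reconstruction idea: pair each label's `DW_AT_low_pc` with a symbol size and coalesce the resulting ranges. The `symbol_sizes` mapping is a hypothetical stand-in for the debug-map query, and `label_ranges`/`merge_ranges` are illustrative names, not dsymutil functions.

```python
def label_ranges(labels, symbol_sizes):
    """Build [low, high) ranges for labels whose symbol size is known.

    labels: (symbol name, low_pc) pairs, as a DW_TAG_label provides.
    symbol_sizes: hypothetical symbol name -> size mapping from the debug map.
    """
    ranges = []
    for name, low_pc in labels:
        size = symbol_sizes.get(name, 0)
        if size > 0:
            ranges.append((low_pc, low_pc + size))
    return merge_ranges(ranges)


def merge_ranges(ranges):
    """Sort ranges and coalesce the overlapping or adjacent ones."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged
```

Without any sizes, the result is empty, which corresponds to the "empty" compilation unit that previously caused the debug info to be dropped.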

rdar://166225328

---------

Co-authored-by: Ryan Mansfield <ryan_mansfield@apple.com>
…ns (#174528)

OffloadArch uses an enumerator named `UNUSED`, which is a very common
macro name in external codebases (e.g. Mesa defines UNUSED as an
attribute helper). If such a macro is visible when including
clang/Basic/OffloadArch.h, the preprocessor expands the token inside the
enum and breaks compilation of the installed Clang headers.

Rename the enumerator to `UNUSED_` and update all in-tree references.
This is a spelling-only change (no behavioral impact) and mirrors the
existing approach used for SM_32_ to avoid macro clashes.
)

Enable and test PointerAuthAuthTraps, which ensures that we trap after
an authentication failure.
Add binding attributes to global variables that were created for
resources embedded in structs. The binding values are based on
`register` annotations and the `[[vk::binding]]` attribute on the struct
instance.

Fixes #182992
This should fix the test on all the builders that don't enable this backend.
…zation out of BuildUDIVPattern. (#187739)

Check the type before we call getOperationAction. Give BuildUDIVPattern
only AllowWiden and a WideSVT.

Update variable names and comments to avoid spreading "64" to too many
places.
…ing of BuildSDIV/UDIV. NFCI (#187780)

This groups the type and operation legality checks at the beginning. The
rest of the code can focus on the transformation.
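As background for the BuildUDIV changes: `BuildUDIV` replaces an unsigned division by a constant with a multiply and shifts. The sketch below shows one standard variant (the round-up magic number from Granlund and Montgomery), not LLVM's exact choice of magic values, pre/post shifts, or fixups. The magic multiplier can need `n_bits + 1` bits, so the product exceeds `2 * n_bits` bits. That is one reason a wider legal multiply type can help, which is presumably what the widening parameters mentioned above relate to. Python integers are unbounded, so the width issue does not show up here.

```python
def udiv_magic(d, n_bits):
    """Return (m, shift) with n // d == (n * m) >> shift for 0 <= n < 2**n_bits."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    l = (d - 1).bit_length()     # ceil(log2(d))
    shift = n_bits + l
    m = -(-(1 << shift) // d)    # ceil(2**shift / d); may need n_bits + 1 bits
    return m, shift


def udiv_by_const(n, m, shift):
    # In hardware this product needs up to 2 * n_bits + 1 bits.
    return (n * m) >> shift
```

For example, `udiv_magic(255, 8)` gives `m = 258` and `shift = 16`, so `255 * 258 >> 16` is 1 and `254 * 258 >> 16` is 0, matching `255 // 255` and `254 // 255`.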
Missed this when reviewing #186986. This fixes the warnings to follow
the [LLVM Coding
Standards](https://llvm.org/docs/CodingStandards.html#error-and-warning-messages).
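The linked section of the LLVM Coding Standards asks error and warning messages to start with a lowercase letter and to omit the final period. A rough, hypothetical checker for just those two rules:

```python
def follows_llvm_message_style(message: str) -> bool:
    """Check two rules: no uppercase first letter, no trailing period."""
    return bool(message) and not message[0].isupper() and not message.endswith(".")
```

Messages that begin with a quoted identifier or a symbol pass the first check, because only an uppercase first letter is rejected.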
…187562)

When targeting arm64e, we enable `-fptrauth-indirect-gotos` by default,
which signs label addresses and authenticates indirect branches. Add
support (and a test) for this in the LLDB expression evaluator.
…atrix memory layout transformations (#186898)

Fixes #184906

The SPIRV and DXIL backends assume matrices are provided in column-major
order when lowering matrix transpose and matrix multiplication
intrinsics.

To support row-major order matrices from Clang/HLSL, we therefore need
to convert row-major order matrices into column-major order matrices
before applying matrix transpose and multiplication. A conversion from
column-major order back to row-major order is also required for
correctness after a matrix transpose or matrix multiply.

For the matrix transpose case on row-major order matrices, the last two
matrix memory layout transforms cancel each other out. So a row-major
order matrix transpose is simply a column-major order transpose with the
row and column dimensions swapped.

For the matrix multiply case, this PR adds helper functions to the
MatrixBuilder to convert an NxM row-/column-major order matrix into an
NxM column-/row-major order matrix by applying a matrix transpose.

These transformations take advantage of the fact that a row-major order
matrix of NxM dimensions `rNxM` interpreted in column-major order is
equivalent to its transpose in column-major order.

Example: Let `r3x2 = [ 0, 1, 2, 3, 4, 5 ]`. The 3x2 row-major order
matrix is visualized as
```
0 1
2 3
4 5
```
When `r3x2`, or `[ 0, 1, 2, 3, 4, 5 ]`, is interpreted as a 2x3
column-major order matrix, it is visualized as:
```
0 2 4
1 3 5
```
which is equal to the transpose of `r3x2` but in column-major order.
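The reinterpretation identity above, and the claim that a row-major transpose is a column-major transpose with the dimensions swapped, can be checked on flat buffers. This is an illustrative sketch with hypothetical helper names, not the MatrixBuilder API.

```python
def at_row_major(buf, rows, cols, i, j):
    return buf[i * cols + j]


def at_col_major(buf, rows, cols, i, j):
    return buf[j * rows + i]


def to_nested(buf, rows, cols, at):
    """View a flat buffer as a list of rows using the given indexing rule."""
    return [[at(buf, rows, cols, i, j) for j in range(cols)] for i in range(rows)]


def transpose(nested):
    return [list(row) for row in zip(*nested)]


def col_major_transpose(buf, rows, cols):
    """Transpose a rows x cols column-major buffer into a cols x rows one."""
    out = [0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            # Element (i, j) becomes (j, i) in an output that has `cols` rows.
            out[i * cols + j] = buf[j * rows + i]
    return out
```

With `r = [0, 1, 2, 3, 4, 5]`, viewing `r` as 2x3 column-major gives the transpose of the 3x2 row-major view, and `col_major_transpose(r, 2, 3)` produces `[0, 2, 4, 1, 3, 5]`, which is the row-major buffer of that transpose.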

These matrix memory layout transformations are inserted before and after
the matrix multiply and transpose intrinsics when lowering HLSL mul and
transpose.
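A minimal sketch of the multiply lowering described above, assuming only a column-major multiply primitive is available: convert both row-major inputs to column-major, multiply, then convert the result back. The helper names are hypothetical, not the MatrixBuilder functions.

```python
def col_major_matmul(a, b, n, k, m):
    """a: n x k, b: k x m, both column-major; returns n x m column-major."""
    c = [0] * (n * m)
    for j in range(m):
        for i in range(n):
            c[j * n + i] = sum(a[p * n + i] * b[j * k + p] for p in range(k))
    return c


def row_to_col_major(buf, rows, cols):
    """Re-lay out a rows x cols row-major buffer in column-major order."""
    return [buf[i * cols + j] for j in range(cols) for i in range(rows)]


def col_to_row_major(buf, rows, cols):
    """Re-lay out a rows x cols column-major buffer in row-major order."""
    return [buf[j * rows + i] for i in range(rows) for j in range(cols)]


def row_major_matmul(a, b, n, k, m):
    # Convert the inputs, use the column-major-only multiply, convert back.
    c = col_major_matmul(row_to_col_major(a, n, k), row_to_col_major(b, k, m), n, k, m)
    return col_to_row_major(c, n, m)
```

For the row-major 2x3 matrix `[1, 2, 3, 4, 5, 6]` times the row-major 3x2 matrix `[7, 8, 9, 10, 11, 12]`, `row_major_matmul` returns `[58, 64, 139, 154]`. The two extra layout conversions are the cost the commit accepts for now.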

We don't simplify the matrix multiply case because HLSL in Clang will
eventually need to support the `row_major` and `column_major` keywords
that allow matrices to independently be row-major or column-major
regardless of the default matrix memory layout.

While this method of supporting row-major order matrices is not
performant, it is correct and will suffice for now until benchmarks are
created and performance becomes a primary concern.

Assisted-by: GitHub Copilot (powered by Claude Opus 4.6)
lhutton1 and others added 10 commits March 20, 2026 22:27
…terable (#187339)

A crash was encountered in the slice op folder when the input was a
constant with dense resource values. The folder was trying to iterate
over the input values, which is not possible for resource values. This
change fixes the crash and adds a test.
This patch fixes the update of the DAGNode UnscheduledSucc counter when
a use edge is modified, as happens with setOperand() or RAUW (and
related) operations.

Before this patch, we did not check whether the User (i.e., the consumer
of the use-def edge) was scheduled, and we would update the definition's
UnscheduledSucc counter regardless, resulting in counting errors.

For example, consider the following IR:
```
  %A = ...
  %B = ...
  %U = %A  ; scheduled
```
Note that %U's DAGNode is marked as "scheduled" while %A and %B are not.

If we change %U's operand from %A to %B, then we should not attempt to
update %A's or %B's UnscheduledSuccs, because %U is scheduled and so
should not be counted as an "unscheduled" successor.
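A toy model of the fixed bookkeeping, using a hypothetical `Node` class rather than the real DAG node: a definition's counter tracks only its unscheduled users, so changing an operand of a scheduled user must leave both counters alone.

```python
class Node:
    def __init__(self, name, scheduled=False):
        self.name = name
        self.scheduled = scheduled
        self.unscheduled_succs = 0  # number of unscheduled users of this node
        self.operands = []


def add_use(user, defn):
    user.operands.append(defn)
    if not user.scheduled:
        defn.unscheduled_succs += 1


def schedule(node):
    # Once a user is scheduled, it no longer counts as unscheduled for its defs.
    node.scheduled = True
    for defn in node.operands:
        defn.unscheduled_succs -= 1


def set_operand(user, idx, new_defn):
    old_defn = user.operands[idx]
    user.operands[idx] = new_defn
    # The fix: only an unscheduled user moves a count from the old def to the new one.
    if not user.scheduled:
        old_defn.unscheduled_succs -= 1
        new_defn.unscheduled_succs += 1
```

Replaying the %A/%B/%U example above, both counters stay at 0; without the `scheduled` check, %A would drop to -1 and %B would rise to 1.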
…7761)

Function template specialization arguments were incorrectly rendered
without commas. This was because the "End" JSON property was also used
in the levels above, and Mustache looks up missing properties in parent
contexts (see #174359).
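A minimal sketch of the lookup rule that caused the bug. It emulates a template such as `{{#Args}}{{Name}}{{^End}}, {{/End}}{{/Args}}` with a hand-written resolver rather than a real Mustache engine; the template and data shapes are hypothetical.

```python
def lookup(stack, key):
    """Mustache-style lookup: innermost context first, then parent contexts."""
    for ctx in reversed(stack):
        if key in ctx:
            return ctx[key]
    return None


def render_args(outer, args):
    """Render each argument name, followed by ", " unless End is truthy."""
    out = []
    for arg in args:
        stack = [outer, arg]
        out.append(arg["Name"])
        if not lookup(stack, "End"):
            out.append(", ")
    return "".join(out)
```

If only the last argument sets `End` and an enclosing level also has `End: true`, the earlier arguments inherit it and the commas disappear. Setting `End` explicitly on every argument restores the separators.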
Fix return types and/or function arguments of several functions:
* mtx_destroy
* tss_delete
* thrd_exit
…closer (#186518)

Selecting the score in SGPRInfo used to require an index, which you
obtained by calling getSgprScoresIdx(), defined in a different class.

This patch moves the score selection logic into SGPRInfo. This makes
the interface simpler and more intuitive.

Also given that SGPRInfo contains only two scores, this patch also
replaces the score array with individual score variables.

Should be NFC.
…ARGETS (#187634)

Downstream, we have a non-runtime target that depends on the libclc
EXTRA_TARGET, and we observed a race condition in parallel builds: both
runtimes-build (full build, no lock) and the libclc EXTRA_TARGET
(triggered by the non-runtime target, under FileLock) build
concurrently, leading to a corrupt libclc library.

This exposes a limitation in the ExternalProject EXTRA_TARGETS design:
EXTRA_TARGETS in llvm_ExternalProject_Add only depend on
${name}-configure, not ${name}-build. This makes EXTRA_TARGETS unsafe as
dependencies of a non-runtime target.

Fix: Add a locked BUILD_COMMAND to ExternalProject_Add for Unix
Makefiles generator, using the same cmake.lock as EXTRA_TARGETS. This
serializes runtimes-build with all EXTRA_TARGETS under one lock.
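The fix itself is a CMake change. As a loose analogy only, this Python sketch uses a `threading.Lock` in place of the shared `cmake.lock` file lock. Two hypothetical builders write to the same artifact, and holding one lock makes their writes happen one after the other instead of interleaving.

```python
import threading
import time


def locked_build(name, lock, artifact, log):
    # Both builders take the same lock before touching the shared output.
    with lock:
        log.append((name, "start"))
        for chunk in range(3):
            artifact.append((name, chunk))
            time.sleep(0.001)  # widen the window in which unlocked builds could interleave
        log.append((name, "end"))


def run_builds(names):
    lock = threading.Lock()
    artifact, log = [], []
    threads = [threading.Thread(target=locked_build, args=(n, lock, artifact, log))
               for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return artifact, log
```

Removing the `with lock:` line lets the two builders' chunks mix in `artifact`, which is the analogue of the corrupted library.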

With this PR, a non-runtime target can depend on a specific
EXTRA_TARGET, rather than needing to depend on the umbrella runtimes
target. This is an improvement: a non-runtime target can start as soon
as the specific EXTRA_TARGET it depends on finishes building, without
waiting for other runtimes. In addition, the non-runtime target may need
only a specific EXTRA_TARGET, so a minimal dependency on it is accurate.

Assisted-by: Claude Sonnet 4.6

---------

Co-authored-by: Petr Hosek <phosek@google.com>
After #186881 was merged, the gcc libc bots started complaining that the
conversion from u8 to a 2-bit integer was unsafe (see:
https://lab.llvm.org/buildbot/#/builders/131/builds/42788). This PR
adds a bitmask that fixes the warning.
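The C change is not shown here. As a hypothetical illustration of the arithmetic, masking with `0b11` keeps only the low two bits, so the stored value is always one a 2-bit field can hold, and the truncation becomes explicit rather than implicit.

```python
def to_two_bits(value: int) -> int:
    """Mask a u8 down to its low two bits before storing it in a 2-bit field."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a u8 value")
    return value & 0b11
```

Values 0 through 3 pass through unchanged, and larger values keep only their two lowest bits.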
pull bot locked and limited conversation to collaborators Mar 20, 2026
pull bot added the ⤵️ pull label Mar 20, 2026
pull bot merged commit 6891a6e into MPACT-ORG:main Mar 20, 2026
1 check failed