
[pull] main from llvm:main #1147

Merged
pull[bot] merged 61 commits into MPACT-ORG:main from llvm:main on Mar 20, 2026

pull[bot] commented on Mar 20, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

artagnon and others added 30 commits March 20, 2026 11:58
Simplify exactly as InstCombine does. A follow-up would include
simplifying add x, (sub 0, y) -> sub x, y.

Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD
…uilt libc (#181913)

This is to add GPU wrappers for headers that are currently supported by
libc built for SPIRV.
Ensure that the analyzer doesn't rule out the equality (or guarantee
disequality) of a pointer to the stack and a symbolic pointer in unknown
space. Previously the analyzer incorrectly assumed that stack pointers
cannot be equal to symbolic pointers in unknown space.

It is true that functions cannot validly return pointers to their own
stack frame, but they can easily return a pointer to some other stack
frame (e.g. a function can return a pointer received as an argument).

The old behavior was introduced intentionally in 2012 by commit
3563fde, but it causes incorrect
analysis, e.g. it prevents the correct handling of some testcases from
the Juliet suite because it rules out the "fgets succeeds" branch.

Reported-by: Daniel Krupp <daniel.krupp@ericsson.com>
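Here is a minimal C++ illustration of the case described above. `passThrough` is a hypothetical helper. The callee returns a pointer it received, so the caller gets back the address of its own local variable. An equality check against that address must therefore be allowed to succeed.

```cpp
#include <cassert>

// Hypothetical helper: returns the pointer it was given. From the caller's
// point of view the result is a symbolic pointer in unknown space.
int *passThrough(int *p) { return p; }

// Returns true at run time: the "unknown" pointer is the address of a local
// in this stack frame, so assuming the two are unequal would be wrong.
bool returnedPointerEqualsLocal() {
  int local = 0;
  int *q = passThrough(&local);
  return q == &local;
}
```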
Corrected language and spelling errors in a comment within file.cpp.

Credit GH user @iBlanket for identifying this typo.
…7016)

When narrowing interleave groups, the main vector loop processes IC
iterations instead of VF * IC. Update selectEpilogueVectorizationFactor
to use the effective VF, checking if the canonical IV controlling the
loop now steps by UF instead of VFxUF.

This avoids epilogue vectorization with dead epilogue vector loops and
also prevents crashes in cases where we can prove both the epilogue and
scalar loop are dead.

Fixes #186846

PR: #187016
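The arithmetic can be modeled roughly as follows. This assumes no tail folding, and the helper names are hypothetical, not LoopVectorize APIs. After narrowing, each main-loop iteration covers UF (= IC) scalar iterations instead of VF * UF, so at most UF - 1 iterations remain for the epilogue. An epilogue vector loop wider than that can never execute.

```cpp
#include <cassert>

// Hypothetical model: scalar iterations covered by one iteration of the main
// vector loop. With a narrowed interleave group the canonical IV steps by UF
// instead of VF * UF, i.e. the effective VF is 1.
unsigned mainLoopStep(unsigned VF, unsigned UF, bool narrowed) {
  unsigned effectiveVF = narrowed ? 1 : VF;
  return effectiveVF * UF;
}

// The main loop leaves fewer than `step` iterations behind, so an epilogue
// vector loop of width epilogueVF can only run if step - 1 >= epilogueVF.
bool epilogueVectorLoopCanRun(unsigned step, unsigned epilogueVF) {
  return step - 1 >= epilogueVF;
}
```

Take VF = 4 and UF = 2. The narrowed loop advances by 2 and leaves at most 1 iteration, so an epilogue vector loop with VF 4 would be dead.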
…part 36) (#187628)

Tests converted from test/Lower/Intrinsics: maxloc.f90, maxval.f90,
merge.f90, merge_bits.f90, minloc.f90
By using a native `v_cvt_i16/u16_f16` conversion and saturating at `i16`,
we avoid the additional `f16`-to-`f32` conversion that saturating at `i32`
requires. It also allows clamping to be performed with `i16` instructions,
reducing the number of registers needed in *true16* mode in some of the
lit tests. The behavior is disabled for pre-gfx8 targets by checking
`has16BitInsts()`.
This is part of patches to port BBAddrMap to COFF.

Introduce BBAddrMap.h and move BBAddrMap/PGOAnalysisMap type definitions
out of ELFTypes.h.
This patch introduces the following reduction operators:

spirv.Tosa.ReduceAll
spirv.Tosa.ReduceAny
spirv.Tosa.ReduceMax
spirv.Tosa.ReduceMin
spirv.Tosa.ReduceProduct
spirv.Tosa.ReduceSum

Dialect and serialization round-trip tests have also been added.

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
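The element-wise behavior of these reductions can be sketched in C++ for a rank-2 tensor. The sketch assumes TOSA's convention that the reduced axis is kept with size 1, so the flattened output has `cols` or `rows` elements. `reduce2D` is a hypothetical helper and is not part of the SPIR-V or TOSA APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduces a row-major rows x cols tensor along `axis` (0 or 1). The result is
// the flattened 1 x cols (axis 0) or rows x 1 (axis 1) tensor. `init` must be
// the identity of `op`, e.g. 0 for a sum or 1 for a product.
template <typename T, typename Op>
std::vector<T> reduce2D(const std::vector<T> &in, std::size_t rows,
                        std::size_t cols, int axis, T init, Op op) {
  std::vector<T> out(axis == 0 ? cols : rows, init);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c) {
      std::size_t idx = axis == 0 ? c : r;
      out[idx] = op(out[idx], in[r * cols + c]);
    }
  return out;
}
```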
As detailed here:
https://github.com/InstLatx64/InstLatX64_Demo/blob/master/GFNI_Demo.h

These are a bit more complicated than gf2p8affine lookups, requiring us
to convert an SHL shift value/amount into a GF value so that we can
perform a multiplication. SRL/SRA need to be converted to SHL via
bitreverse/variable-sign-extension.

Followup to #89115
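This shows why a multiplication can implement a shift. `GF2P8MULB` multiplies bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. Multiplying by x^s is a plain left shift by s as long as no bit reaches position 8, because then no reduction takes place. The sketch therefore masks off the bits that SHL would discard before multiplying, and it derives SRL through bit reversal. It illustrates the identity under that masking assumption only. It does not reproduce the demo's or the backend's exact instruction sequence, and it does not cover SRA.

```cpp
#include <cassert>
#include <cstdint>

// Reference GF(2^8) multiply with the reduction polynomial
// x^8 + x^4 + x^3 + x + 1 (0x11B), the field GF2P8MULB operates in.
uint8_t gfMul(uint8_t a, uint8_t b) {
  uint8_t product = 0;
  for (int i = 0; i < 8; ++i) {
    if (b & 1)
      product ^= a;
    bool carry = a & 0x80;
    a = static_cast<uint8_t>(a << 1);
    if (carry)
      a ^= 0x1B; // reduce modulo the field polynomial
    b >>= 1;
  }
  return product;
}

// SHL by s (0..7): clear the bits a shift would discard, then multiply by
// x^s. The masked value has degree <= 7 - s, so no reduction ever occurs.
uint8_t shlViaGfMul(uint8_t x, unsigned s) {
  uint8_t kept = static_cast<uint8_t>(x & (0xFFu >> s));
  return gfMul(kept, static_cast<uint8_t>(1u << s));
}

uint8_t bitreverse8(uint8_t x) {
  uint8_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = static_cast<uint8_t>((r << 1) | (x & 1));
    x >>= 1;
  }
  return r;
}

// Logical right shift expressed as a left shift of the bit-reversed byte.
uint8_t srlViaGfMul(uint8_t x, unsigned s) {
  return bitreverse8(shlViaGfMul(bitreverse8(x), s));
}
```

Without the mask, `gfMul(0x81, 0x02)` yields `0x19` rather than `0x02`, because the bit shifted out of position 7 is folded back in by the reduction.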
)

When rematerializing S_MOV_B64 or S_MOV_B64_IMM_PSEUDO and only a single
32-bit lane of the result is used at the remat point, emit S_MOV_B32
with the appropriate half of the 64-bit immediate instead.

This reduces register pressure by defining a 32-bit register instead of
a 64-bit pair when the other half is unused.
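The narrowed move's immediate is simply one half of the 64-bit constant. In the sketch below, `Half` and `immediateForHalf` are hypothetical names. The low and high halves correspond to the low and high 32-bit registers of the pair.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: which 32-bit half of the 64-bit register pair is read at the
// rematerialization point.
enum class Half { Lo, Hi };

// Immediate an S_MOV_B32 would need to produce that half on its own.
uint32_t immediateForHalf(uint64_t imm64, Half half) {
  return half == Half::Lo ? static_cast<uint32_t>(imm64)
                          : static_cast<uint32_t>(imm64 >> 32);
}
```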
Essentially do the same thing as for LoopInfo. Anything inside a cycle
is mutually reachable, and the cycle can be replaced by its exit blocks
in the walk.

An interesting additional thing we could do for CycleInfo (but not
LoopInfo) is to early exit the walk if the stop block is not in a cycle
and dominates the start block. I've not included this in this patch to
keep the implementation the same as for LoopInfo to start with.
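Here is a toy sketch of the walk. It assumes each block maps to its outermost cycle and each cycle lists its exit blocks. The types and the function are hypothetical and are not LLVM's CycleInfo API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy CFG for illustration only.
struct ToyCFG {
  std::map<int, std::vector<int>> succs;      // block -> successors
  std::map<int, int> outermostCycle;          // block -> cycle id (if any)
  std::map<int, std::vector<int>> cycleExits; // cycle id -> exit blocks
};

int cycleOf(const ToyCFG &g, int bb) {
  auto it = g.outermostCycle.find(bb);
  return it == g.outermostCycle.end() ? -1 : it->second;
}

// Worklist walk from `start` looking for `stop`. Blocks in one cycle are
// mutually reachable, so sharing `stop`'s cycle ends the search; otherwise
// the whole cycle is skipped by continuing from its exit blocks.
bool toyIsPotentiallyReachable(const ToyCFG &g, int start, int stop) {
  const int stopCycle = cycleOf(g, stop);
  std::vector<int> worklist{start};
  std::set<int> visited;
  while (!worklist.empty()) {
    int bb = worklist.back();
    worklist.pop_back();
    if (!visited.insert(bb).second)
      continue;
    if (bb == stop)
      return true;
    const int c = cycleOf(g, bb);
    if (c != -1 && c == stopCycle)
      return true;
    const auto &next = c != -1 ? g.cycleExits : g.succs;
    auto it = next.find(c != -1 ? c : bb);
    if (it != next.end())
      worklist.insert(worklist.end(), it->second.begin(), it->second.end());
  }
  return false;
}
```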
This test failed on the llvm-clang-win-x-aarch64 buildbot.

It seems the rounding is different, leading to a different output.
Instead of:
  Cost for VF 4: 9 (Estimated cost per lane: 2.2)

On the Windows buildbot, the test fails because the output is:
  Cost for VF 4: 9 (Estimated cost per lane: 2.3)
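One plausible explanation follows, though it is an assumption: the commit only says the rounding differs. 9 / 4 = 2.25 is exactly representable in binary floating point, so formatting it to one decimal is an exact tie. C runtimes that break ties differently then print different digits. In the sketch, `formatCostPerLane` is a hypothetical helper, and `%.1f` is assumed to match the debug output's format.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a cost-per-lane value with one decimal place. 9 / 4 == 2.25 exactly,
// so "%.1f" faces a true tie: round-half-to-even gives "2.2", while
// round-half-away-from-zero gives "2.3".
std::string formatCostPerLane(unsigned cost, unsigned vf) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%.1f", static_cast<double>(cost) / vf);
  return std::string(buf);
}
```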
…box loads (#187152)

When a boxed array is privatized via `omp.private`, the `SourceKind` of
the loaded box data was being misclassified as `SourceKind::Indirect` by
the alias analyzer. Instead its `SourceKind::Allocate` should be
preserved. This caused AliasAnalysis to conservatively return `MayAlias`
for accesses to privatized arrays vs dummy arguments. This prevented
InlineHLFIRAssign from inlining array section assignments.

Propagate the Allocate source kind when the box source is classified as
`Allocate`, so that alias analysis correctly returns `NoAlias`.
This patch makes ClangIR emit .cir and .mlir files when the `-save-temps`
flag is specified. Having these files emitted is useful, e.g. when
inspecting the generated code for OpenMP offloading.

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com
…case (#187705)

Listening on all interfaces is probably not permitted on the bots and
causes failures of llvm-debuginfod-find/headers-winhttp.test after
39d6bb2. Restricting them to localhost
should fix that.
…lines (#187684)

The inherited constructors are inline thunks, so like other inline
functions they are exempted from ABI compatibility concerns with this
flag, and should not be exported.

This is a follow-up to #182706
This PR enhances the multi-reduction layout propagation:
1. improve inst_data and lane_data to support fractional subgroup sizes
2. improve the subgroup_layout/data setup to utilize the (nested) slice
layout from the consumer op

It also removes the restriction in load_matrix/store_matrix layout
propagation to allow n-D (n > 2) layouts.
As pointed out by
#152770 (comment),
81e8a1e causes build errors with older versions of Xcode (Xcode 14 and
older) when using std::not_fn() with llvm::make_filter_range().

This implements the same fix as in d1d9413.
)

- Invert the condition to make the code more straightforward, and sink
single-use variables there.
- Add a comment about `createTargetMachine` side effects for
`-mcpu=help`.
- Remove a redundant call to `setPGOOptions`.
…rs (#186443)

The frozen C++03 headers got an invalid simplification in #134045 that
changed the signature of random_shuffle to use a forwarding reference
instead of an lvalue reference. This patch fixes it and adds a test.

---------

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
Selects of the form `cond ? 1 : 0` are created during unrolling of
setcc+vselect. Currently these are not optimized away post-legalization
even if fully redundant. Having these extra selects sitting between
things can prevent other folds from applying.

Enabling this requires some mitigations in the ARM backend, in
particular in the interaction with MVE support. There are two changes
here:

* Form CSINV/CSNEG/CSINC from CMOV, rather than only creating it during
SELECT_CC lowering. (After this change, the lowering in SELECT_CC can be
dropped without test changes, let me know if I should do that.)
* Support pushing negations through CMOV in more cases, in particular if
the operands are constant or the negation can be handled by flipping
lshr/ashr.

Additionally, in the X86 backend, try to simplify CMOV to SETCC if only the
low bit is demanded.
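Readers unfamiliar with the conditional-select family may find a model of their semantics, and of the `cond ? 1 : 0` idiom, helpful. The C++ below models the architectural behavior on 32-bit values. It does not model the backend's DAG nodes, and the function names simply mirror the mnemonics. When the condition is already known to be 0 or 1, `cond ? 1 : 0` is just the condition itself, which is why such selects can be redundant.

```cpp
#include <cassert>
#include <cstdint>

// Conditional-select family on 32-bit values: when `cond` holds the result
// is `n`, otherwise a transformed `m`.
uint32_t csel(bool cond, uint32_t n, uint32_t m) { return cond ? n : m; }
uint32_t csinc(bool cond, uint32_t n, uint32_t m) { return cond ? n : m + 1; }
uint32_t csinv(bool cond, uint32_t n, uint32_t m) { return cond ? n : ~m; }
uint32_t csneg(bool cond, uint32_t n, uint32_t m) { return cond ? n : 0u - m; }

// `cond ? 1 : 0` is CSINC of zero and zero under the inverted condition
// (the CSET idiom); `cond ? -1 : 0` is the matching CSINV (CSETM).
uint32_t cset(bool cond) { return csinc(!cond, 0, 0); }
uint32_t csetm(bool cond) { return csinv(!cond, 0, 0); }
```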
osmanyasar05 and others added 27 commits March 20, 2026 15:53
#181725)

Based on the suggestions in #140639, this PR adds the rewrite pattern
`a bitwiseop (~b +/- c)` -> `a bitwiseop ~(b -/+ c)` for AND, OR, and XOR
operations. This rewrite enables lowering to `ANDN`, `ORN`, and `XORN`
operations.

Added new MIR tests in `combine-binop-neg.mir` for AArch64 to verify the
new combine works for various commuted and uncommuted forms with AND,
OR, and XOR and added new LLVM IR tests for RISC-V in `rv32zbb-zbkb.ll`
to ensure the combine produces the expected `ANDN`, `ORN`, and `XORN`
operations.
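The rewrite rests on a two's-complement identity: since `~b == -b - 1`, it follows that `~b + c == ~(b - c)` and `~b - c == ~(b + c)`. Below is a quick C++ check of both identities. The function names are illustrative, and wrapping `uint32_t` arithmetic mirrors LLVM's integer add/sub.

```cpp
#include <cassert>
#include <cstdint>

// In two's complement, ~b == -b - 1, so
//   ~b + c == -(b - c) - 1 == ~(b - c)
//   ~b - c == -(b + c) - 1 == ~(b + c)
bool notAddIdentity(uint32_t b, uint32_t c) { return (~b + c) == ~(b - c); }
bool notSubIdentity(uint32_t b, uint32_t c) { return (~b - c) == ~(b + c); }

// Because the inner values are equal, `a & (~b + c)` can become
// `a & ~(b - c)`, which maps onto an and-not instruction such as `andn`.
uint32_t andBefore(uint32_t a, uint32_t b, uint32_t c) { return a & (~b + c); }
uint32_t andAfter(uint32_t a, uint32_t b, uint32_t c) { return a & ~(b - c); }
```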
…ecialized functions (#187645)

For non-specialized functions, ACCSpecializeForDevice collects ACC ops
inside compute constructs and applies device specialization patterns via
applyOpPatternsGreedily. With the default AnyOp strictness, the greedy
driver expands the worklist to parent ops when inner ops are modified,
accidentally unwrapping the parent acc.parallel via
ACCRegionUnwrapConversion. This leaves orphaned acc.loop
combined(parallel) ops that lose their parallelism and reduction
information downstream.

Set GreedyRewriteStrictness::ExistingOps so the greedy driver only
processes the initially collected inner ops, preserving the parent
compute construct for ACCComputeLowering to handle.
If the instructions' state is alternate and/or contains non-directly
matching instructions, we need to check whether it is better to represent
such operations as non-alternate with copyables.
To do this, we need to compare operands between the instructions in their
different representations and choose the best one for optimal
vectorization.

Reviewers: RKSimon, hiraditya

Pull Request: #183777
- Enable `NoF16PseudoScalarTransInlineConstants` for 11.7.
- Add test for `RequiredExportPriority`, one of the differences between
11.5 and 11.7.
…rv-val` output (#182549)

KhronosGroup/SPIRV-Tools#6232 added support for
`SPV_INTEL_function_pointers` on `spirv-val`.

This PR updates some relevant tests to run `spirv-val` and document why
some others are failing.
Adds a NewPM port of AArch64MIPeepholeOpt:

- Refactored lib/Target/AArch64/AArch64MIPeepholeOpt.cpp to extract base
logic as Impl
- Renamed existing pass with "Legacy" suffix and updated references
- Added NewPM pass AArch64MIPeepholeOptPass
- Updated tests
Fix build with `BUILD_SHARED_LIBS=On`
This adds a SPIR-V intrinsic for associating a name (textual identifier)
with a specialisation constant. The name is encoded in metadata, and is
intended to be used within LLVM / by the SPIR-V BE (e.g. #134016 would
be a direct user), as it is never emitted into the SPIR-V object.
Non-boolean and composite specialisation constants will be handled in
the future, via dedicated intrinsics, if there is interest.
Can now be used as `REQUIRES: flang-rt`, for example.
…7556)

LoopSequence keeps track of whether it contains code that would be an
invalid intervening code, or that would prevent loop nesting from being
a perfect nesting. To improve the quality of diagnostic messages, store
the pointer to the offending parser::ExecutionPartConstruct.

Issue: #185287
Reverts #184164. Issue hit in testing, LCOMPILER-1587.
…187727)

a3db68a seemed to be the obvious fix for
the winhttp issue from 39d6bb2 in
llvm-debuginfod-find, but there are still bots failing. This patch
disables the test on all bots that cannot spawn an HTTP server in Python
and record request headers. Ideally it turns all affected bots back to
green and gives us an error message to investigate.
…s_poisoned (#187466)

Align beg address down instead of up in __asan_region_is_poisoned(), so
the shadow scan includes the first granule. This fixes a false negative
when the first granule has an unpoisoned prefix and a poisoned suffix.

Add test that covers this scenario.
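A toy model can reproduce the effect of the alignment change. Every detail here is an assumption made for illustration: 8-byte granules, one shadow value per granule (0 = fully addressable, 1..7 = only the first k bytes addressable, negative = fully poisoned), and a fast path that checks the first byte, the last byte, and the shadow of the aligned middle. Regions must be non-empty.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uintptr_t kGranule = 8;

// Toy shadow: shadow[i] describes bytes [8*i, 8*i + 8) of a tiny address space.
bool byteIsPoisoned(const std::vector<int8_t> &shadow, uintptr_t addr) {
  int8_t s = shadow[addr / kGranule];
  if (s == 0) return false;
  if (s < 0) return true;
  return static_cast<int>(addr % kGranule) >= s; // only first s bytes usable
}

// Fast-path check: first byte, last byte, then the shadow of the aligned
// middle. `alignBegDown` selects the fixed (round down) or old (round up)
// handling of `beg`.
bool regionLooksPoisoned(const std::vector<int8_t> &shadow, uintptr_t beg,
                         uintptr_t size, bool alignBegDown) {
  uintptr_t end = beg + size;
  if (byteIsPoisoned(shadow, beg) || byteIsPoisoned(shadow, end - 1))
    return true;
  uintptr_t alignedBeg = alignBegDown ? beg & ~(kGranule - 1)
                                      : (beg + kGranule - 1) & ~(kGranule - 1);
  uintptr_t alignedEnd = end & ~(kGranule - 1);
  for (uintptr_t g = alignedBeg / kGranule; g < alignedEnd / kGranule; ++g)
    if (shadow[g] != 0)
      return true;
  return false;
}
```

Take a shadow of `{4, 0, 0}`, where bytes 4-7 of granule 0 are poisoned. Rounding `beg = 1` up to 8 skips granule 0 entirely, whereas rounding it down to 0 scans that granule and flags the region.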
…ls/clang-ssaf/`

This patch extracts the shared code between `clang-ssaf-format` and
`clang-ssaf-linker` into a new
`clangScalableStaticAnalysisFrameworkTool` library at
`clang/lib/ScalableStaticAnalysisFramework/Tool/`, with the public
header at
`clang/include/clang/ScalableStaticAnalysisFramework/Tool/Utils.h`. This
shared library provides:
  - `fail()` overloads for fatal error reporting
  - `initTool()`: sets the tool name and version, configures the version
    printer, hides unrelated command-line options, and parses arguments
  - `getToolName()`: accessor for the tool name set by `initTool()`
  - `loadPlugins()`: loads plugin shared libraries from a list of paths
  - `getFormatForExtension()`: cached format-registry lookup
  - `SummaryFile`: resolves a file path to its serialization format
  - `ErrorMessages`: shared diagnostic string constants

Tool-specific error strings remain in a `LocalErrorMessages` namespace
in each tool's anonymous namespace.

Binary names and locations (`bin/clang-ssaf-format`,
`bin/clang-ssaf-linker`) are unchanged.
This code is adapted from `SelectionDAG::computeKnownBits`.
Part of #150515; ticks off ABDS and ABDU.
This implements part of the Fortran 2023 new feature US 03, extracting
tokens from a string: the SPLIT intrinsic.

It is described in section 16.9.196, SPLIT (STRING, SET, POS [, BACK]), of
the Fortran 2023 Standard.

It's part of Flang issue
[#178044](#178044). Note that
I work with @kwyatt-ext on this issue. He implemented the other part,
TOKENIZE.

A test will be added into
[llvm-test-suite](https://github.com/llvm/llvm-test-suite) later after
this PR is merged.
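The sketch below captures the position semantics as we read the standard's description of SPLIT. Without BACK, the result is the position of the first character from SET after POS, or LEN(STRING) + 1 if there is none. With BACK = .true., the result is the position of the last such character before POS, or 0 if there is none. Positions are 1-based, and `pos` plays the role of the INTENT(INOUT) POS argument. `splitNext` is a hypothetical helper, not flang's runtime entry point.

```cpp
#include <cassert>
#include <string>

// Returns the new value of POS for SPLIT(str, set, pos [, back]).
int splitNext(const std::string &str, const std::string &set, int pos,
              bool back = false) {
  const int len = static_cast<int>(str.size());
  if (!back) {
    for (int i = pos + 1; i <= len; ++i)
      if (set.find(str[i - 1]) != std::string::npos)
        return i;
    return len + 1; // no delimiter after pos
  }
  for (int i = pos - 1; i >= 1; --i)
    if (set.find(str[i - 1]) != std::string::npos)
      return i;
  return 0; // no delimiter before pos
}
```

Calling it repeatedly with POS starting at 0 walks the delimiters left to right. The token between two successive positions p and q is the 1-based substring (p+1):(q-1).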
MSVC apparently also warns about deprecation at the implementation
of deprecated functions...

Pull Request: #187702
When converting from fir.alloca to memref.alloca, also copy the acc
variable name attribute if it exists
…f/maximumf (#187647)

The reduction recipe init region was producing 0.0 instead of the
correct identity value (largest representable float for min, smallest
for max) when the reduction operator was AccMinnumf, AccMinimumf,
AccMaxnumf, or AccMaximumf. Only AccMin and AccMax were handled,
causing the new operator variants to fall through to the else branch
which returns 0.

This caused min reductions to always produce 0.0 since
min(x, 0.0) = 0.0 for all positive x.

Replace the duplicated identity value logic with
arith::getIdentityValue, using a mapping from acc::ReductionOperator
to arith::AtomicRMWKind. Use minimumf/maximumf (which respect
useOnlyFiniteValue) instead of minnumf/maxnumf (whose MLIR identity
is NaN) to get correct finite identity values.

This also fixes a pre-existing bug where the max reduction identity
for floats used getSmallest (smallest subnormal, -1.4e-45) instead
of getLargest with negative (-3.4e+38).
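A plain fold is enough to reproduce the effect of the wrong initial values. In this sketch `RedOp`, `identityFor`, and `foldMinMax` are hypothetical stand-ins. The identity choice mirrors the behavior described above for `minimumf`/`maximumf`: the largest finite magnitude when only finite values are used, otherwise infinity. That behavior is an assumption about `arith::getIdentityValue`, not a copy of it.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

enum class RedOp { Min, Max }; // hypothetical stand-in for the acc operators

// Identity modeled on minimumf/maximumf: largest finite magnitude when only
// finite values are expected, otherwise infinity.
float identityFor(RedOp op, bool useOnlyFiniteValues) {
  using Lim = std::numeric_limits<float>;
  if (op == RedOp::Min)
    return useOnlyFiniteValues ? Lim::max() : Lim::infinity();
  return useOnlyFiniteValues ? Lim::lowest() : -Lim::infinity();
}

// Sequential min/max fold starting from `init`.
float foldMinMax(RedOp op, float init, const std::vector<float> &xs) {
  float acc = init;
  for (float x : xs)
    acc = op == RedOp::Min ? std::fmin(acc, x) : std::fmax(acc, x);
  return acc;
}
```

Starting a min fold over {3, 5, 7} from 0.0 returns 0.0. Starting a max fold over {-2, -5} from the negated smallest subnormal returns that subnormal instead of -2.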
Document the compiler invocation in the compiler output, to aid subsequent
regeneration.
…date (#182155)

This changes the inputs to `update`. It's now the data stack that was
the result of `@init`. This makes `update` more predictable, as its
inputs are the same between the first call and the Nth call.
When acc.par_width was introduced in
#184864
there was a discussion on whether to use index or create a new type for
the output of the operation. It was decided to create a new type; but
this means that launch arguments cannot be used directly in the region
such as for loop bounds without a conversion from the new type to index.
In order to avoid the casting operations (and introduction of an actual
operation to do this cast), simply restore acc.par_width to generate
index type. This allows its result to be directly used in
acc.compute_region.
Patch models ordered reductions as a series of extractelements for the
cases which cannot be modeled as unordered reductions.

Fixes #50590

Reviewers: RKSimon, hiraditya

Pull Request: #182644
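Order matters because floating-point addition is not associative. The sketch compares two evaluation orders. The first is an in-order fold, which is what a chain of `extractelement` + scalar `fadd` computes. The second is a pairwise tree, one order an unordered reduction may use. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-order reduction: ((start + v[0]) + v[1]) + ...
float orderedSum(float start, const std::vector<float> &v) {
  float acc = start;
  for (float x : v)
    acc += x;
  return acc;
}

// Pairwise (tree) reduction, one order an unordered reduction may use.
float treeSum(std::vector<float> v) {
  if (v.empty())
    return 0.0f;
  while (v.size() > 1) {
    std::vector<float> next;
    for (std::size_t i = 0; i + 1 < v.size(); i += 2)
      next.push_back(v[i] + v[i + 1]);
    if (v.size() % 2 != 0)
      next.push_back(v.back()); // carry an odd element to the next level
    v = next;
  }
  return v[0];
}
```

For {1e20, 1, -1e20, 1}, the in-order sum is 1. The tree sum is 0, because each 1 is absorbed into a ±1e20 partial sum before the large terms cancel.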
pull[bot] merged commit eaeca6d into MPACT-ORG:main on Mar 20, 2026