[pull] main from llvm:main by pull[bot] · Pull Request #1147 · MPACT-ORG/llvm-project

pull · 2026-03-20T17:55:05Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

Simplify exactly as InstCombine does. A follow-up would include simplifying add x, (sub 0, y) -> sub x, y. Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD

…uilt libc (#181913) This is to add GPU wrappers for headers that are currently supported by libc built for SPIRV.

Ensure that the analyzer doesn't rule out the equality (or guarantee disequality) of a pointer to the stack and a symbolic pointer in unknown space. Previously the analyzer incorrectly assumed that stack pointers cannot be equal to symbolic pointers in unknown space. It is true that functions cannot validly return pointers to their own stack frame, but they can easily return a pointer to some other stack frame (e.g. a function can return a pointer received as an argument). The old behavior was introduced intentionally in 2012 by commit 3563fde, but it causes incorrect analysis, e.g. it prevents the correct handling of some testcases from the Juliet suite because it rules out the "fgets succeeds" branch. Reported-by: Daniel Krupp <daniel.krupp@ericsson.com>


Corrected language and spelling errors in a comment within file.cpp. Credit GH user @iBlanket for identifying this typo.

…7016) When narrowing interleave groups, the main vector loop processes IC iterations instead of VF * IC. Update selectEpilogueVectorizationFactor to use the effective VF, checking if the canonical IV controlling the loop now steps by UF instead of VFxUF. This avoids epilogue vectorization with dead epilogue vector loops and also prevents crashes in cases where we can prove both the epilogue and scalar loop are dead. Fixes #186846 PR: #187016

…part 36) (#187628) Tests converted from test/Lower/Intrinsics: maxloc.f90, maxval.f90, merge.f90, merge_bits.f90, minloc.f90

By using a native `v_cvt_i16/u16_f16` conversion and saturating at `i16`, we avoid the additional `f16` to `f32` conversion that is required to perform saturation at `i32`. It also allows clamping to be performed using `i16` instructions, reducing the number of registers needed in *true16* mode in some of the lit tests. The behavior is disabled for pre-gfx8 targets by checking `has16BitInsts()`.

This is part of patches to port BBAddrMap to COFF. Introduce BBAddrMap.h and move BBAddrMap/PGOAnalysisMap type definitions out of ELFTypes.h.

This patch introduces the following reduction operators: spirv.Tosa.ReduceAll spirv.Tosa.ReduceAny spirv.Tosa.ReduceMax spirv.Tosa.ReduceMin spirv.Tosa.ReduceProduct spirv.Tosa.ReduceSum Also dialect and serialization round-trip tests have been added. Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>

As detailed here: https://github.com/InstLatx64/InstLatX64_Demo/blob/master/GFNI_Demo.h These are a bit more complicated than gf2p8affine look ups, requiring us to convert a SHL shift value / amount into a GF so we can perform a multiplication. SRL/SRA need to be converted to SHL via bitreverse/variable-sign-extension. Followup to #89115

Exposed by 8ccda46.

) When rematerializing S_MOV_B64 or S_MOV_B64_IMM_PSEUDO and only a single 32-bit lane of the result is used at the remat point, emit S_MOV_B32 with the appropriate half of the 64-bit immediate instead. This reduces register pressure by defining a 32-bit register instead of a 64-bit pair when the other half is unused.
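
The half selection described above can be sketched in a few lines. This is only an illustrative model, not the backend code: `select_half` and the `lane` numbering (0 for the low half, 1 for the high half) are hypothetical names chosen for this example.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def select_half(imm64: int, lane: int) -> int:
    """Return the 32-bit half of a 64-bit immediate.

    lane 0 is the low half, lane 1 the high half (hypothetical
    numbering for this sketch).
    """
    if lane not in (0, 1):
        raise ValueError("lane must be 0 or 1")
    # Normalise to an unsigned 64-bit pattern, then shift the wanted half down.
    return ((imm64 & MASK64) >> (32 * lane)) & MASK32


low = select_half(0x1234_5678_9ABC_DEF0, 0)
high = select_half(0x1234_5678_9ABC_DEF0, 1)
```

When only one of the two halves is live at the remat point, materializing just that 32-bit value is what lets a single 32-bit register stand in for the 64-bit pair.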

Essentially do the same thing as for LoopInfo. Anything inside a cycle is mutually reachable, and the cycle can be replaced by its exit blocks in the walk. An interesting additional thing we could do for CycleInfo (but not LoopInfo) is to early exit the walk if the stop block is not in a cycle and dominates the start block. I've not included this in this patch to keep the implementation the same as for LoopInfo to start with.

This test failed on the llvm-clang-win-x-aarch64 buildbot. It seems the rounding is different, leading to a different output. Instead of: Cost for VF 4: 9 (Estimated cost per lane: 2.2) the test output on the Windows buildbot is: Cost for VF 4: 9 (Estimated cost per lane: 2.3)
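
A cost of 9 over 4 lanes is exactly 2.25, a tie at one decimal place, so the printed digit depends on the tie-breaking rule. The sketch below shows the two results with Python's `decimal` module. That the buildbots differ specifically in tie-breaking is an assumption made for illustration; the commit message only says the rounding seems to differ.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

cost, vf = 9, 4
per_lane = Decimal(cost) / Decimal(vf)  # exactly 2.25


def fmt(value: Decimal, rounding: str) -> str:
    """Format value with one decimal place under the given rounding mode."""
    return str(value.quantize(Decimal("0.1"), rounding=rounding))


half_even = fmt(per_lane, ROUND_HALF_EVEN)  # ties go to the even digit
half_up = fmt(per_lane, ROUND_HALF_UP)      # ties go away from zero
```

Here `half_even` matches the expected "2.2" and `half_up` matches the "2.3" seen on the Windows bot.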

…box loads (#187152) When a boxed array is privatized via `omp.private`, the `SourceKind` of the loaded box data was being misclassified as `SourceKind::Indirect` by the alias analyzer. Instead its `SourceKind::Allocate` should be preserved. This caused AliasAnalysis to conservatively return `MayAlias` for accesses to privatized arrays vs dummy arguments. This prevented InlineHLFIRAssign from inlining array section assignments. Propagate the Allocate source kind when the box source is classified as `Allocate`, so that alias analysis correctly returns `NoAlias`.

This patch makes ClangIR emit .cir and .mlir files when the -save-temps flag is specified. Having these files emitted is useful e.g. when inspecting the generated code for OpenMP offloading. Co-authored-by: Claude Opus 4.6 noreply@anthropic.com

…case (#187705) Listening on all interfaces is probably not permitted on the bots and causes failures of llvm-debuginfod-find/headers-winhttp.test after 39d6bb2. Restricting them to localhost should fix that.

…lines (#187684) The inherited constructors are inline thunks, so like other inline functions they are exempted from ABI compatibility concerns with this flag, and should not be exported. This is a follow-up to #182706

This PR enhances the multi-reduction layout propagation: 1. improve inst_data and lane_data to support fractional subgroup size 2. improve subgroup_layout/data setup to utilize the (nested) slice layout from the consumer op. It also removes the restriction in load_matrix/store_matrix layout propagation to allow nd (n>2) layouts.

As pointed out by #152770 (comment), 81e8a1e causes build errors with older versions of Xcode (Xcode 14 and older) when using std::not_fn() with llvm::make_filter_range(). This implements the same fix as in d1d9413.

) - Invert the condition to make the code more straightforward and sink single-use variables there. - Add a comment about `createTargetMachine` side effects for `-mcpu=help`. - Remove redundant call to `setPGOOptions`

…7386)

…rs (#186443) The frozen C++03 headers got an invalid simplification in #134045 that changed the signature of random_shuffle to use a forwarding reference instead of a lvalue reference. This patch fixes it and adds a test. --------- Co-authored-by: Louis Dionne <ldionne.2@gmail.com>

Selects of the form `cond ? 1 : 0` are created during unrolling of setcc+vselect. Currently these are not optimized away post-legalization even if fully redundant. Having these extra selects sitting between things can prevent other folds from applying. Enabling this requires some mitigations in the ARM backend, in particular in the interaction with MVE support. There are two changes here: * Form CSINV/CSNEG/CSINC from CMOV, rather than only creating it during SELECT_CC lowering. (After this change, the lowering in SELECT_CC can be dropped without test changes, let me know if I should do that.) * Support pushing negations through CMOV in more cases, in particular if the operands are constant or the negation can be handled by flipping lshr/ashr. Additionally, in the X86 backend, try to simplify CMOV to SETCC if only the low bit is demanded.

#181725) Based on the suggestions in #140639, this PR adds the rewrite pattern `a bitwiseop (~b +/- c)` -> `a bitwiseop ~(b -/+ c)` for AND, OR, and XOR operations. This rewrite enables lowering to `ANDN`, `ORN`, and `XORN` operations. Added new MIR tests in `combine-binop-neg.mir` for AArch64 to verify the new combine works for various commuted and uncommuted forms with AND, OR, and XOR and added new LLVM IR tests for RISC-V in `rv32zbb-zbkb.ll` to ensure the combine produces the expected `ANDN`, `ORN`, and `XORN` operations.
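
The rewrite rests on the two's complement identity `~b = -b - 1`, which gives `~b + c = ~(b - c)` and `~b - c = ~(b + c)`. The sketch below checks both exhaustively at 8 bits. The bit width and the helper names (`bit_not`, `andn`, `rewrite_holds`) are assumptions made for this example and are not taken from the patch.

```python
BITS = 8
MASK = (1 << BITS) - 1


def bit_not(v: int) -> int:
    """Bitwise NOT truncated to BITS bits."""
    return ~v & MASK


def andn(a: int, v: int) -> int:
    """a AND (NOT v), the operation an ANDN instruction computes."""
    return a & bit_not(v)


def rewrite_holds() -> bool:
    """Check ~b + c == ~(b - c) and ~b - c == ~(b + c) for all 8-bit b, c."""
    for b in range(1 << BITS):
        for c in range(1 << BITS):
            if (bit_not(b) + c) & MASK != bit_not((b - c) & MASK):
                return False
            if (bit_not(b) - c) & MASK != bit_not((b + c) & MASK):
                return False
    return True


ok = rewrite_holds()
# After the rewrite, a & (~b + c) can be computed as andn(a, b - c).
example = (0b1111_0000 & ((bit_not(5) + 3) & MASK)) == andn(0b1111_0000, (5 - 3) & MASK)
```

Once the NOT is the outermost operation on the right-hand operand, the AND, OR or XOR can absorb it into a single inverted-operand instruction.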

…#187454) As of AI Usage: This PR is assisted by Claude Closes #187201

…ecialized functions (#187645) For non-specialized functions, ACCSpecializeForDevice collects ACC ops inside compute constructs and applies device specialization patterns via applyOpPatternsGreedily. With the default AnyOp strictness, the greedy driver expands the worklist to parent ops when inner ops are modified, accidentally unwrapping the parent acc.parallel via ACCRegionUnwrapConversion. This leaves orphaned acc.loop combined(parallel) ops that lose their parallelism and reduction information downstream. Set GreedyRewriteStrictness::ExistingOps so the greedy driver only processes the initially collected inner ops, preserving the parent compute construct for ACCComputeLowering to handle.

If the instructions state is alternate and/or contains non-directly matching instructions, need to check if it is better to represent such operations as non-alternate with copyables. To do this, we need to compare operands between the instructions in their different representations and choose the best one for optimal vectorization. Reviewers: RKSimon, hiraditya Pull Request: #183777

- Enable `NoF16PseudoScalarTransInlineConstants` for 11.7. - Add test for `RequiredExportPriority`, one of the differences between 11.5 and 11.7.

…rv-val` output (#182549) KhronosGroup/SPIRV-Tools#6232 added support for `SPV_INTEL_function_pointers` on `spirv-val`. This PR updates some relevant tests to run `spirv-val` and document why some others are failing.

Adds a port for AArch64MIPeepholeOpt - Refactored lib/Target/AArch64/AArch64MIPeepholeOpt.cpp to extract base logic as Impl - Renamed existing pass with "Legacy" suffix and updated references - Added NewPM pass AArch64MIPeepholeOptPass - Updated tests

Fix build with `BUILD_SHARED_LIBS=On`

This adds a SPIR-V intrinsic for associating a name (textual identifier) to a specialisation constant. The name is encoded in metadata, and is intended to be used within LLVM / by the SPIR-V BE (e.g. #134016 would be a direct user), as it is never emitted into the SPIR-V object. Non-boolean and composite specialisation constants will be handled in the future, via dedicated intrinsics, if there is interest.

Can now be used as `REQUIRES: flang-rt`, for example.

…#187262)

…7556) LoopSequence keeps track of whether it contains code that would be an invalid intervening code, or that would prevent loop nesting from being a perfect nesting. To improve the quality of diagnostic messages store the pointer to the offending parser::ExecutionPartConstruct. Issue: #185287

Reverts #184164. Issue hit in testing, LCOMPILER-1587.

…187727) a3db68a seemed to be the obvious fix for the winhttp issue from 39d6bb2 in llvm-debuginfod-find, but there are still bots failing. This patch disables the test on all bots that cannot spawn an HTTP server in Python and record request headers. Ideally it turns all affected bots back to green and gives us an error message to investigate.

…s_poisoned (#187466) Align beg address down instead of up in __asan_region_is_poisoned(), so the shadow scan includes the first granule. This fixes a false negative when first granule has an unpoisoned prefix and poisoned suffix. Add test that covers this scenario.
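
The effect of aligning the start address down rather than up can be shown with a toy shadow model. This is a simplified sketch, not the ASan runtime: the 8-byte granule, the `shadow` dictionary (granule base mapped to the number of addressable leading bytes) and the function names are assumptions made for illustration.

```python
GRANULE = 8  # bytes covered by one shadow entry in this toy model


def align_down(addr: int) -> int:
    return addr & ~(GRANULE - 1)


def align_up(addr: int) -> int:
    return (addr + GRANULE - 1) & ~(GRANULE - 1)


def first_poisoned(beg: int, end: int, shadow: dict, scan_start: int):
    """Return the first poisoned address in [beg, end), or None.

    shadow maps a granule base to the count of addressable leading
    bytes; granules missing from the map are fully addressable.
    Only granules from scan_start onwards are inspected.
    """
    for base in range(scan_start, end, GRANULE):
        k = shadow.get(base, GRANULE)
        lo = max(base + k, beg)          # first poisoned byte inside the region
        hi = min(base + GRANULE, end)
        if lo < hi:
            return lo
    return None


# First granule: 4 addressable bytes followed by 4 poisoned bytes.
shadow = {0x1000: 4}
beg, end = 0x1002, 0x1010

missed = first_poisoned(beg, end, shadow, align_up(beg))    # skips granule 0x1000
found = first_poisoned(beg, end, shadow, align_down(beg))   # includes granule 0x1000
```

With the start aligned up, the scan begins at 0x1008 and never looks at the granule holding the poisoned suffix, which is the false negative the commit describes.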

…ls/clang-ssaf/` This patch extracts the shared code between `clang-ssaf-format` and `clang-ssaf-linker` into a new `clangScalableStaticAnalysisFrameworkTool` library at `clang/lib/ScalableStaticAnalysisFramework/Tool/`, with the public header at `clang/include/clang/ScalableStaticAnalysisFramework/Tool/Utils.h`. This shared library provides: - `fail()` overloads for fatal error reporting - `initTool()` — sets the tool name and version, configures the version printer, hides unrelated command-line options, and parses arguments - `getToolName()` — accessor for the tool name set by `initTool()` - `loadPlugins()` — loads plugin shared libraries from a list of paths - `getFormatForExtension()` — cached format-registry lookup - `SummaryFile` — resolves a file path to its serialization format - `ErrorMessages` — shared diagnostic string constants Tool-specific error strings remain in a `LocalErrorMessages` namespace in each tool's anonymous namespace. Binary names and locations (`bin/clang-ssaf-format`, `bin/clang-ssaf-linker`) are unchanged.

This code is adapted from `SelectionDAG::computeKnownBits`. Part of #150515; ticks off ABDS & ABDU.


This is the implementation of part of F2023 new feature US 03. Extracting tokens from a string, SPLIT intrinsic. It's section 16.9.196 SPLIT (STRING, SET, POS [, BACK]) of Fortran 2023 Standard. It's part of Flang issue [#178044](#178044). Note that I work with @kwyatt-ext on this issue. He implemented the other part, TOKENIZE. A test will be added into [llvm-test-suite](https://github.com/llvm/llvm-test-suite) later after this PR is merged.

MSVC apparently also warns about deprecation at the implementation of deprecated functions... Pull Request: #187702

When converting from fir.alloca to memref.alloca, also copy the acc variable name attribute if it exists

…f/maximumf (#187647) The reduction recipe init region was producing 0.0 instead of the correct identity value (largest representable float for min, smallest for max) when the reduction operator was AccMinnumf, AccMinimumf, AccMaxnumf, or AccMaximumf. Only AccMin and AccMax were handled, causing the new operator variants to fall through to the else branch which returns 0. This caused min reductions to always produce 0.0 since min(x, 0.0) = 0.0 for all positive x. Replace the duplicated identity value logic with arith::getIdentityValue, using a mapping from acc::ReductionOperator to arith::AtomicRMWKind. Use minimumf/maximumf (which respect useOnlyFiniteValue) instead of minnumf/maxnumf (whose MLIR identity is NaN) to get correct finite identity values. This also fixes a pre-existing bug where the max reduction identity for floats used getSmallest (smallest subnormal, -1.4e-45) instead of getLargest with negative (-3.4e+38).
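
Why the initial value matters can be seen with a plain fold. The sketch below models the reduction with `functools.reduce`. The float32 maximum is decoded with `struct`, and `reduce_min`/`reduce_max` are hypothetical helpers for this example, not part of the patch.

```python
import struct
from functools import reduce

# Largest finite float32, decoded from its bit pattern 0x7F7FFFFF.
FLT_MAX = struct.unpack("<f", bytes.fromhex("ffff7f7f"))[0]


def reduce_min(values, init):
    """Fold min over values starting from init."""
    return reduce(min, values, init)


def reduce_max(values, init):
    """Fold max over values starting from init."""
    return reduce(max, values, init)


positives = [3.0, 1.5, 2.0]
negatives = [-3.0, -1.5, -2.0]

wrong_min = reduce_min(positives, 0.0)       # 0.0 wins against every positive input
right_min = reduce_min(positives, FLT_MAX)   # identity: largest finite value
wrong_max = reduce_max(negatives, 0.0)       # 0.0 wins against every negative input
right_max = reduce_max(negatives, -FLT_MAX)  # identity: most negative finite value
```

An identity value must never change the result of the fold, which is why 0.0 is only safe for sum-like operators.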

Document compiler invocation in the compiler output, to aid subsequent regeneration.

…date (#182155) This changes the inputs to `update`. It's now the data stack that was the result of `@init`. This makes `update` more predictable, as its inputs are the same between the first call and the Nth call.

When acc.par_width was introduced in #184864 there was a discussion on whether to use index or create a new type for the output of the operation. It was decided to create a new type; but this means that launch arguments cannot be used directly in the region such as for loop bounds without a conversion from the new type to index. In order to avoid the casting operations (and introduction of an actual operation to do this cast), simply restore acc.par_width to generate index type. This allows its result to be directly used in acc.compute_region.

Patch models ordered reductions as a series of extractelements for the cases which cannot be modeled as unordered reductions. Fixes #50590 Reviewers: RKSimon, hiraditya Pull Request: #182644

Fixes breakage reported here: #187352 (comment)

artagnon and others added 30 commits March 20, 2026 11:58

[LV] Regen induction-ptrcasts test with UTC (NFC) (#187678)

b6accfa

[VPlan] Simplify mul x, -1 -> sub 0, x (#187551)

1dfd268

Simplify exactly as InstCombine does. A follow-up would include simplifying add x, (sub 0, y) -> sub x, y. Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD
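
Alongside the Alive2 proof, the equivalence is easy to confirm by brute force at a small bit width. The sketch below models wrapping 8-bit integer arithmetic. The width and the helper names are assumptions made for this example, and it also checks the follow-up fold mentioned above.

```python
BITS = 8
MASK = (1 << BITS) - 1
NEG_ONE = MASK  # bit pattern of -1 at this width


def mul(x: int, y: int) -> int:
    return (x * y) & MASK


def sub(x: int, y: int) -> int:
    return (x - y) & MASK


def add(x: int, y: int) -> int:
    return (x + y) & MASK


values = range(1 << BITS)

# mul x, -1  ==  sub 0, x
mul_fold_ok = all(mul(x, NEG_ONE) == sub(0, x) for x in values)

# Follow-up: add x, (sub 0, y)  ==  sub x, y
add_fold_ok = all(add(x, sub(0, y)) == sub(x, y) for x in values for y in values)
```

Both checks pass for every 8-bit input, matching the wraparound semantics the fold relies on.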

[OFFLOAD] Add GPU wrappers for headers currently supported by SPIRV b…

bdc8d92

…uilt libc (#181913) This is to add GPU wrappers for headers that are currently supported by libc built for SPIRV.

[libc][NFC] Fix typo in file.cpp (#91192) (#187688)

2600c72

Corrected language and spelling errors in a comment within file.cpp. Credit GH user @iBlanket for identifying this typo.

[flang][NFC] Converted five tests from old lowering to new lowering (…

da8d0ab

…part 36) (#187628) Tests converted from test/Lower/Intrinsics: maxloc.f90, maxval.f90, merge.f90, merge_bits.f90, minloc.f90

[NFC][Object] Move BBAddrMap related types to a shared header (#187268)

e3959a9

This is part of patches to port BBAddrMap to COFF. Introduce BBAddrMap.h and move BBAddrMap/PGOAnalysisMap type definitions out of ELFTypes.h.

[AMDGPU] Remove unused forward declaration of GCNSubtarget (#187695)

bd3ba60

[gn] port aa34657

9ab77fa

Exposed by 8ccda46.

[SPIR-V] Fix isAggregateType function implementation (#187685)

7872925

[gn] port 7bf871c

17d2890

[gn] port a021a93

d339d00

[llvm] Restrict llvm-debginfod-find test to localhost to fix winhttp …

a3db68a

…case (#187705) Listening on all interfaces is probably not permitted on the bots and causes failures of llvm-debuginfod-find/headers-winhttp.test after 39d6bb2. Restricting them to localhost should fix that.

[clang][ModulesDriver] Fix build failure with Xcode 14 (#187713)

f58b675

As pointed out by #152770 (comment), 81e8a1e causes build errors with older versions of Xcode (Xcode 14 and older) when using std::not_fn() with llvm::make_filter_range(). This implements the same fix as in d1d9413.

[llc] Flatten SkipModule branch and sink defs to their use(NFC) (#187661

7025821

) - Invert the condition to make the code more straightforward and sink single-use variables there. - Add a comment about `createTargetMachine` side effects for `-mcpu=help`. - Remove redundant call to `setPGOOptions`

[ARM] Add a phase ordering test for multiple reductions. NFC

7cc4692

AMDGPU/GlobalISel: RegBankLegalize rules for readlane, writelane (#18…

0506c03

…7386)

osmanyasar05 and others added 27 commits March 20, 2026 15:53

[clang-tidy] Generate valid JSON for characters that require escaping (…

6d45f6d

…#187454) As of AI Usage: This PR is assisted by Claude Closes #187201

[AMDGPU] Update features for gfx1170 (#186107)

93d7583

- Enable `NoF16PseudoScalarTransInlineConstants` for 11.7. - Add test for `RequiredExportPriority`, one of the differences between 11.5 and 11.7.

[SPIR-V] Fix linker error after #187685 (#187722)

e1347d1

Fix build with `BUILD_SHARED_LIBS=On`

[Clang] Fix -Wunused-variable

bf57f91

[offload] Define flang-rt as an available test feature (#187732)

c3e7b45

Can now be used as `REQUIRES: flang-rt`, for example.

[SPIR-V] Fix SPV_INTEL_long_composites continued instruction handling (…

97a1a70

…#187262)

Revert "[AMDGPU] Generate more swaps" (#187723)

18f7e62

Reverts #184164. Issue hit in testing, LCOMPILER-1587.

[GlobalISel] Add G_ABDU and G_ABDS to computeKnownBits. (#186822)

68a9e9c

This code is adapted from `SelectionDAG::computeKnownBits`. Part of #150515; ticks off ABDS & ABDU.

[IR][NFC] Fix MSVC deprecation warnings about BranchInst (#187702)

537a8cc

MSVC apparently also warns about deprecation at the implementation of deprecated functions... Pull Request: #187702

[FIRToMemRef] copy ACC Variable Name attribute (#187724)

965ee6c

When converting from fir.alloca to memref.alloca, also copy the acc variable name attribute if it exists

[lldb][bytecode] Document invocation in compiler output (#187547)

d8e1f50

Document compiler invocation in the compiler output, to aid subsequent regeneration.

[SLP] Initial support for ordered reductions

94e366e

Patch models ordered reductions as a series of extractelements for the cases which cannot be modeled as unordered reductions. Fixes #50590 Reviewers: RKSimon, hiraditya Pull Request: #182644

[clang] fix #187352 breakage on 32-bit platforms (#187741)

eaeca6d

Fixes breakage reported here: #187352 (comment)

pull bot locked and limited conversation to collaborators Mar 20, 2026

pull bot added the ⤵️ pull label Mar 20, 2026

pull bot merged commit eaeca6d into MPACT-ORG:main Mar 20, 2026

pull[bot] merged 61 commits into MPACT-ORG:main from llvm:main


20 participants
