[pull] main from llvm:main#1147
Merged
pull[bot] merged 61 commits into MPACT-ORG:main from llvm:main on Mar 20, 2026
Simplify exactly as InstCombine does. A follow-up would include simplifying `add x, (sub 0, y)` -> `sub x, y`. Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD
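The follow-up fold mentioned above relies on the identity x + (0 - y) == x - y in wrapping two's-complement arithmetic. The Python sketch below checks it exhaustively at 8 bits. It is only an illustration in the spirit of the Alive2 proof, not a replacement for it, and it ignores poison-generating flags such as `nsw`/`nuw`.

```python
# Exhaustively check, for 8-bit wrapping integers, that
# add x, (sub 0, y) computes the same value as sub x, y.
BITS = 8
MASK = (1 << BITS) - 1

def add_of_neg(x: int, y: int) -> int:
    neg_y = (0 - y) & MASK          # sub 0, y
    return (x + neg_y) & MASK       # add x, neg_y

def sub(x: int, y: int) -> int:
    return (x - y) & MASK           # sub x, y

counterexamples = [
    (x, y)
    for x in range(1 << BITS)
    for y in range(1 << BITS)
    if add_of_neg(x, y) != sub(x, y)
]
print("counterexamples:", counterexamples)  # counterexamples: []
```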
…uilt libc (#181913) This adds GPU wrappers for headers that are currently supported by libc built for SPIRV.
Ensure that the analyzer doesn't rule out the equality (or guarantee the disequality) of a pointer to the stack and a symbolic pointer in unknown space. Previously the analyzer incorrectly assumed that stack pointers cannot be equal to symbolic pointers in unknown space. It is true that functions cannot validly return pointers to their own stack frame, but they can easily return a pointer to some other stack frame (e.g. a function can return a pointer received as an argument). The old behavior was introduced intentionally in 2012 by commit 3563fde, but it causes incorrect analysis: for example, it prevents the correct handling of some test cases from the Juliet suite because it rules out the "fgets succeeds" branch. Reported-by: Daniel Krupp <daniel.krupp@ericsson.com>
…7016) When narrowing interleave groups, the main vector loop processes IC iterations instead of VF * IC. Update selectEpilogueVectorizationFactor to use the effective VF, checking if the canonical IV controlling the loop now steps by UF instead of VFxUF. This avoids epilogue vectorization with dead epilogue vector loops and also prevents crashes in cases where we can prove both the epilogue and scalar loop are dead. Fixes #186846 PR: #187016
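A rough arithmetic sketch of why the effective step matters, under simplifying assumptions (no tail folding, no minimum-iteration checks, and hypothetical helper names that do not correspond to LoopVectorize functions): after the main vector loop, fewer than `main_step` iterations remain, so an epilogue vector loop whose step exceeds `main_step - 1` can never execute.

```python
def main_loop_step(vf: int, ic: int, narrowed: bool) -> int:
    # Scalar iterations consumed per main vector loop iteration.
    # With narrowed interleave groups the loop only covers IC iterations.
    return ic if narrowed else vf * ic

def epilogue_vector_loop_can_run(main_step: int, epilogue_step: int) -> bool:
    # At most main_step - 1 iterations remain after the main loop,
    # so a larger epilogue step means the epilogue vector loop is dead.
    return main_step - 1 >= epilogue_step

for narrowed in (False, True):
    step = main_loop_step(vf=4, ic=2, narrowed=narrowed)
    print(f"narrowed={narrowed}: main step={step}, "
          f"epilogue VF=4 can run: {epilogue_vector_loop_can_run(step, 4)}")
```

With VF = 4 and IC = 2, the unnarrowed loop steps by 8 and leaves room for a VF = 4 epilogue; the narrowed loop steps by 2, so choosing an epilogue VF from the nominal VF would produce a dead epilogue loop.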
…part 36) (#187628) Tests converted from test/Lower/Intrinsics: maxloc.f90, maxval.f90, merge.f90, merge_bits.f90, minloc.f90
By using a native `v_cvt_i16/u16_f16` conversion and saturating at `i16`, we avoid the additional `f16` to `f32` conversion that is required to saturate at `i32`. It also allows clamping to be performed with `i16` instructions, reducing the number of registers needed in *true16* mode in some of the lit tests. The behavior is disabled for pre-gfx8 targets by checking `has16BitInsts()`.
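Presumably the operation being lowered has `llvm.fptosi.sat`-style semantics: out-of-range inputs clamp to the integer bounds and NaN becomes 0. The Python sketch below models only those semantics for the signed `i16` case, not the AMDGPU instruction sequence. Because the largest finite `f16` value (65504) lies outside the `i16` range, saturating at `i16` is a real clamp rather than a no-op.

```python
import math

I16_MIN, I16_MAX = -(1 << 15), (1 << 15) - 1

def fptosi_sat_i16(x: float) -> int:
    """Saturating float -> signed i16 conversion, truncating toward zero."""
    if math.isnan(x):
        return 0                  # NaN converts to 0
    if x <= I16_MIN:
        return I16_MIN            # also covers -inf
    if x >= I16_MAX:
        return I16_MAX            # also covers +inf
    return int(x)                 # int() truncates toward zero

F16_MAX = 65504.0                 # largest finite half-precision value
print(fptosi_sat_i16(F16_MAX), fptosi_sat_i16(-F16_MAX), fptosi_sat_i16(-1.75))
```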
This is part of patches to port BBAddrMap to COFF. Introduce BBAddrMap.h and move BBAddrMap/PGOAnalysisMap type definitions out of ELFTypes.h.
This patch introduces the following reduction operators:
- spirv.Tosa.ReduceAll
- spirv.Tosa.ReduceAny
- spirv.Tosa.ReduceMax
- spirv.Tosa.ReduceMin
- spirv.Tosa.ReduceProduct
- spirv.Tosa.ReduceSum

Dialect and serialization round-trip tests have also been added.

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
As detailed here: https://github.com/InstLatx64/InstLatX64_Demo/blob/master/GFNI_Demo.h These are a bit more complicated than gf2p8affine lookups, requiring us to convert a SHL shift value / amount into a GF so we can perform a multiplication. SRL/SRA need to be converted to SHL via bitreverse/variable-sign-extension. Followup to #89115
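One way to see how a shift can become a GF(2^8) multiplication: multiplying by 2^k in the field is a left shift by k, except that bits carried past bit 7 get reduced by the field polynomial. Clearing the bits that would be shifted out first means no reduction ever happens, so the product equals the truncated shift. The Python sketch below models `GF2P8MULB`-style multiplication (polynomial 0x11B) and derives SHL and SRL from it; the masking step is an assumption made for illustration, the helper names are hypothetical, and the actual X86 lowering may differ. SRA is not modelled.

```python
def gf2p8_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B            # reduce modulo the field polynomial
    return result

def bitreverse8(x: int) -> int:
    return int(f"{x:08b}"[::-1], 2)

def shl_via_gf(x: int, k: int) -> int:
    # Clear the top k bits so the product never needs reduction;
    # multiplying by 2^k is then exactly (x << k) truncated to 8 bits.
    return gf2p8_mul(x & (0xFF >> k), 1 << k)

def srl_via_gf(x: int, k: int) -> int:
    # A logical right shift is a left shift of the bit-reversed value.
    return bitreverse8(shl_via_gf(bitreverse8(x), k))
```

The two FIPS-197 products {57}·{83} = {C1} and {53}·{CA} = {01} are a quick sanity check for `gf2p8_mul`.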
) When rematerializing S_MOV_B64 or S_MOV_B64_IMM_PSEUDO and only a single 32-bit lane of the result is used at the remat point, emit S_MOV_B32 with the appropriate half of the 64-bit immediate instead. This reduces register pressure by defining a 32-bit register instead of a 64-bit pair when the other half is unused.
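Picking the half of the immediate is simple bit arithmetic: the low half (the `sub0` subregister) is the bottom 32 bits and the high half (`sub1`) is the top 32 bits. The Python sketch below is illustrative only; `split_imm64` and `remat_imm32` are hypothetical names, not functions from the AMDGPU backend.

```python
MASK32 = 0xFFFF_FFFF

def split_imm64(imm: int) -> tuple[int, int]:
    """Return the (lo, hi) 32-bit halves of a 64-bit immediate."""
    imm &= (1 << 64) - 1          # treat the immediate as a 64-bit pattern
    return imm & MASK32, imm >> 32

def remat_imm32(imm: int, use_hi_half: bool) -> int:
    # Immediate an S_MOV_B32 would need when only one lane is used.
    lo, hi = split_imm64(imm)
    return hi if use_hi_half else lo

print(hex(remat_imm32(0x1234_5678_9ABC_DEF0, use_hi_half=True)))  # 0x12345678
```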
Essentially do the same thing as for LoopInfo. Anything inside a cycle is mutually reachable, and the cycle can be replaced by its exit blocks in the walk. An interesting additional thing we could do for CycleInfo (but not LoopInfo) is to early exit the walk if the stop block is not in a cycle and dominates the start block. I've not included this in this patch to keep the implementation the same as for LoopInfo to start with.
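The walk described above can be sketched in Python. Everything here is hypothetical: plain dictionaries stand in for the CFG and CycleInfo, and only outermost cycles are tracked. It shows why collapsing a cycle to its exit blocks is sound: every block in a cycle can reach every other block in it, and leaving the cycle means going through one of its exits.

```python
from collections import deque

def is_potentially_reachable(succs, outer_cycle, cycle_exits, start, stop):
    """Hypothetical sketch: may `stop` be reached from `start`?

    succs:       dict block -> list of successor blocks
    outer_cycle: dict block -> id of its outermost cycle (missing if none)
    cycle_exits: dict cycle id -> blocks outside the cycle that it branches to
    """
    stop_cycle = outer_cycle.get(stop)
    worklist, visited = deque([start]), set()
    while worklist:
        bb = worklist.popleft()
        if bb in visited:
            continue
        visited.add(bb)
        if bb == stop:
            return True
        cycle = outer_cycle.get(bb)
        if cycle is None:
            worklist.extend(succs.get(bb, []))
        elif cycle == stop_cycle:
            return True                          # same cycle: mutually reachable
        else:
            worklist.extend(cycle_exits[cycle])  # skip the cycle's interior
    return False

# entry -> A <-> B -> C -> D, where {A, B} form one cycle.
succs = {"entry": ["A"], "A": ["B"], "B": ["A", "C"], "C": ["D"]}
outer_cycle = {"A": 0, "B": 0}
cycle_exits = {0: ["C"]}
```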
This test failed on the llvm-clang-win-x-aarch64 buildbot. The rounding appears to differ, leading to different output: the test expects `Cost for VF 4: 9 (Estimated cost per lane: 2.2)`, but on the Windows buildbot the output is `Cost for VF 4: 9 (Estimated cost per lane: 2.3)`.
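The cost per lane here is 9 / 4 = 2.25, which is exactly representable in binary floating point, so printing it with one decimal place is an exact tie. A difference in tie-breaking between C runtimes would plausibly explain the two outputs. The Python sketch below only illustrates that hypothesis: it does not reproduce either runtime's `printf`, it applies two explicit tie rules to the exact value.

```python
from fractions import Fraction

def format_one_decimal(value: Fraction, ties_to_even: bool) -> str:
    """Format a non-negative rational to one decimal with an explicit tie rule."""
    scaled = value * 10
    whole, rem = divmod(scaled.numerator, scaled.denominator)
    twice_rem = 2 * rem
    if twice_rem > scaled.denominator or (
        twice_rem == scaled.denominator and (not ties_to_even or whole % 2 == 1)
    ):
        whole += 1                # round up
    return f"{whole // 10}.{whole % 10}"

cost_per_lane = Fraction(9, 4)    # 2.25: exactly halfway between 2.2 and 2.3
print(format_one_decimal(cost_per_lane, ties_to_even=True))   # 2.2
print(format_one_decimal(cost_per_lane, ties_to_even=False))  # 2.3
```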
…box loads (#187152) When a boxed array is privatized via `omp.private`, the `SourceKind` of the loaded box data was being misclassified as `SourceKind::Indirect` by the alias analyzer. Instead its `SourceKind::Allocate` should be preserved. This caused AliasAnalysis to conservatively return `MayAlias` for accesses to privatized arrays vs dummy arguments. This prevented InlineHLFIRAssign from inlining array section assignments. Propagate the Allocate source kind when the box source is classified as `Allocate`, so that alias analysis correctly returns `NoAlias`.
This patch makes ClangIR emit .cir and .mlir files when the `-save-temps` flag is specified. Having these files emitted is useful, e.g. when inspecting the generated code for OpenMP offloading. Co-authored-by: Claude Opus 4.6 noreply@anthropic.com
This PR enhances the multi-reduction layout propagation:
1. Improve inst_data and lane_data to support fractional subgroup sizes.
2. Improve the subgroup_layout/data setup to use the (nested) slice layout from the consumer op.

It also removes the restriction in load_matrix/store_matrix layout propagation, allowing n-D (n > 2) layouts.
As pointed out by #152770 (comment), 81e8a1e causes build errors with older versions of Xcode (Xcode 14 and older) when using std::not_fn() with llvm::make_filter_range(). This implements the same fix as in d1d9413.
Selects of the form `cond ? 1 : 0` are created during unrolling of setcc+vselect. Currently these are not optimized away post-legalization, even if fully redundant. Having these extra selects sitting between things can prevent other folds from applying. Enabling this requires some mitigations in the ARM backend, in particular in the interaction with MVE support. There are two changes here:

* Form CSINV/CSNEG/CSINC from CMOV, rather than only creating them during SELECT_CC lowering. (After this change, the lowering in SELECT_CC can be dropped without test changes; let me know if I should do that.)
* Support pushing negations through CMOV in more cases, in particular if the operands are constant or the negation can be handled by flipping lshr/ashr.

Additionally, in the X86 backend, try to simplify CMOV to SETCC if only the low bit is demanded.
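The folds above rest on a few bit-level identities: `cond ? 1 : 0` is just the zero-extended condition, negating a select distributes into its arms, and negating a logical shift right by (bitwidth - 1) gives the arithmetic shift right by the same amount (both are 0 or all-ones). The Python sketch below checks these exhaustively at 8 bits. It models values only, not CMOV/CSINV nodes or flags, and the helper names are illustrative.

```python
BITS = 8
MASK = (1 << BITS) - 1

def to_signed(v: int) -> int:
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v

def neg(v: int) -> int:
    return (-v) & MASK

def lshr(v: int, k: int) -> int:
    return (v & MASK) >> k

def ashr(v: int, k: int) -> int:
    return (to_signed(v) >> k) & MASK   # Python's >> on ints is arithmetic

def select(c: bool, a: int, b: int) -> int:
    return a if c else b

select_of_bool_ok = all(select(c, 1, 0) == int(c) for c in (False, True))
neg_through_select_ok = all(
    neg(select(c, a, b)) == select(c, neg(a), neg(b))
    for c in (False, True) for a in range(1 << BITS) for b in range(1 << BITS)
)
neg_lshr_is_ashr_ok = all(
    neg(lshr(x, BITS - 1)) == ashr(x, BITS - 1) for x in range(1 << BITS)
)
print(select_of_bool_ok, neg_through_select_ok, neg_lshr_is_ashr_ok)
```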
#181725) Based on the suggestions in #140639, this PR adds the rewrite pattern `a bitwiseop (~b +/- c)` -> `a bitwiseop ~(b -/+ c)` for AND, OR, and XOR operations. This rewrite enables lowering to `ANDN`, `ORN`, and `XORN` operations. Added new MIR tests in `combine-binop-neg.mir` for AArch64 to verify that the new combine works for various commuted and uncommuted forms with AND, OR, and XOR, and new LLVM IR tests for RISC-V in `rv32zbb-zbkb.ll` to ensure that the combine produces the expected `ANDN`, `ORN`, and `XORN` operations.
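The rewrite only changes the second operand of the bitwise op, so its correctness reduces to two wrapping identities: ~b + c == ~(b - c) and ~b - c == ~(b + c), both following from ~v == -v - 1. The Python sketch below checks them exhaustively at 8 bits and shows the AND case feeding an and-not; `andn` is a hypothetical helper that models what an `ANDN`-style instruction computes, not a real API.

```python
BITS = 8
MASK = (1 << BITS) - 1

def bnot(v: int) -> int:
    return ~v & MASK

def add(x: int, y: int) -> int:
    return (x + y) & MASK

def sub(x: int, y: int) -> int:
    return (x - y) & MASK

def andn(a: int, x: int) -> int:
    return a & bnot(x)            # a AND NOT x

# ~b + c == ~(b - c) and ~b - c == ~(b + c) for all 8-bit b, c.
inner_ok = all(
    add(bnot(b), c) == bnot(sub(b, c)) and sub(bnot(b), c) == bnot(add(b, c))
    for b in range(1 << BITS) for c in range(1 << BITS)
)

# a & (~b + c) can therefore be computed as andn(a, b - c).
a, b, c = 0b1011_0110, 0x3C, 0x05
print(inner_ok, a & add(bnot(b), c) == andn(a, sub(b, c)))  # True True
```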
…ecialized functions (#187645) For non-specialized functions, ACCSpecializeForDevice collects ACC ops inside compute constructs and applies device specialization patterns via applyOpPatternsGreedily. With the default AnyOp strictness, the greedy driver expands the worklist to parent ops when inner ops are modified, accidentally unwrapping the parent acc.parallel via ACCRegionUnwrapConversion. This leaves orphaned acc.loop combined(parallel) ops that lose their parallelism and reduction information downstream. Set GreedyRewriteStrictness::ExistingOps so the greedy driver only processes the initially collected inner ops, preserving the parent compute construct for ACCComputeLowering to handle.
If the instructions' state is alternate and/or contains non-directly-matching instructions, we need to check whether it is better to represent such operations as non-alternate with copyables. To do this, we need to compare operands between the instructions in their different representations and choose the best one for optimal vectorization. Reviewers: RKSimon, hiraditya Pull Request: #183777
- Enable `NoF16PseudoScalarTransInlineConstants` for 11.7. - Add test for `RequiredExportPriority`, one of the differences between 11.5 and 11.7.
…rv-val` output (#182549) KhronosGroup/SPIRV-Tools#6232 added support for `SPV_INTEL_function_pointers` to `spirv-val`. This PR updates some relevant tests to run `spirv-val` and documents why some others are failing.
Adds a port for AArch64MIPeepholeOpt:
- Refactored lib/Target/AArch64/AArch64MIPeepholeOpt.cpp to extract base logic as Impl
- Renamed existing pass with "Legacy" suffix and updated references
- Added NewPM pass AArch64MIPeepholeOptPass
- Updated tests
This adds a SPIR-V intrinsic for associating a name (textual identifier) with a specialisation constant. The name is encoded in metadata, and is intended to be used within LLVM / by the SPIR-V BE (e.g. #134016 would be a direct user), as it is never emitted into the SPIR-V object. Non-boolean and composite specialisation constants will be handled in the future, via dedicated intrinsics, if there is interest.
Can now be used as `REQUIRES: flang-rt`, for example.
Reverts #184164. Issue hit in testing, LCOMPILER-1587.
…187727) a3db68a seemed to be the obvious fix for the winhttp issue from 39d6bb2 in llvm-debuginfod-find, but there are still bots failing. This patch disables the test on all bots that cannot spawn an HTTP server in Python and record request headers. Ideally it turns all affected bots back to green and gives us an error message to investigate.
…s_poisoned (#187466) Align the beg address down instead of up in __asan_region_is_poisoned(), so the shadow scan includes the first granule. This fixes a false negative when the first granule has an unpoisoned prefix and a poisoned suffix. Add a test that covers this scenario.
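A simplified illustration, assuming 8-byte shadow granules (the default shadow scale on most targets): with `beg = 0x1003`, rounding up starts the scan at `0x1008` and skips the granule at `0x1000`, so a poisoned suffix in bytes `0x1003..0x1007` goes unseen; rounding down includes that granule. This is not the compiler-rt code, which also handles the partial granule at the end of the region.

```python
GRANULE = 8  # assumed shadow granularity in bytes

def align_down(addr: int, granule: int = GRANULE) -> int:
    return addr & ~(granule - 1)

def align_up(addr: int, granule: int = GRANULE) -> int:
    return (addr + granule - 1) & ~(granule - 1)

def granules_scanned(beg: int, end: int, align_beg) -> list[int]:
    """Start addresses of the granules whose shadow is inspected for [beg, end)."""
    return list(range(align_beg(beg), end, GRANULE))

beg, end = 0x1003, 0x1010
print([hex(g) for g in granules_scanned(beg, end, align_up)])    # ['0x1008']
print([hex(g) for g in granules_scanned(beg, end, align_down)])  # ['0x1000', '0x1008']
```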
…ls/clang-ssaf/` This patch extracts the shared code between `clang-ssaf-format` and `clang-ssaf-linker` into a new `clangScalableStaticAnalysisFrameworkTool` library at `clang/lib/ScalableStaticAnalysisFramework/Tool/`, with the public header at `clang/include/clang/ScalableStaticAnalysisFramework/Tool/Utils.h`. This shared library provides:

- `fail()` overloads for fatal error reporting
- `initTool()`: sets the tool name and version, configures the version printer, hides unrelated command-line options, and parses arguments
- `getToolName()`: accessor for the tool name set by `initTool()`
- `loadPlugins()`: loads plugin shared libraries from a list of paths
- `getFormatForExtension()`: cached format-registry lookup
- `SummaryFile`: resolves a file path to its serialization format
- `ErrorMessages`: shared diagnostic string constants

Tool-specific error strings remain in a `LocalErrorMessages` namespace in each tool's anonymous namespace. Binary names and locations (`bin/clang-ssaf-format`, `bin/clang-ssaf-linker`) are unchanged.
This code is adapted from `SelectionDAG::computeKnownBits`. Part of #150515; ticks off ABDS & ABDU.
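For reference, the value semantics whose known bits are being computed: ABDU is the absolute difference of unsigned operands and ABDS the absolute difference of signed operands, both producing a result of the same width. The ABDS result is read as an unsigned bit pattern, so at 8 bits `abds(127, -128)` is 255. The Python sketch below models only these semantics, not the known-bits code itself.

```python
def abdu(a: int, b: int, bits: int = 8) -> int:
    """Unsigned absolute difference of two bits-wide values."""
    mask = (1 << bits) - 1
    a, b = a & mask, b & mask
    return max(a, b) - min(a, b)

def abds(a: int, b: int, bits: int = 8) -> int:
    """Signed absolute difference, returned as an unsigned bit pattern."""
    mask = (1 << bits) - 1

    def signed(v: int) -> int:
        v &= mask
        return v - (1 << bits) if v >> (bits - 1) else v

    sa, sb = signed(a), signed(b)
    return (max(sa, sb) - min(sa, sb)) & mask

print(abdu(3, 250), abds(3, 250))  # 247 9  (250 is -6 when read as signed)
```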
This implements part of Fortran 2023 new feature US 03, extracting tokens from a string: the SPLIT intrinsic, described in section 16.9.196 SPLIT (STRING, SET, POS [, BACK]) of the Fortran 2023 Standard. It's part of Flang issue [#178044](#178044). Note that I work with @kwyatt-ext on this issue; he implemented the other part, TOKENIZE. A test will be added to [llvm-test-suite](https://github.com/llvm/llvm-test-suite) after this PR is merged.
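A sketch of how POS is updated, based on my reading of the standard's description and using 1-based positions as in Fortran. Without BACK, POS becomes the position of the leftmost delimiter after POS, or LEN(STRING) + 1 if there is none. With BACK, it becomes the position of the rightmost delimiter before POS, or 0 if there is none. The Python below is illustrative only: it does not validate the allowed POS range and is not the flang runtime implementation.

```python
def split_pos(string: str, delims: str, pos: int, back: bool = False) -> int:
    """Sketch of the POS update performed by Fortran 2023 SPLIT (1-based)."""
    n = len(string)
    if not back:
        # Leftmost delimiter with position > pos, else LEN(STRING) + 1.
        for i in range(pos + 1, n + 1):
            if string[i - 1] in delims:
                return i
        return n + 1
    # Rightmost delimiter with position < pos, else 0.
    for i in range(pos - 1, 0, -1):
        if string[i - 1] in delims:
            return i
    return 0

# Typical tokenizing loop: each token lies strictly between two delimiters.
s, delims = "a,b;c", ",;"
pos, tokens = 0, []
while pos <= len(s):
    nxt = split_pos(s, delims, pos)
    tokens.append(s[pos:nxt - 1])   # 1-based STRING(pos+1:nxt-1)
    pos = nxt
print(tokens)  # ['a', 'b', 'c']
```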
MSVC apparently also warns about deprecation at the implementation of deprecated functions... Pull Request: #187702
When converting from fir.alloca to memref.alloca, also copy the acc variable name attribute if it exists
…f/maximumf (#187647) The reduction recipe init region was producing 0.0 instead of the correct identity value (largest representable float for min, smallest for max) when the reduction operator was AccMinnumf, AccMinimumf, AccMaxnumf, or AccMaximumf. Only AccMin and AccMax were handled, causing the new operator variants to fall through to the else branch which returns 0. This caused min reductions to always produce 0.0 since min(x, 0.0) = 0.0 for all positive x. Replace the duplicated identity value logic with arith::getIdentityValue, using a mapping from acc::ReductionOperator to arith::AtomicRMWKind. Use minimumf/maximumf (which respect useOnlyFiniteValue) instead of minnumf/maxnumf (whose MLIR identity is NaN) to get correct finite identity values. This also fixes a pre-existing bug where the max reduction identity for floats used getSmallest (smallest subnormal, -1.4e-45) instead of getLargest with negative (-3.4e+38).
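The effect of the wrong identity values is easy to reproduce outside MLIR. In the Python sketch below, the float32 limits are obtained via `struct`, and the reduction is a plain fold rather than the OpenACC lowering. A 0.0 init makes a min reduction over positive values return 0.0, and a negated smallest-subnormal init makes a max reduction over negative values return that tiny subnormal instead of the true maximum.

```python
import struct

# Largest finite float32 and smallest positive float32 subnormal.
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
FLT_TRUE_MIN = struct.unpack("<f", struct.pack("<I", 0x00000001))[0]

def reduce_min(values, init):
    acc = init
    for v in values:
        acc = min(acc, v)
    return acc

def reduce_max(values, init):
    acc = init
    for v in values:
        acc = max(acc, v)
    return acc

positives = [3.5, 1.25, 8.0]
negatives = [-3.5, -1.25, -8.0]

print(reduce_min(positives, 0.0))            # 0.0 (wrong identity)
print(reduce_min(positives, FLT_MAX))        # 1.25
print(reduce_max(negatives, -FLT_TRUE_MIN))  # tiny negative subnormal, not -1.25
print(reduce_max(negatives, -FLT_MAX))       # -1.25
```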
Document the compiler invocation in the compiler output, to aid subsequent regeneration.
…date (#182155) This changes the inputs to `update`: they are now the data stack that was the result of `@init`. This makes `update` more predictable, as its inputs are the same on the first call and on the Nth call.
When acc.par_width was introduced in #184864, there was a discussion on whether to use index or create a new type for the output of the operation. It was decided to create a new type, but this means that launch arguments cannot be used directly in the region (e.g. as loop bounds) without a conversion from the new type to index. To avoid these casts (and the introduction of an actual operation to perform them), simply restore acc.par_width to produce the index type. This allows its result to be used directly in acc.compute_region.
Fixes breakage reported here: #187352 (comment)
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)