Rollup of 7 pull requests #152632
This moves all LLVM intrinsic handling out of the regular call path for cg_gcc and makes it easier to hook into this code for future cg_llvm changes.
…acro_transparency`
It's described as a "backwards compatibility hack to keep the diff small". Removing it requires only a modest amount of churn, and the resulting code is clearer without the invisible derefs.
…ics-generation Simplify intrinsics generation
Regenerate intrinsics
Move LTO to OngoingCodegen::join
This will make it easier to move all this code to link_binary in the future.
Follow-up to rust-lang#147810.
Part of rust-lang/compiler-team#908.
This is the conceptual opposite of the rust-cold calling convention and
is particularly useful in combination with the new `explicit_tail_calls`
feature.
For relatively tight loops implemented with tail calling (`become`), each
function using the regular calling convention is still responsible for
restoring the initial values of the preserved registers. So it is not
unusual to end up with a situation where each step in the tail call loop
is spilling and reloading registers, along the lines of:
    foo:
        push r12
        ; do things
        pop r12
        jmp next_step
This adds up quickly, especially when most of the clobberable registers
are already used to pass arguments or for other purposes.
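As a rough illustration of the loop shape described above, here is a minimal sketch in stable Rust. Stable Rust has neither `become` nor the new ABI, so the sketch swaps in a trampoline: each step returns the next step to a driver loop instead of tail calling it. The names `Vm`, `Op`, `Next`, `dispatch`, `step_add`, `step_mul`, and `run` are hypothetical and exist only for this example.

```rust
/// One instruction of a toy accumulator machine (hypothetical, for illustration).
#[derive(Clone, Copy)]
enum Op {
    Add(i64),
    Mul(i64),
    Halt,
}

/// State threaded through every step of the loop.
struct Vm {
    code: Vec<Op>,
    pc: usize,
    acc: i64,
}

/// What a step hands back to the driver: the next step, or the final result.
enum Next {
    Step(fn(&mut Vm) -> Next),
    Done(i64),
}

/// Pick the step for the current instruction.
/// Panics if the program runs off the end without a `Halt`.
fn dispatch(vm: &mut Vm) -> Next {
    match vm.code[vm.pc] {
        Op::Add(_) => Next::Step(step_add),
        Op::Mul(_) => Next::Step(step_mul),
        Op::Halt => Next::Done(vm.acc),
    }
}

// In the design the commit message describes, each step would end with a
// tail call (`become`) to the next step. With the regular calling
// convention every such step must still save and restore callee-saved
// registers; that is the per-step cost the new ABI is meant to avoid.
fn step_add(vm: &mut Vm) -> Next {
    if let Op::Add(n) = vm.code[vm.pc] {
        vm.acc += n;
    }
    vm.pc += 1;
    dispatch(vm)
}

fn step_mul(vm: &mut Vm) -> Next {
    if let Op::Mul(n) = vm.code[vm.pc] {
        vm.acc *= n;
    }
    vm.pc += 1;
    dispatch(vm)
}

/// Trampoline driver standing in for the chain of tail calls.
fn run(code: &[Op]) -> i64 {
    let mut vm = Vm { code: code.to_vec(), pc: 0, acc: 0 };
    let mut next = dispatch(&mut vm);
    loop {
        match next {
            Next::Step(step) => next = step(&mut vm),
            Next::Done(value) => return value,
        }
    }
}
```

Calling `run(&[Op::Add(2), Op::Mul(5), Op::Add(1), Op::Halt])` walks three steps and then halts. The trampoline only mirrors the control flow; it says nothing about the register behavior of the real ABI.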
I was thinking of making the name of this ABI a little less LLVM-derived
and more like a conceptual inverse of `rust-cold`, but could not come up
with a great name (`rust-cold` is itself not a great name: cold in what
context? from which perspective? is it supposed to mean that the
function is rarely called?)
Fix segfault related to __builtin_unreachable with inline asm
…bilee
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
%v0 = insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> %v0, <N x elem> poison, <N x i32> zeroinitializer
```
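To make the two-instruction sequence concrete, the following sketch models its semantics in plain stable Rust with arrays. It is not the intrinsic and does not reflect how any backend lowers it. `Lane`, `insert_lane0`, `shuffle`, and `splat_model` are hypothetical names, `poison` is modeled as `None`, and the shuffle is simplified to a single input vector because the second operand in the sequence above is all poison.

```rust
/// A lane is either a defined value or LLVM-style poison (modeled as `None`).
type Lane = Option<u32>;

/// Models `insertelement <N x elem> poison, elem %x, i32 0`:
/// start from an all-poison vector and define lane 0 only. Requires N > 0.
fn insert_lane0<const N: usize>(x: u32) -> [Lane; N] {
    let mut v = [None; N];
    v[0] = Some(x);
    v
}

/// Models a single-input `shufflevector`: output lane `i` reads input
/// lane `mask[i]`.
fn shuffle<const N: usize>(v0: [Lane; N], mask: [usize; N]) -> [Lane; N] {
    mask.map(|i| v0[i])
}

/// The full splat: an all-zero mask (`zeroinitializer`) makes every output
/// lane read lane 0, so no poison lane is ever observed.
fn splat_model<const N: usize>(x: u32) -> [u32; N] {
    let v0 = insert_lane0::<N>(x);
    let out = shuffle(v0, [0; N]);
    out.map(|lane| lane.expect("every lane reads the defined lane 0"))
}
```

For example, `splat_model::<8>(7)` yields eight lanes that all hold `7`, while the intermediate `insert_lane0::<4>(3)` still has three poison lanes.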
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);
    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- rust-lang#60637
- rust-lang#137407
- rust-lang#122623
- rust-lang#97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with SIMD vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic; it is not actually used anywhere yet.
…ochenkov
abi: add a rust-preserve-none calling convention
This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature.
For relatively tight loops implemented with tail calling (`become`), each function using the regular calling convention is still responsible for restoring the initial values of the preserved registers. So it is not unusual to end up with a situation where each step in the tail call loop is spilling and reloading registers, along the lines of:
    foo:
        push r12
        ; do things
        pop r12
        jmp next_step
This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or for other purposes.
I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come up with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?)
@bors r+ rollup=never p=5
…uwer
Rollup of 7 pull requests
Successful merges:
- #152622 (Update GCC subtree)
- #145024 (Optimize indexing slices and strs with inclusive ranges)
- #151365 (UnsafePinned: implement opsem effects of UnsafeUnpin)
- #152381 (Do not require `'static` for obtaining reflection information.)
- #143575 (Remove named lifetimes in some `PartialOrd` & `PartialEq` `impl`s)
- #152404 (tests: adapt align-offset.rs for InstCombine improvements in LLVM 23)
- #152582 (rustc_query_impl: Use `ControlFlow` in `visit_waiters` instead of nested options)
💔 Test for 60f234f failed: CI. Failed job:
@bors retry
📌 Perf builds for each rolled up PR:
previous master: a33907a7a5
In the case of a perf regression, run the following command for each PR you suspect might be the cause:
What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing a33907a (parent) -> 7bee525 (this PR)
Test differences
713 test diffs (Stage 1, Stage 2)
Additionally, 703 doctest diffs were found. These are ignored, as they are noisy.
Test dashboard
Run

    cargo run --manifest-path src/ci/citool/Cargo.toml -- \
        test-dashboard 7bee525095c0872e87c038c412c781b9bbb3f5dc --output-dir test-dashboard

Job duration changes
How to interpret the job duration changes? Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (7bee525): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Our benchmarks found a performance regression caused by this PR.
Next Steps:
@rustbot label: +perf-regression
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary -0.7%, secondary -6.4%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 7.8%, secondary 2.6%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary 0.0%, secondary 0.2%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 483.617s -> 481.339s (-0.47%)
…jhpratt
Add information to spurious `oneshot::send_before_recv_timeout` test
This test regularly fails spuriously in CI, for example in rust-lang#152632 (comment).
We could just remove the assertion, but I'd like to understand why it fails, so I'm adding more information to the assert.
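The general idea of attaching diagnostic context to a timing assertion can be sketched as follows. This is not the actual test: it uses the stable `std::sync::mpsc` channel as a stand-in for the oneshot channel, and the function name `send_before_recv_timeout_model`, the one-second timeout, and the payload value are all assumptions made for illustration.

```rust
use std::sync::mpsc;
use std::time::{Duration, Instant};

/// Sends a value first, then receives with a timeout, and reports the
/// measured wait in every assertion message so a spurious CI failure
/// shows how long the receive actually took.
fn send_before_recv_timeout_model() -> Duration {
    let (tx, rx) = mpsc::channel::<i32>();
    tx.send(99).expect("receiver is still alive");

    let timeout = Duration::from_secs(1);
    let start = Instant::now();
    let received = rx.recv_timeout(timeout);
    let elapsed = start.elapsed();

    // The value was sent before the receive started, so it should arrive
    // without waiting out the timeout.
    assert_eq!(received, Ok(99), "unexpected result after {elapsed:?}");
    assert!(
        elapsed < timeout,
        "recv_timeout took {elapsed:?}, expected less than {timeout:?}"
    );
    elapsed
}
```

A failing run of this sketch would print the elapsed time alongside the expectation, which is the kind of extra information the commit message asks for.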
Successful merges:
- #152622 (Update GCC subtree)
- #145024 (Optimize indexing slices and strs with inclusive ranges)
- #151365 (UnsafePinned: implement opsem effects of UnsafeUnpin)
- #152381 (Do not require `'static` for obtaining reflection information.)
- #143575 (Remove named lifetimes in some `PartialOrd` & `PartialEq` `impl`s)
- #152404 (tests: adapt align-offset.rs for InstCombine improvements in LLVM 23)
- #152582 (rustc_query_impl: Use `ControlFlow` in `visit_waiters` instead of nested options)

r? @ghost