-
Notifications
You must be signed in to change notification settings - Fork 13.7k
Make the allocator shim participate in LTO again #146232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Make the allocator shim participate in LTO again
4ec9a9b
to
d0e65a9
Compare
Forgot to revert the changes in exported_symbols_for_lto. This shouldn't affect the perf run other than possibly showing less of a performance improvement than it should give in the end. |
This comment has been minimized.
This comment has been minimized.
r? @fee1-dead rustbot has assigned @fee1-dead. Use |
Some changes occurred in compiler/rustc_codegen_ssa |
This test reproduces some from of these other two issues ( //@ compile-flags: --crate-type cdylib -C lto
use std::alloc::{GlobalAlloc, Layout};
struct MyAllocator;
unsafe impl GlobalAlloc for MyAllocator {
unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
todo!()
}
unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
}
}
#[global_allocator]
static GLOBAL: MyAllocator = MyAllocator;
You've since done this, IIUC. |
Thanks! Had to modify it slightly to work with compiletest.
Correct |
Otherwise this looks good to me and fixes the regressions, so that's great, thanks! I'm not sure we care about the perf results, but they should be available in 3-4hours. You can r=me at your preference. |
e10f5b6
to
e072d7d
Compare
Finished benchmarking commit (5ab6398): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 1.8%, secondary -2.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -13.2%, secondary -8.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary 53.0%, secondary 59.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 467.829s -> 466.151s (-0.36%) |
It improves things even more than it previously regressed. $ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +nightly-2025-09-01 - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 1772392 5 sep 20:53 rust_out
-rw-rw-r-- 1 bjorn bjorn 2609776 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6979364 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6979324 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn 4512 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 3211728 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.o
$ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +nightly - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 1766032 5 sep 20:54 rust_out
-rw-rw-r-- 1 bjorn bjorn 2612612 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6982788 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6982752 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn 4512 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 3188680 5 sep 20:54 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.o
-rw-rw-r-- 1 bjorn bjorn 3484 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-crate.allocator.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 3224 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-crate.allocator.rcgu.o
$ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +5ab63980021f7c1ae280eba3261d66240d594007 - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 2264800 5 sep 20:55 rust_out
-rw-rw-r-- 1 bjorn bjorn 4512 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 2619640 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6982380 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6982344 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn 3701008 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.o I suspect what happened is that I didn't restore the check in fat LTO that ensures the allocator module is not used as base to merge all other modules into. The allocator module is not configured to be optimized, so we probably skipped all optimizations after doing the module merging pass of fat LTO. I've added a new commit to fix this. @bors try @rust-timer queue |
☔ The latest upstream changes (presumably #146255) made this pull request unmergeable. Please resolve the merge conflicts. |
9623f42
to
3851246
Compare
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
Not sure how to easily add a test either. @bors r=lqd |
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
3851246
to
2cf94b9
Compare
Added a missing @bors r=lqd |
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 6d5caf3 (parent) -> bea625f (this PR) Test differencesShow 5 test diffsStage 1
Stage 2
Additionally, 2 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard bea625f3275e3c897dc965ed97a1d19ef7831f01 --output-dir test-dashboard And then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
Finished benchmarking commit (bea625f): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 1.2%, secondary -3.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.2%, secondary -2.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.8%, secondary -1.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 469.247s -> 468.001s (-0.27%) |
This is likely the cause of the perf regression in #145955. It also caused some functional regressions.
Fixes #146235
Fixes #146239