bjorn3
Member

@bjorn3 bjorn3 commented Sep 5, 2025

This is likely the cause of the perf regression in #145955. It also caused some functional regressions.

Fixes #146235
Fixes #146239

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 5, 2025
@bjorn3
Member Author

bjorn3 commented Sep 5, 2025

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Sep 5, 2025
Make the allocator shim participate in LTO again
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 5, 2025
@bjorn3 bjorn3 force-pushed the lto_allocator_shim branch from 4ec9a9b to d0e65a9 Compare September 5, 2025 10:04
@bjorn3
Member Author

bjorn3 commented Sep 5, 2025

Forgot to revert the changes in exported_symbols_for_lto. This shouldn't affect the perf run other than possibly showing less of a performance improvement than it should give in the end.

@bjorn3 bjorn3 mentioned this pull request Sep 5, 2025
@rust-bors

rust-bors bot commented Sep 5, 2025

☀️ Try build successful (CI)
Build commit: 5ab6398 (5ab63980021f7c1ae280eba3261d66240d594007, parent: ad85bc524b1ad696e42061ad8338d382dffbdbe5)

@rustbot
Collaborator

rustbot commented Sep 5, 2025

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 5, 2025
@rustbot
Collaborator

rustbot commented Sep 5, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

@lqd
Member

lqd commented Sep 5, 2025

This test reproduces some form of these other two issues (without this PR it fails with rust-lld: error: undefined hidden symbol: __rustc::__rg_oom). Can you add it to the PR?

//@ compile-flags: --crate-type cdylib -C lto 

use std::alloc::{GlobalAlloc, Layout};

struct MyAllocator;

unsafe impl GlobalAlloc for MyAllocator {
    unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
        todo!()
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
    }
}

#[global_allocator]
static GLOBAL: MyAllocator = MyAllocator;

> Forgot to revert the changes in exported_symbols_for_lto

You've since done this, IIUC.
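
For context on what the test above exercises: a `#[global_allocator]` static is where the compiler-generated allocator shim routes heap allocations. The sketch below is a runnable variant of the same shape, assuming an allocator that simply forwards to `std::alloc::System` and counts calls. `CountingAllocator`, `ALLOCATIONS` and `allocations_for_one_box` are names made up for illustration; they are not part of the PR or its test.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator for illustration: forwards to `System` and counts calls.
struct CountingAllocator;

/// Number of `alloc` calls observed so far.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Delegate the real work to the platform allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the program is routed to this static.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Boxes a value and returns how many `alloc` calls were counted meanwhile.
fn allocations_for_one_box() -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    // `black_box` keeps the optimizer from eliding the allocation.
    let boxed = black_box(Box::new(42u64));
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    drop(boxed);
    after - before
}
```

Calling `allocations_for_one_box()` from `main` should report at least one allocation; the count can be higher if other threads allocate at the same time.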

@bjorn3
Member Author

bjorn3 commented Sep 5, 2025

Thanks! Had to modify it slightly to work with compiletest.

> > Forgot to revert the changes in exported_symbols_for_lto
>
> You've since done this, IIUC.

Correct

@lqd
Member

lqd commented Sep 5, 2025

Otherwise this looks good to me and fixes the regressions, so that's great, thanks!

I'm not sure we care about the perf results, but they should be available in 3-4 hours. You can r=me at your preference.

@bjorn3 bjorn3 force-pushed the lto_allocator_shim branch from e10f5b6 to e072d7d Compare September 5, 2025 15:02
@rust-timer
Collaborator

Finished benchmarking commit (5ab6398): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.2% | [0.1%, 0.3%] | 2 |
| Improvements ✅ (primary) | -1.3% | [-27.1%, -0.3%] | 229 |
| Improvements ✅ (secondary) | -1.1% | [-46.7%, -0.0%] | 264 |
| All ❌✅ (primary) | -1.3% | [-27.1%, -0.3%] | 229 |

Max RSS (memory usage)

Results (primary 1.8%, secondary -2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 7.4% | [7.4%, 7.4%] | 1 |
| Regressions ❌ (secondary) | 2.1% | [1.4%, 2.7%] | 5 |
| Improvements ✅ (primary) | -0.9% | [-1.4%, -0.5%] | 2 |
| Improvements ✅ (secondary) | -4.5% | [-6.5%, -2.5%] | 9 |
| All ❌✅ (primary) | 1.8% | [-1.4%, 7.4%] | 3 |

Cycles

Results (primary -13.2%, secondary -8.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.9% | [2.9%, 2.9%] | 1 |
| Improvements ✅ (primary) | -13.2% | [-24.6%, -2.4%] | 6 |
| Improvements ✅ (secondary) | -10.8% | [-44.4%, -1.6%] | 6 |
| All ❌✅ (primary) | -13.2% | [-24.6%, -2.4%] | 6 |

Binary size

Results (primary 53.0%, secondary 59.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 70.9% | [70.9%, 71.0%] | 3 |
| Regressions ❌ (secondary) | 118.8% | [118.8%, 118.8%] | 1 |
| Improvements ✅ (primary) | -0.9% | [-0.9%, -0.9%] | 1 |
| Improvements ✅ (secondary) | -0.4% | [-0.4%, -0.4%] | 1 |
| All ❌✅ (primary) | 53.0% | [-0.9%, 71.0%] | 4 |

Bootstrap: 467.829s -> 466.151s (-0.36%)
Artifact size: 390.58 MiB -> 387.87 MiB (-0.69%)
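
The percentages in parentheses on the Bootstrap and Artifact size lines appear to be the relative change from the first value to the second. A minimal sketch of that arithmetic follows; `percent_change` is a hypothetical helper name, not something from the perf tooling.

```rust
/// Relative change from `before` to `after`, expressed in percent.
/// Negative results mean the value went down.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

Applied to the figures above, `percent_change(467.829, 466.151)` and `percent_change(390.58, 387.87)` round to the reported -0.36% and -0.69%.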

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 5, 2025
@bjorn3
Member Author

bjorn3 commented Sep 5, 2025

It improves things even more than it previously regressed.

$ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +nightly-2025-09-01  - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 1772392  5 sep 20:53 rust_out
-rw-rw-r-- 1 bjorn bjorn 2609776  5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6979364  5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6979324  5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn    4512  5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 3211728  5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.o
$ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +nightly  - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 1766032  5 sep 20:54 rust_out
-rw-rw-r-- 1 bjorn bjorn 2612612  5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6982788  5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6982752  5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn    4512  5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 3188680  5 sep 20:54 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.o
-rw-rw-r-- 1 bjorn bjorn    3484  5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-crate.allocator.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn    3224  5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-crate.allocator.rcgu.o
$ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +5ab63980021f7c1ae280eba3261d66240d594007 - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 2264800  5 sep 20:55 rust_out
-rw-rw-r-- 1 bjorn bjorn    4512  5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 2619640  5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6982380  5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6982344  5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn 3701008  5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.o

I suspect what happened is that I didn't restore the check in fat LTO that ensures the allocator module is not used as the base module that all other modules are merged into. The allocator module is not configured to be optimized, so we probably skipped all optimizations after the module merging pass of fat LTO. I've added a new commit to fix this.

@bors try @rust-timer queue
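
The suspected failure mode can be illustrated with a small model of how a base module might be chosen for fat LTO. This is a sketch under stated assumptions, not rustc's actual code: `ModuleKind`, `Module`, `cost` and `pick_base` are hypothetical stand-ins. The only rule it encodes is the one described above: the allocator module must never be picked as the module everything else is merged into.

```rust
/// Hypothetical stand-in for the kind of a codegen module; not rustc's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModuleKind {
    Regular,
    Allocator,
}

/// Hypothetical stand-in for a module taking part in fat LTO.
#[derive(Debug)]
struct Module {
    name: &'static str,
    kind: ModuleKind,
    /// Rough size estimate in an arbitrary unit (assumption for this sketch).
    cost: u64,
}

/// Picks the module that all others are merged into: the costliest regular
/// module. The allocator module is filtered out first, so it can never become
/// the base, no matter how large its cost is.
fn pick_base(modules: &[Module]) -> Option<&Module> {
    modules
        .iter()
        .filter(|m| m.kind == ModuleKind::Regular)
        .max_by_key(|m| m.cost)
}
```

Without the `filter` step, a model like this could return the allocator module, and the merged result would inherit that module's unoptimized configuration.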

@bors
Collaborator

bors commented Sep 6, 2025

☔ The latest upstream changes (presumably #146255) made this pull request unmergeable. Please resolve the merge conflicts.

@bjorn3 bjorn3 force-pushed the lto_allocator_shim branch from 9623f42 to 3851246 Compare September 6, 2025 08:36
@rustbot
Collaborator

rustbot commented Sep 6, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@bjorn3
Member Author

bjorn3 commented Sep 6, 2025

Not sure how to easily add a test either.

@bors r=lqd

@bors
Collaborator

bors commented Sep 6, 2025

📌 Commit 3851246 has been approved by lqd

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 6, 2025
@bors
Collaborator

bors commented Sep 6, 2025

⌛ Testing commit 3851246 with merge 831db11...

bors added a commit that referenced this pull request Sep 6, 2025
Make the allocator shim participate in LTO again

This is likely the cause of the perf regression in #145955. It also caused some functional regressions.

Fixes #146235
Fixes #146239

@bors
Collaborator

bors commented Sep 6, 2025

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 6, 2025
@bjorn3 bjorn3 force-pushed the lto_allocator_shim branch from 3851246 to 2cf94b9 Compare September 6, 2025 13:31
@bjorn3
Member Author

bjorn3 commented Sep 6, 2025

Added a missing //@ needs-crate-type: cdylib in the new test.

@bors r=lqd

@bors
Collaborator

bors commented Sep 6, 2025

📌 Commit 2cf94b9 has been approved by lqd

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 6, 2025
@bors
Collaborator

bors commented Sep 6, 2025

⌛ Testing commit 2cf94b9 with merge bea625f...

@bors
Collaborator

bors commented Sep 6, 2025

☀️ Test successful - checks-actions
Approved by: lqd
Pushing bea625f to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 6, 2025
@bors bors merged commit bea625f into rust-lang:master Sep 6, 2025
11 checks passed
@rustbot rustbot added this to the 1.91.0 milestone Sep 6, 2025
@bjorn3 bjorn3 deleted the lto_allocator_shim branch September 6, 2025 18:36
Contributor

github-actions bot commented Sep 6, 2025

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 6d5caf3 (parent) -> bea625f (this PR)

Test differences

Stage 1

  • [ui] tests/ui/lto/lto-global-allocator.rs: [missing] -> pass (J0)

Stage 2

  • [ui] tests/ui/lto/lto-global-allocator.rs: [missing] -> pass (J1)
  • [ui] tests/ui/lto/lto-global-allocator.rs: [missing] -> ignore (skipping test as target does not support all of the crate types ["cdylib"]) (J2)

Additionally, 2 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard bea625f3275e3c897dc965ed97a1d19ef7831f01 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. x86_64-gnu-llvm-19-3: 6327.4s -> 7257.5s (14.7%)
  2. dist-x86_64-apple: 8410.6s -> 7204.6s (-14.3%)
  3. dist-apple-various: 4219.0s -> 3687.9s (-12.6%)
  4. dist-aarch64-apple: 7069.4s -> 6361.9s (-10.0%)
  5. aarch64-apple: 5645.9s -> 5088.4s (-9.9%)
  6. dist-aarch64-msvc: 5196.0s -> 5651.3s (8.8%)
  7. dist-i686-mingw: 9459.6s -> 8857.2s (-6.4%)
  8. dist-sparcv9-solaris: 4946.9s -> 5257.6s (6.3%)
  9. dist-x86_64-netbsd: 4951.7s -> 4642.1s (-6.3%)
  10. aarch64-gnu: 6523.1s -> 6121.3s (-6.2%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (bea625f): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.2%] | 2 |
| Improvements ✅ (primary) | -0.9% | [-1.9%, -0.3%] | 221 |
| Improvements ✅ (secondary) | -0.9% | [-2.2%, -0.0%] | 260 |
| All ❌✅ (primary) | -0.9% | [-1.9%, -0.3%] | 221 |

Max RSS (memory usage)

Results (primary 1.2%, secondary -3.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.5% | [4.5%, 4.5%] | 1 |
| Regressions ❌ (secondary) | 2.1% | [1.9%, 2.3%] | 2 |
| Improvements ✅ (primary) | -2.1% | [-2.1%, -2.1%] | 1 |
| Improvements ✅ (secondary) | -5.0% | [-7.1%, -2.2%] | 9 |
| All ❌✅ (primary) | 1.2% | [-2.1%, 4.5%] | 2 |

Cycles

Results (primary -2.2%, secondary -2.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.6% | [4.6%, 4.6%] | 1 |
| Improvements ✅ (primary) | -2.2% | [-3.0%, -1.9%] | 5 |
| Improvements ✅ (secondary) | -3.7% | [-9.5%, -2.0%] | 9 |
| All ❌✅ (primary) | -2.2% | [-3.0%, -1.9%] | 5 |

Binary size

Results (primary -0.8%, secondary -1.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.8% | [-0.9%, -0.8%] | 4 |
| Improvements ✅ (secondary) | -1.7% | [-1.7%, -1.7%] | 1 |
| All ❌✅ (primary) | -0.8% | [-0.9%, -0.8%] | 4 |

Bootstrap: 469.247s -> 468.001s (-0.27%)
Artifact size: 390.35 MiB -> 387.44 MiB (-0.74%)

Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

Hidden symbol in nightly.
rustc emits an unexpected _rdl symbols for WASM with lto=true
8 participants