-
Notifications
You must be signed in to change notification settings - Fork 13.7k
Ignore intrinsic calls in cross-crate-inlining cost model #145910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Ignore intrinsic calls in cross-crate-inlining cost model
This comment has been minimized.
This comment has been minimized.
if let Some((fn_def_id, _)) = func.const_fn_def() { | ||
if self.tcx.intrinsic(fn_def_id).is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this would benefit from combining into one if
using either let-chaining or is_some_and
.
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e8d1f9d): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.1%, secondary -3.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 0.8%, secondary 0.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary 0.1%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 466.645s -> 466.461s (-0.04%) |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Ignore intrinsic calls in cross-crate-inlining cost model
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@bors try cancel |
Try build cancelled. Cancelled workflows: |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Ignore intrinsic calls in cross-crate-inlining cost model
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (0f272e5): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.0%, secondary -2.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.5%, secondary 0.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary 0.1%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 468.329s -> 467.725s (-0.13%) |
The perf impact that's concerning to me is the regression in eza incr-patched, where it looks like we are dirtying 5 additional CGUs. @Kobzol this is the kind of perf issue I saying in Delft that I really want tooling to diagnose. I have locally hacked a compiler to see what functions are affected by this change and I can't find how they are related to the diffs in eza. I could brush off the regressions but I think there is a legitimate undesirable regression that I can't figure out how to understand. |
Adding it to my TODO list. Sadly I'm still stuck on fixing bootstrap after the stage0 redesign, and didn't have time yet to go back to incremental profiling. |
2fbad6d
to
5dc2b2e
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Ignore intrinsic calls in cross-crate-inlining cost model
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c3f0a64): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -2.0%, secondary -0.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -6.7%, secondary 12.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary 1.3%, secondary 0.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 465.874s -> 466.593s (0.15%) |
This is the useful compare link: https://perf.rust-lang.org/compare.html?start=7cb1a81145a739c4fd858abe3c624ce8e6e5f9cd&end=c3f0a64dbf9fba4722dacf8e39d2fe00069c995e&stat=instructions%3Au Because it disables CGU merging in both commits, so effects that cause changes in the sysroot to perturb partitioning downstream are excluded. |
5dc2b2e
to
53bb74b
Compare
53bb74b
to
ab91a63
Compare
@Kobzol I figured out this case, see the updated PR description |
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
I noticed in a side project that a function which just compares to
[u64; 2]
for equality is not cross-crate-inlinable. That was surprising to me because I didn't think that code contained a function call, but of course our array comparisons are lowered to an intrinsic. Intrinsic calls don't make a function no longer a leaf, so it makes sense to add this as an exception to the "only leaves" cross-crate-inline heuristic.This is the useful compare link: https://perf.rust-lang.org/compare.html?start=7cb1a81145a739c4fd858abe3c624ce8e6e5f9cd&end=c3f0a64dbf9fba4722dacf8e39d2fe00069c995e&stat=instructions%3Au because it disables CGU merging in both commits, so effects that cause changes in the sysroot to perturb partitioning downstream are excluded. Perturbations to what is and isn't cross-crate-inlinable in the sysroot has chaotic effects on what items are in which CGUs after merging. It looks like before this PR by sheer luck some of the CGUs dirtied by the patch in eza incr-unchanged happened to be merged together, and with this PR they are not.
The perf runs on this PR point to a nice runtime performance improvement.