YangKeao

What problem does this PR solve?

Issue Number: close #61223

Problem Summary:

The CPU quota is always reported as 0 if TiDB fails to read the cgroup limit. The same issue exists for the memory quota.

What changed and how does it work?

Change the error-handling logic in cgmon. The CPU quota now defaults to the number of CPUs, and the memory quota defaults to the memory read from /proc/meminfo. These defaults are then updated according to the cgroup settings when those can be read.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.
  1. Deploy the master branch of TiDB on Rocky Linux 9 and open Grafana. The CPU quota is 0.

  2. Deploy this branch of TiDB on Rocky Linux 9 and open Grafana. The CPU quota is 16.

  3. Modify the TiUP config of TiDB: set the CPU quota to 800 and the memory quota to 8G. The displayed values are correct.

  4. Manually remove the memory controller from the related cgroup so that TiDB gets an error when it reads the memory quota. The memory quota then falls back to the memory of the whole machine (16G).
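
For readers reproducing the steps above, it may help to see how a cgroup v2 `cpu.max` value maps to a core count. The file holds `$MAX $PERIOD`, where `$MAX` is either a number or the literal `max` (no limit). The sketch below is a hypothetical helper, not the parser used by TiDB, and the sample content `800000 100000` is an assumption about what a quota of 800 would produce on the test machine.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax parses the content of a cgroup v2 cpu.max file.
// It returns the limit in cores and whether any limit is set.
// Hypothetical helper for illustration only.
func parseCPUMax(content string) (float64, bool, error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max content: %q", content)
	}
	// "max" means the cgroup imposes no CPU limit.
	if fields[0] == "max" {
		return 0, false, nil
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("invalid period in cpu.max: %q", content)
	}
	return quota / period, true, nil
}

func main() {
	// Assumed sample values; real files end with a newline as well.
	for _, sample := range []string{"800000 100000\n", "max 100000\n"} {
		cores, limited, err := parseCPUMax(sample)
		fmt.Printf("%q -> cores=%v limited=%v err=%v\n", sample, cores, limited, err)
	}
}
```

A caller would fall back to the CPU count of the machine both when the file cannot be read and when the parsed value reports no limit.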

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix the issue that the cpu quota is always 0 if `cpu.max` doesn't exist

@YangKeao

Ref #50468

ti-chi-bot bot added the release-note label May 26, 2025
 ctx context.Context
 cancel context.CancelFunc
 wg sync.WaitGroup
-cfgMaxProcs int

This variable is not used. I don't know why it was here before.

ti-chi-bot bot added the size/S label May 26, 2025
}

-if quota != lastMaxProcs {
+if quota != lastCPU {

Just a rename: the value is actually NumCPU, not GOMAXPROCS.

ti-chi-bot bot added the size/M label and removed the size/S label May 26, 2025
YangKeao requested review from hawkingrei and lance6716 May 26, 2025 09:13
YangKeao force-pushed the fix-61223 branch 2 times, most recently from e5fd8ca to 64d3cf6 May 26, 2025 09:46
ti-chi-bot bot added the size/L label and removed the size/M label May 26, 2025
codecov bot commented May 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.6258%. Comparing base (3ab8c7f) to head (348dad8).
Report is 7 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #61322        +/-   ##
================================================
+ Coverage   73.1942%   73.6258%   +0.4315%     
================================================
  Files          1726       1726                
  Lines        478576     479082       +506     
================================================
+ Hits         350290     352728      +2438     
+ Misses       106833     104949      -1884     
+ Partials      21453      21405        -48     
Flag          Coverage Δ
integration   42.5659% <100.0000%> (?)
unit          72.5047% <81.2500%> (+0.0292%) ⬆️
Components    Coverage Δ
dumpling      52.7804% <ø> (ø)
parser        ∅ <ø> (∅)
br            47.5567% <ø> (-0.0164%) ⬇️

@YangKeao

/retest

ti-chi-bot bot added the approved and needs-1-more-lgtm labels May 27, 2025
ti-chi-bot bot commented May 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Defined2014, hawkingrei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Defined2014,hawkingrei]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot bot added the lgtm label and removed the needs-1-more-lgtm label May 27, 2025
ti-chi-bot bot commented May 27, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-05-27 05:19:36.736676938 +0000 UTC m=+332707.108464387: ☑️ agreed by hawkingrei.
  • 2025-05-27 05:45:27.1218382 +0000 UTC m=+334257.493625658: ☑️ agreed by Defined2014.

@ti-chi-bot ti-chi-bot bot merged commit a761430 into pingcap:master May 27, 2025
24 checks passed
Labels
approved, lgtm, release-note, size/L
Development

Successfully merging this pull request may close these issues.

TiDB should be able to handle the case when cpu.max doesn't exist and the cgroup config doesn't have cpu controller