

JaySon-Huang
Contributor

@JaySon-Huang JaySon-Huang commented Jul 1, 2025

What problem does this PR solve?

Issue Number: close #10272

Problem Summary:

What is changed and how it works?

* Add metrics for
  * Number of keyspaces, Storage instances, Segment instances, and MemTable instances
  * MemTable bytes and MemTable allocated bytes
* System tables
  * Add `delta_cache_alloc_size` to `system.dt_segments`
  * Add `column_count` and `delta_cache_alloc_size` to `system.dt_tables`
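The PR description does not say how these counters are maintained. As an illustration only, here is a minimal C++ sketch assuming a MemTable that keeps one `std::vector` per column: an atomic instance gauge is adjusted in the constructor and destructor, "bytes" is taken as the used size, and "allocated bytes" as the reserved capacity. `MemTableSketch`, `bytes()`, `allocatedBytes()` and `instances()` are hypothetical names, not TiFlash APIs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a MemTable that keeps one buffer per column.
// Not TiFlash's real class: it only illustrates the metrics listed above.
class MemTableSketch
{
public:
    explicit MemTableSketch(size_t column_count)
        : columns(column_count)
    {
        // Gauge for "number of MemTable instances": +1 on construction.
        instance_count.fetch_add(1, std::memory_order_relaxed);
    }

    ~MemTableSketch()
    {
        // Sanity check: the gauge must never go negative.
        assert(instance_count.load(std::memory_order_relaxed) > 0);
        instance_count.fetch_sub(1, std::memory_order_relaxed);
    }

    MemTableSketch(const MemTableSketch &) = delete;
    MemTableSketch & operator=(const MemTableSketch &) = delete;

    // Append one row: every column receives one value.
    void appendRow(int64_t value)
    {
        for (auto & col : columns)
            col.push_back(value);
    }

    // "Bytes of MemTable" (assumed meaning): memory that holds row data.
    size_t bytes() const
    {
        size_t total = 0;
        for (const auto & col : columns)
            total += col.size() * sizeof(int64_t);
        return total;
    }

    // "Allocated bytes of MemTable" (assumed meaning): memory reserved by
    // the buffers, including capacity that is not filled yet.
    size_t allocatedBytes() const
    {
        size_t total = 0;
        for (const auto & col : columns)
            total += col.capacity() * sizeof(int64_t);
        return total;
    }

    static int64_t instances() { return instance_count.load(std::memory_order_relaxed); }

private:
    static inline std::atomic<int64_t> instance_count{0}; // requires C++17
    std::vector<std::vector<int64_t>> columns;
};
```

In this sketch every column carries its own spare capacity, so on a table with hundreds of columns the allocated size can stay noticeably above the used size even for a handful of rows; reporting both numbers is what makes that gap visible.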

New Grafana panels on TiFlash-Summary:

  • "Threads" - a copy of the "Threads" panel from TiFlash-Proxy-Details, used to check, for example, whether a specific thread is issuing heavy I/O
  • "Imbalanced read/write" - Check whether CPU usage, read throughput, or write throughput is imbalanced across TiFlash instances
  • "Memory trace" - A summary of in-memory objects (number of tables, segments, MemTable bytes, etc.), used to check whether there is an OOM risk
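How the "Imbalanced read/write" panel computes imbalance is not described here (it is presumably a Grafana query over per-instance series). One simple measure, shown below as a hedged C++ sketch, is the ratio of the busiest instance to the mean across instances; `imbalanceRatio`, `isImbalanced` and the 1.5 threshold are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: how far the busiest instance is above the mean.
// 1.0 means perfectly balanced; larger values mean more imbalance.
// Returns 1.0 for empty or all-zero input so idle clusters are not flagged.
double imbalanceRatio(const std::vector<double> & per_instance)
{
    if (per_instance.empty())
        return 1.0;
    const double sum = std::accumulate(per_instance.begin(), per_instance.end(), 0.0);
    if (sum <= 0.0)
        return 1.0;
    const double mean = sum / static_cast<double>(per_instance.size());
    const double peak = *std::max_element(per_instance.begin(), per_instance.end());
    return peak / mean;
}

// Flag a metric (CPU, read or write throughput) as imbalanced when the peak
// exceeds the mean by more than `threshold`. The default is an arbitrary choice.
bool isImbalanced(const std::vector<double> & per_instance, double threshold = 1.5)
{
    assert(threshold >= 1.0); // a ratio below 1.0 would flag every cluster
    return imbalanceRatio(per_instance) > threshold;
}
```

A max/mean ratio is easy to read on a dashboard, but it is only one option; a max/min ratio or the coefficient of variation would also work and react differently to a single idle instance.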

With the "Memory trace" panel, we can get more insight into issue #10253 (comment), in which the MemTable may take too much memory.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
* Deploy a cluster with 2 TiFlash instances
* Load CH-benCHmark data with 1500 warehouses
* Run the CH-benCHmark workload with 50 TP threads and 1 AP thread for 60 minutes:
  `tiup bench ch --host 10.2.12.81 -P 8020 --warehouses 1500 run -D chbenchmark -T 50 -t 1 --time 60m`
* Collect the metrics below

(Screenshots of the collected metrics)

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Enhance the observability of TiFlash OOM risk in wide-column scenarios

Signed-off-by: JaySon-Huang <[email protected]>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 1, 2025
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 4, 2025
@JaySon-Huang JaySon-Huang force-pushed the add_wide_table_metrics branch from c79ab59 to 908c195 Compare July 4, 2025 07:50
@JaySon-Huang JaySon-Huang force-pushed the add_wide_table_metrics branch from 73354ee to 7c212ca Compare July 7, 2025 04:41
@JaySon-Huang JaySon-Huang force-pushed the add_wide_table_metrics branch from 7c212ca to b2bfcf3 Compare July 7, 2025 04:45
@JaySon-Huang
Contributor Author

/test pull-integration-next-gen

@JaySon-Huang
Contributor Author

/test pull-integration

Contributor

ti-chi-bot bot commented Jul 7, 2025

@JaySon-Huang: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test pull-integration-next-gen
/test pull-integration-test
/test pull-unit-next-gen
/test pull-unit-test

Use /test all to run all jobs.

In response to this:

/test pull-integration


@JaySon-Huang
Contributor Author

/test pull-integration-test

@JaySon-Huang JaySon-Huang changed the title from "WIP" to "*: Enhance the o11y of TiFlash storage layer" Jul 7, 2025
@JaySon-Huang JaySon-Huang changed the title from "*: Enhance the o11y of TiFlash storage layer" to "metrics: Enhance the o11y of TiFlash storage layer" Jul 7, 2025
@JaySon-Huang
Contributor Author

/test pull-integration-test

@JaySon-Huang
Contributor Author

/test pull-integration-test

@JaySon-Huang JaySon-Huang force-pushed the add_wide_table_metrics branch from 185fad4 to 8de8ae1 Compare July 7, 2025 17:04
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 8, 2025
};
using SegmentsStats = std::vector<SegmentStats>;

struct StoreStats
Contributor Author


Newly added fields:

  • column_count
  • delta_cache_alloc_size

namespace DB::DM
{

struct SegmentStats
Contributor Author


Newly added fields:

  • delta_cache_alloc_size
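The review comments list the new fields but not how they are filled. Assuming the table-level `delta_cache_alloc_size` in `system.dt_tables` is the sum of the per-segment values in `system.dt_segments`, a simplified C++ sketch could look like the following. The `sketch` namespace, the reduced structs and `aggregate()` are hypothetical; the real `DB::DM::SegmentStats` and `StoreStats` carry many more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
// Reduced, hypothetical version of SegmentStats: only the new field is kept.
struct SegmentStats
{
    size_t delta_cache_alloc_size = 0; // bytes allocated for this segment's delta cache
};
using SegmentsStats = std::vector<SegmentStats>;

// Reduced, hypothetical version of StoreStats with the two new fields.
struct StoreStats
{
    size_t column_count = 0;           // number of columns in the table
    size_t delta_cache_alloc_size = 0; // sum over all segments of the table
};

// Assumption: the table-level value is the sum over the table's segments.
StoreStats aggregate(size_t column_count, const SegmentsStats & segments)
{
    StoreStats store;
    store.column_count = column_count;
    for (const auto & seg : segments)
        store.delta_cache_alloc_size += seg.delta_cache_alloc_size;
    return store;
}
} // namespace sketch
```

Reporting `column_count` next to the summed allocation makes it easy to spot tables where a high column count coincides with a large delta-cache footprint.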

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jul 8, 2025
Member

@CalvinNeo CalvinNeo left a comment


lgtm

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 8, 2025
Contributor

ti-chi-bot bot commented Jul 8, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-07-08 07:56:57.028637041 +0000 UTC: ☑️ agreed by Lloyd-Pottiger.
  • 2025-07-08 08:05:25.817386988 +0000 UTC: ☑️ agreed by CalvinNeo.

Contributor

ti-chi-bot bot commented Jul 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, JinheLin, Lloyd-Pottiger


Needs approval from an approver in each of these files:
  • OWNERS [CalvinNeo,JinheLin,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot merged commit 6344098 into pingcap:master Jul 8, 2025
7 checks passed
@JaySon-Huang JaySon-Huang deleted the add_wide_table_metrics branch July 8, 2025 08:21
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Jul 8, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 8, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #10288.
But this PR has conflicts, please resolve them!

@ti-chi-bot ti-chi-bot bot added needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. labels Jul 8, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 8, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #10289.
But this PR has conflicts, please resolve them!

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #10290.
But this PR has conflicts, please resolve them!

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 8, 2025
Labels
approved lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Enhance the o11y of TiFlash storage layer
5 participants