Runaway query control based on resource group #43691

## Motivation
Runaway queries are queries that consume more resources than users expect. They can be caused by improper SQL statements or suboptimal plans.
If not managed properly, runaway queries can degrade overall performance, so we need to manage them effectively: long-running operations should be identified and aborted.
Currently, we already have a deadline mechanism pushed down to the TiKV layer: by default, a single coprocessor request cannot execute in TiKV for more than 60s. However, a runaway query may not spend much time on any single coprocessor request, so this deadline mechanism cannot prevent runaway queries. At the same time, the deadline cannot be set too small; otherwise, normal requests would be aborted prematurely.

## How to identify runaway queries?
The resource manager can take action when a query's elapsed time exceeds a specified threshold. Here, elapsed time means the time the query is actually being processed; it excludes waiting time.
Differentiating runaway queries from queries that genuinely need a full table or index scan is hard, and there is no absolute rule. So we let users define the rule that identifies runaway queries and tailor it to their own needs. At present, the only criterion is execution time; more dimensions may be added later.
TiKV sends back scan details in coprocessor responses. If the total elapsed time of a query exceeds the threshold, the query (statement) is recognized as a runaway query.

## Task Breakdown

- [x] Extend resource group meta with runaway config 
  - [x] Update kvproto @Connor1996 https://github.com/pingcap/kvproto/pull/1114
  - [x] Extend create/alter resource group statement @CabinfeverB 
     - [x] parser part https://github.com/pingcap/tidb/pull/43843
     - [x] pd part
       - [x] https://github.com/tikv/pd/pull/6475
       - [x] https://github.com/tikv/pd/pull/6510
       - [x] https://github.com/tikv/pd/pull/6515
     - [x] Make alter patch style https://github.com/pingcap/tidb/pull/44322
     - [x] Extend admin table `information_schema.resource_groups` https://github.com/pingcap/tidb/pull/43877
- [x] Identify runaway in cop client and perform action @Connor1996 
  - [x] Introduce runaway checker https://github.com/pingcap/tidb/pull/44339
    - [x] Use const default resource group name https://github.com/pingcap/tidb/pull/44526
  - [x] Update kvproto to add `override_priority` https://github.com/pingcap/kvproto/pull/1114
  - [x] Override resource group priority on tikv side https://github.com/tikv/tikv/pull/14926
  - [x] Fix mock PD client https://github.com/tikv/client-go/pull/839
- [x] Introduce admin table `information_schema.runaway_queries` 
  - [x] Persist runaway records in KV with a flush mechanism https://github.com/pingcap/tidb/pull/44654 @CabinfeverB
  - [x] Clean history records https://github.com/pingcap/tidb/pull/44784 @CabinfeverB
- [x] Quarantine runaway queries @CabinfeverB 
  - [x] Identify subsequent matching or similar SQL and reject it with an error https://github.com/pingcap/tidb/pull/44474
  - [x] Provide admin table `information_schema.quarantined_watch` https://github.com/pingcap/tidb/pull/44654
- [x] Provide a way to mark runaway queries manually
  - [x] Add query watch statement https://github.com/pingcap/tidb/pull/45500
  - [x] Sync watch records among TiDB instances https://github.com/pingcap/tidb/pull/45465
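
Quarantine can be pictured as a watch list consulted before a statement executes. This sketch assumes "similar" means "differs only in literals, case, or whitespace". TiDB has its own SQL digest normalization; the regex-based `normalize` here is a crude hypothetical stand-in, as are `watchList` and `ErrQuarantined`. Entries are kept in memory only; persistence and syncing among TiDB instances are out of scope.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
	"sync"
)

// ErrQuarantined is returned for statements that match a watch entry.
var ErrQuarantined = errors.New("query is quarantined as runaway")

var (
	stringLit = regexp.MustCompile(`'(?:[^'\\]|\\.)*'`)
	numberLit = regexp.MustCompile(`\b\d+(?:\.\d+)?\b`)
	spaces    = regexp.MustCompile(`\s+`)
)

// normalize is a crude stand-in for a real SQL digest: it lowercases the
// text, replaces string and numeric literals with '?', and collapses
// whitespace, so "similar" statements map to the same key.
func normalize(sql string) string {
	s := strings.ToLower(strings.TrimSpace(sql))
	s = stringLit.ReplaceAllString(s, "?")
	s = numberLit.ReplaceAllString(s, "?")
	return spaces.ReplaceAllString(s, " ")
}

// watchList holds quarantined statements, keyed by exact or normalized text.
type watchList struct {
	mu      sync.RWMutex
	exact   map[string]struct{}
	similar map[string]struct{}
}

func newWatchList() *watchList {
	return &watchList{exact: map[string]struct{}{}, similar: map[string]struct{}{}}
}

// Add quarantines sql; if similar is true, statements that differ only in
// literals, case, or whitespace are rejected too.
func (w *watchList) Add(sql string, similar bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if similar {
		w.similar[normalize(sql)] = struct{}{}
	} else {
		w.exact[sql] = struct{}{}
	}
}

// Check returns ErrQuarantined if sql matches any watch entry.
func (w *watchList) Check(sql string) error {
	w.mu.RLock()
	defer w.mu.RUnlock()
	if _, ok := w.exact[sql]; ok {
		return ErrQuarantined
	}
	if _, ok := w.similar[normalize(sql)]; ok {
		return ErrQuarantined
	}
	return nil
}

func main() {
	w := newWatchList()
	w.Add("SELECT * FROM t WHERE a = 1", true)
	fmt.Println(w.Check("select *  from t where a = 42")) // similar: rejected
	fmt.Println(w.Check("SELECT * FROM t WHERE b = 1"))   // different column: allowed
}
```

The demo prints `query is quarantined as runaway` followed by `<nil>`. Exact watches compare raw text, so they are cheaper but miss trivial variations; similar watches trade some precision for coverage.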

Misc
- [x] Introduce a metric "max query elapsed time by resource groups" https://github.com/pingcap/tidb/pull/44746
- [x] Add user document https://github.com/pingcap/docs-cn/pull/14242
- [x] Publish RFC https://github.com/pingcap/tidb/pull/44745
