Slow log parse performance improvement for splitByColon #54538

@yibin87

Enhancement

A CPU profile was taken for a query like:

SELECT
  Digest,
  Query,
  Conn_ID,
  (UNIX_TIMESTAMP(Time) + 0E0) AS timestamp,
  Query_time,
  Mem_max,
  Process_keys
FROM
  `INFORMATION_SCHEMA`.`CLUSTER_SLOW_QUERY`
WHERE
  Time BETWEEN FROM_UNIXTIME(?)
  AND FROM_UNIXTIME(?)
ORDER BY
  Query_time DESC
LIMIT 100;

The profile shows that splitByColon accounts for about 80% of the CPU time.

The implementation uses a regular expression to parse "key: value" strings:

// kvSplitRegex: it was just for split "field: value field: value..."
var kvSplitRegex = regexp.MustCompile(`\w+: `)

// splitByColon split a line like "field: value field: value..."
func splitByColon(line string) (fields []string, values []string) {
	matches := kvSplitRegex.FindAllStringIndex(line, -1)

However, in most cases we can simply scan the string once, taking the text before each colon as the key and the text after it as the value. This should improve performance significantly because it avoids running the regexp engine on every slow log line.
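As a rough illustration of the idea, here is a minimal sketch of a regexp-free splitter. It is not the actual TiDB change: the names splitByColonScan and isWordByte are hypothetical, and the sketch assumes that a field name is a run of ASCII word characters immediately followed by ": ", which is what the `\w+: ` pattern matches in Go. Lines where a value itself contains "word: " would still be split there, just as with the regexp, so a real fix may need to fall back to the regexp for such lines.

```go
package main

import (
	"fmt"
	"strings"
)

// isWordByte reports whether c is in [0-9A-Za-z_], the ASCII set
// that \w matches in Go's regexp package.
func isWordByte(c byte) bool {
	return c == '_' || (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// splitByColonScan (hypothetical name) splits a line such as
// "field: value field: value..." with a single forward scan.
func splitByColonScan(line string) (fields, values []string) {
	valStart := -1 // where the pending value begins; -1 until a field is seen
	pos := 0       // search position; only moves forward
	for pos < len(line) {
		i := strings.Index(line[pos:], ": ")
		if i < 0 {
			break
		}
		colon := pos + i
		// Walk back over word characters to find the start of the field name,
		// without crossing the previous ": " separator.
		nameStart := colon
		for nameStart > pos && isWordByte(line[nameStart-1]) {
			nameStart--
		}
		if nameStart < colon {
			if valStart >= 0 {
				// The previous value ends just before this field name;
				// drop the single separating space.
				values = append(values, strings.TrimSuffix(line[valStart:nameStart], " "))
			}
			fields = append(fields, line[nameStart:colon])
			valStart = colon + 2
		}
		pos = colon + 2
	}
	if valStart >= 0 {
		// Whatever follows the last field name is its value.
		values = append(values, line[valStart:])
	}
	return fields, values
}

func main() {
	fields, values := splitByColonScan("Query_time: 0.1 Parse_time: 0.2")
	fmt.Printf("%q %q\n", fields, values)
	// prints: ["Query_time" "Parse_time"] ["0.1" "0.2"]
}
```

The scan uses strings.Index to jump between separators and never revisits a byte more than a constant number of times, so the cost is linear in the line length with no regexp state machine involved. A colon without a following space, as in a timestamp like 10:00:00, is not treated as a separator.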

Labels

affects-6.1: This bug affects the 6.1.x(LTS) versions.
affects-6.5: This bug affects the 6.5.x(LTS) versions.
affects-7.1: This bug affects the 7.1.x(LTS) versions.
affects-7.5: This bug affects the 7.5.x(LTS) versions.
affects-8.1: This bug affects the 8.1.x(LTS) versions.
report/customer: Customers have encountered this bug.
severity/major
type/bug: The issue is confirmed as a bug.
