@breezewish breezewish commented Apr 22, 2025

What problem does this PR solve?

Issue Number: ref #1793

Problem Summary:

What changed and how does it work?

This PR adds complete support for creating a table with a FullText index and inserting data into it, using the latest TiFlash nightly.

This PR does not support reading from a FullText index; that will be included in later PRs.

Until then, queries that read through the index simply fail:

mysql> SELECT * FROM stock_items
    ->     WHERE fts_match_word("bluetoothイヤホン", title)
    ->     ORDER BY fts_match_word("bluetoothイヤホン", title)
    ->     DESC LIMIT 10;
ERROR 1105 (HY000): cannot use 'FTS_MATCH_WORD()' outside of fulltext index

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
CREATE TABLE stock_items(
    id INT, 
    title TEXT,
    FULLTEXT INDEX (title) WITH PARSER MULTILINGUAL
);

INSERT INTO stock_items VALUES (1, "イヤホン bluetooth ワイヤレスイヤホン ");
INSERT INTO stock_items VALUES (2, "完全ワイヤレスイヤホン/ウルトラノイズキャンセリング 2.0 ");
INSERT INTO stock_items VALUES (3, "ワイヤレス ヘッドホン Bluetooth 5.3 65時間再生 ヘッドホン 40mm HD ");
INSERT INTO stock_items VALUES (4, "楽器用 オンイヤーヘッドホン 密閉型【国内正規品】");
INSERT INTO stock_items VALUES (5, "ワイヤレスイヤホン ハイブリッドANC搭載 40dBまでアクティブノイズキャンセル");
INSERT INTO stock_items VALUES (6, "Lightweight Bluetooth Earbuds with 48 Hours Playtime");
INSERT INTO stock_items VALUES (7, "True Wireless Noise Cancelling Earbuds - Compatible with Apple & Android, Built-in Microphone");
INSERT INTO stock_items VALUES (8, "In-Ear Earbud Headphones with Mic, Black");
INSERT INTO stock_items VALUES (9, "Wired Headphones, HD Bass Driven Audio, Lightweight Aluminum Wired in Ear Earbud Headphones");
INSERT INTO stock_items VALUES (10, "LED Light Bar, Music Sync RGB Light Bar, USB Ambient Lamp");
INSERT INTO stock_items VALUES (11, "无线消噪耳机-黑色 手势触控蓝牙降噪 主动降噪头戴式耳机(智能降噪 长久续航)");
INSERT INTO stock_items VALUES (12, "专业版USB7.1声道游戏耳机电竞耳麦头戴式电脑网课办公麦克风带线控");
INSERT INTO stock_items VALUES (13, "投影仪家用智能投影机便携卧室手机投影");
INSERT INTO stock_items VALUES (14, "无线蓝牙耳机超长续航42小时快速充电 流光金属耳机");
INSERT INTO stock_items VALUES (15, "皎月银 国家补贴 澎湃OS 2 心率血氧监测 蓝牙通话 智能手表 男女表");
mysql> SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES;
+---------------+-------------+----------+-------------+------------+-----------+----------+------------+---------------------+-------------------------+--------------------+------------------------+---------------+------------------+
| TIDB_DATABASE | TIDB_TABLE  | TABLE_ID | COLUMN_NAME | INDEX_NAME | COLUMN_ID | INDEX_ID | INDEX_KIND | ROWS_STABLE_INDEXED | ROWS_STABLE_NOT_INDEXED | ROWS_DELTA_INDEXED | ROWS_DELTA_NOT_INDEXED | ERROR_MESSAGE | TIFLASH_INSTANCE |
+---------------+-------------+----------+-------------+------------+-----------+----------+------------+---------------------+-------------------------+--------------------+------------------------+---------------+------------------+
| test          | stock_items |      112 | title       | title      |         2 |        1 | FullText   |                   0 |                       0 |                  0 |                     15 |               | 127.0.0.1:3930   |
+---------------+-------------+----------+-------------+------------+-----------+----------+------------+---------------------+-------------------------+--------------------+------------------------+---------------+------------------+
1 row in set (0.00 sec)

mysql> alter table stock_items compact;
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES;
+---------------+-------------+----------+-------------+------------+-----------+----------+------------+---------------------+-------------------------+--------------------+------------------------+---------------+------------------+
| TIDB_DATABASE | TIDB_TABLE  | TABLE_ID | COLUMN_NAME | INDEX_NAME | COLUMN_ID | INDEX_ID | INDEX_KIND | ROWS_STABLE_INDEXED | ROWS_STABLE_NOT_INDEXED | ROWS_DELTA_INDEXED | ROWS_DELTA_NOT_INDEXED | ERROR_MESSAGE | TIFLASH_INSTANCE |
+---------------+-------------+----------+-------------+------------+-----------+----------+------------+---------------------+-------------------------+--------------------+------------------------+---------------+------------------+
| test          | stock_items |      112 | title       | title      |         2 |        1 | FullText   |                  15 |                       0 |                  0 |                      0 |               | 127.0.0.1:3930   |
+---------------+-------------+----------+-------------+------------+-----------+----------+------------+---------------------+-------------------------+--------------------+------------------------+---------------+------------------+
1 row in set (0.00 sec)

mysql> show indexes from stock_items;
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+-----------+--------+
| Table       | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression | Clustered | Global |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+-----------+--------+
| stock_items |          1 | title    |            1 | title       | A         |           0 |     NULL | NULL   | YES  | FULLTEXT   |         |               | YES     | NULL       | NO        | NO     |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+-----------+--------+
  • No need to test
    • I checked and no code files have been changed.
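
As an aid for reading the `TIFLASH_INDEXES` output in the manual test above, here is a minimal Go sketch of how the four row-count columns could be combined into a single indexing-progress figure. The `indexRowCounts` type and the `indexedRatio` helper are hypothetical names used only for illustration; they are not part of TiDB or TiFlash, and treating an empty table as fully indexed is an assumption.

```go
package main

import "fmt"

// indexRowCounts mirrors the four row-count columns of
// INFORMATION_SCHEMA.TIFLASH_INDEXES shown in the manual test.
// The type itself is hypothetical and exists only for this sketch.
type indexRowCounts struct {
	StableIndexed    int64 // ROWS_STABLE_INDEXED
	StableNotIndexed int64 // ROWS_STABLE_NOT_INDEXED
	DeltaIndexed     int64 // ROWS_DELTA_INDEXED
	DeltaNotIndexed  int64 // ROWS_DELTA_NOT_INDEXED
}

// indexedRatio returns the fraction of rows (0 to 1) covered by the index.
// Assumption: a table with no rows is reported as fully indexed.
func indexedRatio(c indexRowCounts) float64 {
	indexed := c.StableIndexed + c.DeltaIndexed
	total := indexed + c.StableNotIndexed + c.DeltaNotIndexed
	if total == 0 {
		return 1
	}
	return float64(indexed) / float64(total)
}

func main() {
	// Row counts copied from the two TIFLASH_INDEXES queries above.
	before := indexRowCounts{DeltaNotIndexed: 15}
	after := indexRowCounts{StableIndexed: 15}

	fmt.Printf("before compact: %.0f%%\n", indexedRatio(before)*100)
	fmt.Printf("after compact: %.0f%%\n", indexedRatio(after)*100)
}
```

With the values from the manual test, this reports 0% before `alter table stock_items compact` (15 rows in delta, none indexed) and 100% afterwards (15 rows in stable, all indexed).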

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/planner SIG: Planner labels Apr 22, 2025
@breezewish breezewish changed the title *: Support building FULL TEXT index via TiFlash *: Support building FULLTEXT index Apr 22, 2025

codecov bot commented Apr 22, 2025

Codecov Report

Attention: Patch coverage is 54.37500% with 146 lines in your changes missing coverage. Please review.

Project coverage is 75.0775%. Comparing base (3fc3e98) to head (343b15a).
Report is 24 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #60720        +/-   ##
================================================
+ Coverage   73.1447%   75.0775%   +1.9328%     
================================================
  Files          1716       1764        +48     
  Lines        475657     487247     +11590     
================================================
+ Hits         347918     365813     +17895     
+ Misses       106376      98696      -7680     
- Partials      21363      22738      +1375     
Flag Coverage Δ
integration 49.1007% <6.3694%> (?)
unit 72.5787% <54.0625%> (+0.2162%) ⬆️

Components Coverage Δ
dumpling 52.6553% <ø> (ø)
parser ∅ <ø> (∅)
br 62.6783% <ø> (+15.2248%) ⬆️

switch indexOptions.Tp {
case ast.IndexTypeVector, ast.IndexTypeInverted:
	// Accepted
case ast.IndexTypeFulltext:

Which field do we use to check the storage that this index uses?

@breezewish breezewish Apr 23, 2025

The PM has not yet decided on the user-facing behavior of TiCI, so it is not covered in this PR. This PR only covers the parts necessary to use FTS with TiFlash.

According to the latest discussion with the PM, all columnar indexes (FTS, Vector, Inverted) will have an "advanced mode" parameter. When advanced mode is set, TiCI will be used (and required). When it is not set, TiFlash will be used (and required).

Advanced mode will be added later, because there is currently no syntax for specifying parameters on columnar indexes. This will be implemented by @Lloyd-Pottiger.
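
The rule described above could be sketched roughly as follows. This is only an illustration of the planned behavior, assuming advanced mode ends up as a simple boolean; `indexKind`, `storageEngine`, and `pickStorage` are hypothetical names, not identifiers from this PR, and the actual syntax and implementation are still to be decided.

```go
package main

import "fmt"

// indexKind and storageEngine are hypothetical types for this sketch only.
type indexKind string
type storageEngine string

const (
	kindFullText indexKind = "FullText"
	kindVector   indexKind = "Vector"
	kindInverted indexKind = "Inverted"

	engineTiFlash storageEngine = "TiFlash"
	engineTiCI    storageEngine = "TiCI"
)

// pickStorage returns the storage a columnar index would require:
// TiCI when advanced mode is set, TiFlash otherwise.
// Non-columnar index kinds are rejected.
func pickStorage(kind indexKind, advancedMode bool) (storageEngine, error) {
	switch kind {
	case kindFullText, kindVector, kindInverted:
		if advancedMode {
			return engineTiCI, nil
		}
		return engineTiFlash, nil
	default:
		return "", fmt.Errorf("index kind %q is not a columnar index", kind)
	}
}

func main() {
	for _, advanced := range []bool{false, true} {
		engine, err := pickStorage(kindFullText, advanced)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("FullText, advanced=%v -> %s\n", advanced, engine)
	}
}
```

Because advanced mode does not exist yet, only the `advancedMode == false` branch corresponds to what this PR supports: a FullText index built in TiFlash.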

@breezewish breezewish changed the title *: Support building FULLTEXT index *: Support building FULLTEXT index | tidb-test=pr/2507 Apr 24, 2025
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Apr 24, 2025
@XuHuaiyu

The expression part LGTM

@breezewish breezewish changed the title *: Support building FULLTEXT index | tidb-test=pr/2507 *: Support building FULLTEXT index Apr 24, 2025
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 25, 2025

ti-chi-bot bot commented Apr 25, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-04-24 06:14:02.897339068 +0000 UTC m=+508986.709129448: ☑️ agreed by tangenta.
  • 2025-04-25 00:09:06.83896467 +0000 UTC m=+573490.650755050: ☑️ agreed by winoros.

@yudongusa yudongusa left a comment

We need a documentation PR for this later.

ti-chi-bot bot commented Apr 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tangenta, winoros, XuHuaiyu, yudongusa

@ti-chi-bot ti-chi-bot bot added the approved label Apr 25, 2025
@ti-chi-bot ti-chi-bot bot merged commit 351445e into pingcap:master Apr 25, 2025
26 checks passed
@breezewish breezewish deleted the wenxuan/ks2 branch April 25, 2025 03:51
Kiran01bm pushed a commit to Kiran01bm/tidb that referenced this pull request Apr 29, 2025
budney pushed a commit to budney/tidb that referenced this pull request May 18, 2025
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner sig/vector size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
5 participants