In PhysicalExchangeSender::ResolveIndicesItself, the HashCols is shallow copied instead of deep copied, which may cause random mpp issues #60517

@windtalker

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

func (p *PhysicalExchangeSender) ResolveIndicesItself() (err error) {
	for i, col := range p.HashCols {
		colExpr, err1 := col.Col.ResolveIndices(p.Children()[0].Schema())
		if err1 != nil {
			return err1
		}
		p.HashCols[i].Col, _ = colExpr.(*expression.Column)
	}
	return
}

As the code in PhysicalExchangeSender::ResolveIndicesItself shows, p.HashCols[i] itself is not newly constructed; only p.HashCols[i].Col is replaced with a new column. That is not enough: because p.HashCols[i] is not deep copied, when multiple PhysicalExchangeSender operators share the same HashCols, the later one overwrites what the earlier one resolved.
This bug was introduced by #26789 and should have been in TiDB since v5.2. Once triggered, it has serious consequences for MPP queries (TiFlash crashes, or the MPP query returns wrong results). Fortunately, it is relatively hard to trigger; the plan must meet at least the following conditions:

  • more than one partition join is involved in the plan
  • each join has more than one join key
  • the same columns are used as join keys in more than one join
  • the join keys have different types in different tables (otherwise there is no extra exchange sender)
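
The aliasing described above can be sketched in isolation. The following is a minimal, self-contained illustration and not TiDB code: Column, PartitionCol, Sender, resolve, resolveShared, resolveCopied and run are all hypothetical stand-ins, and it assumes that the elements of HashCols are pointers that two senders can hold at the same time. resolveShared follows the pattern from the issue, while resolveCopied shows one possible way to avoid the overwrite by building a new element per entry.

```go
package main

import "fmt"

// Column is a hypothetical stand-in for expression.Column; Index plays the
// role of the resolved offset into the child schema.
type Column struct {
	Name  string
	Index int
}

// PartitionCol is a hypothetical stand-in for one HashCols element.
type PartitionCol struct {
	Col *Column
}

// Sender is a hypothetical stand-in for PhysicalExchangeSender.
type Sender struct {
	HashCols []*PartitionCol
	Schema   []string // column names of the child's schema, in order
}

// resolve returns a fresh Column whose Index is the position of the column's
// name in schema, or -1 if the name is not found.
func resolve(c *Column, schema []string) *Column {
	for i, name := range schema {
		if name == c.Name {
			return &Column{Name: c.Name, Index: i}
		}
	}
	return &Column{Name: c.Name, Index: -1}
}

// resolveShared mirrors the pattern in the issue: the Column is new, but it is
// written into a PartitionCol that other senders may also point to.
func (s *Sender) resolveShared() {
	for i, pc := range s.HashCols {
		s.HashCols[i].Col = resolve(pc.Col, s.Schema)
	}
}

// resolveCopied builds a new PartitionCol per entry and a new slice, so no
// element shared with another sender is mutated.
func (s *Sender) resolveCopied() {
	cols := make([]*PartitionCol, len(s.HashCols))
	for i, pc := range s.HashCols {
		cols[i] = &PartitionCol{Col: resolve(pc.Col, s.Schema)}
	}
	s.HashCols = cols
}

// run resolves two senders that start from the same PartitionCol pointer and
// returns the index each one ends up with for column "id".
func run(deepCopy bool) (int, int) {
	shared := &PartitionCol{Col: &Column{Name: "id", Index: -1}}
	a := &Sender{HashCols: []*PartitionCol{shared}, Schema: []string{"id", "v1"}}
	b := &Sender{HashCols: []*PartitionCol{shared}, Schema: []string{"v1", "v2", "id"}}
	for _, s := range []*Sender{a, b} {
		if deepCopy {
			s.resolveCopied()
		} else {
			s.resolveShared()
		}
	}
	return a.HashCols[0].Col.Index, b.HashCols[0].Col.Index
}

func main() {
	fmt.Println(run(false)) // prints "2 2": sender a now carries b's index
	fmt.Println(run(true))  // prints "0 2": each sender keeps its own index
}
```

In the shared variant, the first sender ends up with an index that is only valid for the second sender's child schema, which corresponds to the overwrite described in this issue.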

The simplest query that reproduces this error is:

mysql> use test;
Database changed
mysql> create table t1(id int, v1 int, v2 int, v3 int);
Query OK, 0 rows affected (0.06 sec)

mysql> create table t2(id int, v1 int, v2 int, v3 int, v4 int, v5 int);
Query OK, 0 rows affected (0.06 sec)

mysql> create table t3(id int, v1 bigint, v2 bigint, v3 bigint, v4 bigint);
Query OK, 0 rows affected (0.06 sec)

mysql> alter table t1 set tiflash replica 1;
Query OK, 0 rows affected (0.07 sec)

mysql> alter table t2 set tiflash replica 1;
Query OK, 0 rows affected (0.07 sec)

mysql> alter table t3 set tiflash replica 1;
Query OK, 0 rows affected (0.06 sec)

mysql> set tidb_broadcast_join_threshold_count=0;
Query OK, 0 rows affected (0.00 sec)

mysql> set tidb_broadcast_join_threshold_size=0;
Query OK, 0 rows affected (0.00 sec)

mysql> set tidb_isolation_read_engines=tiflash;
Query OK, 0 rows affected (0.00 sec)

mysql> explain select t1.id from t1 join t2 on t1.id = t2.id and t1.v1 = t2.v1 join t3 on t1.id = t3.id and t1.v1 = t3.v1;
+------------------------------------------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------+
| id                                             | estRows  | task         | access object | operator info                                                                                                    |
+------------------------------------------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------+
| Projection_12                                  | 15593.77 | root         |               | test.t1.id                                                                                                       |
| └─TableReader_36                               | 15593.77 | root         |               | data:ExchangeSender_35                                                                                           |
|   └─ExchangeSender_35                          | 15593.77 | mpp[tiflash] |               | ExchangeType: PassThrough                                                                                        |
|     └─HashJoin_13                              | 15593.77 | mpp[tiflash] |               | inner join, equal:[eq(test.t1.id, test.t3.id) eq(test.t1.v1, test.t3.v1)]                                        |
|       ├─ExchangeReceiver_31(Build)             | 9980.01  | mpp[tiflash] |               |                                                                                                                  |
|       │ └─ExchangeSender_30                    | 9980.01  | mpp[tiflash] |               | ExchangeType: HashPartition, Hash Cols: [name: test.t3.id, collate: binary], [name: test.t3.v1, collate: binary] |
|       │   └─Selection_29                       | 9980.01  | mpp[tiflash] |               | not(isnull(test.t3.id)), not(isnull(test.t3.v1))                                                                 |
|       │     └─TableFullScan_28                 | 10000.00 | mpp[tiflash] | table:t3      | keep order:false, stats:pseudo                                                                                   |
|       └─ExchangeReceiver_34(Probe)             | 12475.01 | mpp[tiflash] |               |                                                                                                                  |
|         └─ExchangeSender_33                    | 12475.01 | mpp[tiflash] |               | ExchangeType: HashPartition, Hash Cols: [name: test.t2.id, collate: binary], [name: Column#19, collate: binary]  |
|           └─Projection_32                      | 12475.01 | mpp[tiflash] |               | test.t1.id, test.t1.v1, test.t2.id, test.t2.v1, cast(test.t2.v1, bigint(20))->Column#19                          |
|             └─HashJoin_16                      | 12475.01 | mpp[tiflash] |               | inner join, equal:[eq(test.t1.id, test.t2.id) eq(test.t1.v1, test.t2.v1)]                                        |
|               ├─ExchangeReceiver_21(Build)     | 9980.01  | mpp[tiflash] |               |                                                                                                                  |
|               │ └─ExchangeSender_20            | 9980.01  | mpp[tiflash] |               | ExchangeType: HashPartition, Hash Cols: [name: test.t1.id, collate: binary], [name: test.t1.v1, collate: binary] |
|               │   └─Selection_19               | 9980.01  | mpp[tiflash] |               | not(isnull(test.t1.id)), not(isnull(test.t1.v1))                                                                 |
|               │     └─TableFullScan_18         | 10000.00 | mpp[tiflash] | table:t1      | keep order:false, stats:pseudo                                                                                   |
|               └─ExchangeReceiver_25(Probe)     | 9980.01  | mpp[tiflash] |               |                                                                                                                  |
|                 └─ExchangeSender_24            | 9980.01  | mpp[tiflash] |               | ExchangeType: HashPartition, Hash Cols: [name: test.t2.id, collate: binary], [name: test.t2.v1, collate: binary] |
|                   └─Selection_23               | 9980.01  | mpp[tiflash] |               | not(isnull(test.t2.id)), not(isnull(test.t2.v1))                                                                 |
|                     └─TableFullScan_22         | 10000.00 | mpp[tiflash] | table:t2      | keep order:false, stats:pseudo                                                                                   |
+------------------------------------------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------+
20 rows in set (0.00 sec)

mysql> insert into t1 values(1,2,3,4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t2 values(1,2,3,4,5,6);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t3 values(1,2,3,4,5);
Query OK, 1 row affected (0.00 sec)

mysql> explain select t1.id from t1 join t2 on t1.id = t2.id and t1.v1 = t2.v1 join t3 on t1.id = t3.id and t1.v1 = t3.v1;
+------------------------------------------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------+
| id                                             | estRows  | task         | access object | operator info                                                                                                    |
+------------------------------------------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------+
| Projection_12                                  | 15593.77 | root         |               | test.t1.id                                                                                                       |
| └─TableReader_36                               | 15593.77 | root         |               | data:ExchangeSender_35                                                                                           |
|   └─ExchangeSender_35                          | 15593.77 | mpp[tiflash] |               | ExchangeType: PassThrough                                                                                        |
|     └─HashJoin_13                              | 15593.77 | mpp[tiflash] |               | inner join, equal:[eq(test.t1.id, test.t3.id) eq(test.t1.v1, test.t3.v1)]                                        |
|       ├─ExchangeReceiver_31(Build)             | 9980.01  | mpp[tiflash] |               |                                                                                                                  |
|       │ └─ExchangeSender_30                    | 9980.01  | mpp[tiflash] |               | ExchangeType: HashPartition, Hash Cols: [name: test.t3.id, collate: binary], [name: test.t3.v1, collate: binary] |
|       │   └─Selection_29                       | 9980.01  | mpp[tiflash] |               | not(isnull(test.t3.id)), not(isnull(test.t3.v1))                                                                 |
|       │     └─TableFullScan_28                 | 10000.00 | mpp[tiflash] | table:t3      | keep order:false, stats:pseudo                                                                                   |
|       └─ExchangeReceiver_34(Probe)             | 12475.01 | mpp[tiflash] |               |                                                                                                                  |
|         └─ExchangeSender_33                    | 12475.01 | mpp[tiflash] |               | ExchangeType: HashPartition, Hash Cols: [name: test.t2.id, collate: binary], [name: Column#19, collate: binary]  |
|           └─Projection_32                      | 12475.01 | mpp[tiflash] |               | test.t1.id, test.t1.v1, test.t2.id, test.t2.v1, cast(test.t2.v1, bigint(20))->Column#19                          |
|             └─HashJoin_16                      | 12475.01 | mpp[tiflash] |               | inner join, equal:[eq(test.t1.id, test.t2.id) eq(test.t1.v1, test.t2.v1)]                                        |
|               ├─ExchangeReceiver_21(Build)     | 9980.01  | mpp[tiflash] |               |                                                                                                                  |
|               │ └─ExchangeSender_20            | 9980.01  | mpp[tiflash] |               | ExchangeType: HashPartition, Hash Cols: [name: test.t1.id, collate: binary], [name: test.t1.v1, collate: binary] |
|               │   └─Selection_19               | 9980.01  | mpp[tiflash] |               | not(isnull(test.t1.id)), not(isnull(test.t1.v1))                                                                 |
|               │     └─TableFullScan_18         | 10000.00 | mpp[tiflash] | table:t1      | keep order:false, stats:pseudo                                                                                   |
|               └─ExchangeReceiver_25(Probe)     | 9980.01  | mpp[tiflash] |               |                                                                                                                  |
|                 └─ExchangeSender_24            | 9980.01  | mpp[tiflash] |               | ExchangeType: HashPartition, Hash Cols: [name: test.t2.id, collate: binary], [name: test.t2.v1, collate: binary] |
|                   └─Selection_23               | 9980.01  | mpp[tiflash] |               | not(isnull(test.t2.id)), not(isnull(test.t2.v1))                                                                 |
|                     └─TableFullScan_22         | 10000.00 | mpp[tiflash] | table:t2      | keep order:false, stats:pseudo                                                                                   |
+------------------------------------------------+----------+--------------+---------------+------------------------------------------------------------------------------------------------------------------+
20 rows in set (0.00 sec)

mysql>  select t1.id from t1 join t2 on t1.id = t2.id and t1.v1 = t2.v1 join t3 on t1.id = t3.id and t1.v1 = t3.v1;
+------+
| id   |
+------+
|    1 |
+------+
1 row in set (0.01 sec)

mysql>  select t1.id from t1 join t2 on t1.id = t2.id and t1.v1 = t2.v1 join t3 on t1.id = t3.id and t1.v1 = t3.v1;
ERROR 1105 (HY000): rpc error: code = Unavailable desc = error reading from server: EOF

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

Metadata

    Labels

    affects-6.1: This bug affects the 6.1.x (LTS) versions.
    affects-6.5: This bug affects the 6.5.x (LTS) versions.
    affects-7.1: This bug affects the 7.1.x (LTS) versions.
    affects-7.5: This bug affects the 7.5.x (LTS) versions.
    affects-8.1: This bug affects the 8.1.x (LTS) versions.
    affects-8.5: This bug affects the 8.5.x (LTS) versions.
    severity/critical
    sig/execution: SIG execution
    type/bug: The issue is confirmed as a bug.
    type/regression
