Skip to content

lightning failed with '503 Service Unavailable' when inject pd leader network partition #49744

@Lily2025

Description

@Lily2025

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1、run lightning
2、inject pd leader network partition from all other pod

2. What did you expect to see? (Required)

lightning can success whenpd leader network partition

3. What did you see instead (Required)

lightning failed when inject pd leader network partition

Verbose debug logs will be written to /tmp/lightning.log.2023-12-23T02.30.44Z
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| # | CHECK ITEM | TYPE | PASSED |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 1 | Source data files size is proper | performance | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 2 | the checkpoints are valid | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 3 | table schemas are valid | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 4 | all importing tables on the target are empty | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 5 | Cluster version check passed | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 6 | Lightning has the correct storage permission | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 7 | local disk resources are rich, estimate sorted data size 26.05GiB, local available is 3.399TiB | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 8 | The storage space is rich, which TiKV/Tiflash is 5.368TiB/0B. The estimated storage space is 78.16GiB/0B. | performance | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 9 | Cluster doesn't have too many empty regions | performance | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 10 | Cluster region distribution is balanced | performance | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 11 | no CDC or PiTR task found | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
{"level":"warn","ts":"2023-12-23T02:32:12.363771Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001f321c0/tc-pd-2.tc-pd-peer.ha-test-lightning-tps-5340957-1-769.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2023-12-23T02:32:12.364291Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001ce6000/tc-pd:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2023-12-23T02:32:28.370028Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001f321c0/tc-pd-2.tc-pd-peer.ha-test-lightning-tps-5340957-1-769.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2023-12-23T02:32:48.400951Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001ce6000/tc-pd:2379","attempt":0,"error":"rpc error: code = Unknown desc = context deadline exceeded"}
{"level":"warn","ts":"2023-12-23T02:35:54.402124Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001f321c0/tc-pd-2.tc-pd-peer.ha-test-lightning-tps-5340957-1-769.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
tidb lightning encountered error: [Lightning:Restore:ErrRestoreTable]restore table sysbench.user_data1 failed: request pd http api failed with status: '503 Service Unavailable'

tidb logs:
tidb-0.log
tidb-1.log

request pd http api failed with status: '503 Service Unavailable'
github.com/tikv/pd/client/http.(*clientInner).doRequest
/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:242
github.com/tikv/pd/client/http.(*clientInner).requestWithRetry
/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:156
github.com/tikv/pd/client/http.(*client).request
/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:414
github.com/tikv/pd/client/http.(*client).SetRegionLabelRule
/go/pkg/mod/github.com/tikv/pd/[email protected]/http/interface.go:540
github.com/pingcap/tidb/br/pkg/pdutil.pauseSchedulerByKeyRangeWithTTL
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/pdutil/pd.go:1002
github.com/pingcap/tidb/br/pkg/pdutil.PauseSchedulersByKeyRange
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/pdutil/pd.go:970
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).ImportEngine
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:1564
github.com/pingcap/tidb/br/pkg/lightning/backend.(*ClosedEngine).Import
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/backend.go:366
github.com/pingcap/tidb/br/pkg/lightning/importer.(*TableImporter).importKV
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:1343
github.com/pingcap/tidb/br/pkg/lightning/importer.(*TableImporter).importEngine
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:919
github.com/pingcap/tidb/br/pkg/lightning/importer.(*TableImporter).importEngines.func3
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:525
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1650

4. What is your TiDB version? (Required)

./tidb-server -V
Release Version: v7.6.0-alpha
Edition: Community
Git Commit Hash: 5c279d8
Git Branch: heads/refs/tags/v7.6.0-alpha
UTC Build Time: 2023-12-21 07:56:10
GoVersion: go1.21.5
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2023-12-23T04:16:24.782+0800

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions