batch_client: auto re-connect idle connection when wait connection ready #833

crazycs520 · 2023-06-09T11:10:34Z

No description provided.

Signed-off-by: crazycs520 <[email protected]>

cfzjywxk · 2023-06-09T12:16:23Z

internal/client/client_batch.go

@@ -555,6 +555,8 @@ func (c *batchCommandsClient) waitConnReady() (err error) {
 			cancel()
 			break
 		}
+		// Trigger idle connection to reconnection
+		c.conn.Connect()


Connect seems to be an experimental API, which may not be stable enough?

Besides, it seems that each call creates a new goroutine for each sub-channel, which could be a risk.

crazycs520 · 2023-06-09

Nice catch. So far, Connect is the only API I have found that works. I tried to use a TiKV RPC interface instead, such as a heartbeat or health check RPC, but I couldn't find one.

Anyway, I just added a test for this.

zyguan · 2023-06-10

Actually, GetState and WaitForStateChange are both experimental as well, so I think it's OK to use Connect here. How about triggering Connect only once (maybe just before the for loop) and only when the state is IDLE? I'm not sure we need to connect when the state is TRANSIENT_FAILURE, since gRPC itself has an internal backoff for reconnecting.

crazycs520 · 2023-06-10

The Connect API already checks the state internally and only connects when the state is idle.

cfzjywxk · 2023-06-10

WaitForStateChange has been in use for some time, so its risk is relatively low.
We had better check the gRPC client source code in detail to understand its state management and how Connect affects the underlying gRPC logic, and make sure the risk is acceptable and the usage is correct.

cfzjywxk

There seems to be no standard way to perform this reconnection. The transient failure state is automatically converted to connecting internally for retry and backoff. I have approved this for now, but we need to keep watching for abnormalities in availability testing, stress testing, and disaster recovery testing. We also need to further organize and improve the batch client and region cache code for maintainability.

crazycs520 · 2023-08-07T10:49:18Z

Release 7.1 doesn't have this issue, so there is no need to cherry-pick this to tidb-7.1.

batch_client: auto re-connect idle connection when wait connection ready

067ec93

Signed-off-by: crazycs520 <[email protected]>

crazycs520 mentioned this pull request Jun 9, 2023

client-go: update tikv client-go for batch_client auto re-connect idle connection when wait connection ready pingcap/tidb#44561

Merged

12 tasks

zyguan approved these changes Jun 9, 2023

cfzjywxk reviewed Jun 9, 2023

add test

607af7d

Signed-off-by: crazycs520 <[email protected]>

cfzjywxk approved these changes Jun 12, 2023

cfzjywxk merged commit 3a1f16a into tikv:tidb-6.5 Jun 12, 2023

This was referenced Jun 12, 2023

client_batch: tiny optimize #834

Merged

client_batch: add test for auto re-connect idle connection when wait connection ready and fix ci #835

Merged

crazycs520 mentioned this pull request Aug 7, 2023

storage: backport the stale read enhancement and bug fix to release 6.5 pingcap/tidb#43481

Closed
