batch_client: auto re-connect idle connection when wait connection ready #833
Conversation
Signed-off-by: crazycs520 <[email protected]>
```diff
@@ -555,6 +555,8 @@ func (c *batchCommandsClient) waitConnReady() (err error) {
 			cancel()
 			break
 		}
+		// Trigger idle connection to reconnection
+		c.conn.Connect()
```
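For context, here is a minimal, self-contained sketch of the kind of wait loop this diff modifies. The function signature, timeout handling, and names are simplified assumptions for illustration, not the actual client-go source:

```go
package example

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// waitConnReady polls the channel state until it becomes Ready or the dial
// timeout expires, nudging the connection to (re)connect on each iteration.
func waitConnReady(conn *grpc.ClientConn, dialTimeout time.Duration) error {
	dialCtx, cancel := context.WithTimeout(context.Background(), dialTimeout)
	defer cancel()
	for {
		s := conn.GetState()
		if s == connectivity.Ready {
			return nil
		}
		// The line added by this PR: ask gRPC to leave the IDLE state and
		// start connecting again.
		conn.Connect()
		// Block until the connectivity state changes or dialCtx expires.
		if !conn.WaitForStateChange(dialCtx, s) {
			return dialCtx.Err()
		}
	}
}
```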
This seems to be an experimental API which may not be stable enough?
Besides, it seems that each time the API is triggered, a new goroutine would be created for each sub-channel, which could be a risk.
Nice catch. For now, the `Connect` API is the only one I've found to work; I tried to use a TiKV RPC interface such as a heartbeat or health-check RPC, but I didn't find such an interface. Anyway, I've added a test for this.
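For reference, this is a sketch of the kind of health-check RPC mentioned above, using the standard `grpc.health.v1` service from grpc-go. It is hypothetical here, since per the comment TiKV does not seem to expose such an endpoint:

```go
package example

import (
	"context"
	"time"

	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// probeHealth issues a standard gRPC health-check request on the connection.
// It only succeeds if the server has registered the grpc.health.v1.Health
// service, which is why it could not be used as the reconnect trigger here.
func probeHealth(conn *grpc.ClientConn) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	return err
}
```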
Actually, `GetState` and `WaitForStateChange` are both experimental too, so I think it's OK to use `Connect` here. How about triggering `Connect` only once (maybe just before the for loop) and only when the state is IDLE? I'm not sure we need to connect when the state is TRANSIENT_FAILURE, since gRPC itself applies an internal backoff for reconnecting.
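A sketch of that suggestion, assuming the same simplified wait-loop shape as above (illustrative only, not the merged code): `Connect` is issued once before the loop and only when the channel is Idle, leaving TRANSIENT_FAILURE to gRPC's own reconnect backoff.

```go
package example

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// waitConnReadyOnce triggers Connect a single time, and only when the channel
// is currently Idle, then simply waits for the state to become Ready.
func waitConnReadyOnce(conn *grpc.ClientConn, dialTimeout time.Duration) error {
	dialCtx, cancel := context.WithTimeout(context.Background(), dialTimeout)
	defer cancel()
	if conn.GetState() == connectivity.Idle {
		conn.Connect()
	}
	for {
		s := conn.GetState()
		if s == connectivity.Ready {
			return nil
		}
		if !conn.WaitForStateChange(dialCtx, s) {
			return dialCtx.Err()
		}
	}
}
```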
The `Connect` API already checks the state internally, and only connects when the state is IDLE.
`WaitForStateChange` has been used for some time, so its risk is relatively low. Still, we'd better check the gRPC client source code in detail, specifically its state management and how `Connect` could impact the underlying gRPC logic, to make sure the risk is acceptable and the usage is correct.
Signed-off-by: crazycs520 <[email protected]>
It seems that there is no standard operation for this reconnection; the TRANSIENT_FAILURE state is automatically converted to CONNECTING internally, with retry and backoff. I have approved it for now, but we need to keep monitoring for abnormalities in availability testing, stress testing, and disaster-recovery testing. In addition, we also need to further organize and improve the batch client / region cache code for maintainability.
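As a small aid for that kind of monitoring, here is a hypothetical helper (not part of this PR) that logs every connectivity state transition, which makes the automatic TRANSIENT_FAILURE to CONNECTING retries visible during availability or chaos tests:

```go
package example

import (
	"context"
	"log"

	"google.golang.org/grpc"
)

// logStateTransitions logs each connectivity state change of the connection
// until ctx is cancelled, so the internal retry/backoff behaviour can be
// observed while faults are injected.
func logStateTransitions(ctx context.Context, conn *grpc.ClientConn) {
	for {
		s := conn.GetState()
		log.Printf("batch client conn state: %s", s)
		if !conn.WaitForStateChange(ctx, s) {
			return // ctx was cancelled or timed out
		}
	}
}
```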
Release 7.1 doesn't have this issue, so there is no need to cherry-pick to tidb-7.1.