Recovery time is not expected in slow store scheduler #9384

@rleungx

Bug Report

We set lastSlowStoreCaptureTS to the time when the slow store is first detected, and readyForRecovery measures the recovery gap from that timestamp:

func (conf *evictSlowStoreSchedulerConfig) readyForRecovery() bool {
	conf.RLock()
	defer conf.RUnlock()
	recoveryDurationGap := conf.RecoveryDurationGap
	failpoint.Inject("transientRecoveryGap", func() {
		recoveryDurationGap = 0
	})
	return uint64(time.Since(conf.lastSlowStoreCaptureTS).Seconds()) >= recoveryDurationGap
}

func (conf *evictSlowStoreSchedulerConfig) setStoreAndPersist(id uint64) error {
	conf.Lock()
	defer conf.Unlock()
	conf.EvictedStores = []uint64{id}
	conf.lastSlowStoreCaptureTS = time.Now()
	return conf.save()
}

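If the recovery time is less than the jitter duration, RecoveryDurationGap has already elapsed since lastSlowStoreCaptureTS by the time the slow store disappears, so readyForRecovery returns true and we transfer leaders back immediately. Instead, we should wait for the recovery time after the store is no longer slow before transferring leaders back.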

In the two screenshots below, the recovery time is 10m in the first and 30m in the second, yet the behavior looks similar in both cases.
Image
Image

Metadata

Labels

affects-8.1: This bug affects the 8.1.x (LTS) versions.
affects-8.5: This bug affects the 8.5.x (LTS) versions.
severity/major
type/bug: The issue is confirmed as a bug.
