-
Notifications
You must be signed in to change notification settings - Fork 3.8k
fix(limits): Set ready when all partitions ready #18092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e2c8706
to
f5a3e24
Compare
This commit moves the partition readiness check after the service and lifecycler readiness checks. It does this because the partition readiness check cannot pass until the consumer is running, and the consumer.Run() goroutine is not started until running() is called. That means it makes sense to check the service state and the lifecycler before checking the partitions.
f5a3e24
to
6e0276f
Compare
This commit fixes a race condition where the maximum attempts is never satisifed. This race condition comes about as two (or more) goroutines can load and increment the atomic concurrently, which would cause the next Load() to load a value greater than the maximum number of attempts, breaking the equality check.
60ee2da
to
e962ba3
Compare
This commit fixes the race condition that exists between two (or more) goroutines concurrently executing CheckReady. What can happen is both goroutines call Load at the same time, do the check, and then increment the number of attempts without using check-and-set. This allowed the number of attempts to exceed the maximum number of attempts.
e962ba3
to
a6c2ffa
Compare
414fff0
to
4a8dabf
Compare
4a8dabf
to
865bc62
Compare
c12dffd
to
c83e653
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// The maximum amount of time to wait to join the consumer group and be | ||
// assigned some partitions before giving up and going ready in | ||
// [Service.CheckReady]. | ||
partitionReadinessWaitAssignPeriod = 30 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this so now the time that we wait is independent of:
- The readiness check interval
- The number of clients calling
/ready
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does / why we need it:
This pull request makes the ingest-limits readiness handler to consider partition state before signaling ready. I.e. the service is considered to be ready and consume traffic when the following conditions apply:
Which issue(s) this PR fixes:
Fixes #grafana/loki-private/issues/1722
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR