Description
/kind bug
What steps did you take and what happened:
With the addition of #5537, MachinePools can reflect more up-to-date status than they previously could. However, we seem to be hitting an edge case where a scale-up attempt is partially fulfilled but never completed due to capacity constraints. The scale-up attempt eventually times out because of those constraints, but cluster-autoscaler and CAPZ keep trying to provision infrastructure indefinitely.
- Drive workload so that scaling occurs
- Scaling starts to fail due to capacity constraints
- Attempts to provision machines in the constrained zones continue despite the scale-up timing out.
- This continues even when there are no longer pending pods driving the need for the scale-up attempt.
What did you expect to happen:
I expect the scale-up to time out correctly so that cluster-autoscaler can try to scale up other similar pools, or simply back off for a period of time.
Anything else you would like to add:
Running the same tests on a release without #5537 works as expected: the scale-up is correctly abandoned and the cluster stops trying to provision infrastructure in the capacity-constrained zone. I've identified a possible fix on the cluster-autoscaler side, but I think it may be more appropriate to find a fix in CAPZ, since the change originated here.
Environment:
- cluster-api-provider-azure version: 1.20.0, and a forked 1.17.5 with "MachinePool: avoid SetNotReady during normal processing" (#5537) cherry-picked in.
- Kubernetes version (use kubectl version): Server Version: v1.30.5
- OS (e.g. from /etc/os-release): PRETTY_NAME="Ubuntu 24.04.1 LTS" NAME="Ubuntu" VERSION_ID="24.04" VERSION="24.04.1 LTS (Noble Numbat)" VERSION_CODENAME=noble ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=noble LOGO=ubuntu-logo