MachinePool always ready causes endless scale-up attempts #5860

@maroche15

Description

/kind bug

What steps did you take and what happened:

With the addition of #5537, MachinePools are able to reflect more up-to-date status than they previously could. However, we seem to be hitting an edge case where a scale-up attempt is partially fulfilled but cannot complete because of capacity constraints. The scale-up attempt eventually times out, but cluster-autoscaler and CAPZ keep trying to provision infrastructure indefinitely.

  1. Drive workload so that scaling occurs
  2. Scaling starts to fail due to capacity constraints
  3. Attempts to provision machines in the constrained zones continue despite the scale-up timing out.
  4. This continues even when there are no longer pending pods driving the need for the scale-up attempt.

What did you expect to happen:

I expect the scale-up to time out correctly so that cluster-autoscaler can try to scale up other similar pools or simply back off for a period of time.

Anything else you would like to add:
Running the same tests on a release without #5537 works as expected: the scale-up is correctly abandoned and the cluster stops trying to provision infrastructure in the capacity-constrained zone. I've identified a possible fix on the cluster-autoscaler side, but I think it may be more appropriate to fix this in CAPZ, since the change in behavior stems from here.

Environment:

  • cluster-api-provider-azure version: 1.20.0, and a forked 1.17.5 with "MachinePool: avoid SetNotReady during normal processing" (#5537) cherry-picked in.
  • Kubernetes version: (use kubectl version): Server Version: v1.30.5
  • OS (e.g. from /etc/os-release):
    PRETTY_NAME="Ubuntu 24.04.1 LTS"
    NAME="Ubuntu"
    VERSION_ID="24.04"
    VERSION="24.04.1 LTS (Noble Numbat)"
    VERSION_CODENAME=noble
    ID=ubuntu
    ID_LIKE=debian
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    UBUNTU_CODENAME=noble
    LOGO=ubuntu-logo
    

Metadata

Assignees: No one assigned

Labels:

  • kind/bug: Categorizes issue or PR as related to a bug.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Status: Todo