
Integration with Kueue #68

@kannon92


We have discussed comparisons with other schedulers in #29.

I think it would be worth describing how a Kueue integration would work.

KAI could support Kueue-managed Jobs, JobSets, and PyTorch jobs without much effort.

For KAI to support services, I think #63 is needed.

To expand on batch jobs, I think one needs to investigate whether Kueue's ClusterQueues/LocalQueues can be used in place of KAI Queues. Put simply, a Kueue integration (without Topology Aware Scheduling) could work like this: Kueue handles queueing, resuming workloads once there is capacity in the cluster, and KAI handles scheduling.
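As a rough illustration of that split, here is a minimal sketch, assuming a plain `batch/v1` Job, that builds a manifest carrying both halves of the responsibility. The `kueue.x-k8s.io/queue-name` label routes the Job to a Kueue LocalQueue, and `suspend: true` keeps it from starting until Kueue admits it. `schedulerName` hands the resulting pods to KAI. The LocalQueue name `team-a-queue` and the scheduler name `kai-scheduler` are assumptions for illustration; check them against the actual Kueue setup and KAI deployment. Any KAI-specific queue assignment is deliberately left out, since whether it can be dropped is the open question below.

```python
import json

# Hypothetical names, for illustration only.
LOCAL_QUEUE = "team-a-queue"          # assumed Kueue LocalQueue name
KAI_SCHEDULER_NAME = "kai-scheduler"  # assumed; verify against the KAI deployment


def build_job(name: str, image: str, gpus: int) -> dict:
    """Return a batch/v1 Job manifest: Kueue queues/admits it, KAI places its pods."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Kueue picks up Jobs with this label and queues them in the named LocalQueue.
            "labels": {"kueue.x-k8s.io/queue-name": LOCAL_QUEUE},
        },
        "spec": {
            # Start suspended; Kueue unsuspends the Job once its ClusterQueue has quota.
            "suspend": True,
            "template": {
                "spec": {
                    # After admission, pods go to KAI instead of the default scheduler.
                    "schedulerName": KAI_SCHEDULER_NAME,
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": image,
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid input for `kubectl apply -f -`.
    print(json.dumps(build_job("train", "example.com/train:latest", 1), indent=2))
```

In this arrangement Kueue never chooses nodes; it only decides when the Job may run. KAI only ever sees pods that Kueue has already admitted.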

For KAI maintainers, the main request would be to figure out what would be lost if KAI's queueing logic were folded into Kueue. Is anything missing in Kueue that would prevent KAI from using Kueue for queueing while KAI keeps responsibility for scheduling?
