Description
We have discussed some comparisons of other schedulers (#29).
I think it would be worth describing how a Kueue integration would work.
Through Kueue, KAI could support Jobs, JobSets, and PyTorch jobs without much effort.
For KAI to support services, I think #63 is needed.
To expand on batch jobs, I think one needs to investigate whether Kueue's ClusterQueues/LocalQueues can be used in place of KAI Queues. To put it simply, a Kueue integration (sans Topology Aware Scheduling) could have Kueue handle queueing, that is, holding workloads and resuming them once there is capacity in the cluster, while KAI handles scheduling.
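To make the split concrete, here is a rough sketch of what a Job submitted under such an integration might look like, written as a small Python helper that builds the manifest. The `kueue.x-k8s.io/queue-name` label and `spec.suspend` are how Kueue picks up and gates a Job today; the `kai-scheduler` scheduler name is my assumption about how the pods would be handed to KAI, and the helper name and its arguments are purely illustrative. Whether KAI would still need its own queue assignment on the pods is exactly the open question, so the sketch leaves that out.

```python
import json


def build_job_manifest(name, local_queue, image, scheduler_name="kai-scheduler"):
    """Return a batch/v1 Job manifest as a plain dict (hypothetical helper).

    Kueue side: the queue-name label points at a LocalQueue, and the Job
    starts suspended so that Kueue decides when it is admitted.
    KAI side: schedulerName routes the pods to KAI once the Job is resumed
    (the default value here is an assumption, not confirmed in this issue).
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Kueue uses this label to find the LocalQueue for the Job.
            "labels": {"kueue.x-k8s.io/queue-name": local_queue},
        },
        "spec": {
            # Created suspended; Kueue unsuspends it when quota is available.
            "suspend": True,
            "template": {
                "spec": {
                    # Pod placement is left to KAI rather than the default scheduler.
                    "schedulerName": scheduler_name,
                    "restartPolicy": "Never",
                    "containers": [{"name": "main", "image": image}],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest("sample-job", "team-a-queue", "busybox")
    print(json.dumps(manifest, indent=2))
```

In this picture Kueue never places pods and KAI never decides admission, which is the division of labour proposed above.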
For KAI maintainers, the main request would be to figure out what would be lost if KAI's queueing logic were folded into Kueue. Is there anything missing in Kueue that would prevent KAI from using Kueue for queueing while leaving scheduling to KAI?