@aadit-juneja aadit-juneja commented Sep 12, 2025

to let user check if sandbox has been scheduled to a worker

UPDATED: replacing wait_for_scheduled name with wait_until_running

Describe your changes

This PR provides a user-facing API to check whether a sandbox has been assigned to a worker (Linear issue ART-5 in Agent Runtime).
It exposes a wait_until_running() function that can be utilized to check if the sandbox has been assigned to a worker, raising an error if no worker is assigned and returning None if a worker gets assigned.

Checklists

Compatibility checklist

Check these boxes or delete any item (or this section) if not relevant for this PR.

  • Client+Server: this change is compatible with old servers
  • Client forward compatibility: this change ensures client can accept data intended for later versions of itself

Note on protobuf: protobuf message changes in one place may have impact to
multiple entities (client, server, worker, database). See points above.


Changelog

  • Add a wait_until_running() method to the Sandbox class. It wraps _get_task_id() to give users a discoverable way to check whether their sandbox has been assigned to a worker.
import time

import modal

app = modal.App.lookup("my-app", create_if_missing=True)

sb = modal.Sandbox.create(app=app)
start = time.time()
sb.wait_until_running()  # returns None once the sandbox has been assigned
elapsed = time.time() - start
print(f"Sandbox assigned in {elapsed:.2f} seconds")
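
The wrapping relationship described in the changelog can be sketched with a stand-in class. This is a minimal illustration and not Modal's implementation: `PollingSandbox`, `_poll_backend`, the `"ta-123"` ID, and the poll counts are all hypothetical, and only the names `_get_task_id` and `wait_until_running` come from this PR.

```python
import asyncio
from typing import Optional


class PollingSandbox:
    """Hypothetical stand-in for a sandbox; not Modal's implementation."""

    def __init__(self, polls_until_assigned: int = 3, poll_interval: float = 0.01) -> None:
        self._task_id: Optional[str] = None
        self._polls_left = polls_until_assigned  # simulates scheduling latency
        self._poll_interval = poll_interval

    async def _poll_backend(self) -> Optional[str]:
        # Assumption: a real client would make a network request here.
        self._polls_left -= 1
        return "ta-123" if self._polls_left <= 0 else None

    async def _get_task_id(self) -> str:
        # Poll until a task ID is available, sleeping between attempts.
        while self._task_id is None:
            self._task_id = await self._poll_backend()
            if self._task_id is None:
                await asyncio.sleep(self._poll_interval)
        return self._task_id

    async def wait_until_running(self) -> None:
        """Block until a task ID exists; the ID itself is discarded."""
        await self._get_task_id()
```

In this sketch the public method adds no logic of its own. Its only job is to expose the existing private polling loop under a name users can find.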

@mwaskom
Contributor

mwaskom commented Sep 12, 2025

It exposes a wait_for_scheduled() function that can be utilized to check if the sandbox has been assigned to a worker, raising an error if no worker is assigned and returning None if a worker gets assigned.

This is a little confusing to me. The name implies that the function will block until the Sandbox has been scheduled. So under what circumstances does it raise?

@mwaskom
Contributor

mwaskom commented Sep 12, 2025

Did you consider making this a parameter of modal.Sandbox.create?

@aadit-juneja
Author

Oh my bad, I initially thought retry_transient_errors always enforced a timeout, but that isn't the case.

I've added support for a wait_for_scheduled boolean parameter in the modal.Sandbox.create method. This gets passed to the _create method.

modal/sandbox.py Outdated
@@ -320,6 +320,7 @@ async def create(
 ] = None, # Experimental controls over fine-grained scheduling (alpha).
 client: Optional[_Client] = None,
 environment_name: Optional[str] = None, # *DEPRECATED* Optionally override the default environment
+wait_for_scheduled: bool = False, # Enables waiting for the sandbox to be assigned to worker before returning
Contributor

Would wait_until_running be a better name? I'm having trouble parsing wait_for_scheduled.

Author

Yeah wait_for_scheduled could be confusing. I will change it to wait_until_running.

@aadit-juneja aadit-juneja changed the title add wait_for_scheduled() function to let user check if sandbox has be… add wait_until_running() function to let user check if sandbox has be… Sep 12, 2025
modal/sandbox.py Outdated
@@ -320,7 +320,7 @@ async def create(
 ] = None, # Experimental controls over fine-grained scheduling (alpha).
 client: Optional[_Client] = None,
 environment_name: Optional[str] = None, # *DEPRECATED* Optionally override the default environment
-wait_for_scheduled: bool = False, # Enables waiting for the sandbox to be assigned to worker before returning
+wait_until_running: bool = False, # Enable waiting for the sandbox to be assigned to a worker before returning
Contributor

Suggested change
-wait_until_running: bool = False, # Enable waiting for the sandbox to be assigned to a worker before returning
+wait_until_running: bool = False, # Wait for the sandbox to start running before returning

The "worker" concept is part of our internal ontology but not something we really expose to users. And it's redundant to say "Enable ..." when you have a boolean parameter.

Author

@aadit-juneja aadit-juneja Sep 15, 2025

I see, I can get rid of the references to a worker in the comments and the use of enable.

modal/sandbox.py Outdated
@@ -706,8 +706,8 @@ async def _get_task_id(self) -> str:
 await asyncio.sleep(0.5)
 return self._task_id

-async def wait_for_scheduled(self) -> None:
-"""Allows user to wait for the sandbox to be scheduled onto a worker."""
+async def wait_until_running(self) -> None:
Contributor

Is there a use case for a public method here if we're going to have a boolean for it in the constructor? Could the constructor call self._get_task_id() directly?

Author

I see, that makes sense. I can get rid of the extra method and just call _get_task_id() directly in the _create() function.
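
For reference, the constructor-flag variant discussed in this thread might look roughly like the sketch below. It uses a hypothetical stand-in class (`FlaggedSandbox`, with a `running` attribute and a made-up `"ta-123"` ID) rather than the real `modal.Sandbox`, whose `create` takes many more arguments. Only the `wait_until_running` parameter name and the `_get_task_id` call come from the discussion.

```python
import asyncio


class FlaggedSandbox:
    """Hypothetical stand-in; not the real modal.Sandbox."""

    def __init__(self) -> None:
        self.running = False

    async def _get_task_id(self) -> str:
        # Stand-in for the polling loop: pretend scheduling takes a moment.
        await asyncio.sleep(0.01)
        self.running = True
        return "ta-123"  # made-up task ID

    @classmethod
    async def create(cls, wait_until_running: bool = False) -> "FlaggedSandbox":
        sb = cls()
        if wait_until_running:
            # Block inside create() until the sandbox has started.
            await sb._get_task_id()
        return sb
```

With the default `wait_until_running=False`, `create` returns immediately and existing callers see no change in behavior.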

@paulgb
Member

paulgb commented Sep 15, 2025

We discussed this on the Agent Runtime team and are in favor of this being a separate method rather than a sandbox parameter for these reasons:

  • The goal here is to increase the discoverability of the functionality. The more intuitive place for someone to look is in the tab-completion list of a sandbox instance, rather than in the constructor IMHO.
  • The argument list of a constructor typically takes attributes of that object; adding additional blocking behavior based on it feels unintuitive.
  • It doesn't support the pattern of:
    • Start a sandbox as part of initialization and await to ensure that the request is processed
    • Do some work unrelated to the sandbox
    • Wait for the sandbox to start

My vote is to remove the constructor arg and support only the method.
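
The three-step pattern in the last bullet can be sketched as follows. This is an illustration under assumptions, not Modal code: `DeferredSandbox` is a hypothetical stand-in that simulates scheduling with a background asyncio task, and the sleep durations are arbitrary.

```python
import asyncio
from typing import List, Optional


class DeferredSandbox:
    """Hypothetical stand-in whose scheduling proceeds in the background."""

    def __init__(self) -> None:
        self._started = asyncio.Event()
        self._scheduler: Optional[asyncio.Task] = None

    async def _simulate_scheduling(self) -> None:
        await asyncio.sleep(0.01)  # pretend the platform needs a moment
        self._started.set()

    @classmethod
    async def create(cls) -> "DeferredSandbox":
        # Returns as soon as the request is accepted, before the sandbox runs.
        sb = cls()
        sb._scheduler = asyncio.create_task(sb._simulate_scheduling())
        return sb

    async def wait_until_running(self) -> None:
        await self._started.wait()


async def main() -> List[str]:
    events: List[str] = []
    sb = await DeferredSandbox.create()  # 1. start the sandbox
    events.append("created")
    await asyncio.sleep(0.02)  # 2. do work unrelated to the sandbox
    events.append("other work done")
    await sb.wait_until_running()  # 3. wait for the sandbox to start
    events.append("running")
    return events
```

A boolean on the constructor cannot express step 2, because the blocking would happen inside `create` before the caller regains control.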

@paulgb paulgb changed the title add wait_until_running() function to let user check if sandbox has be… add wait_until_running() function [ART-5] Sep 15, 2025
@mwaskom
Contributor

mwaskom commented Sep 15, 2025

OK — FWIW I actually think that I would have expected modal.Sandbox.create to block until the Sandbox is actually running as its default behavior. That aligns with my naive assumption of what it means to "create" a Sandbox: it starts a container running somewhere.

Not sure if it's weird to have modal.Sandbox.wait() and modal.Sandbox.wait_until_running() methods; probably in retrospect you'd rather have wait_until_running / wait_until_finished?
