@danielshaar commented Sep 17, 2025

Describe your changes

We have a bug in which we create excessive input plane attempts because we accidentally submit AttemptRetry every time AttemptAwait times out (after 55s). We haven't caught it because most input plane calls take less than a minute. This will be patched server-side.

This also fixes a separate bug where internal failures can eat into client-specified retries.

SVC-863
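
For readers skimming the thread, the two fixes described above can be sketched roughly as follows. This is a minimal illustration and not the actual client code: `FakeInputPlane`, `Outcome`, `call_with_retries`, and the `max_internal_failures` budget are hypothetical names and values invented for the example. The point is that a timed-out await keeps waiting on the same attempt, and internal failures are counted separately from user retries.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    SUCCESS = auto()
    TIMEOUT = auto()           # the await window elapsed without a result
    USER_FAILURE = auto()      # the user's function failed
    INTERNAL_FAILURE = auto()  # the platform failed, not the user's code


@dataclass
class FakeInputPlane:
    """Hypothetical in-memory stand-in for the server; replays scripted outcomes."""

    outcomes: list
    attempts_created: int = 0

    def attempt_start(self) -> None:
        self.attempts_created += 1

    def attempt_retry(self) -> None:
        self.attempts_created += 1

    def attempt_await(self) -> Outcome:
        return self.outcomes.pop(0)


def call_with_retries(plane, user_retries: int, max_internal_failures: int = 8) -> bool:
    """Return True on success, False once either retry budget is exhausted.

    max_internal_failures is an assumed value, chosen only for illustration.
    """
    plane.attempt_start()
    user_failures = 0
    internal_failures = 0
    while True:
        outcome = plane.attempt_await()
        if outcome is Outcome.SUCCESS:
            return True
        if outcome is Outcome.TIMEOUT:
            # Keep waiting on the same attempt; do not create a new one.
            continue
        if outcome is Outcome.USER_FAILURE:
            user_failures += 1
            if user_failures > user_retries:
                return False
        else:
            # Internal failures have their own budget and leave user retries untouched.
            internal_failures += 1
            if internal_failures > max_internal_failures:
                return False
        plane.attempt_retry()
```

With this shape, a call that outlasts several await windows still creates exactly one attempt, and a user retry policy of one retry still gets its retry even after internal failures.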

davidxia added a commit that referenced this pull request Sep 18, 2025
> Checks for functions with a high McCabe complexity

https://docs.astral.sh/ruff/rules/complex-structure/

Motivated by bugs like the one introduced in #3456 and fixed
in #3565.

Signed-off-by: David Xia <[email protected]>
@davidxia
Member

@cursor review

cursor[bot]

This comment was marked as outdated.

@danielshaar force-pushed the dshaar/fix-input-plane-retries branch from 7772425 to 41f9c58 on September 18, 2025 19:54
Comment on lines -319 to -320
# AttemptAwait will return a failure until this is 0. It is decremented by 1 each time AttemptAwait is called.
self.attempt_await_failures_remaining = 0

I refactored unit tests to not require this attribute.

@@ -91,7 +322,7 @@ def test_map(client, servicer, slow_put_inputs):

 @pytest.mark.parametrize("slow_put_inputs", [False, True])
 @pytest.mark.timeout(120)
-def test_map_inputplane(client, servicer, slow_put_inputs):
+def test_map_input_plane(client, servicer, slow_put_inputs):

This change and the ones below are unrelated; they simply fix the snake-casing of "input plane" to be consistent.

assert servicer.attempt_await_failures_remaining == 0


def test_retry_limit_on_internal_error(client: Client, servicer: MockClientServicer, monkeypatch):

@@ -35,25 +32,3 @@ def test_lookup_foo(client: Client, servicer: MockClientServicer):
assert f.remote() == "foo"
assert f._get_metadata().input_plane_url is not None
assert f._get_metadata().input_plane_region == "us-east"


def test_retry_on_internal_error(client: Client, servicer: MockClientServicer):

@davidxia
Member

I manually tested many of the unit test cases against prod. The client behaved as I expected.

),
],
)
def test_remote_input_plane(

Can we reduce the duration of some of these tests?

60.53s call     test/function_test.py::test_remote_input_plane[long running]
7.43s call     test/function_test.py::test_remote_input_plane[honor user retry policy]
7.41s call     test/function_test.py::test_remote_input_plane[internal failure does not use up user retries]
