@sanketkedia sanketkedia commented Sep 5, 2025

Description of changes

  • Improvements & Bug fixes
    • This PR adds client-side retries to the Python client for retryable errors from the frontend (FE).
  • New functionality
    • ...

Test plan

Tested in Tilt.

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None. We need to send out a notice asking users to upgrade their clients.

Observability plan

In tilt and staging

Documentation Changes

None

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@sanketkedia sanketkedia mentioned this pull request Sep 5, 2025

github-actions bot commented Sep 5, 2025

Reviewer Checklist

Please use this checklist to ensure your code review is thorough before approving.

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@sanketkedia sanketkedia marked this pull request as ready for review September 5, 2025 00:19
Add Client-Side Retries to Python FastAPI Client

This PR introduces automatic client-side retry logic for the Python FastAPI client in Chroma. Requests that hit retryable network or server-side errors (such as transient connection failures or HTTP 502/503/504 responses from the FE) are now attempted up to three times in total, with exponential backoff between attempts. The retry mechanism uses the tenacity library and applies to all HTTP requests issued by the client.

Key Changes

• Introduce is_retryable_exception function to classify exceptions that should trigger retries.
• Wrap the internal _make_request method with tenacity-based retry logic: 3 attempts, exponential backoff (start at 1s, max 60s), retries only for specific exceptions and server errors.
• Refactor the HTTP request workflow within _make_request to execute all logic within the retry block, re-raising the root cause if max attempts fail.
• Add import statements for the tenacity library and configure logging to trace retry attempts.
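
The behavior summarized in the list above can be illustrated with a small standard-library sketch. This is not the code from the PR (which uses tenacity): `HTTPStatusError`, `RETRYABLE_STATUS_CODES`, and `call_with_retries` are hypothetical names introduced here, and the exact set of exceptions the real `is_retryable_exception` accepts is an assumption based on the summary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical stand-in for an HTTP client's status error (not Chroma's type)."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


# Assumed from the summary: gateway-style errors are treated as transient.
RETRYABLE_STATUS_CODES = {502, 503, 504}


def is_retryable_exception(exc: BaseException) -> bool:
    """Return True for failures that look transient and are worth retrying."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    if isinstance(exc, HTTPStatusError):
        return exc.status_code in RETRYABLE_STATUS_CODES
    return False


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise the original error if it is not retryable
            # or if this was the last allowed attempt.
            if not is_retryable_exception(exc) or attempt == max_attempts:
                raise
            # Delay doubles per attempt (base, 2*base, 4*base, ...), capped at max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable only so the backoff schedule can be checked without actually waiting; with three attempts, at most two delays occur.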

Affected Areas

• chromadb/api/fastapi.py (Python client for FastAPI)
• All downstream features relying on FastAPI._make_request method

This summary was automatically generated by @propel-code-bot

sanketkedia commented Sep 5, 2025

20 Jobs Failed:

Logs were not found for any of the jobs listed below.

  • PR checks / all-required-pr-checks-passed
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/api)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/api/test_collection.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/api/test_limit_offset.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/distributed/test_log_backpressure.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/distributed/test_log_failover.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/distributed/test_repair_collection_log_offset.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/distributed/test_sanity.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_add.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_collections.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_collections_with_database_tenant.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_collections_with_database_tenant_ove...
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_embeddings.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_filtering.py)
  • PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_fork.py)

4 more jobs failed (See summary below for more details)

1 job failed running on non-Blacksmith runners.


Summary: 1 successful workflow, 1 failed workflow

Last updated: 2025-09-08 18:40:09 UTC

max=60
),
retry=retry_if_exception_type(is_retryable_exception),
before_sleep=before_sleep_log(logger, logging.INFO),

[CriticalError]

The retry condition uses retry_if_exception_type(is_retryable_exception) but is_retryable_exception returns a boolean, not an exception type. This should use retry_if_exception(is_retryable_exception) instead. The current code will cause tenacity to fail when trying to match exception types.

Suggested change
before_sleep=before_sleep_log(logger, logging.INFO),
retry=retry_if_exception(is_retryable_exception),

File: chromadb/api/fastapi.py
Line: 135

@jairad26 jairad26 left a comment

Let's add the same retry logic for JS as well.

@codetheweb codetheweb left a comment

Agreed, we should do this for JS as well.

retry=retry_if_exception_type(is_retryable_exception),
before_sleep=before_sleep_log(logger, logging.INFO),
reraise=True
)
Probably OK for now, but it might be good to make this configurable in the future.
