Display performance metrics for API/MCP requests in View API page #11764
Conversation
🪼 branch checks and previews
Install Gradio from this PR: pip install https://gradio-pypi-previews.s3.amazonaws.com/ecde9eb88eff291e09ed2d276dad87d820d3ac94/gradio-5.44.0-py3-none-any.whl
Install Gradio Python Client from this PR: pip install "gradio-client @ git+https://github.com/gradio-app/gradio@ecde9eb88eff291e09ed2d276dad87d820d3ac94#subdirectory=client/python"
Install Gradio JS Client from this PR: npm install https://gradio-npm-previews.s3.amazonaws.com/ecde9eb88eff291e09ed2d276dad87d820d3ac94/gradio-client-1.17.1.tgz
Use Lite from this PR: <script type="module" src="https://gradio-lite-previews.s3.amazonaws.com/ecde9eb88eff291e09ed2d276dad87d820d3ac94/dist/lite.js"></script>
🦄 change detected
This Pull Request includes changes to the following packages.
Whoa this looks awesome! +1 on including for the API docs (and color coding to indicate successful endpoints would be cool!)
We could also potentially expose this in the UI as well in interesting ways. For example, as pointed out here: #11149, errors on Spaces are frustrating because we don't know if it's user error or developer error. If a user had some idea of how often queries are usually successful, that would be a way of letting them know if they should try again (without leaking anything sensitive from a security perspective).
@@ -284,6 +303,7 @@
current_language = "python";
}
}
controller.abort();
Fetching the mcp_server_url opens a persistent connection. That means opening the View API page 5 or more times from the same browser session causes the whole page to freeze, because we hit the 5-concurrent-connection limit. So I changed it to close the connection immediately.
got it, interesting
gradio/mcp.py (Outdated)
@@ -845,6 +845,7 @@ async def get_complete_schema(self, request) -> JSONResponse:
"description": description,
"inputSchema": schema,
"meta": meta,
"endpoint_name": block_fn.api_name,
Should we just include this in the "meta" dict? That's less likely to break any downstream usage (I'm thinking of HF MCP in particular).
Yea good call
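For context, a minimal sketch of the alternative layout being discussed, with the endpoint name nested inside the existing "meta" dict instead of added as a new top-level key; the helper function below is hypothetical, not the actual gradio/mcp.py code:

```python
# Hypothetical sketch only: nest endpoint_name inside the existing "meta" dict
# so downstream consumers (e.g. the HF MCP integration) that only expect the
# original top-level keys are unaffected.
def build_tool_entry(description: str, schema: dict, meta: dict, api_name: str) -> dict:
    return {
        "description": description,
        "inputSchema": schema,
        "meta": {**meta, "endpoint_name": api_name},  # nested rather than top-level
    }

entry = build_tool_entry(
    description="Example tool",
    schema={"type": "object", "properties": {}},
    meta={"title": "predict"},
    api_name="/predict",
)
print(entry["meta"]["endpoint_name"])  # -> /predict
```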
@@ -295,7 +314,7 @@
});
</script>

- {#if info}
+ {#if info && analytics}
If there are no analytics, the entire api docs will not be visible?
I don't think so. Analytics will always be an object once the request completes. So this is just so that the page is not visible while there are requests in flight. But I will verify just in case.
test/test_queueing.py (Outdated)
api_name="/predict", | ||
) | ||
event_analytics = tc.get("/monitoring/summary").json() | ||
print("event_analytics", event_analytics) |
print("event_analytics", event_analytics) |
gradio/queueing.py (Outdated)
>= self.ANAYLTICS_CACHE_FREQUENCY
):
df = pd.DataFrame(list(event_analytics.values()))
self.n_events_since_last_analytics_cache = len(event_analytics)
This variable name is confusing since it sounds like the number of events since the last analytics cache, but actually it's the number of events at the time analytics were last cached. I'd suggest `n_events_at_last_cache` or even `event_count_at_last_cache`.
Tbh do we really need to cache? Seems like everything after this point should run extremely fast. In particular, the way we've cached here, we won't see any analytics at all until after 10 requests, which may be unexpected to developers. (I was confused why I wasn't seeing analytics because I initially thought `ANAYLTICS_CACHE_FREQUENCY` was measured in seconds.)
Yea, good call on the renaming. I think it makes sense to compute this once at the queue level rather than having each independent request to View API do the same computation in parallel. I can lower the `event_count_at_last_cache` to 1 though, so developers and users get more of a live view.
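To make the caching discussion above concrete, here is a rough, self-contained sketch of the "recompute every N events" pattern; the class and helper names are illustrative assumptions, and only the general idea (plus the suggested `event_count_at_last_cache` name) comes from this thread:

```python
# Illustrative sketch, not the actual gradio/queueing.py implementation.
# Analytics are recomputed only after a configurable number of new events,
# tracking the event count at the time of the last recomputation.
import pandas as pd


class AnalyticsCache:
    def __init__(self, cache_frequency: int = 10):
        self.cache_frequency = cache_frequency   # cf. GRADIO_ANALYTICS_CACHE_FREQUENCY
        self.event_count_at_last_cache = 0       # events seen when we last recomputed
        self.summary: dict = {}

    def maybe_recompute(self, event_analytics: dict) -> dict:
        # Only recompute when enough new events have arrived since the last cache.
        if len(event_analytics) - self.event_count_at_last_cache >= self.cache_frequency:
            df = pd.DataFrame(list(event_analytics.values()))
            self.event_count_at_last_cache = len(event_analytics)
            self.summary = {
                name: {
                    "total": int(len(group)),
                    "success_rate": float(group["success"].mean()),
                    "p50_latency": float(group["duration"].quantile(0.5)),
                    "p90_latency": float(group["duration"].quantile(0.9)),
                }
                for name, group in df.groupby("api_name")
            }
        return self.summary


# Toy usage: 12 events for one endpoint, a third of which failed.
events = {
    i: {"api_name": "/predict", "success": i % 3 != 0, "duration": 0.2 + 0.01 * i}
    for i in range(12)
}
print(AnalyticsCache(cache_frequency=10).maybe_recompute(events))
```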
I'm not sure if the success percentage is computed accurately, since it says 100% even after a request that errors out (see Screen.Recording.2025-08-28.at.1.27.06.PM.mov).
Very cool @freddyaboulton! I had a few suggestions on the UI side, feel free to take/ignore:
(1) Put the # of total requests first, e.g. "Total requests: 3 (40% successful) | p50..."
(2) Order the MCP tools and/or API endpoints by the most popular endpoints based on total requests.
(3) Color-code the "40% successful" bit as green (if 100% successful), red (if 0% successful), or orange (otherwise).
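As a rough illustration (not the shipped UI code), the three suggestions could combine into something like the following when rendering each endpoint's summary line; the function names and thresholds are just a reading of the comment above:

```python
# Illustrative sketch of the suggested ordering, formatting, and color-coding.
def success_color(success_pct: float) -> str:
    # Green only when every request succeeded, red when none did, orange otherwise.
    if success_pct >= 100:
        return "green"
    if success_pct <= 0:
        return "red"
    return "orange"


def render_summaries(stats: dict) -> list[str]:
    lines = []
    # Most popular endpoints (by total requests) first.
    for name, s in sorted(stats.items(), key=lambda kv: kv[1]["total"], reverse=True):
        pct = 100 * s["success_rate"]
        lines.append(
            f"{name} | Total requests: {s['total']} ({pct:.0f}% successful, "
            f"shown in {success_color(pct)}) | p50: {s['p50_latency']:.2f}s"
        )
    return lines


print("\n".join(render_summaries({
    "/predict": {"total": 3, "success_rate": 0.4, "p50_latency": 0.25},
    "/chat": {"total": 10, "success_rate": 1.0, "p50_latency": 0.12},
})))
```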
@abidlabs thank you for the review! I have addressed your feedback.
LGTM, great stuff @freddyaboulton! I actually think we could bring these performance metrics into the core Gradio UI to provide users more info/context about why their requests are erroring out, but that can be a separate PR. I'll merge this in and release. Let's make some noise about this tomorrow!
Description
For each MCP tool/resource/prompt and API endpoint, show the % of successful requests along with some latency percentiles. Only events that go through the queue are tracked.
For efficiency, the metrics are re-computed every 10 requests. This can be controlled via the GRADIO_ANALYTICS_CACHE_FREQUENCY env variable (purposely undocumented; used mainly for tests). While testing, if you want to see the metrics more frequently, launch the app like this:
GRADIO_ANALYTICS_CACHE_FREQUENCY=2 python <script>.py
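While testing, one quick way to inspect the cached metrics directly is to hit the /monitoring/summary route used in the test above; the URL/port and the exact response shape below are assumptions for illustration:

```python
# Assumes a Gradio app is running locally on the default port and that the
# monitoring summary route is reachable at /monitoring/summary (as in the test
# above); the response shape iterated over here is an assumption.
import httpx

resp = httpx.get("http://127.0.0.1:7860/monitoring/summary")
resp.raise_for_status()
summary = resp.json()
for endpoint, stats in summary.items():
    print(endpoint, stats)
```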
MCP Requests
API Requests
Closes: #11474