
Conversation

@freddyaboulton (Collaborator) commented Aug 21, 2025

Description

For each MCP tool/resource/prompt and API endpoint, show the percentage of successful requests along with some latency percentiles. Only events that go through the queue are tracked.

For efficiency, the metrics are re-computed every 10 requests. This can be controlled via the GRADIO_ANALYTICS_CACHE_FREQUENCY env variable (purposely undocumented; used mainly for tests).

While testing, if you want to see the metrics more frequently, launch the app like this: GRADIO_ANALYTICS_CACHE_FREQUENCY=2 python <script>.py
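
For a quick sanity check of the raw numbers, something like the following works. This is a minimal sketch, not this PR's code: it assumes the app is running locally on port 7860, that an endpoint named /predict exists, and that the summary route is /monitoring/summary (the path used by the test later in this thread); host, port, and api_name are placeholders.

```python
# Minimal sketch: generate one queue event, then read the cached metrics.
# Assumptions: app running on 127.0.0.1:7860, an endpoint named /predict,
# and the /monitoring/summary route used by the test later in this thread.
import requests
from gradio_client import Client

client = Client("http://127.0.0.1:7860")
client.predict("hello", api_name="/predict")  # goes through the queue

summary = requests.get("http://127.0.0.1:7860/monitoring/summary").json()
print(summary)  # per-endpoint success rate and latency percentiles
```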

MCP Requests

[screenshot: MCPMetrics]

API Requests

[screenshot: APIRequests]

Closes: #11474

🎯 PRs Should Target Issues

Before you create a PR, please check to see if there is an existing issue for this change. If not, please create an issue before you create this PR, unless the fix is very small.

Not adhering to this guideline will result in the PR being closed.

Testing and Formatting Your Code

  1. PRs will only be merged if tests pass on CI. We recommend at least running the backend tests locally: set up your Gradio environment and run bash scripts/run_backend_tests.sh

  2. Please run these bash scripts to automatically format your code: bash scripts/format_backend.sh, and (if you made any changes to non-Python files) bash scripts/format_frontend.sh

@gradio-pr-bot (Collaborator) commented Aug 21, 2025

🪼 branch checks and previews

Name      | Status | URL
Spaces    | ready! | Spaces preview
Website   | ready! | Website preview
Storybook | ready! | Storybook preview
🦄 Changes detected! Details

Install Gradio from this PR

pip install https://gradio-pypi-previews.s3.amazonaws.com/ecde9eb88eff291e09ed2d276dad87d820d3ac94/gradio-5.44.0-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@ecde9eb88eff291e09ed2d276dad87d820d3ac94#subdirectory=client/python"

Install Gradio JS Client from this PR

npm install https://gradio-npm-previews.s3.amazonaws.com/ecde9eb88eff291e09ed2d276dad87d820d3ac94/gradio-client-1.17.1.tgz

Use Lite from this PR

<script type="module" src="https://gradio-lite-previews.s3.amazonaws.com/ecde9eb88eff291e09ed2d276dad87d820d3ac94/dist/lite.js"></script>

@gradio-pr-bot (Collaborator) commented Aug 21, 2025

🦄 change detected

This Pull Request includes changes to the following packages.

Package      | Version
@gradio/core | minor
gradio       | minor

  • Display performance metrics for API/MCP requests in View API page

‼️ Changeset not approved by maintainers. Ensure the version bump is appropriate for all packages before approving.

  • Maintainers can approve the changeset by selecting this checkbox.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@abidlabs (Member):
Whoa this looks awesome! +1 on including for the API docs (and color coding to indicate successful endpoints would be cool!)

@abidlabs (Member):
We could also potentially expose this in the UI as well in interesting ways. For example, as pointed out here: #11149, errors on Spaces are frustrating because we don't know if it's user error or developer error. If a user had some idea of how often queries are usually successful, that would be a way of letting them know if they should try again (without leaking anything sensitive from a security perspective)

@@ -284,6 +303,7 @@
current_language = "python";
}
}
controller.abort();
@freddyaboulton (Collaborator, Author):

Fetching the mcp_server_url opens a persistent connection. That means opening the view api page 5 or more times from the same browser session causes the whole page to freeze because we reach the 5 concurrent connection limit. So I changed it to close the connection immediately.

Member:

got it, interesting

@freddyaboulton freddyaboulton changed the title Performance metrics for MCP requests Display performance metrics for API/MCP requests in View API page Aug 28, 2025
@freddyaboulton freddyaboulton marked this pull request as ready for review August 28, 2025 15:47
@freddyaboulton freddyaboulton requested review from abidlabs, pngwn, aliabd, aliabid94, dawoodkhan82 and hannahblair and removed request for abidlabs and pngwn August 28, 2025 15:47
gradio/mcp.py Outdated
@@ -845,6 +845,7 @@ async def get_complete_schema(self, request) -> JSONResponse:
"description": description,
"inputSchema": schema,
"meta": meta,
"endpoint_name": block_fn.api_name,
Member:

Should we just include this in the "meta" dict? That's less likely to break any downstream usage (I'm thinking of HF MCP in particular).

@freddyaboulton (Collaborator, Author):

Yea good call

@@ -295,7 +314,7 @@
});
</script>

-{#if info}
+{#if info && analytics}
Member:

If there are no analytics, the entire api docs will not be visible?

@freddyaboulton (Collaborator, Author):

I don't think so. Analytics will always be an object once the request completes. So this is just so that the page is not visible while there are requests in flight. But I will verify just in case.

api_name="/predict",
)
event_analytics = tc.get("/monitoring/summary").json()
print("event_analytics", event_analytics)
Member:

Suggested change: remove the debug line print("event_analytics", event_analytics)

>= self.ANAYLTICS_CACHE_FREQUENCY
):
df = pd.DataFrame(list(event_analytics.values()))
self.n_events_since_last_analytics_cache = len(event_analytics)
Member:

This variable name is confusing since it sounds like the number of events since the last analytics cache, but actually, it's the number of events at the time analytics were last cached. I'd suggest n_events_at_last_cache or even event_count_at_last_cache.

Member:

Tbh do we really need to cache? Seems like everything after this point should run extremely fast. In particular, the way we've cached here, we won't see any analytics at all until after 10 requests which may be unexpected to developers. (I was confused why I wasn't seeing analytics because I initially thought ANAYLTICS_CACHE_FREQUENCY was measured in terms of seconds)

@freddyaboulton (Collaborator, Author):

Yea good call on the renaming. I think it makes sense to compute this once at the queue level rather than having each independent request to View API do the same computation in parallel. I can lower the event_count_at_last_cache to 1 though so developers and users get more of a live view.
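
For reference, the pattern under discussion looks roughly like this. It is a sketch, not the merged code: the variable follows the rename suggested in this thread, and the endpoint/status/duration column names and percentile choices are illustrative assumptions.

```python
# Sketch of the recompute-every-N-events caching pattern discussed above.
# Not the merged implementation: the variable follows the rename suggested in
# this thread, and the "endpoint"/"status"/"duration" columns are assumptions.
import os

import pandas as pd


class AnalyticsSummaryCache:
    def __init__(self) -> None:
        # N=10 was the PR's initial default; N=1 recomputes on every event.
        self.cache_frequency = int(
            os.environ.get("GRADIO_ANALYTICS_CACHE_FREQUENCY", 10)
        )
        self.event_count_at_last_cache = 0
        self.cached_summary: dict = {}

    def maybe_recompute(self, event_analytics: dict) -> dict:
        # Recompute only when enough new events arrived since the last cache.
        if (
            len(event_analytics) - self.event_count_at_last_cache
            >= self.cache_frequency
        ):
            df = pd.DataFrame(list(event_analytics.values()))
            self.event_count_at_last_cache = len(event_analytics)
            self.cached_summary = {
                endpoint: {
                    "total_requests": len(group),
                    "success_rate": (group["status"] == "success").mean() * 100,
                    "p50": group["duration"].quantile(0.5),
                    "p90": group["duration"].quantile(0.9),
                }
                for endpoint, group in df.groupby("endpoint")
            }
        return self.cached_summary
```

Note that, as pointed out above, with a frequency of 10 this returns an empty summary until ten events have accumulated, which is exactly why the thread settles on computing the summary on every event.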

@abidlabs (Member):

I'm not sure if the success percentage is computed accurately since it says 100% even after a request that errors out:

[video: Screen.Recording.2025-08-28.at.1.27.06.PM.mov]

@abidlabs (Member):

Very cool @freddyaboulton! A few suggestions on the UI side, feel free to take/ignore: (1) put the total number of requests first, e.g. "Total requests: 3 (40% successful) | p50..."; (2) order the MCP tools and/or API endpoints by popularity, based on total requests; (3) color-code the "40% successful" bit as green (if 100% successful), red (if 0% successful), or orange (otherwise). A sketch of that bucketing follows the screenshot below.

[screenshot]
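
The color bucketing in suggestion (3) amounts to something like the following; the thresholds follow the comment above, and the function name is hypothetical (the merged change implements this in the Svelte frontend, not Python).

```python
# Illustrative helper for suggestion (3): bucket a success percentage into a
# display color. Thresholds follow the comment above; the name is hypothetical.
def success_rate_color(success_rate: float) -> str:
    if success_rate >= 100:
        return "green"   # every request succeeded
    if success_rate <= 0:
        return "red"     # every request failed
    return "orange"      # mixed results
```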

@freddyaboulton (Collaborator, Author) commented Aug 28, 2025

@abidlabs thank you for the review!

I have addressed your feedback:

  • Color-coded the success rate
  • Put the number of requests first
  • Compute the summary on every event
  • Fixed a bug in calculating the success rate

@abidlabs (Member) left a comment:

LGTM, great stuff @freddyaboulton! I actually think we could bring these performance metrics into the core Gradio UI to provide users more info/context about why their requests are erroring out, but that can be a separate PR. I'll merge this in and release. Let's make some noise about this tomorrow!

@abidlabs abidlabs merged commit e6ce731 into main Aug 28, 2025
21 of 24 checks passed
@abidlabs abidlabs deleted the mcp-metrics branch August 28, 2025 23:38

Successfully merging this pull request may close these issues.

Expose % of successful requests that were processed by MCP server along with basic stats (e.g. average time, distribution of time)