
nuclearcat (Member)

No description provided.

nuclearcat marked this pull request as ready for review on September 5, 2025, 17:57.
nuclearcat force-pushed the health-improve branch 4 times, most recently from 8ba961e to 02b23e9 on September 7, 2025, 12:35.
The current health check only monitors heartbeat updates and fails to
detect when individual scheduler threads crash or become unresponsive.
This is a problem because the threads use blocking event loops that can
legitimately wait for hours, so a heartbeat-only check cannot distinguish
a thread that is waiting from one that has crashed.

The health check now detects:
- Crashed or dead threads, via thread.is_alive()
- Threads that are missing because their startup failed
- How often threads are restarted by the exception handler
- The overall health status of the thread pool
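The four checks listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in this PR: it assumes each scheduler thread runs a function that should never return, and that threads are started through a wrapper that restarts them after exceptions. The class `ThreadHealthMonitor`, its methods, and the `restart_window`/`max_restarts` thresholds are hypothetical names; only `threading.Thread.is_alive()` comes from the description.

```python
import threading
import time
from collections import deque


class ThreadHealthMonitor:
    """Hypothetical sketch: track scheduler threads and their restarts."""

    def __init__(self, restart_window=600.0, max_restarts=3):
        self._lock = threading.Lock()
        self._threads = {}      # name -> threading.Thread
        self._expected = set()  # names of threads that should be running
        self._restarts = {}     # name -> deque of restart timestamps
        self.restart_window = restart_window  # seconds (assumed threshold)
        self.max_restarts = max_restarts      # per window (assumed threshold)

    def expect(self, name):
        """Declare a thread that must exist, even if its startup fails."""
        with self._lock:
            self._expected.add(name)
            self._restarts.setdefault(name, deque())

    def start(self, name, target):
        """Run target in a daemon thread, restarting it after exceptions."""
        self.expect(name)

        def runner():
            while True:
                try:
                    target()
                    return  # normal return: the thread ends, no restart
                except Exception:
                    # Record the restart so check() can measure frequency.
                    with self._lock:
                        self._restarts[name].append(time.monotonic())
                    time.sleep(0.01)  # brief back-off before restarting

        thread = threading.Thread(target=runner, name=name, daemon=True)
        with self._lock:
            self._threads[name] = thread
        thread.start()

    def check(self):
        """Return (healthy, details) covering every expected thread."""
        now = time.monotonic()
        details = {}
        with self._lock:
            for name in sorted(self._expected):
                thread = self._threads.get(name)
                recent = self._restarts[name]
                # Drop restarts that fall outside the sliding window.
                while recent and now - recent[0] > self.restart_window:
                    recent.popleft()
                if thread is None:
                    status = "missing"   # startup never created it
                elif not thread.is_alive():
                    status = "dead"      # thread has exited
                elif len(recent) > self.max_restarts:
                    status = "flapping"  # alive, but restarting too often
                else:
                    status = "ok"
                details[name] = {"status": status,
                                 "recent_restarts": len(recent)}
        healthy = all(d["status"] == "ok" for d in details.values())
        return healthy, details
```

In this sketch, is_alive() stays True for a thread blocked in Event.wait(), so a thread legitimately waiting for hours reports "ok". Only exited threads, threads that were never started, and threads restarting more than max_restarts times within restart_window are flagged.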

Signed-off-by: Denys Fedoryshchenko <[email protected]>
aliceinwire (Member)

nice!
