WaitHandleWait profiler #5814

Open
verdie-g opened this issue Jul 26, 2024 · 3 comments

Assignees: gleocadie
Labels: area:profiler (issues related to the continuous profiler)

Comments

@verdie-g (Contributor)

Is your feature request related to a problem? Please describe.

Thread pool starvation is a very common performance issue in my company's services. It is usually caused by sync-over-async code. It is relatively easy to find when the issue is continuous, but it gets tricky when it occurs in the callback of a 15-minute timer.
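
To make the pattern concrete, here is a minimal, hypothetical repro of sync-over-async (illustrative only, not code from our services):

```csharp
using System;
using System.Threading.Tasks;

class SyncOverAsyncDemo
{
    static async Task<string> FetchAsync()
    {
        await Task.Delay(100); // stands in for a real async I/O call
        return "done";
    }

    // Sync-over-async: .Result blocks the current thread-pool thread until
    // the task completes. Under load, every pool thread can end up parked
    // like this; the pool injects new threads slowly, so queued work
    // (including the continuations these threads are waiting on) starves.
    static void Handler() => Console.WriteLine(FetchAsync().Result);

    static void Main() => Handler();
}
```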

Describe the solution you'd like

In .NET 9+, I added an event (dotnet/runtime#94737) that is emitted whenever a managed thread blocks (Monitor.Wait, Task.Wait, ManualResetEventSlim.Wait, SemaphoreSlim.Wait, ...). It would be awesome if this event were collected just like the exception, allocation, and contention profilers collect theirs.
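
For reference, a minimal in-process sketch of consuming these events with an EventListener. The event names (WaitHandleWaitStart/WaitHandleWaitStop) and the keyword value are from my recollection of the runtime definitions, so treat them as assumptions and verify them against your runtime version:

```csharp
using System;
using System.Diagnostics.Tracing;

// Sketch: subscribe to the .NET 9+ wait events from the runtime event source.
// 0x40000000000 is assumed to be the WaitHandle keyword.
sealed class WaitHandleWaitListener : EventListener
{
    private const EventKeywords WaitHandleKeyword = (EventKeywords)0x40000000000;

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(source, EventLevel.Verbose, WaitHandleKeyword);
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventName is "WaitHandleWaitStart" or "WaitHandleWaitStop")
            Console.WriteLine($"{e.TimeStamp:O} {e.EventName} thread={e.OSThreadId}");
    }
}
```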

One annoying thing about this event is that most occurrences will happen on non-thread-pool threads that spend most of their time blocked. For that, I opened dotnet/runtime#102326.

Describe alternatives you've considered

Currently, I take very long traces of the WaitHandleWait events and try to find the issue in PerfView, but it's not reliable.
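
For anyone wanting to reproduce: I collect those traces with something like `dotnet-trace collect -p <pid> --providers Microsoft-Windows-DotNETRuntime:0x40000000000:5` (assuming 0x40000000000 is the WaitHandle keyword; 5 is the verbose level) and open the resulting .nettrace file in PerfView.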

@gleocadie (Collaborator)

Hi @verdie-g,

Thanks for the FR, that's really interesting. Thread pool starvation is exactly the kind of thing we are trying to detect and report in our UI.
Currently, we push a metric on the number of managed threads, which can help detect thread pool starvation (a continuously increasing thread count), but nothing more (no call stacks, no additional info).
We want to add more information, and thanks to your PR we could add another level of detail (thanks for that :) ).
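
(For context, a rough sketch of the kind of signal that metric captures, using the public ThreadPool counters; illustrative only, not our actual collector:)

```csharp
using System;
using System.Threading;

// Illustrative probe: a continuously climbing ThreadCount together with a
// growing PendingWorkItemCount is the starvation signature described above.
class ThreadPoolProbe
{
    static void Main()
    {
        while (true)
        {
            Console.WriteLine(
                $"threads={ThreadPool.ThreadCount} queued={ThreadPool.PendingWorkItemCount}");
            Thread.Sleep(1000);
        }
    }
}
```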

We can collect this event, and it will be done for all managed threads. As of today, the filtering is done in the UI (using the thread name/id, etc.).

Just to be sure, and to make this useful for everyone: as a user, how would you like that information displayed so that it gives you actionable ways to investigate and fix the issue (call stacks, metrics, monitors...)?

We'll let you know about the status of this FR.
Thanks for the proposal.

gleocadie added the area:profiler label Jul 31, 2024

@verdie-g (Contributor, Author)

> Just to be sure, and to make this useful for everyone: as a user, how would you like that information displayed?

I would like to see a flamegraph of the waits, just like the contention one.

Ideally, the unit would be the wait time; the number of events can be extremely misleading. Unlike the contention event, it was decided not to include the duration in the wait event, so in my custom PerfView I use the event's thread id to match the start and the stop.

As a metric, the total wait time could also be interesting. For example, when a latency spike is observed between two services and the calling service shows a spike in wait time, that helps establish that the issue is not in the called service.
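
In case it helps, a rough sketch of that matching logic (same assumed event names and keyword value as in my sketch above):

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Tracing;

// Sketch: compute wait durations by pairing each WaitHandleWaitStop with the
// pending WaitHandleWaitStart on the same OS thread, since the stop event
// itself carries no duration.
sealed class WaitDurationListener : EventListener
{
    private readonly ConcurrentDictionary<long, DateTime> _pendingStarts = new();

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(source, EventLevel.Verbose, (EventKeywords)0x40000000000); // assumed WaitHandle keyword
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventName == "WaitHandleWaitStart")
        {
            _pendingStarts[e.OSThreadId] = e.TimeStamp;
        }
        else if (e.EventName == "WaitHandleWaitStop" &&
                 _pendingStarts.TryRemove(e.OSThreadId, out DateTime start))
        {
            TimeSpan waited = e.TimeStamp - start;
            Console.WriteLine($"thread {e.OSThreadId} waited {waited.TotalMilliseconds:F1} ms");
        }
    }
}
```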

@gleocadie (Collaborator)

> I would like to see a flamegraph of the waits, just like the contention one. [...] As a metric, the total wait time could also be interesting.

Thanks, I'll create a FR in our backlog and we'll plan it.

gleocadie self-assigned this Aug 1, 2024