[cuda] Optimize device signal to host wait synchronization #17354
Job | Run time |
---|---|
6s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
10s | |
16s |