Fixes for shutting down during async operations #1141

bghgary · 2022-09-29T00:47:59Z

This change sets up a coding pattern for invoking continuations on the JavaScript thread while properly handling garbage collection and shutdown scenarios. This pattern is described in the JsRuntimeScheduler class. See the class comment for more information.

Core/JsRuntime/Include/Babylon/JsRuntimeScheduler.h

If the shared_ptr is copied, the final release may end up on a different thread. Adding std::forward ensures that the lambda is moved rather than copied, which prevents multiple threads from owning the shared_ptr. The shared_ptr is a hack anyway. The correct fix is to drop the shared_ptr and make Dispatch use the Dispatchable class from AppRuntime or, if we can use C++23 someday, std::move_only_function.
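To make the copy-versus-move point concrete, here is a minimal sketch. It assumes a stand-in queue: `DemoQueue`, `DispatchForwarded`, `DispatchCopied` and `OwnersAfterDispatch` are hypothetical names for illustration, not the BabylonNative API. It only shows how forwarding keeps a captured shared_ptr at a single owner.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for a work queue; not the BabylonNative API.
class DemoQueue
{
public:
    // Forwards the callable, so an rvalue lambda is moved into the queue.
    template<typename CallableT>
    void DispatchForwarded(CallableT&& callable)
    {
        m_tasks.emplace(std::forward<CallableT>(callable));
    }

    // Always copies the callable, duplicating any captured shared_ptr.
    template<typename CallableT>
    void DispatchCopied(const CallableT& callable)
    {
        m_tasks.emplace(callable);
    }

private:
    std::queue<std::function<void()>> m_tasks;
};

// Returns the number of shared_ptr owners observed right after dispatching.
inline long OwnersAfterDispatch(bool forward)
{
    DemoQueue queue;
    auto payload = std::make_shared<int>(42);
    std::weak_ptr<int> observer{payload};

    // The lambda becomes the only owner of the payload.
    auto task = [payload = std::move(payload)]() { (void)*payload; };

    if (forward)
    {
        queue.DispatchForwarded(std::move(task));
    }
    else
    {
        queue.DispatchCopied(task);
    }

    return observer.use_count();
}
```

With forwarding, the queued task is the only owner, so the final release happens wherever the queue destroys the task. With the copying path, the caller's lambda and the queued task both own the payload, and whichever is destroyed last performs the release.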

Core/AppRuntime/Source/WorkQueue.cpp

Core/AppRuntime/Source/AppRuntime_Chakra.cpp

Core/JsRuntime/Include/Babylon/JsRuntime.h

ryantrem · 2023-05-16T00:05:53Z

Core/JsRuntime/InternalInclude/Babylon/JsRuntimeScheduler.h

+    //
+    //   private:
+    //       arcana::cancellation_source m_cancellationSource;
+    //       JsRuntimeScheduler m_runtimeScheduler;


Will you add a constructor to this example code just to show how the JsRuntimeScheduler is initialized?

Yes, that's a good idea.

Core/JsRuntime/InternalInclude/Babylon/JsRuntimeScheduler.h

ryantrem · 2023-05-16T00:15:30Z

Core/JsRuntime/Source/JsRuntime.cpp

+
+    void JsRuntime::RegisterDisposing(JsRuntime& runtime, IDisposingCallback* callback)
+    {
+        auto& callbacks = runtime.m_disposingCallbacks;


How can we be certain this won't be called after NotifyDisposing (which invalidates m_disposingCallbacks)?

It shouldn't happen assuming correct usage patterns with cancellation. But if it does happen, then callbacks registered after NotifyDisposing will not fire. All of these *Disposing functions must be called on the JavaScript thread.
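A simplified model of the behavior described in this reply, as a sketch only. `DemoRuntime`, `CountingCallback` and `FiredCount` are hypothetical, the functions are plain members rather than the static form in the diff, and the way `NotifyDisposing` empties the list is an assumption, since the diff does not show it.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified model; names mirror the diff but this is not the real JsRuntime.
struct IDisposingCallback
{
    virtual void Disposing() = 0;
    virtual ~IDisposingCallback() = default;
};

class DemoRuntime
{
public:
    void RegisterDisposing(IDisposingCallback* callback)
    {
        m_disposingCallbacks.push_back(callback);
    }

    void NotifyDisposing()
    {
        // Assumption: the list is taken and left empty, so it fires only once.
        auto callbacks = std::move(m_disposingCallbacks);
        m_disposingCallbacks.clear();
        for (auto* callback : callbacks)
        {
            callback->Disposing();
        }
    }

private:
    std::vector<IDisposingCallback*> m_disposingCallbacks;
};

struct CountingCallback : IDisposingCallback
{
    int fired{0};
    void Disposing() override { ++fired; }
};

// Returns how many times the callback fired for the given registration order.
inline int FiredCount(bool registerBeforeNotify)
{
    DemoRuntime runtime;
    CountingCallback callback;
    if (registerBeforeNotify)
    {
        runtime.RegisterDisposing(&callback);
    }
    runtime.NotifyDisposing();
    if (!registerBeforeNotify)
    {
        runtime.RegisterDisposing(&callback); // too late: never fires
    }
    return callback.fired;
}
```

In this model a late registration is silently ignored rather than crashing, which matches the reply above. The single-thread requirement is not modeled here.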

ryantrem · 2023-05-16T22:00:03Z

Plugins/NativeCamera/Source/MediaDevices.cpp

@@ -38,8 +38,7 @@ namespace Babylon::Plugins::Internal
                }
            }

-            auto runtimeScheduler{std::make_unique<JsRuntimeScheduler>(JsRuntime::GetFromJavaScript(env))};
-            MediaStream::NewAsync(env, videoConstraints).then(*runtimeScheduler, arcana::cancellation::none(), [runtimeScheduler = std::move(runtimeScheduler), env, deferred](const arcana::expected<Napi::Object, std::exception_ptr>& result) {
+            MediaStream::NewAsync(env, videoConstraints).then(arcana::inline_scheduler, arcana::cancellation::none(), [env, deferred](const arcana::expected<Napi::Object, std::exception_ptr>& result) {


Is inline_scheduler ok? Doesn't this mean that while it is still running on the JS thread, it's not tracked in the context of the JsRuntimeScheduler?

I am enforcing that NewAsync must return on the JS thread, which it already does.

ryantrem · 2023-05-18T00:00:03Z

Polyfills/Window/Source/TimeoutDispatcher.cpp

@@ -70,7 +53,7 @@ namespace Babylon::Polyfills::Internal

        if (time <= earliestTime)
        {
-            m_runtime.Dispatch([this](Napi::Env) {
+            m_runtimeScheduler.Get()([this]() {


Can the ones in this class not just be m_runtime.Dispatch?

ryantrem · 2023-05-19T17:40:03Z

Core/JsRuntime/InternalInclude/Babylon/JsRuntimeScheduler.h

+
+        SchedulerImpl m_scheduler;
+        std::atomic<int> m_count{0};
+        arcana::manual_dispatcher<128> m_disposingDispatcher{};


If each JsRuntimeScheduler instance has its own dispatch queue, and that queue is only pumped by the call to Rundown in the destructor, then we can deadlock whenever one object calls into another object as part of the async code executing during destruction: the separate dispatch queue in the other object will not be getting pumped. It seems like this is fixable by having a shared dispatch queue across all objects, so that a given object's destructor's call to Rundown pumps all queued work. I don't think this breaks anything, and I think it would prevent this potential deadlock. I'm not sure where a shared dispatch queue would live... maybe we'd need to push some of this logic up to JsRuntime.
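The scenario in this comment can be modeled with a small single-threaded sketch. Everything here is hypothetical (`DemoQueue`, `DemoScheduler`, `RundownCompletes`) and stands in for the real scheduler and arcana::manual_dispatcher. In the sketch `Rundown` returns false where a real implementation would block forever.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical single-threaded model; not arcana::manual_dispatcher.
class DemoQueue
{
public:
    void Post(std::function<void()> task) { m_tasks.push_back(std::move(task)); }

    // Runs one queued task; returns false when nothing is queued.
    bool PumpOne()
    {
        if (m_tasks.empty())
        {
            return false;
        }
        auto task = std::move(m_tasks.front());
        m_tasks.pop_front();
        task();
        return true;
    }

private:
    std::deque<std::function<void()>> m_tasks;
};

class DemoScheduler
{
public:
    explicit DemoScheduler(std::shared_ptr<DemoQueue> queue) : m_queue{std::move(queue)} {}

    // Reserves a slot now; the slot is released once the scheduled work has run.
    auto Get()
    {
        ++m_count;
        return [this](std::function<void()> work) {
            m_queue->Post([this, work = std::move(work)]() {
                work();
                --m_count;
            });
        };
    }

    // Pumps until no slots are outstanding. Returning false models the
    // deadlock: work is still outstanding but this queue has nothing to pump.
    bool Rundown()
    {
        while (m_count > 0)
        {
            if (!m_queue->PumpOne())
            {
                return false;
            }
        }
        return true;
    }

private:
    std::shared_ptr<DemoQueue> m_queue;
    int m_count{0};
};

// Returns true when A's Rundown finishes and A's continuation has run.
inline bool RundownCompletes(bool sharedQueue)
{
    auto queueA = std::make_shared<DemoQueue>();
    auto queueB = sharedQueue ? queueA : std::make_shared<DemoQueue>();
    DemoScheduler a{queueA};
    DemoScheduler b{queueB};

    auto onA = a.Get();
    auto onB = b.Get();
    bool done = false;

    // A's continuation is only queued after work on B's scheduler has run.
    onB([&]() { onA([&]() { done = true; }); });

    return a.Rundown() && done;
}
```

With separate queues, A's Rundown never sees B's task, so A's outstanding slot is never released. With one shared queue, pumping from A's Rundown also runs B's task, which then queues A's continuation.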

bghgary · 2023-05-19T17:47:03Z

Polyfills/Window/Source/TimeoutDispatcher.cpp

+
+        // Wait for async operations to complete.
+        m_runtimeScheduler.Rundown();
    }


Is this actually necessary?

PolygonalSun · 2023-05-19T22:17:37Z

Plugins/NativeCamera/Source/MediaStream.cpp

+        // HACK: This is a hack to make sure the camera device is destroyed on the JS thread.
+        // The napi-jsi adapter currently calls the destructors of JS objects possibly on the wrong thread.
+        // Once this is fixed, this hack will no longer be needed.


Do we have/need an issue created to address this hack in the future?

I didn't find one. I will create one.

PolygonalSun · 2023-05-19T22:18:41Z

Plugins/NativeCamera/Source/MediaStream.cpp

+
+        // HACK: This is a hack to make sure the camera device is destroyed on the JS thread.
+        // The napi-jsi adapter currently calls the destructors of JS objects possibly on the wrong thread.
+        // Once this is fixed, this hack will no longer be needed.
        if (m_cameraDevice != nullptr)
        {
            // The cameraDevice should be destroyed on the JS thread as it may need to access main thread resources
            // move ownership of the cameraDevice to a lambda and dispatch it with the runtimeScheduler so the destructor


Nit: "and move..."

bghgary · 2023-05-23T18:46:44Z

Core/JsRuntime/InternalInclude/Babylon/JsRuntimeScheduler.h

+    //      an assert if there are outstanding schedulers not yet invoked.
+    //   2. The last continuation that accesses members of the N-API object, including the cancellation associated with
+    //      the continuation, must capture a persistent reference to the N-API object itself to prevent the GC from
+    //      collecting the N-API object during the asynchronous operation. Failing to do so will result in a hang


Add comment about lifetime of members during GC.

ryantrem · 2023-05-24T17:54:11Z

Core/AppRuntime/Source/AppRuntime.cpp

    {
    }

    AppRuntime::AppRuntime(std::function<void(const std::exception&)> unhandledExceptionHandler)
-        : m_workQueue{std::make_unique<WorkQueue>([this] { RunPlatformTier(); })}
-        , m_unhandledExceptionHandler{unhandledExceptionHandler}
+        : m_impl{std::make_unique<AppRuntimeImpl>(unhandledExceptionHandler)}


Should this be:
: m_impl{std::make_unique<AppRuntimeImpl>(std::move(unhandledExceptionHandler))}
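A small sketch of what the suggested std::move saves. `CountingHandler`, `CopyingImpl`, `MovingImpl` and `CopiesDuringConstruction` are hypothetical helpers for illustration, not the real AppRuntimeImpl. The sketch counts how often the handler's target is copied while a by-value parameter is stored in a member.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical handler that counts how many times it is copied.
struct CountingHandler
{
    explicit CountingHandler(std::shared_ptr<int> copies) : m_copies{std::move(copies)} {}
    CountingHandler(const CountingHandler& other) : m_copies{other.m_copies} { ++*m_copies; }
    CountingHandler(CountingHandler&&) noexcept = default;
    void operator()(const std::exception&) const {}
    std::shared_ptr<int> m_copies;
};

using Handler = std::function<void(const std::exception&)>;

// Stores the by-value parameter with a copy.
struct CopyingImpl
{
    explicit CopyingImpl(Handler handler) : m_handler{handler} {}
    Handler m_handler;
};

// Stores the by-value parameter with a move, as the review suggests.
struct MovingImpl
{
    explicit MovingImpl(Handler handler) : m_handler{std::move(handler)} {}
    Handler m_handler;
};

// Returns how many target copies happen while constructing ImplT.
template<typename ImplT>
int CopiesDuringConstruction()
{
    auto copies = std::make_shared<int>(0);
    Handler handler{CountingHandler{copies}};
    const int before = *copies;
    ImplT impl{std::move(handler)};
    return *copies - before;
}
```

The moving variant stores the handler without copying its target. The copying variant duplicates it once, which matters when the handler captures expensive state.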

CedricGuillemet · 2024-06-19T14:37:20Z

@ryantrem @bghgary I'm rehydrating this PR. I'll close it when my draft is ready.

WIP: Fixes for shutting down during async operations

43dce92

bghgary requested review from ryantrem and docEdub September 29, 2022 00:47

bghgary added 6 commits September 29, 2022 14:00

Change to explicitly call Rundown

925d243

Cannot throw in destructors

c999153

Merge remote-tracking branch 'origin/master' into async-fixes

204efb9

Fix merge issues

9d1e098

Merge with new timeout code with work queue fixes

124def4

Fix Canvas

309c4d9

bghgary force-pushed the async-fixes branch 2 times, most recently from da0d314 to cc94258 Compare October 28, 2022 16:26

Fix Android build

ad88d30

bghgary force-pushed the async-fixes branch from cc94258 to ad88d30 Compare October 28, 2022 16:42

bghgary changed the title ~~WIP: Fixes for shutting down during async operations~~ Fixes for shutting down during async operations Oct 28, 2022

bghgary mentioned this pull request Mar 7, 2023

Asynchronous shader compilation #1209

Merged

bghgary added 4 commits March 29, 2023 09:51

Merge remote-tracking branch 'origin/master' into async-fixes

11ca510

Merge branch 'async-fixes' of https://github.com/bghgary/BabylonNative into async-fixes

31248b6

Update arcana.cpp

9322360

Temp fixes for MediaStream

b4a3c73

bghgary commented Apr 5, 2023

Core/JsRuntime/Include/Babylon/JsRuntimeScheduler.h

bghgary added 4 commits April 19, 2023 13:59

Work queue shutdown fixes

df246e2

Miscellaneous Windows AppRuntime fixes

7350715

Fix typo in AppRuntime for Win32

77dc8f3

bghgary commented May 1, 2023

Core/AppRuntime/Source/WorkQueue.cpp

bghgary added 4 commits May 2, 2023 14:01

Better fix for work queue shutdown issue

def047c

Merge remote-tracking branch 'origin/master' into async-fixes

bdaf1b9

Fix build issues from merge

0ae6cf4

Update arcana.cpp to include continuation fix

71efaf4

bghgary force-pushed the async-fixes branch from 9c96977 to 71efaf4 Compare May 9, 2023 00:32

bghgary added 4 commits May 8, 2023 17:45

Minor style fixes

a0db419

Update comment

6909f55

Update comment 2

e9d229d

More style fixes

423ed5e

bghgary removed the request for review from docEdub May 9, 2023 00:55

bghgary marked this pull request as ready for review May 9, 2023 23:04

ryantrem reviewed May 18, 2023

ryantrem reviewed May 19, 2023

bghgary commented May 19, 2023

PolygonalSun reviewed May 19, 2023

bghgary commented May 23, 2023

Merge AppRuntime and WorkQueue to fix race condition

36a4661

bghgary force-pushed the async-fixes branch from c54627a to 36a4661 Compare May 24, 2023 01:17

ryantrem reviewed May 24, 2023

CedricGuillemet mentioned this pull request Sep 5, 2023

Google tests framework for UnitTests #1277

Merged

CedricGuillemet mentioned this pull request Jun 21, 2024

[WIP] Async + shutdown #1397

Draft

bghgary mentioned this pull request Sep 5, 2024

Fix async issues caused by runtime teardown #1420

Open
