Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for shutting down during async operations #1141

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

bghgary
Copy link
Contributor

@bghgary bghgary commented Sep 29, 2022

This change set up a coding pattern for invoking continuations on the JavaScript thread while properly handling garbage collection and shutdown scenarios. This pattern is described in the JsRuntimeScheduler class. See the class comment for more information.

@bghgary bghgary force-pushed the async-fixes branch 2 times, most recently from da0d314 to cc94258 Compare October 28, 2022 16:26
@bghgary bghgary changed the title WIP: Fixes for shutting down during async operations Fixes for shutting down during async operations Oct 28, 2022
If the shared_ptr is copied, the final release may end up on a different thread. Adding std::forward will ensure that the lambda is moved and not copied which will prevent multiple threads from owning the shared_ptr. The shared_ptr is being used as a hack anyways. The correct fix is to not use shared_ptr and make Dispatch use the Dispatchable class from AppRuntime or if we can use C++23 someday, use std::move_only_function.
@bghgary bghgary removed the request for review from docEdub May 9, 2023 00:55
@bghgary bghgary marked this pull request as ready for review May 9, 2023 23:04
Core/AppRuntime/Source/AppRuntime_Chakra.cpp Outdated Show resolved Hide resolved
Core/JsRuntime/Include/Babylon/JsRuntime.h Show resolved Hide resolved
//
// private:
// arcana::cancellation_source m_cancellationSource;
// JsRuntimeScheduler m_runtimeScheduler;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you add a constructor to this example code just to show how the JsRuntimeScheduler is initialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's good idea.


void JsRuntime::RegisterDisposing(JsRuntime& runtime, IDisposingCallback* callback)
{
auto& callbacks = runtime.m_disposingCallbacks;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we be certain this won't be called after NotifyDisposing (which invalidates m_disposingCallbacks)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't happen assuming correct usage patterns with cancellation. But if it does happen, then callbacks registered after NotifyDisposing will not fire. All of these *Disposing functions must be called on the JavaScript thread.

@@ -38,8 +38,7 @@ namespace Babylon::Plugins::Internal
}
}

auto runtimeScheduler{std::make_unique<JsRuntimeScheduler>(JsRuntime::GetFromJavaScript(env))};
MediaStream::NewAsync(env, videoConstraints).then(*runtimeScheduler, arcana::cancellation::none(), [runtimeScheduler = std::move(runtimeScheduler), env, deferred](const arcana::expected<Napi::Object, std::exception_ptr>& result) {
MediaStream::NewAsync(env, videoConstraints).then(arcana::inline_scheduler, arcana::cancellation::none(), [env, deferred](const arcana::expected<Napi::Object, std::exception_ptr>& result) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is inline_scheduler ok? Doesn't this mean that while it is still running on the JS thread, it's not tracked in the context of the JsRuntimeScheduler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am enforcing that NewAsync must return on the JS thread, which it already does.

@@ -70,7 +53,7 @@ namespace Babylon::Polyfills::Internal

if (time <= earliestTime)
{
m_runtime.Dispatch([this](Napi::Env) {
m_runtimeScheduler.Get()([this]() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the ones in this class not just be m_runtime.Dispatch?


SchedulerImpl m_scheduler;
std::atomic<int> m_count{0};
arcana::manual_dispatcher<128> m_disposingDispatcher{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If each JsRuntimeScheduler instance has its own dispatch queue, and it is only pumped by a call to Rundown in the destructor, then if we ever have a situation where one object is trying to call into another object (as part of the async code executing during destruction), then the separate dispatch queue in the separate object will be in a state where it is getting pumped and we will basically deadlock. It seems like this is fixable by having a shared dispatch queue across all objects, and a given object's destructor's call to Rundown is pumping for all queued work. I don't think this breaks anything, but I think it would prevent this potential deadlock situation. I'm not sure where a shared dispatch queue would exist... maybe we'd need to push some of this logic up to JSRuntime.

Comment on lines +34 to 37

// Wait for async operations to complete.
m_runtimeScheduler.Rundown();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually necessary?

Comment on lines +63 to +65
// HACK: This is a hack to make sure the camera device is destroyed on the JS thread.
// The napi-jsi adapter currently calls the destructors of JS objects possibly on the wrong thread.
// Once this is fixed, this hack will no longer be needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have/need an issue created to address this hack in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find one. I will create one.


// HACK: This is a hack to make sure the camera device is destroyed on the JS thread.
// The napi-jsi adapter currently calls the destructors of JS objects possibly on the wrong thread.
// Once this is fixed, this hack will no longer be needed.
if (m_cameraDevice != nullptr)
{
// The cameraDevice should be destroyed on the JS thread as it may need to access main thread resources
// move ownership of the cameraDevice to a lambda and dispatch it with the runtimeScheduler so the destructor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "and move..."

// an assert if there are outstanding schedulers not yet invoked.
// 2. The last continuation that accesses members of the N-API object, including the cancellation associated with
// the continuation, must capture a persistent reference to the N-API object itself to prevent the GC from
// collecting the N-API object during the asynchronous operation. Failing to do so will result in a hang
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment about lifetime of members during GC.

{
}

AppRuntime::AppRuntime(std::function<void(const std::exception&)> unhandledExceptionHandler)
: m_workQueue{std::make_unique<WorkQueue>([this] { RunPlatformTier(); })}
, m_unhandledExceptionHandler{unhandledExceptionHandler}
: m_impl{std::make_unique<AppRuntimeImpl>(unhandledExceptionHandler)}
Copy link
Member

@ryantrem ryantrem May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This this be:
: m_impl{std::make_unique<AppRuntimeImpl>(std::move(unhandledExceptionHandler))}

@CedricGuillemet
Copy link
Contributor

@ryantrem @bghgary I'm rehydrating this PR. I'll close it when my draft is ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants