Linux: More safe stack cleanup for clone #3424

Sonicadvance1 · 2024-02-14T21:46:20Z

Previously: we kept one clone thread's stack alive in order to delay its teardown.

With aggressive cloning and teardown, this was unsafe. Now a stack is only reaped once its thread has signalled that it is safe to do so.
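The description above can be illustrated with a simplified, hypothetical model of the live and dead stack pools. The names AllocateStackObject, AddStackToDeadPool, RemoveStackFromLivePool and ReadyToBeReaped come from the review thread. The StackPool class, the use of std::malloc instead of a real stack mapping, and the std::list container are assumptions made for illustration; this is not FEX's implementation. The point it shows is that a dead stack is only recycled after its ReadyToBeReaped flag has been set.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <list>
#include <mutex>
#include <unordered_set>

// Hypothetical, simplified model of the live/dead stack pools (not FEX's actual code).
struct DeadStack {
  explicit DeadStack(void* P) : Ptr(P) {}
  void* Ptr;
  // Set by the exiting thread once it will never touch this stack again.
  std::atomic<bool> ReadyToBeReaped{false};
};

class StackPool {
public:
  explicit StackPool(std::size_t Size) : StackSize(Size) {}

  ~StackPool() {
    // Assumes every thread that used a pooled stack has already exited.
    for (void* P : LiveStacks) std::free(P);
    for (auto& D : DeadStacks) std::free(D.Ptr);
  }

  void* AllocateStackObject() {
    void* P = nullptr;
    {
      std::lock_guard<std::mutex> Lock(DeadStackPoolMutex);
      for (auto It = DeadStacks.begin(); It != DeadStacks.end(); ++It) {
        // Only recycle a stack whose previous owner has signalled it is done with it.
        if (It->ReadyToBeReaped.load(std::memory_order_acquire)) {
          P = It->Ptr;
          DeadStacks.erase(It);
          break;
        }
      }
    }
    if (!P) P = std::malloc(StackSize);  // No reapable stack: allocate a fresh one.
    std::lock_guard<std::mutex> Lock(LiveStackPoolMutex);
    LiveStacks.insert(P);
    return P;
  }

  void RemoveStackFromLivePool(void* Ptr) {
    std::lock_guard<std::mutex> Lock(LiveStackPoolMutex);
    [[maybe_unused]] auto Erased = LiveStacks.erase(Ptr);
    assert(Erased == 1 && "stack was not in the live pool");
  }

  // Returns the flag the exiting thread must set as its last action on this stack.
  // std::list keeps element addresses stable, so the pointer stays valid after unlocking;
  // nobody erases the element until the flag has been set.
  std::atomic<bool>* AddStackToDeadPool(void* Ptr) {
    std::lock_guard<std::mutex> Lock(DeadStackPoolMutex);
    DeadStacks.emplace_back(Ptr);
    return &DeadStacks.back().ReadyToBeReaped;
  }

private:
  std::size_t StackSize;
  std::mutex LiveStackPoolMutex;
  std::unordered_set<void*> LiveStacks;
  std::mutex DeadStackPoolMutex;
  std::list<DeadStack> DeadStacks;
};
```

In this model, a stack that has just been moved to the dead pool cannot be handed to a new clone until its owner sets the returned flag; until then AllocateStackObject falls back to a fresh allocation.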

Source/Tools/LinuxEmulation/LinuxSyscalls/Syscalls.cpp

Source/Tools/LinuxEmulation/LinuxSyscalls/Utils/Threads.cpp

neobrain · 2024-02-22T09:25:41Z

Source/Tools/LinuxEmulation/LinuxSyscalls/Utils/Threads.cpp

+  void DeallocateStackObjectAndExit(void *Ptr, int Status) {
+    RemoveStackFromLivePool(Ptr);
+    auto ReadyToBeReaped = AddStackToDeadPool(Ptr);
+    *ReadyToBeReaped = true;


There's no other logic in here, and the code reading ReadyToBeReaped is protected by a mutex. Can't we move setting the boolean into AddStackToDeadPool and avoid the obscure bool pointer return altogether? This could be optional behavior controlled by a function parameter. (This might also make it clearer why ReadyToBeReaped must be set here but not at the other call site of AddStackToDeadPool.)

We can move the ReadyToBeReaped store into AddStackToDeadPool if we keep passing the status all the way down to that function, so that we can call exit there.
At the point ReadyToBeReaped is set to true, we no longer own the stack. We can't return from a function or call a function; we must do the syscall immediately.
The mutex guarding this doesn't matter at all: we are dancing around the thread no longer having a stack.
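The constraint described in this reply (after the flag store: no call, no return, only the syscall) can be sketched for x86-64 Linux with inline assembly. MarkReadyAndExit and DemoMarkReadyAndExit are hypothetical names, not the code merged in this PR. The sketch assumes a one-byte, lock-free std::atomic<bool>, and that no signal handler can run on this stack between the store and the syscall.

```cpp
#include <atomic>
#include <cassert>
#include <sys/syscall.h>
#include <thread>

static_assert(sizeof(std::atomic<bool>) == 1, "sketch assumes a one-byte flag");
static_assert(std::atomic<bool>::is_always_lock_free, "flag must be lock-free");

// Hypothetical x86-64 Linux sketch: publish ReadyToBeReaped and terminate the calling
// thread with no stack access (no call, no return, no spill) between the two steps.
[[noreturn]] inline void MarkReadyAndExit(std::atomic<bool>* ReadyToBeReaped, int Status) {
  assert(ReadyToBeReaped != nullptr);  // Still fine: we own the stack until the store below.
#if defined(__x86_64__) && defined(__linux__)
  long Nr = SYS_exit;  // exit only this thread, not the whole thread group
  __asm__ volatile(
      "movb $1, (%[flag])\n\t"  // plain stores have release ordering on x86-64 (TSO)
      "syscall\n\t"             // from here on, the stack may already be recycled
      : "+a"(Nr)
      : [flag] "r"(ReadyToBeReaped), "D"(Status)
      : "rcx", "r11", "memory");
#else
#error "This sketch only covers x86-64 Linux"
#endif
  __builtin_unreachable();
}

// Demo: a thread hands its stack back and exits; the joiner observes the flag.
inline bool DemoMarkReadyAndExit() {
  std::atomic<bool> Ready{false};
  std::thread T([&Ready] { MarkReadyAndExit(&Ready, 0); });
  T.join();
  return Ready.load(std::memory_order_acquire);
}
```

Writing the same thing in plain C++, for example `Flag->store(true); syscall(SYS_exit, Status);`, would not be safe here: calling the libc syscall() wrapper pushes a return address onto a stack that another thread may already own.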

The stack ownership is released when setting ReadyToBeReaped indeed, but this ownership isn't re-assigned until AllocateStackObject observes ReadyToBeReaped==1. This observation can't happen as long as DeadStackPoolMutex is locked. Until then, accessing the stack should still be safe.

So instead of the boolean pointer dance, why not keep DeadStackPoolMutex locked until after we return from AddStackToDeadPool? This could be achieved, for example, by moving the mutex locking out of AddStackToDeadPool and locking/unlocking the mutex here manually. (To preserve current behavior elsewhere, the function could be renamed to "AddStackToDeadPoolInternal" and a helper function could be added that behaves like the current code.)
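The reviewer's alternative can be sketched as follows, with hypothetical globals standing in for FEX's pools and HandOffStackUnderLock as an invented name for the exit path. The comments mark where the objection in the next reply applies: releasing the mutex is an ordinary function call that still runs on, and returns to, the stack being handed over.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical sketch of the proposal; globals stand in for the real pools.
std::mutex DeadStackPoolMutex;
std::vector<void*> DeadStackPool;

// Caller must already hold DeadStackPoolMutex.
void AddStackToDeadPoolInternal(void* Ptr) { DeadStackPool.push_back(Ptr); }

// Wrapper preserving the current locking behaviour for other call sites.
void AddStackToDeadPool(void* Ptr) {
  std::lock_guard<std::mutex> Lock(DeadStackPoolMutex);
  AddStackToDeadPoolInternal(Ptr);
}

// Exit path under the proposal: the stack is protected while the lock is held...
void HandOffStackUnderLock(void* Ptr) {
  DeadStackPoolMutex.lock();
  AddStackToDeadPoolInternal(Ptr);
  // ...but unlock() is an ordinary call that runs on this very stack. Once it has
  // released the lock it still has to return here before any exit syscall, while
  // another thread may already be reusing the stack.
  DeadStackPoolMutex.unlock();
  // The thread would issue its exit syscall here.
}
```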

As discussed externally, this won't work because unlocking the mutex itself requires a stack. (Leaving discussion open for visibility.)

Source/Tools/LinuxEmulation/LinuxSyscalls/Utils/Threads.h

neobrain · 2024-02-22T09:31:02Z

Source/Tools/LinuxEmulation/LinuxSyscalls/Utils/Threads.cpp

@@ -61,6 +82,32 @@ namespace FEX::LinuxEmulation::Threads {
    AddStackToDeadPool(Ptr);


Just to make sure, is it intentional that this code path doesn't set ReadyToBeReaped?

Yes, intentional. It's a stack memory leak that happens elsewhere and is a pre-existing condition, subject to refactoring that has yet to occur. It's one of the reasons I've spent months moving thread management to the frontend.

neobrain

Sadly there's no easier way to solve this other than outright switching to a fallback stack, which wouldn't be any more readable overall either. Thanks for taking the time to convince me of this.

At least we did manage to isolate this tricky implementation detail to a single .cpp file, so let's move forward with this.

neobrain reviewed Feb 15, 2024

Source/Tools/LinuxEmulation/LinuxSyscalls/Syscalls.cpp (outdated, resolved)

neobrain reviewed Feb 15, 2024

Source/Tools/LinuxEmulation/LinuxSyscalls/Syscalls.cpp (resolved)

Sonicadvance1 force-pushed the safer_clone_stack_handling branch 3 times, most recently from 34152a7 to 91ad9d2 on February 17, 2024 05:14

neobrain reviewed Feb 19, 2024

Source/Tools/LinuxEmulation/LinuxSyscalls/Utils/Threads.cpp (resolved)

Sonicadvance1 force-pushed the safer_clone_stack_handling branch 2 times, most recently from b9368b9 to 9f9c684 on February 21, 2024 21:59

neobrain reviewed Feb 22, 2024

Commit 3ac7fe3: Linux: More safe stack cleanup for clone


Sonicadvance1 force-pushed the safer_clone_stack_handling branch from 9f9c684 to 3ac7fe3 on February 24, 2024 09:05

neobrain approved these changes Feb 26, 2024

Sonicadvance1 merged commit 9687ac5 into FEX-Emu:main Feb 26, 2024
10 checks passed

Sonicadvance1 deleted the safer_clone_stack_handling branch February 26, 2024 14:59

Sonicadvance1 commented Feb 14, 2024

neobrain Feb 22, 2024

Sonicadvance1 Feb 24, 2024

neobrain Feb 26, 2024

neobrain Feb 26, 2024

neobrain Feb 22, 2024

Sonicadvance1 Feb 22, 2024
