Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux: More safe stack cleanup for clone #3424

Merged

Conversation

Sonicadvance1
Copy link
Member

Previously: Would keep one clone thread's stack active for teardown delaying.

With aggressive cloning and teardown, this was unsafe. Only reap the stack when told it is safe to do so.

@Sonicadvance1 Sonicadvance1 force-pushed the safer_clone_stack_handling branch 3 times, most recently from 34152a7 to 91ad9d2 Compare February 17, 2024 05:14
@Sonicadvance1 Sonicadvance1 force-pushed the safer_clone_stack_handling branch 2 times, most recently from b9368b9 to 9f9c684 Compare February 21, 2024 21:59
void DeallocateStackObjectAndExit(void *Ptr, int Status) {
RemoveStackFromLivePool(Ptr);
auto ReadyToBeReaped = AddStackToDeadPool(Ptr);
*ReadyToBeReaped = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no other logic in here, and the code reading ReadyToBeReaped is protected by a mutex. Can't we move setting the boolean into AddStackToDeadPool and avoid the obscure bool pointer return altogether hence? This could be optional behavior controlled by a function parameter. (This might also make it clearer why ReadyToBeReaped must be set here but not in the other call site of AddStackToDeadPool.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move the ReadyToBeReaped store in to AddStackToDeadPool if we continue passing the status all the way to that function so we can call exit there.
At the point of ReadyToBeReaped being set to true, we no longer have ownership of the stack. We can't return from a function, we can't call a function, we must do the syscall immediately.
The mutex guarding this doesn't matter at all, we are dancing around the thread no longer having a stack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stack ownership is released when setting ReadyToBeReaped indeed, but this ownership isn't re-assigned until AllocateStackObject observes ReadyToBeReaped==1. This observation can't happen as long as DeadStackPoolMutex is locked. Until then, accessing the stack should still be safe.

So instead of the boolean pointer dance, why not keep DeadStackPoolMutex locked until after we return from AddStackDeadPool? This could be achieved for example by moving the mutex locking out of AddStackToDeadPool and locking/unlocking the mutex here manually. (To preserve current behavior elsewhere, the function could be renamed to "AddStackToDeadPoolInternal" and a helper function could be added that behaves like the current code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed externally, this won't work because unlocking the mutex itself requires a stack. (Leaving discussion open for visibility.)

@@ -61,6 +82,32 @@ namespace FEX::LinuxEmulation::Threads {
AddStackToDeadPool(Ptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, is it intentional that this code path doesn't set ReadyToBeReaped?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional stack memory leak that is happening elsewhere and is a pre-existing condition. Subject to refactoring that has yet to occur. It's one of the reasons why I've spent months moving thread management to the frontend.

Previously: Would keep one clone thread's stack active for teardown
delaying.

With aggressive cloning and teardown, this was unsafe.
Only reap the stack when told it is safe to do so.
Copy link
Member

@neobrain neobrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly there's no easier way to solve this other than outright switching to a fallback stack, which wouldn't be any more readable overall either. Thanks for taking the time to convince me of this.

At least we did manage to isolate this tricky implementation detail to a single .cpp file, so let's move forward with this.

@Sonicadvance1 Sonicadvance1 merged commit 9687ac5 into FEX-Emu:main Feb 26, 2024
10 checks passed
@Sonicadvance1 Sonicadvance1 deleted the safer_clone_stack_handling branch February 26, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants