-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linux: More safe stack cleanup for clone #3424
Linux: More safe stack cleanup for clone #3424
Conversation
34152a7
to
91ad9d2
Compare
b9368b9
to
9f9c684
Compare
void DeallocateStackObjectAndExit(void *Ptr, int Status) { | ||
RemoveStackFromLivePool(Ptr); | ||
auto ReadyToBeReaped = AddStackToDeadPool(Ptr); | ||
*ReadyToBeReaped = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no other logic in here, and the code reading ReadyToBeReaped
is protected by a mutex. Can't we move setting the boolean into AddStackToDeadPool and avoid the obscure bool pointer return altogether hence? This could be optional behavior controlled by a function parameter. (This might also make it clearer why ReadyToBeReaped must be set here but not in the other call site of AddStackToDeadPool.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move the ReadyToBeReaped store in to AddStackToDeadPool
if we continue passing the status all the way to that function so we can call exit there.
At the point of ReadyToBeReaped
being set to true, we no longer have ownership of the stack. We can't return from a function, we can't call a function, we must do the syscall immediately.
The mutex guarding this doesn't matter at all, we are dancing around the thread no longer having a stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stack ownership is released when setting ReadyToBeReaped indeed, but this ownership isn't re-assigned until AllocateStackObject
observes ReadyToBeReaped==1
. This observation can't happen as long as DeadStackPoolMutex
is locked. Until then, accessing the stack should still be safe.
So instead of the boolean pointer dance, why not keep DeadStackPoolMutex
locked until after we return from AddStackDeadPool
? This could be achieved for example by moving the mutex locking out of AddStackToDeadPool
and locking/unlocking the mutex here manually. (To preserve current behavior elsewhere, the function could be renamed to "AddStackToDeadPoolInternal" and a helper function could be added that behaves like the current code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed externally, this won't work because unlocking the mutex itself requires a stack. (Leaving discussion open for visibility.)
@@ -61,6 +82,32 @@ namespace FEX::LinuxEmulation::Threads { | |||
AddStackToDeadPool(Ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure, is it intentional that this code path doesn't set ReadyToBeReaped
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentional stack memory leak that is happening elsewhere and is a pre-existing condition. Subject to refactoring that has yet to occur. It's one of the reasons why I've spent months moving thread management to the frontend.
Previously: Would keep one clone thread's stack active for teardown delaying. With aggressive cloning and teardown, this was unsafe. Only reap the stack when told it is safe to do so.
9f9c684
to
3ac7fe3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly there's no easier way to solve this other than outright switching to a fallback stack, which wouldn't be any more readable overall either. Thanks for taking the time to convince me of this.
At least we did manage to isolate this tricky implementation detail to a single .cpp
file, so let's move forward with this.
Previously: Would keep one clone thread's stack active for teardown delaying.
With aggressive cloning and teardown, this was unsafe. Only reap the stack when told it is safe to do so.