Option for not using SIGSTOP/SIGCONT because not all apps take it well #13
Co-Authored-By: bjorn3 <bjorn3@users.noreply.github.com>
Yes, the STOP/CONT are sent because the |
// avoid infinite loops
if self.ctx.nth_frame > 1000 {
    warn!("possible infinite loop detected and avoided");
    return false;
}
Have you actually hit a genuine infinite loop here? AFAIK infinite loops shouldn't really be possible as you're going to overflow the stack sooner or later anyway.
Anyway, this change isn't really correct. Even though stacks more than 1000 frames deep are certainly a sign of a problem, they should still be gathered. I've seen such stack traces in the wild, and gathering as much of them as possible helps to fix the problem later if you can manage to get to the top. So what was your motivation in adding this here? If you want to limit stack traces to a certain length we could add an extra parameter instead.
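To make the "extra parameter" idea concrete, here is a minimal sketch of a configurable depth limit that truncates the trace instead of discarding it. All names (`UnwindOptions`, `max_depth`, `collect_frames`) are hypothetical and not taken from this project; frames are modelled as plain `u64` addresses.

```rust
/// Hypothetical unwinding context; only the frame counter matters here.
struct Context {
    nth_frame: usize,
}

/// Hypothetical option: `None` means "no limit", which would be the default.
struct UnwindOptions {
    max_depth: Option<usize>,
}

/// Returns true while the unwinder is still allowed to collect frames.
fn should_continue(ctx: &Context, opts: &UnwindOptions) -> bool {
    match opts.max_depth {
        Some(limit) => ctx.nth_frame < limit,
        None => true,
    }
}

/// Collects frames until the source is exhausted or the limit is reached.
/// Frames gathered before the limit are kept; the flag reports truncation.
fn collect_frames(
    frames: impl Iterator<Item = u64>,
    opts: &UnwindOptions,
) -> (Vec<u64>, bool) {
    let mut ctx = Context { nth_frame: 0 };
    let mut collected = Vec::new();
    let mut truncated = false;
    for address in frames {
        if !should_continue(&ctx, opts) {
            truncated = true;
            break;
        }
        collected.push(address);
        ctx.nth_frame += 1;
    }
    (collected, truncated)
}
```

With `max_depth: None` the behavior stays as it was before the patch, so deep but legitimate traces are still gathered in full; a user who runs into a runaway unwinder can opt into a limit and still gets the partial trace.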
Yes, I hit it frequently in one of the cortex-15 apps, but I can neither post it here nor dig into it further since I'm leaving the company.
What I did manage to figure out, though, is that it looked like an ARM unwinder bug: the Vec holding the frames kept growing until it failed while trying to reallocate into 3.5 gigs.
Hi, do you still want to make this a command-line option? I'm asking because a failed allocation, which is how this bug manifests itself, is just an abort: no panic and no Err. This makes it hard to diagnose if it happens to someone.
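As a rough illustration of the abort-versus-`Err` point: `Vec::push` aborts the process when the allocator fails, whereas the standard library's `Vec::try_reserve` reports the failure as a `TryReserveError`. The helper below is a hypothetical sketch (the name `push_frame` and the `u64` frame type are assumptions, not this project's code) of how a runaway frame vector could surface as a diagnosable error.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: appends one frame address, returning an error
/// instead of aborting if the vector cannot grow.
fn push_frame(frames: &mut Vec<u64>, address: u64) -> Result<(), TryReserveError> {
    // Ask for room for one more element; growth failure becomes an Err.
    frames.try_reserve(1)?;
    // Capacity is now guaranteed, so this push does not reallocate.
    frames.push(address);
    Ok(())
}
```

On systems that overcommit memory the allocator may still succeed where physical memory is short, so this would only help in cases like the one described, where the reallocation itself fails.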
This STOP/CONT pattern is used to avoid a data race between reading /proc and handling things like mmap events from the kernel, isn't it? Anyway, we are profiling some apps that don't take it very well, since STOP causes syscalls to return abnormally. It should be fixed but you know, it is not always that easy. Therefore I'm proposing a switch to disable this behavior.
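The proposed switch could be shaped roughly as below. This is only a sketch under assumptions: the `use_stop` flag, the `Signal` enum and the `with_optional_stop` helper are hypothetical names, and signal delivery is abstracted behind a closure so the example stays self-contained (a real profiler would send the signals with `kill(2)`).

```rust
/// The two signals involved in the pause/resume pattern.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Signal {
    Stop,
    Cont,
}

/// Runs `read` (for example, a snapshot of the target's /proc entries),
/// bracketing it with STOP/CONT only when `use_stop` is enabled.
fn with_optional_stop<T>(
    use_stop: bool,
    mut send: impl FnMut(Signal),
    read: impl FnOnce() -> T,
) -> T {
    if use_stop {
        send(Signal::Stop);
    }
    let result = read();
    if use_stop {
        // Resume the target once the snapshot has been taken.
        send(Signal::Cont);
    }
    result
}
```

With the switch off the target is never paused, so its syscalls are not interrupted, at the cost of reopening whatever race the STOP/CONT bracket was guarding against.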