a question about auto grad. thx #11
yebangyu:
Dear Edward,
From pages 21 to 23, when we are talking about autograd, we choose to test whether the condition ||prev - cur|| < epsilon is satisfied to check whether we have reached the minimum.
My question is: why not just test whether the grad of cur is zero or not? That is to say, can
while torch.linalg.norm(x_cur - x_prev) > epsilon:
be replaced by
epsilon = 1e-12  # a sufficiently small value
while abs(cur.grad) > epsilon:
?
Thanks a lot!

Comments

Edward:
Good question! For most neural networks we don't use either of these kinds of stopping conditions, so this is more an example of optimization than the ideal way to train a neural network. The broader answer, in the context of optimization problems, is that both are valid, with different trade-offs. The zero-gradient check gets you closer to a local minimum. The previous-versus-current check might stop before you reach a local minimum, but it also might save you from waiting a very long time if the function is badly behaved and converging very slowly. I also just like the second approach because it's more general purpose: sometimes you want to check the convergence of something other than the gradient, which may not converge/minimize near zero.
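As a concrete illustration of the trade-off described above, here is a minimal, self-contained sketch of gradient descent with both stopping conditions. The toy function f(x) = (x - 2)^2, the epsilon, the learning rate, and the starting point are illustrative assumptions, not the book's exact code.

import torch

def f(x):
    return (x - 2.0) ** 2

def descend(stop_on_grad, epsilon=1e-8, lr=0.1, max_steps=10_000):
    x_cur = torch.tensor([10.0], requires_grad=True)
    x_prev = x_cur.detach().clone() + 1.0  # far enough apart to enter the loop
    for _ in range(max_steps):
        loss = f(x_cur)
        loss.backward()
        # Condition from the book excerpt: the iterates have stopped moving.
        if not stop_on_grad and torch.linalg.norm(x_cur.detach() - x_prev) <= epsilon:
            break
        # Condition proposed in the question: the gradient is numerically zero.
        if stop_on_grad and torch.linalg.norm(x_cur.grad) <= epsilon:
            break
        x_prev = x_cur.detach().clone()
        with torch.no_grad():
            x_cur -= lr * x_cur.grad  # SGD step: x_cur = x_prev - lr * grad
        x_cur.grad.zero_()
    return x_cur.detach()

print(descend(stop_on_grad=False))  # stops once x barely moves between steps
print(descend(stop_on_grad=True))   # stops once the gradient itself is tiny

The same loop structure also makes it easy to swap in some other convergence check, such as the change in the loss value, which is the "more general purpose" point made in the reply.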
yebangyu:
Thanks for your reply, Edward. According to the SGD formula, x_cur = x_prev - learning_rate * grad. If grad is close to zero, we can conclude that x_cur is approximately equal to x_prev, but not vice versa: x_cur being approximately equal to x_prev does not mean that grad is close to zero (maybe the learning rate is just too small). Am I right?
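A few lines of PyTorch make this asymmetry concrete (the numbers below are illustrative assumptions, not from the thread): because x_cur - x_prev = -learning_rate * grad, the displacement is learning_rate * |grad|, so a small enough learning rate can push ||x_cur - x_prev|| below epsilon while the gradient is still far from zero.

import torch

epsilon = 1e-6
learning_rate = 1e-9                                # deliberately tiny
grad = torch.tensor([50.0], dtype=torch.float64)    # gradient clearly not near zero
x_prev = torch.tensor([3.0], dtype=torch.float64)
x_cur = x_prev - learning_rate * grad               # one SGD step

print(torch.linalg.norm(x_cur - x_prev) < epsilon)  # tensor(True): iterate test "converged"
print(torch.linalg.norm(grad) < epsilon)            # tensor(False): gradient test has not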