a question about auto grad. thx #11
yebangyu:
Dear Edward,
From pages 21 to 23, when we are talking about autograd, we choose to test whether the condition ||prev - cur|| < epsilon is satisfied to check whether we have reached the minimum.
My question is: why not just test whether the grad of cur is zero or not? That is to say, can
while torch.linalg.norm(x_cur - x_prev) > epsilon:
be replaced by
epsilon = 1e-12  # a sufficiently small value
while abs(cur.grad) > epsilon:
?
Thanks a lot!

Comments

Edward:
Good question! For most neural networks we don't use either of these kinds of stopping conditions, so this is more an example of optimization than the ideal way to train a neural network. The broader answer, in the context of optimization problems, is that both are valid, with different trade-offs. The zero-gradient check gets you closer to a local minimum. The previous-versus-current check might stop before you reach a local minimum, but it also might save you from waiting a very long time if the function is badly behaved and converging very slowly. I also just like the second approach because it's more general purpose: sometimes you want to check the convergence of something other than the gradient, which may not converge/minimize near zero.
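As a concrete illustration of the trade-off described above, here is a minimal, self-contained sketch of gradient descent with both stopping conditions. The toy function f(x) = (x - 2)^2, the epsilon, the learning rate, and the starting point are illustrative assumptions, not the book's exact code.

import torch

def f(x):
    return (x - 2.0) ** 2

def descend(stop_on_grad, epsilon=1e-8, lr=0.1, max_steps=10_000):
    x_cur = torch.tensor([10.0], requires_grad=True)
    x_prev = x_cur.detach().clone() + 1.0  # far enough apart to enter the loop
    for _ in range(max_steps):
        loss = f(x_cur)
        loss.backward()
        # Condition from the book excerpt: the iterates have stopped moving.
        if not stop_on_grad and torch.linalg.norm(x_cur.detach() - x_prev) <= epsilon:
            break
        # Condition proposed in the question: the gradient is numerically zero.
        if stop_on_grad and torch.linalg.norm(x_cur.grad) <= epsilon:
            break
        x_prev = x_cur.detach().clone()
        with torch.no_grad():
            x_cur -= lr * x_cur.grad  # SGD step: x_cur = x_prev - lr * grad
        x_cur.grad.zero_()
    return x_cur.detach()

print(descend(stop_on_grad=False))  # stops once x barely moves between steps
print(descend(stop_on_grad=True))   # stops once the gradient itself is tiny

The same loop structure also makes it easy to swap in some other convergence check, such as the change in the loss value, which is the "more general purpose" point made in the reply.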
yebangyu:
Thanks for your reply, Edward. According to the SGD formula, x_cur = x_prev - learning_rate * grad. If grad is close to zero, we can conclude that x_cur is approximately equal to x_prev, but not vice versa: x_cur being approximately equal to x_prev does not mean that grad is close to zero (maybe the learning rate is just too small). Am I right?
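A few lines of PyTorch make this asymmetry concrete (the numbers below are illustrative assumptions, not from the thread): because x_cur - x_prev = -learning_rate * grad, the displacement is learning_rate * |grad|, so a small enough learning rate can push ||x_cur - x_prev|| below epsilon while the gradient is still far from zero.

import torch

epsilon = 1e-6
learning_rate = 1e-9                                # deliberately tiny
grad = torch.tensor([50.0], dtype=torch.float64)    # gradient clearly not near zero
x_prev = torch.tensor([3.0], dtype=torch.float64)
x_cur = x_prev - learning_rate * grad               # one SGD step

print(torch.linalg.norm(x_cur - x_prev) < epsilon)  # tensor(True): iterate test "converged"
print(torch.linalg.norm(grad) < epsilon)            # tensor(False): gradient test has not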