Based on some testing, I'm starting to think the default learning rate of 1e-4 is too low for our applications, and that it would be better to bump it up to 5e-3 or even 1e-2. This is mostly based on the empirical observation that higher learning rates tend to converge to similar-looking models even across differing architectures, batch sizes, and epoch counts. It also helps that we use the Adam optimizer, whose per-parameter adaptive scaling can attenuate the effective step size as training progresses.
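
For concreteness, here's a minimal sketch of what the proposed change would look like in a PyTorch-style setup. The framework, model, and constant names are assumptions for illustration, not the project's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical values for illustration; not taken from the project's config.
CURRENT_DEFAULT_LR = 1e-4   # the default argued to be too low
PROPOSED_LR = 5e-3          # proposed bump (or 1e-2)

model = nn.Linear(16, 1)    # stand-in for any architecture under test

# Adam tracks per-parameter first/second-moment estimates, so its effective
# step sizes adapt over the course of training, which makes starting from a
# larger learning rate safer than it would be with plain SGD.
optimizer = torch.optim.Adam(model.parameters(), lr=PROPOSED_LR)

# One dummy training step to show the optimizer in use.
x, y = torch.randn(8, 16), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```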