chore: changed the assertions to make sure the num_updates is a multiple of num_evaluations #1083
base: develop
Thanks @Louay-Ben-nessir, can you also check whether the other systems suffer from the same problem? 🙏
A suggestion that would remove the need for the assert and make Mava easier to configure: change the variable to an evaluation frequency and then store num_evaluations in the config as …
I think this issue is exclusive to the PPO systems.
This is a huge improvement over the current implementation, but it's still not exact in some cases. Losing some updates is worth it for the flexibility, though, so I'll change it.
Ah, right, we could lose some updates. @RuanJohn has had an issue with this in the past, so maybe a jnp.ceil is needed here; just double-check with him.
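To make the trade-off in this exchange concrete, here is a minimal sketch comparing floor and ceiling division when splitting num_updates into num_evaluations blocks. It assumes training runs num_evaluations blocks of num_updates_per_eval updates each; the helper updates_run is hypothetical, and plain integer ceiling division stands in for the jnp.ceil suggested above.

```python
def updates_run(num_updates: int, num_evaluations: int, use_ceil: bool) -> int:
    """Return how many updates a loop of num_evaluations blocks would execute.

    Hypothetical helper: assumes each block runs num_updates_per_eval updates.
    """
    if use_ceil:
        # Integer ceiling division: blocks cover every requested update,
        # possibly overshooting by up to num_evaluations - 1 updates.
        num_updates_per_eval = (num_updates + num_evaluations - 1) // num_evaluations
    else:
        # Floor division silently drops the remainder.
        num_updates_per_eval = num_updates // num_evaluations
    return num_evaluations * num_updates_per_eval


print(updates_run(1000, 3, use_ceil=False))  # 999: one update lost
print(updates_run(1000, 3, use_ceil=True))   # 1002: two extra updates
```

Flooring loses the remainder, while rounding up trains for slightly more updates than requested; which one is preferable depends on whether overshooting the timestep budget is acceptable.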
The hard assert here is a bit too strict in my opinion.
Something we could do instead is emit a warning saying that the experiment might not run for the number of timesteps the user expects, and then report the total number of timesteps that will actually run.
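Following this suggestion, a warning-based check might look like the sketch below. The helper name effective_timesteps and the steps_per_update parameter (assumed to be the number of environment timesteps consumed by one update) are hypothetical; this is not Mava's actual config code.

```python
import warnings


def effective_timesteps(num_updates: int, num_evaluations: int, steps_per_update: int) -> int:
    """Warn (instead of asserting) when some requested updates will not run.

    Hypothetical helper: steps_per_update is an assumed stand-in for the
    timesteps consumed per update.
    """
    num_updates_per_eval = num_updates // num_evaluations
    updates_actually_run = num_evaluations * num_updates_per_eval
    timesteps_run = updates_actually_run * steps_per_update
    if updates_actually_run != num_updates:
        # Tell the user what will really happen rather than failing hard.
        warnings.warn(
            f"num_updates ({num_updates}) is not a multiple of num_evaluations "
            f"({num_evaluations}); only {updates_actually_run} updates "
            f"({timesteps_run} timesteps) will run.",
            stacklevel=2,
        )
    return timesteps_run
```

With num_updates=1000, num_evaluations=3 and steps_per_update=128, this warns and returns 127872 instead of the requested 128000.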
What?
Changed the assertions to make sure the num_updates is a multiple of num_evaluations.
Why?
Only num_evaluations * num_updates_per_eval updates are run during training, so some updates are silently skipped if num_updates is not a multiple of num_evaluations.
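As a worked example of this arithmetic, writing u for num_updates and e for num_evaluations, and assuming num_updates_per_eval is computed with floor division:

```latex
e \left\lfloor \frac{u}{e} \right\rfloor = u - (u \bmod e),
\qquad \text{e.g. } u = 250,\ e = 20:\quad 20 \cdot 12 = 240 \text{ updates run, } 10 \text{ missed.}
```

No updates are lost exactly when u mod e = 0, i.e. when num_updates is a multiple of num_evaluations.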
How?
Changed the assertions.
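A minimal sketch of the kind of check this PR describes, assuming the config exposes integer num_updates and num_evaluations; validate_num_evaluations is a hypothetical name, not the function touched in the diff.

```python
def validate_num_evaluations(num_updates: int, num_evaluations: int) -> int:
    """Hypothetical check: require num_updates to divide evenly into evaluations.

    Returns num_updates_per_eval when the check passes.
    """
    assert num_updates % num_evaluations == 0, (
        f"num_updates ({num_updates}) must be a multiple of "
        f"num_evaluations ({num_evaluations}); otherwise "
        f"{num_updates % num_evaluations} updates would be skipped."
    )
    return num_updates // num_evaluations
```

For example, validate_num_evaluations(1000, 10) returns 100, while validate_num_evaluations(250, 20) fails because 10 updates would be skipped.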