[Feature request]: Improve functionality of GradientCheckCallback by recording skipped samples
#231
Labels
enhancement
New feature or request
Feature/behavior summary
The callback allows us to skip training samples that produce NaNs, which would otherwise break model training immediately. The hope is that the skipped samples will eventually get picked up in later epochs, but we currently have no good way to monitor whether that happens.
The suggestion is to refactor the callback to record the indices of skipped samples, so that we can plot histograms of how often each sample is skipped. This would hopefully help diagnose problematic training runs.
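As a rough illustration of the bookkeeping this would involve, here is a minimal sketch in plain Python. Everything in it is hypothetical and not part of the existing callback: the `SkippedSampleRecorder` class, the `has_nan` helper, and the toy batches are made up for the example, and it assumes that the whole batch is skipped when any of its values is NaN and that each batch carries its dataset indices.

```python
import math
from collections import Counter


class SkippedSampleRecorder:
    """Hypothetical helper: counts how often each sample index is skipped."""

    def __init__(self) -> None:
        self.skip_counts: Counter = Counter()

    def record(self, sample_indices) -> None:
        # Increment the skip count for every index in the skipped batch.
        self.skip_counts.update(int(i) for i in sample_indices)

    def frequency_histogram(self) -> dict:
        # Map "times skipped" -> "number of samples skipped that many times".
        return dict(Counter(self.skip_counts.values()))


def has_nan(values) -> bool:
    # True if any value in the iterable is NaN.
    return any(math.isnan(v) for v in values)


recorder = SkippedSampleRecorder()

# Toy stand-in for a training loop: (dataset indices, per-sample gradient norms).
batches = [
    ([0, 1], [0.5, 0.1]),
    ([2, 3], [float("nan"), 0.2]),
    ([2, 3], [float("nan"), 0.3]),
    ([4, 5], [0.4, float("nan")]),
]

for indices, grad_norms in batches:
    if has_nan(grad_norms):
        # Assumption: the whole batch is skipped, so all its indices are recorded.
        recorder.record(indices)

print(recorder.skip_counts.most_common())
print(recorder.frequency_histogram())
```

In the real callback the recording would presumably happen wherever the NaN check currently triggers the skip, and the accumulated counts could be handed to whatever logger the run uses at the end of each epoch.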
Request attributes
Related issues
No response
Solution description
Ostensibly, we would need to:
wandb or related platforms can be used to view the records meaningfully.
Additional notes
No response