PAX Check loss and perf #111
Conversation
maanug-nv commented on Jul 7, 2023
- Adds baselines for 4 of the PAX mgmn tests
- Adds pytest files to compare tensorboard file result of mgmn tests against baselines
- Adds job to run above pytest in workflow
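A minimal sketch of what such a baseline comparison could look like (the function name, tolerance, and metric names are illustrative assumptions, not the PR's actual code; real metrics would come from parsing the TensorBoard event files):

```python
import math


def compare_to_baseline(metrics, baseline, rtol=0.05):
    """Compare observed metrics (e.g. loss and step time extracted from
    TensorBoard event files) against baseline values within a relative
    tolerance. Returns a list of (name, observed, expected) failures."""
    failures = []
    for name, expected in baseline.items():
        observed = metrics.get(name)
        if observed is None or not math.isclose(observed, expected, rel_tol=rtol):
            failures.append((name, observed, expected))
    return failures


# Hypothetical example: loss is within tolerance, step time regressed.
baseline = {"loss": 2.31, "step_time_ms": 120.0}
metrics = {"loss": 2.35, "step_time_ms": 140.0}
print(compare_to_baseline(metrics, baseline))
```

In a pytest file, each mgmn test's metrics would be loaded and a plain `assert not compare_to_baseline(...)` would fail the job on a regression.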
It would be neat to surface the results of this either in a GITHUB_STEP_SUMMARY or a badge somewhere. Any thoughts about adding that?
Agreed, that's my next step 😄. Just wanted to split into two PRs.
How are the baseline numbers in the JSON files derived? We need to document the source of the data so that when a significant change happens, we know where to look for the diffs. If they are from a particular workflow run, maybe we can simply reference it and check out the associated artifacts? That way, the source and methodology of the data are self-explanatory.
@yhtang raises a good point about baseline provenance. What if, in addition to the Python file that creates the baseline JSONs from artifacts, you add a script that takes job IDs, pulls the artifacts, and calls that Python script? That way, from the job ID we can encode the URL of the workflow run and the date it ran into the JSON for inspection and repeatability.
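One way the baseline-generation script could stamp that provenance into each JSON, as suggested above (the field names and the `repo` default are illustrative assumptions; the run URL follows GitHub's standard `actions/runs/<id>` format):

```python
import json
from datetime import datetime, timezone


def write_baseline(path, metrics, run_id, repo="OWNER/REPO"):
    """Write baseline metrics alongside provenance metadata so a reader
    of the JSON can find the workflow run the numbers came from.

    `repo` is a placeholder; it would be the actual owner/name slug."""
    baseline = {
        "metrics": metrics,
        "provenance": {
            "workflow_run": f"https://github.com/{repo}/actions/runs/{run_id}",
            "generated_on": datetime.now(timezone.utc).isoformat(),
        },
    }
    with open(path, "w") as f:
        json.dump(baseline, f, indent=2)


# Hypothetical usage: bake the run ID and date into the baseline file.
write_baseline("baseline.json", {"loss": 2.31}, run_id=123456789)
```

The comparison job can then print the `workflow_run` URL in its failure message, making the source of a diverging baseline immediately traceable.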
I like this idea. Essentially, we can wrap this up as a reusable workflow. That way, we can reuse it for comparing results for PAX, T5X, JAX, TE, etc.