
PAX Check loss and perf #111

Merged
maanug-nv merged 22 commits into main from maanug/pax-check-metrics on Jul 20, 2023
Conversation

maanug-nv
Contributor

  • Adds baselines for 4 of the PAX mgmn tests
  • Adds pytest files to compare tensorboard file result of mgmn tests against baselines
  • Adds job to run above pytest in workflow
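The baseline comparison described above might look roughly like the following minimal sketch. The function name, metric names, tolerance, and baseline layout are illustrative assumptions, not the actual test code in this PR:

```python
import math

# Illustrative sketch of a baseline metric check. The metric names,
# tolerance, and baseline format are assumptions, not the PR's code.

def compare_metrics(observed, baseline, rel_tol=0.05):
    """Return (metric, observed, expected) tuples outside the tolerance."""
    failures = []
    for name, expected in baseline.items():
        got = observed.get(name)
        if got is None or not math.isclose(got, expected, rel_tol=rel_tol):
            failures.append((name, got, expected))
    return failures

# In a real test, `observed` would be read from the run's tensorboard
# event files and `baseline` loaded from the checked-in JSON file.
baseline = {"loss": 2.31, "steps_per_sec": 14.2}
observed = {"loss": 2.35, "steps_per_sec": 14.0}
assert compare_metrics(observed, baseline) == []
```

A pytest file would then wrap calls like this in test functions, one per mgmn test configuration.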

@maanug-nv maanug-nv requested a review from terrykong July 7, 2023 17:28
@maanug-nv maanug-nv marked this pull request as ready for review July 8, 2023 00:40
@maanug-nv maanug-nv requested review from yhtang and ashors1 July 10, 2023 19:06
@terrykong
Contributor

It would be neat to surface the results of this either in a GITHUB_STEP_SUMMARY or a badge somewhere. Any thoughts about adding that?

@maanug-nv
Contributor Author

Agreed, that's my next step 😄. Just wanted to split this into two PRs.

@yhtang
Collaborator

yhtang commented Jul 17, 2023

How are the baseline numbers in the JSON files derived? We need to document the source of the data so that when a significant change happens, we know where to look for the diffs.

If they are from a particular workflow run, maybe we can simply reference it and check out the associated artifacts? This way, the source and methodology of the data are self-explanatory.
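Referencing a run this way is straightforward because GitHub exposes a run's artifacts at a well-known REST endpoint. A tiny sketch (the endpoint path is the real GitHub REST API route; `owner`/`repo` are placeholders):

```python
# Build the GitHub REST API URL that lists a workflow run's artifacts.
# The route is real; owner, repo, and run_id here are placeholders.
def artifacts_url(owner: str, repo: str, run_id: int) -> str:
    return (f"https://api.github.com/repos/{owner}/{repo}"
            f"/actions/runs/{run_id}/artifacts")

print(artifacts_url("owner", "repo", 12345))
```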

@terrykong
Contributor

@yhtang raises a good point about baseline provenance.

What about, in addition to the Python file that creates the baseline JSONs from artifacts, adding a script that takes job IDs, pulls the artifacts, and calls that Python script? That way, from the job ID, we can encode the URL of the workflow run and the date it ran in the JSON for inspection and repeatability.
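Such a wrapper might look like the sketch below. The repo name, file paths, and function names are placeholders; `gh run download` is the GitHub CLI subcommand for pulling a run's artifacts:

```python
import json
import subprocess
from datetime import datetime, timezone

# Hedged sketch of the suggested wrapper: pull a run's artifacts, then
# record provenance (run URL, timestamp) alongside the baseline numbers.
# Repo name, paths, and function names are illustrative placeholders.

def download_artifacts(run_id, dest="artifacts"):
    # `gh run download` is a real GitHub CLI subcommand.
    subprocess.run(["gh", "run", "download", str(run_id), "--dir", dest],
                   check=True)

def write_baseline_with_provenance(run_id, metrics, repo="owner/repo",
                                   out_path="baseline.json"):
    baseline = dict(metrics)
    baseline["_provenance"] = {
        "workflow_run_url": f"https://github.com/{repo}/actions/runs/{run_id}",
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(out_path, "w") as f:
        json.dump(baseline, f, indent=2)
    return baseline
```

In practice the script would call `download_artifacts`, derive `metrics` from the downloaded tensorboard files (e.g. via the existing baseline-creation Python file), and then write the annotated JSON.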

@yhtang
Collaborator

yhtang commented Jul 18, 2023

> What about, in addition to the Python file that creates the baseline JSONs from artifacts, adding a script that takes job IDs, pulls the artifacts, and calls that Python script? That way, from the job ID, we can encode the URL of the workflow run and the date it ran in the JSON for inspection and repeatability.

I like this idea. Essentially, we can wrap this up as a reusable workflow, say _compare_metrics.yaml, which we can mentally regard as a function. This workflow/function would accept two workflow/job IDs, download the associated artifacts, and determine whether the differences between certain metrics (specifiable via an input) are large.

This way, we can reuse this workflow to compare results for PAX, T5X, JAX, TE, etc.
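A rough sketch of what such a reusable workflow could look like (the file name comes from the comment above; the input names and steps are illustrative, not an actual file in this PR — `workflow_call` is the real GitHub Actions trigger for reusable workflows):

```yaml
# Hedged sketch of the proposed _compare_metrics.yaml; inputs and steps
# are illustrative.
name: compare metrics
on:
  workflow_call:
    inputs:
      baseline-run-id:
        type: string
        required: true
      candidate-run-id:
        type: string
        required: true
      metrics:
        type: string  # e.g. "loss,steps_per_sec"
        required: true

jobs:
  compare:
    runs-on: ubuntu-latest
    steps:
      - run: gh run download ${{ inputs.baseline-run-id }} --dir baseline
      - run: gh run download ${{ inputs.candidate-run-id }} --dir candidate
      - run: python compare_metrics.py baseline candidate --metrics "${{ inputs.metrics }}"
```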

@maanug-nv maanug-nv merged commit f3b2168 into main Jul 20, 2023
12 checks passed
@yhtang yhtang deleted the maanug/pax-check-metrics branch July 21, 2023 02:42