
PAX Check loss and perf #111

Merged
maanug-nv merged 22 commits into main from maanug/pax-check-metrics on Jul 20, 2023
Conversation

maanug-nv
Contributor

  • Adds baselines for 4 of the PAX mgmn tests
  • Adds pytest files to compare tensorboard file result of mgmn tests against baselines
  • Adds job to run above pytest in workflow
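The baseline comparison described above might look roughly like the following minimal sketch. The function name, metric names, tolerance, and baseline layout are illustrative assumptions, not the actual test code in this PR:

```python
import math

# Illustrative sketch of a baseline metric check. The metric names,
# tolerance, and baseline format are assumptions, not the PR's code.

def compare_metrics(observed, baseline, rel_tol=0.05):
    """Return (metric, observed, expected) tuples outside the tolerance."""
    failures = []
    for name, expected in baseline.items():
        got = observed.get(name)
        if got is None or not math.isclose(got, expected, rel_tol=rel_tol):
            failures.append((name, got, expected))
    return failures

# In a real test, `observed` would be read from the run's tensorboard
# event files and `baseline` loaded from the checked-in JSON file.
baseline = {"loss": 2.31, "steps_per_sec": 14.2}
observed = {"loss": 2.35, "steps_per_sec": 14.0}
assert compare_metrics(observed, baseline) == []
```

A pytest file would then wrap calls like this in test functions, one per mgmn test configuration.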

@maanug-nv maanug-nv requested a review from terrykong July 7, 2023 17:28
@maanug-nv maanug-nv marked this pull request as ready for review July 8, 2023 00:40
@maanug-nv maanug-nv requested review from yhtang and ashors1 July 10, 2023 19:06
@terrykong
Contributor

It would be neat to surface the results of this either in a GITHUB_STEP_SUMMARY or a badge somewhere. Any thoughts about adding that?

@maanug-nv
Contributor Author

Agreed, that's my next step 😄. Just wanted to split this into two PRs.

@yhtang
Collaborator

yhtang commented Jul 17, 2023

How are the baseline numbers in the JSON files derived? We need to document the source of the data so that when a significant change happens, we know where to look for the diffs.

If they are from a particular workflow run, maybe we can simply reference it and check out the associated artifacts? This way, the source and methodology of the data are self-explanatory.
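Referencing a run this way is straightforward because GitHub exposes a run's artifacts at a well-known REST endpoint. A tiny sketch (the endpoint path is the real GitHub REST API route; `owner`/`repo` are placeholders):

```python
# Build the GitHub REST API URL that lists a workflow run's artifacts.
# The route is real; owner, repo, and run_id here are placeholders.
def artifacts_url(owner: str, repo: str, run_id: int) -> str:
    return (f"https://api.github.com/repos/{owner}/{repo}"
            f"/actions/runs/{run_id}/artifacts")

print(artifacts_url("owner", "repo", 12345))
```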

@terrykong
Contributor

@yhtang raises a good point about baseline provenance.

What about, in addition to the Python file that creates the baseline JSONs from artifacts, adding a script that takes job IDs, pulls the artifacts, and calls that Python script? That way, from the job ID, we can encode the URL of the workflow run and the date it ran in the JSON for inspection and repeatability.
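Such a wrapper might look like the sketch below. The repo name, file paths, and function names are placeholders; `gh run download` is the GitHub CLI subcommand for pulling a run's artifacts:

```python
import json
import subprocess
from datetime import datetime, timezone

# Hedged sketch of the suggested wrapper: pull a run's artifacts, then
# record provenance (run URL, timestamp) alongside the baseline numbers.
# Repo name, paths, and function names are illustrative placeholders.

def download_artifacts(run_id, dest="artifacts"):
    # `gh run download` is a real GitHub CLI subcommand.
    subprocess.run(["gh", "run", "download", str(run_id), "--dir", dest],
                   check=True)

def write_baseline_with_provenance(run_id, metrics, repo="owner/repo",
                                   out_path="baseline.json"):
    baseline = dict(metrics)
    baseline["_provenance"] = {
        "workflow_run_url": f"https://github.com/{repo}/actions/runs/{run_id}",
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(out_path, "w") as f:
        json.dump(baseline, f, indent=2)
    return baseline
```

In practice the script would call `download_artifacts`, derive `metrics` from the downloaded tensorboard files (e.g. via the existing baseline-creation Python file), and then write the annotated JSON.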

@yhtang
Collaborator

yhtang commented Jul 18, 2023

> What about, in addition to the Python file that creates the baseline JSONs from artifacts, adding a script that takes job IDs, pulls the artifacts, and calls that Python script? That way, from the job ID, we can encode the URL of the workflow run and the date it ran in the JSON for inspection and repeatability.

I like this idea. Essentially, we can wrap this up as a reusable workflow, say _compare_metrics.yaml, which we can mentally regard as a function. This workflow/function would accept two workflow/job IDs, download the associated artifacts, and determine whether the differences between certain metrics (specifiable via an input) are large.

This way, we can reuse this workflow to compare results for PAX, T5X, JAX, TE, etc.
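A rough sketch of what such a reusable workflow could look like (the file name comes from the comment above; the input names and steps are illustrative, not an actual file in this PR — `workflow_call` is the real GitHub Actions trigger for reusable workflows):

```yaml
# Hedged sketch of the proposed _compare_metrics.yaml; inputs and steps
# are illustrative.
name: compare metrics
on:
  workflow_call:
    inputs:
      baseline-run-id:
        type: string
        required: true
      candidate-run-id:
        type: string
        required: true
      metrics:
        type: string  # e.g. "loss,steps_per_sec"
        required: true

jobs:
  compare:
    runs-on: ubuntu-latest
    steps:
      - run: gh run download ${{ inputs.baseline-run-id }} --dir baseline
      - run: gh run download ${{ inputs.candidate-run-id }} --dir candidate
      - run: python compare_metrics.py baseline candidate --metrics "${{ inputs.metrics }}"
```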

@maanug-nv maanug-nv merged commit f3b2168 into main Jul 20, 2023
12 checks passed
@yhtang yhtang deleted the maanug/pax-check-metrics branch July 21, 2023 02:42