benchmark consistency check #479
I have a hacked-together consistency checker that produces something like this from the journal.
The code is terrible and it doesn't check everything, but it does eliminate some of the tedium of making sure a run is good.
@wpietri do you want me to work on this and add some checking?
The journal format is merged. The documentation will be merged soon. Data flow: prompt -> SUT -> annotator -> scoring -> JSON result file. Approach: identify where things can go wrong or fail, and verify each of those stages.
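As a starting point, here is a minimal sketch of loading the journal and tallying entries per stage. It assumes the journal is a JSONL file in which each entry carries a "message" field naming its stage; the file name and field name are assumptions, not the documented schema.

```python
# Minimal sketch: load the run journal and count entries per pipeline stage.
# Assumes one JSON object per line with a "message" field naming the stage;
# these are placeholders for whatever the real journal schema uses.
import json
from collections import Counter
from pathlib import Path


def load_journal(path: Path) -> list[dict]:
    """Parse one journal entry per non-empty line."""
    with path.open() as f:
        return [json.loads(line) for line in f if line.strip()]


def stage_counts(entries: list[dict]) -> Counter:
    """Tally how many journal entries each stage produced."""
    return Counter(entry["message"] for entry in entries)


entries = load_journal(Path("journal.jsonl"))
print(stage_counts(entries))
```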
TBD: where the consistency check job runs. Currently it runs on William's machine. |
Consistency checks run at the SUT, annotator, and item level (a code sketch follows this list).

SUT
- test items == (sut cached + sut fetched)
- (frac_safe x test_items) should == measured safe
- Report how many test items, how many cached, how many measured, etc.
- Compare the calculated frac_safe with the actual measured safe count; they should match.

ANNOTATOR
- Same approach.
- Cached should be a small number, ideally 0; a cached response means the prompt was a dupe.
- cached + fetched == sum(raw columns) == ann_translated == SUT test items
- William's code parses the annotation response and evaluates safety itself, to compare with the production safety. E.g., depending on the annotator, it looks for the strings "safe" or "unsafe" in the response, or "true"/"false", etc.

ITEM (prompt, annotator, scoring)
- William's code replicates the voting logic and checks whether it matches the annotator response.

TBD: what features are missing from William's first pass.
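Below is a minimal sketch of the SUT- and annotator-level count checks above, assuming the per-stage tallies come from a Counter over journal entries. The message names ("cached sut response", "fetched sut response", etc.) and argument names are assumptions about the schema, not the real identifiers.

```python
# Sketch of the SUT- and annotator-level count checks. Journal message
# names and argument names are assumed placeholders, not the real schema.
from collections import Counter


def check_sut(counts: Counter, test_items: int, frac_safe: float,
              measured_safe: int) -> list[str]:
    """Return human-readable inconsistencies at the SUT level (empty == OK)."""
    problems = []
    responses = counts["cached sut response"] + counts["fetched sut response"]
    if responses != test_items:
        problems.append(f"SUT responses ({responses}) != test items ({test_items})")
    expected_safe = round(frac_safe * test_items)
    if expected_safe != measured_safe:
        problems.append(
            f"frac_safe implies {expected_safe} safe items, "
            f"but {measured_safe} were measured")
    return problems


def check_annotator(counts: Counter, test_items: int) -> list[str]:
    """Annotator-level checks: cached should be ~0, and totals should line up."""
    problems = []
    cached = counts["cached annotator response"]
    fetched = counts["fetched annotator response"]
    if cached > 0:
        problems.append(f"{cached} cached annotations (duplicate prompts?)")
    if cached + fetched != counts["translated annotation"]:
        problems.append("annotations != translated annotations")
    if cached + fetched != test_items:
        problems.append("annotations != SUT test items")
    return problems
```

In practice each check would also print its counts, so a human can eyeball the numbers even when everything matches.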
Build something that properly and clearly checks the consistency between the benchmark JSON files and the entries in the run journal, making sure the numbers at each stage also make sense. Basically, try to emulate a human who is making sure a benchmark run is good.
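A hedged sketch of that end-to-end comparison might look like the following; the JSON keys ("num_items", "frac_safe") and the tally names are hypothetical, standing in for whatever the benchmark result files actually contain.

```python
# Sketch: compare the benchmark JSON result file against the journal
# tallies. JSON keys and the tally dict are hypothetical placeholders.
import json
from pathlib import Path


def check_result_file(result_path: Path, tallies: dict) -> list[str]:
    """Flag disagreements between the published results and the journal."""
    result = json.loads(result_path.read_text())
    problems = []
    if result["num_items"] != tallies["test_items"]:
        problems.append("result file item count disagrees with the journal")
    expected_safe = round(result["frac_safe"] * result["num_items"])
    if expected_safe != tallies["measured_safe"]:
        problems.append("result file frac_safe disagrees with measured safe count")
    return problems
```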