
Skip big models on cpu test to fix CI #6197

Merged: 2 commits into pytorch:main on Jun 23, 2022

Conversation

@YosuaMichael (Contributor) commented on Jun 23, 2022

Addressing issue #6189

It seems like the CI is broken on the Windows CPU machine due to the big models.
I have sorted all the weights by number of parameters, and here is what I got:

[Image: table of torchvision weights sorted by parameter count]
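A rough sketch of how such a table could be produced (illustrative only, not the actual script: the builder names are examples, and I assume regnet_y_128gf is the model referred to as regnet_y_128 below; instantiating the largest models needs a few GB of RAM):

```python
# Sketch: count parameters for a few torchvision models and sort by size.
# The real comparison presumably iterates over every registered weight enum.
import torchvision.models as models

builders = {
    "resnet50": models.resnet50,
    "regnet_y_128gf": models.regnet_y_128gf,  # assumed to be "regnet_y_128" above
    "vit_h_14": models.vit_h_14,
}

param_counts = {
    name: sum(p.numel() for p in builder(weights=None).parameters())
    for name, builder in builders.items()
}

for name, n in sorted(param_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {n / 1e6:.0f}M parameters")
```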
From this list, it seems we should start by skipping regnet_y_128 and vit_h_14. These models are already listed in skipped_big_models, but currently that list is only applied to tests on the cuda device.

In this PR we reuse the existing skipped_big_models list, but we now skip on cpu as well instead of only cuda.
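A minimal sketch of the intended behaviour, using the SKIP_BIG_MODEL flag and skipped_big_models set mentioned in this thread (names taken from the discussion; the real test code may be wired differently):

```python
# Sketch only: not the actual test file.
import os
import pytest

# Default is "1", so big models are skipped unless explicitly enabled.
SKIP_BIG_MODEL = os.getenv("SKIP_BIG_MODEL", "1") == "1"
skipped_big_models = {"vit_h_14", "regnet_y_128"}  # names as mentioned above


def skip_if_big_model(model_name, device):
    # Before this PR the skip only applied on cuda; now it applies on cpu too.
    if SKIP_BIG_MODEL and model_name in skipped_big_models:
        pytest.skip(f"Skipped to reduce memory usage on {device}: {model_name}")
```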

@YosuaMichael self-assigned this on Jun 23, 2022
@YosuaMichael changed the title from "Skip big models on cpu" to "Skip big models on cpu test to fix CI" on Jun 23, 2022
@NicolasHug (Member) commented on Jun 23, 2022

Since the default for SKIP_BIG_MODEL is 1, does that mean we'll never actually test these models anymore?

Also, a side note, probably not too important: the memory footprint of a model is related to the size of its weights, but there are other factors, like the size and number of the feature maps in conv layers. EDIT: maybe it's not really the case in eval() mode, though.
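To make that side note concrete, here is a toy comparison for a single conv layer (the layer and input sizes are purely illustrative, not taken from any of the models above):

```python
# Toy illustration: for a conv layer, the intermediate feature map can take
# more memory than the layer's weights themselves.
import torch
import torch.nn as nn

conv = nn.Conv2d(256, 256, kernel_size=3, padding=1)
x = torch.randn(1, 256, 56, 56)

weight_mib = sum(p.numel() for p in conv.parameters()) * 4 / 2**20  # fp32 bytes
feature_mib = conv(x).numel() * 4 / 2**20

print(f"weights: {weight_mib:.1f} MiB, output feature map: {feature_mib:.1f} MiB")
```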

@datumbox (Contributor) left a comment


@YosuaMichael LGTM, thanks! May I propose one last test before turning them off? We could try using torch.inference_mode() in the tests to reduce the memory footprint. This might help with both models (until they break again). Would you be up for testing it?
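A sketch of what that experiment might look like (not the actual test code; the builder and input size are illustrative):

```python
# Sketch: run the forward pass under torch.inference_mode() so no autograd
# bookkeeping is kept, which should lower the peak memory of the test.
import torch
import torchvision.models as models

model = models.vit_h_14(weights=None).eval()
x = torch.randn(1, 3, 224, 224)

with torch.inference_mode():
    out = model(x)

print(out.shape)  # torch.Size([1, 1000])
```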

@NicolasHug yes, that's what it means. We already do this for CUDA, but it seems now we need to do it for CPU as well. It's far from ideal, but large models like this are typically flaky. We've been switching ViT_H on/off for a while now. If you have an alternative, I'd love to discuss it.

@YosuaMichael (Contributor, Author) commented:

> @YosuaMichael LGTM, thanks! May I propose one last test before turning them off? We could try using torch.inference_mode() in the tests to reduce the memory footprint. This might help with both models (until they break again). Would you be up for testing it?

Will test this in a different PR.

@YosuaMichael (Contributor, Author) commented:

> Since the default for SKIP_BIG_MODEL is 1, does that mean we'll never actually test these models anymore?
>
> Also, a side note, probably not too important: the memory footprint of a model is related to the size of its weights, but there are other factors, like the size and number of the feature maps in conv layers. EDIT: maybe it's not really the case in eval() mode, though.

Yeah, I think for now we disable the test for the big models until we can find a better way (I plan to revamp the test CI in H2, so this will definitely be a consideration for how to enable testing of big models).

@YosuaMichael merged commit 32e6341 into pytorch:main on Jun 23, 2022
@github-actions commented:

Hey @YosuaMichael!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request on Jun 27, 2022
Reviewed By: NicolasHug

Differential Revision: D37450354

fbshipit-source-id: 997deeced50eea5a3df00ea74f55b6ceadf21caa