
Skip big models on cpu test to fix CI #6197

Merged: 2 commits into pytorch:main on Jun 23, 2022

Conversation

@YosuaMichael (Contributor) commented on Jun 23, 2022

Addressing issue #6189

It seems like the CI is broken on the Windows CPU machine due to the big models.
I have sorted all the weights by number of parameters, and here is what I got:

[Image: table of torchvision weights sorted by parameter count]
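A rough sketch of how such a table could be produced (illustrative only, not the actual script: the builder names are examples, and I assume regnet_y_128gf is the model referred to as regnet_y_128 below; instantiating the largest models needs a few GB of RAM):

```python
# Sketch: count parameters for a few torchvision models and sort by size.
# The real comparison presumably iterates over every registered weight enum.
import torchvision.models as models

builders = {
    "resnet50": models.resnet50,
    "regnet_y_128gf": models.regnet_y_128gf,  # assumed to be "regnet_y_128" above
    "vit_h_14": models.vit_h_14,
}

param_counts = {
    name: sum(p.numel() for p in builder(weights=None).parameters())
    for name, builder in builders.items()
}

for name, n in sorted(param_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {n / 1e6:.0f}M parameters")
```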
From this list, it seems we should start by skipping regnet_y_128 and vit_h_14. These models are already listed in skipped_big_models, but currently that list is only applied to tests on the cuda device.

In this PR we reuse the existing skipped_big_models list, but we now skip on cpu as well instead of only cuda.
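A minimal sketch of the intended behaviour, using the SKIP_BIG_MODEL flag and skipped_big_models set mentioned in this thread (names taken from the discussion; the real test code may be wired differently):

```python
# Sketch only: not the actual test file.
import os
import pytest

# Default is "1", so big models are skipped unless explicitly enabled.
SKIP_BIG_MODEL = os.getenv("SKIP_BIG_MODEL", "1") == "1"
skipped_big_models = {"vit_h_14", "regnet_y_128"}  # names as mentioned above


def skip_if_big_model(model_name, device):
    # Before this PR the skip only applied on cuda; now it applies on cpu too.
    if SKIP_BIG_MODEL and model_name in skipped_big_models:
        pytest.skip(f"Skipped to reduce memory usage on {device}: {model_name}")
```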

@YosuaMichael self-assigned this on Jun 23, 2022
@YosuaMichael changed the title from "Skip big models on cpu" to "Skip big models on cpu test to fix CI" on Jun 23, 2022
@NicolasHug (Member) commented on Jun 23, 2022

Since the default for SKIP_BIG_MODEL is 1, does that mean we'll never actually test these models anymore?

Also, a side note, probably not too important: the memory footprint of a model is related to the size of its weights, but there are other factors, like the size and number of the feature maps in conv layers. EDIT: maybe it's not really the case in eval() mode, though.
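To make that side note concrete, here is a toy comparison for a single conv layer (the layer and input sizes are purely illustrative, not taken from any of the models above):

```python
# Toy illustration: for a conv layer, the intermediate feature map can take
# more memory than the layer's weights themselves.
import torch
import torch.nn as nn

conv = nn.Conv2d(256, 256, kernel_size=3, padding=1)
x = torch.randn(1, 256, 56, 56)

weight_mib = sum(p.numel() for p in conv.parameters()) * 4 / 2**20  # fp32 bytes
feature_mib = conv(x).numel() * 4 / 2**20

print(f"weights: {weight_mib:.1f} MiB, output feature map: {feature_mib:.1f} MiB")
```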

@datumbox (Contributor) left a comment


@YosuaMichael LGTM, thanks! May I propose one last test before turning them off? We could try using torch.inference_mode() in the tests to reduce the memory footprint. This might help with both models (until they break again). Would you be up for testing it?
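A sketch of what that experiment might look like (not the actual test code; the builder and input size are illustrative):

```python
# Sketch: run the forward pass under torch.inference_mode() so no autograd
# bookkeeping is kept, which should lower the peak memory of the test.
import torch
import torchvision.models as models

model = models.vit_h_14(weights=None).eval()
x = torch.randn(1, 3, 224, 224)

with torch.inference_mode():
    out = model(x)

print(out.shape)  # torch.Size([1, 1000])
```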

@NicolasHug yes, that's what it means. We already do this for CUDA, but it seems now we need to do it for CPU as well. It's far from ideal, but large models like this are typically flaky. We've been switching ViT_H on/off for a while now. If you have an alternative, I'd love to discuss it.

@YosuaMichael (Contributor, Author) commented:

> @YosuaMichael LGTM, thanks! May I propose one last test before turning them off? We could try using torch.inference_mode() in the tests to reduce the memory footprint. This might help with both models (until they break again). Would you be up for testing it?

Will test this in a different PR.

@YosuaMichael (Contributor, Author) commented:

> Since the default for SKIP_BIG_MODEL is 1, does that mean we'll never actually test these models anymore?
>
> Also, a side note, probably not too important: the memory footprint of a model is related to the size of its weights, but there are other factors, like the size and number of the feature maps in conv layers. EDIT: maybe it's not really the case in eval() mode, though.

Yeah, I think for now we disable the test for the big models until we can find a better way (I plan to revamp the test CI in H2, so this will definitely be a consideration for how to enable testing of big models).

@YosuaMichael merged commit 32e6341 into pytorch:main on Jun 23, 2022
@github-actions commented:

Hey @YosuaMichael!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request on Jun 27, 2022
Reviewed By: NicolasHug

Differential Revision: D37450354

fbshipit-source-id: 997deeced50eea5a3df00ea74f55b6ceadf21caa