
Too few Fps #7

Open

nagadit opened this issue Oct 18, 2020 · 9 comments

nagadit commented Oct 18, 2020

The inference speed does not match the one declared in the paper, or I am doing something wrong.
GPU: 2080 Ti
Please help with this.

[Screenshot: 2020-10-19 at 02:10:12]

@noyami2033

@nagadit Me too. I tested the runner in the InferenceWrapper class and got about 380 ms per frame on a GTX 1060.
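For reference, roughly how I measured it (a sketch of my setup, not code from the repo; it assumes `module` is an initialized InferenceWrapper and `input_data_dict` follows the repo's example input format):

```python
# Timing sketch (an assumption about the measurement setup, not repo code):
# times one full InferenceWrapper call, cropping included.
import time

import torch

def time_full_pipeline(module, input_data_dict, n_runs=10):
    torch.cuda.synchronize()     # flush pending GPU work before timing
    start = time.time()
    for _ in range(n_runs):
        module(input_data_dict)  # full call: cropping + inference
    torch.cuda.synchronize()     # wait for the GPU before stopping the clock
    return (time.time() - start) / n_runs * 1000  # ms per frame

# print(f'{time_full_pipeline(module, input_data_dict):.0f} ms per frame')
```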

Author

nagadit commented Oct 22, 2020

@egorzakharov, we need your intervention here.

Collaborator

egorzakharov commented Oct 24, 2020

Hi! @nagadit

First of all, in this pipeline you are evaluating the full model (initialization + inference) plus the external cropping function, not just inference. The cropping function consists of a face detector and a landmarks detector (the face-alignment library), which could be optimized further; we just did not do it in this repository. For a real-time application, you need to train the model from scratch using a face and landmarks detector that works in real time (like Google's MediaPipe). Note that this issue is common to all methods that use keypoints as their pose representation.

You can crop the data externally via the infer.InferenceWrapper.preprocess_data function and call forward with crop_data=False; then you will only measure initialization + inference speed. Moreover, I would recommend running one basic optimization, which I simply forgot to include in this inference example: module.apply(runner.utils.remove_spectral_norm).
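Roughly, something along these lines (an untested sketch; the exact preprocess_data signature and the rn_utils import path may differ from what is in the repo):

```python
# Untested sketch of the two suggestions above; the preprocess_data signature
# and the runners.utils import path are assumptions, not verified repo API.
from infer import InferenceWrapper
from runners import utils as rn_utils  # assumed location of runner.utils

module = InferenceWrapper(args_dict)  # args_dict as in the repo README

# Bake the spectral-norm reparametrization into plain conv weights once,
# instead of recomputing it on every forward pass.
module.apply(rn_utils.remove_spectral_norm)

# Crop once, outside any timed code, then skip cropping in the timed call,
# so that only initialization + inference speed is measured.
data_dict = module.preprocess_data(input_data_dict)    # assumed signature
output_data_dict = module(data_dict, crop_data=False)
```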

Lastly, if you want to measure the speed of the inference generator only, then you need to perform a forward pass of only this network, as mentioned in our article. We additionally speed it up by calling the runner.utils.prepare_for_mobile_inference function after the network has been initialized with its adaptive parameters.
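As a rough sketch (the attribute path to the generator and the input shape below are illustrative placeholders, not the exact API):

```python
# Rough sketch: time only the inference generator after initialization has
# filled in its adaptive parameters. The `module.runner.nets` path and the
# pose input tensor are placeholders for illustration.
import time

import torch
from runners import utils as rn_utils  # assumed location of runner.utils

G_inf = module.runner.nets['inference_generator']   # hypothetical accessor
G_inf.apply(rn_utils.prepare_for_mobile_inference)  # fold adaptive params in

pose_input = torch.randn(1, 3, 256, 256).cuda()     # placeholder input shape

with torch.no_grad():
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        G_inf(pose_input)
    torch.cuda.synchronize()
print(f'{(time.time() - start) * 10:.1f} ms per frame')  # *1000 ms / 100 runs
```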

Hope this helps!

@egorzakharov
Collaborator

This closely follows the pipeline that we developed for our mobile application: the computationally heavy initialization part runs separately in the PyTorch Mobile framework, and then we optimize the personalized inference generator by converting it to ONNX and then to SNPE for real-time frame-by-frame inference.
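For the first step, a generic export sketch (the input shape and tensor names are placeholders, and the SNPE conversion is not shown here):

```python
# Generic torch.onnx.export sketch for the personalized generator; the input
# shape and tensor names are placeholders, not our exact mobile pipeline.
import torch

G_inf.eval()
dummy_pose = torch.randn(1, 3, 256, 256).cuda()  # placeholder input shape

torch.onnx.export(
    G_inf,                 # the already-initialized, personalized generator
    dummy_pose,
    'inference_generator.onnx',
    input_names=['pose'],
    output_names=['rgb'],
    opset_version=11,
)
```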

By the way, I have pushed the remove_spectral_norm hack into master.


ak9250 commented Oct 24, 2020

@egorzakharov could you also share the onnx weights?

Author

nagadit commented Oct 24, 2020

@egorzakharov

Thank you very much for such an informative answer. I will try to do something with it.

@egorzakharov
Collaborator

@ak9250 I will ask my colleagues for approval, but I believe the conversion to ONNX was very simple; the ONNX -> SNPE step was much trickier.

@noyami2033

@egorzakharov
Thank you very much! I tested the remove_spectral_norm function, and it sped up the process from 380 ms to 260 ms per frame.
But when using runner.apply(rn_utils.prepare_for_mobile_inference), I ran into the problems below:

In prepare_for_mobile_inference, at gamma = module.weight.data.squeeze().detach().clone():
AttributeError: 'NoneType' object has no attribute 'data'

In prepare_for_mobile_inference, at mod.weight.data = module.weight.data[0].detach().clone():
torch.nn.modules.module.ModuleAttributeError: 'AdaptiveConv2d' object has no attribute 'weight'

Could you please also push prepare_for_mobile_inference to master, or give some suggestions?
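For now, a guess at a workaround I am considering: skip modules without a materialized weight before calling the function (this may well change what gets folded in, so treat it as a diagnostic only):

```python
# Hedged workaround guess: both tracebacks point at modules whose `weight`
# attribute is missing (AdaptiveConv2d) or None, so skip those during apply().
def safe_prepare_for_mobile_inference(module):
    if getattr(module, 'weight', None) is None:
        return  # leave weight-less modules untouched
    rn_utils.prepare_for_mobile_inference(module)

runner.apply(safe_prepare_for_mobile_inference)
```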

@noyami2033

I tested G_inf on an NVIDIA 1060 and got about 15 ms per frame. Thanks for the advice.
