
About the width and something else #848

Answered by auphelia
wanglu4042 asked this question in Q&A

Hi @wanglu4042,

If you have a 2-bit quantized network, you will not need transformations that are specialized for 1-bit compute.

Regarding your observation during the throughput test:

  • FINN implements the network in a dataflow style: each layer is implemented individually and data is pushed through the layers like a pipeline. The throughput is therefore determined by the slowest layer and is not necessarily influenced by the number of layers. By increasing the parallelism per layer you may be able to achieve higher throughput. Please have a look at the tutorial about folding factors; a rough sketch of the idea follows below.
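To illustrate the folding idea, here is a minimal sketch (not the tutorial code itself) of raising the per-layer parallelism by editing the PE/SIMD node attributes of the matrix-vector layers in a FINN dataflow model. It assumes a QONNX/FINN install where those layers appear as MatrixVectorActivation (older releases name them StreamingFCLayer_Batch); the file names and the PE/SIMD targets are placeholders you would tune for your own network and device.

```python
# Sketch: increase folding (parallelism) on matrix-vector layers of a FINN model.
# Assumptions: qonnx/finn are installed, "dataflow_model.onnx" is a model that has
# already been converted to hardware layers, and the PE/SIMD targets are examples.
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.custom_op.registry import getCustomOp

model = ModelWrapper("dataflow_model.onnx")

for node in model.graph.node:
    # Layer op names differ between FINN versions.
    if node.op_type in ("MatrixVectorActivation", "StreamingFCLayer_Batch"):
        inst = getCustomOp(node)
        mw = inst.get_nodeattr("MW")  # input (weight matrix width) dimension
        mh = inst.get_nodeattr("MH")  # output (weight matrix height) dimension
        # Higher SIMD/PE -> fewer cycles per layer -> higher throughput,
        # at the cost of more FPGA resources. SIMD must divide MW and PE must
        # divide MH, so fall back to 1 if the example targets do not fit.
        inst.set_nodeattr("SIMD", 8 if mw % 8 == 0 else 1)
        inst.set_nodeattr("PE", 4 if mh % 4 == 0 else 1)

model.save("dataflow_model_folded.onnx")
```

Since the pipeline runs at the speed of its slowest layer, the usual goal when picking these values is to balance the estimated cycles per frame across all layers rather than to maximize any single one.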
