FINN v0.10 released! #1026
FINN release v0.10
FINN v0.10 is finally here!
Over the last year, we have invested a lot of time refactoring FINN and implementing exciting new features.
As already indicated in the last release, we have continued to work on operator hardening (RTL variants of important HLS layers) and, in the process, made the custom layer integration more flexible. This is the most disruptive change in this release, and you can find more details in the section on Refactoring of Custom Operator Infrastructure.
In addition, we have updated the contribution guidelines. The Docker setup was updated to Ubuntu 22.04 and Python 3.10, and we recommend using FINN with Vivado/Vitis 2022.2. You can read more about this in this blog post.
But now let's talk about the improvements and highlights in more detail:
Refactoring of Custom Operator Infrastructure
The FINN compiler was developed with the assumption that the hardware blocks corresponding to the neural network layers are developed based on HLS. While we do not intend to abandon the HLS implementations, it has become apparent over the years that certain modules are better implemented in RTL. This gives us greater control over the resulting hardware and lets us make optimal use of FPGA resources.
As more and more RTL variants of common FINN hardware building blocks were added, we decided to refactor the custom operator class structure and modify the builder steps.
New Class Hierarchy
Previously, fpgadataflow nodes were derived from the HLSCustomOp class, which in turn was derived from the CustomOp class coming from the qonnx toolkit. We have split the HLSCustomOp class into three classes:
- HWCustomOp: the hardware abstraction base class from which all fpgadataflow nodes derive
- HLSBackend: a mixin providing the HLS-specific functionality
- RTLBackend: a mixin providing the RTL-specific functionality
Every fpgadataflow node now has up to three representations. Let’s have a look at an example:
The FMPadding node is used to implement padding in a convolution. With the new structure, there are three Python classes related to FMPadding:
- FMPadding: the hardware abstraction layer node
- FMPadding_hls: the HLS variant
- FMPadding_rtl: the RTL variant
Here is a class diagram of the new fpgadataflow custom op class hierarchy:
Please note that not all layers have both an HLS and an RTL variant; some have only one.
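To illustrate, here is a small sketch of how the three FMPadding representations relate to each other in code. The module paths and class names below reflect our understanding of the new `finn.custom_op.fpgadataflow` layout and may need to be adapted to your FINN version.

```python
# Illustrative only: module paths/class names assume the FINN v0.10 layout
# of finn.custom_op.fpgadataflow and its hls/rtl subpackages.
from finn.custom_op.fpgadataflow.fmpadding import FMPadding              # HW abstraction node
from finn.custom_op.fpgadataflow.hls.fmpadding_hls import FMPadding_hls  # HLS specialization
from finn.custom_op.fpgadataflow.rtl.fmpadding_rtl import FMPadding_rtl  # RTL specialization

# The specialized variants inherit from the HW abstraction class plus a
# backend mixin, which is visible in their method resolution order.
print([c.__name__ for c in FMPadding_hls.__mro__])
print([c.__name__ for c in FMPadding_rtl.__mro__])
```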
Updated FINN Flow
Since the new class hierarchy introduces an additional layer for expressing the model (HW abstraction nodes), the previous `step_convert_to_hls` builder step was replaced by `step_convert_to_hw`, which converts standard ONNX layers to HW abstraction layers. We then introduced an additional builder step called `step_specialize_layers`. In this step, HW nodes are specialized to either an HLS or an RTL variant, either based on pre-determined rules or according to a user-provided configuration file containing the desired settings. If the user preference cannot be fulfilled, a warning is printed and the implementation style falls back to the default.
You can learn more about how to use this step in the 4_advanced_builder_settings notebook. Thanks to @jmonks-amd, we have a guide on how to convert your current FINN flow to the new builder flow: #1020
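As a quick orientation, here is a minimal, hedged sketch of a builder configuration that relies on the default v0.10 build steps (which include `step_convert_to_hw` and `step_specialize_layers`). The field names follow the FINN builder API as we understand it, and the file names are placeholders.

```python
# Minimal sketch, not a verified end-to-end script: file names are placeholders
# and the specialize_layers_config_file field should be checked against your FINN version.
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg

cfg = build_cfg.DataflowBuildConfig(
    output_dir="build_output",
    synth_clk_period_ns=5.0,
    fpga_part="xc7z020clg400-1",
    # optional JSON file with per-node HLS/RTL preferences; if a preference
    # cannot be fulfilled, FINN warns and falls back to the default style
    specialize_layers_config_file="specialize_layers_config.json",
    generate_outputs=[build_cfg.DataflowOutputType.ESTIMATE_REPORTS],
)
build.build_dataflow_cfg("model.onnx", cfg)
```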
New RTL Components
We are excited to announce that after providing RTL variants for the ConvolutionInputGenerator and the FMPadding component (thanks to @fpjentzsch and @maltanar), we now also offer optimized RTL implementations for the key layers: Thresholding and MatrixVectorActivation/VectorVectorActivation.
If you would like to find out more, please have a look at the dedicated Show & Tell posts about these components.
Thanks to @preusser, @azizb-xlnx, @mmrahorovic and @fionnodonohoe-xlnx for your great contributions on these features.
Accumulator Width and Weight Bit Width Minimization
The FINN building blocks have long been capable of automatically reducing the accumulator bit width for individual layers. With this release, we have improved FINN’s automated accumulator bit width reduction methods. We have also added a new method to automatically reduce the weight bit width of a layer based on known weight values (assuming the weights are not runtime-writeable). We have packaged both transformations into a new dataflow step:
`step_minimize_bit_width`, which can be easily inserted into any FINN flow! Thanks a lot to @i-colbert for your contributions on this! This work was done in the context of our research on accumulator-aware quantization (A2Q), a new quantization-aware training technique that enables users to train models for a target accumulator bit width during inference (Colbert et al., 2023; "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance").
This technique is fully integrated into the Brevitas framework; for an example of how to train a model for a specified accumulator bit width, please see this example in Brevitas. We are also working on an end-to-end example of how to use A2Q with FINN, so please stay tuned!
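For reference, here is a rough sketch of what `step_minimize_bit_width` does under the hood. The transformation module paths are our assumption of the current FINN layout; in a normal flow you would simply include the builder step rather than applying the transformations manually.

```python
# Sketch under assumptions: MinimizeWeightBitWidth / MinimizeAccumulatorWidth
# are assumed to live at these paths; verify against your FINN version.
from qonnx.core.modelwrapper import ModelWrapper
from finn.transformation.fpgadataflow.minimize_accumulator_width import (
    MinimizeAccumulatorWidth,
)
from finn.transformation.fpgadataflow.minimize_weight_bit_width import (
    MinimizeWeightBitWidth,
)

model = ModelWrapper("model_with_hw_layers.onnx")    # model after HW conversion
model = model.transform(MinimizeWeightBitWidth())    # shrink weight bit widths first
model = model.transform(MinimizeAccumulatorWidth())  # then shrink accumulator widths
model.save("model_minimized.onnx")
```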
Other Improvements and New Features
In addition to these highlights, we have invested in other improvements and new features; please find a list below. We thank all contributors; the GitHub account names in parentheses after each contribution identify external contributors.
- Tutorial about QONNX export and QONNX -> FINN-ONNX conversion (@heborras)
- Tutorial about Folding Factors (@shashwat1198)
- Tutorial about Advanced Builder Settings
This has been a major release, and we would like to thank all contributors for their amazing work!
Have fun trying out the new flow and features!
The FINN Team