SAMO: Automatic SIMD and PE optimization for FINN #693

AlexMontgomerie · 2022-09-27T12:37:31Z

AlexMontgomerie
Sep 27, 2022

SAMO: Streaming-Architecture to FPGA Mapping Optimiser

When designing a DNN Accelerator for an FPGA device, there is a constant trade-off between exploiting parallelism for extra performance, and the resource cost associated with it. For Systolic Array Architectures, this trade-off is a lot more straightforward, where the only tunable parallelism dimensions are the PE array size. However, for Streaming Architectures such as FINN, HLS4ML and fpgaConvNet, the design space is a lot greater. Streaming Architectures tend to map each layer of the DNN model to an individual hardware block, which has its own tunable performance parameters. This large design space has no straightforward approach for solving, thus we have provided a toolbox, samo which utilises existing optimisation solvers to address this problem.

Our tool solves the optimisation problem of getting the best performance out of a design whilst staying within resource limits. It removes the complicated and tedious task of tuning performance for a given platform-network pair. The rapid design space exploration performed by SAMO gets out an optimal hardware configuration for the DNN model, leaving the designer to solely focus on their application.

For FINN in particular, there are generally two parallelism dimensions for each layer: the input channel parallelism $\mathbf{s}^i$ refers to the number of SIMD lanes for all convolution layers; and the output channel parallelism $\mathbf{s}^o$, which represents the number of PEs for the same convolution layers. Given the objective of minimising latency $L$, and the constraint on resource $\mathcal{R}$ we can define an optimisation problem:

$$ \min L(\mathbf{s}^i, \mathbf{s}^o) \; s.t. \; \mathcal{R}( \mathbf{s}^i, \mathbf{s}^o) < \mathcal{R}_{platform} $$

We also introduce further constraints on the parallelism values, such as them being factors of the channel dimension, and so on. Details of which can be found in our paper as well as the code.

Setup

Currently, samo is executed within the FINN docker image. The first step is to clone the samo project:

git clone https://github.com/AlexMontgomerie/samo.git

Then clone a fork of finn which is compatible with the samo tool. You may want to merge in the version of FINN you are currently using.

git clone https://github.com/Yu-Zhewen/finn.git
cd finn
git checkout 4cc0b6fdae2f5c06f0b5bcc6fa45fba4d8b69111

Finally, set SAMO_DIR to the path of the downloaded samo repo in your run-docker.sh, before entering the docker.

export SAMO_DIR = CHOOSE YOUR PATH
bash run-docker.sh

Usage

The original FINN compiler contains many transformation passes that modify the ONNX representation all the way to hardware. SAMO is integrated into this transformation flow by pausing the compiler at the pass when a "dataflow partition" is generated.

The transformation passes before the "dataflow partition" stage are referred to as "pre_optimiser_steps" and they produce the "${network}_pre_optimiser.onnx" for optimisation.

cd ../samo
cp models/${network}.onnx outputs/saved/finn/${network}.onnx
cp ../finn/notebooks/samo/config/${network}.json ../finn/notebooks/samo/config.json
jupyter nbconvert --to notebook --execute ../finn/notebooks/samo/pre_optimiser_steps.ipynb
mv ../finn/notebooks/samo/pre_optimiser_steps.nbconvert.ipynb outputs/saved/finn/${network}_pre_optimiser_steps.nbconvert.ipynb

SAMO then takes over the optimsation of FINN-ONNX, performing the Design Space Exploration, and setting the appropriate SIMD and PE numbers. SAMO exports the optimised FINN-ONNX in "${network}_post_optimiser.onnx"

python -m samo --optimiser annealing --model outputs/saved/finn/${network}_pre_optimiser.onnx  \
    --backend finn --platform platforms/zedboard.json \
    --output-path outputs/saved/finn/${network}_post_optimiser.onnx

Finally, the following command is used to resume the compilation of FINN and generate the hardware.

jupyter nbconvert --to notebook --execute ../finn/notebooks/samo/post_optimiser_steps.ipynb

Compatibility

Using the provided fork of FINN is not mandatory, in case you would like to try SAMO on the latest version of FINN. All you need is to mount the SAMO folder in docker, break the FINN compiltation after the "dataflow partition" pass, and feed the corresponding ONNX files into SAMO.

Citation

@inproceedings{montgomerie-corcoran_samo_2022,
   title = {SAMO: Optimised Mapping of Convolutional Neural Networks to Streaming Architectures},
   booktitle = {32nd International Conference on Field Programmable Logic and Applications, FPL 2022},
   author = {Montgomerie-Corcoran, Alexander and Yu, Zhewen and Bouganis, Christos-Savvas},
   year = {2022},
}

Please feel free to ask any questions about the tool, or how to use it!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SAMO: Automatic SIMD and PE optimization for FINN #693

{{title}}

Replies: 0 comments

Select a reply

SAMO: Automatic SIMD and PE optimization for FINN #693

AlexMontgomerie Sep 27, 2022

SAMO: Streaming-Architecture to FPGA Mapping Optimiser

Setup

Usage

Compatibility

Citation

Replies: 0 comments

AlexMontgomerie
Sep 27, 2022