Get performance estimates without deployment #428
Unanswered
hossein1387 asked this question in Q&A
Replies: 1 comment 13 replies
-
I do not know if it is helpful, but you can try https://github.com/Xilinx/finn/blob/main/src/finn/transformation/fpgadataflow/set_folding.py; its docstring describes what it does.
Example code:

```python
from finn.transformation.fpgadataflow.allocate_resources import AllocateResources
from finn.core.modelwrapper import ModelWrapper

model = ModelWrapper(f"{MODEL_PATH}-dataflow_model.onnx")
model = model.transform(AllocateResources(clk_ns=10, fps_target=20, platform="Pynq-Z1"))
model.save(f"{MODEL_PATH}-dataflow_model-folded.onnx")
```

Then you can inspect the resulting model:

```python
model = ModelWrapper(f"{MODEL_PATH}-dataflow_model-folded.onnx")
fc_layers = model.get_nodes_by_op_type("StreamingFCLayer_Batch")
for fcl in fc_layers:
    for attr in fcl.attribute:
        if attr.name == "cycles_estimate":
            print(f"{fcl.name} {attr.name} {attr.i}")
```

The output lists the estimated cycle count for each layer.
I am not sure if this is the best possible solution. Let me know if something is unclear.
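Once each layer carries a `cycles_estimate`, a rough upper bound on throughput follows from the slowest layer: with a clock period of `clk_ns` nanoseconds, a streaming dataflow pipeline can at best finish one frame every `max_cycles` clock cycles. A minimal sketch of that arithmetic (plain Python; the layer names and cycle counts below are made up, not taken from a real run):

```python
def estimate_fps(cycles_per_layer, clk_ns):
    """Estimate steady-state throughput (frames/s) of a dataflow
    pipeline from per-layer cycle estimates: the slowest layer
    (the bottleneck) sets the initiation interval."""
    bottleneck = max(cycles_per_layer.values())
    return 1e9 / (clk_ns * bottleneck)

# Hypothetical per-layer estimates, in the shape printed by the
# inspection loop above (values are illustrative only)
cycles = {
    "StreamingFCLayer_Batch_0": 6272,
    "StreamingFCLayer_Batch_1": 4096,
    "StreamingFCLayer_Batch_2": 1024,
}
print(estimate_fps(cycles, clk_ns=10))  # bottleneck is the 6272-cycle layer
```

Keep in mind this ignores any stalls between layers, so treat it as an optimistic estimate rather than a guaranteed number.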
-
Hi,
I am trying to get a performance estimate for running quantized models with FINN: given an ONNX classification model, what throughput would it achieve with FINN? I came across this code in the FINN repositories:

- https://github.com/Xilinx/finn-base/blob/52ce94b9f39179189dd44179c460d0b145cafcf1/src/finn/core/throughput_test.py#L115
- finn/tests/fpgadataflow/test_fpgadataflow_globalaccpool.py (line 127 in d1cc9cf)
- https://github.com/daiki98/finn/blob/b8f1635f557891f60ede71a54ee1f4385f5e1c6a/tests/fpgadataflow/test_fpgadataflow_channelwise_ops.py#L159

However, I am not able to run it. Is there any sample code that I can follow?
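For a measured (rather than estimated) number, the linked throughput_test.py essentially times repeated batch executions and divides the number of processed images by the elapsed wall-clock time. A self-contained sketch of that idea (`measure_throughput` and the stand-in workload are illustrative, not FINN's API; on real hardware the callable would invoke the accelerator's execution entry point):

```python
import time

def measure_throughput(run_batch, batch_size, repeats=3):
    """Measure throughput in images/s: time `repeats` executions of
    run_batch (a callable that processes one batch of batch_size
    inputs) and divide total images by elapsed wall-clock time."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_batch()
    elapsed = time.perf_counter() - start
    return (batch_size * repeats) / elapsed

# Stand-in workload: sleeping 10 ms plays the role of one batch
# execution on the accelerator.
fps = measure_throughput(lambda: time.sleep(0.01), batch_size=100)
print(f"{fps:.1f} images/s")
```

This only gives a wall-clock number for whatever callable you pass in; the FINN test code wraps the actual board execution in the same pattern.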