Hey all, I am currently trying to fully unfold a small fully-connected network, very similar to the tfc-w2a2 example but with a reduced number of neurons. Building does not produce any errors, but the resulting stitched IP seems to be missing parts: according to the resource estimation the design should require 24721 LUTs, yet the OOC synthesis reports only 242 LUTs. The network has 16 neurons in each layer and I set the folding config as shown below.

EDIT: And this is the onnx for the network.
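(The original folding config attachment is not preserved here. The following is a minimal sketch of what a fully-unfolded config for 16-neuron layers might look like; the node names, the number of layers, and the first layer's SIMD are assumptions and depend on the actual graph.)

```python
# Sketch only -- NOT the original attachment. Node names assume the older
# StreamingFCLayer_Batch naming used around the tfc-w2a2 example era.
import json

folding_config = {
    "Defaults": {},
    # One entry per FC layer. Full unfolding means PE equals the number of
    # output neurons and SIMD equals the input fan-in (16 assumed here).
    "StreamingFCLayer_Batch_0": {"PE": 16, "SIMD": 16},
    "StreamingFCLayer_Batch_1": {"PE": 16, "SIMD": 16},
    "StreamingFCLayer_Batch_2": {"PE": 16, "SIMD": 16},
}

# Write the config in the JSON form that FINN's build flow consumes.
with open("folding_config.json", "w") as f:
    json.dump(folding_config, f, indent=2)
```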
Hi @Sekijoju -- can you clarify what you mean by "the resulting stitched IP has all its layers fully folded instead of unfolded"? Just to be sure: in FINN, the amount of parallelism inside each layer will not be visible at the IP block level. So even by going to max PE and SIMD, you will still see one IP block per layer, but internally each block will have a great deal of parallelism.
One extra recommendation I can give for fully unfolded FC layers is to use "ram_style" : "distributed" and "mem_mode" : "const" for all layers.
You may also get somewhat higher latency than expected, because FINN's HLS library of layers is optimized for some degree of folding rather than full unfolding. We've s…