FFT in SDSoC (C callable IP)
Let's migrate fft_single example in Vivado HLS to SDSoC...
- SDSoC version used: 2016.3
- Target Platform: Digilent Zybo (XC7Z010-1CLG400C) with Linux as OS
First off, create an SDSoC project named fft_single:
Choose Zybo as platform:
... and Linux SMP as Software platform:
... we do not use Templates (select Empty Application):
Locate fft_single example files in: < SDSoC installation path >/2016.3/Vivado_HLS/examples/design/FFT/fft_single
Then copy the source files (*.cpp & *.h) into "src" folder of the project.
You may also want to copy data files into the project:
In SDx Project Settings, select fft_top() as the HW function, leaving the clock frequency at its default:
For quick iteration, uncheck Generate bitstream and Generate SD card image, and check Estimate Performance.
At this point, the project should look like this:
See whether the project compiles without any modification: Project -> Build
The compilation log (sds_fft_top.log) should look like this with some WARNINGs and ERRORs:
We now have to resolve these WARNINGs & ERRORs...
According to page 58 of UG1027 (SDSoC Environment User Guide, 2016.3), #pragma HLS interface for a top-level function argument is ignored, so we have to comment those pragmas out in fft_top.cpp:
    //#pragma HLS interface ap_hs port=direction
    //#pragma HLS interface ap_fifo depth=1 port=ovflo
    //#pragma HLS interface ap_fifo depth=FFT_LENGTH port=in,out
We also comment out the following pragmas, which no longer apply once the arguments are changed as described next:
    //#pragma HLS data_pack variable=in
    //#pragma HLS data_pack variable=out
Top-level function arguments are also restricted in data width: each argument must be 8, 16, 32, or 64 bits wide. We therefore need to change the data types of complex<data_in_t> in[FFT_LENGTH] and complex<data_out_t> out[FFT_LENGTH]. Here, for the sake of simplicity, we declare the arguments (in and out) as 32-bit floating point (float) and convert the data type inside the HW function.
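To get a feel for what the float-to-fixed conversion inside the HW function does to the data, here is a minimal host-side sketch. It assumes a signed fixed-point format with 16 total bits and 1 integer bit, and truncation toward minus infinity (the default quantization mode of ap_fixed). The helper name quantize_trn and the bit widths are illustrative assumptions, not taken from the fft_single sources, and overflow handling is not modeled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of fft_single): emulate storing a float in a
// signed fixed-point format with `total_bits` bits, one of which is the
// integer (sign) bit. Values are truncated toward minus infinity.
// Overflow (wrap or saturation) is NOT modeled here.
inline double quantize_trn(float x, int total_bits = 16) {
    assert(total_bits > 1 && total_bits <= 32);
    // 2^(number of fractional bits)
    const double scale = std::ldexp(1.0, total_bits - 1);
    return std::floor(static_cast<double>(x) * scale) / scale;
}
```

Values that are already multiples of 2^-15 pass through unchanged, which is why an exact comparison against golden data converted with to_float() can still succeed after the interface is switched to float.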
- Let's redefine fft_top(). Note that the real and imaginary parts each need their own argument. The .h file:
    // Use generic C type for HW function arguments
    void fft_top(
        bool direction,
        // cmpxDataIn in[FFT_LENGTH],
        // cmpxDataOut out[FFT_LENGTH],
        float in_re[FFT_LENGTH],
        float in_im[FFT_LENGTH],
        float out_re[FFT_LENGTH],
        float out_im[FFT_LENGTH],
        bool* ovflo);
... and .cpp file:
    void fft_top( ... )
    {
        ...
        // dummy_proc_fe(direction, &fft_config, in, xn);
        dummy_proc_fe(direction, &fft_config, in_re, in_im, xn);
        ...
        // dummy_proc_be(&fft_status, ovflo, xk, out);
        dummy_proc_be(&fft_status, ovflo, xk, out_re, out_im);
    }
- We rewrite HW sub functions accordingly:
    void dummy_proc_fe(
        bool direction,
        config_t* config,
        // cmpxDataIn in[FFT_LENGTH],
        float in_re[FFT_LENGTH],
        float in_im[FFT_LENGTH],
        cmpxDataIn out[FFT_LENGTH])
    {
        int i;
        config->setDir(direction);
        config->setSch(0x2AB);
        for (i=0; i< FFT_LENGTH; i++){
            // out[i] = in[i];
            out[i].real(in_re[i]);
            out[i].imag(in_im[i]);
        }
    }

    void dummy_proc_be(
        status_t* status_in,
        bool* ovflo,
        cmpxDataOut in[FFT_LENGTH],
        float out_re[FFT_LENGTH],
        float out_im[FFT_LENGTH]
        /*cmpxDataOut out[FFT_LENGTH]*/)
    {
        int i;
        for (i=0; i< FFT_LENGTH; i++){
            // out[i] = in[i];
            out_re[i] = in[i].real();
            out_im[i] = in[i].imag();
        }
        *ovflo = status_in->getOvflo() & 0x1;
    }
and main():
    // static cmpxDataIn xn_input[SAMPLES];
    // static cmpxDataOut xk_output[SAMPLES];
    float in_re[SAMPLES] = {0};
    float in_im[SAMPLES] = {0};
    float out_re[SAMPLES] = {0};
    float out_im[SAMPLES] = {0};
    ...
    // xn_input[line_no-5] = cmpxDataIn(input_data_re, input_data_im);
    in_re[line_no - 5] = input_data_re;
    in_im[line_no - 5] = input_data_im;
    ...
    // fft_top(FWD_INV, xn_input, xk_output, &ovflo);
    fft_top(FWD_INV, in_re, in_im, out_re, out_im, &ovflo);
    ...
    //if (golden != xk_output[i].real())
    if (golden.to_float() != out_re[i])
    {
        ...
        cout << "Frame:" << frame << " index: " << i
             << " Golden: " << golden.to_float()
             << " vs. RE Output: " << setprecision(14)
             << out_re[i] /*xk_output[i].real().to_float()*/ << endl;
    }
    ...
    //if (golden != xk_output[i].imag())
    if (golden.to_float() != out_im[i])
    {
        error_num++;
        cout << "Frame:" << frame << " index: " << i
             << " Golden: " << golden.to_float()
             << " vs. IM Output: " << setprecision(14)
             << out_im[i] /*xk_output[i].imag().to_float()*/ << endl;
    }
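The interface change above boils down to splitting an array of complex samples into separate real and imaginary arrays on the way in, and merging them again on the way out. The following host-side sketch shows that pattern in isolation using std::complex<float>. The helper names split_complex and merge_complex are hypothetical and do not appear in the fft_single sources, which use fixed-point complex types instead.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper: copy the real and imaginary parts of n complex
// samples into two separate float arrays (what dummy_proc_be does on output).
void split_complex(const std::complex<float>* in,
                   float* re, float* im, std::size_t n) {
    assert(in && re && im);
    for (std::size_t i = 0; i < n; ++i) {
        re[i] = in[i].real();
        im[i] = in[i].imag();
    }
}

// Hypothetical helper: rebuild n complex samples from separate real and
// imaginary arrays (what dummy_proc_fe does on input).
void merge_complex(const float* re, const float* im,
                   std::complex<float>* out, std::size_t n) {
    assert(re && im && out);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::complex<float>(re[i], im[i]);
    }
}
```

Keeping the two directions as separate small loops mirrors the structure of dummy_proc_fe() and dummy_proc_be(), so each loop can later be pipelined on its own.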
After building the project again, we will encounter new errors (linker errors) as below:
This is because
So, we need different definitions of fft_top() for the SW part and the HW part. To do that, we can use the __SDSVHLS__ macro, which is defined only when SDSoC invokes Vivado HLS on the HW function:
    #ifdef __SDSVHLS__
    void fft_top(...)
    {
        //#pragma HLS interface ap_hs port=direction
        ...
    }
    #else
    #include <stdio.h>
    void fft_top(
        bool direction,
        float in_re[FFT_LENGTH],
        float in_im[FFT_LENGTH],
        float out_re[FFT_LENGTH],
        float out_im[FFT_LENGTH],
        bool* ovflo)
    {
        printf("SDSoC Stub Function %s() ...\n", __FUNCTION__);
    }
    #endif
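The stub above only prints a message. If a software reference is wanted for checking results on a host machine, the software branch could instead hold something like the naive DFT below. This is a sketch under assumptions, not the method used in this walkthrough: the function name dft_reference is hypothetical, the mapping of the bool argument to forward/inverse is assumed, and the scaling schedule applied by the FFT IP (set via setSch()) is not modeled, so magnitudes will differ from the hardware output by a scale factor.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical software reference: naive O(N^2) DFT on split real/imaginary
// arrays. forward == true computes sum x[t] * exp(-j*2*pi*k*t/n);
// forward == false uses the opposite sign. No scaling is applied.
void dft_reference(bool forward,
                   const float* in_re, const float* in_im,
                   float* out_re, float* out_im, std::size_t n) {
    assert(in_re && in_im && out_re && out_im && n > 0);
    const double pi = std::acos(-1.0);
    const double sign = forward ? -1.0 : 1.0;
    for (std::size_t k = 0; k < n; ++k) {
        double acc_re = 0.0;
        double acc_im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi
                               * static_cast<double>(k) * static_cast<double>(t)
                               / static_cast<double>(n);
            const double c = std::cos(angle);
            const double s = std::sin(angle);
            // (in_re + j*in_im) * (c + j*s)
            acc_re += in_re[t] * c - in_im[t] * s;
            acc_im += in_re[t] * s + in_im[t] * c;
        }
        out_re[k] = static_cast<float>(acc_re);
        out_im[k] = static_cast<float>(acc_im);
    }
}
```

A unit impulse at index 0 is a convenient smoke test: every output bin should come out as 1 + 0j.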
To reduce data transfer time, we use sds_alloc() and sds_free() to allocate and release the input/output buffers:
First, include the header:
    #include <sstream>
    #include "sds_lib.h" // for sds_***()
    using namespace std;
... then allocate the buffers using sds_alloc():
    // float in_re[SAMPLES] = {0};
    // float in_im[SAMPLES] = {0};
    // float out_re[SAMPLES] = {0};
    // float out_im[SAMPLES] = {0};
    float* in_re = (float*) sds_alloc(SAMPLES*sizeof(float));
    float* in_im = (float*) sds_alloc(SAMPLES*sizeof(float));
    float* out_re = (float*) sds_alloc(SAMPLES*sizeof(float));
    float* out_im = (float*) sds_alloc(SAMPLES*sizeof(float));
... and remember to release the buffers using sds_free():
    sds_free(in_re);
    sds_free(in_im);
    sds_free(out_re);
    sds_free(out_im);
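One way to avoid forgetting the sds_free() calls is to wrap each buffer in a small RAII class, sketched below. The class name SdsFloatBuffer is hypothetical. The sketch also assumes that __SDSCC__ is defined when building with the SDSoC compilers, and falls back to malloc()/free() otherwise so that it can be tried on a host without SDSoC installed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifdef __SDSCC__
#include "sds_lib.h"   // sds_alloc() / sds_free()
#else
// Host-only fallback (assumption): lets the sketch build without SDSoC.
static void* sds_alloc(std::size_t size) { return std::malloc(size); }
static void sds_free(void* p) { std::free(p); }
#endif

// Hypothetical RAII wrapper around an sds_alloc()'ed float buffer.
class SdsFloatBuffer {
public:
    explicit SdsFloatBuffer(std::size_t count)
        : data_(static_cast<float*>(sds_alloc(count * sizeof(float)))),
          count_(count) {}
    ~SdsFloatBuffer() { if (data_) sds_free(data_); }

    bool ok() const { return data_ != 0; }   // false if allocation failed
    float* get() { return data_; }
    std::size_t size() const { return count_; }

private:
    // Non-copyable: declared but intentionally not defined.
    SdsFloatBuffer(const SdsFloatBuffer&);
    SdsFloatBuffer& operator=(const SdsFloatBuffer&);

    float* data_;
    std::size_t count_;
};
```

With this, main() could declare SdsFloatBuffer in_re(SAMPLES); and pass in_re.get() to fft_top(). Unlike the original arrays initialized with = {0}, the allocated memory should not be assumed to be zeroed, so clear it explicitly if the code relies on that.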
Optionally (*1), in fft_top.h, we can add SDSoC #pragmas to make SDSoC select simple DMA (AXI_DMA_SIMPLE) for faster data transfer:
    #pragma SDS data mem_attribute(in_re:PHYSICAL_CONTIGUOUS)
    #pragma SDS data mem_attribute(in_im:PHYSICAL_CONTIGUOUS)
    #pragma SDS data mem_attribute(out_re:PHYSICAL_CONTIGUOUS)
    #pragma SDS data mem_attribute(out_im:PHYSICAL_CONTIGUOUS)
    void fft_top( ...
*1 In this case, SDSoC selects simple DMA automatically, so these #pragmas are not actually required.
We also want to apply inlining to HW sub functions:
    void dummy_proc_fe( ... )
    {
    #pragma HLS INLINE
        ...
    }

    void dummy_proc_be( ... )
    {
    #pragma HLS INLINE
        ...
    }
... and loop pipelining as usual:
    void dummy_proc_fe( ... )
    {
        ...
        for (i=0; i< FFT_LENGTH; i++){
    #pragma HLS PIPELINE
            ...
        }
    }

    void dummy_proc_be( ... )
    {
        ...
        for (i=0; i< FFT_LENGTH; i++){
    #pragma HLS PIPELINE
            ...
        }
    }
Let's estimate performance again...
We got much better performance. Seems O.K...
- Make sure Generate bitstream and Generate SD card image are checked in the Options section of SDx Project Settings.
- Build the project (Project -> Build). The build should finish successfully.
- Copy the contents of sd_card folder & data folder into an SD card.
- Insert the SD card into your Zybo and power on to boot Linux.
- cd to /mnt (where the program is located) and run the program as follows:
    cd /mnt && /mnt/fft_single.elf
- TEST passed! We are now able to accelerate FFT on FPGA.
The SD Card files will be available in the repo: fft_single/sd_card