FFT in SDSoC (C callable IP)

Let's migrate the fft_single example from Vivado HLS to SDSoC...

SDSoC version used: 2016.3
Target Platform: Digilent Zybo (XC7Z010-1CLG400C) with Linux as OS

1. Import fft_single example into SDSoC

First off, create an SDSoC project named fft_single:

Create SDSoC Project

Choose Zybo as platform:

Select Zybo

... and Linux SMP as Software platform:

Select Linux SMP

... we do not use Templates (select Empty Application):

Select Empty Application

Locate fft_single example files in: < SDSoC installation path >/2016.3/Vivado_HLS/examples/design/FFT/fft_single

Example source files

Then copy the source files (*.cpp & *.h) into "src" folder of the project.

Copy source files

You may also want to copy data files into the project:

Copy data folder

In SDx Project Settings, select fft_top() as the HW function, leaving the clock frequency at its default:

Specify HW function

For quick iteration, uncheck Generate bitstream & Generate SD card image and check Estimate Performance:

Project settings

At this point, the project should look like this:

Project overview

2. Optimize

2.1 1st try

See whether the project compiles without any modification: Project -> Build

The compilation log (sds_fft_top.log) should look like this with some WARNINGs and ERRORs:

Part of SDS log

2.2 Modify code

We now have to resolve these WARNINGs & ERRORs...

2.2.1 WARNINGs

According to page 58 of UG1027 (SDSoC Environment User Guide, 2016.3), #pragma HLS interface for a top-level function argument is ignored, so we have to comment those pragmas out in fft_top.cpp:

//#pragma HLS interface ap_hs port=direction
//#pragma HLS interface ap_fifo depth=1 port=ovflo
//#pragma HLS interface ap_fifo depth=FFT_LENGTH port=in,out

We also comment out the following pragmas, which no longer apply after the code change described below:

//#pragma HLS data_pack variable=in
//#pragma HLS data_pack variable=out

2.2.2 ERRORs

SDSoC also restricts the data width of top-level function arguments to 8, 16, 32, or 64 bits, so we need to change the data types of complex<data_in_t> in[FFT_LENGTH] & complex<data_out_t> out[FFT_LENGTH]. Here, for the sake of simplicity, we declare the arguments (in & out) as 32-bit floating point (float) and convert the data type inside the HW function.

Let's redefine fft_top(). Note that the real and imaginary parts each need their own argument. The .h file:

// Use generic C type for HW function arguments
void fft_top(
    bool direction,
    // cmpxDataIn in[FFT_LENGTH],
    // cmpxDataOut out[FFT_LENGTH],
    float in_re[FFT_LENGTH], float in_im[FFT_LENGTH],
    float out_re[FFT_LENGTH], float out_im[FFT_LENGTH],
    bool* ovflo);

... and .cpp file:

void fft_top( ... )
{
...
    // dummy_proc_fe(direction, &fft_config, in, xn);
    dummy_proc_fe(direction, &fft_config, in_re, in_im, xn);
    ...
    // dummy_proc_be(&fft_status, ovflo, xk, out);
    dummy_proc_be(&fft_status, ovflo, xk, out_re, out_im);
}

We rewrite HW sub functions accordingly:

void dummy_proc_fe(
    bool direction,
    config_t* config,
    // cmpxDataIn in[FFT_LENGTH],
    float in_re[FFT_LENGTH],
    float in_im[FFT_LENGTH],
    cmpxDataIn out[FFT_LENGTH])
{
    int i;
    config->setDir(direction);
    config->setSch(0x2AB);
    for (i = 0; i < FFT_LENGTH; i++) {
        // out[i] = in[i];
        out[i].real(in_re[i]);
        out[i].imag(in_im[i]);
    }
}

void dummy_proc_be(
    status_t* status_in,
    bool* ovflo,
    cmpxDataOut in[FFT_LENGTH],
    float out_re[FFT_LENGTH],
    float out_im[FFT_LENGTH]
    /*cmpxDataOut out[FFT_LENGTH]*/)
{
    int i;
    for (i = 0; i < FFT_LENGTH; i++) {
        // out[i] = in[i];
        out_re[i] = in[i].real();
        out_im[i] = in[i].imag();
    }
    *ovflo = status_in->getOvflo() & 0x1;
}
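The split-and-rejoin pattern used by dummy_proc_fe() and dummy_proc_be() can be tried on a host machine without any Xilinx headers. The following is only a minimal sketch under stated assumptions: std::complex<float> stands in for cmpxDataIn/cmpxDataOut, kLength is a hypothetical stand-in for FFT_LENGTH, and the names pack_complex() and unpack_complex() are made up for illustration and are not part of the example sources.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical stand-in for FFT_LENGTH (the real value lives in the example's header).
constexpr std::size_t kLength = 8;

// Input side: join separate real/imaginary float arrays into one complex array,
// mirroring what dummy_proc_fe() does before the FFT core.
void pack_complex(const float re[kLength], const float im[kLength],
                  std::complex<float> out[kLength]) {
    assert(re != nullptr && im != nullptr && out != nullptr);
    for (std::size_t i = 0; i < kLength; ++i) {
        out[i] = std::complex<float>(re[i], im[i]);
    }
}

// Output side: split a complex array back into real/imaginary float arrays,
// mirroring what dummy_proc_be() does after the FFT core.
void unpack_complex(const std::complex<float> in[kLength],
                    float re[kLength], float im[kLength]) {
    assert(in != nullptr && re != nullptr && im != nullptr);
    for (std::size_t i = 0; i < kLength; ++i) {
        re[i] = in[i].real();
        im[i] = in[i].imag();
    }
}
```

Packing and then unpacking returns the original values unchanged, which is a quick way to check the argument plumbing before involving the hardware flow.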

and main():

// static cmpxDataIn xn_input[SAMPLES];
// static cmpxDataOut xk_output[SAMPLES];
float in_re[SAMPLES] = {0};
float in_im[SAMPLES] = {0};
float out_re[SAMPLES] = {0};
float out_im[SAMPLES] = {0};
...
    // xn_input[line_no-5] = cmpxDataIn(input_data_re, input_data_im);
    in_re[line_no - 5] = input_data_re;
    in_im[line_no - 5] = input_data_im;
...
// fft_top(FWD_INV, xn_input, xk_output, &ovflo);
fft_top(FWD_INV, in_re, in_im, out_re, out_im, &ovflo);
...
        //if (golden != xk_output[i].real())
        if (golden.to_float() != out_re[i])
        {
...
            cout << "Frame:" << frame << " index: " << i
                 << "  Golden: " << golden.to_float()
                 << " vs. RE Output: " << setprecision(14)
                 << out_re[i] /*xk_output[i].real().to_float()*/ << endl;
        }
...
        //if (golden != xk_output[i].imag())
        if (golden.to_float() != out_im[i])
        {
            error_num++;
            cout << "Frame:" << frame << " index: " << i
                 << "  Golden: " << golden.to_float()
                 << " vs. IM Output: " << setprecision(14)
                 << out_im[i] /*xk_output[i].imag().to_float()*/ << endl;
        }
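The checks above compare the golden value and the float output with !=, i.e. exactly. Should exact comparison ever prove too strict (for instance after changing the conversion inside the HW function), a tolerance-based check is one possible alternative. This is an optional sketch, not part of the original test bench; the helper name nearly_equal() and any tolerance value are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

// Returns true when a and b differ by at most tol (absolute difference).
// Hypothetical helper; the original test bench uses exact comparison.
bool nearly_equal(float a, float b, float tol) {
    assert(tol >= 0.0f);
    return std::fabs(a - b) <= tol;
}
```

In the test bench it would be used as `if (!nearly_equal(golden.to_float(), out_re[i], tol))`, with tol picked to match the precision of the fixed-point output type.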

2.2.3 Try build again

After building the project again, we will encounter new errors (linker errors) as below:

2nd try

This is because

So, we need different definitions of fft_top() for the SW part and the HW part respectively. To do that, we can use the __SDSVHLS__ macro:

#ifdef __SDSVHLS__

void fft_top(...)
{
//#pragma HLS interface ap_hs port=direction
...
}

#else

#include <stdio.h>
void fft_top(
    bool direction,
    float in_re[FFT_LENGTH], float in_im[FFT_LENGTH],
    float out_re[FFT_LENGTH], float out_im[FFT_LENGTH],
    bool* ovflo)
{
    printf("SDSoC Stub Function %s() ...\n", __FUNCTION__);
}

#endif
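The same two-definition pattern can be compiled on a host machine to see which branch is taken. The sketch below makes several assumptions: __SDSVHLS__ is taken to be defined only when SDSoC invokes Vivado HLS, kLength is a hypothetical stand-in for FFT_LENGTH, and fft_top_sketch() is an illustrative name. Unlike the stub above, this variant also zero-fills the outputs and clears the overflow flag so that a caller never reads uninitialised data; that is a design choice of this sketch, not something the original stub does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for FFT_LENGTH.
constexpr std::size_t kLength = 8;

#ifdef __SDSVHLS__

// HW build: the real HLS implementation of the function would be defined here.
void fft_top_sketch(bool direction,
                    const float in_re[kLength], const float in_im[kLength],
                    float out_re[kLength], float out_im[kLength],
                    bool* ovflo);

#else

// SW build: a stub with the same signature, so the software side still links.
void fft_top_sketch(bool direction,
                    const float in_re[kLength], const float in_im[kLength],
                    float out_re[kLength], float out_im[kLength],
                    bool* ovflo) {
    assert(ovflo != nullptr);
    (void)direction;
    (void)in_re;
    (void)in_im;
    std::printf("SDSoC stub function %s() ...\n", __func__);
    for (std::size_t i = 0; i < kLength; ++i) {
        out_re[i] = 0.0f;  // defined values instead of leftover memory contents
        out_im[i] = 0.0f;
    }
    *ovflo = false;
}

#endif
```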

2.2.4 Build succeeded

2nd try

2.3 Optimize for performance

To reduce data transfer time, we use sds_alloc() & sds_free() to allocate and release the memory for the input/output data.

First, we have to include the header:

#include <sstream>
#include "sds_lib.h"    // for sds_***()
using namespace std;

... then allocate the buffers using sds_alloc():

// float in_re[SAMPLES] = {0};
// float in_im[SAMPLES] = {0};
// float out_re[SAMPLES] = {0};
// float out_im[SAMPLES] = {0};
float* in_re = (float*) sds_alloc(SAMPLES*sizeof(float));
float* in_im = (float*) sds_alloc(SAMPLES*sizeof(float));
float* out_re = (float*) sds_alloc(SAMPLES*sizeof(float));
float* out_im = (float*) sds_alloc(SAMPLES*sizeof(float));

... and remember to release those buffers using sds_free():

sds_free(in_re);
sds_free(in_im);
sds_free(out_re);
sds_free(out_im);
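Note that the original arrays were zero-initialised with = {0}, whereas the sds_alloc() calls above do not initialise anything and are not checked for failure. A small wrapper can restore both properties. This is a sketch under assumptions: it relies on the __SDSCC__ macro being defined by the SDSoC compilers and falls back to malloc()/free() elsewhere so it can be tried on a host; alloc_samples() and free_samples() are hypothetical names, and the sketch does not assume that sds_alloc() zero-fills memory or that sds_free() accepts a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifdef __SDSCC__
#include "sds_lib.h"  // sds_alloc() / sds_free()
static void* buf_alloc(std::size_t bytes) { return sds_alloc(bytes); }
static void buf_free(void* p) { sds_free(p); }
#else
// Host fallback (assumption for this sketch): ordinary heap memory.
static void* buf_alloc(std::size_t bytes) { return std::malloc(bytes); }
static void buf_free(void* p) { std::free(p); }
#endif

// Allocate `count` floats and set them to zero; returns nullptr on failure.
float* alloc_samples(std::size_t count) {
    assert(count > 0);
    float* p = static_cast<float*>(buf_alloc(count * sizeof(float)));
    if (p != nullptr) {
        for (std::size_t i = 0; i < count; ++i) {
            p[i] = 0.0f;
        }
    }
    return p;
}

// Release a buffer obtained from alloc_samples(); a null pointer is ignored.
void free_samples(float* p) {
    if (p != nullptr) {
        buf_free(p);
    }
}
```

With this in place, `float* in_re = alloc_samples(SAMPLES);` followed by a null check would replace the cast-and-allocate line, and `free_samples(in_re);` would replace the matching sds_free() call.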

Optionally (*1), in fft_top.h, we can add SDSoC #pragmas declaring the buffers physically contiguous, which leads SDSoC to select simple DMA (AXI_DMA_SIMPLE) for faster data transfer:

#pragma SDS data mem_attribute(in_re:PHYSICAL_CONTIGUOUS)
#pragma SDS data mem_attribute(in_im:PHYSICAL_CONTIGUOUS)
#pragma SDS data mem_attribute(out_re:PHYSICAL_CONTIGUOUS)
#pragma SDS data mem_attribute(out_im:PHYSICAL_CONTIGUOUS)
void fft_top( 
...

*1 In this case, SDSoC selects simple DMA automatically, so we don't actually have to add those #pragmas...

We also want to apply inlining to HW sub functions:

void dummy_proc_fe( ... )
{
#pragma HLS INLINE
...
}

void dummy_proc_be( ... )
{
#pragma HLS INLINE
...
}

... and loop pipelining as usual:

void dummy_proc_fe( ... )
{
  ...
    for (i=0; i< FFT_LENGTH; i++){
#pragma HLS PIPELINE
	...
    }
}

void dummy_proc_be( ... )
{
  ...
    for (i=0; i< FFT_LENGTH; i++){
#pragma HLS PIPELINE
	...
    }
}

2.4 Estimate performance

Let's estimate performance again...

Final estimation

We got much better performance. Seems O.K...

3. Build HW & Run it!

Make sure Generate bitstream & Generate SD card image are checked in the Options section of SDx Project Settings.

Project setting

Build the project (Project -> Build). Build will finish successfully.

Build result

Copy the contents of sd_card folder & data folder into an SD card.

SD Card image

Insert the SD card into your Zybo & power it on to boot Linux.
Change to /mnt (where the program is located) & run the program as follows: cd /mnt && /mnt/fft_single.elf

Invoke program

TEST passed! We are now able to accelerate FFT on FPGA.

Final log

The SD Card files will be available in the repo: fft_single/sd_card
