
HLS Modules

Tiziano De Matteis edited this page Nov 18, 2020 · 7 revisions

HLS Modules implement BLAS routines (DOT, GEMV, GEMM, etc.). They are delivered as OpenCL kernels, and their templates are contained in the templates/ subfolder.

Modules receive input data and produce output data through channels:

  • scalar data is accepted as a parameter;
  • vectors are streamed in tiles of a given size;
  • matrices are tiled in 2D, where both the tile elements and the order of tiles can be scheduled by rows or by columns. This results in 4 possible modes of streaming across a matrix interface.
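The four streaming modes for matrices can be illustrated with a small Python sketch. It is not part of the library: the function name stream_order and the "column" string are assumptions made for illustration, and the matrix sizes are assumed to be multiples of the tile sizes.

```python
from itertools import product

def stream_order(n, m, tile_n, tile_m, tiles_order="row", elements_order="row"):
    """Yield (row, col) indices of an n x m matrix in tiled streaming order.

    Hypothetical helper: assumes n and m are multiples of the tile sizes.
    """
    tile_rows = range(0, n, tile_n)
    tile_cols = range(0, m, tile_m)
    # Order in which the tiles are visited.
    if tiles_order == "row":
        tiles = [(ti, tj) for ti in tile_rows for tj in tile_cols]
    else:
        tiles = [(ti, tj) for tj in tile_cols for ti in tile_rows]
    for ti, tj in tiles:
        # Order in which the elements inside one tile are visited.
        if elements_order == "row":
            inner = [(i, j) for i in range(tile_n) for j in range(tile_m)]
        else:
            inner = [(i, j) for j in range(tile_m) for i in range(tile_n)]
        for i, j in inner:
            yield ti + i, tj + j

# Show the first elements streamed for a 4x4 matrix with 2x2 tiles,
# in each of the four combinations.
for tiles_order, elements_order in product(("row", "column"), repeat=2):
    order = list(stream_order(4, 4, 2, 2, tiles_order, elements_order))
    print(tiles_order, elements_order, order[:5])
```

Each combination visits every element exactly once; only the order differs, which is why a module and the interface feeding it must agree on both settings.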

Since modules may receive data in different ways, a module can exist in several versions (usually marked with an increasing number in the template name). The interface of each version (detailed in its source file) describes how data must be received and how it is produced. Programmers can reuse HLS modules in their own HLS programs, either individually or by composing them. When composing modules, the channel names of connected modules must match (as required by the Intel SDK).

The subfolder templates/helpers contains examples of interface modules (helpers) that can be used to move data between device RAM and the modules.

To help HLS programmers obtain a specific module version, we provide a modules code generator.

You can find examples of how to use it under tests/hls_modules or in the evaluation/modules_composition/streaming subfolders.

Modules code generator

The modules code generator produces OpenCL code containing one or more modules, according to a user-provided specification. Please note: the modules code generator currently supports only a subset of the modules.

To use it, the programmer writes a modules specification file, which details all the characteristics of the desired modules and, if present, of the helpers. The modules specification file is a JSON file with the following structure:

{
    "routine": [
        {
            "blas_name" : "gemv",
            "type" : "float",
            "user_name" : "sgemv",
            "width" : 16,
            "in_x" : "channel_x",
            "in_y" : "channel_y",
            "out_res":"channel_vect_out",
            "in_A" : "channel_matrix",
            "trans" : "N",
            "A tiles order" : "row",
            "A elements order" : "row",
            "tile N size" :128,
            "tile M size" :128
        }
    ],
    "helper":[
        {
            "helper_name" : "read vector x",
            "user_name" : "sgemv_read_x",
            "channel_name" : "channel_x",
            "type" : "float"

        },
        {
            "helper_name" : "read vector y",
            "user_name" : "sgemv_read_y",
            "channel_name" : "channel_y",
            "type" : "float"
        },
        {
            "helper_name" : "write vector",
            "user_name" : "sgemv_write_vector",
            "channel_name" : "channel_vect_out",
            "type" : "float"
        },
        {
            "helper_name" : "read matrix",
            "user_name" : "sgemv_read_matrix",
            "channel_name" : "channel_matrix",
            "type" : "float",
            "tiles order" :"row",
            "tile N size" :128,
            "tile M size" :128
        }

    ]
}

The file has an item routine, which is an array of one or more module specifications; an optional item helper, which specifies a set of helpers; and an optional item platform, which specifies the target FPGA.

Similarly to the host code generator, each routine is characterized by different properties:

  • mandatory properties: these are the same for all the routines and are:

    • blas_name a string indicating the routine, according to the BLAS library nomenclature (in the example gemv, a Level 2 routine);
    • user_name a string indicating the name that the user will use for calling this routine. It must be unique and it can be used to have multiple routines of the same type that are simultaneously in execution;
    • type a string (float or double) that indicates the numerical precision used by the routine;
    • for each input/output channel, the user must indicate the corresponding channel names and direction (in the example in_x, in_y, in_A, and out_res).
  • optional properties: these characterize the routine behavior. If a property is unspecified, a default value is used. These properties may change according to the particular routine. In general, they can be:

    • functional properties, which specify the logic of the routine. These are usually BLAS parameters or specifications of the incoming data. In the example, the matrix A is received with tiles ordered by row and elements ordered by row;
    • non-functional properties, which affect the performance of the routine. For example, the width property specifies the spatial parallelism that will be used.
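The mandatory properties listed above can be checked before running the generator. The following is a minimal sketch, not part of the code generator: the function names are hypothetical, and it assumes that channel properties always use the in_/out_ prefixes seen in the example.

```python
MANDATORY = ("blas_name", "user_name", "type")

def check_routine(routine):
    """Return a list of problems found in one routine entry (empty if none)."""
    problems = [f"missing '{key}'" for key in MANDATORY if key not in routine]
    if "type" in routine and routine["type"] not in ("float", "double"):
        problems.append("type must be 'float' or 'double'")
    # Assumption: channel properties are the keys starting with in_ or out_.
    if not any(key.startswith(("in_", "out_")) for key in routine):
        problems.append("no in_*/out_* channel declared")
    return problems

def duplicate_user_names(routines):
    """Return the user_name values that appear more than once."""
    names = [r["user_name"] for r in routines if "user_name" in r]
    return sorted({name for name in names if names.count(name) > 1})

routine = {
    "blas_name": "gemv",
    "type": "float",
    "user_name": "sgemv",
    "in_x": "channel_x",
    "in_y": "channel_y",
    "in_A": "channel_matrix",
    "out_res": "channel_vect_out",
}
print(check_routine(routine))                    # []
print(duplicate_user_names([routine, routine]))  # ['sgemv']
```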

For the helpers, the user must indicate the helper_name (currently supported: read vector x, read vector y, write scalar, write vector, read matrix, write matrix), the name of the channel on which they send or receive data (it must match the channel used by the routine module), and, for helpers that operate on matrices, the ordering of tiles and elements. The example specifies all the helpers needed by the gemv module. Optionally, the user can also specify the width of the helper. This must match the width of the module it is connected to, otherwise loop unrolling may be prevented.
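Since a helper is only useful if its channel name matches one used by a routine, a mismatch can be caught before generating any code. The sketch below is an illustration under assumptions, not a feature of the code generator: the function names are hypothetical, and routine channels are assumed to be the values of keys starting with in_ or out_.

```python
import json

def routine_channels(spec):
    """Collect the channel names declared by routines via in_*/out_* keys."""
    channels = set()
    for routine in spec.get("routine", []):
        for key, value in routine.items():
            if key.startswith(("in_", "out_")):
                channels.add(value)
    return channels

def unmatched_helper_channels(spec):
    """Return helper channel names that no routine declares."""
    used = routine_channels(spec)
    return sorted(
        helper["channel_name"]
        for helper in spec.get("helper", [])
        if helper["channel_name"] not in used
    )

# A reduced specification in which the second helper has a misspelled channel.
spec = json.loads("""
{
    "routine": [
        {"blas_name": "gemv", "type": "float", "user_name": "sgemv",
         "in_x": "channel_x", "in_y": "channel_y",
         "in_A": "channel_matrix", "out_res": "channel_vect_out"}
    ],
    "helper": [
        {"helper_name": "read vector x", "user_name": "sgemv_read_x",
         "channel_name": "channel_x", "type": "float"},
        {"helper_name": "write vector", "user_name": "sgemv_write_vector",
         "channel_name": "channel_out", "type": "float"}
    ]
}
""")
print(unmatched_helper_channels(spec))  # ['channel_out']
```

The reverse case (a routine channel with no helper) is not reported, because such a channel may legitimately be connected to another module rather than to a helper.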

To generate the OpenCL code, the user invokes the code generator, passing the JSON file containing the module specifications as an argument:

[user@host fblas]$ python codegen/modules_codegen.py routines.json [output directory]

The code generator produces a set of OpenCL files (one for each routine or helper) that can be compiled or included in other HLS programs.
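When the specification is produced by a script rather than written by hand, the JSON file can be created from Python and the command line assembled from it. This is a sketch under assumptions: the specification is a reduced version of the example above, the file is written to the current directory, and the command is only printed, since running it requires being in the fblas root.

```python
import json
import sys
from pathlib import Path

# Reduced specification: one gemv module and one helper for vector x.
spec = {
    "routine": [
        {
            "blas_name": "gemv",
            "type": "float",
            "user_name": "sgemv",
            "width": 16,
            "in_x": "channel_x",
            "in_y": "channel_y",
            "in_A": "channel_matrix",
            "out_res": "channel_vect_out",
        }
    ],
    "helper": [
        {
            "helper_name": "read vector x",
            "user_name": "sgemv_read_x",
            "channel_name": "channel_x",
            "type": "float",
        }
    ],
}

# Write the specification file that the generator takes as its argument.
spec_path = Path("routines.json")
spec_path.write_text(json.dumps(spec, indent=4))

# Same invocation as the shell command above, using the current interpreter.
command = [sys.executable, "codegen/modules_codegen.py", str(spec_path)]
print(" ".join(command))
# From the fblas root, it could then be run with:
#   subprocess.run(command, check=True)
```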
