HLS Modules
HLS Modules implement BLAS routines (DOT, GEMV, GEMM, etc.).
HLS Modules are delivered as OpenCL kernels, and their templates are contained in the templates/ subfolder.
Modules receive input data and produce output data through channels:
- scalar data is accepted as a parameter;
- vectors are streamed in tiles of a given size;
- matrices are tiled in 2D, where both the tile elements and the order of tiles can be scheduled by rows or by columns. This results in 4 possible modes of streaming across a matrix interface.
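The four matrix streaming modes can be illustrated with a small host-side sketch. This is not part of the library: the function name stream_order, the "col" value, and the reading of tile_n as tile rows and tile_m as tile columns are assumptions made only for illustration.

```python
def stream_order(n_rows, n_cols, tile_n, tile_m,
                 tiles_order="row", elements_order="row"):
    """Return the (row, col) indices of a matrix in streaming order.

    Hypothetical helper: assumes n_rows is a multiple of tile_n and
    n_cols is a multiple of tile_m.
    """
    for order in (tiles_order, elements_order):
        if order not in ("row", "col"):
            raise ValueError("order must be 'row' or 'col'")

    tile_rows = range(0, n_rows, tile_n)
    tile_cols = range(0, n_cols, tile_m)

    # Order in which the tiles themselves are visited.
    if tiles_order == "row":
        tiles = [(ti, tj) for ti in tile_rows for tj in tile_cols]
    else:
        tiles = [(ti, tj) for tj in tile_cols for ti in tile_rows]

    # Order in which the elements inside one tile are visited.
    if elements_order == "row":
        offsets = [(di, dj) for di in range(tile_n) for dj in range(tile_m)]
    else:
        offsets = [(di, dj) for dj in range(tile_m) for di in range(tile_n)]

    return [(ti + di, tj + dj) for ti, tj in tiles for di, dj in offsets]
```

Calling stream_order(4, 4, 2, 2, "row", "row") visits the top-left tile first and then moves to the tile on its right, while "col" for the tiles order moves down to the tile below instead. All four combinations visit every element exactly once; only the order differs.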
Since modules may receive data in different ways, a module may exist in several versions (usually marked with an increasing number in the template name). The interface of each version, detailed in its source file, describes how data must be received and how it is produced. Programmers can reuse HLS modules in their own HLS programs and can also compose them. When composing modules, channel names must match (as required by the Intel SDK).
The subfolder templates/helpers contains several examples of interface modules that can be used to move data from and to device RAM.
To help HLS programmers obtain a module version, we provide a modules code generator.
You can find examples of how to use it under the tests/hls_modules or the evaluation/modules_composition/streaming subfolders.
The modules code generator produces OpenCL code containing one or more modules, based on the user's specification. Please note: the modules code generator currently supports only a subset of the modules.
To do this, the programmer has to write a modules specification file, which details all the characteristics of the desired modules and, if present, of the helpers. The modules specification file is a JSON file with the following structure:
{
  "routine": [
    {
      "blas_name": "gemv",
      "type": "float",
      "user_name": "sgemv",
      "width": 16,
      "in_x": "channel_x",
      "in_y": "channel_y",
      "out_res": "channel_vect_out",
      "in_A": "channel_matrix",
      "trans": "N",
      "A tiles order": "row",
      "A elements order": "row",
      "tile N size": 128,
      "tile M size": 128
    }
  ],
  "helper": [
    {
      "helper_name": "read vector x",
      "user_name": "sgemv_read_x",
      "channel_name": "channel_x",
      "type": "float"
    },
    {
      "helper_name": "read vector y",
      "user_name": "sgemv_read_y",
      "channel_name": "channel_y",
      "type": "float"
    },
    {
      "helper_name": "write vector",
      "user_name": "sgemv_write_vector",
      "channel_name": "channel_vect_out",
      "type": "float"
    },
    {
      "helper_name": "read matrix",
      "user_name": "sgemv_read_matrix",
      "channel_name": "channel_matrix",
      "type": "float",
      "tiles order": "row",
      "tile N size": 128,
      "tile M size": 128
    }
  ]
}
The file has an item routine, which is an array of one or more module routine specifications, an optional item helper that specifies a set of helpers, and an optional item platform that specifies the target FPGA.
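As a rough illustration of this top-level layout, the following sketch loads a shortened specification and lists what it contains. The function summarize_spec and the trimmed SPEC_TEXT are hypothetical and are not part of the code generator, which may perform different checks.

```python
import json

# Shortened, illustrative specification (not a complete gemv description).
SPEC_TEXT = """
{
  "routine": [
    {"blas_name": "gemv", "type": "float", "user_name": "sgemv"}
  ],
  "helper": [
    {"helper_name": "read vector x", "user_name": "sgemv_read_x",
     "channel_name": "channel_x", "type": "float"}
  ]
}
"""

def summarize_spec(spec):
    """Return the user names found in a parsed specification."""
    routines = spec.get("routine")
    if not isinstance(routines, list) or not routines:
        raise ValueError("'routine' must be a non-empty array")
    return {
        "routines": [r["user_name"] for r in routines],
        # 'helper' and 'platform' are optional items.
        "helpers": [h["user_name"] for h in spec.get("helper", [])],
        "platform": spec.get("platform"),
    }

summary = summarize_spec(json.loads(SPEC_TEXT))
```

Here summary lists one routine and one helper, and its platform entry is None because the optional platform item is absent.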
Similarly to the host code generator, each routine is characterized by different properties:
- mandatory properties: these are the same for all the routines and are:
  - blas_name: a string indicating the routine, according to the BLAS library nomenclature (in the example gemv, a Level 2 routine);
  - user_name: a string indicating the name that the user will use for calling this routine. It must be unique, and it allows multiple routines of the same type to be in execution simultaneously;
  - type: a string (float or double) that indicates the numerical precision used by the routine;
  - for each input/output channel, the user must indicate the corresponding channel name and direction (in the example in_x, in_y, in_A, and out_res).
- optional properties: these characterize the routine behavior. If unspecified, a default value is used. These properties may change according to the particular routine. In general, they can be:
  - functional properties, which specify the logic of the routine. These are usually BLAS parameters or specifications about the incoming data. In the example, the matrix A is received in tiles by row and elements by row;
  - non-functional properties, which affect the performance of the routine. For example, the width property specifies the spatial parallelism that will be used.
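The mandatory properties listed above can be checked before invoking the generator. The sketch below is a hypothetical pre-check, not the generator's own validation. It assumes that channel properties are exactly the keys starting with in_ or out_, as in the example, and it does not guess default values for optional properties.

```python
MANDATORY = ("blas_name", "user_name", "type")
ALLOWED_TYPES = ("float", "double")

def check_routines(routines):
    """Return a list of human-readable problems found in the routine array."""
    errors = []
    seen_names = set()
    for idx, routine in enumerate(routines):
        for key in MANDATORY:
            if key not in routine:
                errors.append(f"routine {idx}: missing '{key}'")
        if "type" in routine and routine["type"] not in ALLOWED_TYPES:
            errors.append(f"routine {idx}: type must be float or double")
        name = routine.get("user_name")
        if name is not None:
            # user_name must be unique across all routines.
            if name in seen_names:
                errors.append(f"routine {idx}: duplicate user_name '{name}'")
            seen_names.add(name)
        # Assumption: channels are the keys prefixed with in_ or out_.
        if not any(k.startswith(("in_", "out_")) for k in routine):
            errors.append(f"routine {idx}: no input/output channel given")
    return errors
```

An empty list means that the routine array passed these basic checks. Routine-specific optional properties are left untouched.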
For the helpers, the user must indicate the helper_name (currently supported: read vector x, read vector y, write scalar, write vector, read matrix, write matrix), the name of the channel on which they send or receive data (it must match the channel used by the routine module), and, for helpers that operate on matrices, the ordering of tiles and elements. The example specifies all the helpers needed by the gemv module.
Optionally, the user can also specify the width of the helper thread. This must match the width of the receiving module; otherwise unrolling can be prevented.
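Both constraints on helpers (matching channel names and matching widths) can be checked on the specification itself. The following is a minimal sketch with a hypothetical function check_helpers. It again assumes that routine channels are the keys prefixed with in_ or out_, and it compares widths only when both the helper and the routine declare one.

```python
def check_helpers(spec):
    """Return problems with helper channels and widths in a parsed spec."""
    # Map each channel name used by a routine to that routine's width
    # (None when the routine does not declare a width).
    channel_width = {}
    for routine in spec["routine"]:
        for key, value in routine.items():
            if key.startswith(("in_", "out_")):
                channel_width[value] = routine.get("width")

    problems = []
    for helper in spec.get("helper", []):
        name = helper["user_name"]
        channel = helper["channel_name"]
        if channel not in channel_width:
            problems.append(f"{name}: channel '{channel}' is not used by any routine")
            continue
        expected = channel_width[channel]
        if "width" in helper and expected is not None and helper["width"] != expected:
            problems.append(f"{name}: width {helper['width']} differs from module width {expected}")
    return problems
```

For the gemv example, every helper channel (channel_x, channel_y, channel_vect_out, channel_matrix) appears among the routine's channels and no helper declares a width, so such a check would report no problems.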
To generate the OpenCL code, the user can invoke the code generator by passing as argument the JSON file containing the routine specifications:
[user@host fblas]$ python codegen/modules_codegen.py routines.json [output directory]
The code generator will produce a set of OpenCL files (one for each routine or helper) that can be used inside other HLS programs or compiled.
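When the generator is driven from a build script, the same command line can be assembled programmatically. This is a sketch under stated assumptions: the wrapper functions are hypothetical, the script path is taken from the command shown above and must be run from the repository root, the output directory is treated as optional because it appears in square brackets, and the script is assumed to exit with a non-zero status on failure.

```python
import subprocess
import sys

def codegen_command(spec_path, output_dir=None,
                    script="codegen/modules_codegen.py"):
    """Build the argument list for the modules code generator."""
    cmd = [sys.executable, script, spec_path]
    if output_dir is not None:
        cmd.append(output_dir)
    return cmd

def run_codegen(spec_path, output_dir=None):
    """Run the generator; raises CalledProcessError on a non-zero exit."""
    subprocess.run(codegen_command(spec_path, output_dir), check=True)

# Example (not executed here): run_codegen("routines.json", "generated/")
```

Passing the arguments as a list avoids shell quoting issues when the specification path or the output directory contains spaces.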