Add section 2c. Update tiles in sections 2a and 2b.
abisca authored and fifield committed Apr 12, 2024
1 parent 70d5cc6 commit 9f04e20
Showing 3 changed files with 113 additions and 17 deletions.
10 changes: 5 additions & 5 deletions programming_guide/section-2/section-2a/README.md
@@ -34,8 +34,8 @@ First of all, an Object FIFO has a unique `name`. It functions as an ordered buf

An Object FIFO is created between a producer, or source tile, and a consumer, or destination tile. The producer and consumer processes that access the Object FIFO are executed on these tiles. Below, you can see an example of an Object FIFO created between producer tile A and consumer tile B:
```
-A = tile(1, 2)
-B = tile(1, 3)
+A = tile(1, 3)
+B = tile(2, 4)
of0 = object_fifo("objfifo0", A, B, 3, T.memref(256, T.i32()))
```
The created Object FIFO is stored in the `of0` variable and is named `objfifo0`. It has a depth of `3` objects of datatype `<256xi32>`.
@@ -66,8 +66,8 @@ The `port` input of both the acquire and the release functions represents whethe

Below you can see an example of two processes that are <u>iterating over the objects of the Object FIFO</u> `of0` that we initialized in the previous section, one running on the producer tile and the other on the consumer tile. To do this, the producer process runs a loop of three iterations, equal to the depth of `of0`, and during each iteration it acquires one object from `of0`, calls a `test_func` function on the acquired object, and releases the object. The consumer process only runs once and acquires all three objects from `of0` at once and stores them in the `elems` array, from which it can <u>access each object individually in any order</u>. It then calls a `test_func2` function three times and in each call it gives as input one of the objects it acquired, before releasing all three objects at the end.
```
-A = tile(1, 2)
-B = tile(1, 3)
+A = tile(1, 3)
+B = tile(2, 4)
of0 = object_fifo("objfifo0", A, B, 3, T.memref(256, T.i32()))
@core(A)
@@ -91,7 +91,7 @@ def core_body():

An Object FIFO can be created with the same tile as both its producer and consumer tile. This is mostly done to ensure proper synchronization within a single process, as opposed to synchronization across multiple processes running on different tiles, as in the examples so far. All of the functionality described up to this point applies in the same way. Below is an example of how such an Object FIFO can be initialized and accessed:
```
-A = tile(1, 2)
+A = tile(1, 3)
of0 = object_fifo("objfifo0", A, A, 3, T.memref(256, T.i32()))
@core(A)
22 changes: 11 additions & 11 deletions programming_guide/section-2/section-2b/README.md
@@ -20,8 +20,8 @@ It is important to note that each new acquire function will return a new object

In the example below, `of0` is created between producer A and consumer B with a depth of 3 objects: object0, object1, and object2. Consumer B first acquires 2 objects from `of0` into the variable `elems`. Because this is the first time B acquires, it has access to object0 and object1. B then releases the oldest acquired object, object0, and keeps object1. The next time B acquires 2 objects, into the variable `elems_2`, it has access to object1 from before and to the newly acquired object2. B again releases only a single object and keeps object2. Finally, the third time B acquires, into `elems_3`, it has access to object2 and object0.
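The acquire/release sequence described above can be traced with a small pure-Python model. This is a hypothetical sketch and is not part of the `aie` Python bindings. `ObjectFifoModel` and its methods are illustrative names. The model also assumes that the producer always refills an object before the consumer needs it again, so acquires never block.

```python
from collections import deque


class ObjectFifoModel:
    """Hypothetical model of a single consumer's view of an Object FIFO.

    Objects are reused in round-robin order (object0, object1, ...). Acquiring
    n objects returns the objects already held plus enough new ones to reach n;
    releasing frees the oldest held objects first. Producer stalls are ignored.
    """

    def __init__(self, depth):
        self.depth = depth
        self.next_obj = 0     # index of the next object not yet held
        self.held = deque()   # currently acquired objects, oldest first

    def acquire(self, n):
        if n > self.depth:
            raise ValueError("cannot acquire more objects than the FIFO depth")
        while len(self.held) < n:
            self.held.append(f"object{self.next_obj}")
            self.next_obj = (self.next_obj + 1) % self.depth
        return list(self.held)[:n]

    def release(self, n):
        for _ in range(n):
            self.held.popleft()


of0 = ObjectFifoModel(depth=3)
elems = of0.acquire(2)    # ['object0', 'object1']
of0.release(1)            # releases object0, keeps object1
elems_2 = of0.acquire(2)  # ['object1', 'object2']
of0.release(1)            # releases object1, keeps object2
elems_3 = of0.acquire(2)  # ['object2', 'object0']
```

The circular reuse of objects is why the third acquire wraps around to object0.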
```
-A = tile(1, 2)
-B = tile(1, 3)
+A = tile(1, 3)
+B = tile(2, 4)
of0 = object_fifo("objfifo0", A, B, 3, T.memref(256, T.i32())) # 3 objects: object0, object1, object2
@core(B)
@@ -45,7 +45,7 @@ For more low-level details regarding how the objects in the Object FIFO are tran

Below is an example of an Object FIFO of depth 3 with one producer tile A and three consumer tiles B, C and D:
```
-A = tile(1, 2)
+A = tile(1, 1)
B = tile(1, 3)
C = tile(2, 3)
D = tile(3, 3)
@@ -56,9 +56,9 @@ The `depth` input of an Object FIFO can also be specified as an array of integer

The main advantage of being able to specify the individual depths comes during a situation like the one showcased in the example below, which we refer to as a broadcast with a <u>skip-connection</u>. In the example, two Object FIFOs are created: `of0` is a broadcast from producer tile A to consumer tiles B and C, while `of1` is a 1-to-1 data movement from producer tile B to consumer tile C. We refer to `of1` as a skip-connection because it is a dependency between the two consumer tiles of the same broadcast connection. Furthermore, the code executing on C's core shows that C requires one object from each of `of0` and `of1` before it can proceed with its execution. However, B also requires an object from `of0` before it can produce the data for `of1`.
```
-A = tile(1, 2)
-B = tile(1, 3)
-C = tile(2, 3)
+A = tile(1, 3)
+B = tile(2, 3)
+C = tile(2, 4)
of0 = object_fifo("objfifo0", A, [B, C], 1, T.memref(256, T.i32()))
of1 = object_fifo("objfifo1", B, C, 1, T.memref(256, T.i32()))
@@ -103,7 +103,7 @@ Below is an example of a link created between two FIFOs `of0` and `of1`, where t
```
A = tile(1, 0)
B = tile(1, 1)
-C = tile(1, 2)
+C = tile(1, 3)
of0 = object_fifo("objfifo0", A, B, 2, T.memref(256, T.i32()))
of1 = object_fifo("objfifo1", B, C, 2, T.memref(256, T.i32()))
object_fifo_link(of0, of1)
@@ -125,8 +125,8 @@ The example below shows three Object FIFOs: `of0` has a producer tile A and a co
```
A = tile(1, 0)
B = tile(1, 1)
-C = tile(1, 2)
-D = tile(2, 2)
+C = tile(1, 3)
+D = tile(2, 3)
of0 = object_fifo("objfifo0", A, B, 2, T.memref(256, T.i32()))
of1 = object_fifo("objfifo1", B, C, 2, T.memref(128, T.i32()))
of2 = object_fifo("objfifo2", B, D, 2, T.memref(128, T.i32()))
@@ -143,8 +143,8 @@ The example below shows three Object FIFOs: `of2` has a producer tile B and a co
```
A = tile(1, 0)
B = tile(1, 1)
-C = tile(1, 2)
-D = tile(2, 2)
+C = tile(1, 3)
+D = tile(2, 3)
of0 = object_fifo("objfifo0", C, B, 2, T.memref(128, T.i32()))
of1 = object_fifo("objfifo1", D, B, 2, T.memref(128, T.i32()))
of2 = object_fifo("objfifo2", B, A, 2, T.memref(256, T.i32()))
98 changes: 97 additions & 1 deletion programming_guide/section-2/section-2c/README.md
@@ -10,7 +10,103 @@

# <ins>Section 2c - Data Layout Transformations</ins>

-* dimensionsToStream, dimensionsFromStreamPerConsumer
While the Object FIFO primitive aims to reduce the complexity of configuring data movement on the AI Engine array, it also gives the user control over some of the advanced features of the underlying architecture. One such feature is the ability to perform data layout transformations on the fly using the tile's dedicated hardware: the Data Movement Accelerators (DMAs). <u>This is available on AIE-ML devices.</u>

Tile DMAs interact directly with the memory modules of their tiles and are responsible for pushing data to, and retrieving data from, the AXI stream interconnect. When data is pushed onto the stream, the user can program the DMA's n-dimensional address generation scheme so that the data is pushed in a different layout than the one in which it is stored in the tile's local memory. In the same way, a user can also specify the layout in which a DMA should store the data it retrieves from the AXI stream.
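One rough mental model treats the producer-side DMA as a gather and the consumer-side DMA as a scatter. The producer DMA reads local memory in a chosen address order and pushes the values onto the stream. The consumer DMA writes the n-th value it receives to the n-th address in its own order.

The sketch below is a simplification built on that assumption. The helper names are hypothetical. Addresses are given as an explicit list of element indices instead of being generated by the DMA from sizes and strides. Timing and word width are ignored.

```python
def push_to_stream(local_mem, addresses):
    """Producer-side model (memory -> stream): read elements in the given address order."""
    return [local_mem[a] for a in addresses]


def store_from_stream(stream, addresses, size):
    """Consumer-side model (stream -> memory): write the n-th value to addresses[n]."""
    local_mem = [None] * size
    for value, a in zip(stream, addresses):
        local_mem[a] = value
    return local_mem


# A 2x3 matrix stored row-major in the producer's local memory.
src = [0, 1, 2,
       3, 4, 5]
column_major = [0, 3, 1, 4, 2, 5]  # hypothetical address order: walk columns first

stream = push_to_stream(src, column_major)    # [0, 3, 1, 4, 2, 5]
dst = store_from_stream(stream, range(6), 6)  # stored in arrival order
# dst now holds the 3x2 transpose in row-major order: [0, 3, 1, 4, 2, 5]
```

In this model, if the consumer scatters with the same `column_major` order instead of arrival order, it writes each value back to the address it was read from. That restores the original row-major layout.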

DMA blocks contain buffer descriptor operations that summarize what data is being moved, from what offset, how much of it, and in what layout. These buffer descriptors are the `AIE_DMABDOp` operations in MLIR and have their own auto-generated Python binding (available under `/mlir-aie/install/python/aie/dialects/_aie_ops_gen.py` when the repository is built):
```
def dma_bd(
    buffer,
    *,
    offset=None,
    len=None,
    dimensions=None,
    bd_id=None,
    next_bd_id=None,
    loc=None,
    ip=None,
): ...
```
It is not necessary to understand these low-level operations in order to use the data layout transformations with the Object FIFO primitive.

A data layout transformation is presented as a list of pairs, where each pair represents a `size` and a `stride` for a particular dimension of the data:
```
[<size_2, stride_2>, <size_1, stride_1>, <size_0, stride_0>]
```
Transformations can be expressed in up to three dimensions on each compute and Shim tile, and in up to four dimensions on Mem tiles. The first element of this array gives the outer-most dimension's size and stride, while the last element gives the inner-most dimension's size and stride. All strides are expressed in <u>multiples of the element width</u>.
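The limits and units stated above can be checked before a design is built. The helpers below are hypothetical. Any real validation is done by the toolchain and may apply additional rules. These helpers only encode the dimension limits from the text and convert element strides to byte strides, assuming 4-byte `i32` elements.

```python
# Dimension limits stated in the text: 3 on compute and Shim tiles, 4 on Mem tiles.
MAX_DIMS = {"compute": 3, "shim": 3, "mem": 4}


def check_dims(dims, tile_kind):
    """Raise ValueError if a [(size, stride), ...] list exceeds the tile's dimension limit."""
    limit = MAX_DIMS[tile_kind]
    if len(dims) > limit:
        raise ValueError(
            f"{tile_kind} tiles support at most {limit} dimensions, got {len(dims)}"
        )


def to_byte_strides(dims, element_bytes):
    """Strides are given in elements; multiply by the element width to get bytes."""
    return [(size, stride * element_bytes) for size, stride in dims]


dims = [(8, 16), (2, 1), (8, 2)]  # (size, stride), outer-most first
check_dims(dims, "compute")       # passes: 3 dimensions
print(to_byte_strides(dims, 4))   # [(8, 64), (2, 4), (8, 8)] for 4-byte i32
```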

Data layout transformations can be viewed as a way to tell the hardware which location in the data to access next, and the resulting access pattern can be modeled as a series of nested loops. For example, the transformation above can be expressed as:
```
int *buffer; // i32
for (int i = 0; i < size_2; i++)
    for (int j = 0; j < size_1; j++)
        for (int k = 0; k < size_0; k++) {
            // access/store element at/to
            // buffer[i * stride_2 + j * stride_1 + k * stride_0]
        }
```
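The nested loops above generalize to any number of `(size, stride)` pairs. The sketch below uses a hypothetical helper that is not part of the `aie` Python package. It relies on `itertools.product`, which varies its last argument fastest, so the last pair in the list behaves as the inner-most loop.

```python
from itertools import product


def access_pattern(dims):
    """Return the buffer indices visited for dims = [(size, stride), ...], outer-most first."""
    sizes = [size for size, _ in dims]
    strides = [stride for _, stride in dims]
    # product() advances its last range fastest, so the last pair is the inner-most loop.
    return [
        sum(i * stride for i, stride in zip(idx, strides))
        for idx in product(*(range(size) for size in sizes))
    ]


# Row-major walk over a 2x4 block: [0, 1, 2, 3, 4, 5, 6, 7]
print(access_pattern([(2, 4), (4, 1)]))

# First 4 columns of a 2x8 row-major buffer, read as 2-element chunks
# alternating between the two rows: [0, 1, 8, 9, 2, 3, 10, 11]
print(access_pattern([(2, 2), (2, 8), (2, 1)]))
```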

As another example, here is an access pattern that alternates between eight even-indexed and eight odd-indexed elements of the buffer/stream:
```
aie.dma_bd(%buf : memref<128xi32>, 0, 128, [<8, 16>, <2, 1>, <8, 2>])
```
which translates to:
```
for (int i = 0; i < 8; i++)             // size_2
    for (int j = 0; j < 2; j++)         // size_1
        for (int k = 0; k < 8; k++) {   // size_0
            // access/store element at/to index:
            //     i * 16   (stride_2)
            //   + j * 1    (stride_1)
            //   + k * 2    (stride_0)
        }
```
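The claim that this pattern alternates between even and odd elements can be checked by evaluating the index expression directly. This plain-Python sketch models only the index arithmetic, not the DMA itself:

```python
from itertools import product

# [<8, 16>, <2, 1>, <8, 2>]: (size, stride) pairs, outer-most first.
indices = [
    i * 16 + j * 1 + k * 2
    for i, j, k in product(range(8), range(2), range(8))  # k varies fastest
]

print(indices[:16])
# [0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15]
print(sorted(indices) == list(range(128)))  # True: each of the 128 elements is visited once
```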

*Important Note: the inner-most dimension's stride must be 1 by design.*

### Data Layout Transformations with the Object FIFO

Recall that the Object FIFO class constructor has two inputs with default values: `dimensionsToStream` and `dimensionsFromStreamPerConsumer`.
```
class object_fifo:
    def __init__(
        self,
        name,
        producerTile,
        consumerTiles,
        depth,
        datatype,
        dimensionsToStream=None,
        dimensionsFromStreamPerConsumer=None,
    ): ...
```

The Object FIFO lowers directly to the `AIE_DMABDOp` operations described above, which can apply data layout transformations expressed as pairs of sizes and strides. The `dimensionsToStream` input applies to the `producerTile` and describes the layout in which that tile's DMA pushes the objects onto the stream. Similarly, the `dimensionsFromStreamPerConsumer` input gives, for each tile in `consumerTiles`, the layout in which that tile's DMA retrieves the objects from the stream. The example below sets both inputs for an Object FIFO with a single consumer tile, B:

```
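A = tile(1, 1)
B = tile(1, 3)
# m, k and mtk are assumed to be integers defined earlier in the design
of0 = object_fifo(
    "objfifo0",
    A,
    B,
    3,
    T.memref(256, T.i32()),
    [
        (m, k),
        (mtk // k, m * k),
        (k, 1),
    ],  # dimensionsToStream: layout used by A's DMA
    [
        [
            (m, k),
            (mtk // k, m * k),
            (k, 1),
        ]
    ],  # dimensionsFromStreamPerConsumer: one layout per consumer tile (here only B)
)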
```
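To see what the `dimensionsToStream` layout in this example does, the sketch below evaluates it for concrete values. The text does not give `m`, `k` or `mtk`, so the values here are hypothetical. `mtk` is assumed to equal `m * k`, and the values are chosen so that the pattern spans exactly the 256 elements of each object.

The data movement is modeled in simplified form. A's DMA reads addresses in pattern order (a gather). B's DMA writes the n-th received value to the n-th address of its own pattern (a scatter).

```python
from itertools import product

# Hypothetical values: the text leaves m, k and mtk undefined.
m, k = 4, 16
mtk = m * k  # assumption: mtk == m * k, so the pattern covers 4 * 4 * 16 = 256 elements

dims = [(m, k), (mtk // k, m * k), (k, 1)]  # (size, stride), outer-most first


def pattern(dims):
    """Element indices visited by nested loops over (size, stride) pairs."""
    sizes = [size for size, _ in dims]
    strides = [stride for _, stride in dims]
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(n) for n in sizes))]


order = pattern(dims)
print(order[:20])  # first 20 indices: 0-15, then 64-67

src = list(range(256))               # one object in A's local memory
stream = [src[a] for a in order]     # A's DMA: memory -> stream (gather)
dst = [None] * 256
for value, a in zip(stream, order):  # B's DMA: stream -> memory (scatter)
    dst[a] = value
```

Under this model, the stream carries 16-element chunks taken 64 elements apart. B's `dimensionsFromStreamPerConsumer` entry repeats the same pattern, so B's copy of the object ends up in the original order.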
