Updated prog examples readmes (#1422)
Co-authored-by: Joseph Melber <jgmelber@gmail.com>
jackl-xilinx and jgmelber authored Apr 25, 2024
1 parent 5c16451 commit fab1d72
Showing 11 changed files with 36 additions and 730 deletions.
4 changes: 4 additions & 0 deletions programming_examples/basic/README.md
@@ -14,10 +14,14 @@ These programming examples provide a good starting point to illustrate how to bu

* [Passthrough DMAs](./passthrough_dmas) - This design demonstrates data movement to implement a memcpy operation using object FIFOs and only the DMAs, without involving the AIE core.
* [Passthrough Kernel](./passthrough_kernel) - This design demonstrates a simple AIE implementation of a vectorized memcpy on a vector of integers, involving AIE core kernel programming.
* [DMA Transpose](./dma_transpose) - Transposes a matrix with the Shim DMA using `npu_dma_memcpy_nd`.
* [Vector Scalar Add](./vector_scalar_add) - Single tile performs a very simple `+` operation where the kernel loads data from local memory, increments the value by `1` and stores it back.
* [Vector Scalar Mul](./vector_scalar_mul) - Single tile performs `vector * scalar` on a vector of size `4096`. The kernel performs a `1024`-element vector multiply and is invoked multiple times to complete the full `vector * scalar` computation.
* [Vector Vector Add](./vector_vector_add) - Single tile performs `vector + vector` of size `1024`.
* [Vector Vector Multiply](./vector_vector_mul) - Single tile performs `vector * vector` of size `1024`.
* [Vector Reduce Add](./vector_reduce_add) - Single tile performs a reduction of a vector to return the `sum` of the elements.
* [Vector Reduce Max](./vector_reduce_max) - Single tile performs a reduction of a vector to return the `max` of the elements.
* [Vector Reduce Min](./vector_reduce_min) - Single tile performs a reduction of a vector to return the `min` of the elements.
* [Vector Exp](./vector_exp) - A simple element-wise exponential function, using the lookup table capabilities of the AI Engine.
* [Matrix Scalar Add](./matrix_scalar_add) - Single tile performs `matrix + scalar` with matrix size of `16x8`.
* [Matrix Multiplication](./matrix_multiplication) - This directory contains multiple designs spanning single-core and multi-core (whole array) matrix-matrix multiplication, as well as matrix-vector multiplication. It also contains sweep infrastructure for benchmarking.
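
The Vector Scalar Mul entry above describes a `4096`-element problem completed by repeated invocations of a `1024`-element kernel. The following is a minimal host-side sketch of that tiling pattern in plain Python. It is a reference model only, not the AIE kernel or the design's actual code; the names `scale_tile`, `vector_scalar_mul`, `VECTOR_SIZE`, and `TILE_SIZE` are hypothetical and chosen for illustration.

```python
# Reference model only: every name here is hypothetical, not taken from the repository.
VECTOR_SIZE = 4096  # full problem size quoted in the list above
TILE_SIZE = 1024    # elements handled by one kernel invocation


def scale_tile(tile, factor):
    """Stand-in for one kernel invocation: multiply one tile by a scalar."""
    return [x * factor for x in tile]


def vector_scalar_mul(vector, factor, tile_size=TILE_SIZE):
    """Run scale_tile over consecutive tiles until the whole vector is processed.

    Returns the scaled vector and the number of kernel invocations used.
    """
    if len(vector) % tile_size != 0:
        raise ValueError("vector length must be a multiple of tile_size")
    out = []
    invocations = 0
    for start in range(0, len(vector), tile_size):
        out.extend(scale_tile(vector[start:start + tile_size], factor))
        invocations += 1
    return out, invocations


if __name__ == "__main__":
    data = list(range(VECTOR_SIZE))
    result, calls = vector_scalar_mul(data, 3)
    print(f"{len(result)} elements scaled in {calls} tile invocations")
```

With the sizes quoted above, the loop runs `4096 / 1024 = 4` times, which is the "invoked multiple times" behavior the entry refers to.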
23 changes: 23 additions & 0 deletions programming_examples/ml/README.md
@@ -0,0 +1,23 @@
<!---//===- README.md --------------------------*- Markdown -*-===//
//
// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// Copyright (C) 2024, Advanced Micro Devices, Inc.
//
//===----------------------------------------------------------------------===//-->

# <ins>Machine Learning Examples</ins>

| Design name | Data type | Description |
|-|-|-|
| [Eltwise Add](../../programming_examples/ml/eltwise_add/) | bfloat16 | An element by element addition of two vectors |
| [Eltwise Mul](../../programming_examples/ml/eltwise_mul/) | i32 | An element by element multiplication of two vectors |
| [ReLU](../../programming_examples/ml/relu/) | bfloat16 | Rectified linear unit (ReLU) activation function on a vector |
| [Softmax](../../programming_examples/ml/softmax/) | bfloat16 | Softmax operation on a matrix |
| [Conv2D](../../programming_examples/ml/conv2d) | i8 | A single core 2D convolution for CNNs |
| [Conv2D+ReLU](../../programming_examples/ml/conv2d_fused_relu) | i8 | A Conv2D with a ReLU fused at the vector register level |
| [Bottleneck](../../programming_examples/ml/bottleneck/) | ui8 | A Bottleneck Residual Block is a variant of the residual block that uses three convolutions with 1x1, 3x3, and 1x1 filter sizes, respectively. The implementation fuses multiple kernels and applies dataflow optimizations, highlighting the unique architectural capabilities of AI Engines. |
| [ResNet](../../programming_examples/ml/resnet/) | ui8 | ResNet with offloaded conv2_x layers. The design uses a depth-first implementation of multiple bottleneck blocks across multiple NPU columns. |
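
As a rough guide to what the ReLU and Softmax rows in the table compute, here is a small reference sketch in plain Python. It uses ordinary Python floats rather than bfloat16 and assumes softmax is applied independently to each row of the matrix; that row-wise choice and the names `relu` and `softmax_rows` are assumptions for illustration, not details taken from the designs.

```python
import math


def relu(vector):
    """Rectified linear unit: negative entries become 0, others pass through."""
    return [max(0.0, x) for x in vector]


def softmax_rows(matrix):
    """Row-wise softmax (assumed axis).

    The row maximum is subtracted before exponentiating, which leaves the
    result unchanged mathematically but avoids overflow for large inputs.
    """
    result = []
    for row in matrix:
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


if __name__ == "__main__":
    print(relu([-1.0, 0.0, 2.5]))
    print(softmax_rows([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
```

A reference model like this can be handy for checking hardware output on the host, provided the comparison tolerance accounts for the reduced precision of bfloat16.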

75 changes: 0 additions & 75 deletions programming_examples/ml/weight_expand/CMakeLists.txt

This file was deleted.

50 changes: 0 additions & 50 deletions programming_examples/ml/weight_expand/Makefile

This file was deleted.

74 changes: 0 additions & 74 deletions programming_examples/ml/weight_expand/README.md

This file was deleted.

105 changes: 0 additions & 105 deletions programming_examples/ml/weight_expand/aie2.py

This file was deleted.
