Updated prog examples readmes (#1422)

Co-authored-by: Joseph Melber <jgmelber@gmail.com>
Xilinx · Apr 25, 2024 · fab1d72
1 parent 5c16451
commit fab1d72

Showing 11 changed files with 36 additions and 730 deletions.
diff --git a/programming_examples/basic/README.md b/programming_examples/basic/README.md
@@ -14,10 +14,14 @@ These programming examples provide a good starting point to illustrate how to bu
 
 * [Passthrough DMAs](./passthrough_dmas) - This design implements a memcpy operation with object FIFOs, using only DMAs and without involving the AIE core.
 * [Passthrough Kernel](./passthrough_kernel) - This design demonstrates a simple AIE implementation of a vectorized memcpy on a vector of integers, involving AIE core kernel programming.
+* [DMA Transpose](./dma_transpose) - Transposes a matrix with the Shim DMA using `npu_dma_memcpy_nd` 
 * [Vector Scalar Add](./vector_scalar_add) - Single tile performs a very simple `+` operation where the kernel loads data from local memory, increments the value by `1` and stores it back.
 * [Vector Scalar Mul](./vector_scalar_mul) - Single tile performs `vector * scalar` of size `4096`. The kernel does a `1024` vector multiply and is invoked multiple times to complete the full `vector * scalar` compute.
+* [Vector Vector Add](./vector_vector_add) - Single tile performs `vector + vector` of size `1024`.
+* [Vector Vector Multiply](./vector_vector_mul) - Single tile performs `vector * vector` of size `1024`.
 * [Vector Reduce Add](./vector_reduce_add) - Single tile performs a reduction of a vector to return the `sum` of the elements.
 * [Vector Reduce Max](./vector_reduce_max) - Single tile performs a reduction of a vector to return the `max` of the elements.
 * [Vector Reduce Min](./vector_reduce_min) - Single tile performs a reduction of a vector to return the `min` of the elements.
 * [Vector Exp](./vector_exp) - A simple element-wise exponential function, using the lookup table capabilities of the AI Engine.
+* [Matrix Scalar Add](./matrix_scalar_add) - Single tile performs `matrix + scalar` with matrix size of `16x8`.
 * [Matrix Multiplication](./matrix_multiplication) - This directory contains multiple designs spanning: single core and multi-core (whole array) matrix-matrix multiplication, and matrix-vector multiplication designs. It also contains sweep infrastructure for benchmarking.
diff --git a/programming_examples/ml/README.md b/programming_examples/ml/README.md
@@ -0,0 +1,23 @@
+<!---//===- README.md --------------------------*- Markdown -*-===//
+//
+// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+// Copyright (C) 2024, Advanced Micro Devices, Inc.
+// 
+//===----------------------------------------------------------------------===//-->
+
+# <ins>Machine Learning Examples</ins>
+
+| Design name | Data type | Description | 
+|-|-|-|
+| [Eltwise Add](../../programming_examples/ml/eltwise_add/) | bfloat16 | An element by element addition of two vectors | 
+| [Eltwise Mul](../../programming_examples/ml/eltwise_mul/) | i32 | An element by element multiplication of two vectors | 
+| [ReLU](../../programming_examples/ml/relu/) | bfloat16 | Rectified linear unit (ReLU) activation function on a vector |
+| [Softmax](../../programming_examples/ml/softmax/) | bfloat16 | Softmax operation on a matrix |
+| [Conv2D](../../programming_examples/ml/conv2d) | i8 | A single core 2D convolution for CNNs |
+| [Conv2D+ReLU](../../programming_examples/ml/conv2d_fused_relu) | i8 | A Conv2D with a ReLU fused at the vector register level |
+| [Bottleneck](../../programming_examples/ml/bottleneck/) | ui8 | A Bottleneck Residual Block is a variant of the residual block that uses three convolutions with 1x1, 3x3, and 1x1 filter sizes, respectively. The implementation fuses multiple kernels and applies dataflow optimizations, highlighting the unique architectural capabilities of AI Engines. |
+| [ResNet](../../programming_examples/ml/resnet/) | ui8 | ResNet with offloaded conv2_x layers. The design runs multiple bottleneck blocks depth-first across multiple NPU columns. |
+
diff --git a/programming_examples/ml/weight_expand/CMakeLists.txt b/programming_examples/ml/weight_expand/CMakeLists.txt
diff --git a/programming_examples/ml/weight_expand/Makefile b/programming_examples/ml/weight_expand/Makefile
diff --git a/programming_examples/ml/weight_expand/README.md b/programming_examples/ml/weight_expand/README.md
diff --git a/programming_examples/ml/weight_expand/aie2.py b/programming_examples/ml/weight_expand/aie2.py