ROCm · CongMa13 · Jan 30, 2024 · Dec 8, 2023 · Dec 8, 2023 · Dec 12, 2023
@@ -0,0 +1,2 @@
+GoogleTest
+rocm
@@ -9,10 +9,11 @@ through general purpose kernel languages, like HIP C++.
 * AMD CDNA class GPU featuring matrix core support:
 gfx908, gfx90a, gfx940, gfx941, gfx942 as 'gfx9'
 
-> Note: Double precision FP64 datatype support requires
-> gfx90a, gfx940, gfx941 or gfx942
+:::{note}
+Double precision FP64 datatype support requires gfx90a, gfx940, gfx941 or gfx942.
+:::
 
-## Minimum Software Requirements
+## Minimum software requirements
 
 * ROCm stack minimum version 5.7
 * ROCm-cmake minimum version 0.8.0 for ROCm 5.7
@@ -28,7 +29,7 @@ Optional:
 
 Run the steps below to build documentation locally.
 
-```shell
+```bash
 cd docs
 
 pip3 install -r sphinx/requirements.txt
@@ -38,15 +39,38 @@ python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
 
 ## Currently supported
 
-Operations - Contraction Tensor
-Data Types - FP32 , FP64
+### Operation: Contraction tensor
+
+Supported data-type combinations are:
+
+| typeA       | typeB       | typeC       | typeCompute       | notes                              |
+|-------------|-------------|-------------|-------------------|------------------------------------|
+| bf16        | bf16        | bf16        | f32               |                                    |
+| __half      | __half      | __half      | f32               |                                    |
+| f32         | f32         | f32         | bf16              |                                    |
+| f32         | f32         | f32         | __half            |                                    |
+| f32         | f32         | f32         | f32               |                                    |
+| f64         | f64         | f64         | f32               | f64 requires gfx90a or newer       |
+| f64         | f64         | f64         | f64               | f64 requires gfx90a or newer       |
+| cf32        | cf32        | cf32        | cf32              | cf32 requires gfx90a or newer      |
+| cf64        | cf64        | cf64        | cf64              | cf64 requires gfx90a or newer      |
+
+### Operation: Permutation tensor
+
+Supported data-type combinations are:
+
+| typeA     | typeB     | descCompute     | notes |
+|-----------|-----------|-----------------|-------|
+| f16       | f16       | f16             |       |
+| f16       | f16       | f32             |       |
+| f32       | f32       | f32             |       |
 
 ## Contributing to the code
 
 1. Create and track a hipTensor fork.
 2. Clone your fork:
 
-```shell
+```bash
 git clone -b develop https://github.com/<your_fork>/hipTensor.git .
 .githooks/install
 git checkout -b <new_branch>
@@ -69,24 +93,24 @@ git push origin <new_branch>
 
 ### Project options
 
-|Option|Description|Default Value|
-|---|---|---|
-|AMDGPU_TARGETS|Build code for specific GPU target(s)|gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx940;gfx941;gfx942|
-|HIPTENSOR_BUILD_TESTS|Build Tests|ON|
-|HIPTENSOR_BUILD_SAMPLES|Build Samples|ON|
+| Option                  | Description                           | Default Value                                                  |
+|-------------------------|---------------------------------------|----------------------------------------------------------------|
+| AMDGPU_TARGETS          | Build code for specific GPU target(s) | gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx940;gfx941;gfx942 |
+| HIPTENSOR_BUILD_TESTS   | Build Tests                           | ON                                                             |
+| HIPTENSOR_BUILD_SAMPLES | Build Samples                         | ON                                                             |
 
 ### Example configurations
 
 By default, the project is configured as Release mode.
 Here are some of the examples for the configuration:
-|Configuration|Command|
-|---|---|
-|Basic|`CC=hipcc CXX=hipcc cmake -B<build_dir> .`|
-|Targeting gfx908|`CC=hipcc CXX=hipcc cmake -B<build_dir> . -DAMDGPU_TARGETS=gfx908:xnack-` |
-|Debug build|`CC=hipcc CXX=hipcc cmake -B<build_dir> . -DCMAKE_BUILD_TYPE=Debug` |
-|Build without tests (default on)|`CC=hipcc CXX=hipcc cmake -B<build_dir> . -DHIPTENSOR_BUILD_TESTS=OFF` |
+| Configuration                    | Command                                                                   |
+|----------------------------------|---------------------------------------------------------------------------|
+| Basic                            | `CC=hipcc CXX=hipcc cmake -B<build_dir> .`                                |
+| Targeting gfx908                 | `CC=hipcc CXX=hipcc cmake -B<build_dir> . -DAMDGPU_TARGETS=gfx908:xnack-` |
+| Debug build                      | `CC=hipcc CXX=hipcc cmake -B<build_dir> . -DCMAKE_BUILD_TYPE=Debug`       |
+| Build without tests (default on) | `CC=hipcc CXX=hipcc cmake -B<build_dir> . -DHIPTENSOR_BUILD_TESTS=OFF`    |
 
-After configuration, build with `cmake --build <build_dir> -- -j<nproc>`
+After configuration, build with `cmake --build <build_dir> -- -j<nproc>`.
 
 ### Tips to reduce tests compile time
 
@@ -99,44 +123,70 @@ After configuration, build with `cmake --build <build_dir> -- -j<nproc>`
 
 Tests API implementation of logger verbosity and functionality.
 
-* `<build_dir>/bin/logger_test`
+```bash
+  <build_dir>/bin/logger_test
+```
 
-## Running Contraction Tests
+## Running contraction tests
 
-### Bilinear contraction tests
+* Bilinear contraction tests
 
 Tests the API implementation of bilinear contraction algorithm with validation.
 
-* `<build_dir>/bin/bilinear_contraction_f32_test`
-* `<build_dir>/bin/bilinear_contraction_f64_test`
+```bash
+  <build_dir>/bin/bilinear_contraction_test
+  <build_dir>/bin/complex_bilinear_contraction_test
+```
 
-### Scale contraction tests
+* Scale contraction tests
 
 Tests the API implementation of scale contraction algorithm with validation.
 
-* `<build_dir>/bin/scale_contraction_f32_test`
-* `<build_dir>/bin/scale_contraction_f64_test`
+```bash
+  <build_dir>/bin/scale_contraction_test
+  <build_dir>/bin/complex_scale_contraction_test
+```
+
+## Running permutation tests
 
-### Samples
+Tests the API implementation of the permutation algorithm with validation.
+
+```bash
+  <build_dir>/bin/permutation_test
+```
+
+## Samples
 
 These are stand-alone use-cases of the hipTensor contraction operations.
 
-## F32 Bilinear contraction
+### F32 bilinear contraction
 
 Demonstrates the API implementation of bilinear contraction operation without validation.
 
-* `<build_dir>/bin/simple_contraction_bilinear_f32`
+```bash
+  <build_dir>/bin/simple_bilinear_contraction_<typeA>_<typeB>_<typeC>_<typeD>_compute_<typeCompute>
+```
 
-## F32 Scale contraction
+### F32 scale contraction
 
 Demonstrates the API implementation of scale contraction operation without validation.
 
-* `<build_dir>/bin/simple_contraction_scale_f32`
+```bash
+  <build_dir>/bin/simple_scale_contraction_<typeA>_<typeB>_<typeD>_compute_<typeCompute>
+```
+
+### Permutation
+
+Demonstrates the API implementation of permutation operation without validation.
+
+```bash
+  <build_dir>/bin/simple_permutation
+```
 
-### Build Samples as external client
+### Build samples as external client
 
-Client application links to hipTensor library,
-and therefore hipTensor library needs to be installed before building client applications.
+The client application links to the hipTensor library; therefore, you must install the
+hipTensor library before building client applications.
 
 ## Build
 

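The README hunk above lists the test binaries one by one. As a convenience, the following is a minimal sketch of a wrapper that runs each of them in turn. The function name `run_hiptensor_tests` is hypothetical and not part of hipTensor; the binary names are the ones listed in the updated README, and it is an assumption that each binary returns a non-zero exit status on failure.

```bash
#!/usr/bin/env bash
# Sketch only: run the hipTensor test binaries named in the README.
# run_hiptensor_tests is a hypothetical helper, not shipped with hipTensor.

run_hiptensor_tests() {
  local build_dir="$1"
  local failed=0
  local name exe

  for name in logger_test \
              bilinear_contraction_test complex_bilinear_contraction_test \
              scale_contraction_test complex_scale_contraction_test \
              permutation_test; do
    exe="${build_dir}/bin/${name}"

    # Binaries are absent when HIPTENSOR_BUILD_TESTS=OFF or the build is partial.
    if [ ! -x "$exe" ]; then
      echo "SKIP ${name}"
      continue
    fi

    # Assumes a non-zero exit status means the test failed.
    if "$exe"; then
      echo "PASS ${name}"
    else
      echo "FAIL ${name}"
      failed=1
    fi
  done

  # Non-zero return if any binary that ran reported a failure.
  return "$failed"
}
```

After sourcing the snippet, call it as `run_hiptensor_tests <build_dir>`. Missing binaries are reported as skipped rather than treated as failures, so the wrapper also works with a partial build.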
@@ -24,7 +24,7 @@ The hipTensor repository follows a workflow which dictates a /master branch wher
 -  ensure code builds successfully.
 -  do not break existing test cases
 -  new functionality will only be merged with new unit tests
--  new unit tests should integrate within the existing googletest framework.
+-  new unit tests should integrate within the existing GoogleTest framework.
 -  tests must have good code coverage
 -  code must also have benchmark tests, and performance must approach
    the compute bound limit or memory bound limit.

@@ -59,7 +59,7 @@ For Centos use
     yum info rocm-libs
 
 The ROCm version has major, minor, and patch fields, possibly followed by a build specific identifier. For example the ROCm version could be 4.0.0.40000-23, this corresponds to major = 4, minor = 0, patch = 0, build identifier 40000-23.
-There are GitHub branches at the hipTensor site with names rocm-major.minor.x where major and minor are the same as in the ROCm version. For ROCm version 4.0.0.40000-23 you need to use the following to download hipTensor:
+The hipTensor GitHub site has branches named `rocm-major.minor.x`, where major and minor match the ROCm version. For ROCm version 4.0.0.40000-23, use the following to download hipTensor:
 
 ::