RMSNorm Blocked Implementation #638

Open
wants to merge 25 commits into
base: main_perf
Commits
2d2dbe1
Add Perf Kernels
micmelesse May 10, 2024
17575ea
skip backward (#586)
micmelesse May 13, 2024
a3d784a
Change all block pointers to tensor pointers (#585)
vgokhale May 16, 2024
aa6685a
Add support for bshd layout (#587)
vgokhale May 20, 2024
dbe1173
Post-Merge CI (#612)
micmelesse Jul 16, 2024
23ba546
Increase CI timeout (#615)
vgokhale Jul 18, 2024
df4c4d3
Couple of FA optimizations (#608)
vgokhale Jul 19, 2024
52a908f
streamk v0.1 (#619)
xiaohuguo2023 Jul 31, 2024
1d2e066
Add explicit multiply-reduce GEMM kernel (#621)
brunomazzottiamd Aug 6, 2024
11e4447
Copy *tune_gemm* from `triton-mlir` branch to `main_perf` branch (#614)
brunomazzottiamd Aug 13, 2024
624335f
Clean up *tune_gemm* script from `main_perf` branch (#629)
brunomazzottiamd Aug 16, 2024
15cb3a8
[tune gemm v3.4] Add xcd-based pid remapping and change back to rocpr…
zhanglx13 Aug 19, 2024
177d0bd
add barrier to fix racing for spinning locks (#632)
xiaohuguo2023 Aug 19, 2024
e42690d
Softmax kernel
Aug 8, 2024
6d283a2
Merge pull request #634 from ROCm/main_perf-softmax
rahulbatra85 Sep 6, 2024
3704738
Move utility tools from triton-mlir to main_perf branch (#635)
zhanglx13 Sep 6, 2024
f80aed7
Add rmsnorm kernel
Aug 27, 2024
9da4278
Merge branch 'main_perf' into main_perf-rmsnorm
rahulbatra85 Sep 6, 2024
c4bd738
Merge pull request #633 from ROCm/main_perf-rmsnorm
rahulbatra85 Sep 7, 2024
a782caf
Online softmax implementation
Sep 13, 2024
96b3d37
Merge pull request #639 from ROCm/softmax_updates
rahulbatra85 Sep 16, 2024
042aa91
Add Layernorm kernel
Sep 13, 2024
ccb3538
Add use mask
Sep 23, 2024
e13fc4c
Merge pull request #641 from ROCm/main_perf-layernorm
rahulbatra85 Sep 24, 2024
44e9360
RMSNorm Blocked Implementation
Sep 12, 2024
137 changes: 137 additions & 0 deletions .github/workflows/amd_perf_kernel_Integration_tests.yml
@@ -0,0 +1,137 @@
name: AMD Perf Kernel Integration Tests

on:
  workflow_dispatch:
  pull_request:
    branches: [main_perf]
  merge_group:
    branches: [main_perf]
    types: [checks_requested]

concurrency:
  group: ${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main_perf' }}

permissions: read-all

env:
  TRITON_BUILD_WITH_CLANG_LLD: "TRUE"
  TRITON_USE_ASSERT_ENABLED_LLVM: "TRUE"
  TRITON_DISABLE_LINE_INFO: 1

jobs:
  Check-File-Changes:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Check file changes
        run: |
          git fetch origin ${{ github.base_ref }}
          changed_files=$(git diff --name-only origin/${{ github.base_ref }} ${{ github.sha }})
          echo "Changed files:"
          echo "$changed_files"
          if echo "$changed_files" | grep -vE "^python/perf-kernels/|^\.github/workflows/amd_"; then
            echo "Changes detected outside of the python/perf-kernels directory or .github/workflows/amd_ files. Failing the workflow."
            exit 1
          fi

  Runner-Preparation-AMD:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    outputs:
      matrix-HIP: ${{ steps.set-matrix.outputs.matrix-HIP }}
    steps:
      - name: Prepare runner matrix
        id: set-matrix
        run: |
          if [ x"${{ github.repository }}" == x"ROCm/triton" ]; then
            echo '::set-output name=matrix-HIP::[["self-hosted", "rocm.gfx90a"]]'
          else
            echo '::set-output name=matrix-HIP::[["ubuntu-latest"]]'
          fi

  pre-commit:
    name: pre-commit (code formatting)
    needs: Runner-Preparation-AMD
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - name: Compute hash of pre-commit config
        id: cache-key
        run: |
          echo "pre_commit_hash=$(sha256sum .pre-commit-config.yaml)" >> $GITHUB_OUTPUT
        shell: bash
      - name: Cache pre-commit's cache dir
        uses: actions/cache@v4
        with:
          # Note that we cannot use environment variables here given there is
          # no shell to interpret them in the paths.
          path: |
            ~/.cache/pre-commit
          key: ${{ runner.os }}-${{ steps.cache-key.outputs.pre_commit_hash }}
      - name: Check pre-commit
        run: |
          python3 -m pip install --upgrade pre-commit
          # TODO: ignore the first yapf failure until https://github.com/google/yapf/issues/1164 is fixed
          python3 -m pre_commit run --all-files --verbose yapf &> /dev/null || true
          # If first run of yapf worked and made changes reset the tree to the original state
          git reset --hard
          python3 -m pre_commit run --all-files --verbose
      - name: Print diff of changes if pre-commit failed
        if: failure()
        run: |
          git diff

  Integration-Tests-AMD:
    needs: Runner-Preparation-AMD
    if: needs.Runner-Preparation-AMD.outputs.matrix-HIP != ''
    runs-on: ${{ matrix.runner }}
    timeout-minutes: 90
    strategy:
      matrix:
        runner: ${{fromJson(needs.Runner-Preparation-AMD.outputs.matrix-HIP)}}
    container:
      image: rocm/pytorch:rocm6.1_ubuntu22.04_py3.10_pytorch_2.4
      options: --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --user root
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Clear cache
        run: |
          rm -rf ~/.triton
          mkdir -p ~/.triton
          ls -alh ~/.triton
      - name: Update PATH
        run: |
          echo "/opt/rocm/llvm/bin" >> $GITHUB_PATH
      - name: Install pip dependencies
        run: |
          python3 -m pip install --upgrade pip
          python3 -m pip install lit matplotlib pandas
      - name: Install Triton
        run: |
          echo "PATH is '$PATH'"
          pip uninstall -y triton
          cd python
          pip install -v -e .
      - name: Run Perf Kernels Unit Tests
        run: |
          pytest -vvv ./python/perf-kernels/flash-attention.py
          pytest -vvvv ./python/perf-kernels/softmax.py
          pytest -vvv ./python/perf-kernels/rmsnorm.py
          pytest -vvv ./python/perf-kernels/layernorm.py
      - name: Run Perf Kernels Benchmark
        run: |
          python ./python/perf-kernels/flash-attention.py
          python ./python/perf-kernels/softmax.py
          python ./python/perf-kernels/rmsnorm.py
          python ./python/perf-kernels/layernorm.py
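The `Check-File-Changes` job in the workflow above rejects a pull request when any changed path falls outside `python/perf-kernels/` or `.github/workflows/amd_*`. As a rough illustration of that filter only, here is a minimal Python sketch that reuses the same regular expression as the `grep -vE` call. The function name `files_outside_allowed_paths` and the sample path `python/triton/runtime/jit.py` are hypothetical and are not part of this PR.

```python
import re

# Same pattern as the workflow's `grep -vE` expression.
ALLOWED = re.compile(r"^python/perf-kernels/|^\.github/workflows/amd_")


def files_outside_allowed_paths(changed_files):
    """Return the changed paths that do not match the allowed prefixes."""
    return [path for path in changed_files if not ALLOWED.search(path)]


changed = [
    "python/perf-kernels/rmsnorm.py",
    ".github/workflows/amd_perf_kernel_Integration_tests.yml",
    "python/triton/runtime/jit.py",  # hypothetical path outside the allowed set
]

rejected = files_outside_allowed_paths(changed)
if rejected:
    # The workflow would print its message and exit 1 at this point.
    print("Changes detected outside the allowed paths:", rejected)
```

In the workflow, `grep -v` exits with status 0 when at least one line fails to match the pattern, which is why the `if` branch is the failing one. The sketch models that as a non-empty `rejected` list.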
92 changes: 92 additions & 0 deletions .github/workflows/amd_perf_kernel_postmerge_tests.yml
@@ -0,0 +1,92 @@
name: AMD Perf Kernel Post-Merge Tests

on:
  workflow_dispatch:
  push:
    branches: [main_perf, micmelesse/post_merge_ci]

concurrency:
  group: ${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main_perf' }}

permissions: read-all

env:
  TRITON_BUILD_WITH_CLANG_LLD: "TRUE"
  TRITON_USE_ASSERT_ENABLED_LLVM: "TRUE"
  TRITON_DISABLE_LINE_INFO: 1

jobs:
  Runner-Preparation-AMD:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    outputs:
      matrix-HIP: ${{ steps.set-matrix.outputs.matrix-HIP }}
    steps:
      - name: Prepare runner matrix
        id: set-matrix
        run: |
          if [ x"${{ github.repository }}" == x"ROCm/triton" ]; then
            echo '::set-output name=matrix-HIP::[["self-hosted", "rocm.gfx90a"]]'
          else
            echo '::set-output name=matrix-HIP::[["ubuntu-latest"]]'
          fi

  PostMerge-Tests-AMD:
    needs: Runner-Preparation-AMD
    if: needs.Runner-Preparation-AMD.outputs.matrix-HIP != ''
    runs-on: ${{ matrix.runner }}
    timeout-minutes: 90
    strategy:
      matrix:
        runner: ${{fromJson(needs.Runner-Preparation-AMD.outputs.matrix-HIP)}}
    container:
      image: rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2
      options: --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --user root
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Ensure the entire history is fetched for rebase
      - name: Add upstream remote
        run: |
          git config --global --add safe.directory /__w/triton/triton
          if [ $(git remote | grep -c upstream) -eq 0 ]; then
            git remote add upstream https://github.com/triton-lang/triton.git
          fi
          git fetch upstream
      - name: Rebase onto upstream/main
        run: |
          git config --global user.email "ci@amd.com"
          git config --global user.name "Github Actions Post-Merge CI Script"
          git rebase upstream/main || { echo "Rebase failed"; exit 1; }
      - name: Show Git Log
        run: |
          echo "Git log after rebase from upstream/main to HEAD:"
          git log $(git rev-parse upstream/main~2)..HEAD --oneline --graph --decorate
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Clear cache
        run: |
          rm -rf ~/.triton
          mkdir -p ~/.triton
          ls -alh ~/.triton
      - name: Update PATH
        run: |
          echo "/opt/rocm/llvm/bin" >> $GITHUB_PATH
      - name: Install pip dependencies
        run: |
          python3 -m pip install --upgrade pip
          python3 -m pip install lit matplotlib pandas
      - name: Install Triton
        run: |
          echo "PATH is '$PATH'"
          pip uninstall -y triton
          cd python
          pip install -v -e .
      - name: Run Perf Kernels Unit Tests
        run: |
          pytest -vvv ./python/perf-kernels/flash-attention.py
      - name: Run Perf Kernels Benchmark
        run: |
          python ./python/perf-kernels/flash-attention.py
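Both workflows pick their runner in the `Runner-Preparation-AMD` job: it emits a JSON string as the `matrix-HIP` output, and the test job decodes it with `fromJson` to fill `matrix.runner`. The following Python sketch mirrors that selection purely for illustration. The helper name `runner_matrix` and the fork name `someone/triton` are hypothetical, and `json.loads` stands in for the `fromJson` expression.

```python
import json


def runner_matrix(repository):
    """Return the JSON string the job would emit as its matrix-HIP output."""
    if repository == "ROCm/triton":
        # Upstream repository: self-hosted runner labelled for gfx90a.
        return json.dumps([["self-hosted", "rocm.gfx90a"]])
    # Any other repository (for example a fork): GitHub-hosted runner.
    return json.dumps([["ubuntu-latest"]])


# Stand-in for fromJson(needs.Runner-Preparation-AMD.outputs.matrix-HIP).
matrix = json.loads(runner_matrix("ROCm/triton"))
fork_matrix = json.loads(runner_matrix("someone/triton"))

for labels in matrix:
    # Each matrix entry is a list of runner labels passed to `runs-on`.
    print("runs-on:", labels)
```

Each entry of the decoded list is itself a list of labels, so a single matrix entry can require several labels on the same runner, as in the `["self-hosted", "rocm.gfx90a"]` case.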