Initial commit of package based CI and regression testing. (#14793)
This patch lays a lot of new track:

* New CI jobs that are rooted in building deployable packages and having
those be used by subsequent jobs (for testing).
* New variant of the building scripts that will eventually subsume the
old ones once we rebase package building on top of this workflow (as is
done for downstreams).
* A new `regression_test` project (currently under `experimental/`) that
has the seeds for being able to handle "big" tests and a wider variety
of combinations than we presently have in other places like `e2e`.
* An initial set of tests targeting both CPU and AMD/Vulkan for llama2
7b f16/i4 to commemorate the current work the team has been doing. The
tests are not yet super inspired, just verifying that it compiles and
does in fact run, but I will expand them in a followup once the CI can
guide me.
* A regression testing job for CPU. Will add one for AMD GPU shortly
once I finish setting up the runner.

The regression_test project should be suitable for the development
workflow too, but it needs a few more turns and some mileage on it. Consider
this a WIP that I'll be holding carefully for some days to get it ready
for general use.
stellaraccident authored Aug 25, 2023
1 parent b24a6e2 commit b76b6df
Showing 19 changed files with 1,306 additions and 0 deletions.
39 changes: 39 additions & 0 deletions .github/workflows/pkgci.yml
@@ -0,0 +1,39 @@
# Copyright 2023 The IREE Authors
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

name: PkgCI

on:
  workflow_dispatch:
  pull_request:
  push:
    branches:
      - main

concurrency:
  # A PR number if a pull request and otherwise the commit hash. This cancels
  # queued and in-progress runs for the same PR (presubmit) or commit
  # (postsubmit). The workflow name is prepended to avoid conflicts between
  # different workflows.
  group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
  cancel-in-progress: true

jobs:
  build_packages:
    name: Build Packages
    uses: ./.github/workflows/pkgci_build_packages.yml
    with:
      package_version: 0.dev1

  regression_test_cpu:
    name: Regression Test CPU
    uses: ./.github/workflows/pkgci_regression_test_cpu.yml
    needs: [build_packages]

  regression_test_amdgpu:
    name: Regression Test AMDGPU
    uses: ./.github/workflows/pkgci_regression_test_amdgpu.yml
    needs: [build_packages]
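
The `concurrency.group` expression in the workflow above, and the `runner-group` label used by the package-build and CPU test workflows, both rely on the value-returning `&&` / `||` operators of GitHub Actions expressions. As a rough illustration only, the two idioms can be modeled in Python, whose `and` / `or` behave similarly for these inputs. The function names and sample values are hypothetical and not part of this commit.

```python
from typing import Optional


def concurrency_group(workflow: str, pr_number: Optional[int], sha: str) -> str:
    """Model of `${{ github.workflow }}-${{ github.event.number || github.sha }}`.

    For pull requests the PR number is set, so all runs for one PR share a
    group. For pushes there is no PR number, so the commit hash is used.
    """
    return f"{workflow}-{pr_number or sha}"


def runner_group(event_name: str) -> str:
    """Model of `github.event_name == 'pull_request' && 'presubmit' || 'postsubmit'`.

    This works as a ternary because 'presubmit' is a non-empty, truthy string.
    """
    return (event_name == "pull_request" and "presubmit") or "postsubmit"
```

With `cancel-in-progress: true`, a new run that lands in the same group cancels the older one, so in this model a second push to the same PR would replace the run already in progress for it.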
145 changes: 145 additions & 0 deletions .github/workflows/pkgci_build_packages.yml
@@ -0,0 +1,145 @@
# Copyright 2023 The IREE Authors
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

name: PkgCI Build Packages
on:
  workflow_call:
    inputs:
      package_version:
        type: string
        required: true

jobs:
  linux_x86_64_release_packages:
    name: Linux Release (x86_64)
    runs-on:
      - self-hosted # must come first
      - runner-group=${{ github.event_name == 'pull_request' && 'presubmit' || 'postsubmit' }}
      - environment=prod
      - cpu
      - os-family=Linux
    strategy:
      fail-fast: false
    env:
      CACHE_DIR: ${{ github.workspace }}/.iree-container-cache
      MANYLINUX_DOCKER_IMAGE: ghcr.io/nod-ai/manylinux_x86_64:main
      PACKAGE_SUFFIX: ""
    steps:
      - name: Prefetch Docker
        run: |
          docker pull "$MANYLINUX_DOCKER_IMAGE" &
      - name: Checking out repository
        uses: actions/checkout@8f4b7f84864484a7bf31766abe9204da3cbe65b3 # v3.5.0
        with:
          submodules: true
      - name: Write version info
        shell: bash
        run: |
          cat << EOF > version_info.json
          {
            "package-suffix": "${PACKAGE_SUFFIX}",
            "package-version": "${{ inputs.package_version }}",
            "iree-revision": "$(cd ../iree && git rev-parse HEAD)"
          }
          EOF
          realpath version_info.json
          cat version_info.json
      - name: Enable cache
        uses: actions/cache@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
        with:
          path: ${{ env.CACHE_DIR }}
          key: iree-pkgci-linux-release-x86_64-v1-${{ github.sha }}
          restore-keys: |
            iree-pkgci-linux-release-x86_64-v1-
      - name: Wait for docker pull
        run: |
          wait
      - name: Build
        run: |
          export cache_dir="${{ env.CACHE_DIR }}"
          export output_dir="${{ github.workspace }}/wheelhouse"
          export toolchain_suffix=release
          export manylinux_docker_image="$MANYLINUX_DOCKER_IMAGE"
          export package_suffix="$PACKAGE_SUFFIX"
          # If just iterating locally, uncomment this to build a cheap wheel.
          # export packages="iree-runtime"
          ./build_tools/pkgci/build_linux_packages.sh
          # Some things put stuff in cache with weird, root read-only
          # permissions. Take them back.
          sudo chown -R "$(whoami)" "${cache_dir}"
      - name: Upload wheel artifacts
        uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce # v3.1.2
        with:
          name: linux_x86_64_release_packages
          path: |
            ${{ github.workspace }}/wheelhouse/iree*.whl
          if-no-files-found: error

  # TODO: Debug low ccache hit rate and re-enable.
  # linux_x86_64_release_asserts_packages:
  #   name: Linux Release Asserts (x86_64)
  #   runs-on:
  #     - self-hosted # must come first
  #     - runner-group=${{ github.event_name == 'pull_request' && 'presubmit' || 'postsubmit' }}
  #     - environment=prod
  #     - cpu
  #     - os-family=Linux
  #   strategy:
  #     fail-fast: false
  #   env:
  #     CACHE_DIR: ${{ github.workspace }}/.iree-container-cache
  #     MANYLINUX_DOCKER_IMAGE: ghcr.io/nod-ai/manylinux_x86_64:main
  #     PACKAGE_SUFFIX: "-asserts"
  #   steps:
  #     - name: Prefetch Docker
  #       run: |
  #         docker pull "$MANYLINUX_DOCKER_IMAGE" &
  #     - name: "Checking out repository"
  #       uses: actions/checkout@8f4b7f84864484a7bf31766abe9204da3cbe65b3 # v3.5.0
  #       with:
  #         submodules: true
  #     - name: Write version info
  #       shell: bash
  #       run: |
  #         cat << EOF > version_info.json
  #         {
  #           "package-suffix": "${PACKAGE_SUFFIX}",
  #           "package-version": "${{ inputs.package_version }}",
  #           "iree-revision": "$(cd ../iree && git rev-parse HEAD)"
  #         }
  #         EOF
  #         realpath version_info.json
  #         cat version_info.json
  #     - name: Enable cache
  #       uses: actions/cache@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
  #       with:
  #         path: ${{ env.CACHE_DIR }}
  #         key: iree-pkgci-linux-release-asserts-x86_64-v1-${{ github.sha }}
  #         restore-keys: |
  #           iree-pkgci-linux-release-asserts-x86_64-v1-
  #     - name: Wait for docker pull
  #       run: |
  #         wait
  #     - name: Build
  #       run: |
  #         export cache_dir="${{ env.CACHE_DIR }}"
  #         export output_dir="${{ github.workspace }}/wheelhouse"
  #         export toolchain_suffix=release_asserts
  #         export manylinux_docker_image="$MANYLINUX_DOCKER_IMAGE"
  #         export package_suffix="$PACKAGE_SUFFIX"
  #         # If just iterating locally, uncomment this to build a cheap wheel.
  #         # export packages="iree-runtime"
  #         ./build_tools/pkgci/build_linux_packages.sh
  #         # Some things put stuff in cache with weird, root read-only
  #         # permissions. Take them back.
  #         sudo chown -R "$(whoami)" "${cache_dir}"
  #     - name: Upload wheel artifacts
  #       uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce # v3.1.2
  #       with:
  #         name: linux_x86_64_release_asserts_packages
  #         path: |
  #           ${{ github.workspace }}/wheelhouse/iree*.whl
  #         if-no-files-found: error
51 changes: 51 additions & 0 deletions .github/workflows/pkgci_regression_test_amdgpu.yml
@@ -0,0 +1,51 @@
# Copyright 2023 The IREE Authors
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

name: PkgCI Regression Test (AMDGPU)
on:
  workflow_call:
    inputs:
      artifact_run_id:
        type: string
        default: ""
  workflow_dispatch:
    inputs:
      artifact_run_id:
        type: string
        default: ""

jobs:
  linux_x86_64:
    name: Linux (x86_64)
    runs-on: nodai-amdgpu-x86_64
    env:
      PACKAGE_DOWNLOAD_DIR: ${{ github.workspace }}/.packages
      IREERS_ARTIFACT_DIR: ${{ github.workspace }}/artifacts
      VENV_DIR: ${{ github.workspace }}/venv
    steps:
      - name: Checking out repository
        uses: actions/checkout@8f4b7f84864484a7bf31766abe9204da3cbe65b3 # v3.5.0
        with:
          submodules: false
      - uses: actions/setup-python@61a6322f88396a6271a6ee3565807d608ecaddd1 # v4.7.0
        with:
          # Must match the subset of versions built in pkgci_build_packages.
          python-version: '3.11'
      - uses: actions/download-artifact@9bc31d5ccc31df68ecc42ccf4149144866c47d8a # v3.0.2
        with:
          name: linux_x86_64_release_packages
          path: ${{ env.PACKAGE_DOWNLOAD_DIR }}
      - name: Setup venv
        run: |
          ./build_tools/pkgci/setup_venv.py $VENV_DIR \
            --artifact-path=${PACKAGE_DOWNLOAD_DIR} \
            --fetch-gh-workflow=${{ inputs.artifact_run_id }}
      - name: Run Tests
        run: |
          source $VENV_DIR/bin/activate
          pytest \
            -s -m "plat_rdna3_vulkan and presubmit" \
            experimental/regression_suite
56 changes: 56 additions & 0 deletions .github/workflows/pkgci_regression_test_cpu.yml
@@ -0,0 +1,56 @@
# Copyright 2023 The IREE Authors
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

name: PkgCI Regression Test (CPU)
on:
  workflow_call:
    inputs:
      artifact_run_id:
        type: string
        default: ""
  workflow_dispatch:
    inputs:
      artifact_run_id:
        type: string
        default: ""

jobs:
  linux_x86_64:
    name: Linux (x86_64)
    runs-on:
      - self-hosted # must come first
      - runner-group=${{ github.event_name == 'pull_request' && 'presubmit' || 'postsubmit' }}
      - environment=prod
      - cpu
      - os-family=Linux
    env:
      PACKAGE_DOWNLOAD_DIR: ${{ github.workspace }}/.packages
      IREERS_ARTIFACT_DIR: ${{ github.workspace }}/artifacts
      VENV_DIR: ${{ github.workspace }}/venv
    steps:
      - name: Checking out repository
        uses: actions/checkout@8f4b7f84864484a7bf31766abe9204da3cbe65b3 # v3.5.0
        with:
          submodules: false
      - uses: actions/setup-python@61a6322f88396a6271a6ee3565807d608ecaddd1 # v4.7.0
        with:
          # Must match the subset of versions built in pkgci_build_packages.
          python-version: '3.11'
      - uses: actions/download-artifact@9bc31d5ccc31df68ecc42ccf4149144866c47d8a # v3.0.2
        with:
          name: linux_x86_64_release_packages
          path: ${{ env.PACKAGE_DOWNLOAD_DIR }}
      - name: Setup venv
        run: |
          ./build_tools/pkgci/setup_venv.py $VENV_DIR \
            --artifact-path=${PACKAGE_DOWNLOAD_DIR} \
            --fetch-gh-workflow=${{ inputs.artifact_run_id }}
      - name: Run Tests
        run: |
          source $VENV_DIR/bin/activate
          pytest \
            -s -m "plat_host_cpu and presubmit" \
            experimental/regression_suite
12 changes: 12 additions & 0 deletions build_tools/pkgci/README.md
@@ -0,0 +1,12 @@
# PkgCI Scripts

This directory contains scripts and configuration for the "new" CI, which
is based on building packages and then flowing those to follow-on jobs.

The traditional CI attempted to do all steps as various kinds of source
builds at head, rather than as a split package/test style of workflow. It can
mostly be found in the `cmake` directory but is also scattered around.

This directory generally corresponds to "pkgci_" prefixed workflows. Over
time, as this CI flow takes over more of the CI pipeline, the traditional
CI will be reduced to outlier jobs and policy checks.
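
To make the package/test split concrete, here is a minimal sketch of what a follow-on job conceptually does with the wheels produced by the build job: locate the `iree*.whl` files in a download directory and install them into a fresh virtual environment. This is not the contents of `setup_venv.py`. The function names are hypothetical, and the sketch assumes a POSIX venv layout.

```python
import subprocess
import venv
from pathlib import Path
from typing import List


def find_wheels(artifact_dir: Path) -> List[Path]:
    """Return the iree*.whl files in artifact_dir, sorted by file name.

    The glob mirrors the pattern used when the build job uploads wheels.
    """
    return sorted(artifact_dir.glob("iree*.whl"))


def install_into_venv(venv_dir: Path, wheels: List[Path]) -> None:
    """Create a venv at venv_dir and pip-install the given wheels into it."""
    venv.EnvBuilder(with_pip=True).create(venv_dir)
    # Assumption: POSIX layout; on Windows the interpreter is under Scripts/.
    python = venv_dir / "bin" / "python"
    subprocess.check_call(
        [str(python), "-m", "pip", "install", *[str(w) for w in wheels]]
    )
```

A test job would then activate the venv and run its tests against the installed packages instead of against a source build.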