Commit

Merge branch 'dev' into python3.12
roytman committed Oct 4, 2024
2 parents 95115ac + afc4150 commit f813f5d
Showing 229 changed files with 12,532 additions and 2,501 deletions.
34 changes: 30 additions & 4 deletions .github/workflows/Makefile
@@ -7,20 +7,46 @@ CODE_TRANSFORMS=code2parquet code_quality header_cleanser malware proglang_selec
LANG_TRANSFORMS=doc_chunk doc_quality lang_id pdf2parquet pii_redactor text_encoder


# A list that holds transforms that should not be tested with KFP

transform-tests:
	$(MAKE) TRANSFORM_SUBDIR=universal .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=language .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=code .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=universal .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=universal .transform-kfp-tests
	$(MAKE) TRANSFORM_SUBDIR=language .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=language .transform-kfp-tests
	$(MAKE) TRANSFORM_SUBDIR=code .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=code .transform-kfp-tests

# Expects
# TRANSFORM_SUBDIR transforms subdirectory (such as universal)
.transform-tests:
	@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -depth 1 -type d); do \
	@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -mindepth 1 -maxdepth 1 -type d); do \
		dir=$$(basename $$i); \
		yml=test-$(TRANSFORM_SUBDIR)-$$dir.yml; \
		echo Generating $$yml; \
		cat test-transform.template | sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$${TRANSFORM_SUBDIR}/$$dir?g" > $$yml; \
	done

.transform-kfp-tests:
	@KFP_BLACK_LIST=$$(cd ../..; bash scripts/check-workflows.sh -show-kfp-black-list); \
	for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -mindepth 1 -maxdepth 1 -type d); do \
		dir=$$(basename $$i); \
		yml=test-$(TRANSFORM_SUBDIR)-$$dir-kfp.yml; \
		if [ ! -d ../../transforms/$(TRANSFORM_SUBDIR)/$$dir/kfp_ray ]; then \
			echo No kfp_ray directory for $$dir. Skipping generation of $$yml; \
			continue; \
		fi; \
		z=$$(echo $${KFP_BLACK_LIST} | grep $$dir); \
		if [ ! -z "$$z" ]; then \
			echo $$dir is black listed. Skipping generation of $$yml; \
			continue; \
		fi; \
		echo Generating $$yml; \
		cat test-kfp-transform.template | sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$${TRANSFORM_SUBDIR}/$$dir?g" > $$yml; \
	done
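
The `.transform-kfp-tests` recipe above can be approximated outside of `make` with the following shell sketch. It is an illustration under stated assumptions, not the repository's implementation: the function name `generate_kfp_workflow`, its arguments, and the space-separated skip-list format are hypothetical, and it assumes it is run from the repository root so that the transform path can be used both for the directory check and for the substitution. Unlike the recipe's `grep $$dir`, which matches substrings, the sketch compares whole names, so a directory whose name happens to be part of a longer skip-list entry is not skipped by accident.

```shell
#!/usr/bin/env bash
# Sketch only: generate_kfp_workflow and its arguments are hypothetical names.
# Usage: generate_kfp_workflow <template> <transform_dir> <output_yml> <skip_list>
generate_kfp_workflow() {
    local template=$1 transform_dir=$2 out=$3 skip_list=$4
    local name entry
    name=$(basename "$transform_dir")

    # A transform without a kfp_ray directory has nothing to test with KFP.
    if [ ! -d "$transform_dir/kfp_ray" ]; then
        echo "No kfp_ray directory for $name. Skipping generation of $out"
        return 0
    fi

    # Whole-name comparison against a space-separated skip list (assumed format).
    for entry in $skip_list; do
        if [ "$entry" = "$name" ]; then
            echo "$name is on the skip list. Skipping generation of $out"
            return 0
        fi
    done

    echo "Generating $out"
    sed -e "s?@TARGET_TRANSFORM_DIR@?$transform_dir?g" "$template" > "$out"

    # Flag any placeholder text that the substitution did not consume.
    if grep -q '@TARGET_TRANSFORM_DIR' "$out"; then
        echo "Warning: unreplaced placeholder left in $out" >&2
    fi
}
```

The final `grep` is a cheap guard: the `sed` expression only replaces the placeholder when both `@` delimiters are present, so a template line that omits the trailing `@` would otherwise pass through unchanged.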






37 changes: 17 additions & 20 deletions .github/workflows/README.md
@@ -1,20 +1,22 @@
# Workflow Management

Here we have the start of a system to automatically generated github workflows (currently only for transforms).
Here we have the start of a system to automatically generate GitHub workflows.
In general, the design is to use templates and `make` to generate/update the workflows.

#### Goals
1. Run only tests for a given transform when only the transform changes.
Includes python, ray, spark and kfp_ray as available.
2. When the core dpk lib components files changes, test all transforms
3. When the shared kfp components changes, test a randomly selected transform test
(We would like to avoid running all transform kfp tests in one PR)
4. Extra credit: If .md or other non-code changes are made, run no tests.
3. When the shared kfp components or the core dpk lib component files change,
run the kfp test of a randomly selected transform. Otherwise, run the kfp tests for the changed transforms.

#### Assumptions
1. All transforms will have test workflows. A transform can disable its tests locally
(temporarily?) by renaming its Makefile. For example,
`cp transforms/universal/noop/Makefile transforms/universal/noop/Makefile.disabled`.
A GitHub action for kfp testing will not be generated for a transform that appears in `KFP_BLACK_LIST`
in the [Makefile](./Makefile).


## DPK libraries (`data-processing-lib` directory)
The DPK libraries, in data-processing-lib/{python,ray,spark}, are tested
@@ -26,18 +28,18 @@ The transforms test workflows also depend on this directory tree and so
changes made here will trigger transform tests.

## Transforms (`transforms` directory tree)
We define a unique test workflow for each transform, based on a common
template [test-transform.template](test-transform.template).
The [Makefile](Makefile) is used to (re)generate all workflows a necessary.
By design, workflows for a given transform should run when
We define two test workflows for each transform: one is based on a common
template [test-transform.template](test-transform.template) and the other, for kfp testing,
is based on a common template [test-kfp-transform.template](test-kfp-transform.template).
The [Makefile](Makefile) is used to (re)generate all workflows as necessary.
By design, non-kfp workflows for a given transform should run when

* anything of substance affecting operation is modified in the transform's directory tree.
* anything in the core libraries in this repo (e.g., data-processing-lib) is modified, assuming the transform depends on these.

Note that the kfp tests (in kfp_ray/Makefile workflow-test) for a given transform are
**not** currently being run when the transform's tests are run.
Currently these are run randomly via the [test-kfp.yml](test-kfp.yml).
We expect to fix this is in the future.
The generated kfp workflows should run when anything of substance affecting operation is modified in the transform's directory tree
and neither the core libraries in this repo nor the kfp components were changed.
Otherwise, a randomly chosen transform will undergo KFP testing, triggered by the [test-kfp.yml](test-kfp.yml) workflow.

When a new transform is added to the repository,

@@ -58,16 +60,11 @@ git push --set-upstream origin new-branch

Like DPK core libs, kfp tests are defined in
[test-kfp.yml](test-kfp.yml) and run whenever changes are made in
the `kfp` directory tree. Tests currently include

1. test kfp on randomly selected transform.

Eventually we would like to enable the transform-specific kfp test
when only the transform code is modified or maybe when only
the `kfp_ray` directory contents is modified.
the `kfp` directory tree as well as in the DPK core libs. Tests currently include
a kfp test on a randomly selected transform.

## Miscellaneous
[test-misc.yml](test-misc.yml) defines some repo consistency tests including

1. Make sure `set-versions` make target can be run recursively throughout the repo
2. Makes sure there is a test workflow for each transform in the repo.
2. Makes sure there is a test workflow for each transform in the repo.
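
The KFP testing policy described in the README goals can be summarised in a small shell sketch. This is only an illustration of the decision rule; in the repository the rule is expressed through the `paths` filters of the generated workflows and of `test-kfp.yml`, not through a script. The helper name `plan_kfp_tests` and the idea of passing changed paths as arguments are assumptions made for the example.

```shell
# Sketch only: plan_kfp_tests is a hypothetical helper, not part of the repo.
# Prints "random" when shared code changed, otherwise one line per changed transform.
plan_kfp_tests() {
    local path transforms=""
    for path in "$@"; do
        case "$path" in
            *.md)
                # Documentation-only changes trigger nothing.
                ;;
            kfp/*|data-processing-lib/*)
                # Shared kfp components or core libs changed: test one random transform.
                echo "random"
                return 0
                ;;
            transforms/*/*/*)
                # Keep the first three components: transforms/<category>/<name>
                transforms="$transforms$(echo "$path" | cut -d/ -f1-3)
"
                ;;
        esac
    done
    printf '%s' "$transforms" | sort -u
}

plan_kfp_tests transforms/code/code2parquet/kfp_ray/pipeline.py
plan_kfp_tests transforms/universal/noop/src/x.py kfp/kfp_support_lib/y.py
```

The first call prints `transforms/code/code2parquet` and the second prints `random`, because a change under `kfp/` takes precedence over any transform-local change.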
2 changes: 1 addition & 1 deletion .github/workflows/deploy-docs.yml
@@ -8,7 +8,7 @@ on:
- "releases/**"
jobs:
deploy:
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
env:
REPO_URL: "https://github.com/${{ github.repository }}"
REPO_BRANCH: "dev"
6 changes: 3 additions & 3 deletions .github/workflows/deploy-library.yml
@@ -14,7 +14,7 @@ permissions:
jobs:
build-package:
name: Build Ray data processing libraries
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -30,7 +30,7 @@ jobs:
name: Publish packages to test.pypi.org
# disabled
if: false
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
needs: build-package

steps:
@@ -47,7 +47,7 @@ jobs:

publish-pypi:
name: Publish release to pypi.org
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
needs: build-package
# disabled as of now
if: false
4 changes: 2 additions & 2 deletions .github/workflows/deploy-transforms.yml
@@ -9,7 +9,7 @@ on:
jobs:
build-images:
name: Build and check images
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -23,7 +23,7 @@ jobs:
name: Publish packages to quay.io
# disabled
if: false
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
needs: build-images

steps:
130 changes: 130 additions & 0 deletions .github/workflows/test-code-code2parquet-kfp.yml
@@ -0,0 +1,130 @@
#
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files
#
name: Test KFP - transforms/code/code2parquet

on:
    workflow_dispatch:
    push:
        branches:
            - "dev"
            - "releases/**"
        tags:
            - "*"
        paths:
            - ".make.*"
            - "transforms/.make.workflow"
            - "transforms/code/code2parquet/**"
            - "!kfp/**" # This is tested in separate workflow
            - "!data-processing-lib/**" # This is tested in separate workflow
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"
    pull_request:
        branches:
            - "dev"
            - "releases/**"
        paths:
            - ".make.*"
            - "transforms/.make.workflow"
            - "transforms/code/code2parquet/**"
            - "!data-processing-lib/**" # This is tested in separate workflow
            - "!kfp/**" # This is tested in separate workflow
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"

# taken from https://stackoverflow.com/questions/66335225/how-to-cancel-previous-runs-in-the-pr-when-you-push-new-commitsupdate-the-curre
concurrency:
    group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
    cancel-in-progress: true

jobs:
    test-kfp-v1:
        runs-on: ubuntu-latest
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test KFP libs (shared and v1) and run a workflow
              timeout-minutes: 120
              run: |
                  if [ -e "@TARGET_TRANSFORM_DIR/Makefile" -a -d "@TARGET_TRANSFORM_DIR/kfp_ray" ]; then
                      export REPOROOT=$PWD
                      export K8S_SETUP_SCRIPTS=$PWD/scripts/k8s-setup
                      source $K8S_SETUP_SCRIPTS/requirements.env
                      export PATH=$PATH:/tmp/
                      curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v${KIND_VERSION}/kind-linux-amd64
                      chmod 777 /tmp/kind
                      curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
                      chmod 700 /tmp/get_helm.sh
                      HELM_INSTALL_DIR=/tmp/ /tmp/get_helm.sh -v v${HELM_VERSION} --no-sudo
                      chmod 777 /tmp/helm
                      curl -L https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /tmp/kubectl
                      chmod 777 /tmp/kubectl
                      curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o /tmp/mc
                      chmod +x /tmp/mc
                      export DEPLOY_KUBEFLOW=1
                      make -C $K8S_SETUP_SCRIPTS setup
                      make -C kfp/kfp_support_lib test
                      make -C transforms/code/code2parquet workflow-build
                      source $K8S_SETUP_SCRIPTS/common.sh
                      make -C transforms/code/code2parquet workflow-test
                      echo "Run transforms/code/code2parquet completed"
                  else
                      echo "Skipping transforms/code/code2parquet kfp test for lack of Makefile and/or kfp_ray"
                  fi
    test-kfp-v2:
        runs-on: ubuntu-latest
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test KFP libs (shared and v2) and run a workflow
              timeout-minutes: 120
              run: |
                  if [ -e "@TARGET_TRANSFORM_DIR/Makefile" -a -d "@TARGET_TRANSFORM_DIR/kfp_ray" ]; then
                      export REPOROOT=$PWD
                      export K8S_SETUP_SCRIPTS=$PWD/scripts/k8s-setup
                      source $K8S_SETUP_SCRIPTS/requirements.env
                      export PATH=$PATH:/tmp/
                      curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v${KIND_VERSION}/kind-linux-amd64
                      chmod 777 /tmp/kind
                      curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
                      chmod 700 /tmp/get_helm.sh
                      HELM_INSTALL_DIR=/tmp/ /tmp/get_helm.sh -v v${HELM_VERSION} --no-sudo
                      chmod 777 /tmp/helm
                      curl -L https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /tmp/kubectl
                      chmod 777 /tmp/kubectl
                      curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o /tmp/mc
                      chmod +x /tmp/mc
                      export DEPLOY_KUBEFLOW=1
                      export KFPv2=1
                      make -C $K8S_SETUP_SCRIPTS setup
                      make -C kfp/kfp_support_lib test
                      make -C transforms/code/code2parquet workflow-build
                      source $K8S_SETUP_SCRIPTS/common.sh
                      make -C transforms/code/code2parquet workflow-test
                      echo "Run transforms/code/code2parquet completed"
                  else
                      echo "Skipping transforms/code/code2parquet kfp test for lack of Makefile and/or kfp_ray"
                  fi
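
The `concurrency` block in this workflow groups runs by workflow name plus either the pull request number or, when there is none (for example on a push), the git ref. The shell sketch below mirrors that fallback to show which runs end up sharing a group. The helper `concurrency_group` and its arguments are hypothetical and exist only for illustration; GitHub evaluates the real expression itself.

```shell
# Sketch only: concurrency_group is a hypothetical helper that mirrors the
# `||` fallback in the workflow's concurrency.group expression.
# Usage: concurrency_group <workflow_name> <pr_number_or_empty> <git_ref>
concurrency_group() {
    local workflow=$1 pr_number=$2 ref=$3
    # ${var:-fallback} picks the fallback when var is unset or empty,
    # much as `a || b` picks b when a is empty in a GitHub expression.
    echo "${workflow}-${pr_number:-$ref}"
}

# Two pushes to the same pull request resolve to the same group name.
concurrency_group "Test KFP - transforms/code/code2parquet" 123 refs/pull/123/merge
# A push to dev has no pull request number, so the ref is used instead.
concurrency_group "Test KFP - transforms/code/code2parquet" "" refs/heads/dev
```

Because `cancel-in-progress` is `true`, a new run in a group cancels any run of the same group that is still in progress, so only the latest commit of a pull request is tested to completion.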
15 changes: 12 additions & 3 deletions .github/workflows/test-code-code2parquet.yml
@@ -12,6 +12,8 @@ on:
tags:
- "*"
paths:
- ".make.*"
- "transforms/.make.transforms"
- "transforms/code/code2parquet/**"
- "data-processing-lib/**"
- "!transforms/code/code2parquet/**/kfp_ray/**" # This is/will be tested in separate workflow
@@ -26,6 +28,8 @@ on:
- "dev"
- "releases/**"
paths:
- ".make.*"
- "transforms/.make.transforms"
- "transforms/code/code2parquet/**"
- "data-processing-lib/**"
- "!transforms/code/code2parquet/**/kfp_ray/**" # This is/will be tested in separate workflow
@@ -36,13 +40,18 @@ on:
- "!**/images/**"
- "!**.gitignore"

# Taken from https://stackoverflow.com/questions/66335225/how-to-cancel-previous-runs-in-the-pr-when-you-push-new-commitsupdate-the-curre
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
check_if_push_image:
# check whether the Docker images should be pushed to the remote repository
# The images are pushed if it is a merge to dev branch or a new tag is created.
# The latter being part of the release process.
# The images tag is derived from the value of the DOCKER_IMAGE_VERSION variable set in the .make.versions file.
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
outputs:
publish_images: ${{ steps.version.outputs.publish_images }}
steps:
@@ -59,7 +68,7 @@ jobs:
fi
echo "publish_images=$publish_images" >> "$GITHUB_OUTPUT"
test-src:
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -81,7 +90,7 @@ jobs:
fi
test-image:
needs: [check_if_push_image]
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
timeout-minutes: 120
env:
DOCKER_REGISTRY_USER: ${{ secrets.DOCKER_REGISTRY_USER }}