merge with dev
Signed-off-by: Maroun Touma <touma@us.ibm.com>
touma-I committed Sep 8, 2024
2 parents 0af03cc + 9fd27e0 commit 65f4ac4
Showing 142 changed files with 14,663 additions and 1,002 deletions.
17 changes: 10 additions & 7 deletions .github/workflows/test.yml
@@ -12,6 +12,9 @@ on:
branches:
- "dev"
- "releases/**"
env:
KFP_BLACK_LIST: "doc_chunk-ray,pdf2parquet-ray,pii_redactor"

jobs:
check_if_push_images:
# check whether the Docker images should be pushed to the remote repository
@@ -164,15 +167,15 @@ jobs:
while :
do
dir=("code" "universal" "language") && index=$(($RANDOM % ${#dir[@]})) && subdirs=${dir[$index]} && transforms=($(find transforms/$subdirs -type d -maxdepth 1 -mindepth 1 ))
# First element is not really a subdir but rather the current dir so remove it and randomly choose a transform to run
set -- "${transforms[@]}" && shift && transforms=("$@") && size=${#transforms[@]} && index=$(($RANDOM % $size))
if [ -d ${transforms[$index]}/kfp_ray ]; then
set -- "${transforms[@]}" && transforms=("$@") && size=${#transforms[@]} && index=$(($RANDOM % $size))
transform=$(basename "${transforms[$index]}")
if [ -d ${transforms[$index]}/kfp_ray ] && echo ${KFP_BLACK_LIST} | grep -qv ${transform} ; then
header_text "Running ${transforms[$index]} workflow test"
break
fi
done
make -C ${transforms[$index]} workflow-test
header_text "Run ${transforms[$index]} completed"
echo "Run ${transforms[$index]} completed"
test-kfp-v2:
runs-on: ubuntu-22.04
@@ -214,9 +217,9 @@ jobs:
while :
do
dir=("code" "universal" "language") && index=$(($RANDOM % ${#dir[@]})) && subdirs=${dir[$index]} && transforms=($(find transforms/$subdirs -type d -maxdepth 1 -mindepth 1 ))
# First element is not really a subdir but rather the current dir so remove it and randomly choose a transform to run
set -- "${transforms[@]}" && shift && transforms=("$@") && size=${#transforms[@]} && index=$(($RANDOM % $size))
if [ -d ${transforms[$index]}/kfp_ray ]; then
set -- "${transforms[@]}" && transforms=("$@") && size=${#transforms[@]} && index=$(($RANDOM % $size))
transform=$(basename "${transforms[$index]}")
if [ -d ${transforms[$index]}/kfp_ray ] && echo ${KFP_BLACK_LIST} | grep -qv ${transform} ; then
header_text "Running ${transforms[$index]} workflow test"
break
fi
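
The blacklist test added to both loops can be tried in isolation. The sketch below is an illustration and not part of the commit: `is_allowed` is a hypothetical helper name, and `noop` is used only as an example of a name that is absent from the list. It reuses the `KFP_BLACK_LIST` value and the `grep -qv` check from the workflow.

```shell
# Same value as the env entry added to the workflow.
KFP_BLACK_LIST="doc_chunk-ray,pdf2parquet-ray,pii_redactor"

# Hypothetical helper: succeeds when the given name does NOT occur
# anywhere in the comma-separated blacklist.
is_allowed() {
    # -q suppresses output; -v inverts the match. The input is a single
    # line, so the exit status is 0 only when the name is absent.
    echo "${KFP_BLACK_LIST}" | grep -qv "$1"
}

for transform in pii_redactor noop; do
    if is_allowed "$transform"; then
        echo "$transform: eligible"
    else
        echo "$transform: skipped"
    fi
done
```

Because `grep` treats the name as a pattern and matches it anywhere in the string, a directory named `doc_chunk` would also be skipped through the `doc_chunk-ray` entry. The commit does not say whether that is intended.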
1 change: 1 addition & 0 deletions .gitignore
@@ -134,6 +134,7 @@ dmypy.json
**/*.swp
**/.pydevproject
.DS_Store
.directory


**/*.back
16 changes: 8 additions & 8 deletions .make.defaults
@@ -39,9 +39,9 @@ DOCKER_FILE?=Dockerfile
DOCKER?=docker
DOCKER_PLATFORM?=linux/amd64
# Can be used by transforms or others to add args to the "docker build" command in .defaults.image target
DOCKER_BUILD_EXTRA_ARGS=
DOCKER_BUILD_EXTRA_ARGS?=
# Can be used by transforms or others to add args to the "pip install" commands referencing toml or requirements.txt files
PIP_INSTALL_EXTRA_ARGS=
PIP_INSTALL_EXTRA_ARGS?=
DOCKER_HOSTNAME?=quay.io
DOCKER_NAMESPACE ?= dataprep1/data-prep-kit
DOCKER_REGISTRY_USER?=$(DPK_DOCKER_REGISTRY_USER)
@@ -578,12 +578,12 @@ MINIO_ADMIN_PWD= localminiosecretkey
@# Help: Update pyproject.toml to depend on lib versions defined in .make.versions
@if [ -e pyproject.toml ]; then \
cat pyproject.toml | sed \
-e 's/"data-prep-toolkit-ray\(..\).*",/"data-prep-toolkit-ray\1$(DPK_LIB_VERSION)",/' \
-e 's/"data-prep-toolkit-spark\(..\).*",/"data-prep-toolkit-spark\1$(DPK_LIB_VERSION)",/' \
-e 's/"data-prep-toolkit-kfp\([=><][=><]\).*",/"data-prep-toolkit-kfp\1$(DPK_LIB_KFP_VERSION)",/' \
-e 's/"data-prep-toolkit\([=><][=><]\).*",/"data-prep-toolkit\1$(DPK_LIB_VERSION)",/' \
-e 's/"ray\[default\]\([=><][=><]\).*",/"ray\[default\]\1$(RAY)",/' \
-e 's/"data-prep-toolkit-kfp-shared\(..\).*",/"data-prep-toolkit-kfp-shared\1$(DPK_LIB_KFP_VERSION)",/' \
-e 's/"data-prep-toolkit-ray\([=><~][=]\).*"/"data-prep-toolkit-ray\1$(DPK_LIB_VERSION)"/' \
-e 's/"data-prep-toolkit-spark\([=><~][=]\).*"/"data-prep-toolkit-spark\1$(DPK_LIB_VERSION)"/' \
-e 's/"data-prep-toolkit-kfp\([=><~][=]\).*"/"data-prep-toolkit-kfp\1$(DPK_LIB_KFP_VERSION)"/' \
-e 's/"data-prep-toolkit\([=><~][=]\).*"/"data-prep-toolkit\1$(DPK_LIB_VERSION)"/' \
-e 's/"ray\[default\]\([=><~][=]\).*"/"ray\[default\]\1$(RAY)"/' \
-e 's/"data-prep-toolkit-kfp-shared\(..\).*"/"data-prep-toolkit-kfp-shared\1$(DPK_LIB_KFP_VERSION)"/' \
> tt.toml; \
mv tt.toml pyproject.toml; \
fi
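
To see what the revised `sed` expressions match, the following sketch applies one of them to sample dependency lines. It is an illustration under stated assumptions: `update_dep` is a hypothetical helper, the version `0.2.1` is a made-up value standing in for `$(DPK_LIB_VERSION)`, and the input lines are invented examples of `pyproject.toml` entries.

```shell
# Assumed value for illustration; the Makefile reads it from .make.versions.
DPK_LIB_VERSION="0.2.1"

# Hypothetical helper: same expression shape as the updated rule, with the
# Make variable replaced by a shell variable.
update_dep() {
    sed -e 's/"data-prep-toolkit\([=><~][=]\).*"/"data-prep-toolkit\1'"${DPK_LIB_VERSION}"'"/'
}

# Entry pinned with == and followed by a comma.
printf '%s\n' '"data-prep-toolkit==0.2.0",' | update_dep
# Entry using ~= and no trailing comma (e.g. last item in a list).
printf '%s\n' '"data-prep-toolkit~=0.2.0"' | update_dep
# Different package: the operator does not directly follow the name.
printf '%s\n' '"data-prep-toolkit-ray==0.2.0",' | update_dep
```

The first two lines are rewritten to the new version. Neither the `~=` form nor the entry without a trailing comma was matched by the previous `[=><][=><]` pattern ending in `",`. The third line is left unchanged because the operator must follow the package name directly. Note that `.*"` is greedy, so it runs to the last double quote on the line.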
3 changes: 3 additions & 0 deletions .make.versions
@@ -101,6 +101,8 @@ HEADER_CLEANSER_RAY_VERSION=$(DPK_VERSION)

PII_REDACTOR_PYTHON_VERSION=$(DPK_VERSION)

HTML2PARQUET_PYTHON_VERSION=$(DPK_VERSION)

DPK_TRNASFORM_REV=$(DPK_VERSION)

################## ################## ################## ################## ################## ##################
@@ -117,3 +119,4 @@ ifeq ($(KFPv2), 1)
else
WORKFLOW_SUPPORT_LIB=kfp_v1_workflow_support
endif

23 changes: 4 additions & 19 deletions CONTRIBUTING.md
@@ -2,26 +2,23 @@
Our project welcomes external contributions. If you have an itch, please feel
free to scratch it.

To contribute code or documentation, please submit a pull request.

A good way to familiarize yourself with the codebase and contribution process is
to look for and tackle low-hanging fruit in the issues.
Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us.
To contribute code or documentation, please submit a pull request. You can get started with open issues with the label - good first issue.
Before embarking on a more ambitious contribution, please quickly get in touch with us via raising an issue.

**Note: We appreciate your effort, and want to avoid a situation where a contribution
requires extensive rework (by you or by us), sits in backlog for a long time, or
cannot be accepted at all!**

### Proposing new features

If you would like to implement a new feature, please raise an issue.
If you would like to implement a new feature, please raise an issue,
before sending a pull request so the feature can be discussed. This is to avoid
you wasting your valuable time working on a feature that the project developers
are not interested in accepting into the code base.

### Fixing bugs

If you would like to fix a bug, please raise an issue before sending a
If you would like to fix a bug, please raise an issue, before sending a
pull request so it can be tracked.

### Merge approval
@@ -73,18 +70,6 @@ and include flag `-s | --sign-off` when you commit a change to your local git re
git commit -s -m "your commit message"
```

## Overall Setup
Please install Python 3.10 or 3.11, then

```
git clone git@github.com:IBM/data-prep-kit.git
cd data-prep-kit
pip install pre-commit
pip install twine
pre-commit install
make help
```

## Transform Setup and Testing
Please note the many useful options of the make command, as shown by using `make help`, that will take care of manual steps that would have been needed for tasks such as building, publishing, setting up or testing transforms in most directories.
