Commit

fix Dockerfile users
Signed-off-by: Maroun Touma <touma@us.ibm.com>
touma-I committed Oct 24, 2024
1 parent e7625ef commit 88438d1
Showing 24 changed files with 58 additions and 77 deletions.
31 changes: 31 additions & 0 deletions data-processing-lib/README.md
@@ -0,0 +1,31 @@
# Data Processing Library
This provides a python framework for developing _transforms_
on data stored in files - currently parquet files are supported -
and running them in a [ray](https://www.ray.io/) cluster.
Data files may be stored in the local file system or COS/S3.
For more details see the [documentation](../doc/overview.md).

### Virtual Environment
The project uses `pyproject.toml` and a Makefile for operations.
To do development you should establish the virtual environment
```shell
make venv
```
and then either activate
```shell
source venv/bin/activate
```
or set up your IDE to use the `venv` directory when developing in this project.

## Library Artifact Build and Publish
To test, build and publish the library
```shell
make test build publish
```

To bump the version number, edit the `Makefile` to change `VERSION` and rerun
the above. This will require committing both the `Makefile` and the
automatically updated `pyproject.toml` file.



4 changes: 2 additions & 2 deletions data-processing-lib/pyproject.toml
@@ -15,7 +15,7 @@ dynamic = ["dependencies", "optional-dependencies"]
[project_urls]
Repository = "https://github.com/IBM/data-prep-kit"
Issues = "https://github.com/IBM/data-prep-kit/issues"
-Documentation = "https://ibm.github.io/data-prep-kit/"
+Documentation = "https://ibm.github.io/data-prep-kit/doc"
"Transform project" = "https://github.com/IBM/data-prep-kit/tree/dev/transforms/universal/noop"

[build-system]
@@ -26,7 +26,7 @@ build-backend = "setuptools.build_meta"
file = ["python/requirements.txt"]

[tool.setuptools.dynamic.optional-dependencies]
-dev = { file = ["requirements.txt"]}
+dev = { file = ["requirements-dev.txt"]}
ray = { file = ["ray/requirements.txt"]}
spark = { file = ["spark/requirements.txt"]}

2 changes: 1 addition & 1 deletion transforms/code/code2parquet/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
4 changes: 2 additions & 2 deletions transforms/code/code_profiler/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

COPY --chown=dpk:root src/ src/
@@ -30,7 +30,7 @@ COPY ./src/code_profiler_transform_python.py .
COPY ./src/code_profiler_local.py local/

# Copy the tree-sitter bindings (this is the important part)
-COPY --chown=ray:users ../../input/tree-sitter-bindings/ /home/dpk/input/tree-sitter-bindings/
+COPY --chown=dpk:root ../../input/tree-sitter-bindings/ /home/dpk/input/tree-sitter-bindings/

# copy test
# COPY test/ test/
2 changes: 1 addition & 1 deletion transforms/code/code_quality/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
2 changes: 1 addition & 1 deletion transforms/code/header_cleanser/python/Dockerfile
@@ -21,7 +21,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
2 changes: 1 addition & 1 deletion transforms/code/license_select/python/Dockerfile
@@ -12,7 +12,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
2 changes: 1 addition & 1 deletion transforms/code/malware/python/Dockerfile
@@ -34,7 +34,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

COPY --chown=dpk:root src/ src/
2 changes: 1 addition & 1 deletion transforms/code/proglang_select/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
2 changes: 1 addition & 1 deletion transforms/universal/doc_id/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

COPY --chown=dpk:root src/ src/
2 changes: 1 addition & 1 deletion transforms/universal/doc_id/spark/Dockerfile
@@ -10,7 +10,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=spark:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}[spark]


2 changes: 1 addition & 1 deletion transforms/universal/ededup/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

COPY --chown=dpk:root src/ src/
2 changes: 1 addition & 1 deletion transforms/universal/filter/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
2 changes: 1 addition & 1 deletion transforms/universal/filter/spark/Dockerfile
@@ -10,7 +10,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=spark:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}[spark]

COPY --chown=spark:root python-transform/ python-transform/
2 changes: 1 addition & 1 deletion transforms/universal/hap/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
50 changes: 0 additions & 50 deletions transforms/universal/hap/ray/output/metadata.json

This file was deleted.

Binary file removed transforms/universal/hap/ray/output/test1.parquet
Binary file not shown.
2 changes: 1 addition & 1 deletion transforms/universal/noop/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
4 changes: 2 additions & 2 deletions transforms/universal/noop/spark/Dockerfile
@@ -10,10 +10,10 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=root:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}[spark]

-COPY --chown=spark:root python-transform/ python-transform/
+COPY --chown=root:root python-transform/ python-transform/
RUN cd python-transform && pip install --no-cache-dir -e .

COPY --chown=root:root src/ src/
2 changes: 1 addition & 1 deletion transforms/universal/profiler/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

COPY --chown=dpk:root src/ src/
2 changes: 1 addition & 1 deletion transforms/universal/profiler/spark/Dockerfile
@@ -10,7 +10,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}[spark]

COPY --chown=spark:root python-transform/ python-transform/
8 changes: 4 additions & 4 deletions transforms/universal/resize/python/Dockerfile
@@ -13,13 +13,13 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# Install python project source
-COPY --chown=dpk:users src/ src/
-COPY --chown=dpk:users pyproject.toml pyproject.toml
-COPY --chown=dpk:users README.md Readme.md
+COPY --chown=dpk:root src/ src/
+COPY --chown=dpk:root pyproject.toml pyproject.toml
+COPY --chown=dpk:root README.md Readme.md
COPY --chown=dpk:root requirements.txt requirements.txt
RUN pip install --no-cache-dir -e .

2 changes: 1 addition & 1 deletion transforms/universal/resize/spark/Dockerfile
@@ -10,7 +10,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=spark:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}[spark]

COPY --chown=spark:root python-transform/ python-transform/
2 changes: 1 addition & 1 deletion transforms/universal/tokenization/python/Dockerfile
@@ -13,7 +13,7 @@ ARG WHEEL_FILE_NAME

# Copy and install data processing libraries
# These are expected to be placed in the docker context before this is run (see the make image).
-COPY --chown=ray:users data-processing-dist data-processing-dist
+COPY --chown=dpk:root data-processing-dist data-processing-dist
RUN pip install data-processing-dist/${WHEEL_FILE_NAME}

# END OF STEPS destined for a data-prep-kit base image
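
Note: the same `ray:users` to `dpk:root` (or `spark:root`) substitution recurs across every Dockerfile in this commit, so a quick scan of the tree may help confirm that no stale owner is left behind. The sketch below is not part of the commit. The function name `find_stale_chown` is hypothetical, and it assumes the files to check are all named `Dockerfile` under a given root directory.

```python
import re
from pathlib import Path

# Owner:group pair that this commit replaces. Any remaining match is a
# candidate for the same fix. (Pattern chosen for illustration only.)
STALE_CHOWN = re.compile(r"--chown=ray:users\b")


def find_stale_chown(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each Dockerfile line
    under `root` that still uses --chown=ray:users."""
    hits = []
    for dockerfile in sorted(Path(root).rglob("Dockerfile")):
        lines = dockerfile.read_text().splitlines()
        for number, line in enumerate(lines, start=1):
            if STALE_CHOWN.search(line):
                hits.append((str(dockerfile), number, line.strip()))
    return hits
```

Calling `find_stale_chown("transforms")` from the repository root would list any remaining occurrences; an empty list suggests the substitution is complete for files matching that name.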