added pii redactor
Signed-off-by: Maroun Touma <touma@us.ibm.com>
touma-I committed Sep 11, 2024
1 parent 703ebe0 commit d54708a
Showing 3 changed files with 15 additions and 6 deletions.
6 changes: 5 additions & 1 deletion transforms/packaging/python/Makefile
@@ -15,8 +15,11 @@ PACKAGING_RUN_TIME=python

#Excluded List
# ./code/malware
# ./language/pii_redactor
# ./universal/html2parquet
# ./universal/profiler # Missing implementation
# ./universal/fdedup # Missing implementation
# code/repo_level_ordering # Missing implementation


TRANSFORMS_NAMES = code/code_quality \
code/code2parquet \
@@ -26,6 +29,7 @@ TRANSFORMS_NAMES = code/code_quality \
language/doc_quality \
language/lang_id \
language/pdf2parquet \
language/pii_redactor \
language/text_encoder \
universal/ededup \
universal/filter \
14 changes: 9 additions & 5 deletions transforms/packaging/python/requirements.transforms.python.txt
@@ -3,15 +3,13 @@ bs4==0.0.2
#docling 1.9.0 depends on docling-parse<2.0.0 and >=1.1.3
#pdf2parquet depends on docling-parse==1.0.0
#docling 1.8.5 depends on docling-parse<2.0.0 and >=1.1.3
docling-parse>=1.0.0,
#docling-parse>=1.0.0,
# language/doc_chunk has conflict dependencies with pdf2parquet that need to be resolved
# doc_chunk depends on docling>=1.8.2,<2.0.0
# pdf2parquet depends on docling==1.7.0
#docling==1.7.0,
docling>=1.8.2,<2.0.0,
llama-index-core>=0.11.1,<0.12.0,
docling-core>=1.1.2,<2.0.0,
quackling==0.1.1,
docling==1.8.5,
quackling==0.4.0,
# quackling will pull
# docling>=1.8.2,<2.0.0
# llama-index-core<0.12.0,>=0.11.1
@@ -29,4 +27,10 @@ scancode-toolkit==32.1.0 ; platform_system != 'Darwin'
sentence-transformers==3.0.1
transformers==4.38.2
xxhash==3.4.1
# PII-redactor
presidio-analyzer>=2.2.355
presidio-anonymizer>=2.2.355
flair>=0.14.0
pandas>=2.2.2
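
The comments in this requirements file reason about which pins fit inside which version ranges (for example, whether `docling==1.8.5` falls within the `docling>=1.8.2,<2.0.0` range that quackling pulls in). A minimal sketch of that check follows. It assumes plain dotted numeric versions only and is a simplified stand-in, not pip's resolver and not full PEP 440 handling; the helper names `parse_version` and `satisfies` are hypothetical.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn a dotted numeric version such as '1.8.5' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`.

    Assumption: versions are purely numeric (no pre-releases, no wildcards)
    and have the same number of components on both sides.
    """
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if not clause:
            continue  # tolerate a trailing comma
        match = re.fullmatch(r"(==|>=|<=|>|<)\s*([0-9.]+)", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](candidate, parse_version(bound)):
            return False
    return True


if __name__ == "__main__":
    # The docling pin against the range quackling is noted to pull in.
    print(satisfies("1.8.5", ">=1.8.2,<2.0.0"))
    # The docling-parse pin attributed to pdf2parquet against the range
    # docling 1.8.5 is noted to require.
    print(satisfies("1.0.0", ">=1.1.3,<2.0.0"))
```

Under these assumptions, the first check passes and the second fails, which matches the conflict described in the comments above. For real dependency work, the `packaging` library's specifier handling is the appropriate tool.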


1 change: 1 addition & 0 deletions transforms/packaging/ray/Makefile
@@ -27,6 +27,7 @@ TRANSFORMS_NAMES = code/proglang_select \
language/doc_quality \
language/lang_id \
language/text_encoder \
language/pii_redactor \
language/pdf2parquet \
universal/fdedup \
universal/tokenization \
