Pull requests: NVIDIA/NeMo-Curator
Pull requests list
Add nemo_curator.__version__, --version, and -v displays
#323
opened Oct 24, 2024 by
sarahyurick
Add more comments for clearing Fuzzy Dedup Cache
#321
opened Oct 23, 2024 by
praateekmahajan
3 tasks
Retiring text_bytes_aware_shuffle to use shuffle directly
gpuci
Run GPU CI/CD on PR
#316
opened Oct 21, 2024 by
praateekmahajan
•
Draft
3 tasks
[WIP] MinHash improvement using minhash_permuted
#313
opened Oct 18, 2024 by
praateekmahajan
•
Draft
3 tasks
add tutorials/pretraining-Vietnamese-data-curation
#300
opened Oct 14, 2024 by
hoangphu7122002
3 tasks
[DRAFT] Passing meta to map_partitions for read_data
#291
opened Oct 9, 2024 by
praateekmahajan
•
Draft
3 tasks
[DRAFT] Trying dask_cudf's read_json / read_parquet
#285
opened Oct 8, 2024 by
praateekmahajan
•
Draft
3 tasks
Added example notebook for translation with ct2 model.
documentation
Improvements or additions to documentation
Add Multiple Model Quality Classification example
documentation
#173
opened Jul 30, 2024 by
sarahyurick
•
Draft
Adding an example for executing NeMo modules using kubernetes Python …
documentation
#148
opened Jul 9, 2024 by
dpadmanabhan03
2 of 3 tasks
Fix #53 - Add batched files reading support to separate_by_metadata script
#54
opened May 6, 2024 by
miguelusque