Pull requests: NVIDIA/NeMo-Curator

Pull requests list

Semantic deduplication improvements
#327 opened Oct 25, 2024 by sarahyurick Draft
1 of 2 tasks
Dapt data curation fuzzy dedupe
#322 opened Oct 24, 2024 by ruchaa-apte
Add more comments for clearing Fuzzy Dedup Cache
#321 opened Oct 23, 2024 by praateekmahajan
3 tasks
Japanese support for get_word_splitter
#320 opened Oct 23, 2024 by sarahyurick
Skip reading files with incorrect extension
#318 opened Oct 22, 2024 by sarahyurick
Make max_text_bytes_per_part configurable
#314 opened Oct 18, 2024 by sarahyurick
[REVIEW] Speedup Connected Components
#302 opened Oct 15, 2024 by VibhuJawa
3 tasks done
add tutorials/pretraining-Vietnamese-data-curation
#300 opened Oct 14, 2024 by hoangphu7122002
3 tasks
Added example notebook for translation with ct2 model. (label: documentation)
#262 opened Sep 25, 2024 by uahmed93 Draft
3 tasks
Add support for parallel data curation
#193 opened Aug 8, 2024 by shuoyangd
3 tasks done
Fixed bug: changed to correct model name
#186 opened Aug 6, 2024 by ByteWrite
1 of 3 tasks
Add Multiple Model Quality Classification example (label: documentation)
#173 opened Jul 30, 2024 by sarahyurick Draft
Adding an example for executing NeMo modules using kubernetes Python … (label: documentation)
#148 opened Jul 9, 2024 by dpadmanabhan03
2 of 3 tasks