Add links
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
yury-tokpanov committed Oct 16, 2024
1 parent 65affd6 commit 2fdbe18
Showing 2 changed files with 6 additions and 2 deletions.
2 changes: 1 addition & 1 deletion tutorials/README.md
@@ -27,5 +27,5 @@ To get started, we recommend starting with the following tutorials to become fam
| [synthetic-preference-data](./synthetic-preference-data) | Demonstrates the use of NeMo Curator synthetic data generation modules to leverage [LLaMa 3.1 405B Instruct](https://build.nvidia.com/meta/llama-3_1-405b-instruct) for generating synthetic preference data |
| [synthetic-retrieval-evaluation](./synthetic-retrieval-evaluation) | Demonstrates the use of NeMo Curator synthetic data generation modules to leverage [LLaMa 3.1 405B Instruct](https://build.nvidia.com/meta/llama-3_1-405b-instruct) for generating synthetic data to evaluate retrieval pipelines |
| [tinystories](./tinystories) | A comprehensive example of curating a small dataset to use for model pre-training. | [Blog post](https://developer.nvidia.com/blog/curating-custom-datasets-for-llm-training-with-nvidia-nemo-curator/)
-| [zyda2-tutorial](./zyda2-tutorial) | A comprehensive tutorial on how to reproduce Zyda2 dataset. |
+| [zyda2-tutorial](./zyda2-tutorial) | A comprehensive tutorial on how to reproduce [Zyda2 dataset](https://huggingface.co/datasets/Zyphra/Zyda2) with NeMo Curator. | [Nvidia blog post](https://developer.nvidia.com/blog/train-highly-accurate-llms-with-the-zyda-2-open-5t-token-dataset-processed-with-nvidia-nemo-curator/) [Zyphra blog post](https://www.zyphra.com/post/building-zyda-2)
</div>
6 changes: 5 additions & 1 deletion tutorials/zyda2-tutorial/README.md
@@ -1,5 +1,9 @@
# Zyda2
-This tutorial demonstrates how to reproduce Zyda2 dataset, that was curated by Zyphra using NeMo Curator: https://huggingface.co/datasets/Zyphra/Zyda2-5T
+This tutorial demonstrates how to reproduce Zyda2 dataset, that was curated by Zyphra in collaboration with Nvidia using NeMo Curator.

+- Download Zyda2 dataset from HuggingFace: https://huggingface.co/datasets/Zyphra/Zyda2
+- Zyphra blog: https://www.zyphra.com/post/building-zyda-2
+- Nvidia blog: https://developer.nvidia.com/blog/train-highly-accurate-llms-with-the-zyda-2-open-5t-token-dataset-processed-with-nvidia-nemo-curator/

## Tutorial structure
Tutorial is split into separate folders each containing scripts for running corresponding steps: