[Tutorials] Update the SDG tutorial and expose the inference endpoint #301

Merged
merged 1 commit on Oct 21, 2024
26 changes: 15 additions & 11 deletions tutorials/peft-curation-with-sdg/README.md
@@ -48,45 +48,49 @@ showcased in this code:

* In order to run the data curation pipeline with semantic deduplication enabled, you would need an
NVIDIA GPU.
* To generate synthetic data, you would need a synthetic data generation model compatible with the OpenAI API. Out of the box, this tutorial supports the following model through the [build.nvidia.com](https://build.nvidia.com) API gateway:
* To generate synthetic data, you would need a synthetic data generation model compatible with the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). Out of the box, this tutorial supports the following model through the [build.nvidia.com](https://build.nvidia.com) API gateway:
* [Nemotron-4 340B Instruct](https://build.nvidia.com/nvidia/nemotron-4-340b-instruct)
* [LLaMa 3.1 405B Instruct](https://build.nvidia.com/meta/llama-3_1-405b-instruct)
* For assigning qualitative metrics to the generated records, you would need a reward model compatible with the OpenAI API (such as the [Nemotron-4 340B Reward](https://build.nvidia.com/nvidia/nemotron-4-340b-reward) model).
* For assigning qualitative metrics to the generated records, you would need a reward model compatible with the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction) (such as the [Nemotron-4 340B Reward](https://build.nvidia.com/nvidia/nemotron-4-340b-reward) model).

> **Note:** A valid [build.nvidia.com](https://build.nvidia.com) API key is required to use any of the above models.
> **Note:** A valid [build.nvidia.com](https://build.nvidia.com) API key is required to use any of the above models. You can obtain a free API key by visiting [build.nvidia.com](https://build.nvidia.com) and creating an account with your email address.
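
To see what "compatible with the OpenAI API" means concretely, here is a minimal sketch of the chat-completions request these models accept, built with the standard library only. The endpoint and model name come from this tutorial; the API key and prompt are placeholders, and the request is assembled but not sent:

```python
import json


def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble an OpenAI-compatible chat-completions request.

    The base URL is the build.nvidia.com gateway used later in this
    tutorial; the key and prompt are hypothetical placeholders.
    """
    return {
        "url": "https://integrate.api.nvidia.com/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(
            {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            }
        ),
    }


req = build_chat_request(
    "nvidia/nemotron-4-340b-instruct",
    "Paraphrase this sentence.",
    "YOUR_API_KEY",
)
```

Any server that accepts this request shape (a self-hosted vLLM instance, for example) can stand in for the build.nvidia.com gateway.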

## Usage
After installing the NeMo Curator package, you can simply run the following commands:
```bash
# Running the basic pipeline (no GPUs or external LLMs needed)
python tutorials/peft-curation-with-sdg/main.py

# Run with synthetic data generation and semantic deduplication
# Running with synthetic data generation and semantic deduplication using
# an external LLM inference endpoint located at "https://api.example.com/v1/chat/completions"
# and the model called "my-llm-model" that is served at that endpoint:
python tutorials/peft-curation-with-sdg/main.py \
--api-key YOUR_BUILD.NVIDIA.COM_API_KEY \
--synth-gen-endpoint https://api.example.com/v1/chat/completions \
--synth-gen-model my-llm-model \
--api-key API_KEY_FOR_LLM_ENDPOINT \
--device gpu

# Here are some examples that:
# - Use the GPU and enable semantic deduplication
# - Use the specified model from build.nvidia.com for synthetic data generation
# - Do 1 round of synthetic data generation
# - Generate synthetic data using 0.1% of the real data
# - Use the specified model from build.nvidia.com for synthetic data generation
# - Use the GPU and enable semantic deduplication

# Using LLaMa 3.1 405B:
python tutorials/peft-curation-with-sdg/main.py \
--api-key YOUR_BUILD.NVIDIA.COM_API_KEY \
--device gpu \
--synth-gen-model "meta/llama-3.1-405b-instruct" \
--synth-gen-rounds 1 \
--synth-gen-ratio 0.001 \
--synth-gen-model "meta/llama-3.1-405b-instruct" \
--device gpu

# Using Nemotron-4 340B:
python tutorials/peft-curation-with-sdg/main.py \
--api-key YOUR_BUILD.NVIDIA.COM_API_KEY \
--device gpu \
--synth-gen-model "nvidia/nemotron-4-340b-instruct" \
--synth-gen-rounds 1 \
--synth-gen-ratio 0.001 \
--synth-gen-model "nvidia/nemotron-4-340b-instruct" \
--device gpu
```

By default, this tutorial will use at most 8 workers to run the curation pipeline. If you face any
33 changes: 25 additions & 8 deletions tutorials/peft-curation-with-sdg/main.py
@@ -242,16 +242,28 @@ def run_pipeline(args, jsonl_fp):
Returns:
The file path to the final curated JSONL file.
"""
# Disable synthetic data generation if no model specified, or no API key is provided.
if args.synth_gen_model is None or args.synth_gen_model == "":
# Disable synthetic data generation if the necessary arguments are not provided.
if not args.synth_gen_endpoint:
print(
"No synthetic data generation endpoint provided. Skipping synthetic data generation."
)
args.synth_gen_rounds = 0
if not args.synth_gen_model:
print(
"No synthetic data generation model provided. Skipping synthetic data generation."
)
args.synth_gen_rounds = 0
if args.api_key is None:
print("No API key provided. Skipping synthetic data generation.")
args.synth_gen_rounds = 0
if not args.api_key:
print(
"No synthetic data generation API key provided. Skipping synthetic data generation."
)
args.synth_gen_rounds = 0

if args.synth_gen_rounds:
print(
f"Using {args.synth_gen_endpoint}/{args.synth_gen_model} for synthetic data generation."
)

synth_gen_ratio = args.synth_gen_ratio
synth_gen_rounds = args.synth_gen_rounds
synth_n_variants = args.synth_n_variants
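
The fallback logic above can be condensed into one helper: synthetic data generation is skipped unless an endpoint, a model name, and an API key are all present. This is a stand-alone sketch (the `Namespace` below is a stand-in for the script's parsed arguments, not the tutorial's full CLI):

```python
import argparse


def resolve_synth_gen_rounds(args: argparse.Namespace) -> int:
    """Return 0 (skip synthetic data generation) unless an endpoint,
    a model, and an API key are all provided."""
    required = [
        ("synth_gen_endpoint", "endpoint"),
        ("synth_gen_model", "model"),
        ("api_key", "API key"),
    ]
    for attr, label in required:
        if not getattr(args, attr, None):
            print(
                f"No synthetic data generation {label} provided. "
                "Skipping synthetic data generation."
            )
            return 0
    return args.synth_gen_rounds


# A blank model name disables generation even when other arguments are set.
args = argparse.Namespace(
    synth_gen_endpoint="https://integrate.api.nvidia.com/v1",
    synth_gen_model="",
    api_key="dummy",
    synth_gen_rounds=2,
)
rounds = resolve_synth_gen_rounds(args)  # → 0
```

Using `not getattr(...)` catches both `None` and the empty string, which is why the PR's `if not args.api_key:` form is preferable to the older `if args.api_key is None:` check.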
@@ -277,7 +289,7 @@ def run_pipeline(args, jsonl_fp):
# Create the synthetic data generator.
llm_client = AsyncOpenAIClient(
AsyncOpenAI(
base_url="https://integrate.api.nvidia.com/v1",
base_url=args.synth_gen_endpoint,
api_key=args.api_key or "",
timeout=args.api_timeout,
)
@@ -348,12 +360,17 @@ def run_pipeline(args, jsonl_fp):
def main():
parser = argparse.ArgumentParser()
parser = ArgumentHelper(parser).add_distributed_args()
parser.add_argument(
"--synth-gen-endpoint",
type=str,
default="https://integrate.api.nvidia.com/v1",
help="The API endpoint to use for synthetic data generation. Any endpoint compatible with the OpenAI API can be used.",
)
parser.add_argument(
"--synth-gen-model",
type=str,
default="nvidia/nemotron-4-340b-instruct",
choices=["nvidia/nemotron-4-340b-instruct", "meta/llama-3.1-405b-instruct", ""],
help="The model from build.nvidia.com to use for synthetic data generation. Leave blank to skip synthetic data generation.",
help="The model from the provided API endpoint to use for synthetic data generation. Leave blank to skip synthetic data generation.",
)
parser.add_argument(
"--synth-gen-ratio",
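
The new `--synth-gen-endpoint` flag added above can be reproduced in isolation to see how the default keeps existing invocations working while allowing a custom endpoint. This parser is a stand-alone sketch, not the tutorial's full argument set:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--synth-gen-endpoint",
    type=str,
    default="https://integrate.api.nvidia.com/v1",
    help="OpenAI-compatible API endpoint for synthetic data generation.",
)
parser.add_argument(
    "--synth-gen-model",
    type=str,
    default="nvidia/nemotron-4-340b-instruct",
    help="Model served at the endpoint; leave blank to skip generation.",
)

# Overriding only the endpoint mirrors the custom-endpoint example in
# the README; the model keeps its default.
args = parser.parse_args(["--synth-gen-endpoint", "https://api.example.com/v1"])
print(args.synth_gen_endpoint)  # → https://api.example.com/v1
```

Note that the PR also drops the old `choices=[...]` restriction on `--synth-gen-model`, which is what lets arbitrary models at arbitrary OpenAI-compatible endpoints be used.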