
SFT Script and Hyperparameters used for DBRX-Instruct #99

Open
alpayariyak opened this issue Mar 28, 2024 · 5 comments

@alpayariyak

Hi, I saw you mentioned that you used your fork of Megatron-LM for training - could you please share the scripts and hyperparameters used for the SFT of DBRX? It would mean the world to the OSS community!

At openchat, we'd like to fine-tune your model on our data and open source it.

@alpayariyak
Author

The training would be on H100s.

Another question - how many H100s would be needed at minimum?

@mvpatel2000
Contributor

@tgale96 might have scripts for the Megatron-LM integration

We will have integrations with other stacks soon.

For DBRX specifically, you do not necessarily need to use MegaBlocks (though it is more efficient) -- ZeRO-3 + the HF model code is sufficient. For example, foundry would work for this: https://github.com/mosaicml/llm-foundry

CC: @dakinggg
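
For illustration, a minimal sketch of the "ZeRO-3 + HF model code" route mentioned above. This is not the recipe Databricks used for DBRX-Instruct: the model id, dataset, hyperparameters, and the `ds_zero3.json` DeepSpeed config path are all placeholders to substitute with your own.

```python
# Sketch: full SFT of the HF DBRX checkpoint with the HF Trainer + DeepSpeed ZeRO-3.
# All values below are placeholders, not the DBRX-Instruct recipe.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "databricks/dbrx-base"  # fine-tune from the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Tiny placeholder SFT dataset; substitute your own formatted prompt/response pairs.
examples = ["<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello!<|im_end|>"]
ds = Dataset.from_dict({"text": examples}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="dbrx-sft",
    per_device_train_batch_size=1,   # placeholder values, not a tuned recipe
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    deepspeed="ds_zero3.json",       # standard ZeRO-3 config file you provide
)

Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Launched with the `deepspeed` (or `torchrun`) launcher across nodes, ZeRO-3 shards parameters, gradients, and optimizer state over all GPUs, which is what makes full fine-tuning feasible here without MegaBlocks.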

@alpayariyak
Author

alpayariyak commented Mar 29, 2024

Thank you very much! Do you have insight into the hyperparameters used for DBRX Instruct?

Hyperparameter exploration at this scale is very expensive and out of reach for most of the open-source community, so this would be incredibly helpful to have.

@alpayariyak
Author

If there's any chance you could confirm: are these the hyperparameters used for DBRX Instruct?
https://github.com/mosaicml/llm-foundry/blob/7a8a1564827cbcbc281a6bdc4a11bc8f584142bd/scripts/train/yamls/finetune/dbrx-full-ft.yaml

@alpayariyak
Author

One more question (if the above config is what was actually used) - it notes that 8x8x80GB GPUs are required for the fine-tune. Would you mind sharing approximately the number of tokens or SFT examples, the GPUs you used, and how long this took?
