Run fast transformer decoders on your MacBook's GPU! This project works toward fast reimplementations of GPT-2 and Llama-like models in MLX.
The aim is that the only dependencies are:

- `mlx`
- `sentencepiece`
- `tqdm`
- `numpy`

With an optional dev dependency of:

- `transformers`, for downloading and converting weights
Completed so far:

- makemore Llama reimplementation (train your own with `python train.py`!)
- BERT, merged into `mlx-examples`
- Phi-2, merged into `mlx-examples`
- AdamW, merged into `mlx`
This project will be considered complete once these goals are achieved:

- finetune BERT
- GPT-2 reimplementation and loading in MLX
- speculative decoding
- learning rate scheduling
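Of the goals above, learning rate scheduling is the most self-contained. A minimal sketch of the common warmup-plus-cosine-decay schedule in plain Python follows; the function name and signature are illustrative, not this repo's API:

```python
import math

def cosine_with_warmup(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup followed by cosine decay, a common schedule for
    transformer training. Illustrative only; not this repo's API."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine-decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The scheduler is a pure function of the step count, so it can be called once per optimizer update to set the current learning rate.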
To install the dependencies, run `poetry install --no-root`.
To download and convert the model, run `python phi2/convert.py`. That will fill in `weights/phi-2.npz`.
🚧 (Not yet done) To run the model: `python phi2/generate.py`
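The generation script isn't done yet, but the loop it will need has a standard shape: run a forward pass, pick the next token from the logits, append it, repeat. A toy greedy-decoding sketch in numpy, with a stand-in for the model's forward pass (nothing here is this repo's actual API):

```python
import numpy as np

def toy_logits(tokens, vocab_size=10):
    # Stand-in for a transformer forward pass: deterministic logits
    # that simply favour (last_token + 1) mod vocab_size.
    logits = np.zeros(vocab_size)
    logits[(tokens[-1] + 1) % vocab_size] = 1.0
    return logits

def generate(prompt, max_new_tokens):
    """Greedy decoding: take the argmax of the next-token logits,
    append it, and feed the extended sequence back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = int(np.argmax(toy_logits(tokens)))
        tokens.append(next_token)
    return tokens

print(generate([3], 4))  # -> [3, 4, 5, 6, 7]
```

Swapping `toy_logits` for a real model forward pass (and argmax for temperature sampling) turns this into a usable generator.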
Some great resources: