This repository has been archived by the owner on May 10, 2024. It is now read-only.
v0.1.3
This patch includes several bug fixes, as well as support for passing Hugging Face tokens to access gated or private models for both serving and training. It also enables tensor parallelism across all GPUs on an instance, so larger models such as llama-70b can be served on a multi-GPU instance.
I promise a detailed changelog is coming in v0.1.4!