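LLMC: Towards Accurate and Efficient LLM Compression

LLMC is an off-the-shelf tool designed for compressing LLMs. It uses state-of-the-art compression algorithms to improve efficiency and reduce model size while preserving prediction accuracy.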

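English documentation is here.

Chinese documentation is here.

Docker Hub is here.

Alibaba Cloud Docker: registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:[tag]

You can download a Docker image that runs llmc with the commands below. Users in mainland China are advised to use the Alibaba Cloud registry.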

Docker Hub

docker pull llmcompression/llmc:pure-latest

Alibaba Cloud Docker

docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-latest

Community:

Discord server
Tencent QQ group

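Highlight Features

💥Comprehensive Algorithm Support: Provides a broad range of ✨SOTA compression algorithms, including ✅quantization, ✅mixed-precision quantization, and ✅sparsification, while matching the accuracy of the original repositories. We also provide ✨quantization best practices (see the ✨Best Practices section here) for the best performance and efficiency.
💥Supported Formats: Supports ✨quantization (integer and floating point) and ✨sparsification, specifically ✅weight-activation quantization, ✅weight-only quantization, ✅mixed-precision quantization, and both ✅structured and ✅unstructured sparsification.
💥Broad Model Support: Supports a variety of ✨LLM models, including ✅LLaMA, ✅Mistral, ✅InternLM2, ✅Qwen2, and more, as well as ✅MoE and ✅VLM models (see the Supported Model List).
💥Multi-Backend Compatibility: Integrates with multiple backends for flexible deployment. The various quantization settings and model formats are compatible with a wide range of backends and hardware platforms, such as ✅VLLM, ✅Sglang, ✅LightLLM, ✅MLC-LLM, and ✅AutoAWQ (see the ✨Inference Backends section here).
💥Performance and Efficiency: Supports quantization of very large LLMs such as ✨Llama3.1-405B and ✨OPT-175B, and can evaluate perplexity (PPL) on a single A100/H100/H800 GPU.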
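To make the quantization formats above more concrete, here is a minimal sketch of weight-only, per-channel (one scale per output row), asymmetric min-max integer quantization with round-to-nearest, in plain Python. It is an illustration under our own assumptions, not llmc's implementation or API; the function names `quantize_row`, `dequantize_row`, and `fake_quantize` are hypothetical. The range is widened to include 0 so that zero stays exactly representable.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_row(row: List[float], n_bits: int = 4) -> Tuple[List[int], float, int]:
    """Asymmetric min-max quantization of one output channel to unsigned n-bit integers."""
    qmax = 2 ** n_bits - 1
    # Widen the range to include 0 so that 0.0 is exactly representable.
    lo, hi = min(min(row), 0.0), max(max(row), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0  # all-zero row: any scale works
    zero_point = round(-lo / scale)
    # Round to nearest, shift by the zero point, clamp to [0, qmax].
    q = [min(max(round(w / scale) + zero_point, 0), qmax) for w in row]
    return q, scale, zero_point


def dequantize_row(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to floating point."""
    return [(v - zero_point) * scale for v in q]


def fake_quantize(weight: Matrix, n_bits: int = 4) -> Matrix:
    """Quantize and immediately dequantize each row (per-channel 'fake quantization')."""
    return [dequantize_row(*quantize_row(row, n_bits)) for row in weight]


if __name__ == "__main__":
    q, scale, zp = quantize_row([0.12, -0.50, 0.33, 0.90], n_bits=4)
    print(q, zp)  # [6, 0, 9, 15] 5
```

Quantizing and immediately dequantizing keeps the tensor in floating point, which is a common way to measure the accuracy cost of a bit width before deploying to an inference backend; each reconstructed weight lies within half a quantization step of the original.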

Usage

Please refer to the 🚀Quick Start section here.

Supported Model List

✅ BLOOM

✅ LLaMA

✅ LLaMA V2

✅ StarCoder

✅ OPT

✅ Falcon

✅ InternLM2

✅ Mistral

✅ LLaMA V3

✅ Mixtral

✅ Qwen V2

✅ LLaVA

✅ InternLM2.5

✅ StableLM

✅ Gemma2

✅ Phi2

✅ Phi 1.5

✅ MiniCPM

✅ SmolLM

You can add your own model type by following the files in llmc/models/*.py.

Supported Backend List

✅ VLLM

✅ LightLLM

✅ Sglang

✅ MLC-LLM

✅ AutoAWQ

Supported Algorithm List

Quantization

✅ Naive

✅ AWQ

✅ GPTQ

✅ SmoothQuant

✅ OS+

✅ OmniQuant

✅ NormTweaking

✅ AdaDim

✅ QUIK

✅ SpQR

✅ DGQ

✅ OWQ

✅ LLM.int8()

✅ HQQ

✅ QuaRot
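SmoothQuant, one of the algorithms listed above, rests on a simple identity: for Y = XW, dividing input channel j of X by a factor s_j and multiplying row j of W by the same factor leaves Y unchanged, while shifting activation outliers into the weights, which are easier to quantize. The SmoothQuant paper picks s_j = max|X_j|^α / max|W_j|^(1 - α). The plain-Python sketch below shows only this smoothing step on toy matrices; the helper names and the 1e-5 clamp are our own assumptions, and this is not llmc's code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def smoothing_factors(x: Matrix, w: Matrix, alpha: float = 0.5) -> List[float]:
    """s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha) for each input channel j.

    x has shape (tokens, in_features), w has shape (in_features, out_features).
    """
    n_in = len(w)
    act_max = [max(abs(row[j]) for row in x) for j in range(n_in)]
    w_max = [max(abs(v) for v in w[j]) for j in range(n_in)]
    # Clamp tiny maxima to avoid division by zero (an assumption of this sketch).
    return [max(a, 1e-5) ** alpha / max(m, 1e-5) ** (1 - alpha)
            for a, m in zip(act_max, w_max)]


def smooth(x: Matrix, w: Matrix, s: List[float]) -> Tuple[Matrix, Matrix]:
    """Return X * diag(s)^-1 and diag(s) * W; their product equals X * W."""
    x_s = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * s[j] for v in w[j]] for j in range(len(w))]
    return x_s, w_s


if __name__ == "__main__":
    x = [[1.0, 50.0, -0.5], [-2.0, 40.0, 0.3]]  # channel 1 is an activation outlier
    w = [[0.5, -0.2], [0.01, 0.03], [0.4, 0.1]]
    s = smoothing_factors(x, w, alpha=0.5)
    x_s, w_s = smooth(x, w, s)
    print(max(abs(v) for row in x_s for v in row))  # outlier shrinks from 50 to about 1.22
```

In SmoothQuant the activation maxima come from calibration data, and the division by s is folded into the preceding layer (such as a LayerNorm), so no extra work is done at inference time. With α = 0.5, as here, each channel's activation and weight maxima end up equal.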

Pruning

✅ Naive (Magnitude)

✅ Wanda

✅ ShortGPT
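The difference between the Naive (Magnitude) and Wanda criteria fits in a few lines: magnitude pruning scores each weight by |W_ij|, while Wanda scores it by |W_ij| · ||X_j||_2, the weight magnitude times the L2 norm of the matching input feature over calibration tokens, and compares weights within each output row. This is an unstructured-sparsity illustration in plain Python under our own assumptions: both criteria share the same per-row comparison group here for simplicity, the helper names are hypothetical, and it is not llmc's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def prune_rows(w: Matrix, scores: Matrix, sparsity: float) -> Matrix:
    """Zero the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for w_row, s_row in zip(w, scores):
        k = int(len(w_row) * sparsity)  # number of weights to drop in this row
        drop = set(sorted(range(len(w_row)), key=lambda j: s_row[j])[:k])
        pruned.append([0.0 if j in drop else v for j, v in enumerate(w_row)])
    return pruned


def magnitude_scores(w: Matrix) -> Matrix:
    """Magnitude criterion: |W_ij|."""
    return [[abs(v) for v in row] for row in w]


def wanda_scores(w: Matrix, x: Matrix) -> Matrix:
    """Wanda criterion: |W_ij| * ||X_j||_2, with W of shape (out, in), X of shape (tokens, in)."""
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in x)) for j in range(len(w[0]))]
    return [[abs(v) * col_norms[j] for j, v in enumerate(row)] for row in w]


if __name__ == "__main__":
    w = [[0.5, -0.1, 0.3, 0.05]]
    x = [[0.1, 10.0, 0.2, 0.0], [0.1, 8.0, 0.1, 0.0]]  # input channel 1 has large activations
    print(prune_rows(w, magnitude_scores(w), 0.5))  # [[0.5, 0.0, 0.3, 0.0]]
    print(prune_rows(w, wanda_scores(w, x), 0.5))   # [[0.5, -0.1, 0.0, 0.0]]
```

With this toy data, magnitude pruning drops the small weight -0.1, whereas Wanda keeps it because it multiplies a high-norm input channel and drops 0.3 instead.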

Acknowledgments

Our code builds on the following repositories:

https://github.com/mit-han-lab/llm-awq
https://github.com/mit-han-lab/smoothquant
https://github.com/OpenGVLab/OmniQuant
https://github.com/IST-DASLab/gptq
https://github.com/ModelTC/Outlier_Suppression_Plus
https://github.com/IST-DASLab/QUIK
https://github.com/Vahe1994/SpQR
https://github.com/ilur98/DGQ
https://github.com/xvyaward/owq
https://github.com/TimDettmers/bitsandbytes
https://github.com/mobiusml/hqq
https://github.com/spcl/QuaRot
https://github.com/locuslab/wanda
https://github.com/EleutherAI/lm-evaluation-harness

Star History

Citation

If you find our LLM-QBench paper or the llmc toolkit useful or relevant to your research, please cite our papers:

@misc{llmc,
   author = {llmc contributors},
   title = {llmc: Towards Accurate and Efficient LLM Compression},
   year = {2024},
   publisher = {GitHub},
   journal = {GitHub repository},
   howpublished = {\url{https://github.com/ModelTC/llmc}},
}

@misc{gong2024llmqbench,
      title={LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models},
      author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
      year={2024},
      eprint={2405.06001},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{gong2024llmcbenchmarkinglargelanguage,
      title={LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit},
      author={Ruihao Gong and Yang Yong and Shiqiao Gu and Yushi Huang and Chentao Lv and Yunchen Zhang and Xianglong Liu and Dacheng Tao},
      year={2024},
      eprint={2405.06001},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2405.06001},
}
