Baichuan-7B supports native deployment as well as int8 and int4 quantized deployment; the code is as follows:

```python
import torch
from accelerate import dispatch_model  # required for the multi-GPU dispatch below
from transformers import AutoTokenizer, AutoModelForCausalLM


def auto_configure_device_map(num_gpus: int):
    """Spread the 32 transformer layers of Baichuan-7B evenly across GPUs."""
    num_trans_layers = 32
    per_gpu_layers = num_trans_layers / num_gpus
    device_map = {
        'model.embed_tokens': 0,
        'model.norm': num_gpus - 1,
        'lm_head': num_gpus - 1,
    }
    for i in range(num_trans_layers):
        device_map[f'model.layers.{i}'] = int(i // per_gpu_layers)
    return device_map


MODEL_NAME = "baichuan-inc/baichuan-7B"
# Use 0 (not None) when CUDA is unavailable so the comparisons below don't fail on CPU-only machines.
NUM_GPUS = torch.cuda.device_count() if torch.cuda.is_available() else 0
device_map = auto_configure_device_map(NUM_GPUS) if NUM_GPUS > 0 else None
device = torch.device("cuda") if NUM_GPUS > 0 else torch.device("cpu")
device_dtype = torch.half if NUM_GPUS > 0 else torch.float

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=device_dtype,
    trust_remote_code=True,
).quantize(8)  # int8 quantization; change 8 to 4 for int4. For native (unquantized) deployment, drop .quantize(8).
if device_map is not None:
    model = dispatch_model(model, device_map=device_map)  # shard layers across GPUs
else:
    model = model.to(device)  # CPU fallback (quantization is intended for GPU use)
model = model.eval()
```
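For completeness, a minimal inference sketch using the model loaded above (the prompt is illustrative; Baichuan-7B is a base model, so plain text-completion prompts rather than chat turns are expected):

```python
# Assumes the loading code above has already run.
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt').to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```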
Thanks to the folks in #50 for sharing this valuable approach.
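For quick reference, the three deployment modes mentioned above differ only in the model-loading line; a sketch (the `.quantize()` helper is provided by the model's `trust_remote_code` implementation):

```python
# int8 quantized deployment (as used in the code above)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True).quantize(8)

# int4 quantized deployment: change 8 to 4
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True).quantize(4)

# native (unquantized) deployment: omit .quantize() entirely
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True)
```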