[Feature]: AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet #9714

Open
CyrusCY opened this issue Oct 26, 2024 · 1 comment · May be fixed by #9720
CyrusCY commented Oct 26, 2024

Your current environment

Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.31

Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA H100 80GB HBM3

Model Input Dumps

vllm serve unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit --quantization bitsandbytes --load-format bitsandbytes --trust-remote-code --enforce-eager

Initializing an LLM engine (v0.6.3.post1) with config: model='unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit', speculative_config=None, tokenizer='unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.BITSANDBYTES, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit, num_scheduler_steps=1, chunked_prefill_enabled=False, multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=False, use_cached_outputs=True, mm_processor_kwargs=None)
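
For reference, the same load configuration should also be reproducible offline through the Python API; a minimal sketch, assuming the standard vllm.LLM entrypoint with the same flags as the serve command above (untested against this model):

# Minimal offline reproduction of the same load configuration
# (a sketch mirroring the `vllm serve` flags above; untested here).
from vllm import LLM

llm = LLM(
    model="unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    trust_remote_code=True,
    enforce_eager=True,
)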

🐛 Describe the bug

AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet

I was trying the Llama-3.2-90B-Vision-Instruct-bnb-4bit model and it fails with the error above. I'm not sure where it is best to raise this issue: unsloth, transformers, or here.
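
For context, the error seems to come from the BitsAndBytes model loader, which rejects model classes that do not declare the per-model metadata it needs. A rough, hypothetical simplification of that check (the attribute name here is my assumption and is not confirmed against the vLLM source):

def _check_bnb_support(model) -> None:
    # Hypothetical simplification of the loader-side check; the attribute
    # name `bitsandbytes_stacked_params_mapping` is an assumption on my part.
    if not hasattr(model, "bitsandbytes_stacked_params_mapping"):
        raise AttributeError(
            f"Model {type(model).__name__} does not support "
            "BitsAndBytes quantization yet")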

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
CyrusCY added the bug label Oct 26, 2024
DarkLight1337 (Member) commented:

cc @mgoin

DarkLight1337 added the feature request label and removed the bug label Oct 26, 2024
DarkLight1337 changed the title from "[Bug]: AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet" to "[Feature]: AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet" Oct 26, 2024
Isotr0py linked pull request #9720 Oct 26, 2024 that will close this issue