[Feature]: AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet #9714

Open
CyrusCY opened this issue Oct 26, 2024 · 1 comment · May be fixed by #9720
CyrusCY commented Oct 26, 2024

Your current environment

Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.31

Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA H100 80GB HBM3

Model Input Dumps

vllm serve unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit --quantization bitsandbytes --load-format bitsandbytes --trust-remote-code --enforce-eager

Initializing an LLM engine (v0.6.3.post1) with config: model='unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit', speculative_config=None, tokenizer='unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.BITSANDBYTES, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit, num_scheduler_steps=1, chunked_prefill_enabled=False, multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=False, use_cached_outputs=True, mm_processor_kwargs=None)
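
For reference, the same load configuration should also be reproducible offline through the Python API; a minimal sketch, assuming the standard vllm.LLM entrypoint with the same flags as the serve command above (untested against this model):

# Minimal offline reproduction of the same load configuration
# (a sketch mirroring the `vllm serve` flags above; untested here).
from vllm import LLM

llm = LLM(
    model="unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    trust_remote_code=True,
    enforce_eager=True,
)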

🐛 Describe the bug

AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet

I was trying the Llama-3.2-90B-Vision-Instruct-bnb-4bit model and it fails with the error above. I'm not sure where it is best to raise this issue: unsloth, transformers, or here.
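
For context, the error seems to come from the BitsAndBytes model loader, which rejects model classes that do not declare the per-model metadata it needs. A rough, hypothetical simplification of that check (the attribute name here is my assumption and is not confirmed against the vLLM source):

def _check_bnb_support(model) -> None:
    # Hypothetical simplification of the loader-side check; the attribute
    # name `bitsandbytes_stacked_params_mapping` is an assumption on my part.
    if not hasattr(model, "bitsandbytes_stacked_params_mapping"):
        raise AttributeError(
            f"Model {type(model).__name__} does not support "
            "BitsAndBytes quantization yet")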

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
CyrusCY added the bug label Oct 26, 2024
DarkLight1337 (Member) commented:

cc @mgoin

DarkLight1337 added the feature request label and removed the bug label Oct 26, 2024
DarkLight1337 changed the title from "[Bug]: AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet" to "[Feature]: AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet" Oct 26, 2024
Isotr0py linked pull request #9720 Oct 26, 2024 that will close this issue