Intel Neural Compressor Release 3.1

@chensuyue chensuyue released this 25 Oct 08:18
  • Highlights
  • Features
  • Improvements
  • Validated Hardware
  • Validated Configurations

Highlights

  • Aligned with the Habana 1.18 release, with improvements to FP8 and INT4 quantization for the Intel® Gaudi® AI accelerator
  • Provided a Transformer-like API for weight-only quantization of LLMs, giving Transformers users a one-stop experience for quantization and inference with IPEX on Intel GPU and CPU

Features

  • Add a Transformer-like quantization API for weight-only quantization of LLMs
  • Support fast quantization with a lightweight recipe and a layer-wise approach on Intel AI PCs
  • Support INT4 quantization of Visual Language Models (VLMs) such as Llava, Phi-3-vision, and Qwen-VL with the AutoRound algorithm
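
For readers new to the term, the idea behind weight-only INT4 quantization can be sketched in plain Python. This is only an illustration under stated assumptions: it uses simple symmetric round-to-nearest with one scale per group, not the AutoRound algorithm and not Intel Neural Compressor's implementation. The function names and the `group_size` parameter are hypothetical.

```python
from typing import List, Tuple


def quantize_int4_groupwise(
    weights: List[float], group_size: int = 4
) -> Tuple[List[int], List[float]]:
    """Symmetric round-to-nearest INT4 quantization with one scale per group.

    Hypothetical helper for illustration only; not part of any library.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7 (the largest positive
        # INT4 value); use a dummy scale of 1.0 for an all-zero group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            # Clamp to the signed 4-bit range [-8, 7].
            codes.append(max(-8, min(7, q)))
    return codes, scales


def dequantize_int4_groupwise(
    codes: List[int], scales: List[float], group_size: int = 4
) -> List[float]:
    """Reconstruct approximate float weights from INT4 codes and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]
```

Only the weights are stored as 4-bit codes plus a small number of scales; activations stay in floating point, which is why the approach is called weight-only. Smaller groups give tighter scales and lower reconstruction error at the cost of storing more scales.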

Improvements

  • Support loading and converting AWQ-format INT4 models for IPEX inference in the Transformer-like API
  • Enable auto-round format export for INT4 models
  • Support per-channel INT8 post-training quantization for PT2E
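
To show why per-channel INT8 quantization can matter, the sketch below compares one shared scale for a whole weight matrix (per-tensor) with one scale per output row (per-channel). It is a conceptual example in plain Python, assuming symmetric quantization to the range [-127, 127]; it does not use PT2E or Intel Neural Compressor, and all names are hypothetical.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[float]]


def symmetric_scale(values: Iterable[float], qmax: int = 127) -> float:
    """Scale that maps the largest magnitude to qmax (1.0 if all values are zero)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize_rows(weight: Matrix, per_channel: bool) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row to INT8 codes.

    per_channel=True uses one scale per row; otherwise every row shares
    a single scale computed over the whole matrix.
    """
    if per_channel:
        scales = [symmetric_scale(row) for row in weight]
    else:
        shared = symmetric_scale(v for row in weight for v in row)
        scales = [shared] * len(weight)
    codes = [
        [max(-127, min(127, round(v / s))) for v in row]
        for row, s in zip(weight, scales)
    ]
    return codes, scales


def max_abs_error(weight: Matrix, codes: List[List[int]], scales: List[float]) -> float:
    """Largest absolute difference between original and dequantized values."""
    return max(
        abs(v - q * s)
        for row, qrow, s in zip(weight, codes, scales)
        for v, q in zip(row, qrow)
    )
```

When rows differ widely in magnitude, a shared scale is dominated by the largest row and small rows collapse to zero, while per-row scales preserve them. For example, with `[[127.0, -64.0, 1.0], [0.01, -0.005, 0.002]]` the per-tensor codes for the second row are all zero.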

Validated Hardware

  • Intel Gaudi AI Accelerators (Gaudi 2 and 3)
  • Intel Xeon Scalable processors (4th, 5th, 6th Gen)
  • Intel Core Ultra Processors (Series 1 and 2)
  • Intel Data Center GPU Max Series (1100)

Validated Configurations

  • CentOS 8.4, Ubuntu 22.04, and Windows 11
  • Python 3.9, 3.10, 3.11, 3.12
  • PyTorch/IPEX 2.2, 2.3, 2.4
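
A small helper along these lines could check a local environment against the Python and PyTorch/IPEX versions listed above. It is a sketch, not an official tool: the function names are hypothetical, and it only compares major.minor numbers, so a version outside the list is unvalidated rather than necessarily unsupported.

```python
from typing import Tuple

# Versions copied from the validated configurations listed above.
VALIDATED_PYTHON = {(3, 9), (3, 10), (3, 11), (3, 12)}
VALIDATED_TORCH_IPEX = {(2, 2), (2, 3), (2, 4)}


def major_minor(version: str) -> Tuple[int, int]:
    """Parse a version string such as '2.4.0+cpu' into (2, 4)."""
    core = version.split("+", 1)[0]  # drop a local suffix such as '+cpu'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def is_validated(python_version: str, torch_version: str) -> bool:
    """True if both versions fall within the validated major.minor sets."""
    return (
        major_minor(python_version) in VALIDATED_PYTHON
        and major_minor(torch_version) in VALIDATED_TORCH_IPEX
    )
```

In practice the inputs could come from `platform.python_version()` and `torch.__version__`, for example `is_validated(platform.python_version(), torch.__version__)`.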