Merge branch 'master' into add_vlm_examples

WeiweiZhang1 authored Oct 15, 2024
2 parents 5dcb9bd + d9377b8 commit ce514db
Showing 28 changed files with 1,445 additions and 358 deletions.
44 changes: 0 additions & 44 deletions .azure-pipelines/scripts/ut/run_itrex.sh

This file was deleted.

35 changes: 0 additions & 35 deletions .azure-pipelines/ut-itrex.yml

This file was deleted.

13 changes: 0 additions & 13 deletions .github/checkgroup.yml
@@ -78,19 +78,6 @@ subprojects:
       - "UT-Basic (Unit Test other basic case Test other basic case)"
       - "UT-Basic (Unit Test other cases baseline Test other cases baseline)"

-  - id: "Unit Tests ITREX workflow"
-    paths:
-      - "neural_compressor/**"
-      - "setup.py"
-      - "requirements.txt"
-      - ".azure-pipelines/scripts/ut/run_itrex.sh"
-      - ".azure-pipelines/ut-itrex.yml"
-      - "!neural_compressor/common/**"
-      - "!neural_compressor/torch/**"
-      - "!neural_compressor/tensorflow/**"
-    checks:
-      - "UT-ITREX"
-
   - id: "Unit Tests 3x-TensorFlow workflow"
     paths:
       - "neural_compressor/common/**"
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -76,7 +76,7 @@ repos:
        )$
  - repo: https://github.com/PyCQA/docformatter
-    rev: v1.7.5
+    rev: 06907d0
    hooks:
      - id: docformatter
        args: [
11 changes: 11 additions & 0 deletions README.md
@@ -27,6 +27,7 @@ support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testi
* Collaborate with cloud marketplaces such as [Google Cloud Platform](https://console.cloud.google.com/marketplace/product/bitnami-launchpad/inc-tensorflow-intel?project=verdant-sensor-286207), [Amazon Web Services](https://aws.amazon.com/marketplace/pp/prodview-yjyh2xmggbmga#pdp-support), and [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/bitnami.inc-tensorflow-intel), software platforms such as [Alibaba Cloud](https://www.intel.com/content/www/us/en/developer/articles/technical/quantize-ai-by-oneapi-analytics-on-alibaba-cloud.html), [Tencent TACO](https://new.qq.com/rain/a/20221202A00B9S00) and [Microsoft Olive](https://github.com/microsoft/Olive), and open AI ecosystem such as [Hugging Face](https://huggingface.co/blog/intel), [PyTorch](https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html), [ONNX](https://github.com/onnx/models#models), [ONNX Runtime](https://github.com/microsoft/onnxruntime), and [Lightning AI](https://github.com/Lightning-AI/lightning/blob/master/docs/source-pytorch/advanced/post_training_quantization.rst)

## What's New
+* [2024/10] [Transformers-like API](./docs/source/3x/transformers_like_api.md) for INT4 inference on Intel CPU and GPU (see the sketch below).
* [2024/07] Starting with the 3.0 release, the framework extension API is recommended for quantization.
* [2024/07] Performance optimizations and usability improvements on [client-side](./docs/source/3x/client_quant.md).
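For orientation, here is a minimal sketch of the Transformers-like INT4 (weight-only) flow referenced in the 2024/10 item above. The import path and `RtnConfig` follow the linked 3.x documentation, but treat the exact names as assumptions here; the model name is only a placeholder.

```python
# Minimal sketch of Transformers-like INT4 weight-only quantization.
# Assumes neural_compressor 3.x with the transformers-like API available.
from neural_compressor.transformers import AutoModelForCausalLM, RtnConfig
from transformers import AutoTokenizer

model_name = "facebook/opt-125m"  # placeholder model
woq_config = RtnConfig(bits=4)    # 4-bit round-to-nearest weight-only config

# Quantize on load, then use the model like any transformers model.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("The largest animal is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```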

@@ -164,6 +165,16 @@ Intel Neural Compressor will convert the model format from auto-gptq to hpu format
<td colspan="2" align="center"><a href="./docs/source/3x/TF_SQ.md">Smooth Quantization</a></td>
</tr>
</tbody>
+<thead>
+  <tr>
+    <th colspan="8">Transformers-like APIs</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td colspan="8" align="center"><a href="./docs/source/3x/transformers_like_api.md">Overview</a></td>
+  </tr>
+</tbody>
<thead>
<tr>
<th colspan="8">Other Modules</th>
2 changes: 2 additions & 0 deletions docs/source/3x/transformers_like_api.md
@@ -208,6 +208,8 @@ python run_generation_gpu_woq.py --woq --benchmark --model save_dir
> Note:
> * Saving the quantized model should be done before `optimize_transformers` is called.
> * The `optimize_transformers` function is designed to optimize transformer-based models within frontend Python modules, with a particular focus on Large Language Models (LLMs). It provides both model-wise and content-generation optimizations. For details on `optimize_transformers`, please refer to [this guide](https://github.com/intel/intel-extension-for-pytorch/blob/xpu-main/docs/tutorials/llm/llm_optimize_transformers.md).
+> * The quantization process runs on the CPU accelerator by default. Users can override this by setting the environment variable `INC_TARGET_DEVICE`, e.g. in bash: `export INC_TARGET_DEVICE=xpu`.
+> * On Linux systems, configure the environment appropriately to achieve optimal performance, for example by setting `OMP_NUM_THREADS` explicitly. On processors with a hybrid architecture (both P-cores and E-cores), it is recommended to bind tasks to all P-cores using `taskset`, as in the sketch below.
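A minimal sketch of that Linux setup, assuming a hybrid CPU whose P-cores are logical CPUs 0-7; the thread count, core list, and script name are placeholders to adapt to your machine:

```bash
# Minimal sketch of the Linux tuning described in the note above.
# On hybrid Intel CPUs the P-core IDs can usually be read from sysfs
# (path may vary by kernel version):
cat /sys/devices/cpu_core/cpus   # e.g. prints "0-7"

export OMP_NUM_THREADS=8         # match the number of bound P-cores

# Bind the workload to the P-cores; the script name is a placeholder.
taskset -c 0-7 python run_generation_cpu_woq.py --woq --benchmark --model save_dir
```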
## Examples

Expand Down