Merge branch 'master' into add_vlm_examples

WeiweiZhang1 authored Oct 15, 2024
2 parents 5dcb9bd + d9377b8 commit ce514db
Showing 28 changed files with 1,445 additions and 358 deletions.
44 changes: 0 additions & 44 deletions .azure-pipelines/scripts/ut/run_itrex.sh

This file was deleted.

35 changes: 0 additions & 35 deletions .azure-pipelines/ut-itrex.yml

This file was deleted.

13 changes: 0 additions & 13 deletions .github/checkgroup.yml
@@ -78,19 +78,6 @@ subprojects:
       - "UT-Basic (Unit Test other basic case Test other basic case)"
       - "UT-Basic (Unit Test other cases baseline Test other cases baseline)"

-  - id: "Unit Tests ITREX workflow"
-    paths:
-      - "neural_compressor/**"
-      - "setup.py"
-      - "requirements.txt"
-      - ".azure-pipelines/scripts/ut/run_itrex.sh"
-      - ".azure-pipelines/ut-itrex.yml"
-      - "!neural_compressor/common/**"
-      - "!neural_compressor/torch/**"
-      - "!neural_compressor/tensorflow/**"
-    checks:
-      - "UT-ITREX"
-
   - id: "Unit Tests 3x-TensorFlow workflow"
     paths:
       - "neural_compressor/common/**"
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -76,7 +76,7 @@ repos:
        )$
  - repo: https://github.com/PyCQA/docformatter
-    rev: v1.7.5
+    rev: 06907d0
    hooks:
      - id: docformatter
        args: [
11 changes: 11 additions & 0 deletions README.md
@@ -27,6 +27,7 @@ support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testi
* Collaborate with cloud marketplaces such as [Google Cloud Platform](https://console.cloud.google.com/marketplace/product/bitnami-launchpad/inc-tensorflow-intel?project=verdant-sensor-286207), [Amazon Web Services](https://aws.amazon.com/marketplace/pp/prodview-yjyh2xmggbmga#pdp-support), and [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/bitnami.inc-tensorflow-intel), software platforms such as [Alibaba Cloud](https://www.intel.com/content/www/us/en/developer/articles/technical/quantize-ai-by-oneapi-analytics-on-alibaba-cloud.html), [Tencent TACO](https://new.qq.com/rain/a/20221202A00B9S00) and [Microsoft Olive](https://github.com/microsoft/Olive), and open AI ecosystem such as [Hugging Face](https://huggingface.co/blog/intel), [PyTorch](https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html), [ONNX](https://github.com/onnx/models#models), [ONNX Runtime](https://github.com/microsoft/onnxruntime), and [Lightning AI](https://github.com/Lightning-AI/lightning/blob/master/docs/source-pytorch/advanced/post_training_quantization.rst)

## What's New
+* [2024/10] [Transformers-like API](./docs/source/3x/transformers_like_api.md) for INT4 inference on Intel CPU and GPU (see the sketch below).
* [2024/07] Starting with the 3.0 release, the framework extension API is recommended for quantization.
* [2024/07] Performance optimizations and usability improvements on [client-side](./docs/source/3x/client_quant.md).
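For orientation, here is a minimal sketch of the Transformers-like INT4 (weight-only) flow referenced in the 2024/10 item above. The import path and `RtnConfig` follow the linked 3.x documentation, but treat the exact names as assumptions here; the model name is only a placeholder.

```python
# Minimal sketch of Transformers-like INT4 weight-only quantization.
# Assumes neural_compressor 3.x with the transformers-like API available.
from neural_compressor.transformers import AutoModelForCausalLM, RtnConfig
from transformers import AutoTokenizer

model_name = "facebook/opt-125m"  # placeholder model
woq_config = RtnConfig(bits=4)    # 4-bit round-to-nearest weight-only config

# Quantize on load, then use the model like any transformers model.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("The largest animal is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```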

@@ -164,6 +165,16 @@ Intel Neural Compressor will convert the model format from auto-gptq to hpu format
<td colspan="2" align="center"><a href="./docs/source/3x/TF_SQ.md">Smooth Quantization</a></td>
</tr>
</tbody>
+<thead>
+  <tr>
+    <th colspan="8">Transformers-like APIs</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td colspan="8" align="center"><a href="./docs/source/3x/transformers_like_api.md">Overview</a></td>
+  </tr>
+</tbody>
<thead>
<tr>
<th colspan="8">Other Modules</th>
2 changes: 2 additions & 0 deletions docs/source/3x/transformers_like_api.md
@@ -208,6 +208,8 @@ python run_generation_gpu_woq.py --woq --benchmark --model save_dir
> Note:
> * Saving the quantized model should be done before `optimize_transformers` is called.
> * The `optimize_transformers` function is designed to optimize transformer-based models within frontend Python modules, with a particular focus on Large Language Models (LLMs). It provides both model-wise and content-generation optimizations. For details on `optimize_transformers`, please refer to [this guide](https://github.com/intel/intel-extension-for-pytorch/blob/xpu-main/docs/tutorials/llm/llm_optimize_transformers.md).
+> * The quantization process runs on the CPU accelerator by default. Users can override this by setting the environment variable `INC_TARGET_DEVICE`, e.g. in bash: `export INC_TARGET_DEVICE=xpu`.
+> * On Linux systems, configure the environment appropriately to achieve optimal performance, for example by setting `OMP_NUM_THREADS` explicitly. On processors with a hybrid architecture (both P-cores and E-cores), it is recommended to bind tasks to all P-cores using `taskset`, as in the sketch below.
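A minimal sketch of that Linux setup, assuming a hybrid CPU whose P-cores are logical CPUs 0-7; the thread count, core list, and script name are placeholders to adapt to your machine:

```bash
# Minimal sketch of the Linux tuning described in the note above.
# On hybrid Intel CPUs the P-core IDs can usually be read from sysfs
# (path may vary by kernel version):
cat /sys/devices/cpu_core/cpus   # e.g. prints "0-7"

export OMP_NUM_THREADS=8         # match the number of bound P-cores

# Bind the workload to the P-cores; the script name is a placeholder.
taskset -c 0-7 python run_generation_cpu_woq.py --woq --benchmark --model save_dir
```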
## Examples

Expand Down