Squashed commit of the following:
commit 2bb257e
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Thu Oct 10 19:27:11 2024 +0800

    Add woq examples (#1982)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
    Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

commit 586eb88
Author: Huang, Tai <tai.huang@intel.com>
Date:   Wed Oct 9 09:22:39 2024 +0800

    add transformers-like api link in readme (#2022)

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

commit 4e9c764
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Oct 8 13:13:45 2024 +0800

    Remove itrex dependency for 3x example (#2016)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

commit a0066d4
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Mon Sep 30 18:17:32 2024 +0800

    Fix transformers rtn layer-wise quant (#2008)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

commit 802a5af
Author: Huang, Tai <tai.huang@intel.com>
Date:   Mon Sep 30 17:02:52 2024 +0800

    add autoround EMNLP24 to pub list (#2014)

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

commit 44795a1
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Mon Sep 30 16:55:22 2024 +0800

    Adapt transformers 4.45.1 (#2019)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Co-authored-by: changwangss <chang1.wang@intel.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

commit d4662ad
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Mon Sep 30 15:52:17 2024 +0800

    Add transformers-like api doc (#2018)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 72398b6
Author: Wang, Chang <chang1.wang@intel.com>
Date:   Fri Sep 27 15:11:04 2024 +0800

    fix xpu device set weight and bias (#2010)

    Signed-off-by: changwangss <chang1.wang@intel.com>
    Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

commit 9d27743
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date:   Fri Sep 27 14:17:24 2024 +0800

    Update model accuracy (#2006)

    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

commit 7bbc473
Author: xinhe <xin3.he@intel.com>
Date:   Fri Sep 27 11:47:00 2024 +0800

    add pad_to_buckets in evaluation for hpu performance (#2011)

    * add pad_to_buckets in evaluation for hpu performance
    ---------

    Signed-off-by: xin3he <xin3.he@intel.com>

commit b6b7d7c
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Thu Sep 26 17:21:54 2024 +0800

    Update auto_round requirements for transformers example (#2013)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit ee600ba
Author: Wang, Chang <chang1.wang@intel.com>
Date:   Fri Sep 20 13:54:06 2024 +0800

    add repack_awq_to_optimum_format function (#1998)

    Signed-off-by: changwangss <chang1.wang@intel.com>

commit 4ee6861
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date:   Thu Sep 19 22:27:05 2024 +0800

    remove accelerate version in unit test (#2007)

    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

commit 2445811
Author: WeiweiZhang1 <weiwei1.zhang@intel.com>
Date:   Sat Sep 14 18:13:30 2024 +0800

    enable auto_round format export (#2002)

    Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

commit 906333a
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Sat Sep 14 16:17:46 2024 +0800

    Replace FORCE_DEVICE with INC_TARGET_DEVICE [transformers] (#2005)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 443d007
Author: xinhe <xin3.he@intel.com>
Date:   Fri Sep 13 21:35:32 2024 +0800

    add INC_FORCE_DEVICE introduction (#1988)

    * add INC_FORCE_DEVICE introduction

    Signed-off-by: xin3he <xin3.he@intel.com>

    * Update PyTorch.md

    * Update PyTorch.md

    * Update docs/source/3x/PyTorch.md

    Co-authored-by: Yi Liu <yi4.liu@intel.com>

    * rename to INC_TARGET_DEVICE

    Signed-off-by: xin3he <xin3.he@intel.com>

    ---------

    Signed-off-by: xin3he <xin3.he@intel.com>
    Co-authored-by: Yi Liu <yi4.liu@intel.com>

commit 5de9a4f
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Fri Sep 13 20:48:22 2024 +0800

    Support transformers-like api for woq quantization (#1987)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Wang, Chang <chang1.wang@intel.com>

commit 9c39b42
Author: chen, suyue <suyue.chen@intel.com>
Date:   Thu Sep 12 14:34:49 2024 +0800

    update docker image prune rules (#2003)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 09d4f2d
Author: Huang, Tai <tai.huang@intel.com>
Date:   Mon Sep 9 09:24:35 2024 +0800

    Add recent publications (#1995)

    * add recent publications

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

    * update total count

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

    ---------

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

commit 399cd44
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Sep 3 16:37:09 2024 +0800

    Remove the save of gptq config (#1993)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 05272c4
Author: Yi Liu <yi4.liu@intel.com>
Date:   Tue Sep 3 10:21:51 2024 +0800

    add per_channel_minmax (#1990)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 82d8c06
Author: chen, suyue <suyue.chen@intel.com>
Date:   Fri Aug 30 21:21:00 2024 +0800

    update 3x pt binary build (#1992)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit e9f06af
Author: Huang, Tai <tai.huang@intel.com>
Date:   Fri Aug 30 17:49:48 2024 +0800

    Update installation_guide.md (#1989)

    Correct typo in installation doc

commit 093c966
Author: Wang, Chang <chang1.wang@intel.com>
Date:   Fri Aug 30 17:45:54 2024 +0800

    add quantize, save, load function for transformers-like api (#1986)

    Signed-off-by: changwangss <chang1.wang@intel.com>

commit 4dd49a4
Author: xinhe <xin3.he@intel.com>
Date:   Thu Aug 29 17:23:18 2024 +0800

    add hasattr check for torch fp8 dtype (#1985)

    Signed-off-by: xin3he <xin3.he@intel.com>

commit f2c454f
Author: chen, suyue <suyue.chen@intel.com>
Date:   Thu Aug 29 13:45:39 2024 +0800

    update installation and ci test for 3x api (#1991)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 7ba9fdc
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Mon Aug 19 14:50:50 2024 +0800

    support gptq `true_sequential` and `quant_lm_head` (#1977)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 68b1f8b
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date:   Fri Aug 16 09:43:46 2024 +0800

    Fix UT env and upgrade torch to 2.4.0 (#1978)

    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

commit f9dfd54
Author: Yi Liu <yi4.liu@intel.com>
Date:   Thu Aug 15 14:13:26 2024 +0800

    Skip some tests for torch 2.4 (#1981)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 46d9192
Author: xinhe <xin3.he@intel.com>
Date:   Thu Aug 15 09:57:22 2024 +0800

    update readme for fp8 (#1979)

    Signed-off-by: xinhe3 <xinhe3@habana.ai>

commit 842b715
Author: chen, suyue <suyue.chen@intel.com>
Date:   Tue Aug 13 12:09:25 2024 +0800

    bump main version into v3.1 (#1974)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 3845cdc
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date:   Tue Aug 13 12:09:09 2024 +0800

    fix online doc search issue (#1975)

    Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>

commit 7056720
Author: chen, suyue <suyue.chen@intel.com>
Date:   Sun Aug 11 20:58:34 2024 +0800

    update main page (#1973)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 95197d1
Author: xinhe <xin3.he@intel.com>
Date:   Sat Aug 10 23:28:43 2024 +0800

    Cherry pick v1.17.0 (#1964)

    * [SW-184941] INC CI, CD and Promotion

    Change-Id: I60c420f9776e1bdab7bb9e02e5bcbdb6891bfe52

    * [SW-183320]updated setup.py

    Change-Id: I592af89486cb1d9e0b5197521c428920197a9103

    * [SW-177474] add HQT FP8 porting code

    Change-Id: I4676f13a5ed43c444f2ec68675cc41335e7234dd
    Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>

    * [SW-189361] Fix white list extend

    Change-Id: Ic2021c248798fce37710d28014a6d59259c868a3

    * [SW-191317] Raise exception according to hqt config object

    Change-Id: I06ba8fa912c811c88912987c11e5c12ef328348a

    * [SW-184714] Port HQT code into INC

    HQT lib content was copied as is under fp8_quant

    Tests were copied to 3.x torch location

    Change-Id: Iec6e1fa7ac4bf1df1c95b429524c40e32bc13ac9

    * [SW-184714] Add internal folder to fp8 quant

    This is a folder used for experiments,
    not to be used by users

    Change-Id: I9e221ae582794e304e95392c0f37638f7bce69bc

    * [SW-177468] Removed unused code + cleanup

    Change-Id: I4d27c067e87c1a30eb1da9df16a16c46d092c638

    * Fix errors in regression_detection

    Change-Id: Iee5318bd5593ba349812516eb5641958ece3c438

    * [SW-187731] Save orig module as member of patched module

    This allows direct usage of the original module methods,
    which solves a torch.compile issue

    Change-Id: I464d8bd1bacdfc3cd1f128a67114e1e43f092632

    * [SW-190899] Install packages according to configuration

    Change-Id: I570b490658f5d2c5399ba1db93f8f52f56449525

    * [SW-184689] use finalize_calibration internally for one-step flow

    Change-Id: Ie0b8b426c951cf57ed7e6e678c86813fb2d05c89

    * [SW-191945] align requirement_pt.txt in Gerrit INC with GitHub INC

    Change-Id: If5c0dbf21bf989af37a8e29246e4f8760cd215ef
    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * [SW-192358] Remove HQT reference in INC

    Change-Id: Ic25f9323486596fa2dc6d909cd568a37ab84dd5e

    * [SW-191415] update fp8 maxAbs observer using torch.copy_

    Change-Id: I3923c832f9a8a2b14e392f3f4719d233a457702f

    * [SW-184943] Enhance INC WOQ model loading

    - Support loading Hugging Face WOQ models
    - Abstract a WeightOnlyLinear base class; add INCWeightOnlyLinear and HPUWeightOnlyLinear subclasses
    - Load WOQ linear weights module by module
    - Save HPU-format tensors so they can be reused on subsequent loads

    Change-Id: I679a42759b49e1f45f52bbb0bdae8580a23d0bcf

    * [SW-190303] Implement HPUWeightOnlyLinear class in INC

    Change-Id: Ie05c8787e708e2c3559dce24ef0758d6c498ac41

    * [SW-192809] fix json_file bug when instantiating FP8Config class

    Change-Id: I4a715d0a706efe20ccdb49033755cabbc729ccdc
    Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>

    * [SW-192931] align setup.py with GitHub INC and remove fp8_convert

    Change-Id: Ibbc157646cfcfad64b323ecfd96b9bbda5ba9e2f
    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * [SW-192917] Update all HQT logic files with pre-commit check

    Change-Id: I119dc8578cb10932fd1a8a674a8bdbf61f978e42
    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * update docstring

    Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

    * add fp8 example and document (#1639)

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * Update settings to be compatible with gerrit

    * enhance ut

    Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

    * move fp8 sample to helloworld folder

    Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

    * update torch version of habana docker

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

    * update readme demo

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * update WeightOnlyLinear to INCWeightOnlyLinear

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

    * add docstring for FP8Config

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * fix pylint

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * update fp8 test scripts

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * delete deps

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * update container into v1.17.0

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * update docker version

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * update pt ut

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * add lib path

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * fix dir issue

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

    * update fp8 test scope

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * fix typo

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * update fp8 test scope

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * update pre-commit-ci

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * work around for hpu

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * fix UT

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * fix parameter

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * omit some test

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * update main page example to llm loading

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    * [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

    * fix autotune

    Signed-off-by: xinhe3 <xinhe3@hababa.ai>

    ---------

    Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
    Signed-off-by: xinhe3 <xinhe3@hababa.ai>
    Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
    Signed-off-by: chensuyue <suyue.chen@intel.com>
    Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
    Co-authored-by: Ron Ben Moshe <rbenmoshe@habana.ai>
    Co-authored-by: Uri Livne <ulivne@habana.ai>
    Co-authored-by: Danny Semiat <dsemiat@habana.ai>
    Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
    Co-authored-by: Dudi Lester <dlester@habana.ai>

commit de0fa21
Author: Huang, Tai <tai.huang@intel.com>
Date:   Fri Aug 9 22:32:37 2024 +0800

    Fix broken link in docs (#1969)

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

commit 385da7c
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date:   Fri Aug 9 21:53:51 2024 +0800

    Add 3.x readme (#1971)

    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

commit acd8f4f
Author: Huang, Tai <tai.huang@intel.com>
Date:   Fri Aug 9 15:24:14 2024 +0800

    Add version mapping between INC and Gaudi SW Stack (#1967)

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

commit 74a4641
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date:   Fri Aug 9 10:23:59 2024 +0800

    remove unnecessary CI (#1966)

    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

commit b99abae
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Aug 6 16:02:03 2024 +0800

    Fix `opt_125m_woq_gptq_int4_dq_ggml` issue (#1965)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit b35ff8f
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date:   Fri Aug 2 09:06:35 2024 +0800

    example update for 3.x ipex sq (#1902)

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

commit 000946f
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date:   Thu Aug 1 10:19:32 2024 +0800

    add SDXL model example to INC 3.x (#1887)

    * add SDXL model example to INC 3.x

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

    * add evaluation script

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * add test script

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * minor fix

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * Update run_quant.sh

    * add iter limit

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * modify test script

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * update json

    Signed-off-by: chensuyue <suyue.chen@intel.com>

    * add requirements

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * Update run_benchmark.sh

    * Update sdxl_smooth_quant.py

    * minor fix

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    ---------

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
    Signed-off-by: chensuyue <suyue.chen@intel.com>
    Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
    Co-authored-by: chensuyue <suyue.chen@intel.com>

commit aa42e5e
Author: xinhe <xin3.he@intel.com>
Date:   Wed Jul 31 15:36:06 2024 +0800

    replenish docstring (#1955)

    * replenish docstring

    Signed-off-by: xin3he <xin3.he@intel.com>

    * update  Quantizer API docstring

    Signed-off-by: xin3he <xin3.he@intel.com>

    * Add docstring for auto accelerator (#1956)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

    * temporary remove torch/quantization and add it back after fp8 code is updated.

    * Update config.py

    ---------

    Signed-off-by: xin3he <xin3.he@intel.com>
    Signed-off-by: yiliu30 <yi4.liu@intel.com>
    Co-authored-by: Yi Liu <106061964+yiliu30@users.noreply.github.com>

commit 81a076d
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date:   Wed Jul 31 13:51:33 2024 +0800

    fix welcome.html link issue (#1962)

    Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>

commit 87f02c1
Author: chen, suyue <suyue.chen@intel.com>
Date:   Wed Jul 31 10:09:47 2024 +0800

    fix docs link (#1959)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 03813e2
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Jul 31 10:09:29 2024 +0800

    Bump tensorflow version (#1961)

    Signed-off-by: dependabot[bot] <support@github.com>

commit 3b5dbf6
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Jul 30 17:27:21 2024 +0800

    Set low_gpu_mem_usage=False for AutoRound

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 41244d3
Author: chen, suyue <suyue.chen@intel.com>
Date:   Mon Jul 29 23:05:36 2024 +0800

    new previous results could not find all raised issues in CI model test (#1958)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 190e6b2
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Mon Jul 29 19:39:57 2024 +0800

    Fix itrex qbits nf4/int8 training core dumped issue (#1954)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 0e724a4
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Mon Jul 29 16:22:13 2024 +0800

    Add save/load for pt2e example (#1927)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 50eb6fb
Author: chen, suyue <suyue.chen@intel.com>
Date:   Mon Jul 29 13:40:36 2024 +0800

    update 3x torch installation (#1957)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 6e1b1da
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date:   Fri Jul 26 15:58:00 2024 +0800

    add ipex xpu example to 3x API (#1948)

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

commit 19024b3
Author: zehao-intel <zehao.huang@intel.com>
Date:   Fri Jul 26 14:52:01 2024 +0800

    Enable yolov5 Example for TF 3x API (#1943)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit d84a93f
Author: zehao-intel <zehao.huang@intel.com>
Date:   Thu Jul 25 14:45:19 2024 +0800

    Complement UT of calibration function for TF 3x API (#1945)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit fb85779
Author: zehao-intel <zehao.huang@intel.com>
Date:   Thu Jul 25 14:04:25 2024 +0800

    Update Examples for TF 3x API (#1901)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit 6b30207
Author: zehao-intel <zehao.huang@intel.com>
Date:   Thu Jul 25 13:39:06 2024 +0800

    Add Docstring for TF 3x API and Torch 3x Mixed Precision (#1944)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit d254d50
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Wed Jul 24 21:50:44 2024 +0800

    Update doc for client-usage and LWQ (#1947)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit f253d35
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date:   Wed Jul 24 17:48:05 2024 +0800

    Update publish.yml (#1950)

commit 6cda338
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date:   Wed Jul 24 17:31:19 2024 +0800

    Update publish.yml (#1949)

    * Update publish.yml

    * Update publish.yml

commit c80b68a
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Jul 23 21:26:53 2024 +0800

    Update AutoRound commit version (#1941)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 9077b38
Author: zehao-intel <zehao.huang@intel.com>
Date:   Tue Jul 23 17:04:37 2024 +0800

    Refine Pytorch 3x Mixed Precision Example (#1946)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit efcb293
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date:   Tue Jul 23 10:15:41 2024 +0800

    Update for API 3.0 online doc (#1940)

    Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>

commit b787940
Author: Wang, Mengni <mengni.wang@intel.com>
Date:   Tue Jul 23 10:12:34 2024 +0800

    add docstring for mx quant (#1932)

    Signed-off-by: Mengni Wang <mengni.wang@intel.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: xinhe <xin3.he@intel.com>

commit 0c52e12
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Jul 23 09:59:17 2024 +0800

    Add docstring for WOQ & LayerWise (#1938)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: xinhe <xin3.he@intel.com>

commit 08914d6
Author: Huang, Tai <tai.huang@intel.com>
Date:   Mon Jul 22 11:14:44 2024 +0800

    add read permission token (#1942)

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

commit e106dea
Author: zehao-intel <zehao.huang@intel.com>
Date:   Sun Jul 21 21:48:51 2024 +0800

    Update Example for Pytorch 3x Mixed Precision (#1882)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit 1ebf698
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date:   Fri Jul 19 15:56:09 2024 +0800

    add docstring for static quant and smooth quant (#1936)

    * add docstring for static quant and smooth quant

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * format fix

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * update scan path

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

    * Update utility.py

    ---------

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
    Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

commit 296c5d4
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Fri Jul 19 15:08:05 2024 +0800

    Add docstring for PT2E and HQQ (#1937)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 437c8e7
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Thu Jul 18 10:00:41 2024 +0800

    Fix unused pkgs import (#1931)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit ff37401
Author: chen, suyue <suyue.chen@intel.com>
Date:   Wed Jul 17 23:11:15 2024 +0800

    3.X API installation update (#1935)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 6c27c19
Author: zehao-intel <zehao.huang@intel.com>
Date:   Wed Jul 17 20:35:42 2024 +0800

    Support calib_func on TF 3x API (#1934)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit 53e6ee6
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date:   Wed Jul 17 20:35:03 2024 +0800

    Support xpu for ipex static quant (#1916)

    Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

commit a1cc618
Author: chen, suyue <suyue.chen@intel.com>
Date:   Wed Jul 17 17:29:49 2024 +0800

    remove peft version limit (#1933)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 3058388
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Wed Jul 17 15:31:38 2024 +0800

    Add doc for client usage (#1914)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 29471df
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Wed Jul 17 12:12:40 2024 +0800

    Enhance load_empty_model import (#1930)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit fd96851
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Wed Jul 17 12:05:32 2024 +0800

    Integrate AutoRound v0.3 to 2x (#1926)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit bfa27e4
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Wed Jul 17 09:33:13 2024 +0800

    Integrate AutoRound v0.3 (#1925)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 5767aed
Author: xinhe <xin3.he@intel.com>
Date:   Wed Jul 17 09:16:37 2024 +0800

    add docstring for torch.quantization and torch.utils (#1928)

    Signed-off-by: xin3he <xin3.he@intel.com>

commit f909bca
Author: chen, suyue <suyue.chen@intel.com>
Date:   Tue Jul 16 21:12:54 2024 +0800

    update itrex ut test (#1929)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 649e6b1
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Jul 16 21:05:55 2024 +0800

    Support LayerWise for RTN/GPTQ (#1883)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Co-authored-by: chensuyue <suyue.chen@intel.com>

commit de43d85
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Jul 16 17:18:12 2024 +0800

    Support absorb dict for awq (#1920)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit e976595
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Jul 16 17:17:56 2024 +0800

    Support woq Autotune (#1921)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit d56075c
Author: Huang, Tai <tai.huang@intel.com>
Date:   Tue Jul 16 15:21:06 2024 +0800

    fix typo in architecture diagram (#1924)

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

commit 0a54239
Author: chen, suyue <suyue.chen@intel.com>
Date:   Tue Jul 16 15:12:43 2024 +0800

    update documentation for 3x API (#1923)

    Signed-off-by: chensuyue <suyue.chen@intel.com>
    Signed-off-by: xin3he <xin3.he@intel.com>
    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit be42d03
Author: xinhe <xin3.he@intel.com>
Date:   Tue Jul 16 09:48:48 2024 +0800

    implement TorchBaseConfig (#1911)

    Signed-off-by: xin3he <xin3.he@intel.com>

commit 7a4715c
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Mon Jul 15 14:59:03 2024 +0800

    Support PT2E save and load (#1918)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 34f0a9f
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Mon Jul 15 09:10:14 2024 +0800

    Add `save`/`load` support for HQQ (#1913)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>
    Co-authored-by: chen, suyue <suyue.chen@intel.com>

commit d320460
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Fri Jul 12 14:48:12 2024 +0800

    remove 1x docs (#1900)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 6c547f7
Author: chen, suyue <suyue.chen@intel.com>
Date:   Fri Jul 12 14:42:04 2024 +0800

    fix CI docker container clean up issue (#1917)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 1703658
Author: chen, suyue <suyue.chen@intel.com>
Date:   Fri Jul 12 11:14:48 2024 +0800

    Remove deprecated modules (#1872)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit f698c96
Author: chen, suyue <suyue.chen@intel.com>
Date:   Thu Jul 11 18:00:28 2024 +0800

    update Gaudi CI baseline artifacts name (#1912)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 4a45093
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Thu Jul 11 17:47:47 2024 +0800

    Add export support for TEQ (#1910)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 16a7b11
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Thu Jul 11 17:13:24 2024 +0800

    Get default config based on the auto-detect CPU type (#1904)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 2fc7255
Author: xinhe <xin3.he@intel.com>
Date:   Thu Jul 11 13:22:52 2024 +0800

    implement `incbench` command for ease-of-use benchmark (#1884)

    * implement incbench command as entrypoint for ease-of-use benchmarking
    * automatically check NUMA/socket info and dump it as a table for ease of understanding
    * support both Linux and Windows platforms
    * add benchmark documents
    * dump benchmark summary
    * add benchmark UTs

    Usage:

    * incbench main.py: run 1 instance on NUMA:0.
    * incbench --num_i 2 main.py: run 2 instances on NUMA:0.
    * incbench --num_c 2 main.py: run multi-instances with 2 cores per instance on NUMA:0.
    * incbench -C 24-47 main.py: run 1 instance on COREs:24-47.
    * incbench -C 24-47 --num_c 4 main.py: run multi-instances with 4 COREs per instance on COREs:24-47.

    ---------

    Signed-off-by: xin3he <xin3.he@intel.com>
    Co-authored-by: chen, suyue <suyue.chen@intel.com>

commit de8577e
Author: chen, suyue <suyue.chen@intel.com>
Date:   Wed Jul 10 17:21:45 2024 +0800

    bump version into 3.0 (#1908)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 01f16c4
Author: chen, suyue <suyue.chen@intel.com>
Date:   Wed Jul 10 17:19:57 2024 +0800

    support habana fp8 UT test in CI (#1909)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 28578b9
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Wed Jul 10 13:19:27 2024 +0800

    Add docstring for `common` module (#1905)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 5fde50f
Author: Wang, Chang <chang1.wang@intel.com>
Date:   Wed Jul 10 10:34:46 2024 +0800

    update fp4_e2m1 mapping list (#1906)

    * update fp4_e2m1 mapping list

    * Update utility.py

    * [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

    ---------

    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

commit 3fe2fd9
Author: xinhe <xin3.he@intel.com>
Date:   Tue Jul 9 15:01:25 2024 +0800

    fix bf16 symbolic_trace bug (#1892)

    Description: fix bf16 symbolic_trace bug, which

    - caused abnormal recursive calling
    - was missing necessary attributes

    Fixed by moving the BF16 fallback ahead of quantization and removing bf16_symbolic_trace.

    ---------

    Signed-off-by: xin3he <xin3.he@intel.com>
    Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

commit e080e06
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date:   Tue Jul 9 11:04:30 2024 +0800

    remove neural insight CI (#1903)

    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

commit f28fcee
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Fri Jul 5 15:47:37 2024 +0800

    Remove 1x API (#1865)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>
    Co-authored-by: chen, suyue <suyue.chen@intel.com>

commit 1386ac5
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Thu Jul 4 12:18:03 2024 +0800

    Port auto-detect absorb layers for TEQ (#1895)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 856118e
Author: Wang, Chang <chang1.wang@intel.com>
Date:   Wed Jul 3 13:50:00 2024 +0800

    remove import pdb (#1897)

    Signed-off-by: changwangss <chang1.wang@intel.com>

commit f75ff40
Author: xinhe <xin3.he@intel.com>
Date:   Wed Jul 3 13:07:48 2024 +0800

    support auto_host2device on RTN and GPTQ (#1894)

    Signed-off-by: He, Xin3 <xin3.he@intel.com>

commit b9e73f5
Author: chen, suyue <suyue.chen@intel.com>
Date:   Wed Jul 3 11:10:45 2024 +0800

    tmp fix nas deps issue (#1896)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 63b2912
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Tue Jul 2 14:46:02 2024 +0800

    Refine HQQ UTs (#1888)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 5592acc
Author: zehao-intel <zehao.huang@intel.com>
Date:   Tue Jul 2 14:18:51 2024 +0800

    Remove Gelu Fusion for TF Newapi (#1886)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit 4372a76
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Fri Jun 28 14:55:10 2024 +0800

    Fix sql injection for Neural Solution gRPC (#1879)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 4ae2e87
Author: xinhe <xin3.he@intel.com>
Date:   Thu Jun 27 09:56:52 2024 +0800

    support quant_lm_head arg in all WOQ configs (#1881)

    Signed-off-by: xin3he <xin3.he@intel.com>

commit cc763f5
Author: Dina Suehiro Jones <dina.s.jones@intel.com>
Date:   Wed Jun 26 18:29:06 2024 -0700

    Update the Gaudi container example in the README (#1885)

commit 1f58f02
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Thu Jun 20 22:03:45 2024 +0800

    Add `set_local` support for static quant with pt2e (#1870)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 0341295
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Wed Jun 19 09:40:11 2024 +0800

    rm cov (#1878)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 503d9ef
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Jun 18 17:12:12 2024 +0800

    Add op statistics dump for woq (#1876)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit 5a0374e
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Tue Jun 18 16:21:05 2024 +0800

    Enhance autotune to return the best `q_model` directly (#1875)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 90fb431
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Tue Jun 18 16:06:04 2024 +0800

    fix layer match (#1873)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

commit f4eb660
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date:   Mon Jun 17 16:12:06 2024 +0800

    Limit numpy versions (#1874)

    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

commit 2928d85
Author: chen, suyue <suyue.chen@intel.com>
Date:   Fri Jun 14 21:51:13 2024 +0800

    update v2.6 release readme (#1871)

    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 48c5e3a
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Fri Jun 14 21:10:14 2024 +0800

    Modify WOQ examples structure (#1866)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 498af74
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date:   Fri Jun 14 21:09:36 2024 +0800

    Update SQ/WOQ status (#1869)

    Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
    Co-authored-by: chen, suyue <suyue.chen@intel.com>

commit b401b02
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Fri Jun 14 17:48:03 2024 +0800

    Add PT2E cv&llm example (#1853)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit e470f6c
Author: xinhe <xin3.he@intel.com>
Date:   Fri Jun 14 17:34:26 2024 +0800

    [3x] add recommendation examples (#1844)

    Signed-off-by: xin3he <xin3.he@intel.com>

commit a141512
Author: zehao-intel <zehao.huang@intel.com>
Date:   Fri Jun 14 14:56:30 2024 +0800

    Improve UT Branch Coverage for TF 3x (#1867)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit b99a79d
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date:   Fri Jun 14 14:10:49 2024 +0800

    modify 3.x ipex example structure (#1858)

    * modify 3.x ipex example structure

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

    * add json path

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

    * fix for sq

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

    * minor fix

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

    * Update run_clm_no_trainer.py

    * Update run_clm_no_trainer.py

    * Update run_clm_no_trainer.py

    * minor fix

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

    * remove old files

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

    * fix act_algo

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

    ---------

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
    Co-authored-by: xinhe <xin3.he@intel.com>

commit 922b247
Author: zehao-intel <zehao.huang@intel.com>
Date:   Fri Jun 14 12:33:39 2024 +0800

    Add TF 3x Examples (#1839)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>

commit 70a1d50
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date:   Fri Jun 14 10:17:33 2024 +0800

    fix 3x ipex static quant regression (#1864)

    Description: fix 3x ipex static quant regression

    - cannot fallback with op type name ('linear')
    - dumps wrong op stats (no 'Linear&relu' op type)
    ---------

    Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

commit 4e45f8f
Author: zehao-intel <zehao.huang@intel.com>
Date:   Fri Jun 14 10:04:11 2024 +0800

    Improve UT Coverage for TF 3x (#1852)

    Signed-off-by: zehao-intel <zehao.huang@intel.com>
    Signed-off-by: chensuyue <suyue.chen@intel.com>

commit 794b276
Author: xinhe <xin3.he@intel.com>
Date:   Thu Jun 13 18:02:04 2024 +0800

    migrate export to 2x and 3x from deprecated (#1845)

    Signed-off-by: xin3he <xin3.he@intel.com>

commit 0eced14
Author: yuwenzho <yuwen.zhou@intel.com>
Date:   Wed Jun 12 18:49:17 2024 -0700

    Enhance INC WOQ model loading & support Huggingface WOQ model loading (#1826)

    Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

commit 6733dab
Author: Wang, Mengni <mengni.wang@intel.com>
Date:   Wed Jun 12 17:08:31 2024 +0800

    update mx script (#1838)

    Signed-off-by: Mengni Wang <mengni.wang@intel.com>

commit a0dee94
Author: Wang, Chang <chang1.wang@intel.com>
Date:   Wed Jun 12 15:01:25 2024 +0800

    Remove export_compressed_model in AWQConfig (#1831)

commit 2c3556d
Author: Huang, Tai <tai.huang@intel.com>
Date:   Wed Jun 12 14:46:14 2024 +0800

    Add 3x architecture diagram (#1849)

    Signed-off-by: Huang, Tai <tai.huang@intel.com>

commit 0e2cade
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Jun 12 14:20:06 2024 +0800

    Bump braces from 3.0.2 to 3.0.3 in /neural_insights/gui (#1862)

    Signed-off-by: dependabot[bot] <support@github.com>

commit 5b5579b
Author: Kaihui-intel <kaihui.tang@intel.com>
Date:   Wed Jun 12 14:12:00 2024 +0800

    Fix Neural Solution security issue (#1856)

    Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

commit e9cb48c
Author: xinhe <xin3.he@intel.com>
Date:   Wed Jun 12 11:19:47 2024 +0800

    improve UT coverage of PT Utils and Quantization (#1842)

    * update UTs

    ---------

    Signed-off-by: xin3he <xin3.he@intel.com>
    Signed-off-by: xinhe3 <xinhe3@habana.ai>

commit 6b27383
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date:   Wed Jun 12 11:11:50 2024 +0800

    Fix config expansion with empty options (#1861)

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 25c71aa
Author: WenjiaoYue <wenjiao.yue@intel.com>
Date:   Tue Jun 11 17:54:31 2024 +0800

    Delete the static resources of the JupyterLab extension after packaging (#1860)

    Signed-off-by: Yue, Wenjiao <wenjiao.yue@intel.com>

commit 455f1e1
Author: Wang, Mengni <mengni.wang@intel.com>
Date:   Tue Jun 11 15:28:40 2024 +0800

    Add UT and remove unused code for torch MX quant (#1854)

    * Add UT and remove unused code for torch MX quant
    ---------

    Signed-off-by: Mengni Wang <mengni.wang@intel.com>

Signed-off-by: xinhe3 <xinhe3@habana.ai>
xinhe3 committed Oct 11, 2024
1 parent 23fe77e commit be20c15
Showing 99 changed files with 76,801 additions and 1,750 deletions.
6 changes: 3 additions & 3 deletions .azure-pipelines/scripts/fwk_version.sh
@@ -2,9 +2,9 @@

echo "export FWs version..."
export tensorflow_version='2.15.0-official'
export pytorch_version='2.3.0+cpu'
export torchvision_version='0.18.0+cpu'
export ipex_version='2.3.0+cpu'
export pytorch_version='2.4.0+cpu'
export torchvision_version='0.19.0'
export ipex_version='2.4.0+cpu'
export onnx_version='1.16.0'
export onnxruntime_version='1.18.0'
export mxnet_version='1.9.1'
15 changes: 10 additions & 5 deletions .azure-pipelines/scripts/install_nc.sh
@@ -3,16 +3,21 @@
echo -e "\n Install Neural Compressor ... "
cd /neural-compressor
if [[ $1 = *"3x_pt"* ]]; then
if [[ $1 != *"3x_pt_fp8"* ]]; then
python -m pip install --no-cache-dir -r requirements_pt.txt
if [[ $1 = *"3x_pt_fp8"* ]]; then
pip uninstall neural_compressor_3x_pt -y || true
python setup.py pt bdist_wheel
else
echo -e "\n Install torch CPU ... "
pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cpu
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cpu
python -m pip install --no-cache-dir -r requirements.txt
python setup.py bdist_wheel
fi
python -m pip install --no-cache-dir -r requirements_pt.txt
python setup.py pt bdist_wheel
pip install --no-deps dist/neural_compressor*.whl --force-reinstall
elif [[ $1 = *"3x_tf"* ]]; then
python -m pip install --no-cache-dir -r requirements.txt
python -m pip install --no-cache-dir -r requirements_tf.txt
python setup.py tf bdist_wheel
python setup.py bdist_wheel
pip install dist/neural_compressor*.whl --force-reinstall
else
python -m pip install --no-cache-dir -r requirements.txt
8 changes: 6 additions & 2 deletions .azure-pipelines/scripts/models/env_setup.sh
@@ -51,6 +51,10 @@ SCRIPTS_PATH="/neural-compressor/.azure-pipelines/scripts/models"
log_dir="/neural-compressor/.azure-pipelines/scripts/models"
if [[ "${inc_new_api}" == "3x"* ]]; then
WORK_SOURCE_DIR="/neural-compressor/examples/3.x_api/${framework}"
git clone https://github.com/intel/intel-extension-for-transformers.git /itrex
cd /itrex
pip install -r requirements.txt
pip install -v .
else
WORK_SOURCE_DIR="/neural-compressor/examples/${framework}"
fi
@@ -95,8 +99,8 @@ if [[ "${fwk_ver}" != "latest" ]]; then
pip install intel-tensorflow==${fwk_ver}
fi
elif [[ "${framework}" == "pytorch" ]]; then
pip install torch==${fwk_ver} -f https://download.pytorch.org/whl/torch_stable.html
pip install torchvision==${torch_vision_ver} -f https://download.pytorch.org/whl/torch_stable.html
pip install torch==${fwk_ver} --index-url https://download.pytorch.org/whl/cpu
pip install torchvision==${torch_vision_ver} --index-url https://download.pytorch.org/whl/cpu
elif [[ "${framework}" == "onnxrt" ]]; then
pip install onnx==1.15.0
pip install onnxruntime==${fwk_ver}
5 changes: 4 additions & 1 deletion .azure-pipelines/scripts/ut/3x/run_3x_pt.sh
@@ -21,7 +21,10 @@ rm -rf torch/quantization/fp8_quant
LOG_DIR=/neural-compressor/log_dir
mkdir -p ${LOG_DIR}
ut_log_name=${LOG_DIR}/ut_3x_pt.log
pytest --cov="${inc_path}" -vs --disable-warnings --html=report.html --self-contained-html . 2>&1 | tee -a ${ut_log_name}

find . -name "test*.py" | sed "s,\.\/,python -m pytest --cov=\"${inc_path}\" --cov-report term --html=report.html --self-contained-html --cov-report xml:coverage.xml --cov-append -vs --disable-warnings ,g" > run.sh
cat run.sh
bash run.sh 2>&1 | tee ${ut_log_name}

cp report.html ${LOG_DIR}/

2 changes: 2 additions & 0 deletions .azure-pipelines/scripts/ut/3x/run_3x_pt_fp8.sh
@@ -7,6 +7,8 @@ echo "${test_case}"
echo "set up UT env..."
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
sed -i '/^intel_extension_for_pytorch/d' /neural-compressor/test/3x/torch/requirements.txt
sed -i '/^auto_round/d' /neural-compressor/test/3x/torch/requirements.txt
cat /neural-compressor/test/3x/torch/requirements.txt
pip install -r /neural-compressor/test/3x/torch/requirements.txt
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.16.0
pip install pytest-cov
2 changes: 1 addition & 1 deletion .azure-pipelines/scripts/ut/env_setup.sh
@@ -92,7 +92,7 @@ elif [[ $(echo "${test_case}" | grep -c "tf pruning") != 0 ]]; then
fi

if [[ $(echo "${test_case}" | grep -c "api") != 0 ]] || [[ $(echo "${test_case}" | grep -c "adaptor") != 0 ]]; then
pip install git+https://github.com/intel/auto-round.git@e24b9074af6cdb099e31c92eb81b7f5e9a4a244e
pip install git+https://github.com/intel/auto-round.git@5dd16fc34a974a8c2f5a4288ce72e61ec3b1410f
fi

# test deps
4 changes: 2 additions & 2 deletions .azure-pipelines/scripts/ut/run_basic_pt_pruning.sh
@@ -4,9 +4,9 @@ test_case="run basic pt pruning"
echo "${test_case}"

echo "specify fwk version..."
export pytorch_version='2.3.0+cpu'
export pytorch_version='2.4.0+cpu'
export torchvision_version='0.18.0+cpu'
export ipex_version='2.3.0+cpu'
export ipex_version='2.4.0+cpu'

echo "set up UT env..."
bash /neural-compressor/.azure-pipelines/scripts/ut/env_setup.sh "${test_case}"
3 changes: 2 additions & 1 deletion .azure-pipelines/scripts/ut/run_itrex.sh
@@ -18,7 +18,8 @@ bash /intel-extension-for-transformers/.github/workflows/script/install_binary.s
sed -i '/neural-compressor.git/d' /intel-extension-for-transformers/tests/requirements.txt
pip install -r /intel-extension-for-transformers/tests/requirements.txt
# workaround
pip install onnx==1.15.0
pip install onnx==1.16.0
pip install onnxruntime==1.18.0
echo "pip list itrex ut deps..."
pip list
LOG_DIR=/neural-compressor/log_dir
4 changes: 2 additions & 2 deletions .azure-pipelines/template/docker-template.yml
@@ -36,19 +36,18 @@ steps:
- ${{ if eq(parameters.dockerConfigName, 'commonDockerConfig') }}:
- script: |
rm -fr ${BUILD_SOURCESDIRECTORY} || sudo rm -fr ${BUILD_SOURCESDIRECTORY} || true
echo y | docker image prune -a
displayName: "Clean workspace"
- checkout: self
clean: true
displayName: "Checkout out Repo"
fetchDepth: 0

- ${{ if eq(parameters.dockerConfigName, 'gitCloneDockerConfig') }}:
- script: |
rm -fr ${BUILD_SOURCESDIRECTORY} || sudo rm -fr ${BUILD_SOURCESDIRECTORY} || true
mkdir ${BUILD_SOURCESDIRECTORY}
chmod 777 ${BUILD_SOURCESDIRECTORY}
echo y | docker image prune -a
displayName: "Clean workspace"
- checkout: none
@@ -62,6 +61,7 @@ steps:
- ${{ if eq(parameters.imageSource, 'build') }}:
- script: |
docker image prune -a -f
if [[ ! $(docker images | grep -i ${{ parameters.repoName }}:${{ parameters.repoTag }}) ]]; then
docker build -f ${BUILD_SOURCESDIRECTORY}/.azure-pipelines/docker/${{parameters.dockerFileName}}.devel -t ${{ parameters.repoName }}:${{ parameters.repoTag }} .
fi
2 changes: 2 additions & 0 deletions .azure-pipelines/ut-basic.yml
@@ -19,6 +19,8 @@ pr:
- neural_compressor/torch
- neural_compressor/tensorflow
- neural_compressor/onnxrt
- neural_compressor/transformers
- neural_compressor/evaluation
- .azure-pipelines/scripts/ut/3x

pool: ICX-16C
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -129,7 +129,8 @@ repos:
examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/ptq_static/prompt.json|
examples/notebook/dynas/ResNet50_Quantiation_Search_Supernet_NAS.ipynb|
examples/notebook/dynas/Transformer_LT_Supernet_NAS.ipynb|
neural_compressor/torch/algorithms/fp8_quant/internal/diffusion_evaluation/SR_evaluation/imagenet1000_clsidx_to_labels.txt
neural_compressor/torch/algorithms/fp8_quant/internal/diffusion_evaluation/SR_evaluation/imagenet1000_clsidx_to_labels.txt|
neural_compressor/evaluation/hf_eval/datasets/cnn_validation.json
)$
- repo: https://github.com/astral-sh/ruff-pre-commit
18 changes: 16 additions & 2 deletions README.md
@@ -27,6 +27,7 @@ support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testi
* Collaborate with cloud marketplaces such as [Google Cloud Platform](https://console.cloud.google.com/marketplace/product/bitnami-launchpad/inc-tensorflow-intel?project=verdant-sensor-286207), [Amazon Web Services](https://aws.amazon.com/marketplace/pp/prodview-yjyh2xmggbmga#pdp-support), and [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/bitnami.inc-tensorflow-intel), software platforms such as [Alibaba Cloud](https://www.intel.com/content/www/us/en/developer/articles/technical/quantize-ai-by-oneapi-analytics-on-alibaba-cloud.html), [Tencent TACO](https://new.qq.com/rain/a/20221202A00B9S00) and [Microsoft Olive](https://github.com/microsoft/Olive), and open AI ecosystem such as [Hugging Face](https://huggingface.co/blog/intel), [PyTorch](https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html), [ONNX](https://github.com/onnx/models#models), [ONNX Runtime](https://github.com/microsoft/onnxruntime), and [Lightning AI](https://github.com/Lightning-AI/lightning/blob/master/docs/source-pytorch/advanced/post_training_quantization.rst)

## What's New
* [2024/10] [Transformers-like API](./docs/source/3x/transformers_like_api.md) for INT4 inference on Intel CPU and GPU.
* [2024/07] From 3.0 release, framework extension API is recommended to be used for quantization.
* [2024/07] Performance optimizations and usability improvements on [client-side](./docs/source/3x/client_quant.md).

@@ -71,7 +72,7 @@ pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
```
After successfully installing these packages, try your first quantization program.

### [FP8 Quantization](./examples/3.x_api/pytorch/cv/fp8_quant/)
### [FP8 Quantization](./docs/source/3x/PT_FP8Quant.md)
The following example code demonstrates FP8 quantization, which is supported by the Intel Gaudi2 AI Accelerator.

To try it on Intel Gaudi2, a Docker image with the Gaudi Software Stack is recommended; refer to the following script for environment setup. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
@@ -147,7 +148,7 @@ Intel Neural Compressor will convert the model format from auto-gptq to hpu form
</tr>
<tr>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_FP8Quant.md">FP8 Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_FP8Quant.md">FP8 Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MXQuant.md">MX Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
</tr>
@@ -164,6 +165,16 @@ Intel Neural Compressor will convert the model format from auto-gptq to hpu form
<td colspan="2" align="center"><a href="./docs/source/3x/TF_SQ.md">Smooth Quantization</a></td>
</tr>
</tbody>
<thead>
<tr>
<th colspan="8">Transformers-like APIs</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="8" align="center"><a href="./docs/source/3x/transformers_like_api.md">Overview</a></td>
</tr>
</tbody>
<thead>
<tr>
<th colspan="8">Other Modules</th>
@@ -181,6 +192,9 @@ Intel Neural Compressor will convert the model format from auto-gptq to hpu form
> From the 3.0 release, we recommend using the 3.X API. Compression techniques during training, such as QAT, pruning, and distillation, are currently only available in the [2.X API](https://github.com/intel/neural-compressor/blob/master/docs/source/2x_user_guide.md).
## Selected Publications/Events

* EMNLP'2024: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs](https://arxiv.org/abs/2309.05516) (Sep 2024)
* Blog on Medium: [Quantization on Intel Gaudi Series AI Accelerators](https://medium.com/intel-analytics-software/intel-neural-compressor-v3-0-a-quantization-tool-across-intel-hardware-9856adee6f11) (Aug 2024)
* Blog by Intel: [Neural Compressor: Boosting AI Model Efficiency](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Neural-Compressor-Boosting-AI-Model-Efficiency/post/1604740) (June 2024)
* Blog by Intel: [Optimization of Intel AI Solutions for Alibaba Cloud’s Qwen2 Large Language Models](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-accelerate-alibaba-qwen2-llms.html) (June 2024)
* Blog by Intel: [Accelerate Meta* Llama 3 with Intel AI Solutions](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-meta-llama3-with-intel-ai-solutions.html) (Apr 2024)
2 changes: 2 additions & 0 deletions docs/build_docs/source/conf.py
@@ -34,10 +34,12 @@
"sphinx.ext.coverage",
"sphinx.ext.autosummary",
"sphinx_md",
"sphinx_rtd_theme",
"autoapi.extension",
"sphinx.ext.napoleon",
"sphinx.ext.githubpages",
"sphinx.ext.linkcode",
"sphinxcontrib.jquery",
]

autoapi_dirs = ["../../neural_compressor"]
16 changes: 10 additions & 6 deletions docs/build_docs/sphinx-requirements.txt
@@ -1,6 +1,10 @@
recommonmark
sphinx==6.1.1
sphinx-autoapi
sphinx-markdown-tables
sphinx-md
sphinx_rtd_theme
recommonmark==0.7.1
setuptools_scm[toml]==8.1.0
sphinx==7.3.7
sphinx-autoapi==3.1.0
sphinx-autobuild==2024.4.16
sphinx-markdown-tables==0.0.17
sphinx-md==0.0.4
sphinx_rtd_theme==2.0.0
sphinxcontrib-jquery==4.1
sphinxemoji==0.3.1
23 changes: 23 additions & 0 deletions docs/build_docs/update_html.py
@@ -56,11 +56,34 @@ def update_source_url(version, folder_name, index_file):
f.write(index_buf)


def update_search(folder):
search_file_name = "{}/search.html".format(folder)

with open(search_file_name, "r") as f:
index_buf = f.read()
key_str = '<script src="_static/searchtools.js"></script>'
version_list = """<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=fc837d61"></script>
<script src="_static/doctools.js?v=9a2dae69"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/js/theme.js"></script>
<script src="_static/searchtools.js"></script>"""
index_buf = index_buf.replace(key_str, version_list)

with open(search_file_name, "w") as f:
f.write(index_buf)


def main(folder, version):
folder_name = os.path.basename(folder)
for index_file in glob.glob("{}/**/*.html".format(folder), recursive=True):
update_version_link(version, folder_name, index_file)
update_source_url(version, folder_name, index_file)
update_search(folder)


def help(me):
2 changes: 1 addition & 1 deletion docs/3x/PT_FP8Quant.md → docs/source/3x/PT_FP8Quant.md
@@ -108,6 +108,6 @@ model = convert(model)
| Task | Example |
|----------------------|---------|
| Computer Vision (CV) | [Link](../../examples/3.x_api/pytorch/cv/fp8_quant/) |
| Large Language Model (LLM) | [Link](https://github.com/HabanaAI/optimum-habana-fork/tree/habana-main/examples/text-generation#running-with-fp8) |
| Large Language Model (LLM) | [Link](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation#running-with-fp8) |

> Note: For LLMs, Optimum-habana provides higher performance based on modified modeling files, so the LLM link above goes to Optimum-habana, which uses Intel Neural Compressor for FP8 quantization internally.
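Conceptually, the FP8 flow above (`prepare`/`convert` on the model) relies on per-tensor scales that bring values into the FP8 dynamic range. Below is a hedged sketch of just that scaling step: the constant is the standard E4M3 maximum, but the function name and flow are illustrative assumptions, not the Gaudi implementation.

```python
# Conceptual sketch of FP8 (E4M3) per-tensor scaling -- an illustration of
# the idea behind FP8 quantization, not Neural Compressor's Gaudi kernels.

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def fp8_scale(values):
    """Per-tensor scale that maps the tensor into the E4M3 dynamic range."""
    amax = max(abs(v) for v in values)
    return amax / E4M3_MAX if amax > 0 else 1.0


vals = [0.5, -896.0, 100.0]
scale = fp8_scale(vals)
scaled = [v / scale for v in vals]  # all values now fit in [-448, 448]
```

Real FP8 quantization then casts the scaled tensor to the 8-bit format and carries the scale alongside it for dequantization.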
8 changes: 6 additions & 2 deletions docs/source/3x/PT_WeightOnlyQuant.md
@@ -1,6 +1,7 @@

PyTorch Weight Only Quantization
===============

- [Introduction](#introduction)
- [Supported Matrix](#supported-matrix)
- [Usage](#usage)
@@ -14,6 +15,8 @@ PyTorch Weight Only Quantization
- [HQQ](#hqq)
- [Specify Quantization Rules](#specify-quantization-rules)
- [Saving and Loading](#saving-and-loading)
- [Layer Wise Quantization](#layer-wise-quantization)
- [Efficient Usage on Client-Side](#efficient-usage-on-client-side)
- [Examples](#examples)

## Introduction
@@ -108,9 +111,10 @@ model = convert(model)
| model_path (str) | Model path that is used to load state_dict per layer | |
| use_double_quant (bool) | Enables double quantization | False |
| act_order (bool) | Whether to sort Hessian's diagonal values to rearrange channel-wise quantization order | False |
| percdamp (float) | Percentage of Hessian's diagonal values' average, which will be added to Hessian's diagonal to increase numerical stability | 0.01. |
| percdamp (float) | Percentage of Hessian's diagonal values' average, which will be added to Hessian's diagonal to increase numerical stability | 0.01 |
| block_size (int) | Execute GPTQ quantization per block, block shape = [C_out, block_size] | 128 |
| static_groups (bool) | Whether to calculate group wise quantization parameters in advance. This option mitigate actorder's extra computational requirements. | False. |
| static_groups (bool) | Whether to calculate group wise quantization parameters in advance. This option mitigates act_order's extra computational requirements. | False |
| true_sequential (bool) | Whether to quantize layers within a transformer block in their original order. This can lead to higher accuracy but slower overall quantization process. | False |
> **Note:** `model_path` is only used when `use_layer_wise=True`. Layer-wise quantization support is still evolving; stay tuned.
``` python
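The GPTQ options tabled above (block sizes, group-wise parameters, rounding behavior) all steer one core operation: quantizing weights group by group against a shared scale. As a concept illustration only — this is plain Python with a made-up function name, not Neural Compressor's implementation:

```python
# Illustrative sketch of group-wise symmetric weight-only quantization,
# the core operation that `bits` and group/block-size options control.
# Not Neural Compressor's actual code.

def quantize_group_symmetric(weights, bits=4, group_size=2):
    """Round-to-nearest quantize a flat weight list, one scale per group."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / qmax or 1.0  # avoid zero scale
        # quantize to an integer code, then dequantize to show the error
        out.extend(round(w / scale) * scale for w in group)
    return out


w = [0.11, -0.52, 0.30, 0.07]
dequantized = quantize_group_symmetric(w, bits=4, group_size=2)
```

Real WOQ kernels keep the integer codes and scales instead of dequantizing immediately; the sketch dequantizes in place only to expose the rounding error that the bit width and group size trade off.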
27 changes: 22 additions & 5 deletions docs/source/3x/PyTorch.md
@@ -176,16 +176,21 @@ def load(output_dir="./saved_results", model=None):
<td class="tg-9wq8"><a href="PT_SmoothQuant.md">link</a></td>
</tr>
<tr>
<td class="tg-9wq8" rowspan="2">Static Quantization</td>
<td class="tg-9wq8" rowspan="2"><a href=https://pytorch.org/docs/master/quantization.html#post-training-static-quantization>Post-traning Static Quantization</a></td>
<td class="tg-9wq8">intel-extension-for-pytorch</td>
<td class="tg-9wq8" rowspan="3">Static Quantization</td>
<td class="tg-9wq8" rowspan="3"><a href=https://pytorch.org/docs/master/quantization.html#post-training-static-quantization>Post-traning Static Quantization</a></td>
<td class="tg-9wq8">intel-extension-for-pytorch (INT8)</td>
<td class="tg-9wq8">&#10004</td>
<td class="tg-9wq8"><a href="PT_StaticQuant.md">link</a></td>
</tr>
<tr>
<td class="tg-9wq8"><a href=https://pytorch.org/docs/stable/torch.compiler_deepdive.html>TorchDynamo</a></td>
<td class="tg-9wq8"><a href=https://pytorch.org/docs/stable/torch.compiler_deepdive.html>TorchDynamo (INT8)</a></td>
<td class="tg-9wq8">&#10004</td>
<td class="tg-9wq8"><a href="PT_StaticQuant.md">link</a></td>
<tr>
<td class="tg-9wq8"><a href=https://docs.habana.ai/en/latest/index.html>Intel Gaudi AI accelerator (FP8)</a></td>
<td class="tg-9wq8">&#10004</td>
<td class="tg-9wq8"><a href="PT_FP8Quant.md">link</a></td>
</tr>
</tr>
<tr>
<td class="tg-9wq8">Dynamic Quantization</td>
@@ -240,7 +245,7 @@ Deep Learning</a></td>
</table>
2. How to set different configuration for specific op_name or op_type?
> INC extends a `set_local` method based on the global configuration object to set custom configuration.
> Neural Compressor extends a `set_local` method based on the global configuration object to set custom configuration.
```python
def set_local(self, operator_name_or_list: Union[List, str, Callable], config: BaseConfig) -> BaseConfig:
@@ -259,3 +264,15 @@
quant_config.set_local(".*mlp.*", RTNConfig(bits=8)) # For layers with "mlp" in their names, set bits=8
quant_config.set_local("Conv1d", RTNConfig(dtype="fp32")) # For Conv1d layers, do not quantize them.
```
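One way to picture how a `set_local` rule such as `".*mlp.*"` or `"Conv1d"` wins over the global config is a first-match lookup over (pattern, config) pairs. The following is a hypothetical sketch — the function name and matching order are assumptions for illustration, not Neural Compressor's internals:

```python
import re

# Hypothetical resolution of per-op overrides layered on a global config.
# Keys are matched either exactly against the op type (e.g. "Conv1d") or
# as a regex against the op name (e.g. ".*mlp.*").

def resolve_config(op_name, op_type, local_rules, global_config):
    """Return the first matching local config, else the global one."""
    for key, config in local_rules:
        if key == op_type or re.fullmatch(key, op_name):
            return config
    return global_config


rules = [(".*mlp.*", "rtn-8bit"), ("Conv1d", "fp32")]
cfg = resolve_config("model.mlp.fc1", "Linear", rules, "rtn-4bit")
```

Under this reading, `set_local` simply appends a rule, and every op falls back to the global `RTNConfig` unless a name regex or type key claims it first.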

3. How to specify an accelerator?

> Neural Compressor provides automatic accelerator detection, including HPU, XPU, CUDA, and CPU.

> The automatically detected accelerator may not be suitable for some special cases, such as poor performance or memory limitations. In such situations, users can override the detected accelerator by setting the environment variable `INC_TARGET_DEVICE`.

> Usage:

```bash
export INC_TARGET_DEVICE=cpu
```
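A minimal sketch of how such an environment override can take precedence over auto-detection — the function name and detection flow are assumptions for illustration, not Neural Compressor's actual code:

```python
import os

# Hedged illustration: an env-var override beats the auto-detected device.
# `select_device` and its default are made up for this sketch.

def select_device(detected="cpu"):
    """Prefer the INC_TARGET_DEVICE override, else the detected accelerator."""
    override = os.environ.get("INC_TARGET_DEVICE", "").strip()
    return override.lower() if override else detected
```

With `export INC_TARGET_DEVICE=cpu` set, this returns `"cpu"` even when an HPU or XPU was detected; unset, the detected device is used unchanged.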