Compiling for llvm-cpu without targeting a specific CPU is a bad experience #18561
Yes please! #15487
Let's stop overthinking this and do something simple like I suggest. Open to other options, but I would like to see this improved.
The suggested proposal SGTM. I might even want to default to having LLVM use the current host for the target CPU and available features (if those are different), then have users explicitly pass "generic" for the lowest common denominator. We could apply similar logic to the GPU backends - try to detect devices on the system (shell out to vulkaninfo / rocm-smi / nvidia-smi?) and default to what is available, but still support cross compilation with explicit device info and a "generic" target where possible.
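For concreteness, the two explicit modes might look like this on the command line (a sketch using the `--iree-llvmcpu-target-cpu` values discussed later in this thread; file names are illustrative):

```shell
# Target the CPU of the machine running the compiler (fast, not portable):
iree-compile model.mlir \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=host \
  -o model_host.vmfb

# Explicitly opt in to the lowest common denominator (portable, slow):
iree-compile model.mlir \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=generic \
  -o model_generic.vmfb
```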
Yuck - that is not a cheap thing to do and has a high risk of flakes - I am still not sure why proper documentation is insufficient? You must specify your target device (
Note that this is also what clang does - https://clang.llvm.org/docs/HIPSupport.html - you must pass
(if there's such a big concern about documentation not fixing this issue then I'd be ok with making compilation fail if the user doesn't specify an arch for a backend - whether a particular arg, generic, native, etc - but guessing is bad)
We can certainly update the docs (https://iree.dev/guides/deployment-configurations/cpu/#compile-a-program) and start with a warning from the compiler if information is omitted and generic is used as the default. I'm seeing a proliferation of flags (mainly in rocm usage, but also cpu) and the documentation can't keep up. I want more of that to be captured somewhere - docs, samples, the compiler itself, etc. See one example here: iree/experimental/regression_suite/shark-test-suite-models/sdxl/test_unet.py Lines 90 to 110 in 914858f
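A minimal sketch of what that warn-on-implicit-default behavior could look like (Python pseudologic with hypothetical names, not IREE's actual C++ implementation):

```python
def resolve_target_cpu(cpu_flag):
    """Resolve a --iree-llvmcpu-target-cpu style flag value (hypothetical sketch).

    Returns (cpu, warning); warning is None unless we silently fell back to
    the slow generic target without the user asking for it.
    """
    if cpu_flag is None:
        warning = (
            "defaulting to a generic CPU; this will be slow. "
            "Pass an explicit CPU name, 'host' to target this machine, "
            "or 'generic' to silence this warning."
        )
        return "generic", warning
    # An explicit value, including an explicit 'generic', is an informed
    # choice and produces no warning.
    return cpu_flag, None
```

The key design point is that "generic" chosen by the user and "generic" chosen by omission behave identically in codegen but differ in diagnostics.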
That's insanity - besides the debug flags (dumping statistics/etc) if any of those are required that's a bug. I think Mahesh has said it before: a feature is not done until it's on by default and if all of those flags are needed to make the model compile or perform then the engineering was never completed. The only two flags required there should be
I'm fine making

Agreed on all of the other points. Need to burn down all of the other flags. I'm just starting with this one.
I can take a pass at this, unless someone else wants to. Plan:
Can someone clarify why we have all three of these flags?
It seems like the triple could be a superset of the cpu? Is there some redundancy there? I see some riscv sample code setting both: iree/runtime/src/iree/hal/local/elf/testdata/generate.sh Lines 68 to 73 in d834aa7
but even our microkernels blog post (highlighting cpu performance work) only includes a few of the flags: iree/docs/website/docs/community/blog/posts/microkernels.md Lines 25 to 32 in d834aa7
Oh, the linalg tutorial from @bjacob explains that iree/docs/website/docs/community/blog/posts/linalg-tutorial.md Lines 183 to 193 in d834aa7
There is some complicated logic and then a few calls into LLVM itself in https://github.com/iree-org/iree/blob/main/compiler/plugins/target/LLVMCPU/LLVMTargetOptions.cpp. (This is why I filed #15487 - I've wanted someone directly familiar with LLVM CPU to be driving this)
This stuff always grows into a bit of a hairball. The condition we are trying to guard is that
The CPU flags mirror LLVM - we can't remove them, but we could more intelligently populate them - maybe - triple is often not enough. I think we do ask for defaults from LLVM today. I'm hesitant to suggest we diverge from clang behavior as then we have to support that (and if the issue here is that our documentation sucks, adding bespoke stuff only hurts that).
Made some progress stepping through the details:
@marbre pointed out that for bare metal arm, the target handling is letting some "errors" fall through: iree/compiler/plugins/target/LLVMCPU/LLVMTargetOptions.cpp Lines 83 to 97 in e19950c
iree/compiler/plugins/target/LLVMCPU/LLVMTargetOptions.cpp Lines 147 to 155 in e19950c
Sample logs: https://github.com/iree-org/iree-bare-metal-arm/actions/runs/10923467370/job/30320173426#step:11:262
Flags for those logs: https://github.com/iree-org/iree-bare-metal-arm/blob/23deb47d546786e7bd64fc6edd51a3095b6c1817/samples/simple_embedding/CMakeLists.txt#L98-L109 We may want to amend that logic here too. The flip side of not "breaking existing users" with that style of error reporting is that it potentially leaves performance on the table.
More context for my previous comment: #15387
Oh yeah, I only cared about ARM when I wrote that :-D
I noticed that we override the iree/compiler/plugins/target/LLVMCPU/LLVMTargetOptions.cpp Lines 139 to 146 in e19950c
However, we only override parts of the triple, not the full object/string. In particular, that code appears to leave the "arch" unchanged. Possible values for that are in https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/TargetParser/Triple.h. Does that mean that if you compile on x86_64, your code generated with llvm-cpu won't be compatible with aarch64? I'm wondering if this other default should be changed to an explicit "host" too: iree/compiler/plugins/target/LLVMCPU/LLVMTargetOptions.cpp Lines 669 to 671 in e19950c
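To make the question concrete, here is a hypothetical Python sketch of the behavior described above: only the components after the arch (vendor/OS/environment) get rewritten to match the host, while the leading arch component is preserved as-is (function name and string handling are my assumptions, not the actual C++ code):

```python
def normalize_triple_to_host(target_triple: str, host_triple: str) -> str:
    # Keep the target's arch (the first dash-separated component) and take
    # the remaining vendor-os[-environment] components from the host triple.
    target_arch, _, _ = target_triple.partition("-")
    _, _, host_rest = host_triple.partition("-")
    return f"{target_arch}-{host_rest}"
```

So with this behavior, a default arch inherited from an unspecified triple would survive even when the host OS/vendor are filled in: e.g. `normalize_triple_to_host("aarch64-unknown-unknown", "x86_64-pc-linux-gnu")` gives `"aarch64-pc-linux-gnu"` - the OS follows the host, the arch does not.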
Answering my own question - yes. Compiled with embedded linking and
I'm still wondering if we want to default the target triple to
Either way, we could have our docs explain OS, arch, features, etc. OS: matters when using the system linker (for full integration with debug tools); does not matter with embedded linking mode.
Ehhh... we only support the "host" CPU name on x86?
The https://github.com/llvm/llvm-project/blob/main/llvm/lib/TargetParser/Host.cpp file supports plenty of other architectures though... |
You're unfortunately finding the results of engineers not caring about anything but the exact configuration they are looking at in the moment they author something. Has been an issue for the lifetime of the project and probably always will be 😢
Thanks for digging into it, Scott. Feel free to loop one of the backend engineers in if you need help untangling it. I'm happy to nominate others to care in detail.
I think I see enough of the pieces now to refactor the code a bit and add some helpful warnings and documentation. I'm not sure how I'll test my changes though, since a fair portion of this is different depending on the architecture of the host machine running the compiler and I only have x86_64 dev machines. It would be helpful to get some more eyes on the various configurations we want to support and then do some manual QA testing that the compiler either detects the right features and generates good code, or bails with a helpful error.
Maybe more of a unit test via some magic env var or test-only flag: --iree-testing-assume-host=, then a lit test variant for each arch branch that runs device assignment and validates. We're not looking to test LLVM here, just to ensure that we're not fumbling the flag parsing.
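A minimal sketch of that test-only escape hatch, assuming an env var named after the suggested flag (both the name and the mechanism are hypothetical):

```python
import os
import platform


def detect_host_arch() -> str:
    # Test-only override so lit tests can exercise every arch branch of the
    # target-assignment logic without actually running on that hardware.
    override = os.environ.get("IREE_TESTING_ASSUME_HOST")  # hypothetical name
    if override:
        return override
    # Normal path: report the real host architecture.
    return platform.machine()
```

A lit test variant per architecture would then set the env var, run target assignment, and FileCheck the chosen CPU/features, without ever invoking LLVM's real host detection.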
That could work, yeah. When I say "test my changes" here, I'm still just referring to local development "testing", not automated CI testing - that would be a nice bonus.
Well, if you have the knobs to verify locally, then you're more than halfway to a lit test. That's how most of these things in llvm proper get tested.
Pushed an initial attempt at reworking how the target init is handled: #18587. I could pass that off to someone else and context switch to other tasks 🤔
Sorry, I had not kept up with the discussion here, was heads down in GPU data tiling. Here are the difficulties that I know of:
Here is what I would do:
Thanks for the context!
Defaulting to host (and not requiring it be set explicitly) goes against @benvanik 's suggestions up in the issue: #18561 (comment) (unless that was specifically referring to autodetection for gpu targets?)
Oh, this sounds useful. Are there functions in LLVM that would help get such a list?
Yeah, mostly about GPU targets (where we have to launch other tools that fully load/initialize drivers and such) - CPU detection in LLVM is free.
WDYT about #18682?
Progress on #18561. This introduces a warning (which we intend to promote to an error in the future) when targeting a generic CPU without explicitly asking for it. This addresses a performance footgun, as that IREE default results in low performance. Along the way this grew into a substantial change to e2e testing rules:
- `TARGET_CPU` and `TARGET_CPU_FEATURES` arguments are gone (they were redundant with `COMPILER_FLAGS`).
- For `TARGET_CPU_FEATURES_VARIANTS`, the special value `"default"` is renamed to `"generic"` and a new value `"host"` is also supported.
Example warning (this is customized to the target architecture, here x86):
```
/home/benoit/matmul_i8.mlir:0:0: warning: while creating CPU target: Defaulting to targeting a generic CPU for the target architecture will result in poor performance. Please specify a target CPU and/or a target CPU feature set. If it is intended to target a generic CPU, specify "generic" as the CPU.

This can be done in two ways:
1. With command-line flags:
    --iree-llvmcpu-target-cpu=...
    --iree-llvmcpu-target-cpu-features=...
2. Within the IR:
    #hal.executable.target< ... , cpu="...", cpu_features="...">

In the rest of this message, these fields are referred to as just `cpu` and `cpu_features`.

Examples:

    cpu=generic
        Target a generic CPU of the target architecture. The generated code
        will have poor performance, but will run on any CPU.

    cpu=host
        Target the host CPU. The generated code will have optimal performance
        on the host CPU but will crash on other CPUs not supporting the same
        CPU features.

    cpu="name"
        Target a specific CPU. This is mostly used on x86. The accepted values
        are the same as in Clang command lines.

        List of accepted x86 CPUs: nocona, core2, penryn, bonnell, atom,
        silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7,
        westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell,
        core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake,
        cooperlake, cannonlake, icelake-client, rocketlake, icelake-server,
        tigerlake, sapphirerapids, alderlake, raptorlake, meteorlake,
        arrowlake, arrowlake-s, lunarlake, gracemont, pantherlake,
        sierraforest, grandridge, graniterapids, graniterapids-d,
        emeraldrapids, clearwaterforest, knl, knm, k8, athlon64, athlon-fx,
        opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona,
        btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2,
        znver3, znver4, znver5, x86-64, x86-64-v2, x86-64-v3, x86-64-v4

    cpu_features="+feature1,..."
        Target a CPU supporting the comma-separated list of (+-prefixed)
        features. The accepted values are the same as in Clang command lines.
```

---------

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
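As a usage sketch of the flags named in that warning (the CPU and feature values here are illustrative x86 examples; accepted values match Clang):

```shell
# Target a generic x86-64-v3 baseline but additionally require AVX-512:
iree-compile matmul_i8.mlir \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=x86-64-v3 \
  --iree-llvmcpu-target-cpu-features=+avx512f,+avx512vl \
  -o matmul_i8.vmfb
```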
@ScottTodd, should we close this as completed by #18682, or leave this open until further changes are made, such as promoting that warning into an error? We shouldn't do that before a full release cycle of whichever distribution channel users get their IREE from, so I wonder if leaving this open means having an open, essentially unactionable issue for half a year.
Let's keep this open until we at least update our docs (https://iree.dev/guides/deployment-configurations/cpu/) to include the recommended best practices (e.g.

We'll push a new stable release within a few weeks, and the next should then be 6-8 weeks later.
I've seen multiple people falling down this hole: they run iree-compile on their model, targeting CPU. Then they get performance that is 10x-100x off of any reasonable expectation. Then they either go away silently or report back about poor experiences (not always reporting flags and such).
There are good reasons why a compiler like IREE shouldn't make assumptions about what the CPU target is, but on the other hand, not specifying a target CPU will almost always produce a grossly subpar experience, since the generic target (at least on x86) lacks so many features as to be basically useless for any high-performance numerics.
I've even fallen down this hole recently and had to go remember the incantation to select a specific CPU. In the case I was working on (an f16 CPU LLM), performance was 100x different between not specifying a target CPU and specifying "host". We need to guide people better than this.
Proposal
As mentioned, there are good reasons for a compiler not to make too many assumptions without being told what to do. But I think we can and should actively warn, possibly with a link to the documentation site, when the compiler is invoked without the user specifying a target CPU. Whatever the warning is, it should be very explicit that the user should pass
--iree-llvmcpu-target-cpu=host
to target the precise CPU they are running on. We should possibly also accept "generic" or something like that in case the user really wants to target the default and not get the warning. I basically want to guard against the case where the user has not specified anything and the compiler just silently generates 100x-too-slow code. In almost all cases, it will be better for the user to say something, and we should guide them on a proper choice.