Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn when --iree-llvmcpu-target-cpu defaults to "generic". #18587

Closed
wants to merge 3 commits into from

Conversation

ScottTodd
Copy link
Member

@ScottTodd ScottTodd commented Sep 23, 2024

Progress on #18561.

We have these high level flags (and MLIR attributes) controlling what code the llvm-cpu compiler target generates using LLVM:

Some targets (x86 and riscv64) in LLVM can infer the "features" for a given "cpu" with the getFeaturesForCPU() functions. Other targets may be able to look up available features on the host via llvm::sys::getHostCPUFeatures(). Short of either of those, we must fall back to having users explicitly list the features they want the compiler to use.

If the target CPU and target CPU features are both omitted, we currently fall back to "generic", meaning no "features". This PR sends more warning information to stderr if that case is detected. We could go a step further and make the default case propagate an error through the compiler, requiring users to explicit set "host", "generic", a known cpu type, or a known list of features.


TODO


Debugging tips while working on this:

  • Run with --debug-only=iree-llvm-cpu-target to see the computed LLVMTarget get logged
  • Note that multiple code paths run this code, including CLI flags that build defaultOptions_ for getDefaultExecutableTargets() and getVariantTarget/loadFromConfigAttr that may come from already constructed IR. We want error handling for all of them.

@stellaraccident
Copy link
Collaborator

Thanks - I think that starting with warnings is a good way to flush this. Then have an easy thing to upgrade to an error.

@bjacob
Copy link
Contributor

bjacob commented Sep 24, 2024

Thanks!

The warning message could helpfully dump a list of accepted CPU values for the target architecture.

At that point, passing the missing CPU flag would be so easy that you might as well mandate it, promoting this warning into an error. If a user is technical enough to specify a non-host target triple, and you do the work for them to look up the known CPUs for that target triple, they can be bothered to pick one. Then they can even choose to just pass generic (and the message may suggest so) if that's what they want. At least they will blame their own poor (command) line choices for bad performance.

These strings are small enough that we don't benefit much from std::string_view.
Comment on lines +63 to +66
llvm::errs() << "warning: LLVMTarget cpu unspecified, defaulting to "
"`generic`. Performance may suffer. Use "
"`--iree-llvmcpu-target-cpu=` or the target attribute to "
"set the target cpu.\n";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this warning should only be logged if cpuFeatures is also empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning could also mention the other flag (--iree-llvmcpu-target-cpu-features) and attribute. The logs and our docs should cover what @bjacob explained here: #18561 (comment)

On x86, people want to talk in terms of "CPU" (meaning microarchitecture) such as znver4 or cascadelake. People do not typically want to talk in terms of CPU features on x86 because that is very cumbersome.

On RISC-V, people want to talk in terms of CPU features, and there are many, but they are not too cumbersome thanks to very short names

On Arm, people typically specify baseline Arm architecture version plus a few CPU features, e.g. armv8.2-a+i8mm.

So: "you used these flags/attributes, but that is omitting some important information for getting peak performance, this is what you should do to fix that"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Want to chat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can link to docs if the message gets long: https://iree.dev/guides/deployment-configurations/cpu/
Source for docs: https://github.com/iree-org/iree/blob/main/docs/website/docs/guides/deployment-configurations/cpu.md

(Any logs that are dependent on the code running, like "list all features for the given CPU" would still want to be put here though)

@MaheshRavishankar
Copy link
Contributor

I wouldnt say I have the full context of all the moving pieces here (especially all the RISC-V stuff), but I would personally vote for making the host be the default for iree-llvmcpu-target-cpu . Thats the main use case people come with when compiling for CPU. I am not sure I fully understand the value of iree-llvmgpu-target-cpu=generic at this point. If someone is compiling for a different target-cpu then they can explicitly use that CPU and CPU features, but I think in terms of default, using host is what most folks are looking for I think?

@benvanik
Copy link
Collaborator

generic is useful for people who want to run something on a machine that is not their own - that's why it's the default for things like clang/gcc - host is only useful if you are compiling and running on the same machine and never useful if you're compiling and running separately. Defaulting to host has the danger of making every vmfb produced crash on any machine but the one it runs on and that's why AOT compilers never default to that. Frameworks/etc hosting IREE that know they are JITing can default safely, or if we want to be extremely explicit in all documentation that defaults will produce non-portable vmfbs then we could default iree-compile to it, but I really don't like debugging SIGILLs :)

@benvanik
Copy link
Collaborator

(note that there's a spectrum between "generic" and "host" - we can decide we want avx2 as a baseline in our generic config instead of emitting scalar code, etc - it's mostly a way to trade off "here's what we expect our minimum requirements for default vmfbs to be" vs "here's something that only runs on one particular machine").

@bjacob
Copy link
Contributor

bjacob commented Sep 26, 2024

I was going to advocate for least-surprisal as the overriding principle here, so I dug what compilers are doing, and unfortunately it's fragmented:

  • C/C++ compilers are of course defaulting to generic. This gives determinism (I can reproduce your command line as long as it doesn't explicitly involve host) and running everywhere, but slow by default.
  • nvcc defaults to sm_52 which means Maxwell. That's akin to what @benvanik suggests above with defaulting to something like avx2 as a baseline. That's still deterministic and still runs in 100% of currently supported devices, while providing better performance than generic.
    • The caveat here is that by providing improved performance over generic, this makes the residual performance gap less discoverable, while potentially still very large.
  • hipcc seems to be defaulting to native (i.e. host) in my experience (but I wasn't able to find docs saying so -- all I know is that just hipcc with no flags gives me all the CDNA3 stuff when invoked on such a machine, while that fails to compile on older machines).

This fragmentation means that whichever default we pick here will be surprising to some users, unfortunately.

Given that, I think I'd like to explore ways to make host/native-as-default work, but with improved diagnostics. For example, we could check immediately on loading a module that the machine we are running on supports the features required by the module. I'm still cringing thinking of the non-reproducible bug reports (iree-compile command lines will be non-reproducible by default) but I also cringe thinking of the lost performance, so we have to pick one poison.

@stellaraccident
Copy link
Collaborator

In practice, I think they only move the nvcc version up when they physically drop support in the software for anything prior (or at least I have a recollection of seeing announcements about that).

I think we just need to make the flag required. And probably add the other diagnostics and checks you mention, because it will get screwed up.

@bjacob
Copy link
Contributor

bjacob commented Sep 26, 2024

Oh yes, just make the flag required, with diagnostic, and with helpful messages - so if people would like host, they can just copy and paste a flag from the start of the explanation, which can then proceed with enumerating all the accepted values.

@bjacob
Copy link
Contributor

bjacob commented Oct 3, 2024

@ScottTodd, @benvanik, @stellaraccident, here is what I have at the moment: #18682 . It folds some good ideas from this PR (the general idea, and defaulting CPU to "" instead of "generic", and resolving host triple in LLVMTarget::create).

@ScottTodd
Copy link
Member Author

Closing in favor of #18682. Thanks @bjacob !

@ScottTodd ScottTodd closed this Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codegen/llvm LLVM code generation compiler backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants