Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip aligned allocation on SV39 MMUs #949

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

orlitzky
Copy link

An attempt at fixing #939

The SV39 detection works well enough with my sample size of 1. But I also had to add some extra #ifs to mi_os_prim_alloc_aligned() to avoid the pointless alloc/free. I don't have any 32-bit hardware any more, but this should speed things up there, too.

If the host has an SV39 MMU, the usual aligned hinting strategy will
not work despite the system being 64-bit. On RISC-V linux systems, the
type of MMU can be read from /proc/cpuinfo. If we find one, a constant
is defined that will allow us to skip the aligned hinting when the
time comes.
This constant will be defined when the host has an SV39 MMU. The usual
2TiB+ hinting strategy will not work on those systems, so we skip it,
just as we would if the system was not "64-bit."
On 32-bit or SV39 systems, _mi_os_get_aligned_hint() in src/os.c is
ifdef'd to always return NULL. As a result, the OS prim allocation
routines using that hint will fail to obtain aligned memory. When that
happens, we free() the result, and overallocate.

But this is wasteful: we know that (with high probability) the first
attempt at an aligned allocation will fail. This commit modifies
mi_os_prim_alloc_aligned() to skip the doomed alloc and subsequent
free using the same ifdef condition that is used to disable the hint.
@orlitzky
Copy link
Author

@microsoft-github-policy-service agree

@kepstin
Copy link

kepstin commented Oct 20, 2024

I've tested this on my Milk-V Jupiter system (SpacemiT M1 SoC), and I'm no longer seeing the warnings, as expected.

I'm wondering, tho, if this should be a runtime check rather than a build-time check. Are there other RISC-V MMUs where the >2TiB allocation would work? If so, this could cause an issue for distro packages which may be built on a machine with a different MMU but get run on a machine with sv39.

Edit: This build-time check definitely won't work when cross-compiling.

@orlitzky
Copy link
Author

Hey, thanks for taking a look.

I know basically nothing about cross-compiling with CMake, but with autotools, you're supposed to override the (computed) settings for the build system with the values for the host system. In this case that's still possible because the setting is passed to the compiler (or preprocessor) via -DFOO. For example, if your build machine has an SV39 MMU and the host machine does not, you could append -UMI_SV39_MMU to your CFLAGS. Conversely, if the build machine does not have an SV39 MMU but the host does, -DMI_SV39_MMU=1. There could be a simpler way to handle this in CMake, though.

The binary distro thing is more of an issue. As a highly enlightened user of a source-based distro, this had not occurred to me :P

There are indeed other RISC-V MMU types where the hinting should be enabled, and distros will not be making special binaries targeted at each MMU. Runtime detection would be better in this scenario, but it also makes the value of this proposition a lot fuzzier. We're only avoiding an alloc/free with this detection -- it's not a huge savings. If we stop to parse /proc/cpuinfo at runtime, have we killed the intended benefit when there are only a small number of mallocs being made? There is a __riscv constant we could check to avoid stating /proc/cpuinfo on other architectures at least.

@daanx
Copy link
Collaborator

daanx commented Oct 28, 2024

For now I have integrated the cmake check directly (since the other os.c modifications were not quite right -- hope this is ok :-)). I added the MI_NO_ALIGNED_HINT macro to disable aligned hinting in general. I would prefer a runtime check to enable better binary distributions but only if it does not add too much complexity... not sure if parsing /proc/cpuinfo is a bit too much at runtime... it would be nice to have a virtual_address_bits field in the mi_os_mem_config.t though.

@orlitzky
Copy link
Author

For now I have integrated the cmake check directly (since the other os.c modifications were not quite right -- hope this is ok :-)).

Yep, fine by me. How wrong are the changes to os.c? (Is it something I should try to fix?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants