openmpi doesn't have support for cross-memory attach #118
More details here: mpi4py/mpi4py#332 (reply in thread)
@jsquyres Sorry, I need your input again. I had a quick look at ompi's configure-time CMA check. That's the problem with the conda-forge binaries: they are built in Azure Pipelines within a container without the required support, so the binaries end up with CMA disabled. I'm not an autotools expert, but looking at ompi's configure logic, the other situation can also happen: you use an openmpi package from a Linux distro that was built on bare metal with CMA enabled, but then you run apps within a container, things do not work out of the box, and CMA has to be explicitly disabled (example).

Would it be possible for Open MPI to move the try_run check to runtime, and, if the syscalls cannot be used, disable CMA at runtime (maybe with a warning)? I understand other features already work that way (e.g. CUDA support), so why not CMA support?
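To make the proposal concrete, here is a minimal, hypothetical sketch (not Open MPI code; the function name is made up) of what a runtime CMA probe on Linux could look like: try a small process_vm_readv() copy within the calling process and fall back if the kernel or the container's seccomp/ptrace policy rejects the call.

```c
/* Hypothetical sketch of a runtime CMA probe -- not Open MPI code.
 * Attempts a tiny process_vm_readv() copy within the calling process;
 * if the syscall is blocked (ENOSYS, EPERM), report CMA as unusable. */
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

static int cma_usable(void)
{
    char src[8] = "cma-ok!", dst[8] = {0};
    struct iovec local  = { .iov_base = dst, .iov_len = sizeof dst };
    struct iovec remote = { .iov_base = src, .iov_len = sizeof src };

    /* Reading from our own pid succeeds only if the CMA syscalls are permitted. */
    ssize_t n = process_vm_readv(getpid(), &local, 1, &remote, 1, 0);
    if (n == (ssize_t)sizeof src && memcmp(src, dst, sizeof src) == 0)
        return 1;
    if (n < 0 && (errno == ENOSYS || errno == EPERM))
        fprintf(stderr, "CMA unavailable (%s); single-copy path disabled\n",
                strerror(errno));
    return 0;
}

int main(void)
{
    printf("CMA usable: %s\n", cma_usable() ? "yes" : "no");
    return 0;
}
```

In many container runtimes the default seccomp profile rejects these syscalls unless CAP_SYS_PTRACE is granted, in which case a probe like this would see EPERM up front instead of failing later on the data path.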
The vast majority of Open MPI's infrastructure assumes that the environment where Open MPI is built is the same as the environment where it will run. The alternative is for Open MPI to essentially duplicate the behavior of the run-time linker (via dlopen() and friends).

Put differently: this is simply the nature of distributing binaries. You're building binaries in environment X and hoping that they are suitable for environment Y. In many (most?) cases, it's good enough. You've unfortunately run into a corner case where there's a pretty big performance impact because X != Y.

That being said, it is true that Open MPI has one glaring exception to what I said above: CUDA. This was a bit of a debate in Open MPI at the time when it was developed, for the reasons I cited above. However, we ultimately did implement a much-simplified "load CUDA at run time" mechanism for two reasons:
I will say that it took a number of iterations before the "load CUDA at run-time" code worked in all cases. It's difficult code to write, and it is even trickier to maintain over time (as APIs are added, removed, or even -- shudder -- changed).
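For readers unfamiliar with the dlopen()-style alternative mentioned above, the sketch below shows the generic pattern; the library name libfoo.so and symbol foo_copy are placeholders, not Open MPI's actual internals, and the real CUDA loader is far more involved.

```c
/* Generic run-time loading pattern via dlopen()/dlsym() -- the shape of a
 * "load feature X at run time" approach.  "libfoo.so" and "foo_copy" are
 * placeholders, not Open MPI's real library or symbol names.
 * Build with: cc demo.c -ldl */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*foo_copy_fn)(void *dst, const void *src, size_t n);

int main(void)
{
    void *handle = dlopen("libfoo.so", RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        /* Library absent in this environment: disable the feature and move on. */
        fprintf(stderr, "libfoo.so not found (%s); feature disabled\n", dlerror());
        return 0;
    }

    foo_copy_fn foo_copy = (foo_copy_fn)dlsym(handle, "foo_copy");
    if (foo_copy == NULL) {
        fprintf(stderr, "symbol foo_copy missing (%s); feature disabled\n", dlerror());
        dlclose(handle);
        return 0;
    }

    /* ... use foo_copy() on the fast path; every API call needs a looked-up
     * pointer like this, which is what makes the approach tedious to maintain. */
    dlclose(handle);
    return 0;
}
```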
@jsquyres I totally understand your point. However, maybe the case of CMA is extremely simple, much simpler than CUDA: CMA can be invoked via syscalls; look at your own code.

Anyway, I'm doing to @jsquyres what I hate others doing to me: asking for features without offering my time to work on them. If @YarShev is willing to offer some of his time to work on this, then I may consider working on a patch to submit upstream. Otherwise, I'm not going to push this thing any further; I don't have a strong personal interest in it.
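To illustrate the "CMA is just syscalls" point (assuming Linux >= 3.2), the two calls can be made directly through syscall(2); the wrapper names below are hypothetical, and error handling is left to the caller.

```c
/* Hypothetical thin wrappers over the two CMA syscalls, invoked directly via
 * syscall(2).  The entire CMA surface is process_vm_readv/process_vm_writev. */
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copy len bytes from remote_addr in process pid into dst in this process. */
ssize_t cma_read(pid_t pid, void *dst, void *remote_addr, size_t len)
{
    struct iovec local  = { .iov_base = dst,         .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };
    return syscall(SYS_process_vm_readv, pid, &local, 1UL, &remote, 1UL, 0UL);
}

/* Copy len bytes from src in this process into remote_addr in process pid. */
ssize_t cma_write(pid_t pid, void *remote_addr, const void *src, size_t len)
{
    struct iovec local  = { .iov_base = (void *)src, .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };
    return syscall(SYS_process_vm_writev, pid, &local, 1UL, &remote, 1UL, 0UL);
}
```

A -1/errno result at run time would be the natural trigger for falling back to a copy-in/copy-out path.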
FWIW, there are CUDA-enabled builds of openmpi on conda-forge; it should still be possible to use those CUDA builds.
As part of our work on unidist we did measurements for 1. Open MPI built from source, 2. ucx-enabled Open MPI from conda-forge, and 3. ucx-disabled Open MPI from conda-forge. That ordering matches the timings we observed: Open MPI built from source is the fastest, ucx-enabled Open MPI from conda-forge comes next, and ucx-disabled Open MPI from conda-forge is slower than both. I wonder what transport the ucx-enabled Open MPI from conda-forge uses?
Solution to issue cannot be found in the documentation.
Issue
I installed openmpi from conda-forge and it doesn't have support for cross-memory attach.
Installed packages
Environment info