Huge overhead of IPM #29
Comments
Thanks for reporting the issue, Dmitry. If possible, can you please send a diff of your fix so we can be sure we understand your changes correctly? Also, it would be really helpful if you could point us to the simplest application or benchmark you have which reproduces the slow performance. Thanks, Chris.
Hi Chris! The diff is below.
$ git diff
diff --git a/include/mod_mpi.h b/include/mod_mpi.h
index 135a558..a03b676 100755
--- a/include/mod_mpi.h
+++ b/include/mod_mpi.h
@@ -27,9 +27,7 @@ extern MPI_Group ipm_world_group;
#define IPM_MPI_MAP_RANK(rank_out_, rank_in_, comm_) \
do { \
- int comm_cmp_; \
- PMPI_Comm_compare(MPI_COMM_WORLD, comm_, &comm_cmp_); \
- if (comm_cmp_ == MPI_IDENT || rank_in_ == MPI_ANY_SOURCE) { \
+ if (comm_ == MPI_COMM_WORLD || rank_in_ == MPI_ANY_SOURCE) { \
rank_out_=rank_in_; \
} else { \
MPI_Group group_; \
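For context, here is a sketch of what the complete macro might look like after this change. The diff above stops at the start of the else branch, so the body of that branch here is an assumption: it translates the rank into MPI_COMM_WORLD using ipm_world_group and PMPI_Group_translate_ranks, which (as noted elsewhere in the thread) still needs to happen for communicators other than MPI_COMM_WORLD.

/* Sketch only: the else branch is assumed, not copied from the IPM source. */
#define IPM_MPI_MAP_RANK(rank_out_, rank_in_, comm_)                 \
  do {                                                               \
    if ((comm_) == MPI_COMM_WORLD || (rank_in_) == MPI_ANY_SOURCE) { \
      (rank_out_) = (rank_in_);                                      \
    } else {                                                         \
      MPI_Group group_;                                              \
      int in_ = (rank_in_);                                          \
      PMPI_Comm_group((comm_), &group_);                             \
      PMPI_Group_translate_ranks(group_, 1, &in_, ipm_world_group,   \
                                 &(rank_out_));                      \
      PMPI_Group_free(&group_);                                      \
    }                                                                \
  } while (0)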
Thanks. I'm happy to build and run NAMD. It usually works better to use the production application rather than creating a synthetic benchmark without first referring to the production application. Can you give me the smallest and shortest running NAMD test problem which has high IPM performance overhead (even if it is not scientifically meaningful)? I'm also a little puzzled because I thought NAMD uses Charm++ and not MPI for communication. Perhaps we can move our conversation to email? Chris
Sorry to chime in late. This sounds a little like NAMD woes of years gone by. Especially where multiple comms are involved, you can over-run the hash table (or send it into a woeful state) if the call pressure becomes too high. Where it concerns an MPI call you don't care about, e.g. one that isn't pushing data, one "out" is simply to de-stub it and use the underlying call, i.e. don't name-shift the problem call.
Best,
David
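To put David's suggestion in code: one reading of "de-stub" is to keep the name-shifted wrapper symbol for the call you don't want to account for, but make it a pure pass-through to the PMPI entry point so it skips all IPM bookkeeping. This is a hypothetical sketch, not IPM's actual wrapper; MPI_Isend is used only as an example of a "problem call".

#include <mpi.h>

/* Hypothetical pass-through wrapper: the intercepted call is forwarded
   straight to the MPI library with no profiling work at all. */
int MPI_Isend(const void *buf, int count, MPI_Datatype datatype, int dest,
              int tag, MPI_Comm comm, MPI_Request *request)
{
    return PMPI_Isend(buf, count, datatype, dest, tag, comm, request);
}

The other reading is simply not to define a wrapper for that call at all, so the application links directly against the library's own MPI_Isend.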
Hi,
You can measure the time for MPI_Isend() and for MPI_Comm_compare() separately to understand the overhead. Again, it's hardly possible to see anything with a small core count. Regards!
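As an illustration of that measurement, here is a hypothetical micro-benchmark (not the comm.c attached to this issue; the iteration count and message size are arbitrary). It creates a Cartesian communicator congruent with MPI_COMM_WORLD, as in the original report, then times MPI_Comm_compare() against a small self-message MPI_Isend()/MPI_Irecv() pair:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Cartesian communicator with the same size and process placement as
       MPI_COMM_WORLD. */
    MPI_Comm cart;
    int dims[1] = { nranks }, periods[1] = { 0 };
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);

    const int iters = 1000;
    int result = MPI_UNEQUAL;

    /* Cost of the comparison IPM performs inside IPM_MPI_MAP_RANK. */
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Comm_compare(MPI_COMM_WORLD, cart, &result);
    double t_cmp = MPI_Wtime() - t0;

    /* Cost of a small self-message Isend/Irecv pair, for comparison. */
    int sbuf = rank, rbuf = -1;
    MPI_Request req[2];
    t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        MPI_Irecv(&rbuf, 1, MPI_INT, rank, 0, cart, &req[0]);
        MPI_Isend(&sbuf, 1, MPI_INT, rank, 0, cart, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }
    double t_msg = MPI_Wtime() - t0;

    if (rank == 0)
        printf("Comm_compare result=%d (MPI_IDENT=%d, MPI_CONGRUENT=%d)\n"
               "Comm_compare: %.2f us/call, Isend/Irecv/Waitall: %.2f us/iter\n",
               result, MPI_IDENT, MPI_CONGRUENT,
               1e6 * t_cmp / iters, 1e6 * t_msg / iters);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}

As noted above, the per-call cost of MPI_Comm_compare() only becomes significant at large rank counts, so the contrast is most visible when this is run at roughly the scale reported in the issue (around a thousand ranks).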
Thanks Dmitry, I ran your comm.c application on Intel KNL nodes of the Cori supercomputer (my only customization was to add an additional timer between MPI_Init and MPI_Finalize to measure run time with and without IPM). Cori has cray-mpich-7.6.2. I used 15 nodes with 68 MPI ranks per node to give a total of 1020 MPI ranks. I found minimal overhead added by IPM in this configuration:
Without IPM: time between MPI_Init and MPI_Finalize = 0.24 seconds
I then built OpenMPI-3.0.0 on Cori. There is now a definite slowdown when using IPM. I will investigate further. Which version of MPI did you use? See OpenMPI results:
Hi Chris, I haven't tried Intel MPI yet; we used OpenMPI. They may have different algorithms for MPI_Comm_compare. Each IPM_MPI_* function contains the IPM_MPI_MAP_RANK macro shown in the diff above, and calling PMPI_Comm_compare in every wrapped function leads to huge overhead. Yes, we can move further conversation to email. I made my email public. Regards!
Hi Dmitry and Chris, I see this has been open for a while now. What is the status of this fix?
Profiling an application running on 1024 processes with IPM 2.0.6, we see ~5x overhead.
Analysis showed that ~50% of the application's time was spent in PMPI_Group_compare, called from PMPI_Comm_compare().
This application makes many MPI_Isend() calls on a communicator created by MPI_Cart_create(). Even though the new communicator has the same size and the same process placement as MPI_COMM_WORLD, MPI_Comm_compare() does not return MPI_IDENT, and the comparison consumes a lot of compute time (possibly the algorithm of PMPI_Group_compare() is not optimal).
Since for other communicators we still need to call PMPI_Group_translate_ranks() anyway, we can simply compare the communicator handle with MPI_COMM_WORLD instead; something like the change to mod_mpi.h shown in the diff above.
This modification significantly reduces the overhead, but it is still large:
wallclock time with IPM 2.0.2 - 2470s
wallclock time with IPM 2.0.6 - 3400s (with the modification above)
Regards!
---Dmitry