Can't see MPI balance and other MPI statistics #14
Comments
Hi Ordi -
Dear Scott, thank you for your fast reply!
Sincerely,
Hi Ordi, Thanks for the follow-up. Good to see that things are otherwise working. Could you please also try setting IPM_LOG=full before running your application? Cheers,
This was already done; I forgot to mention it.
Hi Scott, I have tried it. This is the result with IPM_LOG=full:
Best regards,
Hi Ordi, Thanks for the clarification. If you're certain that you're setting it correctly, I'll think about this a bit more and get back to you. Cheers,
Dear Scott,
Hello, I am currently looking at IPM as we might want to deploy it at large scale when Cori Phase II arrives. See http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-modularization.html. This means that we could extend IPM to work as a darshan module. If it is not much work, I would be willing to do that. However, I do not understand darshan or IPM well enough to judge this. Can someone advise? Best Regards
@OrdiTader - Many thanks for the update! Interesting to hear that the set of optimization flags used when building your application caused problems for the instrumentation... This might suggest that something odd was happening to the symbols we try to intercept (e.g. IPO related?). Would you mind sharing the optimization flags you were originally using?
@azrael417 - Sounds like an interesting project! Would you mind opening a new issue for it?
Hi Scott, yes, I can do that. I had a look at the darshan modules for HDF5, BG/Q and NetCDF. They are very elegant and lightweight. If we want to do that, we should port IPM step by step, the most important functionality first.
Basically, from what I understood, one has to provide a module shutdown function which takes a mutex lock to perform a safe shutdown. This can be standardized and is found in the darshan examples (clone https://xgitlab.cels.anl.gov/darshan/darshan.git and look e.g. into darshan-runtime/lib/darshan-hdf5.c): the module registers a memory limit (int mem_limit;) and function pointers whose targets do certain things such as broadcasting the measurements, cleaning up, finishing stuff, (de)activating mutexes, etc.
The functions which are wrapped get qualifiers, in the case of HDF5: DARSHAN_FORWARD_DECL(H5Fcreate, hid_t, (const char *filename, unsigned flags, hid_t create_plist, hid_t access_plist));
That is basically everything. The locking is also relatively simple to achieve; one just needs to define something like #define IPM_LOCK() pthread_mutex_lock(&ipm_runtime_mutex) and use it before and after initialization and shutdown.
However, IPM is a big package with lots of modules, so I would need your experience of what is most important (MPI calls and hardware counters) and port that first to see if it works. Best Regards
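To make the pattern above concrete, here is a minimal sketch of the lock-protected wrapper and shutdown idea, assuming hypothetical names (ipm_runtime_mutex, ipm_barrier_count, ipm_module_shutdown). It illustrates only PMPI-style interception plus the locking described above, not the actual darshan module registration API:

```c
#include <pthread.h>
#include <mpi.h>

/* Hypothetical names; a sketch of the locking/shutdown pattern, not the real darshan API. */
static pthread_mutex_t ipm_runtime_mutex = PTHREAD_MUTEX_INITIALIZER;
static long ipm_barrier_count = 0;

#define IPM_LOCK()   pthread_mutex_lock(&ipm_runtime_mutex)
#define IPM_UNLOCK() pthread_mutex_unlock(&ipm_runtime_mutex)

/* Wrapper for one intercepted call: record a counter, then hand off to the PMPI entry point. */
int MPI_Barrier(MPI_Comm comm)
{
    IPM_LOCK();
    ipm_barrier_count++;
    IPM_UNLOCK();
    return PMPI_Barrier(comm);
}

/* Shutdown hook: take the lock, aggregate the per-rank counters, reset, unlock.
 * In a real darshan module this hook would be registered with the darshan core
 * at init time and driven by darshan during MPI_Finalize. */
void ipm_module_shutdown(void)
{
    long total = 0;

    IPM_LOCK();
    PMPI_Reduce(&ipm_barrier_count, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    ipm_barrier_count = 0;
    IPM_UNLOCK();
}
```

The ordering (lock, aggregate, reset, unlock) mirrors what the comment above describes for the shutdown functions in darshan-runtime/lib/darshan-hdf5.c.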
Hi Thorsten, I don't think it's a one-to-two-week project, mostly because it's going to be fiddly to get all the book-keeping aspects right. Probably the first step is to take a look at what you think it would take. Thanks, Nick.
Hi Nick, thanks for your thoughts. We do not need to port everything, basically just perfcounts and MPI usage. I see that there is CUDA, some graphics stuff (I don't know what that is for) and other things we do not need at the moment. What do you think? Best
Dear admin (Scott),
I have IPM configured on a Haswell CPU with dynamic preload.
The final report contains empty Communication Event Statistics and an empty communication balance picture, although there were 56 MPI tasks performing MPI_Reduce and MPI_Barrier operations.
Could you please tell me, what am I doing wrong?
Thank you for your time!
Please find the final HTML attached, if needed.
Other details:
Configuration options:
./configure --prefix=/home/ordi/inst/IPM CC=icc CXX=icpc --with-papi=/usr --with-libunwind=/usr CXX=icpc MPICC=mpiicc MPIFC=mpiifort --enable-shared LDFLAGS=-lpthread --enable-coll-details
Configuration:
IPM configuration:
MPI profiling enabled : yes
POSIX-I/O profiling enabled : no
PAPI enabled : yes
use libunwind : yes
CFLAGS : -DHAVE_DYNLOAD -DIPM_COLLECTIVE_DETAILS=1 -I/usr/include -I/usr/include -DOS_LINUX
LDFLAGS : -lpthread -L/usr/lib -Wl,-rpath=/usr/lib -L/usr/lib
LIBS : -lpapi -lunwind
MPI_STATUS_COUNT : count_lo
Fortran underscore :
Building IPM Parser : no
I set the environment variables:
export IPM_HPM=PAPI_TOT_INS,PAPI_L1_DCM,PAPI_TOT_CYC
export IPM_REPORT=full
run the program with preload:
LD_PRELOAD=/home/ordi/inst/IPM/lib/libipm.so mpirun -np 56 ./wemig2dmpi
and set
export IPM_KEYFILE=/home/ordi/inst/IPM/etc/ipm_key_mpi
result.tar.gz
and parse:
/home/ordi/inst/IPM/bin/ipm_parse -html wemig56.xml
Best regards,
ordi