
Can't see MPI balance and other MPI statistics #14

Open
ghost opened this issue Apr 29, 2016 · 12 comments

ghost commented Apr 29, 2016

Dear admin (Scott),

I have IPM configured on a Haswell CPU with dynamic preload.
The final report contains an empty Communication Event Statistics section and an empty communication balance picture, even though there were 56 MPI tasks performing MPI_Reduce and MPI_Barrier operations.

Could you please tell me what I am doing wrong?

Thank you for your time!

Please find the final HTML attached, if needed.

Other details:

Configuration options:
./configure --prefix=/home/ordi/inst/IPM CC=icc CXX=icpc --with-papi=/usr --with-libunwind=/usr CXX=icpc MPICC=mpiicc MPIFC=mpiifort --enable-shared LDFLAGS=-lpthread --enable-coll-details
Configuration:

IPM configuration:
MPI profiling enabled : yes
POSIX-I/O profiling enabled : no
PAPI enabled : yes
use libunwind : yes
CFLAGS : -DHAVE_DYNLOAD -DIPM_COLLECTIVE_DETAILS=1 -I/usr/include -I/usr/include -DOS_LINUX
LDFLAGS : -lpthread -L/usr/lib -Wl,-rpath=/usr/lib -L/usr/lib
LIBS : -lpapi -lunwind
MPI_STATUS_COUNT : count_lo
Fortran underscore :
Building IPM Parser : no

I set environment vars:
export IPM_HPM=PAPI_TOT_INS,PAPI_L1_DCM,PAPI_TOT_CYC
export IPM_REPORT=full
export IPM_KEYFILE=/home/ordi/inst/IPM/etc/ipm_key_mpi

run the program with preload:
LD_PRELOAD=/home/ordi/inst/IPM/lib/libipm.so mpirun -np 56 ./wemig2dmpi

and parse:
/home/ordi/inst/IPM/bin/ipm_parse -html wemig56.xml

Attachment: result.tar.gz

Best regards,
ordi

@swfrench
Copy link
Collaborator

swfrench commented Apr 30, 2016

Hi Ordi -
Two quick questions for you:

  1. Which MPI bindings does wemig2dmpi use: Fortran or C?
  2. Do you see per-call MPI statistics printed to stdout when wemig2dmpi terminates?

Thanks,
Scott

ghost (Author) commented May 1, 2016

Dear Scott,

thank you for your fast reply!

  1. wemig2dmpi uses the C bindings; Intel icc and mpiicc, to be more precise.
  2. Yes, I can see the statistics; they look like:

IPMv2.0.5########################################################

command   : ./wemig2dmpi
start     : Sun May 01 10:49:16 2016      host      : kp1
stop      : Sun May 01 10:53:44 2016      wallclock : 267.76
mpi_tasks : 64 on 1 nodes                 %comm     : 80.28
mem [GB]  : 2.60                          gflop/sec : 0.00

            :     [total]       <avg>         min         max
wallclock   :    17056.15      266.50      266.25      267.76
MPI         :    13693.28      213.96      186.40      222.18
%wall       :
  MPI       :                   80.28       69.84       83.44
#calls      :
  MPI       :      614528        9602        9602        9602
mem [GB]    :        2.60        0.04        0.04        0.04

                          [time]       [count]      <%wall>
MPI_Barrier             11991.79        307456        70.31
MPI_Reduce               1701.49        306816         9.98
MPI_Comm_rank               0.00            64         0.00
MPI_Comm_size               0.00            64         0.00
MPI_Init                    0.00            64         0.00
MPI_Finalize                0.00            64         0.00

Sincerely,
Ordi

swfrench (Collaborator) commented May 1, 2016

Hi Ordi,

Thanks for the follow-up. Good to see that things are otherwise working. Could you please also try setting:

export IPM_LOG=full

before running your application?

Cheers,
Scott

ghost (Author) commented May 1, 2016

This was already done; I forgot to mention it.


ghost (Author) commented May 2, 2016

Hi Scott,
Sorry, yesterday I answered without details:

I have tried
$ export IPM_LOG=full
$ export IPM_REPORT_MEM=yes
$ export IPM_REPORT=full

This is the result with IPM_LOG=full:

IPMv2.0.5########################################################

command   : ./wemig2dmpi
start     : Mon May 02 11:41:21 2016      host      : kp1
stop      : Mon May 02 11:43:44 2016      wallclock : 143.13
mpi_tasks : 14 on 1 nodes                 %comm     : 1.49
mem [GB]  : 0.56                          gflop/sec : 0.00

            :     [total]       <avg>         min         max
wallclock   :     2003.70      143.12      143.12      143.13
MPI         :       29.85        2.13        0.97        2.84
%wall       :
  MPI       :                    1.49        0.68        1.98
#calls      :
  MPI       :      694078       49577       49577       49577
mem [GB]    :        0.56        0.04        0.04        0.04


Best regards,
Ordi

swfrench (Collaborator) commented May 3, 2016

Hi Ordi,

Thanks for the clarification. If you're certain that you're setting IPM_LOG to "full" prior to executing your instrumented application, then I am a bit puzzled as to why ipm_parse is not listing anything in the "Communication Event Statistics" section of the generated report.

I'll think about this a bit more and get back to you.

Cheers,
Scott

ghost (Author) commented May 3, 2016

Dear Scott,
yes, I'm certain that I've set IPM_LOG=full before executing.
But I found a solution: the "Communication Event Statistics" section is filled in once I recompile my application with the -O2 and -g flags and exclude some Intel optimization flags (-use-intel-optimized-headers and -opt-prefetch). After recompiling, I ran the profile again and got the statistics.

azrael417 (Collaborator) commented May 5, 2016

Hello,

I am currently looking at IPM, as we might want to deploy it at large scale when Cori Phase II arrives.
However, a requirement is that Darshan works flawlessly with IPM. I spoke with the Darshan developers, and they said that Darshan 3.0 supports modules:

http://www.mcs.anl.gov/research/projects/darshan/docs/darshan-modularization.html

This means that we could extend IPM to work as a Darshan module. If it is not much work, I would be willing to do that. However, I do not understand Darshan or IPM well enough to judge this. Could someone who understands IPM better have a look at the Darshan modularization page and tell me whether this is a lot of effort, or whether it can be done rather quickly (on a 1-2 week timescale or so)?

Best Regards
Thorsten Kurth

swfrench (Collaborator) commented May 6, 2016

@OrdiTader - Many thanks for the update! Interesting to hear that the set of optimization flags used when building your application caused problems for the instrumentation... This might suggest that something odd was happening to the symbols we try to intercept (e.g. IPO related?). Would you mind sharing the optimization flags you were originally using?

@azrael417 - Sounds like an interesting project! Would you mind opening a new issue for it?
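
For context, LD_PRELOAD-based MPI profilers generally work by interposing on the MPI_* symbols and forwarding to the corresponding PMPI_* entry points, so anything that changes how those symbols are emitted or resolved in the application binary can defeat the interception. A minimal sketch of the general pattern in C (illustrative only, not IPM's actual source; the timing and fprintf are placeholders for whatever bookkeeping the tool does):

#include <mpi.h>
#include <stdio.h>

/* Preloaded wrapper: overrides MPI_Barrier via symbol interposition and
 * forwards to PMPI_Barrier, the profiling entry point defined by the MPI
 * standard. */
int MPI_Barrier(MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Barrier(comm);
    double t1 = MPI_Wtime();

    /* A real tool would accumulate (t1 - t0) into per-call statistics
     * instead of printing. */
    fprintf(stderr, "MPI_Barrier: %.6f s\n", t1 - t0);
    return rc;
}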

azrael417 (Collaborator) commented

Hi Scott,

yes, I can do that. I had a look at the Darshan modules for HDF5, BG/Q and NetCDF. They are very elegant and lightweight. If we want to do this, we should port IPM step by step, the most important functionality first.

Basically, from what I understood, one has to provide module shutdown functionality, which takes a mutex lock to perform a safe shutdown. This is standardized and can be found in the Darshan examples (clone https://xgitlab.cels.anl.gov/darshan/darshan.git and look, e.g., into darshan-runtime/lib/darshan-hdf5.c):

int mem_limit;
struct darshan_module_funcs hdf5_mod_fns =
{
    .begin_shutdown = &hdf5_begin_shutdown,
    .get_output_data = &hdf5_get_output_data,
    .shutdown = &hdf5_shutdown
};

and the functions these pointers point to handle things such as broadcasting the measurements, cleaning up, finishing things off, and (de)activating mutexes.

The functions that are wrapped get forward declarations; in the case of HDF5:

DARSHAN_FORWARD_DECL(H5Fcreate, hid_t, (const char *filename, unsigned flags, hid_t create_plist, hid_t access_plist));
DARSHAN_FORWARD_DECL(H5Fopen, hid_t, (const char *filename, unsigned flags, hid_t access_plist));
DARSHAN_FORWARD_DECL(H5Fclose, herr_t, (hid_t file_id));

That is basically everything. The locking is also relatively simple to achieve; one just needs to define something like:

#define IPM_LOCK() pthread_mutex_lock(&ipm_runtime_mutex)
#define IPM_UNLOCK() pthread_mutex_unlock(&ipm_runtime_mutex)

and use them around initialization and shutdown.
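
As a rough illustration of how the same pattern might carry over to IPM (a sketch only; the ipm_* hook names are hypothetical, and the function-pointer signatures are guessed for illustration, the real prototypes live in the darshan-runtime headers):

#include <pthread.h>
#include "darshan.h"   /* assumed darshan-runtime header providing struct darshan_module_funcs */

static pthread_mutex_t ipm_runtime_mutex = PTHREAD_MUTEX_INITIALIZER;

#define IPM_LOCK()   pthread_mutex_lock(&ipm_runtime_mutex)
#define IPM_UNLOCK() pthread_mutex_unlock(&ipm_runtime_mutex)

/* Hypothetical shutdown hooks, mirroring the darshan-hdf5.c example above */
static void ipm_begin_shutdown(void)
{
    IPM_LOCK();
    /* stop recording new MPI / hardware-counter events */
    IPM_UNLOCK();
}

static void ipm_get_output_data(void **buf, int *size)
{
    /* reduce and serialize the collected statistics into *buf */
}

static void ipm_shutdown(void)
{
    /* free per-rank bookkeeping */
}

struct darshan_module_funcs ipm_mod_fns =
{
    .begin_shutdown  = &ipm_begin_shutdown,
    .get_output_data = &ipm_get_output_data,
    .shutdown        = &ipm_shutdown
};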

However, IPM is a big package with lots of modules, so I would need your experience on what is most important (MPI calls and hardware counters) and port that first to see whether it works.

Best Regards
Thorsten Kurth


azrael417 (Collaborator) commented

Hi Thorsten,

I don't think it's a one-to-two-week project, mostly because it's going to be fiddly to get all the bookkeeping aspects between Darshan and IPM aligned. Hard to say for sure.

Probably the first step is to take a look at what you think it would take and let us know your plans so we can discuss.

Thanks,

Nick.

azrael417 (Collaborator) commented

Hi Nick,

Thanks for your thoughts. We do not need to port everything, basically just perf counters and MPI usage. I see that there is CUDA, some graphics stuff (I don't know what that is for), and other things we do not need at the moment. What do you think?

Best
Thorsten

