[mpich-discuss] MPE logging with OpenMPI

Brian Wainscott brian at lstc.com
Thu Apr 1 19:04:50 CDT 2010


Anthony,

> 
> Brian,
> 
> Do you have a small/simple program that shows the problem ?

I was afraid someone would ask me that!  I'll try to make some time to see if I
can come up with one, but it probably won't be until next week.  My REAL program
is a commercial application of over 2M lines, so that won't do.  But hopefully I
can put together a skeleton that demonstrates the problem.

> My guess is that this MPI_Pcontrol bug in MPE may have something
> to do with MPE's internal buffer.  BTW, did you try your program
> with MPICH2's MPE, i.e., does the same problem occur with
> MPICH2's MPE?

I haven't tried it.  I believe we do have MPICH2 installed here, so I can try
that.  That would be an acceptable alternative, since for this kind of
performance analysis I don't really care which MPI implementation I'm using.

> 
> A.Chan
> 
> ----- "Brian Wainscott" <brian at lstc.com> wrote:
> 
>> I have a Fortran application running with OpenMPI 1.4 and am trying to use
>> mpe2-1.1.1 and jumpshot to do some program analysis.
>>
>> My issue is with MPI_Pcontrol.  I REALLY don't want logging on the whole
>> time -- there is just too much stuff.  I want to start with it off, run for
>> a while, turn it on briefly, then terminate.
>>
>> The problem is, if I call MPI_Pcontrol to turn logging off very early on,
>> then I end up with an error like this after I call MPI_Pcontrol(1,ierr):
>>
>> clog_commset.c:CLOG_CommSet_get_IDs() -
>>         PMPI_Comm_get_attr() fails!
>> Backtrace of the callstack at rank 3:
>>       At [0]: program(CLOG_Util_abort+0x92)[0x4006a06]
>>       At [1]: program(CLOG_CommSet_get_IDs+0x5f)[0x4002ad3]
>>       At [2]: program(MPI_Isend+0x279)[0x3fd9e0e]
>>       At [3]: program(mpi_isend_+0x6f)[0x3fbfbe0]
>>
>> Strangely, if I wait until a much later point in the program and call
>> MPI_Pcontrol(0,ierr), then it does seem to turn logging off, and I don't
>> have problems.  But if I call it too soon, I get this error.  If I don't
>> call it at all, of course things work fine too.
>>
>> The functions I'm calling (and the order I'm calling them in) are:
>>
>> MPI_INIT
>> (MPI_Pcontrol -- turning it off here causes errors later)
>> MPE_Log_get_state_eventIDs
>> MPE_Describe_state
>> MPE_Log_get_solo_eventID
>> MPE_Describe_event
>> (MPI_Pcontrol -- turning it off here causes errors later)
>> <let the program run through some of the initialization>
>> (MPI_Pcontrol -- turning it off HERE causes errors later)
>> <let the program finish initialization and start cycling>
>> MPI_Pcontrol -- turning it off here WORKS
>> <let the program run for a while>
>> MPI_Pcontrol(1,ierr) to turn on logging
>> ..... execution, including calls to MPE_Log_event
>> MPI_Pcontrol(0,ierr) to turn logging off
>> <end program>
>>
>> This LOOKS like a bug in MPE to me -- like something is not being
>> properly initialized or processed while logging is off, but is later
>> assumed to have been done?
>>
>> I also tried going into the MPE source code and changing some of the
>> initial flags so that logging was off by default, but that caused the same
>> error as calling MPI_Pcontrol very early.
>>
>> So, am I doing something wrong (and what)?  Or who can help fix this issue?
>>
>> Thanks!
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss