[mpich-discuss] Problems with MPI_Pcontrol and MPE2

Brian Wainscott brian at lstc.com
Fri Apr 23 14:51:31 CDT 2010


Hi Chan,

I got your changes to log_mpi_core.c, and things are better, but I think still not
quite right.  Now the code is blowing up when I call MPI_COMM_FREE and logging is
disabled.  In this case, the communicator being freed was created via COMM_DUP,
in case that makes any difference.  I looked through log_mpi_core, and COMM_DUP
seems to be treated like COMM_CREATE as far as I can see.  On the other hand, it
is likely this is just the first communicator I'm freeing so how it was created
may not matter.

I rebuilt MPE2 with debugging enabled, and got this for my traceback:


#0  0x0000000004052373 in CLOG_Buffer_save_header (buffer=0xcb81de0,
    commIDs=0xe9000898, thd=0, rectype=9) at clog_buffer.c:630
#1  0x0000000004052b90 in CLOG_Buffer_save_commevt (buffer=0xcb81de0,
    commIDs=0xe9000898, thd=0, etype=10, guid=0x44a28a0 "", icomm=-999999999,
    comm_rank=-1, world_rank=-1) at clog_buffer.c:900
#2  0x000000000404c070 in MPE_Log_commIDs_nullcomm (commIDs=0xe9000898,
    local_thread=0, comm_etype=10) at mpe_log.c:224
#3  0x00000000040140a2 in MPI_Comm_free (comm=0x7fffe9000848) at log_mpi_core.c:2477


The problem seems to be that CLOG_Buffer_save_header has these lines:

    hdr->icomm       = commIDs->local_ID;
    hdr->rank        = commIDs->comm_rank;

but commIDs is not a valid memory address.  It is never properly set in the macro
MPE_LOG_INTRACOMM.  In fact, judging by the name of the function used
(MPE_Log_commIDs_nullcomm), the code seems to know it is logging an action for
MPI_COMM_NULL, yet it still tries to dereference commIDs.
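
To make the failure mode concrete, here is a minimal sketch of the kind of guard
I have in mind.  This is NOT the actual MPE source: the type and function names
(sketch_commIDs_t, sketch_hdr_t, sketch_save_header) are made up for
illustration.  Only the field names local_ID and comm_rank come from the lines
quoted above, and the fallback values -999999999 and -1 are the icomm and
comm_rank arguments shown in frame #1 of the traceback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CLOG types.  Only the field names
   local_ID and comm_rank are taken from the quoted clog_buffer.c lines. */
typedef struct {
    int local_ID;
    int comm_rank;
} sketch_commIDs_t;

typedef struct {
    int icomm;
    int rank;
} sketch_hdr_t;

/* Values passed as icomm and comm_rank in frame #1 of the traceback. */
#define SKETCH_NULL_ICOMM (-999999999)
#define SKETCH_NULL_RANK  (-1)

/* Fill the header from commIDs, or from the null-communicator values
   when no commIDs is available.
   Returns 0 if commIDs was used, 1 if the fallback was used. */
int sketch_save_header(sketch_hdr_t *hdr, const sketch_commIDs_t *commIDs)
{
    assert(hdr != NULL);

    if (commIDs == NULL) {
        /* Null-communicator case: nothing to dereference. */
        hdr->icomm = SKETCH_NULL_ICOMM;
        hdr->rank  = SKETCH_NULL_RANK;
        return 1;
    }

    hdr->icomm = commIDs->local_ID;
    hdr->rank  = commIDs->comm_rank;
    return 0;
}
```

A check like this only helps if the caller initializes commIDs to NULL when
logging is off.  In my trace the pointer is garbage rather than NULL, so I
assume the macro would need that initialization as well.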

I hope that makes sense to you...

BTW -- I'm running with 2.1.1, plus your version of log_mpi_core.c.  Should I try
something newer?

Brian


> Hi Brian,
> 
> I've modified log_mpi_core.c to address this problem with MPI_Pcontrol
> and the MPI communicator functions within MPE.  Could you recompile MPE
> after updating your log_mpi_core.c with
> 
> https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpe2/src/wrappers/src/log_mpi_core.c
> 
> and see if this solves your problem.
> 
> A.Chan
> 
> ----- chan at mcs.anl.gov wrote:
> 
>> > Hi Brian,
>> > 
>> > MPE logging needs to know that the user program makes communicator
>> > creation calls, e.g. MPI_Comm_create/MPI_Comm_split/MPI_Comm_dup;
>> > otherwise any subsequent MPI calls that use these communicators
>> > can't be logged by MPE.  There is a mechanism in MPE that bypasses
>> > the actual logging but still keeps track of communicator
>> > creation/destruction.  It is likely the mechanism has a bug.
>> > Do you have a small program that shows your use of communicators
>> > so I can make sure whatever fixes I apply will solve your
>> > problem?
>> > 
>> > PS. Thanks for spending time to track down the problem.
>> > 
>> > A.Chan
>> > ----- "Brian Wainscott" <brian at lstc.com> wrote:
>> > 
>>> > > I posted previously with the subject "MPE logging with OpenMPI",
>>> > > describing some issues I was having getting MPI_Pcontrol to work.
>>> > > Anthony Chan suggested I try MPICH instead of OpenMPI, which I've
>>> > > finally had time to do.  It also doesn't work.
>>> > >
>>> > > I looked through the source code for mpe2, and suspect I know the
>>> > > issue, and am looking for help/confirmation/hopefully a fix or
>>> > > workaround:
>>> > >
>>> > > According to these comments in log_mpi_core.c
>>> > > (src/mpe2/src/wrappers/src):
>>> > >
>>> > >  * MPI_Init checks for logging control options and environment variables,
>>> > >  * and MPI_Pcontrol allows control over logging (allowing the user to
>>> > >  * turn logging on and off).  Note that some routines are ALWAYS logged;
>>> > >  * principly, these are the communicator constuction routines (needed to
>>> > >  * avoid using the "context_id" which may not exist in some MPI
>>> > >  * implementations).
>>> > >
>>> > > and this comment:
>>> > >
>>> > > /*
>>> > >   level = 1 turns on tracing,
>>> > >   level = 0 turns it off.
>>> > >
>>> > >   Still to do: in some cases, must log communicator operations even if
>>> > >   logging is off.
>>> > >  */
>>> > > int MPI_Pcontrol( const int level, ... )
>>> > >
>>> > > I suspect the problem is related to a conflict between MPI_Pcontrol
>>> > > and certain communicator construction operations.
>>> > >
>>> > > I tried modifying the problem I am running so that it should not
>>> > > create many (any?) communicators after initialization, and then
>>> > > everything behaves as I'd like: I can call MPI_Pcontrol(0) early on,
>>> > > and later call MPI_Pcontrol(1) then MPI_Pcontrol(0), and get one nice
>>> > > window into the execution, without a LOT of stuff I'm not interested
>>> > > in.
>>> > >
>>> > > With my original problem, which does create communicators, I call
>>> > > MPI_Pcontrol(0) right after initialization, then MPI_Pcontrol(1)
>>> > > later, then immediately get this error:
>>> > >
>>> > > clog_commset.c:CLOG_CommSet_get_IDs() -
>>> > >         PMPI_Comm_get_attr() fails!
>>> > >
>>> > >
>>> > >
>>> > > I tried putting calls to MPI_Pcontrol(1) just before (and
>>> > > MPI_Pcontrol(0) just after) every call to
>>> > > MPI_COMM_CREATE/MPI_COMM_DUP/MPI_COMM_FREE, but that didn't work (or
>>> > > maybe I missed one....)  Or maybe this is a red herring, and the
>>> > > smaller problem ran for some other unrelated reason.
>>> > >
>>> > > Suggestions of anything else to try?
>>> > >
>>> > > Does anyone know exactly WHICH calls must always be made?  It should
>>> > > be a simple matter to ignore the "is_mpilog_on" flag for just a few
>>> > > calls, if that is all that is needed....I just need to know WHICH
>>> > > ones.
>>> > >
>>> > > Thanks!
>>> > >
>>> > > Brian
