[mpich-discuss] Problems with MPI_Pcontrol and MPE2

Anthony Chan chan at mcs.anl.gov
Fri Apr 23 15:17:42 CDT 2010



I assume you are getting the segfault when MPI_Comm_dup wasn't logged.
Was the MPI_Comm_free() of the dup'ed communicator also not being logged?

----- "Brian Wainscott" <brian at lstc.com> wrote:

> Hi Chan,
> 
> I got your changes to log_mpi_core.c, and things are better....but I
> think not quite right.  Now the code is blowing up when I call
> MPI_COMM_FREE and logging is disabled.  In this case, the communicator
> being freed was created via COMM_DUP, in case that makes any difference.
> I looked through log_mpi_core, and COMM_DUP seems to be treated like
> COMM_CREATE as far as I can see.  On the other hand, it is likely this is
> just the first communicator I'm freeing so how it was created may not
> matter.
> 
> I rebuilt MPE2 with debugging enabled, and got this for my traceback:
> 
> 
> #0  0x0000000004052373 in CLOG_Buffer_save_header (buffer=0xcb81de0,
>     commIDs=0xe9000898, thd=0, rectype=9) at clog_buffer.c:630
> #1  0x0000000004052b90 in CLOG_Buffer_save_commevt (buffer=0xcb81de0,
>     commIDs=0xe9000898, thd=0, etype=10, guid=0x44a28a0 "",
>     icomm=-999999999, comm_rank=-1, world_rank=-1) at clog_buffer.c:900
> #2  0x000000000404c070 in MPE_Log_commIDs_nullcomm (commIDs=0xe9000898,
>     local_thread=0, comm_etype=10) at mpe_log.c:224
> #3  0x00000000040140a2 in MPI_Comm_free (comm=0x7fffe9000848)
>     at log_mpi_core.c:2477
> 
> 
> The problem seems to be that CLOG_Buffer_save_header has these lines:
> 
>     hdr->icomm       = commIDs->local_ID;
>     hdr->rank        = commIDs->comm_rank;
> 
> but commIDs is not a valid memory address.  It is never properly set in
> the macro MPE_LOG_INTRACOMM.  In fact, it looks as though the code knows
> it is logging an action for MPI_COMM_NULL (based on the name of the
> function used, MPE_Log_commIDs_nullcomm), but it still tries to
> dereference the pointer.
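
To make the failure mode above concrete, here is a minimal sketch of a
header-fill routine that falls back to sentinel values instead of
dereferencing a missing ID record.  The types and the function name are
hypothetical stand-ins, not the real CLOG definitions; only the two field
names and the sentinel values (icomm=-999999999, comm_rank=-1) come from
the traceback.  It also assumes the caller initializes the pointer to NULL
when no communicator record exists; a NULL check cannot catch a pointer
that holds a garbage address, which is what the traceback appears to show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CLOG types; only the fields quoted in
   the message are modeled. */
typedef struct {
    int local_ID;
    int comm_rank;
} comm_ids_t;

typedef struct {
    int icomm;
    int rank;
} rec_header_t;

/* Sentinel values as they appear in the traceback. */
#define NULL_ICOMM (-999999999)
#define NULL_RANK  (-1)

/* Fill the header from ids.  When ids is NULL (assumed to mean "no
   communicator record"), store the sentinels instead of dereferencing.
   Returns 1 if real IDs were used, 0 if the sentinels were used. */
int save_header(rec_header_t *hdr, const comm_ids_t *ids)
{
    assert(hdr != NULL);
    if (ids == NULL) {
        hdr->icomm = NULL_ICOMM;
        hdr->rank  = NULL_RANK;
        return 0;
    }
    hdr->icomm = ids->local_ID;
    hdr->rank  = ids->comm_rank;
    return 1;
}
```

Whether the real fix belongs in the header routine or in the macro that
should set the pointer is a separate question; the sketch only shows the
defensive form.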
> 
> I hope that makes sense to you...
> 
> BTW -- I'm running with 2.1.1, plus your version of log_mpi_core.c.
> Should I try something newer?
> 
> Brian
> 
> 
> > Hi Brian,
> > 
> > I've modified log_mpi_core.c to fix how MPE handles MPI communicator
> > functions when logging is turned off with MPI_Pcontrol.  Could you
> > recompile MPE after updating your log_mpi_core.c with
> > 
> > https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpe2/src/wrappers/src/log_mpi_core.c
> > 
> > and see if this solves your problem.
> > 
> > A.Chan
> > 
> > ----- chan at mcs.anl.gov wrote:
> > 
> >> > Hi Brian,
> >> > 
> >> > MPE logging needs to know when the user program makes communicator
> >> > creation calls, e.g. MPI_Comm_create/MPI_Comm_split/MPI_Comm_dup;
> >> > otherwise any subsequent MPI calls that use these communicators
> >> > can't be logged by MPE.  There is a mechanism in MPE that bypasses
> >> > the actual logging but still keeps track of communicator
> >> > creation/destruction.  It is likely that this mechanism has a bug.
> >> > Do you have a small program that shows your use of communicators,
> >> > so I can make sure whatever fixes I apply will solve your
> >> > problem?
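
The idea described above, recording communicator creation and destruction
even while event logging is switched off, can be sketched as follows.
This is an illustration only: tracker_t and every function name here are
hypothetical, communicators are modeled as plain integer handles, and
nothing below is taken from MPE's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMS 16

/* Hypothetical registry of live communicators (not an MPE type). */
typedef struct {
    int handles[MAX_COMMS];  /* handles of live communicators */
    int count;               /* number of live communicators */
    int logging_on;          /* set by the Pcontrol-like switch */
    int events_logged;       /* number of events actually emitted */
} tracker_t;

void tracker_init(tracker_t *t)
{
    assert(t != NULL);
    t->count = 0;
    t->logging_on = 1;
    t->events_logged = 0;
}

/* level = 1 turns logging on, level = 0 turns it off. */
void tracker_pcontrol(tracker_t *t, int level)
{
    t->logging_on = (level != 0);
}

/* Return the index of handle in the registry, or -1 if unknown. */
static int tracker_find(const tracker_t *t, int handle)
{
    for (int i = 0; i < t->count; i++)
        if (t->handles[i] == handle)
            return i;
    return -1;
}

/* Creation is ALWAYS recorded; an event is emitted only if logging is on. */
int tracker_comm_created(tracker_t *t, int handle)
{
    if (t->count >= MAX_COMMS)
        return -1;
    t->handles[t->count++] = handle;
    if (t->logging_on)
        t->events_logged++;
    return 0;
}

/* Destruction is ALWAYS recorded; unknown handles are reported, not used. */
int tracker_comm_freed(tracker_t *t, int handle)
{
    int i = tracker_find(t, handle);
    if (i < 0)
        return -1;
    t->handles[i] = t->handles[--t->count];  /* swap-remove */
    if (t->logging_on)
        t->events_logged++;
    return 0;
}

/* An ordinary call on a communicator: returns 1 if logged, 0 if logging
   is off, -1 if the communicator was never registered. */
int tracker_log_call(tracker_t *t, int handle)
{
    if (!t->logging_on)
        return 0;
    if (tracker_find(t, handle) < 0)
        return -1;
    t->events_logged++;
    return 1;
}
```

With this shape, a communicator created while logging is off can still be
looked up once logging is turned back on, and freeing it while logging is
off updates the registry without emitting an event.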
> >> > 
> >> > PS. Thanks for spending time to track down the problem.
> >> > 
> >> > A.Chan
> >> > ----- "Brian Wainscott" <brian at lstc.com> wrote:
> >> > 
> >>> > > I posted previously with the subject "MPE logging with OpenMPI"
> >>> > > describing some issues I was having getting MPI_Pcontrol to work.
> >>> > > Anthony Chan suggested I try MPICH instead of OpenMPI, which I've
> >>> > > finally had time to do.  It also doesn't work.
> >>> > >
> >>> > > I looked through the source code for mpe2, and suspect I know the
> >>> > > issue, and am looking for help/confirmation/hopefully a fix or
> >>> > > workaround:
> >>> > >
> >>> > > According to these comments in log_mpi_core.c
> >>> > > (src/mpe2/src/wrappers/src):
> >>> > >
> >>> > >  * MPI_Init checks for logging control options and environment variables,
> >>> > >  * and MPI_Pcontrol allows control over logging (allowing the user to
> >>> > >  * turn logging on and off).  Note that some routines are ALWAYS logged;
> >>> > >  * principly, these are the communicator constuction routines (needed to
> >>> > >  * avoid using the "context_id" which may not exist in some MPI
> >>> > >  * implementations).
> >>> > >
> >>> > > and this comment:
> >>> > >
> >>> > > /*
> >>> > >   level = 1 turns on tracing,
> >>> > >   level = 0 turns it off.
> >>> > >
> >>> > >   Still to do: in some cases, must log communicator operations even if
> >>> > >   logging is off.
> >>> > >  */
> >>> > > int MPI_Pcontrol( const int level, ... )
> >>> > >
> >>> > > I suspect the problem is related to a conflict between MPI_Pcontrol
> >>> > > and certain communicator construction operations?
> >>> > >
> >>> > > I tried modifying the problem I am running in such a way that it
> >>> > > should not create many (any?) communicators after initialization,
> >>> > > and then everything behaves as I'd like: I can call MPI_Pcontrol(0)
> >>> > > early on, and later call MPI_Pcontrol(1) then MPI_Pcontrol(0), and
> >>> > > get one nice window into the execution, without a LOT of stuff I'm
> >>> > > not interested in.
> >>> > >
> >>> > > With my original problem, which does create communicators, I call
> >>> > > MPI_Pcontrol(0) right after initialization, then MPI_Pcontrol(1)
> >>> > > later, then immediately get this error:
> >>> > >
> >>> > > clog_commset.c:CLOG_CommSet_get_IDs() -
> >>> > >         PMPI_Comm_get_attr() fails!
> >>> > >
> >>> > >
> >>> > >
> >>> > > I tried putting calls to MPI_Pcontrol(1) just before (and
> >>> > > MPI_Pcontrol(0) just after) every call to
> >>> > > MPI_COMM_CREATE/MPI_COMM_DUP/MPI_COMM_FREE, but that didn't work
> >>> > > (or maybe I missed one....)  Or maybe this is a red herring, and
> >>> > > the smaller problem ran for some other unrelated reason.
> >>> > >
> >>> > > Suggestions of anything else to try?
> >>> > >
> >>> > > Does anyone know exactly WHICH calls must always be logged?  It
> >>> > > should be a simple matter to ignore the "is_mpilog_on" flag for
> >>> > > just a few calls, if that is all that is needed....I just need to
> >>> > > know WHICH ones.
> >>> > >
> >>> > > Thanks!
> >>> > >
> >>> > > Brian
> >>> > >
> >>> > > _______________________________________________
> >>> > > mpich-discuss mailing list
> >>> > > mpich-discuss at mcs.anl.gov
> >>> > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 
