[mpich-discuss] Problem with MPE and MPICH2

Anthony Chan chan at mcs.anl.gov
Sun Mar 1 14:14:12 CST 2009


When linking with -mpe=mpilog, you don't call MPE_Init_log/MPE_Finish_log
(that is done by MPI_Init/MPI_Finalize already).  However, when
using -mpe=log, you'll need MPE_Init_log/MPE_Finish_log.
See <mpich2-xxx>/src/mpe2/README, or the example programs in
<mpich2-install-dir>/share/examples_logging, e.g. cpilog.c.

A.Chan

----- "Manuel Holtgrewe" <holtgrewe at ira.uka.de> wrote:

> Okay, my problem seems to be that I am linking the program with
> -mpe=mpilog. Maybe this causes MPI_Finalize() to finalize the logs a
> second time.
> 
> The workaround is: drop the MPE_Init_log() and MPE_Finish_log()
> calls. The log file name can then be selected with an environment
> variable like this:
> 
> $ MPE_LOGFILE_PREFIX=myfile mpiexec -n 4 ./mpe
> 
> I know that this is the MPICH list and not an MPE list. However, if
> I am right and this is caused by finalizing the logging twice, then
> I would say that it is unexpected behaviour for a program to work
> when not linked with -mpe=mpilog and to fail when it is. Some might
> consider this a bug.
> 
> Bests,
> -- Manuel
> 
> 
> 
> 2009/3/1 Manuel Holtgrewe <holtgrewe at ira.uka.de>:
> > Hi,
> >
> > I have a problem using MPE with MPICH2. The following C++ program:
> >
> > --8<--------
> > #include <cstdio>
> >
> > #include <mpi.h>
> > #include <mpe.h>
> >
> > int main(int argc, char **argv)
> > {
> >  MPI_Init(&argc, &argv);
> >  int rank;
> >  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >  printf("Rank: %d\n", rank);
> >
> >  int res = MPE_Init_log();
> >  printf("R%02d, MPE_Init_log() == %d\n", rank, res);
> >
> >  res = MPE_Finish_log("foo");
> >  printf("R%02d, MPE_Finish_log(\"foo\") == %d\n", rank, res);
> >  MPI_Finalize();
> >  return 0;
> > }
> > --8<--------
> >
> > Crashes as follows:
> >
> > $ mpiexec -n 1 ./mpe
> > Rank: 0
> > R00, MPE_Init_log() == 0
> > Enabling the Default clock synchronization...
> > R00, MPE_Finish_log("foo") == 0
> > rank 0 in job 51  **HOST**_53437   caused collective abort of all ranks
> >  exit status of rank 0: killed by signal 10
> >
> > $ mpiexec -n 2 ./mpe
> > Rank: 0
> > R00, MPE_Init_log() == 0
> > Rank: 1
> > R01, MPE_Init_log() == 0
> > Enabling the Default clock synchronization...
> > R00, MPE_Finish_log("foo") == 0
> > R01, MPE_Finish_log("foo") == 0
> > rank 0 in job 52  **HOST**_53437   caused collective abort of all ranks
> >  exit status of rank 0: killed by signal 10
> >
> > If I remove the "MPE_Finish_log()" line, the program does not crash.
> >
> > The problem occurs on Mac OS X and Linux using mpich2 1.0.8 with g++ 4.3.
> >
> > Bests,
> > -- Manuel
> >

