[mpich-discuss] Problem with MPE and MPICH2

Manuel Holtgrewe holtgrewe at ira.uka.de
Sun Mar 1 10:32:00 CST 2009


Okay, my problem seems to be that I am linking the program with
-mpe=mpilog. Maybe this causes MPI_Finalize() to finalize the logs a
second time, after my explicit MPE_Finish_log() call has already done
so.

The workaround is: Drop the MPE_Init_log() and MPE_Finish_log()
calls. The filename can then be selected with an environment variable
like this:

$ MPE_LOGFILE_PREFIX=myfile mpiexec -n 4 ./mpe
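
If the program should also report which prefix was requested, it can
read the same variable itself. The following is only a minimal sketch:
requested_log_prefix() and its fallback argument are hypothetical names,
and the helper merely reads the environment. It does not change what MPE
does, and it assumes the variable is visible in the process that calls
it.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of MPE or MPICH2.
// Returns the value of MPE_LOGFILE_PREFIX, or `fallback` if the
// variable is unset or empty. It only inspects the environment.
std::string requested_log_prefix(const std::string &fallback)
{
    const char *value = std::getenv("MPE_LOGFILE_PREFIX");
    if (value != nullptr && *value != '\0')
        return std::string(value);
    return fallback;
}
```

A rank could print the result next to its rank number to confirm that
the variable was actually set for that process.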

I know that this list is about MPICH and not MPE. However, if I am
right and this is caused by finalizing the logging twice, then I
would say it is unexpected behaviour for a program to work when not
linked with -mpe=mpilog and to fail when it is. Some might consider
this a bug.
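
To make the suspicion concrete, here is a toy model of a logging layer
whose finish step tolerates being called twice. It is not MPE's code and
says nothing about how MPE is implemented. ToyLog and all of its members
are made up for illustration, under the assumption that the crash comes
from a second finalization.

```cpp
#include <cstdio>

// Toy stand-in for a logging layer. NOT MPE's implementation;
// every name here is hypothetical.
class ToyLog {
public:
    // Start logging. Calling it again while active changes nothing.
    int init()
    {
        if (state_ == State::Uninitialized)
            state_ = State::Active;
        return 0;
    }

    // Write the log once. A second call is a harmless no-op instead of
    // touching resources that were already released.
    int finish(const char *filename)
    {
        if (state_ != State::Active)
            return 0;  // never started, or already finished
        std::printf("writing log to %s\n", filename);
        ++writes_;
        state_ = State::Finished;
        return 0;
    }

    // Number of times the log was actually written.
    int writes() const { return writes_; }

private:
    enum class State { Uninitialized, Active, Finished };
    State state_ = State::Uninitialized;
    int writes_ = 0;
};
```

With a guard like this, an explicit finish followed by an automatic one
at shutdown writes the log only once.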

Bests,
-- Manuel



2009/3/1 Manuel Holtgrewe <holtgrewe at ira.uka.de>:
> Hi,
>
> I have a problem using MPE with MPICH2. The following C++ program:
>
> --8<--------
> #include <cstdio>
>
> #include <mpi.h>
> #include <mpe.h>
>
> int main(int argc, char **argv)
> {
>  MPI_Init(&argc, &argv);
>  int rank;
>  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>  printf("Rank: %d\n", rank);
>
>  int res = MPE_Init_log();
>  printf("R%02d, MPE_Init_log() == %d\n", rank, res);
>
>  res = MPE_Finish_log("foo");
>  printf("R%02d, MPE_Finish_log(\"foo\") == %d\n", rank, res);
>  MPI_Finalize();
>  return 0;
> }
> --8<--------
>
> Crashes as follows:
>
> $ mpiexec -n 1 ./mpe
> Rank: 0
> R00, MPE_Init_log() == 0
> Enabling the Default clock synchronization...
> R00, MPE_Finish_log("foo") == 0
> rank 0 in job 51  **HOST**_53437   caused collective abort of all ranks
>  exit status of rank 0: killed by signal 10
>
> $ mpiexec -n 2 ./mpe
> Rank: 0
> R00, MPE_Init_log() == 0
> Rank: 1
> R01, MPE_Init_log() == 0
> Enabling the Default clock synchronization...
> R00, MPE_Finish_log("foo") == 0
> R01, MPE_Finish_log("foo") == 0
> rank 0 in job 52  **HOST**_53437   caused collective abort of all ranks
>  exit status of rank 0: killed by signal 10
>
> If I remove the "MPE_Finish_log()" line, the program does not crash.
>
> The problem occurs on Mac OS X and Linux using MPICH2 1.0.8 with g++ 4.3.
>
> Bests,
> -- Manuel
>

