[mpich-discuss] crash mpiexec

Calin Iaru calin at dolphinics.no
Thu Nov 8 04:04:58 CST 2012


This exit code (-1073741819, i.e. 0xC0000005) indicates an access violation inside MPI_Finalize(). I
suggest you look at the core file; you can raise the core file size limit from
within the program so that a core is actually written, for example:

#include <sys/resource.h>   /* getrlimit/setrlimit, RLIMIT_CORE */

if (myid == 0) {
    struct rlimit rl;
    /* If core dumps are disabled (soft limit 0), raise the limit to the hard maximum. */
    if (getrlimit(RLIMIT_CORE, &rl) == 0) {
        if (rl.rlim_cur == 0) {
            rl.rlim_cur = rl.rlim_max;
            setrlimit(RLIMIT_CORE, &rl);
        }
    }
}
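
If the failure turns out to be something MPI itself detects (rather than a raw
access violation), another thing you could try is a returning error handler so
the error text gets printed before the job dies. A rough, untested sketch:

#include <mpi.h>
#include <stdio.h>

/* Untested sketch: report MPI errors as text instead of letting the job
   be killed silently. This only helps for errors that MPI itself detects. */
static void check(int err, const char *where)
{
    if (err != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len = 0;
        MPI_Error_string(err, msg, &len);
        fprintf(stderr, "%s: %s\n", where, msg);
    }
}

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    /* Make MPI calls on MPI_COMM_WORLD return error codes instead of aborting. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    check(MPI_Barrier(MPI_COMM_WORLD), "MPI_Barrier");
    check(MPI_Finalize(), "MPI_Finalize");
    return 0;
}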



From: NARDI Luigi
Sent: Wednesday, November 07, 2012 2:08 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] crash mpiexec


Hello,



I am seeing an error with mpiexec (MPICH2 1.4.1p); I hope somebody can help.

The crash is random, i.e. the same executable sometimes crashes and sometimes runs to completion.



Context:

Heterogeneous cluster of 5 nodes:

4 nodes with CARMA (CUDA on ARM) on Ubuntu 11.4: the carrier board basically
consists of an ARM Cortex-A9 processor and an NVIDIA Quadro 1000M GPU.

1 node with one Xeon E5620 processor running Windows XP + Cygwin.

Standard Ethernet network.

Names of the 5 nodes:

lnardi

carma1

carma2

carma3

carma4



The command line on the master node lnardi (Windows node) is:

mpiexec -channel sock -n 1 -host lnardi a.out :
    -n 1 -host carma1 -path /home/lnardi/ a.out :
    -n 1 -host carma2 -path /home/lnardi/ a.out :
    -n 1 -host carma3 -path /home/lnardi/ a.out :
    -n 1 -host carma4 -path /home/lnardi/ a.out



Notice that the same sample runs fine on a full Linux cluster with the following
characteristics: MVAPICH2-1.8a1p1 (mpirun) + Mellanox InfiniBand + Xeon
X5675 + NVIDIA M2090 GPUs + Red Hat Enterprise Linux Server release 6.2.



I was originally running more complicated code, but I have reproduced the error
with a trivial program:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define BUFSIZE 128
#define TAG 0

int main(int argc, char *argv[])
{
   char idstr[32];
   char buff[BUFSIZE];
   int numprocs;
   int myid;
   int i;
   MPI_Status stat;

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD,&myid);

   if(myid == 0)
   {
      printf("%d: We have %d processors\n", myid, numprocs);
      for(i=1;i<numprocs;i++)
      {
         sprintf(buff, "Hello %d! ", i);
         MPI_Send(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD);
      }
      for(i=1;i<numprocs;i++)
      {
         MPI_Recv(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD, &stat);
         printf("%d: %s\n", myid, buff);
      }
   }
   else
   {
      MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
      sprintf(idstr, "Processor %d ", myid);
      strncat(buff, idstr, BUFSIZE-1);
      strncat(buff, "reporting for duty\n", BUFSIZE-1);
      MPI_Send(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
   }

   MPI_Finalize();
   return 0;
}



The error:

0: We have 5 processors
0: Hello 1! Processor 1 reporting for duty
0: Hello 2! Processor 2 reporting for duty
0: Hello 3! Processor 3 reporting for duty
0: Hello 4! Processor 4 reporting for duty

job aborted:
rank: node: exit code[: error message]
0: lnardi: -1073741819: process 0 exited without calling finalize
1: carma1: -2
2: carma2: -2
3: carma3: -2
4: carma4: -2



I suspect the problem comes from the sock channel, from mpiexec, or from the ARM build.

What do you think?
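
One thing I plan to try in order to narrow it down (a rough sketch, not yet
tested) is to print a flushed marker on every rank right before and after
MPI_Finalize() in the program above, to see whether rank 0 ever returns from it:

   /* Sketch: replaces the end of main() in the program above. */
   printf("%d: before MPI_Finalize\n", myid);
   fflush(stdout);              /* make sure the marker reaches mpiexec */
   MPI_Finalize();
   printf("%d: after MPI_Finalize\n", myid);
   fflush(stdout);
   return 0;
}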



Thanks

Dr Luigi Nardi









