The exit code -1073741819 (0xC0000005) indicates an access violation inside
MPI_Finalize(). I suggest you look at the core file.

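/* Enable core files on rank 0; needs #include <sys/resource.h>. */
if(myid == 0) {
    struct rlimit rl;
    if(getrlimit(RLIMIT_CORE, &rl) == 0) {
        if(rl.rlim_cur == 0) {
            rl.rlim_cur = rl.rlim_max;      /* raise soft limit to hard limit */
            setrlimit(RLIMIT_CORE, &rl);    /* apply the new limit */
        }
    }
}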
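
The same idea can be written as a standalone helper with error checking. This is only a sketch, assuming a POSIX system; the name enable_core_dumps is hypothetical, and it raises the soft limit whenever it differs from the hard limit rather than only when it is 0.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so that a crashing process can leave a core file.
 * Returns 0 on success, -1 if getrlimit() or setrlimit() fails. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur != rl.rlim_max) {
        rl.rlim_cur = rl.rlim_max;   /* the soft limit may be raised up to the hard limit */
        if (setrlimit(RLIMIT_CORE, &rl) != 0) {
            perror("setrlimit");
            return -1;
        }
    }
    return 0;
}
```

Calling it once near the start of main(), before the crash can happen, should be enough. If the hard limit itself is 0, no core file will be written and the limit has to be raised outside the program.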

I get an error when using mpiexec (MPICH2 1.4.1p). I hope somebody can help.

The crash is random: the same executable sometimes crashes and sometimes does not.


The cluster is heterogeneous, with 5 nodes:

4 nodes are CARMA (CUDA on ARM) boards running Ubuntu 11.4: the carrier board
basically consists of an ARM Cortex A9 processor and an NVIDIA Quadro 1000M
GPU card.

1 node has one Xeon E5620 processor and runs Windows XP + cygwin.

Standard Ethernet network.

Names of the 5 nodes: lnardi, carma1, carma2, carma3, carma4.

The command line on the master node lnardi (Windows node) is:

mpiexec -channel sock -n 1 -host lnardi a.out :
    -n 1 -host carma1 -path /home/lnardi/ a.out :
    -n 1 -host carma2 -path /home/lnardi/ a.out :
    -n 1 -host carma3 -path /home/lnardi/ a.out :
    -n 1 -host carma4 -path /home/lnardi/ a.out

Note that the same sample runs correctly on an all-Linux cluster with the
following characteristics: MVAPICH2-1.8a1p1 (mpirun) + Mellanox InfiniBand +
Xeon X5675 + NVIDIA M2090 GPUs + Red Hat Enterprise Linux Server release 6.2.

I was originally running a more complicated program, but I have reproduced the
error with a trivial one:

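#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define BUFSIZE 128
#define TAG 0

int main(int argc, char *argv[])
{
   char idstr[32];
   char buff[BUFSIZE];
   int numprocs;
   int myid;
   int i;
   MPI_Status stat;

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);

   if(myid == 0)
   {
      /* Rank 0 greets every other rank, then collects the replies. */
      printf("%d: We have %d processors\n", myid, numprocs);
      for(i = 1; i < numprocs; i++)
      {
         sprintf(buff, "Hello %d! ", i);
         MPI_Send(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD);
      }
      for(i = 1; i < numprocs; i++)
      {
         MPI_Recv(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD, &stat);
         printf("%d: %s\n", myid, buff);
      }
   }
   else
   {
      /* Other ranks append their id to the greeting and send it back. */
      MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
      sprintf(idstr, "Processor %d ", myid);
      strncat(buff, idstr, BUFSIZE-1);
      strncat(buff, "reporting for duty\n", BUFSIZE-1);
      MPI_Send(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
   }

   MPI_Finalize();
   return 0;
}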

The error:

0: We have 5 processors
0: Hello 1! Processor 1 reporting for duty
0: Hello 2! Processor 2 reporting for duty
0: Hello 3! Processor 3 reporting for duty
0: Hello 4! Processor 4 reporting for duty

job aborted:
rank: node: exit code[: error message]
0: lnardi: -1073741819: process 0 exited without calling finalize
1: carma1: -2
2: carma2: -2
3: carma3: -2
4: carma4: -2
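
The exit code reported for rank 0 is a signed decimal number, while Windows status codes are normally documented in hexadecimal. The sketch below shows the conversion; the helper names are hypothetical. For -1073741819 it yields 0xC0000005, the value Windows documents as STATUS_ACCESS_VIOLATION.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a signed 32-bit exit code as the unsigned 32-bit status
 * value. Converting a negative value to uint32_t wraps modulo 2^32. */
uint32_t exit_code_to_status(int32_t exit_code)
{
    return (uint32_t)exit_code;
}

/* Hypothetical helper: print an exit code in decimal and hexadecimal. */
void print_exit_code(int32_t exit_code)
{
    printf("%" PRId32 " = 0x%08" PRIX32 "\n",
           exit_code, exit_code_to_status(exit_code));
}
```

Calling print_exit_code(-1073741819) gives the hexadecimal form to look up in the Windows documentation.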

I guess the problem comes from the sock channel, mpiexec, or the ARM nodes.

What do you think?


