[mpich-discuss] mpich2 : Fatal error in MPI_Comm_size: Invalid communicator

Samir Khanal skhanal at bgsu.edu
Mon Feb 23 10:22:44 CST 2009


Hi All

Does it make sense to compile with MPICH-1.2.7 and then execute with MPICH2's mpiexec?

My program compiles, submits, and runs fine with mpiexec (OSC) 0.75, MPICH 1.2.5, GCC 4.1.1, and Torque 1.0.1p5 on an x86 Gentoo system.

I am now trying to port it to a 64-bit cluster (compiling it there) with GCC 4.1.2, mpiexec (OSC) 0.83, MPICH2 with the Nemesis channel (I also tried MPICH 1.2.7 and Open MPI), and Torque 2.3.6.

Are there any obvious changes required, or is there a recommended combination for the new system?
Basically, I am able to compile my code but not execute it. I have spent about three hours on this without a clue and have tried all the combinations. Compiling with MPICH-1.2.7 and running with MPICH2's mpiexec works, but only up to about 6-8 processes; beyond that I get all sorts of p4 errors.


Also:

[comet ~]$ mpiexec -n 2 ./Ring

Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(112): MPI_Comm_size(comm=0x5b, size=0x7fff4fdc906c) failed
MPI_Comm_size(70).: Invalid communicator
rank 0 in job 30  comet.cs.bgsu.edu_35155   caused collective abort of all ranks
  exit status of rank 0: killed by signal 9

What does this error mean? I get it when I compile with MPICH2 and run the program with MPICH2's built-in mpiexec.
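For reference, the call that fails follows the standard pattern below (a simplified sketch along the lines of my code, not the actual Ring source):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        /* MPI_COMM_WORLD is the only communicator used here */
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

So the communicator being passed is just MPI_COMM_WORLD, nothing exotic.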

The same program runs fine with MPICH 1.2.5 and mpiexec 0.75.

How is MPICH 1.2.5 different from MPICH2 1.0.8? What precautions or code changes might be needed? The code was written in 2007.

I am totally frustrated with all this.

Thanks
Samir

