[mpich-discuss] General Questions

Dave Goodell goodell at mcs.anl.gov
Mon Feb 23 11:09:06 CST 2009


On Feb 23, 2009, at 10:49 AM, Samir Khanal wrote:

> Does it make sense to compile with mpich-1.2.7 and execute using  
> the mpiexec in mpich2?

No.  If you build with MPICH then you should use the MPICH mpiexec.   
If you build with MPICH2 then you should use the MPICH2 mpiexec.  They  
are not interchangeable.

In general, you should use MPICH2 and not MPICH.  MPICH is very old  
and not actively supported.  MPICH2 is actively developed and supported.

> My program runs well (gets compiled and gets submitted) in Mpiexec  
> (OSC) 0.75 and mpich 1.2.5 GCC 4.1.1 torque 1.0.1p5 x86 gentoo

The OSC mpiexec is a third mpiexec in this set.  It is distinct from  
the mpiexecs that are included in MPICH and MPICH2, although it will  
work with both as far as I know.

> I am trying to port to a 64-bit cluster (I am compiling it there)  
> with GCC 4.1.2, mpiexec (OSC) 0.83, mpich2 (with nemesis channel)  
> (tried mpich 1.2.7 and Open MPI) and Torque 2.3.6
>
> Are there any obvious changes required, or a best combination for  
> the new system?
> Basically I am able to compile but not execute my code. I have spent  
> about 3 hours on this without finding any clue and have tried all the  
> combinations. The mpich-1.2.7 and mpich2's mpiexec version works,  
> but only up to about 6-8 processors; beyond that there is a  
> problem with all sorts of P4 errors.
>
>
> Also
>
> [comet ~]$ mpiexec -n 2 ./Ring
>
> Fatal error in MPI_Comm_size: Invalid communicator, error stack:
> MPI_Comm_size(112): MPI_Comm_size(comm=0x5b, size=0x7fff4fdc906c) failed
> MPI_Comm_size(70).: Invalid communicator
> rank 0 in job 30  comet.cs.bgsu.edu_35155   caused collective abort of all ranks
>  exit status of rank 0: killed by signal 9
>
> What does this error mean? I get this when I use mpich2 to compile  
> and the built-in mpiexec for mpich2 to run this program.

The "comm=0x5b" indicates that the program has probably been compiled  
with an mpi.h or mpif.h belonging to a different MPI implementation.   
Make sure that you are using the MPICH2 mpicc/mpif77/mpif90.
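
To illustrate why the handle value is a giveaway, here is a minimal  
sketch in plain C (no MPI needed to build it).  It assumes the values  
that, as far as I recall, each implementation's mpi.h assigns to  
MPI_COMM_WORLD: 91 (which is 0x5b) in MPICH 1.2.x and 0x44000000 in  
MPICH2.  Both constants and the helper name guess_header_origin are  
illustrative assumptions; check the mpi.h files actually installed on  
your cluster.

```c
#include <assert.h>

/* Assumed values, recalled from each implementation's mpi.h.
   Verify them against the headers installed on your own system. */
#define MPICH1_COMM_WORLD 91u          /* MPICH 1.2.x */
#define MPICH2_COMM_WORLD 0x44000000u  /* MPICH2 1.0.x */

/* 91 decimal is the 0x5b that shows up in the error stack. */
static_assert(MPICH1_COMM_WORLD == 0x5bu, "91 decimal is 0x5b");

enum header_origin { ORIGIN_UNKNOWN, ORIGIN_MPICH1, ORIGIN_MPICH2 };

/* Hypothetical helper: guess which mpi.h a communicator handle printed
   in an error stack was compiled against. */
enum header_origin guess_header_origin(unsigned int comm_handle)
{
    if (comm_handle == MPICH2_COMM_WORLD)
        return ORIGIN_MPICH2;
    if (comm_handle == MPICH1_COMM_WORLD)
        return ORIGIN_MPICH1;
    return ORIGIN_UNKNOWN;
}
```

Under those assumptions, guess_header_origin(0x5b) returns  
ORIGIN_MPICH1, which would be consistent with a binary built against  
the old MPICH headers but linked and run against MPICH2.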

> It runs well with mpich 1.2.5 and mpiexec 0.75 version.
>
> How is 1.2.5 different from mpich2 1.0.8? (What precautions or  
> code changes are needed? This code was written in  
> 2007.)
>
> I am totally frustrated with all this.

It's hard to say what changes you might need to make.  MPICH2 works  
fine on both 32-bit and 64-bit systems, but your application may not.  
You will have problems anywhere that type sizes are assumed rather than  
checked or if you are not using MPI datatypes correctly.   
Unfortunately you will have to debug each problem on a case-by-case  
basis, as there is no good general advice that we can give you.
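
As one example of "assumed rather than checked": on typical 64-bit  
Linux, long and pointers grow from 4 to 8 bytes while int stays at 4,  
and MPI count arguments are plain int.  The sketch below shows the kind  
of check I mean, in plain C so it builds without MPI.  The helper names  
to_mpi_count and long_buffer_bytes are hypothetical, not part of any  
MPI library.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a size_t element count to the int that
   MPI count arguments expect.  Returns 0 on success, or -1 if the count
   does not fit in an int (possible once size_t is 64 bits wide). */
int to_mpi_count(size_t n, int *count_out)
{
    assert(count_out != NULL);
    if (n > (size_t)INT_MAX)
        return -1;
    *count_out = (int)n;
    return 0;
}

/* Hypothetical helper: bytes needed for n longs, computed with sizeof
   instead of a hard-coded 4 that only holds on 32-bit x86. */
size_t long_buffer_bytes(size_t n)
{
    return n * sizeof(long);
}
```

The same care applies to datatypes: a buffer of long should be  
described to MPI as MPI_LONG, not MPI_INT, even if the two happened to  
be the same size on the old 32-bit machine.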

-Dave


