[MPICH] mpich1 X mpich2 rsh troubles

Márcio Ricardo Pivello pivello at gmail.com
Thu Mar 2 08:02:45 CST 2006


Hi

I'm just starting to work with mpich, and I'm having some problems when
running an application. I'm using mpich2-1.0.2p1 to compile a code which
uses petsc 2.3.0, hypre 1.9.0b and ParMetis 3.1. The code has f77 and f90
files. Compilation runs without any problem, but when I try to run it I
always get the same error, no matter which machines in the cluster are used:

-- Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(110): MPI_Comm_size(comm=0x1, size=0xbfffe340) failed
MPI_Comm_size(69): Invalid communicator
rank 0 in job 4  no43_33215   caused collective abort of all ranks
  exit status of rank 0: return code 13
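
The comm=0x1 value in the error stack above can be checked against the
handle layout that MPICH2's mpi.h is understood to use. The sketch below is
only an illustration under assumptions that do not come from this post: that
the top two bits of a handle encode its kind, that the next four bits encode
the object type (1 for communicators), and that MPI_COMM_WORLD is 0x44000000.
The function names are hypothetical helpers, not part of any MPI library.

```python
# Sketch only: the bit layout below is an assumption about MPICH2's mpi.h,
# not something stated in the original post.
HANDLE_KINDS = {0: "invalid", 1: "builtin", 2: "direct", 3: "indirect"}
COMM_OBJECT_TYPE = 1          # assumed object-type code for communicators
MPI_COMM_WORLD = 0x44000000   # assumed MPICH2 value of MPI_COMM_WORLD


def describe_handle(handle):
    """Split a 32-bit handle into (kind, object_type) using the assumed layout."""
    kind = HANDLE_KINDS[(handle >> 30) & 0x3]   # top two bits
    object_type = (handle >> 26) & 0xF          # next four bits
    return kind, object_type


def looks_like_communicator(handle):
    """True if the handle has a valid kind and the communicator type code."""
    kind, object_type = describe_handle(handle)
    return kind != "invalid" and object_type == COMM_OBJECT_TYPE


if __name__ == "__main__":
    # Compare the assumed MPI_COMM_WORLD with the value from the error stack.
    for value in (MPI_COMM_WORLD, 0x1):
        print(hex(value), describe_handle(value), looks_like_communicator(value))
```

Under these assumptions, 0x1 does not decode to a communicator handle at all.
That might point to the application, or one of the libraries it links
against, having been built with headers from a different MPI installation
(for example another mpif.h), but this is a guess and not a confirmed
diagnosis.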

In this case, no43 is the hostname of the machine. The error already occurs
on the first host listed in the hosts file, and the process stops immediately.
I'm using the following sequence to launch the application:

mpdboot -np 4 -h hosts --rsh=rsh
rsh <some_machine_listed_in_file_hosts>
mpdrun -np 4 ../bin/linux-gnu-opt/SolverGP.x -ksp_gmres_restart 90 -ksp_rtol
0.000001 -log_info -ksp_monitor -log_summary >& run-2p-b.log &

What is the problem here?


Thanks in advance

-Márcio Ricardo Pivello

PS.: I tried to run this code with mpich1 1.2.6, but the code I'm trying to
compile needs the library libmpichf90.a, which I didn't find in mpich1. Is
there any alternative to this library, or could I add this library to the
mpich1 libraries?



Márcio Ricardo Pivello
Mechanical Engineer, MSc.
LTCM and CFD Lab
Mechanical Engineering School
Federal University of Uberlândia
Uberlândia - MG - Brazil