[mpich-discuss] mpirun - heterogeneous environment

Krishna Chaitanya kris.c1986 at gmail.com
Tue Jul 15 22:01:35 CDT 2008


Hi,
I am trying to run an MPI code across two clusters, and I have read the
mpirun man page, which describes the procedure. One cluster has 8 Intel
(Linux) machines and the other has 4 Sun (Solaris) machines. I am able to
launch the application when I combine any one machine from the Intel
cluster with 3 machines from the Sun cluster. However, if I include two (or
more) Intel machines and two Sun machines, I get the following error:

m_1147:  p4_error: Could not gethostbyname for host intel2; may be invalid name
: 61
p1_9009:  p4_error: net_recv read:  probable EOF on socket: 14
p2_2657:  p4_error: net_recv recv:  EOF on socket: 14

I have compiled the same source file on both clusters to create the
executables sample.SUN and sample.SMP. I issued the mpirun command from an
Intel machine:
$> mpirun -machinefile hostfile -arch SMP -n 2 -arch SUN -n 2 sample.%a
The hostfile for the unsuccessful run contains:
intel1
intel2
sun2
sun3

All the machines within a cluster are able to see each other. The program
exits successfully when I run it on any number of machines within any one
cluster at a time.
What could be the problem?
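
Since the error is a failed gethostbyname for intel2, one thing that could
be checked is whether every name in the hostfile resolves from each machine,
in particular from the Sun machines, which may not know the Intel host names.
Below is a minimal Python sketch of such a check. It is not part of MPICH;
the function names (read_hostfile, check_hosts, format_report) are my own,
and it assumes the hostfile has one host name per line, optionally followed
by ":n" or extra fields, which are ignored.

```python
import socket


def read_hostfile(text):
    """Return the host names found in hostfile text.

    Assumption: one host per line; blank lines and '#' comments are
    skipped, and anything after the first field or a ':' is dropped.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def check_hosts(hosts, resolver=socket.gethostbyname):
    """Map each host name to its resolved address, or None if lookup fails."""
    results = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            results[host] = None
    return results


def format_report(results):
    """Return one human-readable line per host."""
    return [
        f"{host}: {addr if addr is not None else 'NOT RESOLVED'}"
        for host, addr in results.items()
    ]
```

To use it, read the hostfile on a given machine, pass the names to
check_hosts, and print the lines from format_report. Running the same check
on an Intel machine and on a Sun machine would show whether a name such as
intel2 resolves on one side but not the other.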

Thanks,
Krishna Chaitanya K


-- 
In the middle of difficulty, lies opportunity