[mpich-discuss] mpirun - heterogeneous environment
Krishna Chaitanya
kris.c1986 at gmail.com
Tue Jul 15 22:01:35 CDT 2008
Hi,
I am trying to run an MPI code across two clusters, following the procedure
described in the mpirun man page. One cluster has 8 Intel (Linux) machines
and the other has 4 Sun (Solaris) machines. I am able to launch the
application when I combine any one machine from the Intel cluster with 3
machines from the Sun cluster. However, if I include two (or more) Intel
machines and two Sun machines, I get the following error:
m_1147: p4_error: Could not gethostbyname for host intel2; may be invalid name
: 61
p1_9009: p4_error: net_recv read: probable EOF on socket: 14
p2_2657: p4_error: net_recv recv: EOF on socket: 14
I compiled the same source file on both clusters to create the executables
sample.SUN and sample.SMP, and issued the mpirun command from an Intel
machine:
$> mpirun -machinefile hostfile -arch SMP -n 2 -arch SUN -n 2 sample.%a
The hostfile for the unsuccessful run contains:
intel1
intel2
sun2
sun3
All the machines are able to see each other within a cluster.
The program exits successfully when I run it on any number of machines from
a single cluster.
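Since the first error comes from gethostbyname, one thing that could be
checked on every node (Intel and Sun) is whether each name in the hostfile
resolves there. The following is only a minimal sketch of such a check, not
part of MPICH. The function names are hypothetical, and it assumes the
hostfile holds one bare host name per line, as in the listing above.

```python
import socket


def read_hostfile(path):
    """Return the host names listed in `path`, one per line, skipping blanks."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def unresolvable_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the names in `hostnames` that `resolve` fails to look up.

    `resolve` defaults to the system resolver; a stand-in can be passed
    to try the logic without touching the network.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror and socket.herror derive from OSError
            failed.append(name)
    return failed
```

Calling unresolvable_hosts(read_hostfile("hostfile")) on a Sun machine would
list any Intel names that machine cannot resolve, and likewise in the other
direction. An empty list would suggest the cause lies elsewhere.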
What could be the problem?
Thanks,
Krishna Chaitanya K
--
In the middle of difficulty, lies opportunity