<div dir="ltr">Hi,<br> I am trying to run an MPI code across two clusters, following the procedure described in the mpirun man page. One cluster has 8 Intel (Linux) machines and the other has 4 Sun (Solaris) machines. I am able to launch the application when I combine any one machine from the Intel cluster with 3 machines from the Sun cluster. However, if I include two (or more) Intel machines and two Sun machines, I get the following error:<br>
<br><span style="color: rgb(0, 0, 153);">m_1147: p4_error: Could not gethostbyname for host intel2; may be invalid name</span><br style="color: rgb(0, 0, 153);"><span style="color: rgb(0, 0, 153);">: 61</span><br style="color: rgb(0, 0, 153);">
<span style="color: rgb(0, 0, 153);">p1_9009: p4_error: net_recv read: probable EOF on socket: 14</span><br style="color: rgb(0, 0, 153);"><span style="color: rgb(0, 0, 153);">p2_2657: p4_error: net_recv recv: EOF on socket: 14<br>
<br></span><span style="color: rgb(0, 0, 153);"><span style="color: rgb(0, 0, 0);">I have compiled the same file on both clusters to create the executables sample.SUN and sample.SMP. I issued the mpirun command from an Intel machine:<br>
$> mpirun -machinefile hostfile -arch SMP -n 2 -arch SUN -n 2 sample.%a<br>
For the unsuccessful run, the hostfile contains:<br>
intel1<br>
intel2<br>
sun2<br>
sun3</span></span><br><span style="color: rgb(0, 0, 153);"><span style="color: rgb(0, 0, 0);"><br> Within each cluster, all the machines are able to see each other. The program exits successfully when I run it on any number of machines within a single cluster.<br>
What could be the problem? <br></span></span><br>Thanks,<br>Krishna Chaitanya K <br> <br clear="all"><br>-- <br>In the middle of difficulty, lies opportunity
</div>