Here is the output of hostname on beowulf.node1:<br><br>[root@beowulf ~]# ssh 172.16.20.32<br>Last login: Wed May 2 05:33:27 2012 from beowulf.master<br>[root@beowulf ~]# hostname<br>beowulf.node1<br>[root@beowulf ~]#<br>
<br><br><div class="gmail_quote">On Tue, May 1, 2012 at 11:57 PM, Pavan Balaji <span dir="ltr"><<a href="mailto:balaji@mcs.anl.gov" target="_blank">balaji@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Can you run "hostname" on beowulf.node1 and see what it returns? Also, can you send us the output of:<br>
<br>
mpiexec -verbose -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi<br>
<br>
-- Pavan<div class="im"><br>
<br>
On 05/01/2012 01:25 PM, Albert Spade wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
Yes, I am sure.<br>
I created a file named hosts in /root, and its contents are:<br>
beowulf.master<br>
beowulf.node1<br>
beowulf.node2<br>
beowulf.node3<br>
beowulf.node4<br>
<br>
I also have a file named hosts in /etc, and its contents are:<br>
<br>
[root@beowulf ~]# cat /etc/hosts<br>
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4<br>
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6<br>
172.16.20.31 beowulf.master<br>
172.16.20.32 beowulf.node1<br>
172.16.20.33 beowulf.node2<br>
172.16.20.34 beowulf.node3<br>
172.16.20.35 beowulf.node4<br>
[root@beowulf ~]#<br>
<br>
On Tue, May 1, 2012 at 11:15 PM, Pavan Balaji <<a href="mailto:balaji@mcs.anl.gov" target="_blank">balaji@mcs.anl.gov</a>> wrote:<br></div><div class="im">
<br>
<br>
On 05/01/2012 12:39 PM, Albert Spade wrote:<br>
<br>
[root@beowulf ~]# mpiexec -f hosts -n 4<br></div>
/opt/mpich2-1.4.1p1/examples/./cpi<div class="im"><br>
Process 0 of 4 is on beowulf.master<br>
Process 3 of 4 is on beowulf.master<br>
Process 1 of 4 is on beowulf.master<br>
Process 2 of 4 is on beowulf.master<br>
Fatal error in PMPI_Reduce: Other MPI error, error stack:<br></div>
PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbff0fd08,<div class="im"><br>
rbuf=0xbff0fd00, count=1, MPI_DOUBLE, MPI_SUM, root=0,<br>
MPI_COMM_WORLD)<br>
failed<br></div>
MPIR_Reduce_impl(1087)..........:<br>
MPIR_Reduce_intra(895)..........:<br>
MPIR_Reduce_binomial(144).......:<br>
MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 2<br>
MPIR_Reduce_binomial(144).......:<br>
MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1<div class="im"><br>
^CCtrl-C caught... cleaning up processes<br>
<br>
<br>
In your previous email you said that your host file contains this:<br>
<br>
<br>
beowulf.master<br>
beowulf.node1<br>
beowulf.node2<br>
beowulf.node3<br>
beowulf.node4<br>
<br>
The above output does not match this. Process 1 should be scheduled<br>
on node1. So something is not correct here. Are you sure the<br>
information you gave us is right?<br>
<br>
<br>
--<br>
Pavan Balaji<br></div>
<a href="http://www.mcs.anl.gov/%7Ebalaji" target="_blank">http://www.mcs.anl.gov/~balaji</a><br>
<br>
<br>
</blockquote><div class="HOEnZb"><div class="h5">
<br>
-- <br>
Pavan Balaji<br>
<a href="http://www.mcs.anl.gov/%7Ebalaji" target="_blank">http://www.mcs.anl.gov/~balaji</a><br>
</div></div></blockquote></div><br>
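For reference, the mismatch discussed in the quoted messages (all four processes reporting beowulf.master, while process 1 was expected on node1) can be illustrated with a small sketch. It assumes ranks are assigned round-robin, one per host-file line, which is consistent with the expectation stated in the thread but is not taken from the MPICH2 source. The function name expected_placement is hypothetical, and the host file is assumed to hold one plain hostname per line with no slot counts.

```shell
#!/bin/sh
# Sketch only: print where each rank would be expected to run if ranks are
# assigned round-robin over the host file, one rank per line (an assumption).
# expected_placement is a hypothetical helper, not part of MPICH2.
# Variables are global because plain POSIX sh has no "local".
expected_placement() {
    hostfile=$1
    nprocs=$2
    # Count the non-empty lines in the host file.
    nhosts=$(awk 'NF { n++ } END { print n + 0 }' "$hostfile")
    [ "$nhosts" -gt 0 ] || return 1
    rank=0
    while [ "$rank" -lt "$nprocs" ]; do
        # Wrap around the host list; awk counts entries starting at 1.
        idx=$(( rank % nhosts + 1 ))
        host=$(awk -v i="$idx" 'NF { if (++n == i) { print $1; exit } }' "$hostfile")
        echo "Process $rank of $nprocs expected on $host"
        rank=$(( rank + 1 ))
    done
}
```

Called as `expected_placement hosts 4` with the five-line hosts file from the thread, this lists beowulf.master, beowulf.node1, beowulf.node2 and beowulf.node3 for ranks 0 through 3. Comparing that list with the "Process N of 4 is on ..." lines printed by cpi makes the discrepancy easy to spot.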