[mpich-discuss] Not able to run MPI program parallely...
Pavan Balaji
balaji at mcs.anl.gov
Tue May 1 13:27:47 CDT 2012
Can you run "hostname" on beowulf.node1 and see what it returns? Also,
can you send us the output of:
mpiexec -verbose -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi
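
If it is easier to check all the nodes in one pass, a loop along these lines could do it. This is only a sketch: it assumes passwordless ssh from the master to every node and a host file in the current directory. The function name report_hostnames and the RSH override variable are made up for illustration and are not part of MPICH.

```shell
#!/bin/sh
# Hypothetical helper: print, for each entry in a host file, the name that
# the node itself reports when "hostname" is run on it.
# Assumption: passwordless ssh works from this machine to every node.
report_hostnames() {
    hostfile=$1
    rsh=${RSH:-ssh}                 # remote shell command; defaults to ssh
    while read -r entry rest; do
        case $entry in
            ''|\#*) continue ;;     # skip blank lines and comments
        esac
        host=${entry%%:*}           # drop an optional ":N" process count
        printf '%s -> ' "$host"
        # -n keeps ssh from consuming the rest of the host file on stdin
        $rsh -n "$host" hostname || echo "(unreachable)"
    done < "$hostfile"
}
```

Calling "report_hostnames hosts" prints one line per node. If several nodes report the same name, that would be worth sorting out before looking at mpiexec itself.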
-- Pavan
On 05/01/2012 01:25 PM, Albert Spade wrote:
> Yes I am sure.
> I created one file by name hosts in /root and its contents are
> beowulf.master
> beowulf.node1
> beowulf.node2
> beowulf.node3
> beowulf.node4
>
> I have one more file in /etc by name hosts and its contents are:
>
> [root@beowulf ~]# cat /etc/hosts
> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
> 172.16.20.31 beowulf.master
> 172.16.20.32 beowulf.node1
> 172.16.20.33 beowulf.node2
> 172.16.20.34 beowulf.node3
> 172.16.20.35 beowulf.node4
> [root@beowulf ~]#
>
> On Tue, May 1, 2012 at 11:15 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>
> On 05/01/2012 12:39 PM, Albert Spade wrote:
>
> [root@beowulf ~]# mpiexec -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi
> Process 0 of 4 is on beowulf.master
> Process 3 of 4 is on beowulf.master
> Process 1 of 4 is on beowulf.master
> Process 2 of 4 is on beowulf.master
> Fatal error in PMPI_Reduce: Other MPI error, error stack:
> PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbff0fd08,
> rbuf=0xbff0fd00, count=1, MPI_DOUBLE, MPI_SUM, root=0,
> MPI_COMM_WORLD)
> failed
> MPIR_Reduce_impl(1087)..........:
> MPIR_Reduce_intra(895)..........:
> MPIR_Reduce_binomial(144).......:
> MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 2
> MPIR_Reduce_binomial(144).......:
> MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
> ^CCtrl-C caught... cleaning up processes
>
>
> In your previous email you said that your host file contains this:
>
>
> beowulf.master
> beowulf.node1
> beowulf.node2
> beowulf.node3
> beowulf.node4
>
> The above output does not match this. Process 1 should be scheduled
> on node1. So something is not correct here. Are you sure the
> information you gave us is right?
>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji