[mpich-discuss] Not able to run MPI program parallely...

Pavan Balaji balaji at mcs.anl.gov
Tue May 1 13:27:47 CDT 2012


Can you run "hostname" on beowulf.node1 and see what it returns?  Also, 
can you send us the output of:

mpiexec -verbose -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi
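
The mismatch in question can be checked mechanically. Below is a minimal
sketch, not part of MPICH: it assumes mpiexec places one process per host
file line, in order, wrapping around when there are more processes than
lines (this matches the expectation stated further down that process 1
belongs on node1). The function names are hypothetical. Note that cpi
prints whatever name each node reports for itself, so identical names in
the output may also mean the nodes share a hostname, which is what running
"hostname" on beowulf.node1 would reveal.

```python
import re


def expected_placement(hosts, nprocs):
    """Assumed policy: one process per host file line, round-robin."""
    return {rank: hosts[rank % len(hosts)] for rank in range(nprocs)}


def parse_cpi_output(text):
    """Collect {rank: hostname} from 'Process R of N is on HOST' lines."""
    pattern = re.compile(r"Process (\d+) of \d+ is on (\S+)")
    return {int(m.group(1)): m.group(2) for m in pattern.finditer(text)}


def misplaced_ranks(hosts, output):
    """Return {rank: (expected_host, reported_host)} for every mismatch."""
    actual = parse_cpi_output(output)
    nprocs = max(actual, default=-1) + 1
    expected = expected_placement(hosts, nprocs)
    return {
        rank: (expected[rank], actual[rank])
        for rank in sorted(actual)
        if actual[rank] != expected[rank]
    }


# Host file contents and cpi output as quoted in this thread.
hosts = [
    "beowulf.master",
    "beowulf.node1",
    "beowulf.node2",
    "beowulf.node3",
    "beowulf.node4",
]
output = """\
Process 0 of 4 is on beowulf.master
Process 3 of 4 is on beowulf.master
Process 1 of 4 is on beowulf.master
Process 2 of 4 is on beowulf.master
"""

for rank, (want, got) in misplaced_ranks(hosts, output).items():
    print(f"rank {rank}: expected {want}, reported {got}")
```

With the data from this thread, ranks 1, 2 and 3 are flagged: each reports
beowulf.master where node1, node2 and node3 would be expected.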

  -- Pavan

On 05/01/2012 01:25 PM, Albert Spade wrote:
> Yes I am sure.
> I created one file by name hosts in /root and its contents are
> beowulf.master
> beowulf.node1
> beowulf.node2
> beowulf.node3
> beowulf.node4
>
> I have one more file in /etc by name hosts and its contents are:
>
> [root@beowulf ~]# cat /etc/hosts
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
> 172.16.20.31 beowulf.master
> 172.16.20.32 beowulf.node1
> 172.16.20.33 beowulf.node2
> 172.16.20.34 beowulf.node3
> 172.16.20.35 beowulf.node4
> [root@beowulf ~]#
>
> On Tue, May 1, 2012 at 11:15 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>
>     On 05/01/2012 12:39 PM, Albert Spade wrote:
>
>         [root@beowulf ~]# mpiexec -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi
>         Process 0 of 4 is on beowulf.master
>         Process 3 of 4 is on beowulf.master
>         Process 1 of 4 is on beowulf.master
>         Process 2 of 4 is on beowulf.master
>         Fatal error in PMPI_Reduce: Other MPI error, error stack:
>         PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbff0fd08,
>         rbuf=0xbff0fd00, count=1, MPI_DOUBLE, MPI_SUM, root=0,
>         MPI_COMM_WORLD)
>         failed
>         MPIR_Reduce_impl(1087)..........:
>         MPIR_Reduce_intra(895)..........:
>         MPIR_Reduce_binomial(144).......:
>         MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 2
>         MPIR_Reduce_binomial(144).......:
>         MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
>         ^CCtrl-C caught... cleaning up processes
>
>
>     In your previous email you said that your host file contains this:
>
>
>     beowulf.master
>     beowulf.node1
>     beowulf.node2
>     beowulf.node3
>     beowulf.node4
>
>     The above output does not match this.  Process 1 should be scheduled
>     on node1.  So something is not correct here.  Are you sure the
>     information you gave us is right?
>
>
>     --
>     Pavan Balaji
>     http://www.mcs.anl.gov/~balaji
>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

