[mpich-discuss] Not able to run MPI program in parallel...

Albert Spade albert.spade at gmail.com
Tue May 1 13:42:11 CDT 2012


And here is the output of hostname on beowulf.node1

[root at beowulf ~]# ssh 172.16.20.32
Last login: Wed May  2 05:33:27 2012 from beowulf.master
[root at beowulf ~]# hostname
beowulf.node1
[root at beowulf ~]#
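
For reference, a minimal placement check along these lines (a sketch; the
file name placement_check.c and building it with the mpicc from this MPICH2
install are assumptions) prints which host each rank actually lands on, so
it can be compared against the hostfile:

/* placement_check.c - minimal sketch: report which host each rank runs on */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    /* Each rank reports its number and the node it was scheduled on */
    printf("Process %d of %d is on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

Built with "mpicc placement_check.c -o placement_check" and run with
"mpiexec -f hosts -n 4 ./placement_check", the ranks should report
different nodes if the hostfile is being honored.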


On Tue, May 1, 2012 at 11:57 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:

>
> Can you run "hostname" on beowulf.node1 and see what it returns?  Also,
> can you send us the output of:
>
> mpiexec -verbose -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi
>
>  -- Pavan
>
>
> On 05/01/2012 01:25 PM, Albert Spade wrote:
>
>> Yes I am sure.
>> I created a file named hosts in /root and its contents are:
>> beowulf.master
>> beowulf.node1
>> beowulf.node2
>> beowulf.node3
>> beowulf.node4
>>
>> I also have a hosts file in /etc, and its contents are:
>>
>> [root at beowulf ~]# cat /etc/hosts
>> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
>> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
>> 172.16.20.31 beowulf.master
>> 172.16.20.32 beowulf.node1
>> 172.16.20.33 beowulf.node2
>> 172.16.20.34 beowulf.node3
>> 172.16.20.35 beowulf.node4
>> [root at beowulf ~]#
>>
>> On Tue, May 1, 2012 at 11:15 PM, Pavan Balaji <balaji at mcs.anl.gov
>> <mailto:balaji at mcs.anl.gov>> wrote:
>>
>>
>>    On 05/01/2012 12:39 PM, Albert Spade wrote:
>>
>>        [root at beowulf ~]# mpiexec -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi
>>
>>        Process 0 of 4 is on beowulf.master
>>        Process 3 of 4 is on beowulf.master
>>        Process 1 of 4 is on beowulf.master
>>        Process 2 of 4 is on beowulf.master
>>        Fatal error in PMPI_Reduce: Other MPI error, error stack:
>>        PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbff0fd08, rbuf=0xbff0fd00, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
>>        MPIR_Reduce_impl(1087)..........:
>>        MPIR_Reduce_intra(895)..........:
>>        MPIR_Reduce_binomial(144).......:
>>        MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 2
>>        MPIR_Reduce_binomial(144).......:
>>        MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
>>
>>        ^CCtrl-C caught... cleaning up processes
>>
>>
>>    In your previous email you said that your host file contains this:
>>
>>
>>    beowulf.master
>>    beowulf.node1
>>    beowulf.node2
>>    beowulf.node3
>>    beowulf.node4
>>
>>    The above output does not match this.  Process 1 should be scheduled
>>    on node1.  So something is not correct here.  Are you sure the
>>    information you gave us is right?
>>
>>
>>    --
>>    Pavan Balaji
>>    http://www.mcs.anl.gov/~balaji <http://www.mcs.anl.gov/%7Ebalaji>
>>
>>
>>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>