[mpich-discuss] Not able to run MPI program in parallel...
Albert Spade
albert.spade at gmail.com
Tue May 1 13:42:11 CDT 2012
And here is the output of hostname on beowulf.node1
[root@beowulf ~]# ssh 172.16.20.32
Last login: Wed May 2 05:33:27 2012 from beowulf.master
[root@beowulf ~]# hostname
beowulf.node1
[root@beowulf ~]#
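
For context, below is a minimal sketch of a cpi-style pi program. This is an assumed equivalent, not the exact source shipped under /opt/mpich2-1.4.1p1/examples, but it uses the same pattern (print the host each rank runs on, broadcast the interval count, reduce the partial sums to rank 0) that shows up in the error stack quoted further down.

    /* cpi_sketch.c -- assumed minimal approximation of MPICH's examples/cpi,
     * not the exact shipped source. Each rank reports which host it runs on,
     * then all ranks reduce their partial sums to rank 0. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int n = 10000, rank, size, namelen, i;
        double h, sum = 0.0, x, mypi, pi;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &namelen);
        printf("Process %d of %d is on %s\n", rank, size, name);

        /* Rank 0 broadcasts the interval count to every rank. */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* Midpoint-rule integration of 4/(1+x^2) over [0,1]. */
        h = 1.0 / (double)n;
        for (i = rank + 1; i <= n; i += size) {
            x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* This is the collective that fails in the quoted trace when the
         * remote ranks cannot communicate back to rank 0. */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %.16f\n", pi);
        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and launched with the same "mpiexec -f hosts -n 4" line, it exercises the same cross-node communication that the error stack below shows failing.
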
On Tue, May 1, 2012 at 11:57 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> Can you run "hostname" on beowulf.node1 and see what it returns? Also,
> can you send us the output of:
>
> mpiexec -verbose -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/cpi
>
> -- Pavan
>
>
> On 05/01/2012 01:25 PM, Albert Spade wrote:
>
>> Yes I am sure.
>> I created one file by name hosts in /root and its contents are
>> beowulf.master
>> beowulf.node1
>> beowulf.node2
>> beowulf.node3
>> beowulf.node4
>>
>> I have one more file in /etc by name hosts and its contents are:
>>
>> [root@beowulf ~]# cat /etc/hosts
>> 127.0.0.1     localhost localhost.localdomain localhost4 localhost4.localdomain4
>> ::1           localhost localhost.localdomain localhost6 localhost6.localdomain6
>> 172.16.20.31 beowulf.master
>> 172.16.20.32 beowulf.node1
>> 172.16.20.33 beowulf.node2
>> 172.16.20.34 beowulf.node3
>> 172.16.20.35 beowulf.node4
>> [root@beowulf ~]#
>>
>> On Tue, May 1, 2012 at 11:15 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>
>>
>> On 05/01/2012 12:39 PM, Albert Spade wrote:
>>
>> [root@beowulf ~]# mpiexec -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/cpi
>>
>> Process 0 of 4 is on beowulf.master
>> Process 3 of 4 is on beowulf.master
>> Process 1 of 4 is on beowulf.master
>> Process 2 of 4 is on beowulf.master
>> Fatal error in PMPI_Reduce: Other MPI error, error stack:
>> PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbff0fd08, rbuf=0xbff0fd00, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
>> MPIR_Reduce_impl(1087)..........:
>> MPIR_Reduce_intra(895)..........:
>> MPIR_Reduce_binomial(144).......:
>> MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 2
>> MPIR_Reduce_binomial(144).......:
>> MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
>>
>> ^CCtrl-C caught... cleaning up processes
>>
>>
>> In your previous email you said that your host file contains this:
>>
>>
>> beowulf.master
>> beowulf.node1
>> beowulf.node2
>> beowulf.node3
>> beowulf.node4
>>
>> The above output does not match this. Process 1 should be scheduled
>> on node1. So something is not correct here. Are you sure the
>> information you gave us is right?
>>
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>
>>
>>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>