[mpich-discuss] Problem while running example program Cpi with more than 1 task
Pavan Balaji
balaji at mcs.anl.gov
Thu Sep 2 17:13:50 CDT 2010
Hydra is the default process manager in 1.3b1, btw.
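
For reference, the placement in the quoted output below is consistent with ranks being handed out to six hosts cyclically, one per host per pass. The following is only a minimal sketch of that assumption, not MPICH2 or Hydra code; the function names are hypothetical, and the host names k1..k6 are taken from the quoted output. It shows why 7 is the first task count at which two ranks share a node.

```python
from collections import defaultdict


def place_round_robin(num_procs, hosts):
    """Assign ranks to hosts cyclically (assumed policy, one rank per host per pass)."""
    placement = defaultdict(list)
    for rank in range(num_procs):
        placement[hosts[rank % len(hosts)]].append(rank)
    return dict(placement)


def shared_nodes(placement):
    """Return only the hosts that carry more than one rank."""
    return {host: ranks for host, ranks in placement.items() if len(ranks) > 1}


hosts = [f"k{i}" for i in range(1, 7)]  # k1..k6, as in the quoted output

for n in (6, 7):
    print(n, shared_nodes(place_round_robin(n, hosts)))
# prints:
# 6 {}
# 7 {'k1': [0, 6]}
```

With 6 tasks no host carries two ranks; with 7, ranks 0 and 6 both land on k1, which matches the "Process 0 of 7 is on k1" and "Process 6 of 7 is on k1" lines in the failing run.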
On 09/02/2010 05:13 PM, Pavan Balaji wrote:
>
> Are you sure you are using 1.3b1? With Hydra, if you just give "mpiexec
> -n 7" and do not specify a hostfile, all processes run on the
> local node, and k1, k2, k3, etc. are not used.
>
> -- Pavan
>
> On 09/02/2010 04:47 PM, Thejna Tharammal wrote:
>> Hi,
>> I tried with 1.3b1.
>> Up to 6 tasks it works fine, like this:
>> =================
>> -bash-3.2$ mpiexec -n 6 ./cpi
>> Process 0 of 6 is on k1
>> Process 3 of 6 is on k4
>> Process 1 of 6 is on k2
>> Process 2 of 6 is on k3
>> Process 4 of 6 is on k5
>> Process 5 of 6 is on k6
>> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
>> wall clock time = 0.001316
>> =======================================
>>
>> but from 7 tasks onward it shows the following (I have attached a detailed error log):
>>
>>
>> -bash-3.2$ mpiexec -n 7 ./cpi
>> Process 0 of 7 is on k1
>> Process 6 of 7 is on k1
>> Process 1 of 7 is on k2
>> Process 2 of 7 is on k3
>> Process 3 of 7 is on k4
>> Process 4 of 7 is on k5
>> Process 5 of 7 is on k6
>> pi is approximately 3.1415926544231234, Error is 0.0000000008333303
>> wall clock time = 0.002738
>> Fatal error in MPI_Finalize: Other MPI error, error stack:
>> MPI_Finalize(302).................: MPI_Finalize failed
>> MPI_Finalize(210).................:
>> MPID_Finalize(110)................:
>> MPIDI_CH3U_VC_WaitForClose(343)...: an error occurred while the device was
>> waiting for all open connections to close
>> MPIDI_CH3I_Progress(184)..........:
>> MPID_nem_mpich2_blocking_recv(895):
>> MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:
>> Fatal error in MPI_Finalize: Other MPI error, error stack:
>> ....
>>
>> MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:
>> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>> =====================================
>> Thanks,
>> Thejna
>>
>> ----------------original message-----------------
>> From: "Pavan Balaji" balaji at mcs.anl.gov
>> To: mpich-discuss at mcs.anl.gov
>> Date: Thu, 02 Sep 2010 11:17:06 -0500
>> -------------------------------------------------
>>
>>
>>>
>>> On 09/02/2010 11:16 AM, Pavan Balaji wrote:
>>>>
>>>> The difference between the working case and the failing cases is the use of
>>>> shared memory -- the working case has no shared memory usage since you
>>>> don't have multiple processes on the same node.
>>>>
>>>> Can you try out the 1.3b1 release of Hydra before we go digging into
>>>> this?
>>>
>>> 1.3b1 release of MPICH2, not Hydra, sorry.
>>>
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji