[mpich-discuss] Problem while running example program Cpi with more than 1 task

Pavan Balaji balaji at mcs.anl.gov
Thu Sep 2 17:13:24 CDT 2010


Are you sure you are using 1.3b1? With Hydra, if you just give "mpiexec 
-n 7" and do not specify a hostfile, all processes will run on the local 
node and will not use k1, k2, k3, etc.
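
For reference, with Hydra you can pass a hostfile with -f (a minimal 
sketch, assuming the host names k1-k6 from your output; the file name 
"hosts" is just a placeholder):

$ cat hosts
k1
k2
k3
k4
k5
k6
$ mpiexec -f hosts -n 7 ./cpi

With 7 ranks on 6 hosts, two ranks will still share a node, so the 
shared-memory path gets exercised either way.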

  -- Pavan

On 09/02/2010 04:47 PM, Thejna Tharammal wrote:
>   Hi,
> I tried with 1.3b1;
> up to 6 tasks it works fine, like this:
> =================
> -bash-3.2$ mpiexec -n 6 ./cpi
> Process 0 of 6 is on k1
> Process 3 of 6 is on k4
> Process 1 of 6 is on k2
> Process 2 of 6 is on k3
> Process 4 of 6 is on k5
> Process 5 of 6 is on k6
> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> wall clock time = 0.001316
> =======================================
>
>   but from 7 tasks onward it shows this (I have attached a detailed error log):
>
>
> -bash-3.2$ mpiexec -n 7 ./cpi
> Process 0 of 7 is on k1
> Process 6 of 7 is on k1
> Process 1 of 7 is on k2
> Process 2 of 7 is on k3
> Process 3 of 7 is on k4
> Process 4 of 7 is on k5
> Process 5 of 7 is on k6
> pi is approximately 3.1415926544231234, Error is 0.0000000008333303
> wall clock time = 0.002738
> Fatal error in MPI_Finalize: Other MPI error, error stack:
> MPI_Finalize(302).................: MPI_Finalize failed
> MPI_Finalize(210).................:
> MPID_Finalize(110)................:
> MPIDI_CH3U_VC_WaitForClose(343)...: an error occurred while the device was
> waiting for all open connections to close
> MPIDI_CH3I_Progress(184)..........:
> MPID_nem_mpich2_blocking_recv(895):
> MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:
> Fatal error in MPI_Finalize: Other MPI error, error stack:
> ....
>
>   MPID_nem_tcp_connpoll(1746).......: Communication error with rank 0:
> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> =====================================
> Thanks,
> Thejna
>
> ----------------original message-----------------
> From: "Pavan Balaji" balaji at mcs.anl.gov
> To: mpich-discuss at mcs.anl.gov
> Date: Thu, 02 Sep 2010 11:17:06 -0500
> -------------------------------------------------
>
>
>>
>> On 09/02/2010 11:16 AM, Pavan Balaji wrote:
>>>
>>> The difference in the working case and the failing cases is the use of
>>> shared memory -- the working case has no shared memory usage since you
>>> don't have multiple processes on the same node.
>>>
>>> Can you try out the 1.3b1 release of Hydra before we go digging into
>>> this?
>>
>> 1.3b1 release of MPICH2, not Hydra, sorry.
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
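
To check the shared-memory theory from the message quoted above, you can 
reproduce the failing path with just two ranks by placing both on one 
host (a sketch assuming Hydra's -hosts option; adjust the host names to 
your machines):

$ mpiexec -hosts k1 -n 2 ./cpi       # both ranks on k1 -> nemesis shared memory
$ mpiexec -hosts k1,k2 -n 2 ./cpi    # one rank per host -> TCP only

If only the first command hits the MPI_Finalize error, that points at the 
shared-memory code path rather than TCP.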

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

