[mpich-discuss] Problem while running example program cpi with more than 1 task

Pavan Balaji balaji at mcs.anl.gov
Thu Sep 9 11:08:47 CDT 2010


This looks like a shared memory issue.

Darius: can you look into this?

  -- Pavan

On 09/09/2010 10:54 AM, Thejna Tharammal wrote:
>   Hi Pavan,
> These are the results of the test you suggested (1.3b1, nemesis, hydra);
> the second run is without the MPICH_NO_LOCAL environment variable (it gives
> the same error for more than 1 task on one node).
> -bash-3.2$ mpiexec -n 7 -env MPICH_NO_LOCAL=1 ./cpi
> Process 0 of 7 is on k1
> Process 1 of 7 is on k1
> Process 2 of 7 is on k1
> Process 3 of 7 is on k1
> Process 5 of 7 is on k1
> Process 6 of 7 is on k1
> Process 4 of 7 is on k1
> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> wall clock time = 0.001221
>
> -bash-3.2$ mpiexec -n 7 ./cpi
> Process 0 of 7 is on k1
> Process 1 of 7 is on k1
> Process 4 of 7 is on k1
> Process 5 of 7 is on k1
> Process 6 of 7 is on k1
> Process 3 of 7 is on k1
> Process 2 of 7 is on k1
> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> wall clock time = 0.000221
> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>
> Thanks,
> Thejna
>
> ----------------original message-----------------
> From: "Pavan Balaji" balaji at mcs.anl.gov
> To: "Thejna Tharammal" ttharammal at marum.de
> CC: mpich-discuss at mcs.anl.gov
> Date: Wed, 08 Sep 2010 11:47:16 -0500
> -------------------------------------------------
>
>
>> Thejna,
>>
>>  From the output it looks like all the processes finalized fine, but
>> aborted after that. Also, it looks like you have again gone back to the
>> multi-node case from the single-node case, which was also failing and is
>> easier to debug. What's the strange output you see with the -verbose
>> option? The output seems fine to me.
>>
>> Thanks for trying out ch3:sock instead of the default ch3:nemesis; I was
>> about to ask you to try that next.
>>
>> Can you go back to ch3:nemesis (the default) and 1.3b1, and try to run the
>> application with the environment variable MPICH_NO_LOCAL set to 1? Let's
>> just use a single node for the time being:
>>
>> % mpiexec -n 7 -env MPICH_NO_LOCAL=1 ./cpi
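>>
>> (As background: MPICH_NO_LOCAL=1 tells nemesis to skip the shared-memory
>> path and use the network module even between processes on the same node,
>> so this run takes the shared-memory code out of the picture.)
>>
>> If passing it through -env is awkward, exporting it in the shell should
>> work too, assuming mpiexec picks up the inherited environment:
>>
>> % export MPICH_NO_LOCAL=1
>> % mpiexec -n 7 ./cpi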
>>
>> -- Pavan
>>
>> On 09/08/2010 09:48 AM, Thejna Tharammal wrote:
>>> Hi Pavan,
>>> Thank you for the reply.
>>> I ran them from k1 itself.
>>> I then went back one step and configured both 1.2.1p1 and 1.3b1 with the
>>> --with-device=ch3:sock option, and now no errors show up with cpi (I used
>>> hydra for both).
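>>> For reference, the configure step was roughly along these lines (the
>>> install prefix and PATH line are just placeholders, and I am assuming
>>> --with-pm=hydra is the right way to select hydra here):
>>>
>>> ./configure --prefix=$HOME/mpich-sock \
>>>             --with-device=ch3:sock \
>>>             --with-pm=hydra
>>> make
>>> make install
>>> export PATH=$HOME/mpich-sock/bin:$PATH
>>>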
>>> I am attaching the result files (6 hosts, 48 processes).
>>> But when I use the -verbose option I see some strange messages. I used:
>>> mpiexec -n 48 ./cpi &
>>> mpiexec -verbose -n 48 ./cpi
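>>>
>>> If it is useful, I can capture the -verbose output to a file and attach
>>> it as well, assuming a bash shell:
>>>
>>> mpiexec -verbose -n 48 ./cpi > verbose.log 2>&1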
>>> Thanks,
>>> Thejna
>>> ----------------original message-----------------
>>> From: "Pavan Balaji" balaji at mcs.anl.gov
>>> To: "Thejna Tharammal" ttharammal at marum.de
>>> CC: mpich-discuss at mcs.anl.gov
>>> Date: Tue, 07 Sep 2010 20:33:00 -0500
>>> -------------------------------------------------
>>>
>>>
>>>>
>>>> Sorry for the delay in getting back to this.
>>>>
>>>> On 09/03/2010 07:43 AM, Thejna Tharammal wrote:
>>>>> Ok, I tried that,
>>>>>
>>>>> No. of hosts: 1
>>>>> -bash-3.2$ mpiexec -n 7 ./cpi
>>>>> Process 1 of 7 is on k1
>>>>> Process 4 of 7 is on k1
>>>>> Process 5 of 7 is on k1
>>>>> Process 2 of 7 is on k1
>>>>> Process 6 of 7 is on k1
>>>>> Process 0 of 7 is on k1
>>>>> Process 3 of 7 is on k1
>>>>> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
>>>>> wall clock time = 0.000198
>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>>>>
>>>> It looks like even a single node is having problems. Are you executing
>>>> mpiexec from k1? If not, can you try executing it from k1?
>>>>
>>>> Thanks,
>>>>
>>>> -- Pavan
>>>>
>>>> --
>>>> Pavan Balaji
>>>> http://www.mcs.anl.gov/~balaji
>>>>
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

