[mpich-discuss] Problem while running example program Cpi with more than 1 task

Pavan Balaji balaji at mcs.anl.gov
Wed Sep 8 11:47:16 CDT 2010


Thejna,

From the output, it looks like all the processes finalized correctly but 
aborted afterwards. It also looks like you have gone back to the 
multi-node case; the single-node case was failing too, and it is 
easier to debug. What is the strange output you see with the -verbose 
option? The output looks fine to me.

Thanks for trying out ch3:sock instead of the default ch3:nemesis; I was 
about to ask you to try that next.

Can you go back to ch3:nemesis (the default) and 1.3b1, and try running the 
application with the environment variable MPICH_NO_LOCAL set to 1? Let's 
just use a single node for the time being:

% mpiexec -n 7 -env MPICH_NO_LOCAL=1 ./cpi
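If it is more convenient, the variable can also be set in the environment of the launcher itself. The following is only a rough sketch, not a tested recipe: it assumes mpiexec forwards its own environment to the launched processes, that ./cpi is built in the current directory, and the helper name run_no_local is made up for illustration.

```shell
#!/bin/sh
# Sketch: run cpi on a single node with MPICH_NO_LOCAL=1.
# Assumptions: mpiexec is on PATH, ./cpi exists, and the launcher
# passes its environment through to the MPI processes.

# Hypothetical helper: run one command with MPICH_NO_LOCAL=1 set only
# for that command, leaving the calling shell's environment untouched.
run_no_local() {
    env MPICH_NO_LOCAL=1 "$@"
}

if command -v mpiexec >/dev/null 2>&1 && [ -x ./cpi ]; then
    run_no_local mpiexec -n 7 ./cpi
    # Print the launcher's exit status so it can be included in a reply.
    echo "mpiexec exit status: $?"
else
    echo "mpiexec or ./cpi not found; skipping run"
fi
```

Using env for a single command keeps the setting from leaking into later runs, which matters when switching back and forth between configurations.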

  -- Pavan

On 09/08/2010 09:48 AM, Thejna Tharammal wrote:
>   Hi Pavan,
> Thank you for the reply,
> I ran them from k1 itself.
> I have now gone back one step and configured 1.2.1p1 and 1.3b1 with the
> --with-device=ch3:sock option; with that, cpi shows no errors (I
> used hydra for both).
> I am attaching the result files (6 hosts, 48 processes).
> But when I use the -verbose option I see some strange messages.
> I used:
> mpiexec -n 48 ./cpi&
> mpiexec -verbose -n 48 ./cpi
> Thanks,
> Thejna
> ----------------original message-----------------
> From: "Pavan Balaji" balaji at mcs.anl.gov
> To: "Thejna Tharammal" ttharammal at marum.de
> CC: mpich-discuss at mcs.anl.gov
> Date: Tue, 07 Sep 2010 20:33:00 -0500
> -------------------------------------------------
>
>
>>
>> Sorry for the delay in getting back to this.
>>
>> On 09/03/2010 07:43 AM, Thejna Tharammal wrote:
>>> Ok, I tried that,
>>>
>>> No.of hosts 1:
>>> -bash-3.2$ mpiexec -n 7 ./cpi
>>> Process 1 of 7 is on k1
>>> Process 4 of 7 is on k1
>>> Process 5 of 7 is on k1
>>> Process 2 of 7 is on k1
>>> Process 6 of 7 is on k1
>>> Process 0 of 7 is on k1
>>> Process 3 of 7 is on k1
>>> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
>>> wall clock time = 0.000198
>>> APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
>>
>> It looks like even one node is having problems. Are you executing
>> mpiexec from k1? Can you try executing it from k1?
>>
>> Thanks,
>>
>> -- Pavan
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

