[mpich-discuss] Error with more than 5 nodes

Pavan Balaji balaji at mcs.anl.gov
Wed Apr 21 08:48:57 CDT 2010


Did you try using Hydra (mpiexec.hydra)?

% mpiexec.hydra -f hostfile -n 10 ./foo

If there's a real networking setup issue, even Hydra will not be able to 
help, but it might give you a better hint at what the problem is.
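
Before launching, it can help to confirm that every host in the hostfile accepts a non-interactive ssh login from the launching node, since Hydra's default launcher is ssh. The following is only a sketch, not part of MPICH2: list_hosts, check_hosts and the SSH_PROBE override are hypothetical names introduced here, and the helper assumes one host per line with an optional ":N" process count and "#" comments. It checks reachability from the launching node only and says nothing about the node-to-node connections that mpd needs.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with MPICH2).

# Print the host names in a hostfile, one per line.
# Drops '#' comments and blank lines, keeps the first field,
# and strips an optional ":N" process count.
list_hosts() {
    awk '{ sub(/#.*/, ""); if (NF) { split($1, a, ":"); print a[1] } }' "$1"
}

# Try a non-interactive ssh to each host and report the ones that fail.
# SSH_PROBE is an assumed override hook so the probe command can be
# replaced (for example when testing); it defaults to plain ssh.
check_hosts() {
    for host in $(list_hosts "$1"); do
        ${SSH_PROBE:-ssh -o BatchMode=yes -o ConnectTimeout=5} "$host" true \
            </dev/null >/dev/null 2>&1 || echo "unreachable: $host"
    done
}
```

Running "check_hosts hostfile" prints one "unreachable: <host>" line per failing host and nothing when all hosts respond; after that, the mpiexec.hydra command above can be run as usual.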

  -- Pavan

On 04/21/2010 08:42 AM, Prashantha Hebbar wrote:
> Hi Pavan,
> 
> ssh works between mlscubl3-desktop and mlscubl11-desktop in both
> directions, and mpdcheck works fine on both systems. Even with
> mlscubl11-desktop alone in mpd.hosts, mpdboot -n 2 -f mpd.hosts works fine.
> 
> The problem appears only when I try to connect more than 5 nodes. It is
> not specific to mlscubl11-desktop: whichever node I place on the 5th line
> of mpd.hosts produces the same error.
> 
> Could a concurrent-login limit for the user
> mpiub at mlscubl3-desktop (Ubuntu Linux) be causing the problem, or is it
> something else?
> 
> I have also edited /etc/security/limits.conf and made the following changes:
> 
> @mpiub - maxlogins 20
> @mpiub - maxsyslogins 20
> 
> But the error is unchanged.
> 
> Thanks for the help.
> 
> Regards,
> Prashantha
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji
> Sent: Wednesday, April 21, 2010 6:33 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Error with more than 5 nodes
> 
> 
> This typically points to a networking setup issue on your system. For 
> example, is mlscubl11 able to connect/ssh to mlscubl3? You can use the 
> mpdcheck utility that comes with mpich2 to check for such issues.
> 
> Alternatively, you can just use the Hydra process manager (using 
> mpiexec.hydra instead of mpiexec). Usage instructions for Hydra are 
> available here: 
> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
> 
> I've added a FAQ entry for this, with some more details: 
> http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_Why_is_mpdboot_failing_with_a_erroneous_handshake.3F
> 
>   -- Pavan
> 
> On 04/21/2010 07:53 AM, Prashantha Hebbar wrote:
>> Hi Friends,
>>
>> I have come across an error with the command mpdboot -n 6 -f mpd.hosts.
>> The error message is:
>>
>> mpdboot_mlscubl3-desktop (handle_mpd_output 407): failed to handshake 
>> with mpd on mlscubl11-desktop; recvd output={}
>>
>> With mpdboot -n 5 -f mpd.hosts everything works fine. To check whether
>> the node mlscubl11-desktop itself is at fault, I kept it alone in
>> mpd.hosts and ran mpdboot -n 2 -f mpd.hosts; that works fine. I have 15
>> nodes, with mlscubl3-desktop as the master, and there is no ssh problem.
>> I need to cluster all 15. I have tried all combinations of nodes, but
>> mpdboot does not work with more than 5 nodes.
>>
>> Is this a limitation of MPICH2 or of Linux (Ubuntu)? Can you please
>> help me solve this?
>>
>> Thanking you in anticipation.
>>
>> Regards,
>>
>> Prashantha
>>
>>
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

