[mpich-discuss] Error with more than 5 nodes

Prashantha Hebbar prashantha.hebbar at manipal.edu
Wed Apr 21 08:42:18 CDT 2010


Hi Pavan,

ssh between mlscubl11-desktop and mlscubl3-desktop works fine in both
directions, and mpdcheck between the two systems also works fine. Even with
only mlscubl11-desktop in mpd.hosts, mpdboot -n 2 -f mpd.hosts works fine.

The problem appears only when I try to connect more than 5 nodes. It is not
specific to mlscubl11-desktop: whichever node I place on the 5th line of
mpd.hosts produces the same error.
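The observations above (-n 5 works, -n 2 with only mlscubl11-desktop in the file works, and with -n 6 whichever host sits on line 5 fails) are consistent with mpdboot starting one mpd on the local host and taking the remaining n - 1 hosts from mpd.hosts in order. The sketch below assumes exactly that; `mpd_hosts_used` is a hypothetical helper, not part of MPICH2, and every host name except mlscubl3-desktop and mlscubl11-desktop is a placeholder.

```python
def mpd_hosts_used(n, hosts_file_lines, local_host):
    """Hosts on which `mpdboot -n n` would start an mpd, ASSUMING one mpd
    on the local host plus the first n - 1 non-empty lines of mpd.hosts."""
    hosts = [line.strip() for line in hosts_file_lines if line.strip()]
    if n - 1 > len(hosts):
        raise ValueError("mpd.hosts lists fewer than n - 1 hosts")
    return [local_host] + hosts[: n - 1]


# Abbreviated, hypothetical mpd.hosts: the node-* names are placeholders.
hosts_file = ["node-a", "node-b", "node-c", "node-d", "mlscubl11-desktop", "node-f"]

for n in (5, 6):
    used = mpd_hosts_used(n, hosts_file, local_host="mlscubl3-desktop")
    # The last entry is the n-th host in the list of mpd hosts.
    print(n, "->", used[-1])
```

Under that assumption this prints `5 -> node-d` and `6 -> mlscubl11-desktop`: the 6th mpd is whatever sits on line 5 of mpd.hosts, which matches the behaviour described in the thread.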

Could a concurrent-login limit for the user mpiub at mlscubl3-desktop
(Ubuntu Linux) be causing the problem, or is it something else?

I have also edited /etc/security/limits.conf with the following changes:

@mpiub - maxlogins 20
@mpiub - maxsyslogins 20

But the error is unchanged.
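For reference, each limits.conf entry has four fields: domain, type, item, value. A domain starting with @ names a group, so `@mpiub` applies to members of the group mpiub (on Ubuntu a user normally gets a private group of the same name), while a plain `mpiub` would name the user. The type `-` sets both the soft and the hard limit, and pam_limits applies these values at login, so they only affect new sessions. A minimal sketch that reads such entries, assuming a simplified grammar; `parse_limits` and `limits_for_group` are hypothetical helpers.

```python
def parse_limits(text):
    """Parse limits.conf-style text into (domain, type, item, value) tuples.
    Simplified grammar (an assumption): '#' starts a comment, and any line
    that is not exactly four whitespace-separated fields is ignored."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) == 4:
            entries.append(tuple(fields))
    return entries


def limits_for_group(entries, group):
    """Return {item: value} for entries whose domain is '@<group>'.
    A type of '-' means the value is both the soft and the hard limit."""
    return {item: value for domain, _type, item, value in entries
            if domain == "@" + group}


conf = """
# entries from the message above
@mpiub - maxlogins 20
@mpiub - maxsyslogins 20
"""
print(limits_for_group(parse_limits(conf), "mpiub"))
```

This only inspects the file; it cannot tell whether pam_limits is actually enabled for the ssh logins that mpdboot uses.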

Thanks for the help.

Regards,
Prashantha

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji
Sent: Wednesday, April 21, 2010 6:33 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Error with more than 5 nodes


This typically points to a networking setup issue on your system. For 
example, is mlscubl11 able to connect/ssh to mlscubl3? You can use the 
mpdcheck utility that comes with mpich2 to check for such issues.
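The question above is about reachability from each worker back to the master, not only from the master outward. A rough first check, assuming Python 3 is available on every node, is to run something like the following on each host: it resolves every name in mpd.hosts, prints the address it resolves to, and tries a TCP connection to the ssh port (22). It only shows name resolution and ssh-port reachability from the machine it runs on; it does not reproduce mpd's own handshake, which is what mpdcheck is for. The parsing rules (skip blanks and `#` comments, drop a `:ncpus` suffix) and the helper names are assumptions.

```python
import socket


def read_hosts(lines):
    """Host names from mpd.hosts-style lines (assumed format: one host per
    line, optional ':ncpus' suffix, '#' comments and blank lines ignored)."""
    hosts = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split(":", 1)[0])
    return hosts


def can_connect(host, port=22, timeout=3.0):
    """True if a TCP connection to host:port succeeds from THIS machine."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False


def check_hosts_file(path="mpd.hosts"):
    """Print resolution and port-22 reachability for every listed host."""
    me = socket.gethostname()
    with open(path) as f:
        for host in read_hosts(f):
            try:
                addr = socket.gethostbyname(host)
            except OSError:
                addr = "unresolved"
            status = "ok" if can_connect(host) else "FAILED"
            print(f"{me} -> {host} ({addr}) port 22: {status}")
```

Call `check_hosts_file("mpd.hosts")` on each node, including mlscubl11-desktop, with a copy of the file that lists the other nodes.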

Alternatively, you can just use the Hydra process manager (using 
mpiexec.hydra instead of mpiexec). Usage instructions for Hydra are 
available here: 
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
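Hydra does not need an mpd ring, so there is no mpdboot step: mpiexec.hydra launches the processes itself (over ssh by default), taking a host file with -f and a total process count with -n. A minimal sketch that writes a one-host-per-line file and builds that command; the file name `hosts`, the program `./cpi`, and the helper names are placeholders, and the usage page linked above is the authority on Hydra's options.

```python
import shutil
import subprocess


def write_hydra_hosts(path, hosts):
    """Write a plain host file: one host name per line."""
    with open(path, "w") as f:
        f.write("\n".join(hosts) + "\n")


def hydra_command(hostfile, nprocs, program):
    """Build an mpiexec.hydra command line using the -f and -n options."""
    return ["mpiexec.hydra", "-f", hostfile, "-n", str(nprocs), program]


def run_with_hydra(hosts, program, hostfile="hosts"):
    """Write the host file, then launch one process per listed host."""
    write_hydra_hosts(hostfile, hosts)
    cmd = hydra_command(hostfile, len(hosts), program)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("mpiexec.hydra is not on PATH")
    return subprocess.run(cmd, check=True)
```

For all 15 nodes this would run `mpiexec.hydra -f hosts -n 15 ./cpi` from the master.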

I've added a FAQ entry for this, with some more details: 
http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_Why_is_mpdboot_failing_with_a_erroneous_handshake.3F

  -- Pavan

On 04/21/2010 07:53 AM, Prashantha Hebbar wrote:
> Hi Friends,
> 
> I have come across an error with the mpdboot -n 6 -f mpd.hosts command.
> The error message is:
> 
> mpdboot_mlscubl3-desktop (handle_mpd_output 407): failed to handshake 
> with mpd on mlscubl11-desktop; recvd output={}
> 
> With mpdboot -n 5 -f mpd.hosts everything works fine. To check whether 
> the problem lies with the node mlscubl11-desktop, I kept 
> mlscubl11-desktop alone in mpd.hosts and ran mpdboot -n 2 -f 
> mpd.hosts. That works fine. I have 15 nodes, with mlscubl3-desktop 
> as the master. There is no ssh problem. I need to cluster all 15. 
> I tried all combinations of nodes, but mpdboot with more than 5 
> nodes does not work.
> 
> My question is whether this is a limitation of MPICH2 or of Linux (Ubuntu). 
> Can you please help me solve this?
> 
> Thanking you in anticipation.
> 
> Regards,
> 
> Prashantha
> 
> ------------------------------------------------------------------------
> This e-mail is privileged and confidential. If you are not the
> intended recipient please delete the message and notify the sender.
> Any views or opinions presented are solely those of the author.
> ------------------------------------------------------------------------
> 
> *Please do not print this email unless it is absolutely necessary. *
> 
> The information contained in this electronic message and any attachments 
> to this message are intended for the exclusive use of the addressee(s) 
> and may contain proprietary, confidential or privileged information. If 
> you are not the intended recipient, you should not disseminate, 
> distribute or copy this e-mail. Please notify the sender immediately and 
> destroy all copies of this message and any attachments.
> 
> *WARNING:* Computer viruses can be transmitted via email. The recipient 
> should check this email and any attachments for the presence of viruses. 
> The company accepts no liability for any damage caused by any virus 
> transmitted by this email.
> 
> www.manipal.edu <http://www.manipal.edu>
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji