[mpich-discuss] Error with more than 5 nodes

Pavan Balaji balaji at mcs.anl.gov
Wed Apr 21 08:02:37 CDT 2010


This typically points to a networking setup issue on your system. For 
example, is mlscubl11 able to connect/ssh to mlscubl3? You can use the 
mpdcheck utility that comes with mpich2 to check for such issues.
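
As a rough way to test connectivity between every pair of nodes by hand, 
something like the sketch below could be used. It is not part of mpich2; 
the function name check_pairwise_ssh and the SSH_CMD override are made up 
for illustration. It assumes passwordless ssh is expected between all 
nodes and that the hosts file lists one host per line (an optional 
":ncpus" suffix and lines starting with "#" are ignored).

```shell
# Hypothetical helper, not an mpich2 tool: check that every host in a
# hosts file can ssh to every other host.
# SSH_CMD may be overridden; BatchMode makes ssh fail instead of prompting.
check_pairwise_ssh() {
    hosts_file=$1
    ssh_cmd=${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5}
    # First field of each non-empty, non-comment line, minus any ":ncpus".
    hosts=$(awk 'NF && $1 !~ /^#/ { sub(/:.*/, "", $1); print $1 }' "$hosts_file")
    failures=0
    for src in $hosts; do
        for dst in $hosts; do
            [ "$src" = "$dst" ] && continue
            # Log in to src, then from src try to reach dst.
            if $ssh_cmd "$src" "ssh -o BatchMode=yes -o ConnectTimeout=5 $dst true" \
                </dev/null >/dev/null 2>&1; then
                echo "ok   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
                failures=$((failures + 1))
            fi
        done
    done
    # Succeed only if every pair connected.
    [ "$failures" -eq 0 ]
}
```

Running "check_pairwise_ssh mpd.hosts" from the master prints one line per 
ordered pair of hosts; any FAIL line names a pair worth looking at first.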

Alternatively, you can just use the Hydra process manager (using 
mpiexec.hydra instead of mpiexec). Usage instructions for Hydra are 
available here: 
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
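
For a quick test with Hydra, a small wrapper along these lines may help. 
This is only a sketch: the function name hydra_smoke_test and the MPIEXEC 
override are made up for illustration, and it assumes the hosts file 
lists one hostname per line. It runs "hostname" on the requested number 
of processes, so each node that takes part prints its own name.

```shell
# Hypothetical wrapper: launch a trivial command through Hydra to see
# whether all nodes in the hosts file can be reached.
hydra_smoke_test() {
    hosts_file=$1
    nprocs=$2
    # MPIEXEC may be overridden; by default use the Hydra launcher.
    launcher=${MPIEXEC:-mpiexec.hydra}
    if ! command -v "$launcher" >/dev/null 2>&1; then
        echo "$launcher not found in PATH" >&2
        return 127
    fi
    # -f names the host file, -n the number of processes to start.
    "$launcher" -f "$hosts_file" -n "$nprocs" hostname
}
```

For example, "hydra_smoke_test mpd.hosts 6" tries the six-process case 
that fails with mpdboot. No mpd ring has to be started first.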

I've added a FAQ entry for this, with some more details: 
http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_Why_is_mpdboot_failing_with_a_erroneous_handshake.3F

  -- Pavan

On 04/21/2010 07:53 AM, Prashantha Hebbar wrote:
> Hi Friends,
> 
> I have come across an error with the mpdboot -n 6 -f mpd.hosts command.
> The error message is:
> 
> mpdboot_mlscubl3-desktop (handle_mpd_output 407): failed to handshake 
> with mpd on mlscubl11-desktop; recvd output={}
> 
> With mpdboot -n 5 -f mpd.hosts everything works fine. To check whether
> the problem is with the node mlscubl11-desktop, I put mlscubl11-desktop
> alone in mpd.hosts and ran mpdboot -n 2 -f mpd.hosts. That works fine.
> I have 15 nodes, of which mlscubl3-desktop is the master. There is no
> ssh problem. I need to cluster all 15. I tried every combination of
> nodes, but mpdboot with more than 5 nodes does not work properly.
> 
> My question is whether this is a limitation of MPICH2 or of Linux
> (Ubuntu). Can you please help me solve this?
> 
> Thanking you in anticipation.
> 
> Regards,
> 
> Prashantha
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

