[mpich-discuss] Error with more than 5 nodes
Pavan Balaji
balaji at mcs.anl.gov
Wed Apr 21 08:48:57 CDT 2010
Did you try using Hydra (mpiexec.hydra)?
% mpiexec.hydra -f hostfile -n 10 ./foo
If there's a real networking setup issue, even Hydra will not be able to
help, but it might give you a better hint at what the problem is.
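One way to rule out a basic connectivity problem before trying Hydra is to confirm that every host in the hostfile accepts a non-interactive ssh login. The following is only a rough sketch, not part of MPICH2: the helper names list_hosts and check_hosts are hypothetical, "hostfile" and "./foo" are the placeholders from the command above, and the sketch assumes the hostfile has one host per line with an optional ":<nprocs>" suffix and "#" comments.

```shell
#!/bin/sh
# Sketch only: list_hosts and check_hosts are hypothetical helper names.

# Print the host names in a hostfile, skipping blank lines and "#" comments
# and dropping an optional ":<nprocs>" suffix (assumed hostfile layout).
list_hosts() {
    awk '{ sub(/#.*/, "") } NF { split($1, f, ":"); print f[1] }' "$1"
}

# Try a non-interactive login to each host and report the result.
# Returns non-zero if at least one host could not be reached.
check_hosts() {
    status=0
    for host in $(list_hosts "$1"); do
        # BatchMode makes ssh fail instead of prompting for a password;
        # ConnectTimeout keeps an unreachable host from hanging the loop.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok:     $host"
        else
            echo "FAILED: $host"
            status=1
        fi
    done
    return "$status"
}

# Example use (not executed here):
#   check_hosts hostfile && mpiexec.hydra -f hostfile -n 10 ./foo
```

This only tests logins from the machine where it is run. Running the same check from the other nodes as well would show whether they can reach each other and not just the master.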
-- Pavan
On 04/21/2010 08:42 AM, Prashantha Hebbar wrote:
> Hi Pavan,
>
> There is no problem with ssh to mlscubl11-desktop from mlscubl3-desktop or
> vice versa. mpdcheck works fine on both systems. Even if I keep
> mlscubl11-desktop alone in mpd.hosts, mpdboot -n 2 -f mpd.hosts works fine.
>
> I see this problem when I try to connect more than 5 nodes. It is not
> only mlscubl11-desktop that gives this error. If I place any node on the 5th
> line of mpd.hosts, the same error occurs.
>
> My doubt is whether a concurrent login limit for the user
> mpiub at mlscubl3-desktop (Ubuntu Linux) is creating the problem, or
> something else?
>
> I have also edited /etc/security/limits.conf and made the following changes:
>
> @mpiub - maxlogins 20
> @mpiub - maxsyslogins 20
>
> But there was no change in the error.
>
> Thanks for the help.
>
> Regards,
> Prashantha
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji
> Sent: Wednesday, April 21, 2010 6:33 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Error with more than 5 nodes
>
>
> This typically points to a networking setup issue on your system. For
> example, is mlscubl11 able to connect/ssh to mlscubl3? You can use the
> mpdcheck utility that comes with mpich2 to check for such issues.
>
> Alternatively, you can just use the Hydra process manager (using
> mpiexec.hydra instead of mpiexec). Usage instructions for Hydra are
> available here:
> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
>
> I've added a FAQ entry for this, with some more details:
> http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_Why_is_mpdboot_failing_with_a_erroneous_handshake.3F
>
> -- Pavan
>
> On 04/21/2010 07:53 AM, Prashantha Hebbar wrote:
>> Hi Friends,
>>
>>
>>
>> I have come across an error with the mpdboot -n 6 -f mpd.hosts command.
>> The error message is as follows:
>>
>>
>>
>> mpdboot_mlscubl3-desktop (handle_mpd_output 407): failed to handshake
>> with mpd on mlscubl11-desktop; recvd output={}
>>
>>
>>
>> With mpdboot -n 5 -f mpd.hosts everything works fine. I also checked
>> whether the problem is with the node mlscubl11-desktop by keeping
>> mlscubl11-desktop alone in mpd.hosts and executing mpdboot -n 2 -f
>> mpd.hosts. That works fine. I have 15 nodes, of which
>> mlscubl3-desktop is the master. There is no ssh problem. I need to
>> cluster all 15. I tried all combinations of nodes, but mpdboot
>> with more than 5 nodes does not work properly.
>>
>>
>>
>> Now my doubt is whether this is a limitation of MPICH2 or of Linux (Ubuntu).
>> Can you please help me to solve this?
>>
>>
>>
>> Thanking you in anticipation.
>>
>> Regards,
>>
>> Prashantha
>>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji