[MPICH] mpdboot error : fail to ping

Ralph M. Butler rbutler at mtsu.edu
Sun Apr 9 18:55:10 CDT 2006


Yes, this might be a bit difficult to pick up on.
The install guide assumes either a shared file system
(e.g. via NFS) or that you copy the needed files to each machine.
I can reproduce this problem only by using different secretwords
on the 2 machines.  Perhaps you can simply copy the file from one
machine to the other to make sure that the 2 secretwords are identical.
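Why mismatched secretwords break the ring: the tracebacks further down show the joining mpd computing md5 over its secretword concatenated with a random number ('randnum') sent by the listener (connect_lhs), while the listener computes its own correctChallengeResponse. The following minimal Python 3 sketch assumes both sides hash exactly secretword + randnum, as in the traceback line. The helper names are hypothetical, and this is not mpd's code.

```python
import hashlib
import hmac
import secrets


def challenge_response(secretword: str, randnum: str) -> bytes:
    """md5(secretword + randnum), as in the connect_lhs traceback line."""
    # Python 3's hashlib needs bytes, so encode the concatenated string.
    return hashlib.md5("".join([secretword, randnum]).encode("utf-8")).digest()


def peer_accepted(listener_secret: str, joiner_secret: str) -> bool:
    """Simulate one ring-join handshake (hypothetical helper, not mpd code)."""
    # The listening mpd sends a random challenge and precomputes the answer
    # it expects, using its own secretword.
    randnum = str(secrets.randbelow(10**9))
    expected = challenge_response(listener_secret, randnum)
    # The joining mpd answers with *its* secretword.
    response = challenge_response(joiner_secret, randnum)
    # Constant-time comparison; a mismatch corresponds to mpd's
    # "INVALID response" / "NOT OK to enter ring" messages.
    return hmac.compare_digest(response, expected)


print(peer_accepted("asdfghjkl1", "asdfghjkl1"))   # True
print(peer_accepted("asdfghjkl1", "qwertyuiop1"))  # False
```

Both nodes must therefore hold the identical secretword. The test secrets quoted later in the thread (asdfghjkl1 vs. qwertyuiop1) are enough on their own to produce the rejection.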

> Date: Mon, 10 Apr 2006 05:47:35 +0700
> From: Misora Itsumo <mitsuru.adachi at gmail.com>
> To: Ralph Butler <rbutler at mtsu.edu>
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] mpdboot error : fail to ping
> 
> Ah, since I didn't know whether the secrets on both nodes must be the same,
> I had already tried making them identical, but it produced the same error.
>
> Thanks,
> Tiep.
>
> On 4/10/06, Misora Itsumo <mitsuru.adachi at gmail.com> wrote:
>>
>>
>> Yes, as Ralph said, the secretwords on both of my nodes were integers.
>> I changed my secret and the previous error no longer appears,
>> but sadly, it produced a new error.
>>
>> On node hewonty I did:
>> [hewonty at hewonty doc]$ mpd &
>> [1] 4293
>> [hewonty at hewonty doc]$ mpdtrace -l
>> hewonty.homelinux.org_32800 (192.168.2.2)
>> [hewonty at hewonty doc]$ hewonty.homelinux.org_32800(handle_rhs_challenge_response 788): INVALID response in rhs response
>> msg=:{'ifhn': '192.168.2.3', 'cmd': 'challenge_response', 'port': 32774,
>> 'response': 'X\xc5\x8f\x9ccfS\x8e\xaa\r\xde6$Y+\x81'}:
>>
>> On node vm1:
>> [hewonty at vm1 ~]$ mpd -h hewonty -p 32800
>> vm1_32774 (connect_lhs 635): NOT OK to enter ring; one likely cause:
>> mismatched secretwords
>> vm1_32774 (enter_ring 566): lhs connect failed
>> vm1_32774 (run 233): failed to enter ring
>>
>> Ah, for testing, my secret on hewonty is asdfghjkl1 and on vm1 is
>> qwertyuiop1.
>>
>> Thanks,
>> Tiep.
>>
>>
>> On 4/9/06, Ralph Butler <rbutler at mtsu.edu> wrote:
>>>
>>> This seems to be a new bug.  I do not want to ask for your secretword,
>>> but I will guess that it is an integer.  If so,
>>> please make it a non-integer.  It's OK to have digits in it, but
>>> the secretword must not consist entirely of digits.
>>> Let me know if this fixes the problem and I will fix it in mpd for
>>> the next release.
>>>
>>> Thanks.
>>> --ralph
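Ralph's guess matches the traceback: `''.join()` accepts only strings, so if the secretword is read from the configuration as an integer, the join in connect_lhs fails with "sequence item 0: expected string, int found" (Python 2 wording; Python 3 says "expected str instance, int found"). The following minimal Python 3 reproduction assumes that the config reader turns all-digit values into int. `parse_conf_value` is a hypothetical stand-in, not mpd's parser.

```python
import hashlib


def parse_conf_value(raw: str):
    """HYPOTHETICAL config reader that turns all-digit values into int.

    An assumption used to reproduce the symptom; mpd's real
    config-parsing code is not shown in this thread.
    """
    return int(raw) if raw.isdigit() else raw


def buggy_response(secretword, randnum: str) -> bytes:
    # Same expression as connect_lhs in the traceback (plus .encode()).
    return hashlib.md5("".join([secretword, randnum]).encode()).digest()


def fixed_response(secretword, randnum: str) -> bytes:
    # str() makes an all-digit secretword behave like any other string.
    return hashlib.md5("".join([str(secretword), randnum]).encode()).digest()


secret = parse_conf_value("123456")  # parsed as the int 123456
try:
    buggy_response(secret, "42")
except TypeError as exc:
    print(exc)  # sequence item 0: expected str instance, int found
print(fixed_response(secret, "42") == fixed_response("123456", "42"))  # True
```

Ralph's workaround (a secretword that is not all digits) avoids the int conversion entirely. The `str()` call is only one possible code-side fix, not necessarily the one that went into mpd.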
>>>
>>> On Apr 8, 2006, at 1:31 PM, Misora Itsumo wrote:
>>>
>>>> I have already tried mpdcheck.
>>>>
>>>> [hewonty at hewonty ~]$ mpdcheck -s
>>>> server listening at INADDR_ANY on: hewonty 32775
>>>> server has conn on <socket._socketobject object at 0xb7f7838c> from
>>>> ('192.168.2.3', 56366)
>>>> server successfully recvd msg from client: hello_from_client_to_server
>>>> [hewonty at vm1 ~]$ mpdcheck -c hewonty 32775
>>>> client successfully recvd ack from server: ack_from_server_to_client
>>>>
>>>> [hewonty at vm1 ~]$ mpdcheck -s
>>>> server listening at INADDR_ANY on: vm1 32771
>>>> server has conn on <socket._socketobject object at 0xb7f5920c> from
>>>> ('192.168.2.2', 33169)
>>>> server successfully recvd msg from client: hello_from_client_to_server
>>>> [hewonty at hewonty ~]$ mpdcheck -c vm1 32771
>>>> client successfully recvd ack from server: ack_from_server_to_client
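The mpdcheck runs above show that plain TCP works in both directions: a server listening on INADDR_ANY accepts a connection and receives hello_from_client_to_server, and the client gets ack_from_server_to_client back. The following minimal Python 3 imitation of that round trip runs both halves in one process over loopback. It is not mpdcheck's source; only the two message strings come from the log, and the function names are hypothetical.

```python
import socket
import threading

HELLO = b"hello_from_client_to_server"
ACK = b"ack_from_server_to_client"


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (or fewer if the peer closes early)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def serve_once(server: socket.socket, received: list) -> None:
    """Server half (like `mpdcheck -s`): accept one client, read hello, send ack."""
    conn, _addr = server.accept()
    with conn:
        received.append(recv_exact(conn, len(HELLO)))
        conn.sendall(ACK)


def check_pair(host: str = "127.0.0.1") -> bool:
    """Run server and client halves over loopback; True if the round trip works."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("", 0))    # INADDR_ANY, OS-chosen port
        server.listen(1)
        server.settimeout(5.0)  # don't block forever if no client arrives
        port = server.getsockname()[1]
        received: list = []
        t = threading.Thread(target=serve_once, args=(server, received), daemon=True)
        t.start()
        # Client half (like `mpdcheck -c host port`).
        with socket.create_connection((host, port), timeout=5.0) as client:
            client.sendall(HELLO)
            reply = recv_exact(client, len(ACK))
        t.join(timeout=5.0)
    return received == [HELLO] and reply == ACK


print(check_pair())  # True when loopback TCP works
```

Between two machines, the server half would run on one host and the client would connect from the other, as the mpdcheck -s/-c pair does. Since those checks passed here, basic networking is unlikely to be the culprit.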
>>>>
>>>> Next, I tried to run mpd by hand, but got the same error as in
>>>> my last post.
>>>>
>>>> [hewonty at vm1 ~]$ mpd &
>>>> [1] 2056
>>>> [hewonty at vm1 ~]$ mpdtrace -l
>>>> vm1_32772 (192.168.2.3)
>>>>
>>>>
>>>> [hewonty at hewonty ~]$ mpd -h vm1 -p 32772
>>>> hewonty_32846: mpd_uncaught_except_tb handling:
>>>>   exceptions.TypeError: sequence item 0: expected string, int found
>>>>     /usr/local/mpich2/bin/mpdlib.py  627  connect_lhs
>>>>         response = md5new(''.join([self.secretword,msg['randnum']])).digest()
>>>>     /usr/local/mpich2/bin/mpdlib.py  564  enter_ring
>>>>         numTries=ntries)
>>>>     /usr/local/mpich2/bin/mpd  231  run
>>>>         rhsHandler=self.handle_rhs_input)
>>>>     /usr/local/mpich2/bin/mpd  1344  ?
>>>>         mpd.run()
>>>>
>>>> If I run mpd on hewonty first, I get:
>>>> [hewonty at hewonty ~]$ mpd &
>>>> [1] 4051
>>>> [hewonty at hewonty ~]$ mpdtrace -l
>>>> hewonty_32781 (192.168.2.2)
>>>>
>>>> [hewonty at vm1 ~]$ mpd -h hewonty -p 32781
>>>> vm1_32776 (connect_lhs 621): invalid challenge from hewonty 32781: {}
>>>> vm1_32776 (enter_ring 566): lhs connect failed
>>>> vm1_32776 (run 233): failed to enter ring
>>>>
>>>> and on hewonty I get this error:
>>>>
>>>> hewonty.homelinux.org_32781: mpd_uncaught_except_tb handling:
>>>>   exceptions.TypeError: sequence item 0: expected string, int found
>>>>     /usr/local/mpich2/bin/mpdlib.py  733  handle_ring_listener_connection
>>>>         newsock.correctChallengeResponse = \
>>>>     /usr/local/mpich2/bin/mpdlib.py  488  handle_active_streams
>>>>         handler(stream,*args)
>>>>     /usr/local/mpich2/bin/mpd  266  runmainloop
>>>>         rv = self.streamHandler.handle_active_streams(timeout=8.0)
>>>>     /usr/local/mpich2/bin/mpd  240  run
>>>>         self.runmainloop()
>>>>     /usr/local/mpich2/bin/mpd  1344  ?
>>>>         mpd.run()
>>>>
>>>> [1]+  Exit 1                  mpd
>>>>
>>>>
>>>> Regards,
>>>> Tiep.
>>>>
>>>> On 4/8/06, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>>>> Try running the mpdcheck troubleshooting utility as described in
>>>> the installer's guide.
>>>>
>>>> Rajeev
>>>> From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Misora Itsumo
>>>> Sent: Friday, April 07, 2006 5:59 PM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [MPICH] mpdboot error : fail to ping
>>>>
>>>> Hi,
>>>> I'm new to MPICH2. I just installed it, but I can't get it to
>>>> run on a set of machines.
>>>>
>>>> I'm running MPICH2 on 2 nodes whose hostnames are hewonty and vm1.
>>>> Here is some info:
>>>>
>>>> [hewonty at hewonty ~]$ cat mpd.hosts
>>>> hewonty
>>>> vm1
>>>>
>>>> [hewonty at hewonty ~]$ cat /etc/hosts
>>>> 127.0.0.1       localhost.localdomain   localhost
>>>> 192.168.2.2     hewonty.homelinux.org   hewonty.vmnet1.org      hewonty
>>>> 192.168.2.3     vm1.hewonty.homelinux.org       vm1
>>>> 192.168.2.2     svn_server
>>>>
>>>> [hewonty at hewonty ~]$ mpdboot -n 2 -f mpd.hosts
>>>> mpdboot_hewonty (handle_mpd_output 359): failed to ping mpd on hewonty; recvd output={}
>>>>
>>>> I can ssh to both hewonty and vm1.
>>>>
>>>> I tried running mpd manually, and here is what I got:
>>>>
>>>> [hewonty at vm1 ~]$ mpd &
>>>> [1] 2056
>>>> [hewonty at vm1 ~]$ mpdtrace -l
>>>> vm1_32772 (192.168.2.3)
>>>>
>>>>
>>>> [hewonty at hewonty ~]$ mpd -h vm1 -p 32772
>>>> hewonty_32846: mpd_uncaught_except_tb handling:
>>>>   exceptions.TypeError: sequence item 0: expected string, int found
>>>>     /usr/local/mpich2/bin/mpdlib.py  627  connect_lhs
>>>>         response = md5new(''.join([self.secretword,msg['randnum']])).digest()
>>>>     /usr/local/mpich2/bin/mpdlib.py  564  enter_ring
>>>>         numTries=ntries)
>>>>     /usr/local/mpich2/bin/mpd  231  run
>>>>         rhsHandler=self.handle_rhs_input)
>>>>     /usr/local/mpich2/bin/mpd  1344  ?
>>>>         mpd.run()
>>>>
>>>> Thanks in advance.
>>>> Tiep
>>>>
>>>
>>>
>>
>



