[MPICH] Problem setting up a ring

Ralph Butler rbutler at mtsu.edu
Wed Apr 4 09:07:55 CDT 2007


In your output for mpdcheck below, lots of error messages from an
mpd are jumbled in as well. Apparently you had started an mpd in that
same window at some point. Anyway, it is best to make sure that all
mpd processes are killed before running mpdcheck, and then try it
again. As the manual suggests, it is pointless to try using mpdboot
until you have cleared up all issues first. Even starting an mpd ring
by hand is only recommended after successfully debugging with
mpdcheck. So, I suggest trying mpdcheck again, first with no options,
then with -s in one window and -c in another.
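The -s/-c check suggested above amounts to a plain TCP round trip: one process listens on INADDR_ANY and reports its port, and another connects to that host and port and exchanges a short greeting. The Python sketch below reproduces that idea with only the standard library, so the network path can be exercised without any mpd running. It is not mpdcheck's code. The greeting strings, the function names (start_server, check_client, recv_line) and the file name ringcheck.py are hypothetical; only the 64-byte read limit mirrors the sock.recv(64) visible in the quoted traceback.

```python
import socket
import sys
import threading

# Hypothetical greeting/reply strings (assumption: NOT mpdcheck's real wire format).
GREETING = b"hello_from_client\n"
REPLY = b"hello_from_server\n"
MAX_MSG = 64  # mirrors the sock.recv(64) in the quoted mpdcheck traceback


def recv_line(sock, limit=MAX_MSG):
    """Read until a newline, EOF, or `limit` bytes, whichever comes first."""
    buf = b""
    while not buf.endswith(b"\n") and len(buf) < limit:
        chunk = sock.recv(limit - len(buf))
        if not chunk:  # peer closed the connection
            break
        buf += chunk
    return buf


def start_server(host="", port=0):
    """Listen on INADDR_ANY (host="") and answer one client in a background thread.

    port=0 lets the OS choose a free port. Returns (bound_port, thread) so the
    port can be handed to the client side, much as `mpdcheck -s` prints it.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    bound_port = srv.getsockname()[1]

    def serve_once():
        conn, _addr = srv.accept()
        with conn:
            # Reply only if the client sent the expected greeting.
            if recv_line(conn) == GREETING:
                conn.sendall(REPLY)
        srv.close()

    worker = threading.Thread(target=serve_once, daemon=True)
    worker.start()
    return bound_port, worker


def check_client(host, port, timeout=5.0):
    """Connect to host:port, send the greeting, and return True if the reply matches."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(GREETING)
        return recv_line(sock) == REPLY


if __name__ == "__main__":
    # Usage (ringcheck.py is a hypothetical file name):
    #   python ringcheck.py server            # in one window / on one host
    #   python ringcheck.py client HOST PORT  # in another window / on another host
    #   python ringcheck.py                   # single-process loopback self-test
    args = sys.argv[1:]
    if args[:1] == ["server"]:
        port, worker = start_server()
        print("server listening at INADDR_ANY on port", port)
        worker.join()
    elif args[:1] == ["client"] and len(args) == 3:
        ok = check_client(args[1], int(args[2]))
        print("client check:", "ok" if ok else "FAILED")
    else:
        port, worker = start_server()
        ok = check_client("127.0.0.1", port)
        print("loopback check:", "ok" if ok else "FAILED")
        worker.join(timeout=5)
```

On the hosts from this thread, one would start the server side on veritas and run the client side against veritas and the printed port, first from veritas itself and then from elaine. A ConnectionRefusedError or timeout on the client points at the network path (name resolution, firewall) rather than at mpd. As with mpdcheck, running it with no mpd processes alive keeps the output free of the interleaved mpd messages seen in the quoted log.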

On Tue Apr 3, at 10:34 PM, Brett Gordon wrote:

> Hello,
>
> I have successfully installed mpich2-1.0.5 on two linux boxes. Both
> succeed in the standard tests involving one host solving the 'cpi'
> program.
>
> However, I'm running into two (probably related) problems:
>
> 1) When I try to run mpd as a server and client on the same computer
> (as on page 31 of the install documentation), I get the following:
>
> brgordon at veritas:~> mpdcheck -s
> server listening at INADDR_ANY on: veritas 23761
> brgordon at veritas:~> mpdcheck -c veritas 23761
> veritas_23761 (recv_dict_msg 549):recv_dict_msg: errmsg=:invalid literal for int(): hello_fr:
>  mpdtb:
>    /home/brgordon/mpich2-install/bin/mpdlib.py,  549,  recv_dict_msg
>    /home/brgordon/mpich2-install/bin/mpdlib.py,  989,  handle_ring_listener_connection
>    /home/brgordon/mpich2-install/bin/mpdlib.py,  743,  handle_active_streams
>    /home/brgordon/mpich2-install/bin/mpd,  286,  runmainloop
>    /home/brgordon/mpich2-install/bin/mpd,  255,  run
>    /home/brgordon/mpich2-install/bin/mpd,  1470,  ?
>
> veritas_23761 (handle_ring_listener_connection 993): INVALID msg from new connection :('128.2.93.142', 16587): msg=:{}:
> Traceback (most recent call last):
>  File "/home/brgordon/mpich2-install/bin/mpdcheck", line 105, in ?
>    msg = sock.recv(64)
> socket.error: (104, 'Connection reset by peer')
>
> 2) I also can't get a ring to work. I have set up ssh to work without
> using passwords ('ssh veritas date' works fine). The workaround for
> mpdboot on page 9 of the install doc does not work for me, nor does
> running 'mpdcheck -f mpd.hosts -ssh'.
>
> When I try to run mpdboot, I get
> brgordon at elaine ~]$ mpdboot -n 2 -f mpd.hosts
> mpdboot_elaine.tepper.cmu.edu (handle_mpd_output 383): failed to
> connect to mpd on veritas
>
>
> I feel like I'm getting close to having this working, so I would
> greatly appreciate any help. Please let me know if there is more
> information I can provide.
>
> Thanks,
> Brett
>




More information about the mpich-discuss mailing list