[MPICH] Problem setting up a ring

Brett Gordon brgordon at gmail.com
Wed Apr 4 09:41:15 CDT 2007


Hi Ralph,

Thanks for your response.

I did as you suggested, and mpdcheck now seems to work, but I still
can't get the ring running.

Terminal 1:
brgordon at veritas:~> mpdallexit    //Just to make sure nothing else is running
mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_brgordon);
possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with:
    mpd &
and you will be able to run jobs just on this host.
For more details on starting mpds on a set of hosts, see
the MPICH2 Installation Guide.
brgordon at veritas:~> mpdcheck
brgordon at veritas:~> mpdcheck -s
server listening at INADDR_ANY on: veritas 23768
server has conn on <socket._socketobject object at 0x2aaaaab42650>
from ('128.2.93.142', 25125)
server successfully recvd msg from client: hello_from_client_to_server
brgordon at veritas:~>

Terminal 2:
brgordon at veritas:~> mpdcheck -c veritas 23768
client successfully recvd ack from server: ack_from_server_to_client
brgordon at veritas:~>
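
As a rough illustration of what the -s / -c pair above exercises, here
is a minimal Python 3 sketch of a one-shot TCP handshake on a single
host. It is not mpdcheck's actual code: the function name
handshake_check is hypothetical, and only the two message strings are
taken from the output above.

```python
import socket
import threading


def handshake_check(host="127.0.0.1"):
    """Run a one-shot server and client in this process.

    Returns (message the server received, message the client received).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    received = {}

    def serve():
        # Accept a single connection, read the greeting, send the ack.
        conn, _addr = srv.accept()
        with conn:
            received["server"] = conn.recv(64).decode()
            conn.sendall(b"ack_from_server_to_client")

    t = threading.Thread(target=serve)
    t.start()
    with socket.create_connection((host, port), timeout=5) as cli:
        cli.sendall(b"hello_from_client_to_server")
        received["client"] = cli.recv(64).decode()
    t.join()
    srv.close()
    return received["server"], received["client"]


if __name__ == "__main__":
    print(handshake_check())
```

Passing this kind of check only shows that a socket on the local host
can be reached; it says nothing about connections between two
different hosts.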

I then tried to run mpdboot from the computer 'veritas', hoping to
bring up a ring with 'veritas' and 'elaine', and got the following:

brgordon at veritas:~> mpdboot -n 2 -f mpd.hosts --user=brgordon --verbose  --chkup
checking elaine
there are 2 hosts up (counting local)
running mpdallexit on veritas
LAUNCHED mpd on veritas  via
RUNNING: mpd on veritas
LAUNCHED mpd on elaine  via  veritas
mpdboot_veritas (handle_mpd_output 383): failed to connect to mpd on elaine

brgordon at veritas:~> less mpd.hosts
elaine
brgordon at veritas:~> less .mpd.conf
secretword=<my password>

The same files exist on 'elaine', except that its mpd.hosts lists 'veritas'.
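
One thing that may be worth ruling out when mpdboot cannot connect to
a remote mpd is a hostname that does not resolve, or that resolves to
a loopback address. This is an assumption about a possible cause, not
a diagnosis of the failure above. The Python 3 sketch below reads host
names from text in the one-host-per-line form shown above (the
handling of '#' comments and a ':ncpus' suffix is also an assumption)
and reports suspicious entries. The names read_hosts and
loopback_hosts are hypothetical helpers, not part of MPICH2.

```python
import socket


def read_hosts(text):
    """Return host names from mpd.hosts-style text, one host per line."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumed comment syntax
        if line:
            # Keep only the name, dropping any assumed ':ncpus' suffix.
            hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def loopback_hosts(hosts, resolve=socket.gethostbyname):
    """Return (host, address) pairs that fail to resolve or are loopback.

    The address is None when the name could not be resolved at all.
    """
    suspicious = []
    for host in hosts:
        try:
            addr = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            suspicious.append((host, None))
            continue
        if addr.startswith("127."):
            suspicious.append((host, addr))
    return suspicious


if __name__ == "__main__":
    with open("mpd.hosts") as f:
        names = read_hosts(f.read())
    names.append(socket.gethostname())  # also check the local host name
    print(loopback_hosts(names))
```

An empty list means every name resolved to a non-loopback address from
the machine where the script ran; it would need to be run on each host
to be meaningful.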

Thanks,
Brett



On 4/4/07, Ralph Butler <rbutler at mtsu.edu> wrote:
> In your output for mpdcheck below, there are lots of jumbled error msgs
> from an mpd as well.  Apparently you had started an mpd in that same
> window at some point.  Anyway, it is best to make sure that all mpd
> processes are killed before doing the mpdcheck.  Then, try it again.  As
> the manual suggests, it is pointless to try using mpdboot unless you have
> cleared up all issues first.  Even starting an mpd ring by hand is only
> recommended after successfully debugging with mpdcheck.  So, I suggest
> trying mpdcheck again, first with no options.  Then, with -s in one
> window and -c in another.
>
> On Tue Apr 3, at 10:34 PM, Brett Gordon wrote:
>
> > Hello,
> >
> > I have successfully installed mpich2-1.0.5 on two linux boxes. Both
> > succeed in the standard tests involving one host solving the 'cpi'
> > program.
> >
> > However, I'm running into two (probably related) problems:
> >
> > 1) When I try to run mpd as a server and client on the same computer
> > (as on page 31 of the install documentation), I get the following:
> >
> > brgordon at veritas:~> mpdcheck -s
> > server listening at INADDR_ANY on: veritas 23761
> > brgordon at veritas:~> mpdcheck -c veritas 23761
> > veritas_23761 (recv_dict_msg 549):recv_dict_msg: errmsg=:invalid
> > literal for int(): hello_fr:
> >  mpdtb:
> >    /home/brgordon/mpich2-install/bin/mpdlib.py,  549,  recv_dict_msg
> >    /home/brgordon/mpich2-install/bin/mpdlib.py,  989,
> > handle_ring_listener_connection
> >    /home/brgordon/mpich2-install/bin/mpdlib.py,  743,
> > handle_active_streams
> >    /home/brgordon/mpich2-install/bin/mpd,  286,  runmainloop
> >    /home/brgordon/mpich2-install/bin/mpd,  255,  run
> >    /home/brgordon/mpich2-install/bin/mpd,  1470,  ?
> >
> > veritas_23761 (handle_ring_listener_connection 993): INVALID msg from
> > new connection :('128.2.93.142', 16587): msg=:{}:
> > Traceback (most recent call last):
> >  File "/home/brgordon/mpich2-install/bin/mpdcheck", line 105, in ?
> >    msg = sock.recv(64)
> > socket.error: (104, 'Connection reset by peer')
> >
> > 2) I also can't get a ring to work. I have set up ssh to work without
> > using passwords ('ssh veritas date' works fine). The workaround for
> > mpdboot on page 9 of the install doc does not work for me, nor does
> > running 'mpdcheck -f mpd.hosts -ssh'.
> >
> > When I try to run mpdboot, I get
> > [brgordon at elaine ~]$ mpdboot -n 2 -f mpd.hosts
> > mpdboot_elaine.tepper.cmu.edu (handle_mpd_output 383): failed to
> > connect to mpd on veritas
> >
> >
> > I feel like I'm getting close to having this working, so I would
> > greatly appreciate any help. Please let me know if there is more
> > information I can provide.
> >
> > Thanks,
> > Brett
> >
>
>



