[MPICH] Problems running mpd with n > mpd processes
Tony Keating
akeating at eng.umd.edu
Mon Sep 19 15:58:56 CDT 2005
Hi,
I'm trying to get mpd up and running on a small cluster (two dual-processor
boxes). It works fine with one process per mpd (that is, one per box),
but I'm having difficulties when running two processes per mpd.
Here is some more info:
On the head node:
~# mpd --ifhn=192.168.1.1
barolo.umd.edu_mpdman_2: conn error in connect_lhs: Connection refused
barolo.umd.edu_mpdman_2: conn error in connect_lhs: Connection refused
barolo.umd.edu_mpdman_2: conn error in connect_lhs: Connection refused
barolo.umd.edu_mpdman_2: conn error in connect_lhs: Connection refused
barolo.umd.edu_mpdman_2: conn error in connect_lhs: Connection refused
barolo.umd.edu_mpdman_2: conn error in connect_lhs: Connection refused
barolo.umd.edu_mpdman_2: conn error in connect_lhs: Connection refused
barolo.umd.edu_mpdman_2: conn error in connect_lhs: Connection refused
barolo.umd.edu_mpdman_2 (connect_lhs 542): failed to connect to lhs at 127.0.0.1 33093
barolo.umd.edu_mpdman_2 (run 172): lhs connect failed
Running 2 processes works fine, but with 4 everything just hangs, I get
the above errors, and I have to press Ctrl-C to break out:
~# mpdrun -n 2 hostname
barolo.umd.edu
c01
~# mpdrun -n 4 hostname
mpdrun_barolo.umd.edu (mpdrun 276): mpdrun: failed to obtain sock from manager
On the other node (c01):
~# mpd -h barolo.umd.edu -p 33450
c01_mpdman_3 (connect_lhs 554): invalid challenge from 192.168.1.1 33471: {}
c01_mpdman_3 (run 155): lhs connect failed
Anybody have any ideas? I have a feeling it has to do with the
networking setup here, but I'm not 100% sure how to fix it.
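One way to test that suspicion: the head-node log above shows a connection
attempt to 127.0.0.1, which would be consistent with the machine's own
hostname resolving to a loopback address (for example through an /etc/hosts
entry). The following is only a minimal sketch of such a check, to be run on
each node. It assumes the 127.0.0.1 address comes from hostname resolution,
which the logs do not confirm, and the helper name resolves_to_loopback is
hypothetical, not part of mpd.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return (address, is_loopback) for the IPv4 address of hostname.

    Hypothetical diagnostic helper, not part of mpd.
    """
    address = socket.gethostbyname(hostname)
    return address, ipaddress.ip_address(address).is_loopback


if __name__ == "__main__":
    # Check the name this machine reports for itself.
    name = socket.gethostname()
    try:
        address, loopback = resolves_to_loopback(name)
    except socket.gaierror as exc:
        print(f"{name}: could not resolve ({exc})")
    else:
        note = " (loopback, not reachable from other nodes)" if loopback else ""
        print(f"{name} -> {address}{note}")
```

If the hostname does turn out to map to a loopback address on either node,
other nodes cannot reach a listener advertised at that address, so the name
resolution setup would be the first thing to look at.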
Tony.