<div>I have a problem with MPICH2 on a Lenovo cluster when I start more than three nodes.</div>
<div> </div>
<div>The error output is as follows.</div>
<div>Could anyone give me some advice? Thanks.</div>
<div> </div>
<div>Albert</div>
<div> </div>
<div>[root@c0107 ~]# mpdboot -n 2 -f mpd.hosts<br>mpdboot_c0107_0 (mpdboot 393): error trying to start mpd(boot) at 1 {'host': 'c0104', 'ncpus': 1, 'ifhn': ''}; output:<br> mpdboot_c0104_1 (err_exit 415): mpd failed to start correctly on c0104<br>
reason: 1: unable to ping local mpd;<br> invalid msg from mpd :{}:<br> ** mpd may have disappeared, perhaps due to mismatched secretwords<br> ** see msgs logged in syslog and /tmp/mpd2.logfile* on c0104<br> last printed output from mpd before becoming a daemon:<br>
37857<br> <br> mpdboot_c0104_1 (err_exit 421): contents of mpd logfile in /tmp:<br> logfile for mpd with pid 32501<br> c0104_37857: conn error in connect_lhs: No route to host<br> c0104_37857 (connect_lhs 542): failed to connect to lhs at c0107 46288<br>
c0104_37857 (enter_ring 500): lhs connect failed<br> c0104_37857 (run 215): failed to enter ring<br>mpdboot_c0107_0 (err_exit 415): mpd failed to start correctly on c0107<br>[root@c0107 ~]# ssh c0104<br>Last login: Wed Sep 29 19:29:06 2010 from console<br>
[root@c0104 ~]# mpdboot -n 2 -f mpd.hosts<br>[root@c0104 ~]# mpdtrace<br>c0104<br>c0107<br>[root@c0104 ~]# mpdboot -n 3 -f mpd.hosts<br>mpdboot_c0104_0 (mpdboot 406): error trying to start mpd(boot) at 2 {'host': 'c0108', 'ncpus': 1, 'ifhn': ''}; output:<br>
mpdboot_c0108_2 (err_exit 415): mpd failed to start correctly on c0108<br> reason: 2: unable to ping local mpd;<br> invalid msg from mpd :{}:<br> ** mpd may have disappeared, perhaps due to mismatched secretwords<br>
** see msgs logged in syslog and /tmp/mpd2.logfile* on c0108<br> last printed output from mpd before becoming a daemon:<br> 41819<br> <br> mpdboot_c0108_2 (err_exit 421): contents of mpd logfile in /tmp:<br> logfile for mpd with pid 4894<br>
c0108_41819: conn error in connect_rhs: No route to host<br> c0108_41819 (connect_rhs 602): failed to connect to rhs at 192.168.1.7 49518<br> c0108_41819 (enter_ring 513): rhs connect failed<br> c0108_41819 (run 215): failed to enter ring<br>
mpdboot_c0104_0 (err_exit 415): mpd failed to start correctly on c0104<br></div>