Hello everyone!<br>
<br>
I am trying to run my parallel program on 9 machines, each with 2
Opteron processors. I access all machines via ssh, and I can ssh
from one machine to another without a password.<br>
Running mpdboot as described in the documentation fails with an error that other users on this list have also reported:<br>
mpdboot_lx64a170 (handle_mpd_output 374): failed to ping mpd on lxsrv171; recvd output={}<br>
<br>
I tried mpdcheck -l to see what would happen, and it produced no output at all (is that good or bad?).<br>
<br>
When I 'manually' started mpd on machines lxsrv171 to lxsrv178 with<br>
mpd -n -h host -p port, where host and port are the values I got from<br>
mpdtrace -l on the machine that I
am calling mpiexec from (lxsrv170), the program finally ran.
However, it did not give the expected results: it seems that most
of the processes are not communicating with each other.<br>
(I tried a simple "ring" program to make sure this is not caused by my code, and it behaves in exactly the same way.)<br>
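<br>
To make the manual step concrete, here is a rough Python sketch that builds the mpd command line for each worker from one line of mpdtrace -l output. It assumes that mpdtrace -l prints entries of the form hostname_port (address); the port 40123 and the address 10.0.0.1 are made up for illustration, and the helper names are hypothetical. It only prints the commands and does not run anything.<br>

```python
def parse_mpdtrace_entry(line):
    """Split one 'host_port (address)' line into (host, port).

    Assumption: this is the format printed by `mpdtrace -l`.
    rpartition is used so that underscores inside the host name survive.
    """
    entry = line.split()[0]
    host, _, port = entry.rpartition("_")
    return host, int(port)


def join_commands(entry_line, workers):
    """Build one 'mpd -n -h host -p port' command per worker machine."""
    host, port = parse_mpdtrace_entry(entry_line)
    return [f"ssh {w} mpd -n -h {host} -p {port}" for w in workers]


if __name__ == "__main__":
    # Hypothetical mpdtrace -l line from lxsrv170 (port and address invented).
    trace_line = "lxsrv170_40123 (10.0.0.1)"
    workers = [f"lxsrv{n}" for n in range(171, 179)]  # lxsrv171 .. lxsrv178
    for cmd in join_commands(trace_line, workers):
        print(cmd)
```

Each printed line is the command described above, prefixed with ssh so it can be issued from lxsrv170.<br>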
<br>
BTW, my hostfile looks like<br>
lxsrv170:2<br>
lxsrv171:2<br>
lxsrv172:2<br>
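<br>
As a sanity check on the hostfile itself, the following is a minimal Python sketch that parses host:ncpus lines like the ones above and reports whether each name resolves from the local machine. It assumes that a missing count means 1 and that anything after a # is a comment; the function names are hypothetical, and name resolution is only one possible cause of the ping failure, not a confirmed diagnosis.<br>

```python
import socket


def parse_hostfile(text):
    """Return a list of (host, ncpus) pairs from 'host:ncpus' lines.

    Assumptions: '#' starts a comment, ncpus defaults to 1, and anything
    after the first whitespace-separated token on a line is ignored.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        token = line.split()[0]
        name, _, count = token.partition(":")
        hosts.append((name, int(count) if count else 1))
    return hosts


def resolve(host):
    """Return the IPv4 address for host, or None if lookup fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


HOSTFILE = """\
lxsrv170:2
lxsrv171:2
lxsrv172:2
"""

if __name__ == "__main__":
    for name, ncpus in parse_hostfile(HOSTFILE):
        print(name, ncpus, resolve(name) or "UNRESOLVED")
```

Running it on every machine in turn would show whether all nodes see the same names and addresses.<br>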
<br>
I would be most grateful if someone could help. Thanks in advance.<br>
<br>
Regards,<br>
Jovana<br>
...<br>
<br>
<br>
<br>