<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
</HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=078443601-20102009><FONT color=#0000ff
size=2 face=Arial>Did you follow all the debugging steps with mpdcheck as
described in Appendix A.2 of the installation guide?</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=078443601-20102009><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=078443601-20102009><FONT color=#0000ff
size=2 face=Arial>Rajeev</FONT></SPAN></DIV><BR>
<BLOCKQUOTE
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px">
<DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B> mpich2-dev-bounces@mcs.anl.gov
[mailto:mpich2-dev-bounces@mcs.anl.gov] <B>On Behalf Of </B>Jovana
Knezevic<BR><B>Sent:</B> Monday, October 19, 2009 9:14 AM<BR><B>To:</B>
mpich2-dev@mcs.anl.gov<BR><B>Subject:</B> [mpich2-dev] mpiexec (+mpdboot,
mpdcheck...) problem<BR></FONT><BR></DIV>
<DIV></DIV>Hello everyone!<BR><BR>I am trying to run my parallel program on
9 machines, each with 2 Opteron processors. I access all machines via
ssh, and I can ssh from one machine to another without a
password.<BR>The mpdboot command (run as described in the documentation) fails with
an error similar to one that other users on this list have reported:
<BR>mpdboot_lx64a170 (handle_mpd_output 374): failed to ping mpd on lxsrv171;
recvd output={}<BR><BR>I tried mpdcheck -l to see what would happen and it
didn't produce any output (is this good or bad?)<BR><BR>When I manually set
the host and port on machines lxsrv171 to lxsrv178 with<BR>mpd -n -h host -p
port<BR>(taking host and port from the output of mpdtrace -l on lxsrv170,
the machine from which I call mpiexec), execution finally became possible,
but it did not give the expected results: most of the processes seem not
to be communicating with each other.<BR>(I
tried a simple "ring" program to make sure this is not due to my code, but it
behaves exactly the same).<BR><BR>BTW, my hostfile looks
like<BR>lxsrv170:2<BR>lxsrv171:2<BR>lxsrv172:2<BR><BR>I would be most grateful
if someone could help. Thanks in
advance.<BR><BR>Regards,<BR>Jovana<BR>...<BR><BR> <BR><BR></BLOCKQUOTE></BODY></HTML>