Dear all,
<div><br></div><div>I've been googling and browsing this list's archives, but I still can't get mpiexec.hydra to run across machines. I'm just starting to use Hydra, so I'm using only two nodes to make sure I get everything right. There was a thread on this subject last year; I went through all the issues raised there and ruled each of them out, but the thread appears to have died.</div>
<div><br></div><div>The machines are at 192.168.0.1 and 192.168.0.3, both running Ubuntu 10.10, and both have the program I want to use up and running (NWChem, just in case; it runs fine on each machine on its own using mpiexec.hydra). I'm using the latest version of MPICH2 (grabbed it last Monday).</div>
<div>I can ssh from each machine into the other and into itself (if it matters...). The hosts file I use is:</div><div>192.168.0.3:6</div><div>192.168.0.1:4</div>
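<div>To be explicit about how I read that file: I assume each line is host:count, where count is the number of processes to place on that host. A minimal Python sketch of that reading (parse_hosts is a made-up helper for illustration, not part of MPICH2, and the default count of 1 is my assumption):</div>

```python
def parse_hosts(text):
    """Parse 'host:count' lines into a list of (host, count) pairs.

    Assumption: one entry per line; a line without ':count' means 1 process.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        host, sep, count = line.rpartition(":")
        if not sep:
            # no ':' on the line, so the whole line is the host name
            host, count = line, "1"
        entries.append((host, int(count)))
    return entries


hosts = parse_hosts("192.168.0.3:6\n192.168.0.1:4\n")
total = sum(count for _, count in hosts)
print(hosts, total)
```

<div>Under that reading the two lines add up to 10 processes, which matches the 10 I request with -np.</div>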
<div>I've also tried host names instead of IP addresses, but no luck.</div><div><br></div><div>When I try to use both nodes from machine 3, I get</div><div><div>[proxy@sm] main (./pm/pmiserv/pmi_proxy.c:108): unable to connect to the main server</div>
</div><div><br></div><div>From machine 1, the message is similar (machine 1 is sm, machine 3 is sm-comp).</div><div><br></div><div>FYI, the command I use is</div><div><div>mpiexec.hydra -bootstrap ssh -f hosts -np 10 -v /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw</div>
</div><div><br></div><div>and the output is pasted at the end of this message.</div><div><br></div><div>Any hint is welcome,</div><div><br></div><div>All the best,</div><div>Gonçalo</div><div><br></div><div><div>==================================================================================================</div>
<div>mpiexec options:</div><div>----------------</div><div> Base path: /usr/bin/</div><div> Proxy port: 9899</div><div> Bootstrap server: ssh</div><div> Debug level: 1</div><div> Enable X: -1</div><div> Working dir: /wip</div>
<div><br></div><div> Global environment:</div><div> -------------------</div><div> LDFLAGS=-Lopt/gromacs-4.5.1-mopac7</div><div> MPILIB=-lfmpich -lmpich -lpmpich</div><div> MPI_INCLUDE=/usr/include/mpich2/</div>
<div> TERM=xterm</div><div> SHELL=/bin/bash</div><div> XDG_SESSION_COOKIE=d59b7913861333d766cd7af400018fd6-1296733763.439332-419157494</div><div> SSH_CLIENT=192.168.0.1 40484 22</div><div> SSH_TTY=/dev/pts/3</div>
<div> USER=goncalo</div><div> LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/072/lib/intel64</div><div> NWCHEM_TOP=/opt/nwchem-6.0/</div><div> LS_COLORS=rs</div><div> MPI_LIB=(null)</div><div> USE_MPI=y</div>
<div>
LIB_DEFINES=-DDFLT_TOT_MEM</div><div> TOOLROOT=/opt/x86_open64-4.2.4</div><div> LIBS=-lmopac</div><div> MAIL=/var/mail/goncalo</div><div> PATH=/opt/x86_open64-4.2.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/bin-linux-tinker:/home/goncalo/g03</div>
<div> NWCHEM_MODULES=all</div><div> PWD=/wip</div><div> LANG=en_US.UTF-8</div><div> GAUSS_SCRDIR=/home/goncalo/scratch</div><div> GAUSS_EXEDIR=/home/goncalo/g03</div><div> MPI_LOC=/usr/share/mpich2/</div>
<div> g03root=/home/goncalo/g03</div><div> SHLVL=1</div><div> HOME=/home/goncalo</div><div> FC=/opt/intel/Compiler/11.1/072/bin/intel64/ifort</div><div> LOGNAME=goncalo</div><div> LARGE_FILES=TRUE</div>
<div>
SSH_CONNECTION=192.168.0.1 40484 192.168.0.3 22</div><div> LESSOPEN=| /usr/bin/lesspipe %s</div><div> CC=/opt/intel/Compiler/11.1/072/bin/intel64/icc</div><div> LESSCLOSE=/usr/bin/lesspipe %s %s</div><div> NWCHEM_TARGET=LINUX64</div>
<div> _=/usr/bin/mpiexec.hydra</div><div> OLDPWD=/home/goncalo/DATA-RECOVERY</div><div><br></div><div><br></div><div> Executable information:</div><div> **********************</div><div> Executable ID: 1</div>
<div> -----------------</div><div> Process count: 10</div><div> Executable: /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw </div><div><br></div><div> Proxy information:</div><div> *********************</div>
<div> Proxy ID: 1</div><div> -----------------</div><div> Proxy name: 192.168.0.3</div><div> Process count: 6</div><div> Start PID: 0</div><div><br></div><div> Proxy exec list:</div>
<div> ....................</div><div> Exec: /opt/nwchem-6.0/bin/LINUX64/nwchem; Process count: 6</div><div> Exec: /opt/nwchem-6.0/bin/LINUX64/nwchem; Process count: 3</div><div> Proxy ID: 2</div>
<div> -----------------</div><div> Proxy name: 192.168.0.1</div><div> Process count: 1</div><div> Start PID: 6</div><div><br></div><div> Proxy exec list:</div><div> ....................</div>
<div> Exec: /opt/nwchem-6.0/bin/LINUX64/nwchem; Process count: 1</div><div><br></div><div>==================================================================================================</div><div><br></div><div>
[mpiexec@sm-comp] Timeout set to -1 (-1 means infinite)</div><div>[mpiexec@sm-comp] Got a PMI port string of sm-comp:46338</div><div>[mpiexec@sm-comp] Got a proxy port string of sm-comp:59737</div><div>Arguments being passed to proxy 0:</div>
<div>--version 1.2.1p1 --hostname 192.168.0.3 --global-core-count 7 --wdir /wip --pmi-port-str sm-comp:46338 --binding HYDRA_NULL --bindlib plpa --ckpointlib none --ckpoint-prefix HYDRA_NULL --global-inherited-env 38 'LDFLAGS=-Lopt/gromacs-4.5.1-mopac7' 'MPILIB=-lfmpich -lmpich -lpmpich' 'MPI_INCLUDE=/usr/include/mpich2/' 'TERM=xterm' 'SHELL=/bin/bash' 'XDG_SESSION_COOKIE=d59b7913861333d766cd7af400018fd6-1296733763.439332-419157494' 'SSH_CLIENT=192.168.0.1 40484 22' 'SSH_TTY=/dev/pts/3' 'USER=goncalo' 'LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/072/lib/intel64' 'NWCHEM_TOP=/opt/nwchem-6.0/' 'LS_COLORS=rs' 'MPI_LIB=' 'USE_MPI=y' 'LIB_DEFINES=-DDFLT_TOT_MEM' 'TOOLROOT=/opt/x86_open64-4.2.4' 'LIBS=-lmopac' 'MAIL=/var/mail/goncalo' 'PATH=/opt/x86_open64-4.2.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/bin-linux-tinker:/home/goncalo/g03' 'NWCHEM_MODULES=all' 'PWD=/wip' 'LANG=en_US.UTF-8' 'GAUSS_SCRDIR=/home/goncalo/scratch' 'GAUSS_EXEDIR=/home/goncalo/g03' 'MPI_LOC=/usr/share/mpich2/' 'g03root=/home/goncalo/g03' 'SHLVL=1' 'HOME=/home/goncalo' 'FC=/opt/intel/Compiler/11.1/072/bin/intel64/ifort' 'LOGNAME=goncalo' 'LARGE_FILES=TRUE' 'SSH_CONNECTION=192.168.0.1 40484 192.168.0.3 22' 'LESSOPEN=| /usr/bin/lesspipe %s' 'CC=/opt/intel/Compiler/11.1/072/bin/intel64/icc' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'NWCHEM_TARGET=LINUX64' '_=/usr/bin/mpiexec.hydra' 'OLDPWD=/home/goncalo/DATA-RECOVERY' --global-user-env 0 --global-system-env 0 --genv-prop all --start-pid 0 --proxy-core-count 6 --exec --exec-proc-count 6 --exec-local-env 0 --exec-env-prop HYDRA_NULL /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw --exec --exec-proc-count 3 --exec-local-env 0 --exec-env-prop HYDRA_NULL /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw </div>
<div><br></div><div>Arguments being passed to proxy 1:</div><div>--version 1.2.1p1 --hostname 192.168.0.1 --global-core-count 7 --wdir /wip --pmi-port-str sm-comp:46338 --binding HYDRA_NULL --bindlib plpa --ckpointlib none --ckpoint-prefix HYDRA_NULL --global-inherited-env 38 'LDFLAGS=-Lopt/gromacs-4.5.1-mopac7' 'MPILIB=-lfmpich -lmpich -lpmpich' 'MPI_INCLUDE=/usr/include/mpich2/' 'TERM=xterm' 'SHELL=/bin/bash' 'XDG_SESSION_COOKIE=d59b7913861333d766cd7af400018fd6-1296733763.439332-419157494' 'SSH_CLIENT=192.168.0.1 40484 22' 'SSH_TTY=/dev/pts/3' 'USER=goncalo' 'LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/072/lib/intel64' 'NWCHEM_TOP=/opt/nwchem-6.0/' 'LS_COLORS=rs' 'MPI_LIB=' 'USE_MPI=y' 'LIB_DEFINES=-DDFLT_TOT_MEM' 'TOOLROOT=/opt/x86_open64-4.2.4' 'LIBS=-lmopac' 'MAIL=/var/mail/goncalo' 'PATH=/opt/x86_open64-4.2.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/bin-linux-tinker:/home/goncalo/g03' 'NWCHEM_MODULES=all' 'PWD=/wip' 'LANG=en_US.UTF-8' 'GAUSS_SCRDIR=/home/goncalo/scratch' 'GAUSS_EXEDIR=/home/goncalo/g03' 'MPI_LOC=/usr/share/mpich2/' 'g03root=/home/goncalo/g03' 'SHLVL=1' 'HOME=/home/goncalo' 'FC=/opt/intel/Compiler/11.1/072/bin/intel64/ifort' 'LOGNAME=goncalo' 'LARGE_FILES=TRUE' 'SSH_CONNECTION=192.168.0.1 40484 192.168.0.3 22' 'LESSOPEN=| /usr/bin/lesspipe %s' 'CC=/opt/intel/Compiler/11.1/072/bin/intel64/icc' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'NWCHEM_TARGET=LINUX64' '_=/usr/bin/mpiexec.hydra' 'OLDPWD=/home/goncalo/DATA-RECOVERY' --global-user-env 0 --global-system-env 0 --genv-prop all --start-pid 6 --proxy-core-count 1 --exec --exec-proc-count 1 --exec-local-env 0 --exec-env-prop HYDRA_NULL /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw </div>
<div><br></div><div>[mpiexec@sm-comp] Launching process: /usr/bin/ssh -x 192.168.0.3 /usr/bin/pmi_proxy --launch-mode 1 --proxy-port sm-comp:59737 --debug --bootstrap ssh --proxy-id 0 </div><div>[mpiexec@sm-comp] Launching process: /usr/bin/ssh -x 192.168.0.1 /usr/bin/pmi_proxy --launch-mode 1 --proxy-port sm-comp:59737 --debug --bootstrap ssh --proxy-id 1 </div>
<div>[proxy@sm] HYDU_sock_connect (./utils/sock/sock.c:128): unable to get host address (Connection timed out)</div><div>[proxy@sm] main (./pm/pmiserv/pmi_proxy.c:108): unable to connect to the main server</div><div><br>
</div>
</div>
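<div>The last two lines come from the proxy on sm and mention the host address, and both proxies are told to connect back to sm-comp:59737, so one thing I can check is whether each host name resolves on each machine. A minimal Python sketch of that check (check_names, its resolve parameter and main are names made up for illustration; whether name resolution is really the cause is only a guess on my part):</div>

```python
import socket


def check_names(names, resolve=socket.gethostbyname):
    """Map each name to its resolved IPv4 address, or None if lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            results[name] = None
    return results


def main():
    # Host names taken from the log above; meant to be run on each machine.
    for name, addr in check_names(["sm", "sm-comp"]).items():
        print(name, ":", addr if addr else "NOT RESOLVED")
```

<div>Calling main() on sm and again on sm-comp would show which names each machine can turn into an address; the resolve parameter is only there so the lookup can be replaced with a stub when testing.</div>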