[mpich-discuss] using mpich2/hydra with ssh
Gonçalo C. Justino
goncalo.justino at gmail.com
Thu Feb 3 11:14:21 CST 2011
Dear all,
I've been googling and browsing this list's archives, and I still have a
problem running jobs with mpiexec.hydra. I'm just starting to use hydra, so
I'm using only two nodes to make sure I get it all right. There was a thread
on this subject last year; I checked all the issues raised there and covered
them all, but the thread appears to have died.
The machines are at 192.168.0.1 and 192.168.0.3, both running Ubuntu 10.10.
Both have the program I want to use up and running (nwchem, just in case; it
runs fine on each machine using mpiexec.hydra). I'm using the latest version
of mpich2 (grabbed it last Monday).
I can ssh from each one into the other and into itself (if it matters...).
The hosts file I use is:
192.168.0.3:6
192.168.0.1:4
I've also tried hostnames instead of IP addresses, but that did not help.
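
As a quick sanity check on the hosts file above, the sketch below parses `host:count` lines and totals the process counts so they can be compared with the `-np` value. It assumes that the number after the colon is the process count for that host (consistent with the "Process count: 6" reported for 192.168.0.3 in the debug output) and that a line without a count means one process. The helper names `parse_hosts` and `total_slots` are hypothetical and are not part of mpich2 or hydra.

```python
def parse_hosts(text):
    """Parse 'host' or 'host:count' lines into a list of (host, count) pairs.

    Assumption: a line with no ':count' suffix stands for a single process.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        host, sep, count = line.partition(":")
        entries.append((host, int(count) if sep else 1))
    return entries


def total_slots(entries):
    """Sum the process counts over all hosts."""
    return sum(count for _, count in entries)


# The two lines from the hosts file quoted in this message.
hosts_text = "192.168.0.3:6\n192.168.0.1:4\n"
entries = parse_hosts(hosts_text)
print(entries)               # [('192.168.0.3', 6), ('192.168.0.1', 4)]
print(total_slots(entries))  # 10
```

The total of 10 matches the `-np 10` used in the command, so the hosts file itself does not look under-provisioned.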
When I try to use both nodes from machine 3 I get
[proxy at sm] main (./pm/pmiserv/pmi_proxy.c:108): unable to connect to the
main server
From machine 1, the message is similar (machine 1 is sm, machine 3 is
sm-comp).
FYI, the command I use is
mpiexec.hydra -bootstrap ssh -f hosts -np 10 -v /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw
and the output is pasted at the end of this message.
Any hint is welcome,
All the best,
Gonçalo
==================================================================================================
mpiexec options:
----------------
Base path: /usr/bin/
Proxy port: 9899
Bootstrap server: ssh
Debug level: 1
Enable X: -1
Working dir: /wip
Global environment:
-------------------
LDFLAGS=-Lopt/gromacs-4.5.1-mopac7
MPILIB=-lfmpich -lmpich -lpmpich
MPI_INCLUDE=/usr/include/mpich2/
TERM=xterm
SHELL=/bin/bash
XDG_SESSION_COOKIE=d59b7913861333d766cd7af400018fd6-1296733763.439332-419157494
SSH_CLIENT=192.168.0.1 40484 22
SSH_TTY=/dev/pts/3
USER=goncalo
LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/072/lib/intel64
NWCHEM_TOP=/opt/nwchem-6.0/
LS_COLORS=rs
MPI_LIB=(null)
USE_MPI=y
LIB_DEFINES=-DDFLT_TOT_MEM
TOOLROOT=/opt/x86_open64-4.2.4
LIBS=-lmopac
MAIL=/var/mail/goncalo
PATH=/opt/x86_open64-4.2.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/bin-linux-tinker:/home/goncalo/g03
NWCHEM_MODULES=all
PWD=/wip
LANG=en_US.UTF-8
GAUSS_SCRDIR=/home/goncalo/scratch
GAUSS_EXEDIR=/home/goncalo/g03
MPI_LOC=/usr/share/mpich2/
g03root=/home/goncalo/g03
SHLVL=1
HOME=/home/goncalo
FC=/opt/intel/Compiler/11.1/072/bin/intel64/ifort
LOGNAME=goncalo
LARGE_FILES=TRUE
SSH_CONNECTION=192.168.0.1 40484 192.168.0.3 22
LESSOPEN=| /usr/bin/lesspipe %s
CC=/opt/intel/Compiler/11.1/072/bin/intel64/icc
LESSCLOSE=/usr/bin/lesspipe %s %s
NWCHEM_TARGET=LINUX64
_=/usr/bin/mpiexec.hydra
OLDPWD=/home/goncalo/DATA-RECOVERY
Executable information:
**********************
Executable ID: 1
-----------------
Process count: 10
Executable: /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw
Proxy information:
*********************
Proxy ID: 1
-----------------
Proxy name: 192.168.0.3
Process count: 6
Start PID: 0
Proxy exec list:
....................
Exec: /opt/nwchem-6.0/bin/LINUX64/nwchem; Process count: 6
Exec: /opt/nwchem-6.0/bin/LINUX64/nwchem; Process count: 3
Proxy ID: 2
-----------------
Proxy name: 192.168.0.1
Process count: 1
Start PID: 6
Proxy exec list:
....................
Exec: /opt/nwchem-6.0/bin/LINUX64/nwchem; Process count: 1
==================================================================================================
[mpiexec at sm-comp] Timeout set to -1 (-1 means infinite)
[mpiexec at sm-comp] Got a PMI port string of sm-comp:46338
[mpiexec at sm-comp] Got a proxy port string of sm-comp:59737
Arguments being passed to proxy 0:
--version 1.2.1p1 --hostname 192.168.0.3 --global-core-count 7 --wdir /wip
--pmi-port-str sm-comp:46338 --binding HYDRA_NULL --bindlib plpa
--ckpointlib none --ckpoint-prefix HYDRA_NULL --global-inherited-env 38
'LDFLAGS=-Lopt/gromacs-4.5.1-mopac7' 'MPILIB=-lfmpich -lmpich -lpmpich'
'MPI_INCLUDE=/usr/include/mpich2/' 'TERM=xterm' 'SHELL=/bin/bash'
'XDG_SESSION_COOKIE=d59b7913861333d766cd7af400018fd6-1296733763.439332-419157494'
'SSH_CLIENT=192.168.0.1 40484 22' 'SSH_TTY=/dev/pts/3' 'USER=goncalo'
'LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/072/lib/intel64'
'NWCHEM_TOP=/opt/nwchem-6.0/' 'LS_COLORS=rs' 'MPI_LIB=' 'USE_MPI=y'
'LIB_DEFINES=-DDFLT_TOT_MEM' 'TOOLROOT=/opt/x86_open64-4.2.4' 'LIBS=-lmopac'
'MAIL=/var/mail/goncalo'
'PATH=/opt/x86_open64-4.2.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/bin-linux-tinker:/home/goncalo/g03'
'NWCHEM_MODULES=all' 'PWD=/wip' 'LANG=en_US.UTF-8'
'GAUSS_SCRDIR=/home/goncalo/scratch' 'GAUSS_EXEDIR=/home/goncalo/g03'
'MPI_LOC=/usr/share/mpich2/' 'g03root=/home/goncalo/g03' 'SHLVL=1'
'HOME=/home/goncalo' 'FC=/opt/intel/Compiler/11.1/072/bin/intel64/ifort'
'LOGNAME=goncalo' 'LARGE_FILES=TRUE' 'SSH_CONNECTION=192.168.0.1 40484
192.168.0.3 22' 'LESSOPEN=| /usr/bin/lesspipe %s'
'CC=/opt/intel/Compiler/11.1/072/bin/intel64/icc'
'LESSCLOSE=/usr/bin/lesspipe %s %s' 'NWCHEM_TARGET=LINUX64'
'_=/usr/bin/mpiexec.hydra' 'OLDPWD=/home/goncalo/DATA-RECOVERY'
--global-user-env 0 --global-system-env 0 --genv-prop all --start-pid 0
--proxy-core-count 6 --exec --exec-proc-count 6 --exec-local-env 0
--exec-env-prop HYDRA_NULL /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw --exec
--exec-proc-count 3 --exec-local-env 0 --exec-env-prop HYDRA_NULL
/opt/nwchem-6.0/bin/LINUX64/nwchem test.nw
Arguments being passed to proxy 1:
--version 1.2.1p1 --hostname 192.168.0.1 --global-core-count 7 --wdir /wip
--pmi-port-str sm-comp:46338 --binding HYDRA_NULL --bindlib plpa
--ckpointlib none --ckpoint-prefix HYDRA_NULL --global-inherited-env 38
'LDFLAGS=-Lopt/gromacs-4.5.1-mopac7' 'MPILIB=-lfmpich -lmpich -lpmpich'
'MPI_INCLUDE=/usr/include/mpich2/' 'TERM=xterm' 'SHELL=/bin/bash'
'XDG_SESSION_COOKIE=d59b7913861333d766cd7af400018fd6-1296733763.439332-419157494'
'SSH_CLIENT=192.168.0.1 40484 22' 'SSH_TTY=/dev/pts/3' 'USER=goncalo'
'LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/072/lib/intel64'
'NWCHEM_TOP=/opt/nwchem-6.0/' 'LS_COLORS=rs' 'MPI_LIB=' 'USE_MPI=y'
'LIB_DEFINES=-DDFLT_TOT_MEM' 'TOOLROOT=/opt/x86_open64-4.2.4' 'LIBS=-lmopac'
'MAIL=/var/mail/goncalo'
'PATH=/opt/x86_open64-4.2.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/bin-linux-tinker:/home/goncalo/g03'
'NWCHEM_MODULES=all' 'PWD=/wip' 'LANG=en_US.UTF-8'
'GAUSS_SCRDIR=/home/goncalo/scratch' 'GAUSS_EXEDIR=/home/goncalo/g03'
'MPI_LOC=/usr/share/mpich2/' 'g03root=/home/goncalo/g03' 'SHLVL=1'
'HOME=/home/goncalo' 'FC=/opt/intel/Compiler/11.1/072/bin/intel64/ifort'
'LOGNAME=goncalo' 'LARGE_FILES=TRUE' 'SSH_CONNECTION=192.168.0.1 40484
192.168.0.3 22' 'LESSOPEN=| /usr/bin/lesspipe %s'
'CC=/opt/intel/Compiler/11.1/072/bin/intel64/icc'
'LESSCLOSE=/usr/bin/lesspipe %s %s' 'NWCHEM_TARGET=LINUX64'
'_=/usr/bin/mpiexec.hydra' 'OLDPWD=/home/goncalo/DATA-RECOVERY'
--global-user-env 0 --global-system-env 0 --genv-prop all --start-pid 6
--proxy-core-count 1 --exec --exec-proc-count 1 --exec-local-env 0
--exec-env-prop HYDRA_NULL /opt/nwchem-6.0/bin/LINUX64/nwchem test.nw
[mpiexec at sm-comp] Launching process: /usr/bin/ssh -x 192.168.0.3
/usr/bin/pmi_proxy --launch-mode 1 --proxy-port sm-comp:59737 --debug
--bootstrap ssh --proxy-id 0
[mpiexec at sm-comp] Launching process: /usr/bin/ssh -x 192.168.0.1
/usr/bin/pmi_proxy --launch-mode 1 --proxy-port sm-comp:59737 --debug
--bootstrap ssh --proxy-id 1
[proxy at sm] HYDU_sock_connect (./utils/sock/sock.c:128): unable to get host
address (Connection timed out)
[proxy at sm] main (./pm/pmiserv/pmi_proxy.c:108): unable to connect to the
main server
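
The last log lines show the proxy on sm failing in HYDU_sock_connect with "unable to get host address" after being told to contact `sm-comp:59737`. One thing that could be checked is name resolution on each node, under the assumption (not confirmed anywhere in this message) that sm cannot resolve the name `sm-comp`. The sketch below is a hypothetical helper, not part of mpich2 or hydra; the names `split_port_string` and `resolve` are invented for illustration, and the only library call used is Python's standard `socket.getaddrinfo`.

```python
import socket


def split_port_string(port_str):
    """Split a 'host:port' string such as 'sm-comp:59737' into (host, port)."""
    host, _, port = port_str.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {port_str!r}")
    return host, int(port)


def resolve(host, lookup=socket.getaddrinfo):
    """Return the sorted unique addresses for host, or [] if the lookup fails.

    `lookup` defaults to socket.getaddrinfo; it is a parameter only so that
    the function can be exercised without a working resolver.
    """
    try:
        infos = lookup(host, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})
```

Run on sm, `resolve(split_port_string("sm-comp:59737")[0])` returning an empty list would point toward name resolution (for example, a missing /etc/hosts entry for sm-comp) rather than toward ssh itself; a non-empty list would suggest looking elsewhere.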