[mpich-discuss] Run on Amazon AWS

Renato Tegon Forti re.tf at acm.org
Thu Apr 5 09:27:53 CDT 2012


Hi all,

I am trying to run my system on Amazon AWS, but without success so far.

Some background information:

System: Linux Ubuntu

Linux ip-10-248-13-102 3.2.0-12-virtual #21-Ubuntu SMP Tue Jan 31 20:42:32
UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

I have 2 nodes. My hosts.txt contains:

10.248.13.102 # zone:  sa-east-1a
10.252.20.3 # zone:  sa-east-1b
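
As a quick sanity check on the host file, the sketch below prints the host entries it contains with the trailing "#" comments stripped. It assumes the comments are meant to be ignored, which the two "host:" lines in the verbose output further down suggest. list_hosts is a hypothetical helper name, not part of MPICH.

```shell
#!/bin/sh
# list_hosts FILE (hypothetical helper): print one host entry per line,
# dropping "#" comments, surrounding whitespace, and empty lines.
list_hosts() {
    sed -e 's/#.*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}
```

Calling list_hosts on the hosts.txt shown above should print just the two IP addresses, one per line.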

I also have one NFS directory that holds all of the needed files, exported
from "10.248.13.102" and visible on "10.252.20.3".

But when I try to run the command below, it fails (the verbose output ends
with "Permission denied (publickey)."):

mpiexec -launcher ssh -verbose \
  -f /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources/hosts.txt \
  -n 5 /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine \
  --app-cnf-file=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources/nodes.cnf

Any ideas?

Verbose output:

----------------------------------------------------------------------------

ubuntu@ip-10-248-13-102:/Hades/Doksafe/System/3.0/Debug/Parallel/Processing$ ./start-t.sh

host: 10.248.13.102

host: 10.252.20.3

 

============================================================================

mpiexec options:

----------------

  Base path: /usr/bin/

  Launcher: ssh

  Debug level: 1

  Enable X: -1

 

  Global environment:

  -------------------

    SHELL=/bin/bash

    TERM=xterm

    SSH_CLIENT=189.55.11.94 58417 22

 
    PERL5LIB=/home/ubuntu/perl5/lib/perl5/x86_64-linux-gnu-thread-multi:/home/ubuntu/perl5/lib/perl5

    PERL_MB_OPT=--install_base /home/ubuntu/perl5

    SSH_TTY=/dev/pts/1

    USER=ubuntu

 
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;0
1:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37
;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.l
zma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31
:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=
01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31
:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.r
z=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=0
1;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;3
5:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*
.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.m
p4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=0
1;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35
:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm
=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;
36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:
*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx
=00;36:*.xspf=00;36:

 
    PATH=/home/ubuntu/perl5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

    MAIL=/var/mail/ubuntu

    _=/usr/bin/mpiexec

    PWD=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing

    LANG=en_US.UTF-8

    HOME=/home/ubuntu

    SHLVL=2

    PERL_LOCAL_LIB_ROOT=/home/ubuntu/perl5

    LOGNAME=ubuntu

    SSH_CONNECTION=189.55.11.94 58417 10.248.13.102 22

    LESSOPEN=| /usr/bin/lesspipe %s

    LESSCLOSE=/usr/bin/lesspipe %s %s

    PERL_MM_OPT=INSTALL_BASE=/home/ubuntu/perl5

 

  Hydra internal environment:

  ---------------------------

    GFORTRAN_UNBUFFERED_PRECONNECTED=y

 

 

    Proxy information:

    *********************

      [1] proxy: 10.248.13.102 (1 cores)

      Exec list:
        /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine (1 processes);
        /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine (1 processes);
        /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine (1 processes);

 

      [2] proxy: 10.252.20.3 (1 cores)

      Exec list:
        /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine (1 processes);
        /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine (1 processes);

 

 

============================================================================

 

[mpiexec@ip-10-248-13-102] Timeout set to -1 (-1 means infinite)

[mpiexec@ip-10-248-13-102] Got a control port string of 10.248.13.102:47781

Proxy launch args: /usr/bin/hydra_pmi_proxy --control-port 10.248.13.102:47781 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --proxy-id

[mpiexec@ip-10-248-13-102] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1

Arguments being passed to proxy 0:

--version 1.4.1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname
10.248.13.102 --global-core-map 0,1,1 --filler-process-map 0,1,1
--global-process-count 5 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname
kvs_2973_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
--global-inherited-env 21 'SHELL=/bin/bash' 'TERM=xterm'
'SSH_CLIENT=189.55.11.94 58417 22'
'PERL5LIB=/home/ubuntu/perl5/lib/perl5/x86_64-linux-gnu-thread-multi:/home/u
buntu/perl5/lib/perl5' 'PERL_MB_OPT=--install_base /home/ubuntu/perl5'
'SSH_TTY=/dev/pts/1' 'USER=ubuntu'
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;
01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=3
7;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.
lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;3
1:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2
=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;3
1:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.
rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=
01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;
35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:
*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.
mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=
01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;3
5:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cg
m=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00
;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36
:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.sp
x=00;36:*.xspf=00;36:'
'PATH=/home/ubuntu/perl5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/b
in:/sbin:/bin:/usr/games' 'MAIL=/var/mail/ubuntu' '_=/usr/bin/mpiexec'
'PWD=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing' 'LANG=en_US.UTF-8'
'HOME=/home/ubuntu' 'SHLVL=2' 'PERL_LOCAL_LIB_ROOT=/home/ubuntu/perl5'
'LOGNAME=ubuntu' 'SSH_CONNECTION=189.55.11.94 58417 10.248.13.102 22'
'LESSOPEN=| /usr/bin/lesspipe %s' 'LESSCLOSE=/usr/bin/lesspipe %s %s'
'PERL_MM_OPT=INSTALL_BASE=/home/ubuntu/perl5' --global-user-env 0
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1
--exec-local-env 0 --exec-wdir
/Hades/Doksafe/System/3.0/Debug/Parallel/Processing --exec-args 2
/Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine
--app-cnf-file=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources
/nodes.cnf --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0
--exec-wdir /Hades/Doksafe/System/3.0/Debug/Parallel/Processing --exec-args
2 /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine
--app-cnf-file=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources
/nodes.cnf --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0
--exec-wdir /Hades/Doksafe/System/3.0/Debug/Parallel/Processing --exec-args
2 /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine
--app-cnf-file=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources
/nodes.cnf

 

[mpiexec@ip-10-248-13-102] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1

Arguments being passed to proxy 1:

--version 1.4.1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname
10.252.20.3 --global-core-map 1,1,0 --filler-process-map 1,1,0
--global-process-count 5 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname
kvs_2973_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
--global-inherited-env 21 'SHELL=/bin/bash' 'TERM=xterm'
'SSH_CLIENT=189.55.11.94 58417 22'
'PERL5LIB=/home/ubuntu/perl5/lib/perl5/x86_64-linux-gnu-thread-multi:/home/u
buntu/perl5/lib/perl5' 'PERL_MB_OPT=--install_base /home/ubuntu/perl5'
'SSH_TTY=/dev/pts/1' 'USER=ubuntu'
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;
01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=3
7;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.
lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;3
1:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2
=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;3
1:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.
rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=
01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;
35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:
*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.
mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=
01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;3
5:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cg
m=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00
;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36
:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.sp
x=00;36:*.xspf=00;36:'
'PATH=/home/ubuntu/perl5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/b
in:/sbin:/bin:/usr/games' 'MAIL=/var/mail/ubuntu' '_=/usr/bin/mpiexec'
'PWD=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing' 'LANG=en_US.UTF-8'
'HOME=/home/ubuntu' 'SHLVL=2' 'PERL_LOCAL_LIB_ROOT=/home/ubuntu/perl5'
'LOGNAME=ubuntu' 'SSH_CONNECTION=189.55.11.94 58417 10.248.13.102 22'
'LESSOPEN=| /usr/bin/lesspipe %s' 'LESSCLOSE=/usr/bin/lesspipe %s %s'
'PERL_MM_OPT=INSTALL_BASE=/home/ubuntu/perl5' --global-user-env 0
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1
--exec-local-env 0 --exec-wdir
/Hades/Doksafe/System/3.0/Debug/Parallel/Processing --exec-args 2
/Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine
--app-cnf-file=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources
/nodes.cnf --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0
--exec-wdir /Hades/Doksafe/System/3.0/Debug/Parallel/Processing --exec-args
2 /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/doksafe_process_engine
--app-cnf-file=/Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources
/nodes.cnf

 

[mpiexec@ip-10-248-13-102] Launch arguments: /usr/bin/hydra_pmi_proxy --control-port 10.248.13.102:47781 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --proxy-id 0

[mpiexec@ip-10-248-13-102] Launch arguments: /usr/bin/ssh -x ubuntu@10.252.20.3 "/usr/bin/hydra_pmi_proxy" --control-port 10.248.13.102:47781 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --proxy-id 1

Loading app conf file: /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources/nodes.cnf

Loading app conf file: /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources/nodes.cnf

OpenMP active threads: 4

Loading app conf file: /Hades/Doksafe/System/3.0/Debug/Parallel/Processing/Resources/nodes.cnf

OpenMP active threads: 4

OpenMP active threads: 4

[proxy:0:0@ip-10-248-13-102] got pmi command (from 0): init

pmi_version=1 pmi_subversion=1

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 8): init

pmi_version=1 pmi_subversion=1

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 0): get_maxes

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:0:0@ip-10-248-13-102] got pmi command (from 0): get_appnum

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=appnum appnum=0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 0): get_my_kvsname

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=my_kvsname kvsname=kvs_2973_0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 0): get_my_kvsname

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=my_kvsname kvsname=kvs_2973_0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 0): get

kvsname=kvs_2973_0 key=PMI_process_mapping

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))

[proxy:0:0@ip-10-248-13-102] got pmi command (from 0): put

kvsname=kvs_2973_0 key=sharedFilename[0] value=/dev/shm/mpich_shar_tmpYmeyxK

[proxy:0:0@ip-10-248-13-102] we don't understand this command put; forwarding upstream

[mpiexec@ip-10-248-13-102] [pgid: 0] got PMI command: cmd=put kvsname=kvs_2973_0 key=sharedFilename[0] value=/dev/shm/mpich_shar_tmpYmeyxK

[mpiexec@ip-10-248-13-102] PMI response to fd 6 pid 0: cmd=put_result rc=0 msg=success

[proxy:0:0@ip-10-248-13-102] we don't understand the response put_result; forwarding downstream

[proxy:0:0@ip-10-248-13-102] got pmi command (from 8): get_maxes

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:0:0@ip-10-248-13-102] got pmi command (from 8): get_appnum

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=appnum appnum=0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 8): get_my_kvsname

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=my_kvsname kvsname=kvs_2973_0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 8): get_my_kvsname

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=my_kvsname kvsname=kvs_2973_0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 0): barrier_in

[proxy:0:0@ip-10-248-13-102] got pmi command (from 8): get

kvsname=kvs_2973_0 key=PMI_process_mapping

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))

[proxy:0:0@ip-10-248-13-102] got pmi command (from 8): barrier_in

 

[proxy:0:0@ip-10-248-13-102] got pmi command (from 6): init

pmi_version=1 pmi_subversion=1

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 6): get_maxes

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:0:0@ip-10-248-13-102] got pmi command (from 6): get_appnum

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=appnum appnum=0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 6): get_my_kvsname

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=my_kvsname kvsname=kvs_2973_0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 6): get_my_kvsname

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=my_kvsname kvsname=kvs_2973_0

[proxy:0:0@ip-10-248-13-102] got pmi command (from 6): get

kvsname=kvs_2973_0 key=PMI_process_mapping

[proxy:0:0@ip-10-248-13-102] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))

[proxy:0:0@ip-10-248-13-102] got pmi command (from 6): barrier_in

[proxy:0:0@ip-10-248-13-102] forwarding command (cmd=barrier_in) upstream

[mpiexec@ip-10-248-13-102] [pgid: 0] got PMI command: cmd=barrier_in

Permission denied (publickey).
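
The trace ends with "Permission denied (publickey)." after mpiexec starts the second proxy through /usr/bin/ssh, which suggests (although the log alone does not prove it) that key-based login from 10.248.13.102 to 10.252.20.3 is being refused. Below is a minimal sketch for checking this outside of mpiexec, assuming the same hosts.txt and the current user. check_ssh is a hypothetical helper name; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# check_ssh HOSTFILE [SSH_COMMAND] (hypothetical helper)
# Tries a non-interactive login to every host listed in HOSTFILE and reports
# which ones accept it. SSH_COMMAND defaults to "ssh"; it is a parameter only
# so that the check can be exercised with a stand-in command.
check_ssh() {
    hostfile=$1
    ssh_cmd=${2:-ssh}
    failed=0
    # Drop "#" comments and blank lines; word splitting trims the whitespace.
    for host in $(sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$hostfile"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "ok     $host"
        else
            echo "FAILED $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Running check_ssh with the path to hosts.txt on the node where mpiexec is started prints one line per host. Any host reported as FAILED would need its SSH key setup looked at before mpiexec can launch a proxy there.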

 
