[mpich-discuss] Not able to run MPI program in parallel...
Pavan Balaji
balaji at mcs.anl.gov
Tue May 1 13:41:10 CDT 2012
Can you please respond to both questions?
On 05/01/2012 01:39 PM, Albert Spade wrote:
> Here is the output of:
> #mpiexec -verbose -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi>>abc
> #cat abc
>
> --version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
> --hostname beowulf.master --global-core-map 0,1,3 --filler-process-map
> 0,1,3 --global-process-count 4 --auto-cleanup 1 --pmi-rank -1
> --pmi-kvsname kvs_9793_0 --pmi-process-mapping (vector,(0,5,1))
> --ckpoint-num -1 --global-inherited-env 27 'HOSTNAME=beowulf.master'
> 'SELINUX_ROLE_REQUESTED=' 'TERM=xterm' 'SHELL=/bin/bash' 'HISTSIZE=1000'
> 'SSH_CLIENT=115.242.17.227 55995 22' 'SELINUX_USE_CURRENT_RANGE='
> 'QTDIR=/usr/lib/qt-3.3' 'QTINC=/usr/lib/qt-3.3/include'
> 'SSH_TTY=/dev/pts/0' 'USER=root'
> 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;3
5:*.xcf=0
1;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:'
> 'MAIL=/var/spool/mail/root'
> 'PATH=/opt/mpich2-1.4.1p1/bin:/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin'
> 'PWD=/root' 'LANG=en_US.UTF-8' 'SELINUX_LEVEL_REQUESTED='
> 'HISTCONTROL=ignoredups' 'SHLVL=1' 'HOME=/root' 'LOGNAME=root'
> 'QTLIB=/usr/lib/qt-3.3/lib' 'CVS_RSH=ssh' 'SSH_CONNECTION=115.242.17.227
> 55995 172.16.20.31 22' 'LESSOPEN=|/usr/bin/lesspipe.sh %s'
> 'G_BROKEN_FILENAMES=1' '_=/opt/mpich2-1.4.1p1/bin/mpiexec'
> --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /root
> --exec-args 1 /opt/mpich2-1.4.1p1/examples/./cpi
>
> [mpiexec at beowulf.master] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
> Arguments being passed to proxy 1:
> --version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
> --hostname beowulf.node1 --global-core-map 1,1,2 --filler-process-map
> 1,1,2 --global-process-count 4 --auto-cleanup 1 --pmi-rank -1
> --pmi-kvsname kvs_9793_0 --pmi-process-mapping (vector,(0,5,1))
> --ckpoint-num -1 --global-inherited-env 27 'HOSTNAME=beowulf.master'
> 'SELINUX_ROLE_REQUESTED=' 'TERM=xterm' 'SHELL=/bin/bash' 'HISTSIZE=1000'
> 'SSH_CLIENT=115.242.17.227 55995 22' 'SELINUX_USE_CURRENT_RANGE='
> 'QTDIR=/usr/lib/qt-3.3' 'QTINC=/usr/lib/qt-3.3/include'
> 'SSH_TTY=/dev/pts/0' 'USER=root'
> 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;3
5:*.xcf=0
1;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:'
> 'MAIL=/var/spool/mail/root'
> 'PATH=/opt/mpich2-1.4.1p1/bin:/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin'
> 'PWD=/root' 'LANG=en_US.UTF-8' 'SELINUX_LEVEL_REQUESTED='
> 'HISTCONTROL=ignoredups' 'SHLVL=1' 'HOME=/root' 'LOGNAME=root'
> 'QTLIB=/usr/lib/qt-3.3/lib' 'CVS_RSH=ssh' 'SSH_CONNECTION=115.242.17.227
> 55995 172.16.20.31 22' 'LESSOPEN=|/usr/bin/lesspipe.sh %s'
> 'G_BROKEN_FILENAMES=1' '_=/opt/mpich2-1.4.1p1/bin/mpiexec'
> --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /root
> --exec-args 1 /opt/mpich2-1.4.1p1/examples/./cpi
>
> [mpiexec at beowulf.master] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
> Arguments being passed to proxy 2:
> --version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
> --hostname beowulf.node2 --global-core-map 2,1,1 --filler-process-map
> 2,1,1 --global-process-count 4 --auto-cleanup 1 --pmi-rank -1
> --pmi-kvsname kvs_9793_0 --pmi-process-mapping (vector,(0,5,1))
> --ckpoint-num -1 --global-inherited-env 27 'HOSTNAME=beowulf.master'
> 'SELINUX_ROLE_REQUESTED=' 'TERM=xterm' 'SHELL=/bin/bash' 'HISTSIZE=1000'
> 'SSH_CLIENT=115.242.17.227 55995 22' 'SELINUX_USE_CURRENT_RANGE='
> 'QTDIR=/usr/lib/qt-3.3' 'QTINC=/usr/lib/qt-3.3/include'
> 'SSH_TTY=/dev/pts/0' 'USER=root'
> 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;3
5:*.xcf=0
1;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:'
> 'MAIL=/var/spool/mail/root'
> 'PATH=/opt/mpich2-1.4.1p1/bin:/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin'
> 'PWD=/root' 'LANG=en_US.UTF-8' 'SELINUX_LEVEL_REQUESTED='
> 'HISTCONTROL=ignoredups' 'SHLVL=1' 'HOME=/root' 'LOGNAME=root'
> 'QTLIB=/usr/lib/qt-3.3/lib' 'CVS_RSH=ssh' 'SSH_CONNECTION=115.242.17.227
> 55995 172.16.20.31 22' 'LESSOPEN=|/usr/bin/lesspipe.sh %s'
> 'G_BROKEN_FILENAMES=1' '_=/opt/mpich2-1.4.1p1/bin/mpiexec'
> --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /root
> --exec-args 1 /opt/mpich2-1.4.1p1/examples/./cpi
>
> [mpiexec at beowulf.master] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
> Arguments being passed to proxy 3:
> --version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
> --hostname beowulf.node3 --global-core-map 3,1,0 --filler-process-map
> 3,1,0 --global-process-count 4 --auto-cleanup 1 --pmi-rank -1
> --pmi-kvsname kvs_9793_0 --pmi-process-mapping (vector,(0,5,1))
> --ckpoint-num -1 --global-inherited-env 27 'HOSTNAME=beowulf.master'
> 'SELINUX_ROLE_REQUESTED=' 'TERM=xterm' 'SHELL=/bin/bash' 'HISTSIZE=1000'
> 'SSH_CLIENT=115.242.17.227 55995 22' 'SELINUX_USE_CURRENT_RANGE='
> 'QTDIR=/usr/lib/qt-3.3' 'QTINC=/usr/lib/qt-3.3/include'
> 'SSH_TTY=/dev/pts/0' 'USER=root'
> 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;3
5:*.xcf=0
1;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:'
> 'MAIL=/var/spool/mail/root'
> 'PATH=/opt/mpich2-1.4.1p1/bin:/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin'
> 'PWD=/root' 'LANG=en_US.UTF-8' 'SELINUX_LEVEL_REQUESTED='
> 'HISTCONTROL=ignoredups' 'SHLVL=1' 'HOME=/root' 'LOGNAME=root'
> 'QTLIB=/usr/lib/qt-3.3/lib' 'CVS_RSH=ssh' 'SSH_CONNECTION=115.242.17.227
> 55995 172.16.20.31 22' 'LESSOPEN=|/usr/bin/lesspipe.sh %s'
> 'G_BROKEN_FILENAMES=1' '_=/opt/mpich2-1.4.1p1/bin/mpiexec'
> --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /root
> --exec-args 1 /opt/mpich2-1.4.1p1/examples/./cpi
>
> [mpiexec at beowulf.master] Launch arguments:
> /opt/mpich2-1.4.1p1/bin/hydra_pmi_proxy --control-port
> beowulf.master:60190 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --proxy-id 0
> [mpiexec at beowulf.master] Launch arguments:
> /opt/mpich2-1.4.1p1/bin/hydra_pmi_proxy --control-port
> beowulf.master:60190 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --proxy-id 1
> [mpiexec at beowulf.master] Launch arguments:
> /opt/mpich2-1.4.1p1/bin/hydra_pmi_proxy --control-port
> beowulf.master:60190 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --proxy-id 2
> [mpiexec at beowulf.master] Launch arguments:
> /opt/mpich2-1.4.1p1/bin/hydra_pmi_proxy --control-port
> beowulf.master:60190 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --proxy-id 3
> [proxy:0:2 at beowulf.master] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:2 at beowulf.master] PMI response: cmd=response_to_init
> pmi_version=1 pmi_subversion=1 rc=0
> [proxy:0:2 at beowulf.master] got pmi command (from 0): get_maxes
>
> [proxy:0:2 at beowulf.master] PMI response: cmd=maxes kvsname_max=256
> keylen_max=64 vallen_max=1024
> [proxy:0:2 at beowulf.master] got pmi command (from 0): get_appnum
>
> [proxy:0:2 at beowulf.master] PMI response: cmd=appnum appnum=0
> [proxy:0:2 at beowulf.master] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:2 at beowulf.master] PMI response: cmd=my_kvsname kvsname=kvs_9793_0
> [proxy:0:2 at beowulf.master] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:2 at beowulf.master] PMI response: cmd=my_kvsname kvsname=kvs_9793_0
> [proxy:0:2 at beowulf.master] got pmi command (from 0): get
> kvsname=kvs_9793_0 key=PMI_process_mapping
> [proxy:0:2 at beowulf.master] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,5,1))
> [proxy:0:2 at beowulf.master] got pmi command (from 0): barrier_in
>
> [proxy:0:2 at beowulf.master] forwarding command (cmd=barrier_in) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at beowulf.master] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:1 at beowulf.master] PMI response: cmd=response_to_init
> pmi_version=1 pmi_subversion=1 rc=0
> [proxy:0:0 at beowulf.master] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at beowulf.master] PMI response: cmd=response_to_init
> pmi_version=1 pmi_subversion=1 rc=0
> [proxy:0:0 at beowulf.master] got pmi command (from 0): get_maxes
>
> [proxy:0:0 at beowulf.master] PMI response: cmd=maxes kvsname_max=256
> keylen_max=64 vallen_max=1024
> [proxy:0:0 at beowulf.master] got pmi command (from 0): get_appnum
>
> [proxy:0:0 at beowulf.master] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at beowulf.master] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:0 at beowulf.master] PMI response: cmd=my_kvsname kvsname=kvs_9793_0
> [proxy:0:0 at beowulf.master] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:0 at beowulf.master] PMI response: cmd=my_kvsname kvsname=kvs_9793_0
> [proxy:0:0 at beowulf.master] got pmi command (from 0): get
> kvsname=kvs_9793_0 key=PMI_process_mapping
> [proxy:0:0 at beowulf.master] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,5,1))
> [proxy:0:0 at beowulf.master] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at beowulf.master] forwarding command (cmd=barrier_in) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at beowulf.master] got pmi command (from 0): get_maxes
>
> [proxy:0:1 at beowulf.master] PMI response: cmd=maxes kvsname_max=256
> keylen_max=64 vallen_max=1024
> [proxy:0:3 at beowulf.master] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:3 at beowulf.master] PMI response: cmd=response_to_init
> pmi_version=1 pmi_subversion=1 rc=0
> [proxy:0:1 at beowulf.master] got pmi command (from 0): get_appnum
>
> [proxy:0:1 at beowulf.master] PMI response: cmd=appnum appnum=0
> [proxy:0:1 at beowulf.master] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at beowulf.master] PMI response: cmd=my_kvsname kvsname=kvs_9793_0
> [proxy:0:1 at beowulf.master] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at beowulf.master] PMI response: cmd=my_kvsname kvsname=kvs_9793_0
> [proxy:0:3 at beowulf.master] got pmi command (from 0): get_maxes
>
> [proxy:0:3 at beowulf.master] PMI response: cmd=maxes kvsname_max=256
> keylen_max=64 vallen_max=1024
> [proxy:0:1 at beowulf.master] got pmi command (from 0): get
> kvsname=kvs_9793_0 key=PMI_process_mapping
> [proxy:0:1 at beowulf.master] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,5,1))
> [proxy:0:3 at beowulf.master] got pmi command (from 0): get_appnum
>
> [proxy:0:3 at beowulf.master] PMI response: cmd=appnum appnum=0
> [proxy:0:3 at beowulf.master] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:3 at beowulf.master] PMI response: cmd=my_kvsname kvsname=kvs_9793_0
> [proxy:0:3 at beowulf.master] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:3 at beowulf.master] PMI response: cmd=my_kvsname kvsname=kvs_9793_0
> [proxy:0:1 at beowulf.master] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at beowulf.master] forwarding command (cmd=barrier_in) upstream
> [proxy:0:3 at beowulf.master] got pmi command (from 0): get
> kvsname=kvs_9793_0 key=PMI_process_mapping
> [proxy:0:3 at beowulf.master] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,5,1))
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:3 at beowulf.master] got pmi command (from 0): barrier_in
>
> [proxy:0:3 at beowulf.master] forwarding command (cmd=barrier_in) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at beowulf.master] PMI response to fd 6 pid 0: cmd=barrier_out
> [mpiexec at beowulf.master] PMI response to fd 7 pid 0: cmd=barrier_out
> [mpiexec at beowulf.master] PMI response to fd 9 pid 0: cmd=barrier_out
> [mpiexec at beowulf.master] PMI response to fd 12 pid 0: cmd=barrier_out
> [proxy:0:0 at beowulf.master] PMI response: cmd=barrier_out
> [proxy:0:2 at beowulf.master] PMI response: cmd=barrier_out
> [proxy:0:1 at beowulf.master] PMI response: cmd=barrier_out
> [proxy:0:3 at beowulf.master] PMI response: cmd=barrier_out
> [proxy:0:2 at beowulf.master] got pmi command (from 0): put
> kvsname=kvs_9793_0 key=P2-businesscard
> value=description#beowulf.node2$port#57128$ifname#172.16.20.33$
> [proxy:0:2 at beowulf.master] we don't understand this command put;
> forwarding upstream
> [proxy:0:3 at beowulf.master] got pmi command (from 0): put
> kvsname=kvs_9793_0 key=P3-businesscard
> value=description#beowulf.node3$port#47399$ifname#172.16.20.34$
> [proxy:0:3 at beowulf.master] we don't understand this command put;
> forwarding upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=put
> kvsname=kvs_9793_0 key=P3-businesscard
> value=description#beowulf.node3$port#47399$ifname#172.16.20.34$
> [mpiexec at beowulf.master] PMI response to fd 12 pid 0: cmd=put_result
> rc=0 msg=success
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=put
> kvsname=kvs_9793_0 key=P2-businesscard
> value=description#beowulf.node2$port#57128$ifname#172.16.20.33$
> [mpiexec at beowulf.master] PMI response to fd 9 pid 0: cmd=put_result rc=0
> msg=success
> [proxy:0:2 at beowulf.master] we don't understand the response put_result;
> forwarding downstream
> [proxy:0:3 at beowulf.master] we don't understand the response put_result;
> forwarding downstream
> [proxy:0:3 at beowulf.master] got pmi command (from 0): barrier_in
>
> [proxy:0:3 at beowulf.master] forwarding command (cmd=barrier_in) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:2 at beowulf.master] got pmi command (from 0): barrier_in
>
> [proxy:0:2 at beowulf.master] forwarding command (cmd=barrier_in) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at beowulf.master] got pmi command (from 0): put
> kvsname=kvs_9793_0 key=P0-businesscard
> value=description#beowulf.master$port#40147$ifname#172.16.20.31$
> [proxy:0:0 at beowulf.master] we don't understand this command put;
> forwarding upstream
> [proxy:0:1 at beowulf.master] got pmi command (from 0): put
> kvsname=kvs_9793_0 key=P1-businesscard
> value=description#beowulf.node1$port#33312$ifname#172.16.20.32$
> [proxy:0:1 at beowulf.master] we don't understand this command put;
> forwarding upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=put
> kvsname=kvs_9793_0 key=P0-businesscard
> value=description#beowulf.master$port#40147$ifname#172.16.20.31$
> [mpiexec at beowulf.master] PMI response to fd 6 pid 0: cmd=put_result rc=0
> msg=success
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=put
> kvsname=kvs_9793_0 key=P1-businesscard
> value=description#beowulf.node1$port#33312$ifname#172.16.20.32$
> [mpiexec at beowulf.master] PMI response to fd 7 pid 0: cmd=put_result rc=0
> msg=success
> [proxy:0:0 at beowulf.master] we don't understand the response put_result;
> forwarding downstream
> [proxy:0:0 at beowulf.master] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at beowulf.master] forwarding command (cmd=barrier_in) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at beowulf.master] we don't understand the response put_result;
> forwarding downstream
> [proxy:0:1 at beowulf.master] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at beowulf.master] forwarding command (cmd=barrier_in) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at beowulf.master] PMI response to fd 6 pid 0: cmd=barrier_out
> [mpiexec at beowulf.master] PMI response to fd 7 pid 0: cmd=barrier_out
> [mpiexec at beowulf.master] PMI response to fd 9 pid 0: cmd=barrier_out
> [mpiexec at beowulf.master] PMI response to fd 12 pid 0: cmd=barrier_out
> [proxy:0:0 at beowulf.master] PMI response: cmd=barrier_out
> [proxy:0:3 at beowulf.master] PMI response: cmd=barrier_out
> [proxy:0:1 at beowulf.master] PMI response: cmd=barrier_out
> [proxy:0:2 at beowulf.master] PMI response: cmd=barrier_out
> Process 3 of 4 is on beowulf.master
> Process 1 of 4 is on beowulf.master
> Process 2 of 4 is on beowulf.master
> Process 0 of 4 is on beowulf.master
> [proxy:0:0 at beowulf.master] got pmi command (from 0): get
> kvsname=kvs_9793_0 key=P2-businesscard
> [proxy:0:0 at beowulf.master] forwarding command (cmd=get
> kvsname=kvs_9793_0 key=P2-businesscard) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=get
> kvsname=kvs_9793_0 key=P2-businesscard
> [mpiexec at beowulf.master] PMI response to fd 6 pid 0: cmd=get_result rc=0
> msg=success value=description#beowulf.node2$port#57128$ifname#172.16.20.33$
> [proxy:0:0 at beowulf.master] we don't understand the response get_result;
> forwarding downstream
> [proxy:0:0 at beowulf.master] got pmi command (from 0): get
> kvsname=kvs_9793_0 key=P1-businesscard
> [proxy:0:0 at beowulf.master] forwarding command (cmd=get
> kvsname=kvs_9793_0 key=P1-businesscard) upstream
> [mpiexec at beowulf.master] [pgid: 0] got PMI command: cmd=get
> kvsname=kvs_9793_0 key=P1-businesscard
> [mpiexec at beowulf.master] PMI response to fd 6 pid 0: cmd=get_result rc=0
> msg=success value=description#beowulf.node1$port#33312$ifname#172.16.20.32$
> [proxy:0:0 at beowulf.master] we don't understand the response get_result;
> forwarding downstream
> Ctrl-C caught... cleaning up processes
> [root at beowulf ~]#
>
>
> On Tue, May 1, 2012 at 11:57 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>
>     Can you run "hostname" on beowulf.node1 and see what it returns?
> Also, can you send us the output of:
>
>     mpiexec -verbose -f hosts -n 4 /opt/mpich2-1.4.1p1/examples/./cpi
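>
>     (For reference, a minimal sketch for collecting the hostname reported
>     by every entry in the machine file, assuming passwordless ssh to each
>     name; the /root/hosts path is the machine file described earlier in
>     this thread:
>
>         for h in $(cat /root/hosts); do printf '%s: ' "$h"; ssh "$h" hostname; done
>
>     Each entry would be expected to report its own name.)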
>
> -- Pavan
>
>
> On 05/01/2012 01:25 PM, Albert Spade wrote:
>
> Yes I am sure.
> I created a file named hosts in /root; its contents are:
> beowulf.master
> beowulf.node1
> beowulf.node2
> beowulf.node3
> beowulf.node4
>
> I also have a file named hosts in /etc; its contents are:
>
> [root at beowulf ~]# cat /etc/hosts
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
> 172.16.20.31 beowulf.master
> 172.16.20.32 beowulf.node1
> 172.16.20.33 beowulf.node2
> 172.16.20.34 beowulf.node3
> 172.16.20.35 beowulf.node4
> [root at beowulf ~]#
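>
> (As a quick sanity check, assuming standard Linux name resolution via
> getent, the names above can be looked up on the master and on each node:
>
>     for h in beowulf.master beowulf.node1 beowulf.node2 beowulf.node3 beowulf.node4; do getent hosts "$h"; done
>
> Every line should show the matching 172.16.20.x address from this table.)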
>
> On Tue, May 1, 2012 at 11:15 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>
> On 05/01/2012 12:39 PM, Albert Spade wrote:
>
> [root at beowulf ~]# mpiexec -f hosts -n 4
> /opt/mpich2-1.4.1p1/examples/./cpi
>
> Process 0 of 4 is on beowulf.master
> Process 3 of 4 is on beowulf.master
> Process 1 of 4 is on beowulf.master
> Process 2 of 4 is on beowulf.master
> Fatal error in PMPI_Reduce: Other MPI error, error stack:
> PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbff0fd08, rbuf=0xbff0fd00,
> count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
> MPIR_Reduce_impl(1087)..........:
> MPIR_Reduce_intra(895)..........:
> MPIR_Reduce_binomial(144).......:
> MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 2
> MPIR_Reduce_binomial(144).......:
> MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
>
> ^CCtrl-C caught... cleaning up processes
>
>
> In your previous email you said that your host file contains
> this:
>
>
> beowulf.master
> beowulf.node1
> beowulf.node2
> beowulf.node3
> beowulf.node4
>
> The above output does not match this. Process 1 should be scheduled
> on node1. So something is not correct here. Are you sure the
> information you gave us is right?
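>
> (For comparison, a sketch of the placement that would be expected if the
> host file above were honored, one process per listed host in order:
>
>     Process 0 of 4 is on beowulf.master
>     Process 1 of 4 is on beowulf.node1
>     Process 2 of 4 is on beowulf.node2
>     Process 3 of 4 is on beowulf.node3
>
> whereas the failing run reports all four processes on beowulf.master.)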
>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji