[mpich-discuss] mpiexec with -verbose hangs on one of the systems

Pramod pramodc at gmail.com
Wed Aug 24 17:26:41 CDT 2011


Hi,

On one of our AMD machines, the following mpiexec call with the -verbose
switch hangs. To exit, I have to press Ctrl-C and kill the still-running
'hydra_pmi_proxy' processes. However, the same command works fine
without the '-verbose' switch.

mpiexec -verbose -n 3 -binding cpu:sockets hostname

I am running this on the SMP machine itself (not over the network). This
happens irrespective of the number of processes, and ONLY when binding is
specified. Below are the MPICH version and the tail of the verbose output.
Let me know if you need any additional information.
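
For anyone trying to reproduce this, here is a minimal sketch for comparing
the two invocations under a time limit, so a hang is reported instead of
blocking the terminal. It assumes GNU coreutils `timeout` is available
(it exits with status 124 when the limit is hit), which may not be the case
on an older system like this one. `run_with_limit` is a hypothetical helper
name, not part of MPICH.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report
# whether it finished or had to be stopped.
# Assumption: GNU coreutils `timeout` is installed.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
    limit=$1
    shift
    # Discard the command's own output; only the outcome matters here.
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HUNG"            # timeout had to stop the command
    else
        echo "EXIT $status"    # command finished on its own
    fi
    return 0
}

# Intended use (commented out so the helper can be sourced safely):
#   run_with_limit 30 mpiexec -verbose -n 3 -binding cpu:sockets hostname
#   run_with_limit 30 mpiexec -n 3 -binding cpu:sockets hostname
# After a hang, leftover proxies can be removed with:
#   pkill -u "$USER" -x hydra_pmi_proxy
```

If the first call prints HUNG and the second prints EXIT 0, that matches
the behaviour described above.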

Thanks,
Pramod

System details:
AMD Opteron 2435 (12 cores)
OS: Linux 2.6.9-89.ELlargesmp

MPICH version:
HYDRA build details:
    Version:                                 1.4.1rc1
    Release Date:                            Wed Aug 17 12:44:31 CDT 2011
    CC:                                      /u/prod/gnu/gcc/20100526/gcc-4.5.0-linux/bin/gcc  -O3 -fPIC


The tail of the log is below:

---tail log---
Proxy launch args:
/u/dvtbata/rkjain/dev-debug/modeltech/linux/hydra_pmi_proxy
--control-port helen:55497 --debug --rmk user --launcher ssh --demux
poll --pgid 0 --retries 10 --proxy-id

[mpiexec@helen] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
Arguments being passed to proxy 0:
--version 1.4.1rc1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
--hostname helen --global-core-map 0,1,0 --filler-process-map 0,1,0
--global-process-count 3 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname
kvs_27786_0 --pmi-process-mapping (vector,(0,1,1)) --binding
cpu:sockets --ckpoint-num -1 --global-inherited-env 71 'USER=pchandra'
'LOGNAME=pchandra' 'HOME=/u/pchandra'
'PATH=/u/prod/mpich/mpich2-1.06/linux/bin:.:/usr/bin:/u/prod/perforce/latest/linux/2.6:/u/prod/bin/linux:/u/prod/bin:/usr/local/bin:/bin:/usr/X11R6/bin:/opt/kde3/bin:/home/mtisouth/bin/linux:/u/dvtbata/rkjain/dev-debug//modeltech/linux'
'MAIL=/var/spool/mail/pchandra' 'SHELL=/bin/csh'
'SSH_CLIENT=::ffff:147.34.21.31 50188 22'
'SSH_CONNECTION=::ffff:147.34.21.31 50188 ::ffff:147.34.21.56 22'
'SSH_TTY=/dev/pts/2' 'TERM=xterm' 'HOSTTYPE=x86_64-linux'
'VENDOR=unknown' 'OSTYPE=linux' 'MACHTYPE=x86_64' 'SHLVL=1'
'PWD=/export/scratch/rkjain_perf/tests_mp/vhdl_designs/matarox'
'GROUP=mti' 'HOST=helen' 'REMOTEHOST=dvtvnc4.wv.mentorg.com'
'HOSTNAME=helen' 'INPUTRC=/etc/inputrc'
'LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:'
'G_BROKEN_FILENAMES=1'
'SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass' 'KDEDIR=/usr'
'LANG=en_US.UTF-8' 'SUPPORTED=en_US.UTF-8:en_US:en'
'LESSOPEN=|/usr/bin/lesspipe.sh %s' 'QTDIR=/usr/lib64/qt-3.3'
'QTINC=/usr/lib64/qt-3.3/include' 'QTLIB=/usr/lib64/qt-3.3/lib'
'ISMTISOUTH=FALSE' 'PRODDIR=/u/prod' 'MTIEXTRA=/u/mtiextra'
'RELEASE=/u/release' 'DVTBATA=/u/dvtbata' 'BATA_ROOT=/u/dvtbata'
'PRODDIRBIN=/u/prod/bin' 'PLAT=linux'
'CVSROOT=:pserver:pchandra@cvssvr:/export/cvs'
'PERLLIB=/u/prod/tests/lib'
'OLD_LM_LICENSE_FILE=1700@licsvr_s:1700@licsvr:1650@licsvr_s:1650@licsvr:5300@licsvr:1700@oemlicsvr:1700@licsvr2'
'LM_LICENSE_FILE=1700@licsvr_s:1700@licsvr:5300@licsvr:1700@licsvr2'
'LD_LIBRARY_PATH=' 'PURIFYOPTIONS=-chain-length=30
-recursion-depth-limit=40000' 'ENSCRIPT=-r2Ghk' 'PLATFORM=linux'
'PLATFORM2=linux' 'GNUMAN=' 'PURIFYMAN='
'MANPATH=:/usr/man:/usr/local/man:/usr/dt/man'
'MTI_HOME=/u/dvtbata/rkjain/dev-debug//modeltech'
'TESTROOT=/u/pchandra/..' 'SM_ENTITY=/u/prod/rel/new/linux/sm_entity'
'HM_ENTITY=/u/prod/rel/new/linux/hm_entity'
'SWIFTKIT=/u/prod/lmc/swiftkit_2.21'
'LMC_HOME=/u/prod/lmc/swiftkit_2.21/library'
'LIBSWIFT=/u/prod/lmc/swiftkit_2.21/library/lib/x86_linux.lib/libswift.so'
'LIBSWIFTPLI=/u/prod/lmc/swiftkit_2.21/library/lib/x86_linux.lib/swiftpli_mti.so'
'LM_DIR=/u/prod/lmc/hw_36a/sms/lm_dir'
'LM_LIB=/u/prod/lmc/hw_36a/sms/models:/u/prod/lmc/hw_36a/sms/maps'
'LIBSFI=/u/prod/lmc/hw_36a/sms/lib/linux/libsfi.so'
'P4PORT=p4proxy-orw2:1666' 'P4CONFIG=.P4CONFIG' 'P4CLIENT='
'P4EDITOR=vim' 'TITLE=CMI78A Slothrop Veritable Voltmeter'
'MTI_MC2_AUTOMPD=1' 'MTI_MC2_ENABLE_ALL_TESTS=1' 'MORE=-c'
'mti_mode=/u/dvtbata/rkjain/dev-debug/' --global-user-env 0
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 3
--exec-local-env 0 --exec-wdir
/export/scratch/rkjain_perf/tests_mp/vhdl_designs/matarox --exec-args
1 hostname

[mpiexec@helen] Launch arguments:
/u/dvtbata/rkjain/dev-debug/modeltech/linux/hydra_pmi_proxy
--control-port helen:55497 --debug --rmk user --launcher ssh --demux
poll --pgid 0 --retries 10 --proxy-id 0
Ctrl-C caught... cleaning up processes
---tail log---



