[mpich-discuss] mpiexec with -verbose hangs on one of the systems

Pavan Balaji balaji at mcs.anl.gov
Thu Aug 25 17:03:30 CDT 2011


Pramod,

This works fine for me, so something may be wrong with your setup. It is
hard to tell from the information provided below.

Can you clean up your installation and reinstall MPICH2?
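
Before reinstalling, it may be worth checking whether more than one MPICH install is visible on PATH. The environment quoted below lists an mpich2-1.06 directory ahead of the directory that holds hydra_pmi_proxy, which may or may not matter here. A minimal sketch, assuming a POSIX shell; the helper name find_on_path is made up for illustration and is not part of MPICH:

```shell
#!/bin/sh
# Print every match for an executable name on a PATH-style string,
# in search order. More than one line of output for mpiexec or
# hydra_pmi_proxy hints at a mixed installation.
# find_on_path is a hypothetical helper, not an MPICH tool.
find_on_path() {
    name=$1
    path_list=$2
    old_ifs=$IFS
    IFS=:
    for dir in $path_list; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

for tool in mpiexec hydra_pmi_proxy; do
    echo "== $tool =="
    find_on_path "$tool" "$PATH"
done
```

The first path printed for each tool is the one the shell would run; any further lines are other copies that a clean reinstall should remove or push off PATH.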

  -- Pavan

On 08/24/2011 05:26 PM, Pramod wrote:
> Hi,
>
> On one of our AMD machines, the following mpiexec call with the -verbose
> switch hangs. To exit, I have to press Ctrl-C and kill the still-running
> 'hydra_pmi_proxy' processes. However, the same command works fine
> without the '-verbose' switch.
>
> mpiexec -verbose -n 3 -binding cpu:sockets hostname
>
> I am running this on a single SMP machine (not over the network). This
> happens irrespective of the number of processes and ONLY when binding is
> specified. Below are the MPICH version and the tail of the verbose output.
> Let me know if you need any additional information.
>
> Thanks,
> Pramod
>
> System details:
> AMD Opteron 2435 (12 cores) OS: Linux 2.6.9-89.ELlargesmp
>
> MPICH version:
> HYDRA build details:
>      Version:                                 1.4.1rc1
>      Release Date:                            Wed Aug 17 12:44:31 CDT 2011
>      CC:                                      /u/prod/gnu/gcc/20100526/gcc-4.5.0-linux/bin/gcc  -O3 -fPIC
>
>
> The tail of the log is below:
>
> ---tail log---
> Proxy launch args:
> /u/dvtbata/rkjain/dev-debug/modeltech/linux/hydra_pmi_proxy
> --control-port helen:55497 --debug --rmk user --launcher ssh --demux
> poll --pgid 0 --retries 10 --proxy-id
>
> [mpiexec@helen] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
> Arguments being passed to proxy 0:
> --version 1.4.1rc1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
> --hostname helen --global-core-map 0,1,0 --filler-process-map 0,1,0
> --global-process-count 3 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname
> kvs_27786_0 --pmi-process-mapping (vector,(0,1,1)) --binding
> cpu:sockets --ckpoint-num -1 --global-inherited-env 71 'USER=pchandra'
> 'LOGNAME=pchandra' 'HOME=/u/pchandra'
> 'PATH=/u/prod/mpich/mpich2-1.06/linux/bin:.:/usr/bin:/u/prod/perforce/latest/linux/2.6:/u/prod/bin/linux:/u/prod/bin:/usr/local/bin:/bin:/usr/X11R6/bin:/opt/kde3/bin:/home/mtisouth/bin/linux:/u/dvtbata/rkjain/dev-debug//modeltech/linux'
> 'MAIL=/var/spool/mail/pchandra' 'SHELL=/bin/csh'
> 'SSH_CLIENT=::ffff:147.34.21.31 50188 22'
> 'SSH_CONNECTION=::ffff:147.34.21.31 50188 ::ffff:147.34.21.56 22'
> 'SSH_TTY=/dev/pts/2' 'TERM=xterm' 'HOSTTYPE=x86_64-linux'
> 'VENDOR=unknown' 'OSTYPE=linux' 'MACHTYPE=x86_64' 'SHLVL=1'
> 'PWD=/export/scratch/rkjain_perf/tests_mp/vhdl_designs/matarox'
> 'GROUP=mti' 'HOST=helen' 'REMOTEHOST=dvtvnc4.wv.mentorg.com'
> 'HOSTNAME=helen' 'INPUTRC=/etc/inputrc'
> 'LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:'
> 'G_BROKEN_FILENAMES=1'
> 'SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass' 'KDEDIR=/usr'
> 'LANG=en_US.UTF-8' 'SUPPORTED=en_US.UTF-8:en_US:en'
> 'LESSOPEN=|/usr/bin/lesspipe.sh %s' 'QTDIR=/usr/lib64/qt-3.3'
> 'QTINC=/usr/lib64/qt-3.3/include' 'QTLIB=/usr/lib64/qt-3.3/lib'
> 'ISMTISOUTH=FALSE' 'PRODDIR=/u/prod' 'MTIEXTRA=/u/mtiextra'
> 'RELEASE=/u/release' 'DVTBATA=/u/dvtbata' 'BATA_ROOT=/u/dvtbata'
> 'PRODDIRBIN=/u/prod/bin' 'PLAT=linux'
> 'CVSROOT=:pserver:pchandra@cvssvr:/export/cvs'
> 'PERLLIB=/u/prod/tests/lib'
> 'OLD_LM_LICENSE_FILE=1700@licsvr_s:1700@licsvr:1650@licsvr_s:1650@licsvr:5300@licsvr:1700@oemlicsvr:1700@licsvr2'
> 'LM_LICENSE_FILE=1700@licsvr_s:1700@licsvr:5300@licsvr:1700@licsvr2'
> 'LD_LIBRARY_PATH=' 'PURIFYOPTIONS=-chain-length=30
> -recursion-depth-limit=40000' 'ENSCRIPT=-r2Ghk' 'PLATFORM=linux'
> 'PLATFORM2=linux' 'GNUMAN=' 'PURIFYMAN='
> 'MANPATH=:/usr/man:/usr/local/man:/usr/dt/man'
> 'MTI_HOME=/u/dvtbata/rkjain/dev-debug//modeltech'
> 'TESTROOT=/u/pchandra/..' 'SM_ENTITY=/u/prod/rel/new/linux/sm_entity'
> 'HM_ENTITY=/u/prod/rel/new/linux/hm_entity'
> 'SWIFTKIT=/u/prod/lmc/swiftkit_2.21'
> 'LMC_HOME=/u/prod/lmc/swiftkit_2.21/library'
> 'LIBSWIFT=/u/prod/lmc/swiftkit_2.21/library/lib/x86_linux.lib/libswift.so'
> 'LIBSWIFTPLI=/u/prod/lmc/swiftkit_2.21/library/lib/x86_linux.lib/swiftpli_mti.so'
> 'LM_DIR=/u/prod/lmc/hw_36a/sms/lm_dir'
> 'LM_LIB=/u/prod/lmc/hw_36a/sms/models:/u/prod/lmc/hw_36a/sms/maps'
> 'LIBSFI=/u/prod/lmc/hw_36a/sms/lib/linux/libsfi.so'
> 'P4PORT=p4proxy-orw2:1666' 'P4CONFIG=.P4CONFIG' 'P4CLIENT='
> 'P4EDITOR=vim' 'TITLE=CMI78A Slothrop Veritable Voltmeter'
> 'MTI_MC2_AUTOMPD=1' 'MTI_MC2_ENABLE_ALL_TESTS=1' 'MORE=-c'
> 'mti_mode=/u/dvtbata/rkjain/dev-debug/' --global-user-env 0
> --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
> --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 3
> --exec-local-env 0 --exec-wdir
> /export/scratch/rkjain_perf/tests_mp/vhdl_designs/matarox --exec-args
> 1 hostname
>
> [mpiexec@helen] Launch arguments:
> /u/dvtbata/rkjain/dev-debug/modeltech/linux/hydra_pmi_proxy
> --control-port helen:55497 --debug --rmk user --launcher ssh --demux
> poll --pgid 0 --retries 10 --proxy-id 0
> Ctrl-C caught... cleaning up processes
> ---tail log---
>
>
> Thanks,
> Pramod

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

