[mpich-discuss] mpiexec with -verbose hangs on one of the systems

Pramod pramodc at gmail.com
Thu Aug 25 21:08:04 CDT 2011


Thanks, Pavan! Sorry for the trouble. It was indeed a bad installation.
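
For anyone who lands here with a similar hang: a bad installation can mean that more than one MPICH2 build is reachable, so it may be worth listing every mpiexec and hydra_pmi_proxy on PATH before reinstalling. The quoted log below shows an mpich2-1.06 bin directory first in PATH while the proxy is launched from a different directory; whether that was the actual cause is not confirmed in this thread. A minimal sketch (the helper name find_on_path is made up for illustration and is not part of MPICH2):

```shell
#!/bin/sh
# Hypothetical helper: print every executable called NAME found in the
# colon-separated directory list PATHLIST, in search order.
find_on_path() {
    name=$1
    pathlist=$2
    old_ifs=$IFS
    IFS=:
    for dir in $pathlist; do
        # An empty PATH entry means the current directory.
        [ -z "$dir" ] && dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# More than one hit per tool suggests that two installations are mixed.
for tool in mpiexec hydra_pmi_proxy; do
    echo "== $tool =="
    find_on_path "$tool" "$PATH"
done
```

If more than one copy shows up, running `mpiexec -info` from each location prints the Hydra build details for comparison.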

Regards,
Pramod

On Thu, Aug 25, 2011 at 3:03 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> Pramod,
>
> This seems to work fine for me. Maybe something is wrong with your setup;
> it is too hard to tell from the information provided below.
>
> Can you clean up your installation and reinstall MPICH2?
>
>  -- Pavan
>
> On 08/24/2011 05:26 PM, Pramod wrote:
>>
>> Hi,
>>
>> On one of our AMD machines, the following mpiexec call with the -verbose
>> switch hangs. To exit, I have to press Ctrl-C and kill the still-running
>> 'hydra_pmi_proxy' processes. However, the same command works fine
>> without the '-verbose' switch.
>>
>> mpiexec -verbose -n 3 -binding cpu:sockets hostname
>>
>> I am running this on a single SMP machine (not over the network). This
>> happens irrespective of the number of processes and ONLY when binding is
>> specified. Below are the MPICH version and the tail of the verbose output.
>> Let me know if you need any additional information.
>>
>> Thanks,
>> Pramod
>>
>> System details:
>> AMD Opteron 2435 (12 cores) OS: Linux 2.6.9-89.ELlargesmp
>>
>> MPICH version:
>> HYDRA build details:
>>     Version:                                 1.4.1rc1
>>     Release Date:                            Wed Aug 17 12:44:31 CDT 2011
>>     CC:
>> /u/prod/gnu/gcc/20100526/gcc-4.5.0-linux/bin/gcc  -O3 -fPIC
>>
>>
>> The tail of the log is below:
>>
>> ---tail log---
>> Proxy launch args:
>> /u/dvtbata/rkjain/dev-debug/modeltech/linux/hydra_pmi_proxy
>> --control-port helen:55497 --debug --rmk user --launcher ssh --demux
>> poll --pgid 0 --retries 10 --proxy-id
>>
>> [mpiexec@helen] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
>> Arguments being passed to proxy 0:
>> --version 1.4.1rc1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME
>> --hostname helen --global-core-map 0,1,0 --filler-process-map 0,1,0
>> --global-process-count 3 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname
>> kvs_27786_0 --pmi-process-mapping (vector,(0,1,1)) --binding
>> cpu:sockets --ckpoint-num -1 --global-inherited-env 71 'USER=pchandra'
>> 'LOGNAME=pchandra' 'HOME=/u/pchandra'
>>
>> 'PATH=/u/prod/mpich/mpich2-1.06/linux/bin:.:/usr/bin:/u/prod/perforce/latest/linux/2.6:/u/prod/bin/linux:/u/prod/bin:/usr/local/bin:/bin:/usr/X11R6/bin:/opt/kde3/bin:/home/mtisouth/bin/linux:/u/dvtbata/rkjain/dev-debug//modeltech/linux'
>> 'MAIL=/var/spool/mail/pchandra' 'SHELL=/bin/csh'
>> 'SSH_CLIENT=::ffff:147.34.21.31 50188 22'
>> 'SSH_CONNECTION=::ffff:147.34.21.31 50188 ::ffff:147.34.21.56 22'
>> 'SSH_TTY=/dev/pts/2' 'TERM=xterm' 'HOSTTYPE=x86_64-linux'
>> 'VENDOR=unknown' 'OSTYPE=linux' 'MACHTYPE=x86_64' 'SHLVL=1'
>> 'PWD=/export/scratch/rkjain_perf/tests_mp/vhdl_designs/matarox'
>> 'GROUP=mti' 'HOST=helen' 'REMOTEHOST=dvtvnc4.wv.mentorg.com'
>> 'HOSTNAME=helen' 'INPUTRC=/etc/inputrc'
>>
>> 'LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:'
>> 'G_BROKEN_FILENAMES=1'
>> 'SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass' 'KDEDIR=/usr'
>> 'LANG=en_US.UTF-8' 'SUPPORTED=en_US.UTF-8:en_US:en'
>> 'LESSOPEN=|/usr/bin/lesspipe.sh %s' 'QTDIR=/usr/lib64/qt-3.3'
>> 'QTINC=/usr/lib64/qt-3.3/include' 'QTLIB=/usr/lib64/qt-3.3/lib'
>> 'ISMTISOUTH=FALSE' 'PRODDIR=/u/prod' 'MTIEXTRA=/u/mtiextra'
>> 'RELEASE=/u/release' 'DVTBATA=/u/dvtbata' 'BATA_ROOT=/u/dvtbata'
>> 'PRODDIRBIN=/u/prod/bin' 'PLAT=linux'
>> 'CVSROOT=:pserver:pchandra@cvssvr:/export/cvs'
>> 'PERLLIB=/u/prod/tests/lib'
>>
>> 'OLD_LM_LICENSE_FILE=1700@licsvr_s:1700@licsvr:1650@licsvr_s:1650@licsvr:5300@licsvr:1700@oemlicsvr:1700@licsvr2'
>> 'LM_LICENSE_FILE=1700@licsvr_s:1700@licsvr:5300@licsvr:1700@licsvr2'
>> 'LD_LIBRARY_PATH=' 'PURIFYOPTIONS=-chain-length=30
>> -recursion-depth-limit=40000' 'ENSCRIPT=-r2Ghk' 'PLATFORM=linux'
>> 'PLATFORM2=linux' 'GNUMAN=' 'PURIFYMAN='
>> 'MANPATH=:/usr/man:/usr/local/man:/usr/dt/man'
>> 'MTI_HOME=/u/dvtbata/rkjain/dev-debug//modeltech'
>> 'TESTROOT=/u/pchandra/..' 'SM_ENTITY=/u/prod/rel/new/linux/sm_entity'
>> 'HM_ENTITY=/u/prod/rel/new/linux/hm_entity'
>> 'SWIFTKIT=/u/prod/lmc/swiftkit_2.21'
>> 'LMC_HOME=/u/prod/lmc/swiftkit_2.21/library'
>> 'LIBSWIFT=/u/prod/lmc/swiftkit_2.21/library/lib/x86_linux.lib/libswift.so'
>>
>> 'LIBSWIFTPLI=/u/prod/lmc/swiftkit_2.21/library/lib/x86_linux.lib/swiftpli_mti.so'
>> 'LM_DIR=/u/prod/lmc/hw_36a/sms/lm_dir'
>> 'LM_LIB=/u/prod/lmc/hw_36a/sms/models:/u/prod/lmc/hw_36a/sms/maps'
>> 'LIBSFI=/u/prod/lmc/hw_36a/sms/lib/linux/libsfi.so'
>> 'P4PORT=p4proxy-orw2:1666' 'P4CONFIG=.P4CONFIG' 'P4CLIENT='
>> 'P4EDITOR=vim' 'TITLE=CMI78A Slothrop Veritable Voltmeter'
>> 'MTI_MC2_AUTOMPD=1' 'MTI_MC2_ENABLE_ALL_TESTS=1' 'MORE=-c'
>> 'mti_mode=/u/dvtbata/rkjain/dev-debug/' --global-user-env 0
>> --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
>> --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 3
>> --exec-local-env 0 --exec-wdir
>> /export/scratch/rkjain_perf/tests_mp/vhdl_designs/matarox --exec-args
>> 1 hostname
>>
>> [mpiexec@helen] Launch arguments:
>> /u/dvtbata/rkjain/dev-debug/modeltech/linux/hydra_pmi_proxy
>> --control-port helen:55497 --debug --rmk user --launcher ssh --demux
>> poll --pgid 0 --retries 10 --proxy-id 0
>> Ctrl-C caught... cleaning up processes
>> ---tail log---
>>
>>
>> Thanks,
>> Pramod
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
