[MPICH] Problem handling redirected stdin in MPICH2 1.0.4p1, F90, dual Opteron
Xavier Cartoixa Soler
Xavier.Cartoixa at uab.es
Wed Nov 8 05:26:30 CST 2006
Hi everyone,
I am trying to run a parallel program (SIESTA, an electronic structure code) compiled with the Intel compilers under MPICH2 1.0.4p1, and I am facing serious difficulties that I think are related to redirection of input. I am running a dual Opteron cluster under Rocks 4.1 (clone of Red Hat Enterprise Linux 4.0).
My MPICH2 was configured with
CC=icc CFLAGS=-O0 CXX=icc CXXFLAGS=-O0 F77=ifort FFLAGS="-O0 -assume 2underscores" F90=ifort F90FLAGS="-O0 -assume 2underscores" ./configure --prefix=/opt/mpich2-1.0.4p1/ch3_intel_eth/ --with-device=ch3:sock --enable-f90 --enable-cxx --disable-sharedlibs --enable-timer-type=gettimeofday
(the -O0 option is there to rule out problems caused by optimization). The mpd daemon starts fine on one of the dual-CPU nodes:
[compute-0-0 ~]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdtrace -l
compute-0-0.local_33552 (10.255.255.254)
[compute-0-0 ~]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdringtest 1000
time for 1000 loops = 0.160723924637 seconds
and I can even run small parallel programs with redirected input:
[compute-0-0 mpi_bug_test]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec -n 2 ./example1 < inf.txt
Hello world from number 0 54
Hello world from number 1 1
but when I go for the real deal, the siesta program hangs most of the times I try:
[xcs@hydra partest]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec -n 2 ./siesta < emmental.fdf
Before MPI_Init...
It just hangs at that point. The "Before MPI_Init..." line comes from a write(0,*) statement I added before the MPI initialization block. Sometimes both CPUs print the statement, sometimes only one does, and sometimes neither prints it. Occasionally (about one run in ~30) everything works as it is supposed to. After hitting Ctrl+C and killing the siesta program by hand, I receive the mpd error messages:
hydra.uab.es_mpdman_0 (handle_console_input 1281): cannot send stdin to client
[... many times ...]
hydra.uab.es_mpdman_0 (handle_console_input 1281): cannot send stdin to client
hydra.uab.es_mpdman_0: mpd_uncaught_except_tb handling:
exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
/opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdman.py 1270 handle_console_input
self.ring.rhsSock.send_dict_msg(msg)
/opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdlib.py 527 handle_active_streams
handler(stream,*args)
plus traceback to mpd.run()
It can also fail as:
hydra.uab.es_mpdman_0: mpd_uncaught_except_tb handling:
exceptions.AttributeError: 'int' object has no attribute 'send_char_msg'
/opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdman.py 538 handle_lhs_input
self.pmiSock.send_char_msg(pmiMsgToSend)
/opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdlib.py 527 handle_active_streams
handler(stream,*args)
plus traceback to mpd.run()
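Both tracebacks above report an 'int' object where a socket-like object was expected. A minimal Python sketch of how such a message can arise is given here for illustration only: FakeRing, FakeSock and forward_stdin are hypothetical names, the integer placeholder is an assumption, and none of this is taken from mpdman.py.

```python
class FakeSock:
    """Hypothetical stand-in for a socket wrapper; not mpd's actual class."""

    def __init__(self):
        self.sent = []

    def send_dict_msg(self, msg):
        # Record the message instead of sending it over a real socket.
        self.sent.append(msg)


class FakeRing:
    """Hypothetical ring object whose right-hand socket starts as a placeholder."""

    def __init__(self):
        # Assumption: an integer is stored until a real connection exists.
        self.rhsSock = 0


def forward_stdin(ring, msg):
    # Mirrors the shape of the failing call in the traceback:
    # the attribute lookup fails if rhsSock is still an int.
    ring.rhsSock.send_dict_msg(msg)
```

Calling forward_stdin on a fresh FakeRing raises AttributeError because the placeholder integer has no send_dict_msg method; once rhsSock is replaced by a FakeSock, the same call succeeds. Whether mpdman really reaches handle_console_input before its sockets are set up is something only the mpd sources can confirm.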
Incidentally, using gcc and/or MPICH2 1.0.3 gives me the same problem.
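For anyone wanting to reproduce the intermittent hang described above and put a number on how often it occurs, a small harness along the following lines could be used. This is only a sketch: count_hangs, its parameters and the timeout value are hypothetical choices, and any run that exceeds the timeout is simply assumed to be hung.

```python
import subprocess


def count_hangs(cmd, stdin_path, runs=30, timeout_s=60.0):
    """Run cmd `runs` times with stdin redirected from stdin_path.

    Returns (completed, hung). A run counts as completed if it exits
    within timeout_s, whatever its exit status, and as hung otherwise.
    """
    completed = 0
    hung = 0
    for _ in range(runs):
        with open(stdin_path, "rb") as stdin_file:
            try:
                subprocess.run(
                    cmd,
                    stdin=stdin_file,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                    timeout=timeout_s,
                )
                completed += 1
            except subprocess.TimeoutExpired:
                # subprocess.run kills only the direct child on timeout;
                # processes started through mpd may need manual cleanup.
                hung += 1
    return completed, hung
```

With the paths from this message, a call would look like count_hangs(["/opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec", "-n", "2", "./siesta"], "emmental.fdf"). The timeout has to be longer than a normal siesta run on the given input, otherwise successful runs would be miscounted as hangs.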
After unsuccessful googling, I have now run out of things to try, so any pointer would be greatly appreciated! Thanks if you've made it this far in the message!
Xavier