[MPICH] Problem handling redirected stdin in MPICH2 1.0.4p1, F90, dual Opteron

Xavier Cartoixa Soler Xavier.Cartoixa at uab.es
Wed Nov 8 05:26:30 CST 2006


 Hi everyone,

 I am trying to run a parallel program (SIESTA, an electronic structure code) compiled with the Intel compilers under MPICH2 1.0.4p1, and I am facing serious difficulties that I think are related to redirection of input. I am running a dual Opteron cluster under Rocks 4.1 (clone of Red Hat Enterprise Linux 4.0).
 My MPICH2 was configured with

CC=icc CFLAGS=-O0 CXX=icc CXXFLAGS=-O0 F77=ifort FFLAGS="-O0 -assume 2underscores" F90=ifort F90FLAGS="-O0 -assume 2underscores" ./configure --prefix=/opt/mpich2-1.0.4p1/ch3_intel_eth/ --with-device=ch3:sock --enable-f90 --enable-cxx --disable-sharedlibs --enable-timer-type=gettimeofday

(the -O0 options are there to rule out problems caused by optimization). The mpd daemon starts fine on one of the dual-CPU nodes:

[compute-0-0 ~]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdtrace -l
compute-0-0.local_33552 (10.255.255.254)

[compute-0-0 ~]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdringtest 1000
time for 1000 loops = 0.160723924637 seconds

and I can even run small parallel programs with redirected input:

[compute-0-0 mpi_bug_test]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec -n 2 ./example1 < inf.txt
 Hello world from number            0          54
 Hello world from number            1           1

but when I run the real program, siesta hangs on most attempts:

[xcs@hydra partest]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec -n 2 ./siesta < emmental.fdf
 Before MPI_Init...

It just hangs at that point. The "Before MPI_Init..." line comes from a write(0,*) statement I added before the MPI initialization block. Sometimes both processes print it, sometimes only one does, and sometimes neither. Occasionally (about one run in 30) everything works as it is supposed to. After hitting Ctrl+C and killing the siesta processes by hand, I get these mpd error messages:

hydra.uab.es_mpdman_0 (handle_console_input 1281): cannot send stdin to client
[... many times ...]
hydra.uab.es_mpdman_0 (handle_console_input 1281): cannot send stdin to client
hydra.uab.es_mpdman_0: mpd_uncaught_except_tb handling:
  exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
    /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdman.py  1270 handle_console_input
        self.ring.rhsSock.send_dict_msg(msg)
    /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdlib.py  527 handle_active_streams
        handler(stream,*args)
plus traceback to mpd.run()

It can also fail with:

hydra.uab.es_mpdman_0: mpd_uncaught_except_tb handling:
  exceptions.AttributeError: 'int' object has no attribute 'send_char_msg'
    /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdman.py  538  handle_lhs_input
        self.pmiSock.send_char_msg(pmiMsgToSend)
    /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdlib.py  527 handle_active_streams
        handler(stream,*args)
plus traceback to mpd.run()
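
Both tracebacks end in a method call on a plain integer where a socket object would be expected. A minimal Python sketch of how that kind of error can arise is below. It assumes that the socket attribute gets overwritten with an integer placeholder once the connection is gone, which I have not verified against the mpd source. All class and method names except send_dict_msg are hypothetical and this is not the mpdman.py code:

```python
class FakeSock:
    """Hypothetical stand-in for an mpd socket wrapper (not the real class)."""

    def __init__(self):
        self.sent = []

    def send_dict_msg(self, msg):
        # Just record the message instead of writing to a real socket.
        self.sent.append(msg)


class FakeMan:
    """Hypothetical manager that forwards console input to its right-hand neighbour."""

    def __init__(self):
        self.rhs_sock = FakeSock()

    def close_rhs(self):
        # Assumption: the attribute is replaced by an int placeholder
        # when the connection goes away.
        self.rhs_sock = 0

    def handle_console_input(self, line):
        # Raises AttributeError if called after close_rhs(), because 0 is an int.
        self.rhs_sock.send_dict_msg({"line": line})


def reproduce():
    """Return the error text produced by forwarding input after the socket is gone."""
    man = FakeMan()
    man.handle_console_input("first line\n")  # works: socket object still present
    man.close_rhs()
    try:
        man.handle_console_input("second line\n")
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
```

If that assumption holds, the exceptions would be a symptom of stdin still being forwarded after the receiving side has gone away, rather than the cause of the hang itself.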


Incidentally, using gcc and/or MPICH2 1.0.3 gives me the same problem.
After some unsuccessful googling I have run out of things to try, so any pointers would be greatly appreciated. Thanks if you've made it this far in the message!

 Xavier