[MPICH] Problem handling redirected stdin in MPICH2 1.0.4p1, F90, dual Opteron
Xavier Cartoixa Soler
Xavier.Cartoixa at uab.es
Wed Nov 8 14:16:14 CST 2006
Thanks, I changed the code to read from a file and it worked.
Xavier
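
For readers landing on this thread, here is a minimal sketch of the "open and read" approach suggested below, in C. It is not the actual change made to SIESTA (which is Fortran and is not shown in this thread); the function name read_input_file is hypothetical. The idea is only that the program opens its input by file name instead of relying on mpiexec to forward redirected stdin.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the whole file at `path` into a freshly
 * allocated, NUL-terminated buffer. Returns NULL on failure; the caller
 * frees the result. If `len_out` is non-NULL it receives the byte count. */
char *read_input_file(const char *path, size_t *len_out)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t n;
    /* Always leave one byte free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (cap - len <= 1) {            /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    int failed = ferror(fp);
    fclose(fp);
    if (failed) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

In an MPI program, one option would be for every rank to call such a helper on a file visible to all nodes; another would be for rank 0 alone to read the file and distribute its contents, for example with MPI_Bcast. Which of the two fits depends on the file system and the input size.
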
Ralph Butler wrote:
> The current implementation of mpiexec/mpd only supports low-volume,
> slow (e.g. via tty) stdin. This is because it provides options to route
> stdin to arbitrary subsets of ranks, and does not do its own buffering. There
> is no standard on this and implementations sometimes only permit
> stdin to go to rank 0, or to no rank at all.
> Other input should be obtained by open and read, or some parallel I/O
> operations.
>
> On Wed Nov 8, at 5:26 AM, Xavier Cartoixa Soler wrote:
>
>> Hi everyone,
>>
>> I am trying to run a parallel program (SIESTA, an electronic
>> structure code) compiled with the Intel compilers under MPICH2
>> 1.0.4p1, and I am facing serious difficulties that I think are related
>> to redirection of input. I am running a dual Opteron cluster under
>> Rocks 4.1 (clone of Red Hat Enterprise Linux 4.0).
>> My MPICH2 was configured with
>>
>> CC=icc CFLAGS=-O0 CXX=icc CXXFLAGS=-O0 F77=ifort FFLAGS="-O0 -assume
>> 2underscores" F90=ifort F90FLAGS="-O0 -assume 2underscores"
>> ./configure --prefix=/opt/mpich2-1.0.4p1/ch3_intel_eth/
>> --with-device=ch3:sock --enable-f90 --enable-cxx --disable-sharedlibs
>> --enable-timer-type=gettimeofday
>>
>> (the -O0 option to rule out problems caused by optimization). The mpd
>> daemon starts all right in one of the dual cpu nodes:
>>
>> [compute-0-0 ~]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdtrace -l
>> compute-0-0.local_33552 (10.255.255.254)
>>
>> [compute-0-0 ~]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdringtest 1000
>> time for 1000 loops = 0.160723924637 seconds
>>
>> and I can even run small parallel programs with redirected input:
>>
>> [compute-0-0 mpi_bug_test]$
>> /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec -n 2 ./example1 < inf.txt
>> Hello world from number 0 54
>> Hello world from number 1 1
>>
>> but when I go for the real deal, the siesta program hangs most of the
>> times I try:
>>
>> [xcs at hydra partest]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec -n
>> 2 ./siesta < emmental.fdf
>> Before MPI_Init...
>>
>> it just hangs at that point. The "Before MPI_Init..." is a write(0,*)
>> statement I have added before the MPI initialization block. Sometimes
>> the 2 cpus print the statement, sometimes only one does and
>> sometimes none of them prints it. Sometimes (one out of ~30) everything
>> works as it's supposed to. Hitting Ctrl+C and killing the siesta
>> program by hand, I receive the mpd error messages:
>>
>> hydra.uab.es_mpdman_0 (handle_console_input 1281): cannot send stdin
>> to client
>> [... many times ...]
>> hydra.uab.es_mpdman_0 (handle_console_input 1281): cannot send stdin
>> to client
>> hydra.uab.es_mpdman_0: mpd_uncaught_except_tb handling:
>> exceptions.AttributeError: 'int' object has no attribute
>> 'send_dict_msg'
>> /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdman.py 1270
>> handle_console_input
>> self.ring.rhsSock.send_dict_msg(msg)
>> /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdlib.py 527
>> handle_active_streams
>> handler(stream,*args)
>> plus traceback to mpd.run()
>>
>> it can also fail as
>>
>> hydra.uab.es_mpdman_0: mpd_uncaught_except_tb handling:
>> exceptions.AttributeError: 'int' object has no attribute
>> 'send_char_msg'
>> /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdman.py 538
>> handle_lhs_input
>> self.pmiSock.send_char_msg(pmiMsgToSend)
>> /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdlib.py 527
>> handle_active_streams
>> handler(stream,*args)
>> plus traceback to mpd.run()
>>
>>
>> Incidentally, using gcc and/or MPICH2 1.0.3 gives me the same problem.
>> After unsuccessful googling, now I have run out of things to try, so
>> any pointer would be extremely appreciated! Thanks if you've made it
>> thus far in the message!!
>>
>> Xavier
More information about the mpich-discuss mailing list