[MPICH] Problem handling redirected stdin in MPICH2 1.0.4p1, F90, dual Opteron

Ralph Butler rbutler at mtsu.edu
Wed Nov 8 07:12:49 CST 2006


The current implementation of mpiexec/mpd only supports low-volume,
slow (e.g., via a tty) stdin.  This is because it provides options to
route stdin to arbitrary subsets of ranks, and does not do its own
buffering.  The MPI standard says nothing about stdin handling, and
implementations sometimes only permit stdin to go to rank 0, or to no
rank at all.  Other input should be obtained with open and read, or
with parallel I/O operations.
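
A minimal F90 sketch of that approach (the file name, unit number, and
fixed line length below are placeholders, not anything taken from
SIESTA): rank 0 opens and reads the file, then broadcasts each line to
the other ranks, so no stdin redirection is needed:

    program read_and_bcast
      implicit none
      include 'mpif.h'
      integer :: ierr, rank, ios
      character(len=256) :: line

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

      ! Only rank 0 touches the input file; stdin is never used.
      if (rank == 0) open(unit=10, file='input.txt', status='old')

      do
         if (rank == 0) read(10, '(A)', iostat=ios) line
         ! Let every rank know whether rank 0 hit end-of-file.
         call MPI_BCAST(ios, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
         if (ios /= 0) exit
         ! Distribute the line itself.
         call MPI_BCAST(line, len(line), MPI_CHARACTER, 0, &
                        MPI_COMM_WORLD, ierr)
         ! ... every rank can now parse 'line' ...
      end do

      if (rank == 0) close(10)
      call MPI_FINALIZE(ierr)
    end program read_and_bcast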

On Wed, Nov 8, 2006, at 5:26 AM, Xavier Cartoixa Soler wrote:

>  Hi everyone,
>
>  I am trying to run a parallel program (SIESTA, an electronic  
> structure code) compiled with the Intel compilers under MPICH2  
> 1.0.4p1, and I am facing serious difficulties that I think are  
> related to redirection of input. I am running a dual Opteron  
> cluster under Rocks 4.1 (clone of Red Hat Enterprise Linux 4.0).
>  My MPICH2 was configured with
>
> CC=icc CFLAGS=-O0 CXX=icc CXXFLAGS=-O0 F77=ifort \
>   FFLAGS="-O0 -assume 2underscores" F90=ifort \
>   F90FLAGS="-O0 -assume 2underscores" \
>   ./configure --prefix=/opt/mpich2-1.0.4p1/ch3_intel_eth/ \
>   --with-device=ch3:sock --enable-f90 --enable-cxx \
>   --disable-sharedlibs --enable-timer-type=gettimeofday
>
> (the -O0 option is there to rule out problems caused by
> optimization). The mpd daemon starts fine on one of the dual-CPU
> nodes:
>
> [compute-0-0 ~]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdtrace -l
> compute-0-0.local_33552 (10.255.255.254)
>
> [compute-0-0 ~]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdringtest 1000
> time for 1000 loops = 0.160723924637 seconds
>
> and I can even run small parallel programs with redirected input:
>
> [compute-0-0 mpi_bug_test]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec -n 2 ./example1 < inf.txt
>  Hello world from number            0          54
>  Hello world from number            1           1
>
> but when I go for the real deal, the siesta program hangs on most
> attempts:
>
> [xcs@hydra partest]$ /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpiexec -n 2 ./siesta < emmental.fdf
>  Before MPI_Init...
>
> it just hangs at that point. The "Before MPI_Init..." comes from a
> write (0,*) statement I added before the MPI initialization block.
> Sometimes both CPUs print the statement, sometimes only one does,
> and sometimes neither prints it. Occasionally (about one run in ~30)
> everything works as it's supposed to. After hitting Ctrl+C and
> killing the siesta program by hand, I get these mpd error messages:
>
> hydra.uab.es_mpdman_0 (handle_console_input 1281): cannot send stdin to client
> [... many times ...]
> hydra.uab.es_mpdman_0 (handle_console_input 1281): cannot send stdin to client
> hydra.uab.es_mpdman_0: mpd_uncaught_except_tb handling:
>   exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
>     /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdman.py  1270  handle_console_input
>         self.ring.rhsSock.send_dict_msg(msg)
>     /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdlib.py  527  handle_active_streams
>         handler(stream,*args)
> plus traceback to mpd.run()
>
> it can also fail as
>
> hydra.uab.es_mpdman_0: mpd_uncaught_except_tb handling:
>   exceptions.AttributeError: 'int' object has no attribute 'send_char_msg'
>     /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdman.py  538  handle_lhs_input
>         self.pmiSock.send_char_msg(pmiMsgToSend)
>     /opt/mpich2-1.0.4p1/ch3_intel_eth/bin/mpdlib.py  527  handle_active_streams
>         handler(stream,*args)
> plus traceback to mpd.run()
>
>
> Incidentally, using gcc and/or MPICH2 1.0.3 gives me the same problem.
> After unsuccessful googling, I have now run out of things to try,
> so any pointer would be greatly appreciated! Thanks if you've made
> it this far into the message!!
>
>  Xavier



