[mpich-discuss] Problem with tcsh and ppn >= 5

Anthony Chan chan at mcs.anl.gov
Tue Jun 14 15:07:37 CDT 2011


What version of csh are you using? Pavan told me once that
some buggy versions of csh/tcsh prevent hydra from working
correctly. On my Fedora 14 (F14) system, everything works fine
with tcsh/csh, but on a system with a buggy csh/tcsh I got
the following.


A.Chan

/homes/chan> csh --version
tcsh 6.17.00 (Astron) 2009-07-10 (x86_64-unknown-linux) options wide,nls,dl,al,kan,rh,nd,color,filec
/homes/chan> tcsh --version
tcsh 6.17.00 (Astron) 2009-07-10 (x86_64-unknown-linux) options wide,nls,dl,al,kan,rh,nd,color,filec
/homes/chan> /disk/chan/mpich2_work/install/bin/mpiexec -n 1 -ppn 5 /bin/csh -c /disk/chan/mpich2_work/build/examples/cpi
[cli_0]: write_line error; fd=6 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=6 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(388): 
MPID_Init(107).......: channel initialization failed
MPID_Init(389).......: PMI_Get_appnum returned -1
/homes/chan> /disk/chan/mpich2_work/install/bin/mpiexec -n 1 -ppn 5 /bin/tcsh -c /disk/chan/mpich2_work/build/examples/cpi
[cli_0]: write_line error; fd=6 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=6 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(388): 
MPID_Init(107).......: channel initialization failed
MPID_Init(389).......: PMI_Get_appnum returned -1
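
One way to narrow this down is to check, from inside the launched program, whether the descriptor the PMI client writes to is still open after csh/tcsh has started it. The script below is a hypothetical diagnostic (`check_pmi_fd.sh` is not part of MPICH). It assumes that Hydra hands each process the number of its PMI descriptor (the `fd=6` in the errors above) through the `PMI_FD` environment variable, so verify that on your build. It is also Linux-specific because it reads `/proc`.

```shell
#!/bin/sh
# check_pmi_fd.sh: hypothetical diagnostic, not part of MPICH.
# Assumption: the PMI client learns its descriptor number from the
# PMI_FD environment variable. Linux-specific: relies on /proc.

# Report whether the descriptor named by PMI_FD is open in this shell.
# Returns 0 if open, 1 if closed, 2 if PMI_FD is unset or empty.
pmi_fd_status() {
    if [ -z "${PMI_FD:-}" ]; then
        echo "PMI_FD is not set"
        return 2
    fi
    # Each open descriptor of a process appears as a symlink under
    # /proc/<pid>/fd; a missing entry means the descriptor is closed.
    if [ -L "/proc/$$/fd/$PMI_FD" ]; then
        echo "PMI_FD=$PMI_FD is open:"
        # Show what the descriptor refers to, since a reused number
        # can point at an unrelated file instead of the PMI connection.
        ls -l "/proc/$$/fd/$PMI_FD"
        return 0
    fi
    echo "PMI_FD=$PMI_FD is closed"
    return 1
}

# The script's exit status is the result of the check.
pmi_fd_status
```

Running it both ways, e.g. `mpiexec -n 1 -ppn 5 /bin/tcsh -c /path/to/check_pmi_fd.sh` and the same command with `/bin/sh -c`, should show whether the shell closed or reused the descriptor before the program started. "Open" only means that some descriptor exists at that number; the `ls -l` line shows what it actually points to.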


----- Original Message -----
> > -----Original Message-----
> > From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-
> > bounces at mcs.anl.gov] On Behalf Of Frank Riley
> > Sent: Tuesday, June 14, 2011 12:20 PM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [mpich-discuss] Problem with tcsh and ppn >= 5
> >
> > Hello,
> >
> > We are having a problem running more than 4 processes per node when
> > using the tcsh shell. Has anyone seen this? Here is a simple test
> > case:
> >
> > mpiexec -n 1 -ppn 5 /bin/csh -c /path/to/a.out
> >
> > where a.out is a simple C test executable that just calls MPI_Init and
> > MPI_Finalize. The error is as follows:
> >
> > [cli_3]: write_line error; fd=18 buf=:cmd=init pmi_version=1 pmi_subversion=1
> > system message for write_line failure : Bad file descriptor
> >
> > Note that the following command (bash shell) works fine:
> >
> > mpiexec -n 1 -ppn 5 /bin/sh -c /path/to/a.out
> >
> > Our mpich2 is version 1.3.2p1 and is built with the following flags:
> >
> > --enable-fast --enable-romio --enable-debuginfo --enable-smpcoll
> > --enable-mpe --enable-threads=runtime --enable-shared --with-mpe
> 
> I forgot to mention that we do not see the failure on our cluster that
> has nodes with 2 cores each. It only fails on our clusters that have
> nodes with 8 cores each.
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

