[MPICH] error information
Rajeev Thakur
thakur at mcs.anl.gov
Wed May 10 20:22:42 CDT 2006
You should be able to use MPICH-GM on jazz with the gcc compiler. You might
need to specify the right field in your .soft environment. See
http://www.lcrc.anl.gov/faq/cache/54.html for an example.
Rajeev
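For readers hitting the same issue: on SoftEnv-managed clusters such as jazz, the MPI build is typically selected by a key in your ~/.soft file. The key name below is a made-up placeholder for illustration only (list the real keys available on your system with the softenv command); see the FAQ link above for the jazz-specific details.

```
# ~/.soft -- sketch only; "+mpich-gm-gcc" is a hypothetical key name.
# Run `softenv` to list the actual keys on your system, put the MPI
# selection before @default, then run `resoft` to apply the change.
+mpich-gm-gcc
@default
```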
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Yusong Wang
> Sent: Wednesday, May 10, 2006 6:42 PM
> To: Rusty Lusk
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] error information
>
> I may need to wait a few days before I can run it under MPICH2. I
> was able to run the program from the command line in an MPICH2
> environment on our cluster. Our system administrator was
> trying to integrate MPICH2 with Sun Grid Engine, but got stuck on
> the use of smpd. Right now, I can't run the program with
> MPICH2 during the update. It seems to me there is no gcc-based
> MPICH2 available on Jazz, and our code can only be
> compiled with the gcc compiler.
>
> The problem comes from a regression test of 100 cases. If I
> run them one by one (with some break time between runs),
> I don't see this problem. It seems to me that some
> operations have not completed, even though the previous run exited
> normally.
>
> Thanks,
>
> Yusong
>
> ----- Original Message -----
> From: Rusty Lusk <lusk at mcs.anl.gov>
> Date: Wednesday, May 10, 2006 4:34 pm
> Subject: Re: [MPICH] error information
>
> > You are using a very old version of MPICH. Can you use MPICH2?
> > It might give you better information on termination.
> >
> > Regards,
> > Rusty Lusk
> >
> > From: Yusong Wang <ywang25 at aps.anl.gov>
> > Subject: [MPICH] error information
> > Date: Wed, 10 May 2006 16:27:13 -0500
> >
> > > Hi,
> > >
> > > I repeated the same test several times on Jazz. Most times it
> > > works fine; occasionally (1 out of 5 runs) I got the following errors:
> > >
> > > /soft/apps/packages/mpich-p4-1.2.6-gcc-3.2.3-1/bin/mpirun: line 1: 24600 Broken pipe  /home/ywang/oag/apps/bin/linux-x86/Pelegant "run.ele" -p4pg /home/ywang/elegantRuns/script3/PI24473 -p4wd /home/ywang/elegantRuns/script3
> > > p4_error: latest msg from perror: Bad file descriptor
> > > rm_l_2_16806: (1.024331) net_send: could not write to fd=6, errno = 9
> > > rm_l_2_16806: p4_error: net_send write: -1
> > > Broken pipe
> > > length of beamline PAR per pass: 3.066670000001400e+01 m
> > > statistics: ET: 00:00:01 CP: 0.09 BIO:0 DIO:0 PF:0 MEM:0
> > > p3_15201: p4_error: net_recv read: probable EOF on socket: 1
> > > Broken pipe
> > >
> > > I can't find the cause of this problem. The same thing happened on
> > > another cluster. The TotalView debugger didn't give me much useful
> > > information; the surviving processes were just stuck at an
> > > MPI_Barrier call.
> > >
> > > Can someone give me a hint for fixing the problem based on the
> > > error information given above?
> > >
> > > The working directory is:
> > > /home/ywang/elegantRuns/script3/
> > > The command I used:
> > > mpirun -np 4 -machinefile $PBS_NODEFILE /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele
> > >
> > > Thanks in advance,
> > >
> > > Yusong Wang
> > >
> >
> >
>
>