[MPICH] error information
Anthony Chan
chan at mcs.anl.gov
Wed May 10 22:02:44 CDT 2006
Jazz has mpich2 prebuilt with pgi and intel compilers.
ls -d /soft/apps/packages/mpich2*
/soft/apps/packages/mpich2-1.0.1-intel-8.1/
/soft/apps/packages/mpich2-gm-1.0.1-pgi-5.2/
/soft/apps/packages/mpich2-ip-1.0.1-pgi-5.2/
You could build mpich2 with gcc yourself. If you need to use myrinet,
you can build mpich2 with gasnet support.
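(A rough sketch of a plain gcc build, assuming the mpich2-1.0.1 source tarball
and an install prefix of your choosing; the options for a gasnet/Myrinet build
depend on the mpich2 version, so check the README in the source tree:)

  tar xzf mpich2-1.0.1.tar.gz
  cd mpich2-1.0.1
  CC=gcc CXX=g++ F77=g77 ./configure --prefix=$HOME/soft/mpich2-1.0.1-gcc
  make
  make install

Then put $HOME/soft/mpich2-1.0.1-gcc/bin at the front of your PATH.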
A.Chan
On Wed, 10 May 2006, Wei-keng Liao wrote:
>
> I was told by Jazz system support last week that the MPICH compiled with
> gcc and GM is in /soft/apps/packages/mpich-gm-1.2.6..13b-gcc-3.2.3-1/bin
>
> : Just add the softenv key "@all-mpich_gm-gcc3.2.3" before the "@default"
> : line in your ~/.soft file (and then type "resoft" or logout and back
> : into jazz), this will place the gcc MPI environment in your path ahead of
> : the system default intel one.
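(For illustration, a ~/.soft edited as described above would look roughly like
this, with the gcc key ahead of @default; check "softenv" on jazz for the
exact key name:

  @all-mpich_gm-gcc3.2.3
  @default

then run "resoft".)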
>
> Wei-keng
>
>
> On Wed, 10 May 2006, Rajeev Thakur wrote:
>
> > You should be able to use MPICH-GM on jazz with the gcc compiler. You might
> > need to specify the right field in your .soft environment. See
> > http://www.lcrc.anl.gov/faq/cache/54.html for example.
> >
> > Rajeev
> >
> >> -----Original Message-----
> >> From: owner-mpich-discuss at mcs.anl.gov
> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Yusong Wang
> >> Sent: Wednesday, May 10, 2006 6:42 PM
> >> To: Rusty Lusk
> >> Cc: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [MPICH] error information
> >>
> >> I may need to wait some days before I can run it under MPICH2. I
> >> was able to run the program from the command line in an MPICH2
> >> environment on our cluster. Our system administrator was
> >> trying to integrate MPICH2 with Sun Grid Engine, but got stuck on
> >> the use of smpd. Right now, I can't run the program with
> >> MPICH2 during the update. It also seems to me that there is no
> >> gcc-based MPICH2 available on Jazz, and our code can only be
> >> compiled with the gcc compiler.
> >>
> >> The problem comes from a regression test of 100 cases. If I
> >> run them one by one (with some break time between each run),
> >> I do not see this problem. It seems that some operations have
> >> not finished even though the previous run quit normally.
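(A minimal sketch of the one-by-one-with-a-pause approach described above,
with made-up case*.ele input names for illustration:)

  for ele in case*.ele; do
      mpirun -np 4 -machinefile $PBS_NODEFILE ./Pelegant $ele
      sleep 10    # give the previous run's p4 processes time to exit cleanly
  done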
> >>
> >> Thanks,
> >>
> >> Yusong
> >>
> >> ----- Original Message -----
> >> From: Rusty Lusk <lusk at mcs.anl.gov>
> >> Date: Wednesday, May 10, 2006 4:34 pm
> >> Subject: Re: [MPICH] error information
> >>
> >>> You are using a very old version of MPICH. Can you use MPICH2?
> >>> It might give you better information on termination.
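(A minimal sketch of the equivalent launch under MPICH2's default mpd process
manager, assuming a hypothetical mpd.hosts file derived from $PBS_NODEFILE
with one line per host and duplicates removed:)

  mpdboot -n 4 -f mpd.hosts
  mpiexec -n 4 /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele
  mpdallexit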
> >>>
> >>> Regards,
> >>> Rusty Lusk
> >>>
> >>> From: Yusong Wang <ywang25 at aps.anl.gov>
> >>> Subject: [MPICH] error information
> >>> Date: Wed, 10 May 2006 16:27:13 -0500
> >>>
> >>>> Hi,
> >>>>
> >>>> I repeated the same test several times on Jazz. Most of the time it
> >>>> works fine; occasionally (1 out of 5 runs) I get the following errors:
> >>>>
> >>>> /soft/apps/packages/mpich-p4-1.2.6-gcc-3.2.3-1/bin/mpirun: line 1: 24600 Broken pipe  /home/ywang/oag/apps/bin/linux-x86/Pelegant "run.ele" -p4pg /home/ywang/elegantRuns/script3/PI24473 -p4wd /home/ywang/elegantRuns/script3
> >>>> p4_error: latest msg from perror: Bad file descriptor
> >>>> rm_l_2_16806: (1.024331) net_send: could not write to fd=6, errno = 9
> >>>> rm_l_2_16806: p4_error: net_send write: -1
> >>>> Broken pipe
> >>>> length of beamline PAR per pass: 3.066670000001400e+01 m
> >>>> statistics: ET: 00:00:01 CP: 0.09 BIO:0 DIO:0 PF:0 MEM:0
> >>>> p3_15201: p4_error: net_recv read: probable EOF on socket: 1
> >>>> Broken pipe
> >>>>
> >>>> I can't find the reason for this problem. The same thing happened on
> >>>> another cluster. The TotalView debugger didn't give me much useful
> >>>> information; the surviving processes are just stuck at an MPI_Barrier
> >>>> call.
> >>>>
> >>>> Can someone give me a hint on how to fix the problem based on the
> >>>> error information given above?
> >>>>
> >>>> The working directory is:
> >>>> /home/ywang/elegantRuns/script3/
> >>>> The command I used:
> >>>> mpirun -np 4 -machinefile $PBS_NODEFILE /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> Yusong Wang
> >>>>
> >>>
> >>>
> >>
> >>
> >
>
>