[MPICH] error information

Wei-keng Liao wkliao at ece.northwestern.edu
Wed May 10 21:34:42 CDT 2006


I was told by Jazz system support last week that the MPICH compiled with 
gcc and GM is in /soft/apps/packages/mpich-gm-1.2.6..13b-gcc-3.2.3-1/bin

: Just add the softenv key "@all-mpich_gm-gcc3.2.3" before the "@default"
: line in your ~/.soft file (and then type "resoft" or log out and back
: into jazz); this will place the gcc MPI environment in your path ahead of
: the system default intel one.
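
For reference, the resulting ~/.soft would look something like this (only
the ordering of the two keys matters; the comment is mine):

    # ~/.soft -- keys listed earlier take precedence in the path
    @all-mpich_gm-gcc3.2.3
    @default

then run "resoft" (or log out and back in) so the change takes effect.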

Wei-keng


On Wed, 10 May 2006, Rajeev Thakur wrote:

> You should be able to use MPICH-GM on jazz with the gcc compiler. You might
> need to specify the right field in your .soft environment. See
> http://www.lcrc.anl.gov/faq/cache/54.html for example.
>
> Rajeev
>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Yusong Wang
>> Sent: Wednesday, May 10, 2006 6:42 PM
>> To: Rusty Lusk
>> Cc: mpich-discuss at mcs.anl.gov
>> Subject: Re: [MPICH] error information
>>
>> I may need to wait some days before I can run it under MPICH2. I
>> was able to run the program from the command line under MPICH2 on
>> our cluster. Our system administrator was trying to integrate
>> MPICH2 with Sun Grid Engine, but got stuck on the use of smpd.
>> Right now, I can't run the program with MPICH2 during the update.
>> It seems to me there is no gcc-based MPICH2 available on Jazz, and
>> our code can only be compiled with the gcc compiler.
>>
>> The problem comes from a regression test of 100 cases. If I
>> run them one by one (with some break time between each run),
>> I would not expect this problem. It seems that some
>> operations have not completed, even though the previous run
>> quit normally.
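
(A side note on the back-to-back runs: if the 100 cases are driven by a
shell loop, a short pause between runs -- something like the sketch below,
where the file glob and the sleep interval are only placeholders -- gives
each mpirun time to tear down its sockets before the next one starts.)

    #!/bin/sh
    # hypothetical driver for the regression cases; adjust glob and delay
    for case in run*.ele; do
        mpirun -np 4 -machinefile $PBS_NODEFILE \
            /home/ywang/oag/apps/bin/linux-x86/Pelegant "$case"
        sleep 5    # give the previous run's p4 processes time to exit
    done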
>>
>> Thanks,
>>
>> Yusong
>>
>> ----- Original Message -----
>> From: Rusty Lusk <lusk at mcs.anl.gov>
>> Date: Wednesday, May 10, 2006 4:34 pm
>> Subject: Re: [MPICH] error information
>>
>>> You are using a very old version of MPICH.  Can you use MPICH2?
>>> It might give you better information on termination.
>>>
>>> Regards,
>>> Rusty Lusk
>>>
>>> From: Yusong Wang <ywang25 at aps.anl.gov>
>>> Subject: [MPICH] error information
>>> Date: Wed, 10 May 2006 16:27:13 -0500
>>>
>>>> Hi,
>>>>
>>>> I repeated the same test several times on Jazz. Most of the time
>>>> it works fine; occasionally (1 out of 5 runs) I get the following
>>>> errors:
>>>>
>>>> /soft/apps/packages/mpich-p4-1.2.6-gcc-3.2.3-1/bin/mpirun: line 1:
>>>> 24600 Broken pipe   /home/ywang/oag/apps/bin/linux-x86/Pelegant
>>>> "run.ele" -p4pg /home/ywang/elegantRuns/script3/PI24473
>>>> -p4wd /home/ywang/elegantRuns/script3
>>>>     p4_error: latest msg from perror: Bad file descriptor
>>>> rm_l_2_16806: (1.024331) net_send: could not write to fd=6, errno = 9
>>>> rm_l_2_16806:  p4_error: net_send write: -1
>>>> Broken pipe
>>>> length of beamline PAR per pass: 3.066670000001400e+01 m
>>>> statistics:    ET:     00:00:01 CP:    0.09 BIO:0 DIO:0 PF:0 MEM:0
>>>> p3_15201:  p4_error: net_recv read:  probable EOF on socket: 1
>>>> Broken pipe
>>>>
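
(For what it's worth, the -p4pg argument in that output names a ch_p4
procgroup file that mpirun writes on the fly. As far as I recall, the
format is the one sketched below -- the host names here are made up --
with the first line describing the local host and 0 extra processes:)

    # illustrative ch_p4 procgroup file (hypothetical hosts)
    jazz-node01 0 /home/ywang/oag/apps/bin/linux-x86/Pelegant
    jazz-node02 1 /home/ywang/oag/apps/bin/linux-x86/Pelegant
    jazz-node03 1 /home/ywang/oag/apps/bin/linux-x86/Pelegant
    jazz-node04 1 /home/ywang/oag/apps/bin/linux-x86/Pelegant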
>>>> I can't find the cause of this problem. The same thing happened
>>>> on another cluster. The TotalView debugger didn't give me much
>>>> useful information; the surviving processes were just stuck at
>>>> an MPI_Barrier call.
>>>>
>>>> Can someone give me a hint on how to fix the problem based on
>>>> the error information given above?
>>>>
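
(That "survivors stuck in MPI_Barrier" picture is what you would expect
whenever one rank dies without reaching the barrier; the contrived C
sketch below -- unrelated to Pelegant itself -- reproduces the same
symptom:)

    /* Contrived illustration: rank 0 exits before the barrier, so the
     * remaining ranks can sit in MPI_Barrier indefinitely, which is the
     * picture TotalView shows when a process dies mid-run. */
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            exit(1);                      /* simulate a crashed process */
        MPI_Barrier(MPI_COMM_WORLD);      /* survivors may never return */
        MPI_Finalize();
        return 0;
    }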
>>>> The working directory is:
>>>>   /home/ywang/elegantRuns/script3/
>>>> The command I used:
>>>>   mpirun -np 4 -machinefile $PBS_NODEFILE \
>>>>       /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele
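
(If these cases go through PBS anyway, a wrapper script along these
lines -- the walltime and node layout are just placeholders -- keeps the
machinefile handling inside the batch job:)

    #!/bin/sh
    #PBS -l nodes=4:ppn=1
    #PBS -l walltime=00:30:00
    # hypothetical PBS wrapper for the command quoted above
    cd /home/ywang/elegantRuns/script3
    mpirun -np 4 -machinefile $PBS_NODEFILE \
        /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele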
>>>>
>>>> Thanks in advance,
>>>>
>>>> Yusong Wang
>>>>
>>>
>>>
>>
>>
>



