[MPICH] error information

Rusty Lusk lusk at mcs.anl.gov
Wed May 10 16:34:11 CDT 2006


You are using a very old version of MPICH.  Can you use MPICH2?
It might give you better information on termination.
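
With MPICH2 the launch would look roughly like this (a sketch, assuming
the mpd-based process manager that ships with MPICH2, and reusing the
executable and input file from the quoted message below):

  mpdboot -n 4 -f $PBS_NODEFILE
  mpiexec -n 4 /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele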

Regards,
Rusty Lusk

From: Yusong Wang <ywang25 at aps.anl.gov>
Subject: [MPICH] error information
Date: Wed, 10 May 2006 16:27:13 -0500

> Hi,
> 
> I repeated the same test several times on Jazz. Most of the time it
> works fine, but occasionally (about 1 run out of 5) I get the
> following errors:
> 
> /soft/apps/packages/mpich-p4-1.2.6-gcc-3.2.3-1/bin/mpirun: line 1: 24600
> Broken pipe             /home/ywang/oag/apps/bin/linux-x86/Pelegant
> "run.ele" -p4pg /home/ywang/elegantRuns/script3/PI24473 -
> p4wd /home/ywang/elegantRuns/script3
>     p4_error: latest msg from perror: Bad file descriptor
> rm_l_2_16806: (1.024331) net_send: could not write to fd=6, errno = 9
> rm_l_2_16806:  p4_error: net_send write: -1
> Broken pipe
> length of beamline PAR per pass: 3.066670000001400e+01 m
> statistics:    ET:     00:00:01 CP:    0.09 BIO:0 DIO:0 PF:0 MEM:0
> p3_15201:  p4_error: net_recv read:  probable EOF on socket: 1
> Broken pipe
> 
> I can't find the cause of this problem. The same thing happened on
> another cluster. The TotalView debugger didn't give me much useful
> information. The surviving processes were just stuck at an MPI_Barrier
> call (see the sketch after this message).
> 
> Can someone give me a hint on how to fix the problem, based on the
> error information above?
> 
> The working directory is:
>  /home/ywang/elegantRuns/script3/
> The command I used:
> mpirun -np 4 -machinefile $PBS_NODEFILE /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele
> 
> Thanks in advance,
> 
> Yusong Wang
> 
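For the hang at MPI_Barrier mentioned above, one way to surface more
detail is to switch MPI_COMM_WORLD from the default MPI_ERRORS_ARE_FATAL
handler to MPI_ERRORS_RETURN and print the error string yourself. A
minimal sketch, using the MPI-1 call that matches the mpich-1.2.6 era
(later MPI versions prefer MPI_Comm_set_errhandler):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char msg[MPI_MAX_ERROR_STRING];
    int rank, err, len;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Return error codes to the caller instead of aborting the job. */
    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    err = MPI_Barrier(MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &len);
        fprintf(stderr, "rank %d: MPI_Barrier failed: %s\n", rank, msg);
    }

    MPI_Finalize();
    return 0;
}

Note that this only helps when the collective actually returns an
error; if a peer dies and the call simply blocks, attaching a debugger
to the surviving processes (as was done with TotalView above) remains
the fallback.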



