[MPICH] error information

Yusong Wang ywang25 at aps.anl.gov
Wed May 10 16:27:13 CDT 2006


Hi,

I ran the same test several times on Jazz. Most of the time it works fine,
but occasionally (1 out of 5 runs) I get the following errors:

/soft/apps/packages/mpich-p4-1.2.6-gcc-3.2.3-1/bin/mpirun: line 1: 24600 Broken pipe             /home/ywang/oag/apps/bin/linux-x86/Pelegant "run.ele" -p4pg /home/ywang/elegantRuns/script3/PI24473 -p4wd /home/ywang/elegantRuns/script3
    p4_error: latest msg from perror: Bad file descriptor
rm_l_2_16806: (1.024331) net_send: could not write to fd=6, errno = 9
rm_l_2_16806:  p4_error: net_send write: -1
Broken pipe
length of beamline PAR per pass: 3.066670000001400e+01 m
statistics:    ET:     00:00:01 CP:    0.09 BIO:0 DIO:0 PF:0 MEM:0
p3_15201:  p4_error: net_recv read:  probable EOF on socket: 1
Broken pipe

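For reference, the "errno = 9" in the net_send line is the same condition that perror reports as "Bad file descriptor" (EBADF on Linux): fd 6 was no longer a valid open descriptor when p4 tried to write to it. Below is a small sketch for decoding such values from the log. The function name decode_errno is hypothetical, and it assumes the script runs on Linux, because errno numbering is platform-specific.

```python
import errno
import os
import re

# Matches the "errno = N" field printed by p4's net_send in the log above.
ERRNO_RE = re.compile(r"errno\s*=\s*(\d+)")


def decode_errno(line):
    """Return (symbolic name, message) for the errno in a log line, or None.

    Hypothetical helper; assumes the log was produced on the same platform
    (Linux) that this script runs on, since errno numbers are not portable.
    """
    match = ERRNO_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


line = "rm_l_2_16806: (1.024331) net_send: could not write to fd=6, errno = 9"
print(decode_errno(line))
```
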
I can't find the cause of this problem. The same thing happened on
another cluster. The TotalView debugger didn't give me much useful
information. The surviving processes just hang at an MPI_Barrier
call.

Can someone give me a hint on how to fix the problem, based on the
error information above?

The working directory is:
 /home/ywang/elegantRuns/script3/
The command I used:
mpirun -np 4 -machinefile $PBS_NODEFILE /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele

Thanks in advance,

Yusong Wang

