[MPICH] error information
Yusong Wang
ywang25 at aps.anl.gov
Wed May 10 16:27:13 CDT 2006
Hi,
I repeated the same test several times on Jazz. Most of the time it works
fine, but occasionally (about 1 run out of 5) I get the following errors:
/soft/apps/packages/mpich-p4-1.2.6-gcc-3.2.3-1/bin/mpirun: line 1: 24600 Broken pipe /home/ywang/oag/apps/bin/linux-x86/Pelegant "run.ele" -p4pg /home/ywang/elegantRuns/script3/PI24473 -p4wd /home/ywang/elegantRuns/script3
p4_error: latest msg from perror: Bad file descriptor
rm_l_2_16806: (1.024331) net_send: could not write to fd=6, errno = 9
rm_l_2_16806: p4_error: net_send write: -1
Broken pipe
length of beamline PAR per pass: 3.066670000001400e+01 m
statistics: ET: 00:00:01 CP: 0.09 BIO:0 DIO:0 PF:0 MEM:0
p3_15201: p4_error: net_recv read: probable EOF on socket: 1
Broken pipe
I can't find the cause of this problem. The same thing happened on
another cluster. The TotalView debugger didn't give me much useful
information: the surviving processes were just stuck at an MPI_Barrier
call.
Can someone give me a hint on how to fix the problem, based on the
error information given above?
The working directory is:
/home/ywang/elegantRuns/script3/
The command I used:
mpirun -np 4 -machinefile $PBS_NODEFILE /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele
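Since the failure is intermittent, it may help to script the repeated runs and keep a separate log for each one. The following is only a minimal sketch, not the procedure actually used for the runs above: the function name repeat_run and the log file names run_N.log are made up for illustration. A run is counted as failed if the command returns a nonzero exit status or if its output contains the string "p4_error". Whether this mpirun returns a nonzero status when a process dies is an assumption that has not been checked here, which is why the log is searched as well.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times and report how many runs failed.
# Usage: repeat_run COUNT COMMAND [ARGS...]
repeat_run() {
    count=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # One log per run; the name run_N.log is an arbitrary choice.
        log="run_$i.log"
        # A run fails if the command exits nonzero OR the log mentions p4_error.
        if ! "$@" > "$log" 2>&1 || grep -q "p4_error" "$log"; then
            failures=$((failures + 1))
            echo "run $i failed, see $log"
        fi
        i=$((i + 1))
    done
    echo "$failures of $count runs failed"
}
```

With the function defined, the test above could be repeated five times like this:

repeat_run 5 mpirun -np 4 -machinefile "$PBS_NODEFILE" /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele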
Thanks in advance,
Yusong Wang