[MPICH] error information
Rusty Lusk
lusk at mcs.anl.gov
Wed May 10 16:34:11 CDT 2006
You are using a very old version of MPICH. Can you use MPICH2?
It might give you better information on termination.
Regards,
Rusty Lusk
From: Yusong Wang <ywang25 at aps.anl.gov>
Subject: [MPICH] error information
Date: Wed, 10 May 2006 16:27:13 -0500
> Hi,
>
> I repeated the same test several times on Jazz. Most of the time it
> works fine, but occasionally (about 1 out of 5 runs) I get the following
> errors:
>
> /soft/apps/packages/mpich-p4-1.2.6-gcc-3.2.3-1/bin/mpirun: line 1: 24600
> Broken pipe /home/ywang/oag/apps/bin/linux-x86/Pelegant
> "run.ele" -p4pg /home/ywang/elegantRuns/script3/PI24473 -
> p4wd /home/ywang/elegantRuns/script3
> p4_error: latest msg from perror: Bad file descriptor
> rm_l_2_16806: (1.024331) net_send: could not write to fd=6, errno = 9
> rm_l_2_16806: p4_error: net_send write: -1
> Broken pipe
> length of beamline PAR per pass: 3.066670000001400e+01 m
> statistics: ET: 00:00:01 CP: 0.09 BIO:0 DIO:0 PF:0 MEM:0
> p3_15201: p4_error: net_recv read: probable EOF on socket: 1
> Broken pipe
>
> I can't find the reason for this problem. The same thing happened on
> another cluster. The TotalView debugger didn't give me much useful
> information. The surviving processes were just stuck in an MPI_Barrier
> call.
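
[A minimal C/MPI sketch of that hang pattern, for illustration only; it is
not code from Pelegant or the original run. It assumes one rank exits
before reaching the collective, which can leave the remaining ranks
blocked in MPI_Barrier, as described above.]

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Simulate one process dying before the collective call
       (e.g. a crash, or a broken pipe on its control socket). */
    if (rank == 0) {
        fprintf(stderr, "rank 0: exiting before the barrier\n");
        exit(1);   /* bypasses MPI_Finalize, like an abnormal exit */
    }

    /* Depending on the MPI implementation, the surviving ranks
       may block here until the job is killed. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("rank %d: past the barrier\n", rank);
    MPI_Finalize();
    return 0;
}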
>
> Can someone give me a hint on how to fix the problem, based on the
> error information given above?
>
> The working directory is:
> /home/ywang/elegantRuns/script3/
> The command I used:
> mpirun -np 4 -machinefile $PBS_NODEFILE /home/ywang/oag/apps/bin/linux-x86/Pelegant run.ele
>
> Thanks in advance,
>
> Yusong Wang
>