[MPICH] Explanation of error message in MPICH-1.2.7

Yusong Wang ywang25 at aps.anl.gov
Fri Oct 6 17:33:19 CDT 2006


Can you log on to one of the slave nodes and type:

ps -fed

You may find some leftover MPI processes there.
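
On a busy node the full listing can be long, so here is a rough sketch of one way to narrow it down to the processes of a single user whose command line mentions the MPI program. The function name leftover_procs, the user name someuser and the program name cfd_solver are placeholders made up for illustration. The column layout is assumed to be the Linux procps "-f" format (UID PID PPID C STIME TTY TIME CMD); other systems may print different columns.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH): reads "ps -fed"-style output on
# stdin and prints "PID command line" for every process owned by a given
# user whose command line contains a given substring.
# Assumes the procps -f column order: UID PID PPID C STIME TTY TIME CMD,
# so the command line starts at field 8. Long user names may be shown as
# numeric UIDs by ps, in which case pass the number instead of the name.
leftover_procs() {
    user=$1
    pattern=$2
    awk -v u="$user" -v p="$pattern" '
        NR > 1 && $1 == u {
            # Rebuild the command line from field 8 onward.
            cmd = ""
            for (i = 8; i <= NF; i++) cmd = cmd (i > 8 ? " " : "") $i
            # Plain substring match, no regular expressions.
            if (index(cmd, p) > 0) print $2, cmd
        }'
}

# Usage on a compute node (user and program names are placeholders):
#   ps -fed | leftover_procs someuser cfd_solver
```

Any PIDs it prints after the job has already exited are candidates for stale processes from an earlier run.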

If that is the case, it would be better to upgrade to MPICH2, which doesn't
have this problem.

Yusong

On Fri, 2006-10-06 at 15:47 -0400, Jeffrey B. Layton wrote:
> Afternoon cluster fans,
> 
>    I'm working with a CFD code using the PGI 6.1 compilers and
> MPICH-1.2.7. The code runs fine for a while but I get an error
> message that I've never seen before:
> 
> 
> [2] MPI Internal Aborting program Deep nest in Check_incoming
> [2] Deep nest in Check_incoming
> 
> This error message is in the error file from PBS. The output from
> the code gives the following:
> 
> 
> p2_15458:  p4_error: : 1
> p5_21530:  p4_error: net_recv read:  probable EOF on socket: 1
> p7_21548:  p4_error: net_recv read:  probable EOF on socket: 1
> p6_21539:  p4_error: net_recv read:  probable EOF on socket: 1
> rm_l_6_21544: (95.492188) net_send: could not write to fd=5, errno = 32
> rm_l_2_15464: (95.835938) net_send: could not write to fd=5, errno = 32
> rm_l_5_21535: (95.574219) net_send: could not write to fd=5, errno = 32
> rm_l_7_21553: (95.410156) net_send: could not write to fd=5, errno = 32
> 
> 
>    The code runs fine with other MPI implementations (Scali,
> MVAPICH, etc.). My googling efforts haven't yielded anything.
> Does anyone have any input on this?
> 
> Thanks!
> 
> Jeff
> 



