Hi,<br>
<br>
I have an executable compiled with mpich-1.2.7p1. It works fine when run
on one server (node), but it gives the following error when I try to run it
across multiple nodes.<br>
<br>
<br>
p15_4211: p4_error: net_recv read: probable EOF on socket: 1<br>
p14_4184: p4_error: net_recv read: probable EOF on socket: 1<br>
p13_4157: p4_error: net_recv read: probable EOF on socket: 1<br>
p12_4130: p4_error: net_recv read: probable EOF on socket: 1<br>
p11_4103: p4_error: net_recv read: probable EOF on socket: 1<br>
p10_4076: p4_error: net_recv read: probable EOF on socket: 1<br>
p8_4022: p4_error: net_recv read: probable EOF on socket: 1<br>
p9_4049: p4_error: net_recv read: probable EOF on socket: 1<br>
p23_9466: p4_error: net_recv read: probable EOF on socket: 1<br>
p18_9331: p4_error: net_recv read: probable EOF on socket: 1<br>
p22_9439: p4_error: net_recv read: probable EOF on socket: 1<br>
p16_9277: p4_error: net_recv read: probable EOF on socket: 1<br>
p17_9304: p4_error: net_recv read: probable EOF on socket: 1<br>
p3_32073: p4_error: net_recv read: probable EOF on socket: 1<br>
p6_32157: p4_error: net_recv read: probable EOF on socket: 1<br>
p7_32185: p4_error: net_recv read: probable EOF on socket: 1<br>
p4_32101: p4_error: net_recv read: probable EOF on socket: 1<br>
p5_32129: p4_error: net_recv read: probable EOF on socket: 1<br>
rm_l_2_32068: (1003.843750) net_send: could not write to fd=5, errno = 32<br>
<br>
I have tried changing P4_GLOBMEMSIZE, but it did not help. The machine runs Linux (RHEL 5.2) and has 48 GB of memory.<br>Where should I look for errors? I know that mpich-1 is no longer supported, but I am looking for any help.<br>