[mpich-discuss] Strange MPI_Recv error

Xiao Li shinelee.thewise at gmail.com
Fri Feb 11 00:14:49 CST 2011


PS:

When the error is reported at iteration n and I restart the program from
iteration n-1, everything runs fine for several more iterations. Then the
error occurs again.

On Fri, Feb 11, 2011 at 12:59 AM, Xiao Li <shinelee.thewise at gmail.com> wrote:

> Hi,
>
> I am running a small MPI program and get the following error.
>
> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186)........................: MPI_Recv(buf=0012FA20, count=1, MPI_INT, src=MPI_ANY_SOURCE, tag=5, MPI_COMM_WORLD, status=0012FA80) failed
>> MPIDI_CH3I_Progress(335).............:
>> MPID_nem_mpich2_blocking_recv(906)...:
>> MPID_nem_newtcp_module_poll(37)......:
>> MPID_nem_newtcp_module_connpoll(2655):
>> gen_read_fail_handler(1145)..........: read from socket failed - The specified network name is no longer available.
>
>
> The code framework is roughly as follows.
>
> if rank == 0
> {
>     for iter = 1 to N
>         MPI_Recv from any source
>         get proc rank from status
>         MPI_Send to proc
>     end
> }
> else
> {
>     for iter = 1 to N
>         MPI_Send to 0
>         MPI_Recv from 0
>         do some computation here
>     end
> }
>
> I have checked my code carefully. I even rewrote the core computation as a
> serial program, and that version runs without error. Stranger still, the
> parallel code crashes at a different loop iteration on each run. I suspect
> that MPI does not work reliably in my network environment, which consists of
> four Windows XP machines on 100 Mbps Ethernet. Could you help me with this
> issue?
>
> cheers
> Xiao
>
>