[mpich-discuss] Strange MPI_Recv error

Jayesh Krishna jayesh at mcs.anl.gov
Fri Feb 11 10:56:12 CST 2011


Hi,
 Yes, you need to disable the firewalls to allow MPI communication (or you can follow the steps in Section 9.5 of the Windows developer's guide - http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.3.2-windevguide.pdf).
 Let us know if you have any further issues.

Regards,
Jayesh

----- Original Message -----
From: "Xiao Li" <shinelee.thewise at gmail.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Sent: Friday, February 11, 2011 9:45:19 AM
Subject: Re: [mpich-discuss] Strange MPI_Recv error

Hi Jayesh, 


I just checked the network environment again and found that one of the Windows machines has its firewall turned on. Could that occasionally block MPI communication? 


PS: The source code depends on several library files that I cannot publish yet. 


cheers 
Xiao 


On Fri, Feb 11, 2011 at 8:34 AM, Jayesh Krishna < jayesh at mcs.anl.gov > wrote: 


Hi, 
Can you send us a test program (source code) that fails? 

Regards, 
Jayesh 




----- Original Message ----- 
From: "Xiao Li" < shinelee.thewise at gmail.com > 
To: mpich-discuss at mcs.anl.gov 
Sent: Friday, February 11, 2011 12:14:49 AM 
Subject: Re: [mpich-discuss] Strange MPI_Recv error 


PS: 


When the error is reported at iteration n and I restart the program from iteration n-1, everything runs fine for several more iterations. Then the error occurs again. 


On Fri, Feb 11, 2011 at 12:59 AM, Xiao Li < shinelee.thewise at gmail.com > wrote: 



Hi, 


I am running a small MPI program and get the following error: 




Fatal error in MPI_Recv: Other MPI error, error stack: 
MPI_Recv(186)........................: MPI_Recv(buf=0012FA20, count=1, MPI_INT, 
src=MPI_ANY_SOURCE, tag=5, MPI_COMM_WORLD, status=0012FA80) failed 
MPIDI_CH3I_Progress(335).............: 
MPID_nem_mpich2_blocking_recv(906)...: 
MPID_nem_newtcp_module_poll(37)......: 
MPID_nem_newtcp_module_connpoll(2655): 
gen_read_fail_handler(1145)..........: read from socket failed - The specified network name is no longer available. 


The code framework is something like this: 


if rank == 0 
{ 
    for iter=1 to N 
        MPI_Recv any 
        get proc rank from status 
        MPI_Send proc 
    end 
} 
else 
{ 
    for iter=1 to N 
        MPI_Send to 0 
        MPI_Recv from 0 
        do some computation here 
    end 
} 
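
For reference, the framework above could be written as a self-contained C program roughly as follows. This is only a sketch for reproducing the communication pattern, not the original code. The request tag 5, the count of 1 and MPI_INT come from the error stack; the value of N, the reply tag, the total number of requests handled by rank 0, and the dummy computation are all assumptions.

```c
/* Sketch of the request/reply loop; not the original program. */
#include <mpi.h>
#include <stdio.h>

#define N           100  /* iterations per worker: placeholder value */
#define TAG_REQUEST 5    /* tag seen in the error stack */
#define TAG_REPLY   6    /* assumed; the original reply tag is unknown */

int main(int argc, char *argv[])
{
    int rank, size, iter, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Assumption: every worker sends N requests, so rank 0 must
           answer N * (size - 1) of them in total. */
        int total = N * (size - 1);
        for (iter = 0; iter < total; iter++) {
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &status);
            /* Reply to whichever process sent the request. */
            MPI_Send(&value, 1, MPI_INT, status.MPI_SOURCE, TAG_REPLY,
                     MPI_COMM_WORLD);
        }
    } else {
        for (iter = 0; iter < N; iter++) {
            MPI_Send(&value, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(&value, 1, MPI_INT, 0, TAG_REPLY, MPI_COMM_WORLD,
                     &status);
            value += rank;  /* stand-in for the real computation */
        }
        printf("rank %d finished %d iterations\n", rank, N);
    }

    MPI_Finalize();
    return 0;
}
```

If a program like this, launched with mpiexec on the same machines, fails with the same socket error at varying iterations, that would point to the network environment rather than to the computation code.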


I have checked my code carefully. I even rewrote the core computation as a serial program, and that version runs without error. Even stranger, the code crashes at a different iteration of the for loop each time. I suspect that MPI cannot work reliably in my network environment, which consists of four Windows XP machines on a 100 Mbps Ethernet network. Could you help me with this issue? 


cheers 
Xiao 


_______________________________________________ 
mpich-discuss mailing list 
mpich-discuss at mcs.anl.gov 
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss 


