[mpich-discuss] Strange MPI_Recv error
Jayesh Krishna
jayesh at mcs.anl.gov
Fri Feb 11 10:56:12 CST 2011
Hi,
Yes, you need to disable the firewalls to enable MPI communication. Alternatively, you can follow the steps in Section 9.5 of the Windows developer's guide: http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.3.2-windevguide.pdf
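One quick way to see whether a firewall is in the way is to probe TCP reachability between the machines before launching the job. The following is only a minimal sketch in Python and is not part of MPICH2. The helper names `can_connect` and `probe_hosts` are hypothetical, and the hosts and port you pass in are your own choice (use whatever ports your installation is actually configured for).

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def probe_hosts(hosts, port, timeout=2.0):
    """Map each host name to True (reachable) or False (blocked or closed)."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

A result of False for one machine only tells you that the connection attempt failed from where the script ran. It does not distinguish a firewall from a service that is simply not listening, so run it from each machine toward the others and compare.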
Let us know if you have any further issues.
Regards,
Jayesh
----- Original Message -----
From: "Xiao Li" <shinelee.thewise at gmail.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Sent: Friday, February 11, 2011 9:45:19 AM
Subject: Re: [mpich-discuss] Strange MPI_Recv error
Hi Jayesh,
I just checked the network environment again and found that one of the Windows machines has its firewall enabled. Could that intermittently block MPI communication?
PS: The source code depends on several library files which I cannot publish yet.
cheers
Xiao
On Fri, Feb 11, 2011 at 8:34 AM, Jayesh Krishna < jayesh at mcs.anl.gov > wrote:
Hi,
Can you send us a test program (source code) that fails?
Regards,
Jayesh
----- Original Message -----
From: "Xiao Li" < shinelee.thewise at gmail.com >
To: mpich-discuss at mcs.anl.gov
Sent: Friday, February 11, 2011 12:14:49 AM
Subject: Re: [mpich-discuss] Strange MPI_Recv error
PS:
When the error is reported at iteration n and I restart the program from iteration n-1, everything runs fine for several more iterations. Then the error occurs again.
On Fri, Feb 11, 2011 at 12:59 AM, Xiao Li < shinelee.thewise at gmail.com > wrote:
Hi,
I am running a small MPI program and get the following error.
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186)........................: MPI_Recv(buf=0012FA20, count=1, MPI_INT,
src=MPI_ANY_SOURCE, tag=5, MPI_COMM_WORLD, status=0012FA80) failed
MPIDI_CH3I_Progress(335).............:
MPID_nem_mpich2_blocking_recv(906)...:
MPID_nem_newtcp_module_poll(37)......:
MPID_nem_newtcp_module_connpoll(2655):
gen_read_fail_handler(1145)..........: read from socket failed - The specified network name is no longer available.
The structure of the code is roughly as follows.
if rank == 0
{
    for iter = 1 to N
        MPI_Recv from any source
        get sender rank from status
        MPI_Send to that rank
    end
}
else
{
    for iter = 1 to N
        MPI_Send to 0
        MPI_Recv from 0
        do some computation here
    end
}
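The message ordering of this request/reply loop can be exercised without MPI at all. The sketch below is an illustration only: it uses Python threads and queues as stand-ins for MPI_Send and MPI_Recv, so it says nothing about MPICH2 or the network. The function name `run_handshake`, the reply value, and the placeholder computation are all assumptions. One more assumption is loud enough to state up front: rank 0 here serves N requests per worker, because if the pseudocode were read literally with several workers, rank 0 would finish after N receives while the workers were still sending.

```python
import queue
import threading

def run_handshake(num_workers, num_iters):
    """Simulate the loop above; return per-rank results keyed by rank."""
    to_master = queue.Queue()  # stand-in for receiving from any source on rank 0
    to_worker = {r: queue.Queue() for r in range(1, num_workers + 1)}
    results = {}

    def master():
        served = 0
        # Assumption: serve every request from every worker.
        for _ in range(num_workers * num_iters):
            src, value = to_master.get()   # "MPI_Recv any" plus rank from status
            to_worker[src].put(value + 1)  # "MPI_Send" back to that rank
            served += 1
        results[0] = served

    def worker(rank):
        total = 0
        for it in range(num_iters):
            to_master.put((rank, it))       # "MPI_Send to 0"
            total += to_worker[rank].get()  # "MPI_Recv from 0"
        results[rank] = total               # placeholder for the computation

    threads = [threading.Thread(target=master)]
    threads += [threading.Thread(target=worker, args=(r,)) for r in to_worker]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_handshake(3, 4)` models three workers doing four iterations each. If the simulated pattern completes but the real program still fails at varying iterations, that points away from the message logic and toward the environment.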
I have checked my code carefully. I even rewrote the core computation as a serial program, and it ran without error. Even stranger, the crash occurs at a different loop iteration each time. I suspect that MPI does not work properly in my network environment, which consists of four Windows XP machines on 100 Mbps Ethernet. Could you help me with this issue?
cheers
Xiao
_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss