[mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008

Li Zuwei lzuwei at dso.org.sg
Thu Feb 24 02:00:53 CST 2011


Hi users,

I am having problems with MPI_Barrier() in the MPICH2-1.3.2p1 build on Windows.
On a single node, the operation works flawlessly; however, when the program is scheduled to run on multiple nodes, I get the following errors.

mf.txt
node0:1
node1:1

>mpiexec -machinefile mf.txt -n 2 mpi_test.exe

Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425)...........................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(331)......................: Failure during collective
MPIR_Barrier_impl(313)......................:
MPIR_Barrier_intra(83)......................:
MPIC_Sendrecv(192)..........................:
MPIC_Wait(540)..............................:
MPIDI_CH3I_Progress(353)....................:
MPID_nem_mpich2_blocking_recv(905)..........:
MPID_nem_newtcp_module_poll(37).............:
MPID_nem_newtcp_module_connpoll(2655).......:
gen_cnting_fail_handler(1738)...............: connect failed - the network location cannot be reached. For information about network troubleshooting, see Windows Help.

(errno 1231)

job aborted:
rank: node: exit code[: error message]
0: node0: 123
1: node1: 1: process 1 exited without calling finalize

Additional notes:
When running code without any MPI_Barrier calls (i.e., plain send and recv between processes on multiple nodes), no problems were encountered. Based on that, I presume my settings are correct and the problem may lie in the barrier implementation on Windows.
 
Any help to identify the problem here would be great.


Regards,
Zuwei

