[mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008

tiago alves tiagoalveshandsome at gmail.com
Tue Mar 1 20:19:26 CST 2011


Why not use a real system? I mean a Unix-like one...
It's a shame to use Windows in clusters...
In fact, for it is also put this 1/4 of system in my PC

2011/2/28 Jayesh Krishna <jayesh at mcs.anl.gov>

> Hi,
>  Virtual machine setups are not something that we typically test in our
> test environment. I have created a ticket for this
> (https://trac.mcs.anl.gov/projects/mpich2/ticket/1445). I will
> get to it when I have some free dev cycles.
>
> Regards,
> Jayesh
>
> ----- Original Message -----
> From: "Li Zuwei" <lzuwei at dso.org.sg>
> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Sunday, February 27, 2011 7:18:27 PM
> Subject: RE: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on
> Windows XP and Windows Server 2008
>
>
>
> Hi Jayesh,
>
> Firewalls are disabled on the machines. The machines are actually running
> on VMware ESX 3.5 on an Intel Xeon server. They are physically housed together
> but logically separated by VMware. Pinging the servers works without problems.
>
> Regards,
> Zuwei
>
>
>
> -----Original Message-----
> From: Jayesh Krishna [ mailto:jayesh at mcs.anl.gov ]
> Sent: Fri 2/25/2011 10:09 PM
> To: Li Zuwei
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on
> Windows XP and Windows Server 2008
>
> Hi,
> This could be a firewall issue. Did you turn off the Windows firewall on
> both machines?
>
> Regards,
> jayesh
>
> ----- Original Message -----
> From: "Li Zuwei" <lzuwei at dso.org.sg>
> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> Sent: Thursday, February 24, 2011 8:45:54 PM
> Subject: RE: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on
> Windows XP and Windows Server 2008
>
>
>
> Hi,
>
> Thanks for the response. A similar problem occurred with cpi.exe; this time
> it fails in MPI_Bcast, with the same network errors. What puzzles me is
> that I could run the program remotely on the other nodes using the command:
> >mpiexec -host remote_node -n 4 cpi.exe
>
> On a program that uses purely Send and Recv commands, I didn't have
> problems running on multiple nodes using the machinefile.
>
> Given the network error, are there any settings that I have to take note
> of on Windows, such as DCOM settings, remote access, etc.? The users on
> the nodes are all administrators, so I presume there won't be any problems
> with remote file access and launching programs.
>
> Regards,
> Zuwei
>
>
>
> -----Original Message-----
> From: Jayesh Krishna [ mailto:jayesh at mcs.anl.gov ]
> Sent: Fri 2/25/2011 12:59 AM
> To: mpich-discuss at mcs.anl.gov
> Cc: Li Zuwei
> Subject: Re: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on
> Windows XP and Windows Server 2008
>
> Hi,
> From the error message it looks like a network connectivity issue (not
> related to MPI_Barrier()). Can you send us a test program that fails?
> Can you run cpi.exe (c:\program files\MPICH2\examples\cpi.exe) across the
> nodes?
>
> Regards,
> Jayesh
>
> ----- Original Message -----
> From: "Li Zuwei" <lzuwei at dso.org.sg>
> To: mpich-discuss at mcs.anl.gov
> Sent: Thursday, February 24, 2011 2:00:53 AM
> Subject: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on
> Windows XP and Windows Server 2008
>
>
>
> Hi users,
>
> I have some issues with MPI_Barrier() on the MPICH2-1.3.2p1 build on
> Windows.
> On a single node, the operation works flawlessly; however, when the program
> is scheduled to run on multiple nodes, I get the following errors.
>
> mf.txt
> node0:1
> node1:1
>
> >mpiexec -machinefile mf.txt -n 2 mpi_test.exe
>
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(425)...........................: MPI_Barrier(MPI_COMM_WORLD) failed
> MPIR_Barrier_impl(331)......................: Failure during collective
> MPIR_Barrier_impl(313)......................:
> MPIR_Barrier_intra(83)......................:
> MPIC_Sendrecv(192)..........................:
> MPIC_Wait(540)..............................:
> MPIDI_CH3I_Progress(353)....................:
> MPID_nem_mpich2_blocking_recv(905)..........:
> MPID_nem_newtcp_module_poll(37).............:
> MPID_nem_newtcp_module_connpoll(2655).......:
> gen_cnting_fail_handler(1738)...............: connect failed - the network location cannot be reached. For information about network troubleshooting, see Windows Help. (errno 1231)
>
> job aborted:
> rank: node: exit code[: error message]
> 0: node0: 123
> 1: node1: 1: process 1 exited without calling finalize
>
> Additional Notes:
> When running code without any MPI_Barrier calls, no problems were
> encountered (i.e., send and recv across multiple nodes). Based on that, I
> presume my settings are correct and the problem might lie in the barrier
> implementation on Windows.
>
> Any help to identify the problem here would be great.
>
>
> Regards,
> Zuwei
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
>
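
The thread notes that ping works between the nodes, yet the error stack ends in a failed TCP connect (errno 1231). Ping only exercises ICMP, so it does not confirm that a node can resolve the other node's name to a reachable address and open a TCP connection to it. The following is a minimal sketch of such a check, not part of MPICH2 and not the method used by anyone in the thread. The host names node0 and node1 come from mf.txt in the original post; the function names and the port number 50000 are hypothetical, and the connect can only succeed if something is already listening on that port on the remote side.

```python
import socket


def resolve(host):
    """Return the sorted IPv4 addresses the local resolver gives for host.

    An empty list means name resolution failed on this machine.
    """
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Try a TCP connect to (host, port).

    Returns (True, "") on success, otherwise (False, reason).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        return False, str(exc)


def report(hosts, port, timeout=3.0):
    """Build one resolution line and one connect line per host."""
    lines = []
    for host in hosts:
        addresses = resolve(host)
        lines.append("%s resolves to %s" % (host, addresses or "nothing"))
        ok, reason = can_connect(host, port, timeout)
        status = "ok" if ok else "failed (%s)" % reason
        lines.append("%s TCP connect on port %d: %s" % (host, port, status))
    return lines
```

Calling `print("\n".join(report(["node0", "node1"], 50000)))` on each node in turn would check both directions. If a name resolves to an address on an interface the other virtual machine cannot route to, that would be consistent with the "network location cannot be reached" message, though this sketch alone cannot confirm the cause.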