[mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008

Jayesh Krishna jayesh at mcs.anl.gov
Wed Mar 2 10:22:51 CST 2011


 Please do not use the mailing list to broadcast your opinions on whether someone should use Unix/Windows/Mac/... to run MPI.
 You are welcome to post any solutions (other than "switch to unix" :) ) for users' problems.

-Jayesh

----- Original Message -----
From: "tiago alves" <tiagoalveshandsome at gmail.com>
To: mpich-discuss at mcs.anl.gov
Sent: Tuesday, March 1, 2011 8:19:26 PM
Subject: Re: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008


Why not use a real system? I mean a Unix-like one ... 
It's a shame to use Windows in clusters ... 
in fact, for it is also put this 1/4 of system in my PC 


2011/2/28 Jayesh Krishna < jayesh at mcs.anl.gov > 


Hi, 
The virtual machine setup is not something that we typically test in our test environment. I have created a ticket ( https://trac.mcs.anl.gov/projects/mpich2/ticket/1445 ) for this. I will get to it when I have some free dev cycles. 


Regards, 
Jayesh 

----- Original Message ----- 
From: "Li Zuwei" < lzuwei at dso.org.sg > 
To: "Jayesh Krishna" < jayesh at mcs.anl.gov > 
Cc: mpich-discuss at mcs.anl.gov 
Sent: Sunday, February 27, 2011 7:18:27 PM 
Subject: RE: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008 



Hi Jayesh, 

Firewalls are disabled on the machines. The machines are actually running on VMware ESX 3.5 on an Intel Xeon server. They are physically housed together but logically separated by VMware. There are no problems pinging the servers. 

Regards, 
Zuwei 



-----Original Message----- 
From: Jayesh Krishna [ mailto: jayesh at mcs.anl.gov ] 
Sent: Fri 2/25/2011 10:09 PM 
To: Li Zuwei 
Cc: mpich-discuss at mcs.anl.gov 
Subject: Re: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008 

Hi, 
This could be a firewall issue. Did you turn off Windows firewall on both the machines ? 

Regards, 
jayesh 

----- Original Message ----- 
From: "Li Zuwei" < lzuwei at dso.org.sg > 
To: "Jayesh Krishna" < jayesh at mcs.anl.gov > 
Sent: Thursday, February 24, 2011 8:45:54 PM 
Subject: RE: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008 



Hi, 

Thanks for the response. A similar problem occurred with cpi.exe; this time it fails in MPI_Bcast, with the same network errors. What puzzles me is that I can run the program remotely on the other nodes with: 
>mpiexec -host remote_node -n 4 cpi.exe 

On a program that uses purely Send and Recv commands, I didn't have problems running on multiple nodes using the machinefile. 

In case this is a network error, are there any Windows settings I should take note of, such as DCOM settings or remote access? The users on the nodes are all administrators, so I presume there won't be any problems with remote file access or launching programs. 

Regards, 
Zuwei 



-----Original Message----- 
From: Jayesh Krishna [ mailto: jayesh at mcs.anl.gov ] 
Sent: Fri 2/25/2011 12:59 AM 
To: mpich-discuss at mcs.anl.gov 
Cc: Li Zuwei 
Subject: Re: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008 

Hi, 
From the error message it looks like a network connectivity issue (not related to MPI_Barrier()). Can you send us a test program that fails ? 
Can you run cpi.exe (c:\program files\MPICH2\examples\cpi.exe) across the nodes ? 
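
For what it is worth, raw TCP reachability between the nodes can be checked independently of MPI. The following is a minimal Python sketch, not part of MPICH2; `can_connect` is a hypothetical helper, and the host and port passed to it are placeholders that must be replaced with a node name from the machinefile and a port on which something is known to be listening on that node.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, error) where error is the
    OSError raised (refused, unreachable, timed out, name not resolved).
    """
    try:
        # create_connection resolves the name and connects; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        return False, exc
```

Calling `can_connect("node1", port)` from node0 and `can_connect("node0", port)` from node1 would show whether connections work in both directions. A successful ping does not guarantee this, since ping uses ICMP rather than TCP.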

Regards, 
Jayesh 

----- Original Message ----- 
From: "Li Zuwei" < lzuwei at dso.org.sg > 
To: mpich-discuss at mcs.anl.gov 
Sent: Thursday, February 24, 2011 2:00:53 AM 
Subject: [mpich-discuss] Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008 


Problems with Barriers on MPICH2-1.3.2p1 on Windows XP and Windows Server 2008 

Hi users, 

I have some issues with MPI_Barrier() on the MPICH2-1.3.2p1 build on Windows. 
On a single node, the operation works flawlessly; however, when the program is scheduled to run on multiple nodes, I get the following errors. 

mf.txt 
node0:1 
node1:1 

>mpiexec -machinefile mf.txt -n 2 mpi_test.exe 

Fatal error in PMPI_Barrier: Other MPI error, error stack: 
PMPI_Barrier(425)...........................: MPI_Barrier(MPI_COMM_WORLD) failed 
MPIR_Barrier_impl(331)......................: Failure during collective 
MPIR_Barrier_impl(313)......................: 
MPIR_Barrier_intra(83)......................: 
MPIC_Sendrecv(192)..........................: 
MPIC_Wait(540)..............................: 
MPIDI_CH3I_Progress(353)....................: 
MPID_nem_mpich2_blocking_recv(905)..........: 
MPID_nem_newtcp_module_poll(37).............: 
MPID_nem_newtcp_module_connpoll(2655).......: 
gen_cnting_fail_handler(1738)...............: connect failed - the network location cannot be reached. For information about network troubleshooting, see Windows Help. 

(errno 1231) 

job aborted: 
rank: node: exit code[: error message] 
0: node0: 123 
1: node1: 1: process 1 exited without calling finalize 

Additional Notes: 
When running code without any MPI_Barrier calls, no problems were encountered (i.e., send and recv across multiple nodes). Based on that, I presume my settings are correct and the problem might lie in the barrier implementation on Windows. 
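
Since the machinefile refers to the nodes by name, one thing that could be ruled out is a name resolving to an address the other node cannot reach. This is only an assumption about a possible cause, not something established in this thread. A minimal Python sketch follows; `parse_machinefile` and `resolve_hosts` are hypothetical helpers, and treating a line without `:count` as one process is an assumption of the sketch.

```python
import socket


def parse_machinefile(text):
    """Parse 'host:count' lines (as in mf.txt) into (host, count) pairs.

    Assumption of this sketch: a bare 'host' line means one process.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        host, _, count = line.partition(":")
        entries.append((host.strip(), int(count) if count.strip() else 1))
    return entries


def resolve_hosts(entries):
    """Map each host name to its sorted IPv4 addresses, or to the lookup error."""
    result = {}
    for host, _ in entries:
        try:
            infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
            # info[4] is the socket address tuple; its first item is the IP.
            result[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            result[host] = exc
    return result
```

Running `resolve_hosts(parse_machinefile(open("mf.txt").read()))` on each node and comparing the results would show whether both nodes agree on the addresses behind `node0` and `node1`.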

Any help to identify the problem here would be great. 


Regards, 
Zuwei 


_______________________________________________ 
mpich-discuss mailing list 
mpich-discuss at mcs.anl.gov 
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss 
