[MPICH] MPICH.NT and MPICH2 issue on Windows 2003 R2

Watford, Christopher A (GE Infra, Energy) christopher.watford at ge.com
Wed Jul 25 08:47:46 CDT 2007


An interesting thing to note: this is not a TOE problem specific to the
HP NC373i Multifunction Gigabit Server Adapter. The machines also have an
HP NC380T PCIe DP Multifunc Gig Server Adapter, which has the same TOE
problem.
 
It appears the Scalable Networking Pack (SNP) that comes with Windows
2003 R2 does not like offloading the packets our application is
generating (150MiB+). I believe Windows 2000 Server is not affected
because it does not use the new 'Chimney' system that comes with SNP for
Win2k3.
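
One way to check whether the size of a single transfer matters is to
split the large buffer into smaller pieces and send them one at a time.
The following is a minimal sketch in C, not code from our application.
The names transfer_fn, transfer_in_chunks and count_bytes are
hypothetical; in a real MPI program the callback would presumably wrap
MPI_Bcast(p, (int)len, MPI_BYTE, root, comm). Whether chunking actually
avoids the offload errors is untested here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: transfers len bytes starting at p.
 * In an MPI program this would wrap a call such as
 * MPI_Bcast(p, (int)len, MPI_BYTE, root, comm).
 * Returns 0 on success, nonzero on failure. */
typedef int (*transfer_fn)(unsigned char *p, size_t len, void *ctx);

/* Split one large transfer into pieces of at most chunk bytes.
 * Returns the number of pieces sent, or -1 if chunk is zero or
 * a transfer fails. */
static long transfer_in_chunks(unsigned char *buf, size_t total, size_t chunk,
                               transfer_fn fn, void *ctx)
{
    long pieces = 0;
    size_t off = 0;

    assert(fn != NULL);
    if (chunk == 0)
        return -1;

    while (off < total) {
        size_t len = total - off;   /* bytes still to send */
        if (len > chunk)
            len = chunk;            /* cap each piece at chunk bytes */
        if (fn(buf + off, len, ctx) != 0)
            return -1;
        off += len;
        pieces++;
    }
    return pieces;
}

/* Stand-in transfer for trying the splitter without MPI:
 * it only adds up the bytes it was asked to send. */
static int count_bytes(unsigned char *p, size_t len, void *ctx)
{
    (void)p;
    *(size_t *)ctx += len;
    return 0;
}
```

Calling transfer_in_chunks(buf, total, chunk, count_bytes, &seen) with a
size_t seen initialised to 0 exercises the splitting logic without any
network traffic; swapping in an MPI-backed callback would turn it into
an actual test on the affected hosts.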
 
I'm now engaging Microsoft to get to the bottom of this issue.
 
--
Christopher
9106755743
 


________________________________

	From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jayesh Krishna
	Sent: Friday, July 20, 2007 3:49 PM
	To: Watford, Christopher A (GE Infra, Energy)
	Cc: mpich-discuss at mcs.anl.gov
	Subject: RE: [MPICH] MPICH.NT and MPICH2 issue on Windows 2003
R2
	
	
	Hi,
	 Good to hear that you solved the problem. Let us know if you
need any further assistance.
	 
	Regards,
	Jayesh

________________________________

	From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Watford,
Christopher A (GE Infra, Energy)
	Sent: Friday, July 20, 2007 2:05 PM
	To: Jayesh Krishna
	Cc: mpich-discuss at mcs.anl.gov
	Subject: RE: [MPICH] MPICH.NT and MPICH2 issue on Windows 2003
R2
	
	
	I have isolated the problem to the TCP Offload Engine (TOE) in
the new machines' HP NC373i Multifunction Gigabit Server Adapter. I'm
not sure exactly which part of the offloading was failing, but the cards
reported TCP Offload Errors during the MPI_Bcast calls (150MB
buffers!).
	 
	Disabling TCP offloading fixed the problem (granted, it will
probably hose performance; hopefully a driver/firmware fix exists).
	 
	Thank you.
	 
	--
	Christopher
	9106755743
	 


________________________________

		From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Watford,
Christopher A (GE Infra, Energy)
		Sent: Thursday, July 19, 2007 1:57 PM
		To: Jayesh Krishna
		Cc: mpich-discuss at mcs.anl.gov
		Subject: RE: [MPICH] MPICH.NT and MPICH2 issue on
Windows 2003 R2
		
		
		We use MPICH2 as well, and applications built with MPICH2
are also having problems on Win2k3r2. However, we have some machines
still on Win2ksvr and some on Win2k3x64, and neither has this issue.
		 
		MPICH.NT/MPICH2 on Win2k3r2 is where we see it.
		 
		cpi.c does not exhibit this problem. I'm not sure
exactly how cpi.c's usage differs from our applications', but I'll have
a look.
		 
		--
		Christopher
		9106755743
		 


________________________________

			From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
			Sent: Thursday, July 19, 2007 12:45 PM
			To: Watford, Christopher A (GE Infra, Energy)
			Cc: mpich-discuss at mcs.anl.gov
			Subject: RE: [MPICH] MPICH.NT and MPICH2 issue
on Windows 2003 R2
			
			
			Hi,
			 We would recommend that you migrate to MPICH2.
All of our current development efforts are going into MPICH2.
			 Meanwhile, do you have the same problem with OS
flavors other than Win 2003 R2? Do you see the same problem with sample
MPI applications like cpi.c?
			 
			Regards,
			Jayesh
			
________________________________

			From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Watford,
Christopher A (GE Infra, Energy)
			Sent: Thursday, July 19, 2007 10:47 AM
			To: mpich-discuss at mcs.anl.gov
			Subject: RE: [MPICH] MPICH.NT and MPICH2 issue
on Windows 2003 R2
			
			
			(Apologies for the double post, our email system
is not the best.)
			 
			It also appears that I have no problems
submitting from host A to host B to run (solely on host B). I can watch
it work fine on host B via the command prompt on host A (where mpiexec
was called). Yet it appears as soon as I have a job that spans two
Win2k3r2 hosts I run into a problem.
			 
			--
			Christopher
			9106755743
			 


________________________________

				From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Watford,
Christopher A (GE Infra, Energy)
				Sent: Thursday, July 19, 2007 10:32 AM
				To: mpich-discuss at mcs.anl.gov
				Subject: [MPICH] MPICH.NT and MPICH2
issue on Windows 2003 R2
				
				
				I've run into an interesting problem
with both MPICH.NT and MPICH2 on Windows 2003 R2. With MPICH.NT our
application, given N processors across M machines, will have at most 2
processes going at once (across all machines). On MPICH2, if the N
processors span more than one machine nothing happens. Process 0 can
make forward progress, but no other machines talk to one another, which
is similar to the problem we are seeing with MPICH.NT, but not quite the
same.
				 

				Christopher Watford
				GE-Hitachi Nuclear Energy
				E Christopher.Watford at ge.com
				http://www.ge-energy.com/nuclear
