[mpich-discuss] MPICH2 1.3 speed down on Windows 2008 R2

Youri LACAN-BARTLEY youri.lacan-bartley at transvalor.com
Fri Aug 19 03:07:56 CDT 2011


I've finally been able to run another series of tests on this server.
What I've noticed is that IMB performs very poorly with MPICH2 1.3, whether on 2 or 32 cores (the observed latencies are catastrophic).
Out of curiosity, I ran IMB on 32 cores using MSMPI, and in that case the performance is perfectly satisfactory (I can provide the IMB output for both scenarios if necessary).
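If the IMB output for both runs is posted, the PingPong latencies can be put side by side with a few lines of script. The following is a minimal Python sketch, assuming the usual IMB table layout (four columns: #bytes, #repetitions, t[usec], Mbytes/sec) under a "# Benchmarking PingPong" header; the function names are hypothetical.

```python
import re

# Assumed IMB PingPong row layout: #bytes, #repetitions, t[usec], Mbytes/sec.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s*$")


def pingpong_latency(output: str) -> dict[int, float]:
    """Map message size in bytes -> t[usec] from the PingPong section of IMB output."""
    latencies: dict[int, float] = {}
    in_pingpong = False
    for line in output.splitlines():
        if line.startswith("# Benchmarking"):
            # Only collect rows that belong to the PingPong benchmark.
            in_pingpong = "PingPong" in line
            continue
        match = ROW.match(line)
        if in_pingpong and match:
            latencies[int(match.group(1))] = float(match.group(3))
    return latencies


def latency_ratio(slow: str, fast: str) -> dict[int, float]:
    """Per message size present in both runs, how many times slower `slow` is than `fast`."""
    a, b = pingpong_latency(slow), pingpong_latency(fast)
    return {size: a[size] / b[size] for size in sorted(a.keys() & b.keys()) if b[size] > 0}
```

Usage: `latency_ratio(mpich2_output, msmpi_output)` returns, for each message size, the factor by which the MPICH2 latency exceeds the MSMPI one.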

Running our application on eight cores with MPICH2 and binding gave the expected result: two cores per socket is 80% faster than running on a single socket.
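For reference, the two eight-core layouts being compared can be written as explicit core lists for the `-binding user:` option. This is a minimal Python sketch, assuming four sockets of eight cores each and that the cores of each socket are numbered contiguously (socket 0 owns cores 0-7, socket 1 owns 8-15, and so on); the real topology should be checked before relying on it, and the helper names are hypothetical.

```python
def spread_binding(sockets: int, cores_per_socket: int, per_socket: int) -> str:
    """Build a "user:" core list that uses `per_socket` cores on every socket.

    Assumes contiguous numbering: socket s owns cores
    s*cores_per_socket .. s*cores_per_socket + cores_per_socket - 1.
    """
    if not 0 < per_socket <= cores_per_socket:
        raise ValueError("per_socket must be between 1 and cores_per_socket")
    cores = [s * cores_per_socket + c for s in range(sockets) for c in range(per_socket)]
    return "user:" + ",".join(str(core) for core in cores)


def packed_binding(n: int) -> str:
    """Build a "user:" list for cores 0..n-1, filling one socket before the next."""
    return "user:" + ",".join(str(core) for core in range(n))
```

Under these assumptions, `spread_binding(4, 8, 2)` gives the two-cores-per-socket layout, `packed_binding(8)` the single-socket layout, and `packed_binding(32)` the full-machine list quoted later in the thread.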

Since MSMPI is based on MPICH2, I'm really surprised by the difference in latency. The only major difference I can see is the WinSock Direct protocol used by MSMPI.

Does anyone have any idea what is causing these issues with MPICH2?

Thanks,

Youri LACAN-BARTLEY

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Youri LACAN-BARTLEY
Sent: Thursday, May 12, 2011 17:38
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPICH2 1.3 speed down on Windows 2008 R2

Hi Darius,

The test machine is currently being used for other purposes, but I will run those tests as soon as I can.
I wasn't able to run them during my previous round of testing.

I'll post the results as soon as I have them.

All the best,

Youri LACAN-BARTLEY

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Darius Buntinas
Sent: Wednesday, April 27, 2011 19:18
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPICH2 1.3 speed down on Windows 2008 R2


Can you check that no other processes are running on your system? Also, see what happens with 31, 30, 29, etc. processes to find the point where performance drops suddenly.
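The sweep can be scripted. In this minimal Python sketch, `sweep_commands` builds one mpiexec command line per process count, packed onto cores 0..n-1 like the full-machine binding, and `sudden_drop` takes the measured runtimes and reports the count where the runtime jumps most relative to the next smaller count. The executable name and both function names are hypothetical; the command syntax assumes the standard `-n` option plus the `-binding user:` form quoted in this thread, and that the list is applied in rank order.

```python
def sweep_commands(max_procs: int, min_procs: int, exe: str) -> list[str]:
    """One mpiexec command per process count, from max_procs down to min_procs.

    Assumes the user: list is applied in rank order (rank i -> core i),
    mirroring the "-binding user:0,1,2,...,31" form used for 32 cores.
    """
    commands = []
    for n in range(max_procs, min_procs - 1, -1):
        cores = ",".join(str(core) for core in range(n))
        commands.append(f"mpiexec -n {n} -binding user:{cores} {exe}")
    return commands


def sudden_drop(runtimes: dict[int, float]) -> tuple[int, float]:
    """Return (process_count, ratio) for the largest step-to-step runtime increase.

    `runtimes` maps a process count to the wall-clock time of the same job.
    Each count is compared with the next smaller measured count; in a healthy
    strong-scaling run the ratio ideally stays at or below 1, so the largest
    ratio marks where performance drops suddenly.
    """
    counts = sorted(runtimes)
    if len(counts) < 2:
        raise ValueError("need runtimes for at least two process counts")
    steps = [(curr, runtimes[curr] / runtimes[prev]) for prev, curr in zip(counts, counts[1:])]
    return max(steps, key=lambda step: step[1])


if __name__ == "__main__":
    for command in sweep_commands(32, 28, "app.exe"):  # app.exe is a placeholder
        print(command)
```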

-d


On Apr 27, 2011, at 11:49 AM, Youri LACAN-BARTLEY wrote:

> Hi Jayesh,
> 
> First of all, thank you for the swift reply.
> 
> To answer your question, I've basically been binding my MPI jobs with "-binding user:0,1,2,3,[...],29,30,31" since I'm using all available cores on the machine. It's this specific scenario that is bugging me.
> I have tried running jobs on, say, 8 cores with different bindings (per core, per socket, shared L2 cache, etc.), and the results were what I expected.
> What I can't explain is why I get such a massive slowdown between running on 16 cores and 32 cores.
> I might be overlooking something but I really can't put my finger on it.
> 
> I've even experimented with different channels (sock, nemesis, etc.) in the hope that this might shed some light on the issue, but to no avail.
> 
> If you need more detailed information, don't hesitate to ask.
> 
> Thanks for the help,
> 
> Youri LACAN-BARTLEY
> 
> -----Original Message-----
> From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov] 
> Sent: Wednesday, April 27, 2011 17:30
> To: mpich-discuss at mcs.anl.gov
> Cc: Youri LACAN-BARTLEY
> Subject: Re: [mpich-discuss] MPICH2 1.3 speed down on Windows 2008 R2
> 
> Hi,
> What binding are you using? Did you try different bindings to see whether that changes the performance (does not specifying a user-defined binding increase or decrease performance)?
> Please send more details.
> 
> -Jayesh
> 
> ----- Original Message -----
> From: "Youri LACAN-BARTLEY" <youri.lacan-bartley at transvalor.com>
> To: mpich-discuss at mcs.anl.gov
> Sent: Wednesday, April 27, 2011 4:04:25 AM
> Subject: [mpich-discuss] MPICH2 1.3 speed down on Windows 2008 R2
> 
> Hi,
> 
> I'm currently benchmarking a 32-core machine with four Intel Xeon X7560 processors running Windows Server 2008 R2.
> 
> I've noticed a severe slowdown when running on all 32 cores at once with user-defined binding and the nemesis channel.
> 
> Does anyone have any idea why this might be the case?
> 
> I've run the same software on the exact same hardware under CentOS 5 with OpenMPI 1.4, and in that case the results show the regular speedup I expected.
> 
> Am I hitting a specific MPICH2 issue, or does this have more to do with Windows?
> 
> Kind regards,
> 
> Youri LACAN-BARTLEY
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss