[petsc-users] Parallelization efficiency diagnose
Sun, Hui
hus003 at ucsd.edu
Fri Dec 5 00:45:06 CST 2014
Thank you Barry and Jed for your explanations. I think I understand it a little bit better now.
Hui
________________________________________
From: Barry Smith [bsmith at mcs.anl.gov]
Sent: Thursday, December 04, 2014 7:37 PM
To: Jed Brown
Cc: Sun, Hui; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Parallelization efficiency diagnose
I have a different MacBook Pro generation and get
$ make streams NPMAX=4
cd src/benchmarks/streams; /usr/bin/make --no-print-directory streams
/Users/barrysmith/Src/PETSc/arch-mpich/bin/mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 -I/Users/barrysmith/Src/PETSc/include -I/Users/barrysmith/Src/PETSc/arch-mpich/include -I/opt/X11/include -I/opt/local/include `pwd`/MPIVersion.c
Number of MPI processes 1 Processor names Barrys-MacBook-Pro-3.local
Triad: 10417.1979 Rate (MB/s)
Number of MPI processes 2 Processor names Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local
Triad: 14673.8802 Rate (MB/s)
Number of MPI processes 3 Processor names Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local
Triad: 14998.7656 Rate (MB/s)
Number of MPI processes 4 Processor names Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local
Triad: 15001.2941 Rate (MB/s)
------------------------------------------------
np speedup
1 1.0
2 1.41
3 1.44
4 1.44
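The speedup column is each Triad rate divided by the one-process rate. A minimal Python sketch that reproduces it from the numbers above (the name `speedups` is only illustrative and is not part of PETSc):

```python
# Triad rates (MB/s) copied from the output above, keyed by number of MPI processes.
triad_rates = {1: 10417.1979, 2: 14673.8802, 3: 14998.7656, 4: 15001.2941}


def speedups(rates):
    """Return {np: rate / rate at np=1}, rounded to two decimals."""
    base = rates[1]
    return {n: round(r / base, 2) for n, r in sorted(rates.items())}


if __name__ == "__main__":
    for n, s in speedups(triad_rates).items():
        print(n, s)
```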
Is mine a better machine since I get a speedup of 1.44 while you get no speed up?
No. The total memory bandwidth each of our machines can sustain is about 15001 MB/s (Triad: 15001.2941). My machine, which I am guessing is a little older than yours, cannot utilize all of that memory bandwidth with a single core (Triad: 10417.1979 MB/s). On your machine a single core can utilize all of the memory bandwidth, so when you use the second core you get no speedup. I get a speedup because the second core utilizes the extra memory bandwidth that the first core did not.
On the other hand, your machine will run PETSc programs a good bit faster on one core than mine. So parallelism will not give you any real benefit on your laptop, while on mine it does; but in the end code will run slightly faster on your machine, so your machine is better than mine.
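As a rough illustration of this argument, the achievable rate can be modeled as the smaller of two numbers: the combined demand of the cores and the total memory bandwidth of the machine. This is a simplified sketch, not how the streams benchmark computes anything. The function name and the assumption that every core can draw the same peak rate are hypothetical, and the second call assumes a machine whose single core already reaches the full 15001 MB/s.

```python
def modeled_speedup(n_cores, per_core_bw, total_bw):
    """Speedup of a bandwidth-bound kernel under a simple saturation model.

    Assumes each core can draw at most per_core_bw (MB/s) and that the
    memory system delivers at most total_bw (MB/s) in aggregate.
    """
    achieved = min(n_cores * per_core_bw, total_bw)
    single = min(per_core_bw, total_bw)
    return achieved / single


if __name__ == "__main__":
    # One core draws about 10417 of about 15001 MB/s: a second core helps.
    print(round(modeled_speedup(2, 10417.1979, 15001.2941), 2))
    # One core already saturates the memory system: a second core does not help.
    print(round(modeled_speedup(2, 15001.2941, 15001.2941), 2))
```

The model gives 1.44 for two cores in the first case, slightly above the measured 1.41, since real cores do not share bandwidth perfectly.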
Barry
> On Dec 4, 2014, at 7:51 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> "Sun, Hui" <hus003 at ucsd.edu> writes:
>
>> Thank you Jed. I don't know how to use "lstopo" from the hwloc,
>
> A search engine will solve that problem.
>
>> but I looked up the cores and memory from the hardware overview from
>> my MAC, it has
>>
>> Number of Processors: 1
>> Total Number of Cores: 2
>>
>> Besides, as you said, there are 4 logical cores due to hyperthreading. However, I'm still expecting to get speed doubled because I have 2 real cores. So where is the restriction then?
>
> Memory bandwidth, as stated in my email and the page I linked.