[petsc-users] RE : RE : How to get bandwidth peak out of PETSc log ?

HOUSSEN Franck Franck.Houssen at cea.fr
Fri Jun 21 07:10:05 CDT 2013


It seems performance is always the same, regardless of the number of threads.
I ran all these tests on my laptop (which is not a "great supercomputer", as Matt noticed and as I already knew!).

FH

~/Bandwidth>export OMP_NUM_THREADS=1
~/Bandwidth>./stream_c.exe > stream_1thread.log
~/Bandwidth>export OMP_NUM_THREADS=2
~/Bandwidth>./stream_c.exe > stream_2threads.log
~/Bandwidth>export OMP_NUM_THREADS=4
~/Bandwidth>./stream_c.exe > stream_4threads.log
~/Bandwidth>grep Triad *log
stream_1thread.log:Triad:          13057.6     0.018427     0.018380     0.018539
stream_2threads.log:Triad:          13044.9     0.018496     0.018398     0.018936
stream_4threads.log:Triad:          13049.1     0.018449     0.018392     0.018561

________________________________________
From: Jed Brown [five9a2 at gmail.com] on behalf of Jed Brown [jedbrown at mcs.anl.gov]
Sent: Friday, June 21, 2013 12:41
To: HOUSSEN Franck; Barry Smith
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] RE :  How to get bandwidth peak out of PETSc log ?

HOUSSEN Franck <Franck.Houssen at cea.fr> writes:

> MatMult            13216 1.0 5.4614e+01 1.0 5.52e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21 56  0  0  0  21 56  0  0  0  1010

Much of your time is spent here, and MatMult is bandwidth limited.  It needs 6
bytes per flop (plus a little, if the vector is perfectly reused), so the
1010 Mflop/s in the last column corresponds to about 6 GB/s.
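The conversion from the log line to a bandwidth figure can be sketched as follows. This is only a rough estimate: it assumes the last column of the event line is the flop rate in Mflop/s and uses the 6 bytes per flop quoted above, ignoring the small extra vector traffic. The function name `matmult_bandwidth_gbs` is made up for illustration.

```python
# Assumption: about 6 bytes of memory traffic per flop for MatMult,
# as stated in the message (the small vector-traffic term is ignored).
BYTES_PER_FLOP = 6.0


def matmult_bandwidth_gbs(log_line, bytes_per_flop=BYTES_PER_FLOP):
    """Estimate achieved memory bandwidth (GB/s) from a PETSc log event line.

    The last whitespace-separated field is taken to be the event's
    flop rate in Mflop/s.
    """
    mflops = float(log_line.split()[-1])
    return mflops * 1e6 * bytes_per_flop / 1e9


# The single-process MatMult line quoted in this message.
line = ("MatMult            13216 1.0 5.4614e+01 1.0 5.52e+10 1.0 "
        "0.0e+00 0.0e+00 0.0e+00 21 56  0  0  0  21 56  0  0  0  1010")

print(f"{matmult_bandwidth_gbs(line):.2f} GB/s")
```

As a cross-check, 5.52e+10 flops in 5.4614e+01 seconds is also about 1010 Mflop/s, which matches the last column.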

> MatMult            13550 1.0 3.8204e+01 1.0 2.85e+10 1.0 2.7e+04 3.5e+04 0.0e+00 23 55 79 93  0  23 55 79 93  0  1494

Here, with two processes, you have about 9 GB/s (1494 Mflop/s at 6 bytes per flop).

Was your STREAM test (getting 11 GB/s) using multiple threads/processes?
Can you send STREAM results for one and for two threads?  50% of STREAM
is not very good (though it's actually the best you can do on some funny
architectures); 70-85% is what we expect.
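To compare the MatMult rates with STREAM, one possible back-of-the-envelope calculation is sketched below. It assumes the first number on each Triad line at the top of this message is the best rate in MB/s, reuses the 6 bytes per flop estimate, and pairs the two-process run with the two-thread STREAM result; that pairing and the helper name `fraction_of_stream` are choices made here for illustration only.

```python
def fraction_of_stream(mflops, triad_mbs, bytes_per_flop=6.0):
    """Return estimated MatMult bandwidth as a fraction of the STREAM Triad rate.

    mflops    : MatMult flop rate from the PETSc log (Mflop/s)
    triad_mbs : STREAM Triad best rate (MB/s)
    """
    achieved_mbs = mflops * bytes_per_flop  # Mflop/s * bytes/flop = MB/s
    return achieved_mbs / triad_mbs


# (MatMult Mflop/s, Triad MB/s) taken from the figures in this message.
runs = {
    "1 process": (1010.0, 13057.6),
    "2 processes": (1494.0, 13044.9),
}

for label, (mflops, triad) in runs.items():
    pct = 100.0 * fraction_of_stream(mflops, triad)
    print(f"{label}: {pct:.0f}% of STREAM Triad")
```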

If you're getting a low fraction of peak in MatMult, try reordering your
matrix to reduce its bandwidth (how far the nonzeros lie from the
diagonal), which improves vector reuse.  You can use MatGetOrdering with
RCM for this.
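To illustrate what an RCM (reverse Cuthill-McKee) reordering does to matrix bandwidth, here is a small pure-Python sketch on a toy sparsity pattern. It is not PETSc's implementation and does not call PETSc; the adjacency-dict representation and the function names are assumptions made for this example.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest |i - j| over all nonzeros when vertices are numbered by `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """Return an RCM ordering for a symmetric sparsity pattern.

    `adj` maps each row index to the set of column indices of its
    off-diagonal nonzeros.
    """
    visited, order = set(), []
    # Start each connected component from a vertex of minimum degree.
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting neighbours by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda x: len(adj[x])):
                visited.add(w)
                queue.append(w)
    # Reversing the Cuthill-McKee ordering gives RCM.
    return order[::-1]


# Toy pattern: a chain 0-3-1-4-2 stored with a scrambled numbering.
adj = {0: {3}, 1: {3, 4}, 2: {4}, 3: {0, 1}, 4: {1, 2}}

natural = list(range(5))
rcm = reverse_cuthill_mckee(adj)
print("bandwidth before:", bandwidth(adj, natural))
print("bandwidth after: ", bandwidth(adj, rcm))
```

On this toy chain the reordering brings every nonzero next to the diagonal. For a real matrix the gain depends on the sparsity pattern.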
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stream_1thread.log
Type: text/x-log
Size: 1639 bytes
Desc: stream_1thread.log
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130621/cae9ae04/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stream_2threads.log
Type: text/x-log
Size: 1639 bytes
Desc: stream_2threads.log
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130621/cae9ae04/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stream_4threads.log
Type: text/x-log
Size: 1639 bytes
Desc: stream_4threads.log
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130621/cae9ae04/attachment-0002.bin>