[petsc-dev] PETSc OpenMP benchmarking

Gerard Gorman g.gorman at imperial.ac.uk
Sat Mar 17 03:33:16 CDT 2012


Hi

We have profiled on Cray compute nodes with two 16-core AMD Opteron
2.3GHz Interlagos processors, using the same matrix but this time with
-ksp_type cg and -pc_type jacobi. Attached are the logs for the 32 MPI
process test and the 32 OpenMP thread test.
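
For readers less familiar with what -ksp_type cg with -pc_type jacobi does per iteration, here is a minimal pure-Python sketch of Jacobi-preconditioned CG. It is not PETSc's implementation: the 1D Laplacian test matrix, the function names, the tolerance and the unpreconditioned-norm stopping test are all assumptions made for illustration. The comments name the PETSc log events that roughly correspond to each step, to show that each iteration contains several vector operations besides the single MatMult.

```python
# Illustrative sketch only: Jacobi-preconditioned CG on a small 1D Laplacian.
# The matrix, names and stopping test are assumptions, not PETSc code.

def matvec(n, x):
    """Return A*x for the tridiagonal matrix with 2 on the diagonal, -1 off it."""
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    """Inner product; in parallel this is a reduction over all threads/processes."""
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(n, b, rtol=1.0e-6, max_it=1000):
    """Solve A*x = b with CG preconditioned by the inverse of diag(A)."""
    diag = [2.0] * n                              # diagonal of A
    x = [0.0] * n
    r = list(b)                                   # r = b - A*0
    z = [ri / di for ri, di in zip(r, diag)]      # PCApply (pointwise for Jacobi)
    p = list(z)
    rz = dot(r, z)                                # VecDot / VecTDot
    r0 = dot(r, r) ** 0.5                         # VecNorm
    for it in range(1, max_it + 1):
        Ap = matvec(n, p)                         # MatMult
        alpha = rz / dot(p, Ap)                   # VecDot / VecTDot
        x = [xi + alpha * pi for xi, pi in zip(x, p)]     # VecAXPY
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]    # VecAXPY
        if dot(r, r) ** 0.5 <= rtol * r0:         # VecNorm
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]  # PCApply
        rz_new = dot(r, z)                        # VecDot / VecTDot
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]      # VecAYPX
        rz = rz_new
    return x, max_it


n = 50
b = [1.0] * n
x, its = pcg_jacobi(n, b)
print("iterations:", its)
```

Every line in the loop other than the MatMult is either a reduction or a streaming vector update, so these are natural places to look when MatMult scales but the solve as a whole does not.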

Most of the time is in stage 2. As seen previously, MatMult is
performing well, but the overall performance in KSPSolve drops for
OpenMP. I have attached a plot of the ratio (hybrid MPI+OpenMP
time)/(pure OpenMP time), where all 32 cores are always used. The graph
shows that we always get better performance in MatMult with pure OpenMP,
but something additional in KSPSolve degrades the OpenMP performance.
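
The ratio plotted can be sketched as follows. This is only an illustration of the arithmetic: the function name and the timings are made-up placeholders, not values from the attached logs. Each configuration is keyed by (MPI processes, OpenMP threads per process), and the product must equal the number of cores in use.

```python
# Illustrative sketch: ratio of hybrid MPI+OpenMP time to pure OpenMP time.
# All timings below are placeholders, NOT measurements from the attached logs.

def hybrid_over_pure(times, total_cores=32):
    """Map each (nprocs, nthreads) configuration to time / pure-OpenMP time."""
    pure = times[(1, total_cores)]   # 1 MPI process, all cores as threads
    ratios = {}
    for (nprocs, nthreads), t in sorted(times.items()):
        if nprocs * nthreads != total_cores:
            raise ValueError("configuration does not use all cores")
        ratios[(nprocs, nthreads)] = t / pure
    return ratios


# Placeholder wall-clock times in seconds for one event (e.g. MatMult).
placeholder_times = {(32, 1): 3.0, (8, 4): 2.5, (1, 32): 2.0}

ratios = hybrid_over_pure(placeholder_times)
for (nprocs, nthreads), ratio in ratios.items():
    print(f"{nprocs:2d} MPI x {nthreads:2d} OpenMP: ratio = {ratio:.2f}")
```

With this definition, a ratio above 1 means pure OpenMP is faster for that event, and a ratio below 1 means the hybrid or pure MPI configuration is faster.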

So far we have profiled with oprofile, measuring the event
CPU_CLK_UNHALTED, but this has not revealed the bottleneck, so more
digging is required.

Any suggestions/comments gratefully received. 
 
Cheers
Gerard

Gerard Gorman emailed the following on 14/03/12 16:59:
> Hi
>
> Since Vec and most of Mat is now threaded we have started to do more
> detailed profiling. I'm posting these initial tasters from a two-socket
> Intel Core Bloomfield processor system (i.e. 8 cores) to stimulate
> discussion.
>
> The matrix comes from a 3D lock exchange problem discretised using a
> continuous Galerkin finite element formulation and has about 450k
> degrees of freedom.
>
> I have configured the simulator (Fluidity -
> http://amcg.ese.ic.ac.uk/Fluidity) to dump out PETSc matrices at each
> solve. These individual matrices are then solved using
> petsc-dev/src/ksp/ksp/examples/tests/ex6 compiled with GCC 4.6.3
> --with-debugging=0.
>
> The PETSc options are:
> -get_total_flops -pc_type gamg -ksp_type cg -ksp_rtol 1.0e-6 -log_summary
>
> The 3 log files attached are for OMP_NUM_THREADS=1, OMP_NUM_THREADS=8
> and non-threaded MPI run with 8 processes for comparison.
>
> This benchmark is interesting because it is a pressure matrix, which is
> really stiff, and it uses GAMG as a black box.
>
> Using xxdiff to compare the logs I think the interesting points are:
> - Overall OpenMP compares favourably with MPI.
> - OpenMP converged in 2 fewer iterations than MPI. Earlier I was
> expecting fewer iterations simply because of the absence of partitions
> to diminish the effectiveness of coarsening. I have not been following
> Mark's GAMG development but it looks like repartitioning is being used to
> get around that issue (?). However, the biggest plus is that, because
> Chebychev is used as a smoother (rather than something difficult to
> parallelise like SSOR), GAMG appears to scale pretty well when threaded
> with OpenMP.
> - Important operations like MatMult etc perform well.
> - From the summary, "mystage 1" is the main section where OMP appears to
> need more work. We suffer from operations such as MatPtAP and
> MatTrnMatMult for example which we have not got around to looking at yet.
>
> As this is a relatively small and boring UMA machine I have not bothered
> with scaling curves. We are setting the same benchmark up on 32-core
> Interlagos compute nodes at the moment - hopefully these will be ready
> by tomorrow.
>
> Comments welcome.
>
> Cheers
> Gerard
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pressure-matrix-cg-32mpi.dat
Type: application/x-ns-proxy-autoconfig
Size: 8592 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20120317/a028a704/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pressure-matrix-cg-jacobi-1mpi-32omp.dat
Type: application/x-ns-proxy-autoconfig
Size: 6267 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20120317/a028a704/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pressure-matrix-cg-hybrid_speedup.pdf
Type: application/pdf
Size: 16475 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20120317/a028a704/attachment.pdf>
