[petsc-dev] Status of pthreads and OpenMP support

Shri abhyshr at mcs.anl.gov
Thu Oct 25 20:24:17 CDT 2012


John, Dave,
   As you've found out through your experimentation with PETSc's threading interface, there are still many improvements we need to make before the threading support is stable. Your input certainly helps us improve the functionality. Now that I have some time, I'll look at the logs you've sent and get back to you.

Thanks,
Shri

On Oct 25, 2012, at 8:05 PM, John Fettig wrote:

> Hi Dave,
> 
> On Thu, Oct 25, 2012 at 5:21 PM, Nystrom, William D <wdn at lanl.gov> wrote:
>> Hi John,
>> 
>> I have also been trying to test out the pthreads and openmp support in petsc-dev.  I've attached
>> a gzipped tarball of some of my test results.  I've been running the ex2.c example test problem
>> located in the petsc-dev/src/ksp/ksp/examples/tutorials directory.  I've been testing on a machine
>> where each node has two 8-core Sandy Bridge Xeons.  I've been identifying and reporting some
>> issues.  For instance, if I use one of my builds of petsc-dev that builds several external packages,
>> then I have a really slow run when using "-threadcomm_nthreads 1 -threadcomm_type pthread".
>> However, it seems to run fine when setting "-threadcomm_nthreads" to values from 2 to 16.  If I
>> build a version of petsc-dev that does not use any external packages, this slowness problem seems
>> to go away.  However, it comes back if I then add in the use of MKL in my petsc-dev build.  So
>> it looks like there might be some interaction problem when building petsc-dev to use MKL for
>> blas/lapack.
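
For reference, the runs described above can be reproduced with commands along these lines. The source path, the threadcomm options, and the jacobi preconditioner mentioned later in the thread come from the discussion itself; the `mpiexec` launcher name and the use of `-log_summary` for timing output are assumptions about the local setup.

```shell
# Build the tutorial example from the petsc-dev tree (path from the thread).
cd petsc-dev/src/ksp/ksp/examples/tutorials
make ex2

# One MPI process with a single pthread worker -- the configuration that
# is reported to be very slow when PETSc is built against MKL or other
# external packages:
mpiexec -n 1 ./ex2 -threadcomm_type pthread -threadcomm_nthreads 1 \
    -pc_type jacobi -log_summary

# The same run with more threads, which is reported to perform normally:
mpiexec -n 1 ./ex2 -threadcomm_type pthread -threadcomm_nthreads 16 \
    -pc_type jacobi -log_summary
```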
> 
> Thanks for the hint. When I build without the external libraries or
> MKL, I get a ~3x speedup on 6 threads with one MPI process, which is a
> huge improvement.  Perhaps this caveat should be added to the
> documentation for using pthreads.
> 
> I'm using Platform MPI which has an option to set CPU affinity for the
> MPI processes, and I used that.  I'll have to try setting thread
> affinity as well.
> 
> 
>> I am also only seeing a speedup of 5x or so when comparing 16 threads to one thread.  Not
>> sure why.  On this same machine, I was seeing a speedup of 14x or more using openmp and
>> 16 threads on a simple 2d explicit hydro code.  But maybe it is just the limitations of memory
>> bandwidth as you mentioned.
> 
> What I see in your results is about 7x speedup by using 16 threads.  I
> think you should get better results by running 8 threads with 2
> processes because the memory can be allocated on separate memory
> controllers, and the memory will be physically closer to the cores.
> I'm surprised that you get worse results.
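
The hybrid configuration suggested above can be sketched as follows. The option names are taken from earlier in the thread; the `mpiexec` launcher is an assumption, and the process-binding flags needed to actually pin each rank to its own socket vary by MPI implementation, so they are only noted in a comment rather than spelled out.

```shell
# Hybrid run: 2 MPI processes x 8 threads on a two-socket, 16-core node,
# so each process can allocate memory on its own memory controller.
# (Add your MPI launcher's socket-binding flag so each rank stays on one
# socket; the exact flag depends on the MPI implementation.)
mpiexec -n 2 ./ex2 -threadcomm_type pthread -threadcomm_nthreads 8 \
    -pc_type jacobi -log_summary
```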
> 
> It doesn't surprise me that an explicit code gets much better speedup.
> 
>> I also get about the same performance results on the ex2 problem when running it with
>> MPI alone, i.e., with 16 MPI processes.
>> 
>> So from my perspective, the new pthreads/openmp support is looking pretty good assuming
>> the issue with the MKL/external packages interaction can be fixed.
>> 
>> I was just using jacobi preconditioning for ex2.  I'm wondering if there are any other preconditioners
>> that might be multi-threaded.  Or maybe a polynomial preconditioner could work well for the
>> new pthreads/openmp support.
> 
> GAMG with SOR smoothing seems like a prime candidate for threading.  I
> wonder if anybody has worked on this yet?
> 
> Best regards,
> John



