[petsc-dev] new work on to start on threading and PETSc

Jed Brown jed at jedbrown.org
Sun May 25 19:39:23 CDT 2014


"Eller, Paul R" <eller3 at illinois.edu> writes:

> Thanks for the suggestions.  I have been reading over the PETSc
> threadcomm code and trying to understand what it is doing and I have a
> few questions.
>
> First, I was wondering if you had any suggestions on a
> debugging/profiler tool to use to help step through the parallel PETSc
> code and see what is happening.  I had used TotalView at a previous
> job but I haven't found a good replacement for that since going back
> to school last fall.

To be honest, when I need a debugger, I almost always use GDB.

> Regarding the pattern you suggested implementing, can you clarify how
> that would work in PETSc?  In particular, is the idea to allow the
> user to create threads, use those threads within PETSc, then return
> the threads to the user for future use?  

Exactly this.  I want the user to be very specific about what resources
are granted to PETSc.  Communicators provide this between processes, but
threads are more mobile and more nuanced so I think explicitly handing
threads over to PETSc is the right way to go.  A more automatic default
is okay as far as I'm concerned.

> I think I understand in general how to create and use a threadpool
> using pthreads (although I haven't worked with pthreads much in the
> past), but I am unsure how to create and use a threadpool with OpenMP
> and then have those threads stay active once the "omp parallel"
> region has completed.  When I have used OpenMP in the past, I
> generally tried to place the parallel pragmas around the largest
> chunk of code I could, such as having each thread work on separate
> independent iterations of a large time-consuming loop.  Alternatively,
> I would create multiple threads prior to an iterative loop, have each
> thread work on different parts of the arrays each iteration, and use
> synchronization at the end of each iteration to make sure all threads
> are at the same place.  I am trying to figure out how the PETSc code
> I need to develop compares to the OpenMP code I have written in the
> past.

I can see OpenMP being handled in two different ways.  The OpenMP
runtime usually manages its own thread pool.  When an "omp parallel"
block is entered, the threads in the thread pool are assigned to work on
the block.  In the existing threadcomm implementation, each kernel
launch has its own omp parallel region.  An alternative implementation
would be for the user to create a parallel region (at coarser
granularity) and hand off (some or all of) the threads to PETSc.

I think it's useful to have both, though the former ("omp parallel" in
each kernel launch) maps more directly onto the most common use of
OpenMP.