[petsc-dev] new work on to start on threading and PETSc

Barry Smith bsmith at mcs.anl.gov
Sun May 25 20:29:52 CDT 2014


  Jed,

   So the user code needs to explicitly open the parallel pragma first and then make a series of PETSc calls? Essentially they could often call PetscInitialize() and then have this code fragment, containing essentially all of main, inside the else branch?
   
   Thanks

    Barry

This is, sadly, a lot like the HMPI stuff that I recently took out of PETSc :-) where the other MPI “threads” were hidden in PetscInitialize() and then used by the “main” program after PetscInitialize().



On May 25, 2014, at 8:23 PM, Jed Brown <jed at jedbrown.org> wrote:

> Jed Brown <jed at jedbrown.org> writes:
>> I can see OpenMP being handled in two different ways.  The OpenMP
>> runtime usually manages its own thread pool.  When an "omp parallel"
>> block is entered, the threads in the thread pool are assigned to work on
>> the block.  In the existing threadcomm implementation, each kernel
>> launch has its own omp parallel region.  An alternative implementation
>> would be for the user to create a parallel region (at coarser
>> granularity) and hand off (some or all of) the threads to PETSc.
> 
> This may need clarification.  In the existing implementation, we have:
> 
>  PetscErrorCode PetscThreadCommRunKernel_OpenMP(PetscThreadComm tcomm,PetscThreadCommJobCtx job)
>  {
>    PetscInt        trank=0;
> 
>    PetscFunctionBegin;
>  #pragma omp parallel num_threads(tcomm->nworkThreads) shared(job) private(trank)
>    {
>      trank = omp_get_thread_num();
>      PetscRunKernel(trank,job->nargs,job);
>      job->job_status[trank] = THREAD_JOB_COMPLETED;
>    }
>    PetscFunctionReturn(0);
>  }
> 
> 
> so PETSc uses the OpenMP runtime in a very conventional way.  In the
> alternative implementation, which I'll call the "pthread model", the
> user creates the threads up-front and then does lots of things with
> them.  You can use the "pthread model" with threads obtained from
> OpenMP.  It would look something like this:
> 
>   #pragma omp parallel
>     {
>       int trank = omp_get_thread_num();
>       if (trank) {
>         // This thread is joining the pool as a worker.
>         // This function will not return until the thread is released from the pool.
>         PetscThreadPoolJoin(pool);
>       } else {
>         // Make sure all worker threads have joined.
>         PetscThreadPoolWait(pool, omp_get_num_threads());
> 
>         // Use the workers one or more times
>         PetscThreadCommRunKernel(...);
>         KSPSolve(...);             // contains many calls to PetscThreadCommRunKernel()
> 
>         PetscThreadPoolReturn(pool); // release the worker threads waiting in PetscThreadPoolJoin
>       }
>     }
> 
> 
> Of course the worker threads could have been created many other ways.
> PETSc could create them itself and keep them alive until PetscFinalize,
> the user could create them using pthread_create, C11 thrd_create, etc.
> 
> 
> Note that error handling with OpenMP is messy because you can't
> early-return from within an omp parallel block (and because error
> conditions with threads are fundamentally hard).



