[petsc-dev] OpenMP/Vec

Wed Mar 7 10:45:50 CST 2012

On 07/03/2012 15:26, Jed Brown wrote:
> On Wed, Mar 7, 2012 at 09:12, Lawrence Mitchell
> <lawrence.mitchell at ed.ac.uk <mailto:lawrence.mitchell at ed.ac.uk>> wrote:
> 
> Moving forward, I've done a bunch of this macroisation, which makes the
> code more PETSc-like.
> 
> As you point out it's probably unwise to call omp_get_num_threads() on
> every loop iteration, so I went for the magic names in the end.  A
> typical simple loop now looks like:
> 
> VecOMPParallelBegin(xin, private(i) shared(x)); for (i=__start;
> i<__end;i++) x[i] = work(x[i]); VecOMPParallelEnd();
> 
> In the --without-openmp case, this transforms into:
> 
> do { PetscInt __start = 0, __end = xin->map->n; for ...; } while (0);
> 
> Rather than adding an nthreads slot to vectors, I put it in the generic 
> PetscObject struct -- we're threading matrices too.
> 
> 
> We are going with PetscLayout for distribution information with the
> pthreads implementations.

Ah, that's a better approach.  So probably the recently created
seqpthread/tmap.c wants moving.

> We have also discussed making "thread communicator" information (how many
> threads and their affinities) an attribute of an MPI_Comm. I'd be
> interested to hear your opinions on that.

I'm not sure quite what that buys you, except in the case you describe below
where two third-party libraries have different threading models.  In that
case stashing the threading information on one communicator per library
makes data management slightly easier, I guess.  You'd probably still need
to manage transferring the data between layouts by hand though.

> All, should we consider moving all the "kernels" into their own
> functions and just doing parallel region dispatch differently for OpenMP
> and pthreads? I think that is the only reasonable way to have just one
> version of the code. If we committed to always using separate
> side-effect-free kernels, we might be able to reuse many of them with TBB
> and OpenCL as well.

It would probably be a good idea, even if only to cut down on typo-bugs.  I
imagine, to take as an example VecMax_Seq it would look something like this:

VecMax_SeqKernel(PetscScalar *xx, PetscInt start, PetscInt end, PetscInt
*idx, PetscReal *z)
{
  PetscInt i;
  PetscInt j;
  PetscReal max;
  max = PetscRealPart(xx[start]);
  j = start;
  for ( i=start+1; i < end; i++ ) {
    if (PetscRealPart(xx[i]) > max) {
      max = PetscRealPart(xx[i]); j = i;
    }
  }
  if (idx) *idx = j;
}

And then you have three (?) wrapper implementations:

VecMax_Seq(Vec xin, PetscInt *idx, PetscReal *z)
{
  PetscScalar *xx;
  PetscFunctionBegin;
  VecGetArrayRead(xin, &xx);
  VecMax_SeqKernel(xx, 0, xin->map->n, idx, z);
  VecRestoreArrayRead(xin, &xx);
  PetscFunctionReturn(0);
}

VecMax_SeqOMP(Vec xin, PetscInt *idx, PetscReal *z)
{
  PetscScalar *xx;
  PetscInt    *jt;
  PetscReal   *zt;
  PetscFunctionBegin;
  // you could just call VecMax_Seq here if nthread == 1
  VecGetArrayRead(xin, &xx);
  PetscMalloc(xin->map->tmap->nthread * sizeof(PetscReal), &zt);
  PetscMalloc(xin->map->tmap->nthread * sizeof(PetscInt), &jt);
  VecOMPParallelBegin(xin, shared(xx, jt, zt));
  PetscInt thread = PetscGetThreadNum();
  zt[thread] = PETSC_MIN_REAL;
  jt[thread] = -1;
  VecMax_SeqKernel(xx + __start, __end, jt + thread, zt + thread);
  VecOMPParallelEnd();
  // do reduction to find max value/index
  ...;
  PetscFunctionReturn(0);
}

And the Pthread version looks similar but has to wrap the kernel invocation
in another level as is currently done, I guess.

> Is there ever a case where we want to have _both_ OpenMP and Pthreads 
> objects in the same application? Maybe, e.g. when servicing two
> multiphysics applications, one of which chooses OpenMP and one of which
> chooses Pthreads (perhaps on different communicators so we don't have to
> fight with over-subscription).

See above.  I think that currently the OMP and pthread implementations would
play fine with one-another, modulo over-subscription of threads.  I think
addressing the issue should probably be motivated by a concrete use case.

> 
> 
> I don't know how to write the correct tests for configure to probe for
> the existance of _Pragma, so that's still missing.
> 
> 
> I'll do it.

Thanks.

Lawrence

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.