[petsc-dev] OpenMP/Vec
Lawrence Mitchell
lawrence.mitchell at ed.ac.uk
Wed Mar 7 10:45:50 CST 2012
On 07/03/2012 15:26, Jed Brown wrote:
> On Wed, Mar 7, 2012 at 09:12, Lawrence Mitchell
> <lawrence.mitchell at ed.ac.uk <mailto:lawrence.mitchell at ed.ac.uk>> wrote:
>
> Moving forward, I've done a bunch of this macroisation, which makes the
> code more PETSc-like.
>
> As you point out it's probably unwise to call omp_get_num_threads() on
> every loop iteration, so I went for the magic names in the end. A
> typical simple loop now looks like:
>
> VecOMPParallelBegin(xin, private(i) shared(x)); for (i=__start;
> i<__end;i++) x[i] = work(x[i]); VecOMPParallelEnd();
>
> In the --without-openmp case, this transforms into:
>
> do { PetscInt __start = 0, __end = xin->map->n; for ...; } while (0);
>
> Rather than adding an nthreads slot to vectors, I put it in the generic
> PetscObject struct -- we're threading matrices too.
>
>
> We are going with PetscLayout for distribution information with the
> pthreads implementations.
Ah, that's a better approach. So probably the recently created
seqpthread/tmap.c wants moving.
> We have also discussed making "thread communicator" information (how many
> threads and their affinities) an attribute of an MPI_Comm. I'd be
> interested to hear your opinions on that.
I'm not sure quite what that buys you, except in the case you describe below
where two third-party libraries have different threading models. In that
case stashing the threading information on one communicator per library
makes data management slightly easier, I guess. You'd probably still need
to manage transferring the data between layouts by hand though.
> All, should we consider moving all the "kernels" into their own
> functions and just doing parallel region dispatch differently for OpenMP
> and pthreads? I think that is the only reasonable way to have just one
> version of the code. If we committed to always using separate
> side-effect-free kernels, we might be able to reuse many of them with TBB
> and OpenCL as well.
It would probably be a good idea, even if only to cut down on typo-bugs. I
imagine, to take as an example VecMax_Seq it would look something like this:
VecMax_SeqKernel(PetscScalar *xx, PetscInt start, PetscInt end, PetscInt
*idx, PetscReal *z)
{
PetscInt i;
PetscInt j;
PetscReal max;
max = PetscRealPart(xx[start]);
j = start;
for ( i=start+1; i < end; i++ ) {
if (PetscRealPart(xx[i]) > max) {
max = PetscRealPart(xx[i]); j = i;
}
}
if (idx) *idx = j;
}
And then you have three (?) wrapper implementations:
VecMax_Seq(Vec xin, PetscInt *idx, PetscReal *z)
{
PetscScalar *xx;
PetscFunctionBegin;
VecGetArrayRead(xin, &xx);
VecMax_SeqKernel(xx, 0, xin->map->n, idx, z);
VecRestoreArrayRead(xin, &xx);
PetscFunctionReturn(0);
}
VecMax_SeqOMP(Vec xin, PetscInt *idx, PetscReal *z)
{
PetscScalar *xx;
PetscInt *jt;
PetscReal *zt;
PetscFunctionBegin;
// you could just call VecMax_Seq here if nthread == 1
VecGetArrayRead(xin, &xx);
PetscMalloc(xin->map->tmap->nthread * sizeof(PetscReal), &zt);
PetscMalloc(xin->map->tmap->nthread * sizeof(PetscInt), &jt);
VecOMPParallelBegin(xin, shared(xx, jt, zt));
PetscInt thread = PetscGetThreadNum();
zt[thread] = PETSC_MIN_REAL;
jt[thread] = -1;
VecMax_SeqKernel(xx + __start, __end, jt + thread, zt + thread);
VecOMPParallelEnd();
// do reduction to find max value/index
...;
PetscFunctionReturn(0);
}
And the Pthread version looks similar but has to wrap the kernel invocation
in another level as is currently done, I guess.
> Is there ever a case where we want to have _both_ OpenMP and Pthreads
> objects in the same application? Maybe, e.g. when servicing two
> multiphysics applications, one of which chooses OpenMP and one of which
> chooses Pthreads (perhaps on different communicators so we don't have to
> fight with over-subscription).
See above. I think that currently the OMP and pthread implementations would
play fine with one-another, modulo over-subscription of threads. I think
addressing the issue should probably be motivated by a concrete use case.
>
>
> I don't know how to write the correct tests for configure to probe for
> the existance of _Pragma, so that's still missing.
>
>
> I'll do it.
Thanks.
Lawrence
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the petsc-dev
mailing list