since developing object oriented software is so cumbersome in C and we are all resistant to doing it in C++

Sat Dec 5 16:25:58 CST 2009

On Sat, 5 Dec 2009 16:02:38 -0600, Matthew Knepley <knepley at gmail.com> wrote:
> I need to understand better. You are asking about the case where we have
> many GPUs and one CPU? If its always one or two GPUs per CPU I do not
> see the problem.

Barry initially proposed one Python thread per node, then distributing
the kernels over many CPU cores on that node, or to one or more GPUs.
With some abuse of terminology, let's call them all worker threads,
perhaps dozens if running on multicore CPUs, or hundreds/thousands when
on a GPU.  The physics, such as FEM integration, has to be done by those
worker threads.  But unless every thread is its own subdomain
(i.e. Block Jacobi/ASM with very small subdomains), we still need to
assemble a small number of matrices per node.  So we would need a
lock-free concurrent MatSetValues, otherwise we'll only scale to a few
worker threads before everything is blocked on MatSetValues.
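
To make the contention concrete, here is a minimal sketch in plain
Python (not PETSc, and not the lock-free MatSetValues itself).  It
swaps in a different technique: each worker integrates its elements
into a private buffer, and the buffers are merged once at the end, so
no lock is taken per insertion.  The 1D linear element, the round-robin
batching, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def element_contributions(e):
    """Hypothetical 1D linear element e of unit length: 2x2 local
    stiffness returned as (row, col, value) triplets."""
    dofs = (e, e + 1)
    local = ((1.0, -1.0), (-1.0, 1.0))
    return [(dofs[i], dofs[j], local[i][j])
            for i in range(2) for j in range(2)]


def worker(elements):
    """Integrate a batch of elements into a private buffer.
    No shared state is touched here, so no lock is needed."""
    buf = defaultdict(float)
    for e in elements:
        for i, j, v in element_contributions(e):
            buf[(i, j)] += v
    return buf


def assemble(n_elements, n_workers):
    # Round-robin batches: neighboring elements land on different
    # workers, so interior diagonal entries are shared between workers.
    batches = [range(w, n_elements, n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        buffers = list(pool.map(worker, batches))
    # Serial merge: the only place where entries that received
    # contributions from several workers are summed.
    A = defaultdict(float)
    for buf in buffers:
        for key, v in buf.items():
            A[key] += v
    return dict(A)


A = assemble(n_elements=8, n_workers=4)
```

The serial merge is the price of this approach; it only illustrates
where workers collide (every interior diagonal entry gets contributions
from two elements), which is the part a concurrent MatSetValues would
have to handle without blocking.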

> Hmm, still not quite getting this problem. We need concurrency on the
> GPU, but why would we need it on the CPU?

Only if we were doing real work on the many CPU cores per node.

> On the GPU, triangular solve will be just as crappy as it currently
> is, but will look even worse due to large number of cores.

It could be worse because a single GPU thread is likely slower than a
CPU core.

> It is not the only smoother. For instance, polynomial smoothers would
> be more concurrent.

Yup.
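
A small sketch of the difference, assuming a 1D Laplacian with stencil
[-1, 2, -1] and damped Jacobi as the simplest polynomial smoother (k
sweeps apply a degree-k polynomial in D^-1 A to the error).  The
function names and the choice omega = 2/3 are illustrative, not from
PETSc.  Every row of the Jacobi update is independent, while the
Gauss-Seidel sweep, which is a triangular solve, needs the
already-updated neighbor.

```python
def matvec(x):
    """y = A x for the stencil [-1, 2, -1] with zero boundary values.
    Each output row depends only on the old x, so rows can be
    computed concurrently."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def jacobi_smooth(x, b, sweeps, omega=2.0 / 3.0):
    """Damped Jacobi: x <- x + omega * D^-1 (b - A x), with D = 2 I.
    Only mat-vecs and pointwise updates, no sequential dependency."""
    for _ in range(sweeps):
        Ax = matvec(x)
        x = [x[i] + omega * (b[i] - Ax[i]) / 2.0 for i in range(len(x))]
    return x


def gauss_seidel_sweep(x, b):
    """One forward Gauss-Seidel sweep (a lower-triangular solve).
    Row i uses the value of x[i-1] updated in this same sweep,
    so the loop is inherently sequential."""
    x = list(x)
    n = len(x)
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = (b[i] + left + right) / 2.0
    return x


def norm(v):
    return sum(t * t for t in v) ** 0.5


n = 16
b = [0.0] * n                                   # exact solution is zero
x0 = [(-1.0) ** i for i in range(n)]            # oscillatory error
x_jacobi = jacobi_smooth(x0, b, sweeps=3)
x_gs = gauss_seidel_sweep(x0, b)
```

With b = 0 the iterate is the error itself, so norm(x_jacobi) compared
with norm(x0) shows how much of the oscillatory error the polynomial
smoother removes using nothing but concurrent row updates.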

> > I have trouble finding decent preconditioning algorithms suitable for
> > the fine granularity of GPUs.  Matt thinks we can get rid of all the
> > crappy sparse matrix kernels and precondition everything with FMM.
> >
> 
> That is definitely my view, or at least my goal. And I would say this,
> if we are just starting out on these things, I think it makes sense to
> do the home runs first. If we just try and reproduce things, people
> might say "That is nice, but I can already do that pretty well".

Agreed, but it's also important to have something good to offer people
who aren't ready to throw out everything they know and design a new
algorithm based on a radically different approach that may or may not be
any good for their physics.

Jed