[petsc-dev] CUDA + OMP make error

Mark Adams mfadams at lbl.gov
Thu Apr 16 07:42:11 CDT 2020


Yea, GPU assembly would be great. I was figuring OMP might be simpler.

As far as the interface, I am flexible; the simplest way to do it would be
to take an array of element matrices and a DMPlex and call
DMPlexMatSetClosure. You can see this code in
mark/feature-xgc-interface-rebase, at the bottom of
src/vec/vec/impls/seq/seqcuda/landau.cu.

I was shy about putting a version of DMPlexMatSetClosure in CUDA, but maybe
that is easier: just plow through it and cut the stuff we don't need. OMP
broke because there are some temp arrays that Matt caches that need to be
made "private" or dealt with in some way.

Coloring is not attractive to me because GPUs demand a lot of parallelism,
and this serial (velocity space) solver would be embedded in a full 3D code
that does not use a huge amount of MPI parallelism. For instance, if the
app code were to use 6 cores per GPU (or 7 max on SUMMIT, or even 4x that
with hardware threads), then *I could imagine* there would be enough
parallelism, with coloring, to fuse the element construction and assembly,
that is, assemble each element matrix right after it is created. That would
be great in terms of not storing all these matrices and then assembling
them all at once. The app that I am targeting does not use that much MPI
parallelism, though. But we could explore that coloring space, and my
mental model could be inaccurate. (Note: I did recently add 8x more
parallelism to my code this week and got a 25% speedup, using one whole
GPU.)

Or if you have some sort of lower-level synchronization that could allow
for fusing the assembly with the element creation, then, by all means,
we can explore that.

I'd be happy to work with you on this.

Thanks,
Mark

On Mon, Apr 13, 2020 at 7:08 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> Probably matrix assembly on GPU is more important. Do you have an example
> for me to play to see what GPU interface we should have?
> --Junchao Zhang
>
> On Mon, Apr 13, 2020 at 5:44 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> I was looking into assembling matrices with threads. I have a coloring to
>> avoid conflicts.
>>
>> Turning off all the logging seems way overkill; for methods that can
>> get called in a thread we could use PETSC_HAVE_THREADSAFETY (thingy)
>> to protect the logging functions. So one can still get timings for the whole
>> assembly process, just not for MatSetValues. Few people are going to do
>> this. I don't think it will be a time sink, and if it is we just revert
>> to saying 'turn logging off'. I don't see a good argument for
>> insisting on turning off logging (it is pretty important) if we just say
>> that we are going to protect methods as needed.
>>
>> It is not a big deal, I am just exploring this idea. It is such a basic
>> concept in shared memory sparse linear algebra that it seems like a good
>> thing to be able to support, and to have an example to show we can assemble
>> matrices in threads (not that it is a great idea). We have all the tools
>> (eg, coloring methods); it is just a matter of protecting a few
>> methods. I use DMPlexMatSetClosure instead of MatSetValues, and this is where
>> I die now with non-thread-safe code. We have an idea, from Jed, on how to
>> fix it.
>>
>> Anyway, thanks for your help, but I think we should hold off on doing
>> anything until we have some consensus that this would be a good idea to put
>> some effort into getting a thread safe PETSc that can support OMP matrix
>> assembly with a nice compact example.
>>
>> Thanks again,
>> Mark
>>
>> On Mon, Apr 13, 2020 at 5:44 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>> Mark,
>>>  I saw you had "--with-threadsafety --with-log=0".  Do you really want
>>> to call PETSc from multiple threads (in contrast to letting PETSc call
>>> other libraries, e.g., BLAS, to do multithreading)?  If not, you can
>>> drop --with-threadsafety.
>>>  I have https://gitlab.com/petsc/petsc/-/merge_requests/2714 that
>>> should fix your original compilation errors.
>>>
>>> --Junchao Zhang
>>>
>>> On Mon, Apr 13, 2020 at 2:07 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
>>>>
>>>> and I see this on my Mac:
>>>>
>>>> 14:23 1 mark/feature-xgc-interface-rebase *= ~/Codes/petsc$
>>>> ../arch-macosx-gnu-O-omp.py
>>>> ===============================================================================
>>>>              Configuring PETSc to compile on your system
>>>> ===============================================================================
>>>> ===============================================================================
>>>>   Warning: PETSC_ARCH from environment does not match command-line or
>>>>   name of script.
>>>>   Warning: Using from command-line or name of script: arch-macosx-gnu-O-omp,
>>>>   ignoring environment: arch-macosx-gnu-g
>>>> ===============================================================================
>>>>   TESTING: configureLibraryOptions from
>>>>   PETSc.options.libraryOptions(config/PETSc/options/libraryOptions.py:37)
>>>> *******************************************************************************
>>>>   UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for details):
>>>> -------------------------------------------------------------------------------
>>>> Must use --with-log=0 with --with-threadsafety
>>>> *******************************************************************************
>>>>
>>>>
>>>> On Mon, Apr 13, 2020 at 2:54 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 13, 2020 at 12:06 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> BTW, I can build on SUMMIT with logging and OMP, apparently. I also
>>>>>> seem to be able to build with debugging. Both of which are not allowed
>>>>>> according to the docs. I am puzzled.
>>>>>>
>>>>>  What are "the docs"?
>>>>>
>>>>>>
>>>>>> On Mon, Apr 13, 2020 at 12:05 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>
>>>>>>> I think the problem is that you have to turn off logging with openmp
>>>>>>> and the (newish) GPU timers did not protect their timers.
>>>>>>>
>>>>>>> I don't see a good reason to require logging be turned off with OMP.
>>>>>>> We could use PETSC_HAVE_THREADSAFETY to protect logs that we care about
>>>>>>> (eg, in MatSetValues) and as users discover more things that they want to
>>>>>>> call in an OMP thread block, then tell them to turn logging off and we will
>>>>>>> fix it when we can.
>>>>>>>
>>>>>>> Any thoughts on the idea of letting users keep logging with openmp?
>>>>>>>
>>>>>>> On Mon, Apr 13, 2020 at 11:40 AM Junchao Zhang <
>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes. Looks like we need to include petsclog.h. I don't know why OMP
>>>>>>>> triggered the error.
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Apr 13, 2020 at 9:59 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>
>>>>>>>>> Should I do an MR to fix this?
>>>>>>>>>
>>>>>>>>

