[petsc-dev] CUDA + OMP make error
Mark Adams
mfadams at lbl.gov
Thu Apr 16 09:11:22 CDT 2020
On Thu, Apr 16, 2020 at 9:31 AM Matthew Knepley <knepley at gmail.com> wrote:
> On Thu, Apr 16, 2020 at 8:42 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> Yea, GPU assembly would be great. I was figuring OMP might be simpler.
>>
>> As far as the interface, I am flexible; the simplest way to do it would
>> be to take an array of element matrices and a DMPlex and call
>> DMPlexMatSetClosure. You can see this code in
>> mark/feature-xgc-interface-rebase, at the bottom of
>> src/vec/vec/impls/seq/seqcuda/landau.cu.
>>
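>> Concretely, what I mean is roughly just this loop (untested sketch; the
>> element-matrix array and its block size are stand-ins for whatever the
>> Landau code actually produces):
>>
>>   #include <petscdmplex.h>
>>
>>   /* Sketch: assemble precomputed dense element matrices through
>>      DMPlexMatSetClosure. elemMats is assumed to hold one totDim x totDim
>>      block per cell. */
>>   PetscErrorCode AssembleElementMats(DM dm, Mat A, const PetscScalar *elemMats, PetscInt totDim)
>>   {
>>     PetscInt       cStart, cEnd, c;
>>     PetscErrorCode ierr;
>>
>>     PetscFunctionBegin;
>>     ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); /* the cells */
>>     for (c = cStart; c < cEnd; ++c) {
>>       /* NULL sections -> use the DM's default local/global sections */
>>       ierr = DMPlexMatSetClosure(dm, NULL, NULL, A, c, &elemMats[(c-cStart)*totDim*totDim], ADD_VALUES);CHKERRQ(ierr);
>>     }
>>     ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>     ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>     PetscFunctionReturn(0);
>>   }
>>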
>> I was shy about putting a version of DMPlexMatSetClosure in CUDA, but
>> maybe that is easier: just plow through it and cut the stuff that we don't
>> need. OMP broke because there are some temp arrays that Matt caches that
>> need to be "private" or dealt with in some way.
>>
>
> We should refactor so that all temp arrays are sized and constructed up
> front, and then the work is done in an internal function which is passed
> those arrays. I tried to do this, but might have crapped out here. Then
> you can just call the internal function directly with your arrays.
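>
> The shape of that refactor would be something like this (hypothetical names,
> just to illustrate; not the actual Plex internals):
>
>   /* Sketch: size and build all scratch up front, then do the work in an
>      internal routine that only uses what it is handed, so a caller can
>      pass its own (e.g. per-thread) arrays. */
>   typedef struct {
>     PetscScalar *values;  /* closure values, sized up front */
>     PetscInt    *indices; /* index scratch, sized up front  */
>   } ClosureScratch;        /* hypothetical type */
>
>   static PetscErrorCode MatSetClosure_Private(DM dm, Mat A, PetscInt cell,
>                                               const PetscScalar vals[], ClosureScratch *scratch)
>   {
>     /* no allocation and no caching here: pure index computation + MatSetValues */
>     return 0;
>   }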
>
Just to be clear, we have to copy the code to a .cu file and declare
everything as a device method (__global__), right?
Or we write a batched version of DMPlexMatSetClosure, and when it gets down to
some kernel like MatSetValues we, for example, move the Mat pointers to
the GPU, copy the element matrices to the device, launch (from a .cu file)
a kernel that calls a __global__ version of MatSetValues_SeqAIJ (with the
mallocs stripped out), and then copy the Mat pointers back to the CPU. All this
copy stuff is usually done with a shadow copy of the object, but that is
complicated by cusparse matrices which, from what I can tell, use a
different, cusparse-friendly device Mat object.
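
For concreteness, the kind of device kernel I am imagining is roughly this
(very rough sketch: it assumes the CSR arrays ai/aj/aa are already on the
device, the sparsity pattern is fully preallocated so there are no mallocs,
PetscScalar is real, and the device has native double atomicAdd):

  #include <petscsys.h> /* PetscInt, PetscScalar */

  /* Rough sketch of a batched, malloc-free "MatSetValues" into a SeqAIJ-like
     CSR matrix resident on the device. rows[] holds the nDof global row/column
     indices of each of the nElem (square) element matrices. */
  __global__ void AddElemMatsCSR(PetscInt nElem, PetscInt nDof,
                                 const PetscInt *rows,        /* nElem*nDof      */
                                 const PetscScalar *elemMats, /* nElem*nDof*nDof */
                                 const PetscInt *ai, const PetscInt *aj, PetscScalar *aa)
  {
    const PetscInt e = blockIdx.x;          /* one thread block per element matrix */
    if (e >= nElem) return;
    for (PetscInt idx = threadIdx.x; idx < nDof*nDof; idx += blockDim.x) {
      const PetscInt    i   = idx / nDof, j = idx % nDof;
      const PetscInt    row = rows[e*nDof + i];
      const PetscInt    col = rows[e*nDof + j];
      const PetscScalar v   = elemMats[(size_t)e*nDof*nDof + idx];
      /* search the preallocated row for the column; atomicAdd because different
         element matrices (different blocks) can hit the same global entry */
      for (PetscInt k = ai[row]; k < ai[row+1]; ++k) {
        if (aj[k] == col) { atomicAdd(&aa[k], v); break; }
      }
    }
  }

launched from the .cu file as, say, AddElemMatsCSR<<<nElem,128>>>(...) after
the usual cudaMemcpy of the element matrices and index arrays.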
>
> Matt
>
>
>> Coloring is not attractive to me because GPUs demand a lot of parallelism,
>> and this serial (velocity-space) solver would be embedded in a full 3D code
>> that does not use a huge amount of MPI parallelism. For instance, if the app
>> code were to use 6 cores per GPU (or 7 max on SUMMIT, or even 4x that with
>> hardware threads), then *I could imagine* there would be enough parallelism,
>> with coloring, to fuse the element construction and assembly, that is, to
>> assemble each element matrix right after it is created. That would be great
>> in terms of not storing all these matrices and then assembling them all at
>> once. The app that I am targeting does not use that much MPI parallelism,
>> though. But we could explore that coloring space, and my mental model could
>> be inaccurate. (Note, I did recently add 8x more parallelism to my code this
>> week and got a 25% speedup, using one whole GPU.)
>>
>> Or if you have some sort of lower-level synchronization that could allow
>> for fusing the assembly with the element creation then, by all means,
>> we can explore that.
>>
>> I'd be happy to work with you on this.
>>
>> Thanks,
>> Mark
>>
>> On Mon, Apr 13, 2020 at 7:08 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>> Probably matrix assembly on the GPU is more important. Do you have an
>>> example for me to play with, to see what GPU interface we should have?
>>> --Junchao Zhang
>>>
>>> On Mon, Apr 13, 2020 at 5:44 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> I was looking into assembling matrices with threads. I have a coloring
>>>> to avoid conflicts.
>>>>
>>>> Turning off all the logging seems way overkill; for methods that can
>>>> get called in a thread we could use PETSC_HAVE_THREADSAFETY to protect
>>>> the logging functions. Then one can still get timings for the whole
>>>> assembly process, just not for MatSetValues. Few people are going to do
>>>> this, so I don't think it will be a time sink, and if it is we just revert
>>>> to saying 'turn logging off'. I don't see a good argument for
>>>> insisting on turning off logging (it is pretty important) if we just say
>>>> that we are going to protect methods as needed.
>>>>
>>>> It is not a big deal; I am just exploring this idea. It is such a basic
>>>> concept in shared-memory sparse linear algebra that it seems like a good
>>>> thing to be able to support, and to have an example showing that we can
>>>> assemble matrices with threads (not that it is a great idea). We have all
>>>> the tools (eg, coloring methods); it is just a matter of protecting a few
>>>> methods. I use DMPlexMatSetClosure instead of MatSetValues, and this is
>>>> where I die now with non-thread-safe code. We have an idea, from Jed, on
>>>> how to fix it.
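>>>>
>>>> The loop I have in mind is just the standard colored assembly, something
>>>> like this (sketch; it assumes cellsOfColor[] partitions the cells so that
>>>> no two cells of the same color touch the same matrix rows, and today it is
>>>> exactly the closure/insertion call that is not thread safe):
>>>>
>>>>   /* Sketch: threaded assembly over a conflict-free coloring of the cells.
>>>>      Within one color no two cells share matrix rows, so the insertions do
>>>>      not race, provided the routines called here are themselves thread safe. */
>>>>   for (PetscInt color = 0; color < nColors; ++color) {
>>>>     #pragma omp parallel for
>>>>     for (PetscInt k = 0; k < nCellsOfColor[color]; ++k) {
>>>>       const PetscInt c = cellsOfColor[color][k];
>>>>       /* error checking omitted: CHKERRQ cannot return from inside the parallel loop */
>>>>       DMPlexMatSetClosure(dm, NULL, NULL, A, c, &elemMats[c*totDim*totDim], ADD_VALUES);
>>>>     }
>>>>   }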
>>>>
>>>> Anyway, thanks for your help, but I think we should hold off on doing
>>>> anything until we have some consensus that it is worth putting effort
>>>> into a thread-safe PETSc that can support OMP matrix assembly, with a
>>>> nice compact example.
>>>>
>>>> Thanks again,
>>>> Mark
>>>>
>>>> On Mon, Apr 13, 2020 at 5:44 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>> Mark,
>>>>> I saw you had "--with-threadsafety --with-log=0". Do you really want
>>>>> to call petsc from multiple threads (in contrast to letting petsc call
>>>>> other multithreaded libraries, e.g., BLAS)? If not, you can
>>>>> drop --with-threadsafety.
>>>>> I have https://gitlab.com/petsc/petsc/-/merge_requests/2714 that
>>>>> should fix your original compilation errors.
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>> On Mon, Apr 13, 2020 at 2:07 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
>>>>>>
>>>>>> and I see this on my Mac:
>>>>>>
>>>>>> 14:23 1 mark/feature-xgc-interface-rebase *= ~/Codes/petsc$ ../arch-macosx-gnu-O-omp.py
>>>>>>
>>>>>> ===============================================================================
>>>>>>              Configuring PETSc to compile on your system
>>>>>> ===============================================================================
>>>>>> ===============================================================================
>>>>>>   Warning: PETSC_ARCH from environment does not match command-line or name of script.
>>>>>>   Warning: Using from command-line or name of script: arch-macosx-gnu-O-omp,
>>>>>>            ignoring environment: arch-macosx-gnu-g
>>>>>> ===============================================================================
>>>>>> TESTING: configureLibraryOptions from PETSc.options.libraryOptions(config/PETSc/options/libraryOptions.py:37)
>>>>>> *******************************************************************************
>>>>>>          UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
>>>>>> -------------------------------------------------------------------------------
>>>>>> Must use --with-log=0 with --with-threadsafety
>>>>>> *******************************************************************************
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 13, 2020 at 2:54 PM Junchao Zhang <
>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 13, 2020 at 12:06 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>> BTW, I can build on SUMMIT with logging and OMP, apparently. I also
>>>>>>>> seem to be able to build with debugging. Neither of which is allowed
>>>>>>>> according to the docs. I am puzzled.
>>>>>>>>
>>>>>>> What are "the docs"?
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Apr 13, 2020 at 12:05 PM Mark Adams <mfadams at lbl.gov>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I think the problem is that you have to turn off logging with
>>>>>>>>> OpenMP, and the (newish) GPU timing code did not protect its timers.
>>>>>>>>>
>>>>>>>>> I don't see a good reason to require that logging be turned off with
>>>>>>>>> OMP. We could use PETSC_HAVE_THREADSAFETY to protect the log calls that
>>>>>>>>> we care about (eg, in MatSetValues), and as users discover more things
>>>>>>>>> that they want to call in an OMP thread block, tell them to turn logging
>>>>>>>>> off and we will fix it when we can.
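>>>>>>>>>
>>>>>>>>> Something like this is all I mean by "protect" (a sketch, not the real
>>>>>>>>> petsclog internals): keep logging on and just serialize the few log
>>>>>>>>> calls that are reachable from inside a thread (nz here stands for
>>>>>>>>> whatever the routine counts):
>>>>>>>>>
>>>>>>>>>   #if defined(PETSC_HAVE_THREADSAFETY)
>>>>>>>>>     #pragma omp critical (petsclog)
>>>>>>>>>   #endif
>>>>>>>>>   {
>>>>>>>>>     PetscLogFlops(2.0*nz); /* return code ignored: cannot branch out of a critical section */
>>>>>>>>>   }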
>>>>>>>>>
>>>>>>>>> Any thoughts on the idea of letting users keep logging with openmp?
>>>>>>>>>
>>>>>>>>> On Mon, Apr 13, 2020 at 11:40 AM Junchao Zhang <
>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes. Looks like we need to include petsclog.h. I don't know why OMP
>>>>>>>>>> triggered the error.
>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 13, 2020 at 9:59 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Should I do an MR to fix this?
>>>>>>>>>>>
>>>>>>>>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>