[petsc-dev] Memory problem with OpenMP and Fieldsplit sub solvers

Mark Adams mfadams at lbl.gov
Fri Jan 22 17:49:12 CST 2021


Satish, can you tell me how I might configure SuperLU_dist without this _OPENMP?
Thanks,
Mark

On Thu, Jan 21, 2021 at 11:57 PM Xiaoye S. Li <xsli at lbl.gov> wrote:

> All the OpenMP calls are surrounded by
>
> #ifdef _OPENMP
> ...
> #endif
>
> You can disable OpenMP during the CMake installation with the following:
>     -Denable_openmp=FALSE
> (the default is true)
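>
> For illustration only (this is not SuperLU source, and the function is made
> up), the guard pattern above means code like the following builds with or
> without OpenMP, so a build configured with -Denable_openmp=FALSE has no
> OpenMP dependence at all:
>
> #ifdef _OPENMP
> #include <omp.h>
> #endif
>
> /* returns the number of threads used; 1 in a non-OpenMP build */
> int scale(double *x, int n, double a)
> {
>   int i, nthreads = 1;
> #ifdef _OPENMP
>   nthreads = omp_get_max_threads();  /* only compiled in an OpenMP build */
> #pragma omp parallel for
> #endif
>   for (i = 0; i < n; i++) x[i] *= a;
>   return nthreads;
> }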
>
> (I think Satish knows how to do this with PETSc installation)
>
> -------
> The main reason to use mixed MPI & OpenMP is lower memory consumption
> compared to pure MPI. Timewise it is probably only slightly faster. (I
> think that's the case with many codes.)
>
>
> Sherry
>
> On Thu, Jan 21, 2021 at 7:20 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>>
>>
>> On Thu, Jan 21, 2021 at 10:16 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>
>>> On Jan 21, 2021, at 9:11 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> I have tried it and it hangs, but that is expected. This is not
>>> something she has prepared for.
>>>
>>> I am working with Sherry on it.
>>>
>>> And she is fine with just one thread, and suggests using one if SuperLU is
>>> itself called from inside a thread.
>>>
>>> Now that I think about it, I don't understand why she needs OpenMP if
>>> she can live with OMP_NUM_THREADS=1.
>>>
>>>
>>>  It is very possible it was just a coding decision by one of her
>>> students, and with a few ifdefs in her code she would not need the OpenMP,
>>> but I don't have the time or energy to check her code and design decisions.
>>>
>>
>> Oh yeah, there are OMP calls like omp_get_num_threads() that need something.
>> There is probably an omp1.h file somewhere in the world, like our serial MPI.
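>>
>> A minimal sketch of what such a hypothetical omp1.h could contain (in the
>> spirit of PETSc's MPIUNI: stub out the handful of omp_* calls a package uses
>> so it builds and runs with exactly one "thread" when OpenMP is absent):
>>
>> #ifndef _OPENMP
>> static inline int  omp_get_num_threads(void)  { return 1; }
>> static inline int  omp_get_max_threads(void)  { return 1; }
>> static inline int  omp_get_thread_num(void)   { return 0; }
>> static inline void omp_set_num_threads(int n) { (void)n; }
>> #endif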
>>
>>
>>>
>>>   Barry
>>>
>>>
>>> Mark
>>>
>>>
>>>
>>> On Thu, Jan 21, 2021 at 9:30 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>>
>>>> On Jan 21, 2021, at 5:37 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>> This did not work. I verified that MPI_Init_thread is being called
>>>> correctly and that MPI reports that it supports the highest level of
>>>> thread safety (MPI_THREAD_MULTIPLE).
>>>>
>>>> I am going to ask ORNL.
>>>>
>>>> And if I use:
>>>>
>>>> -fieldsplit_i1_ksp_norm_type none
>>>> -fieldsplit_i1_ksp_max_it 300
>>>>
>>>> for all 9 "i" variables, I can run normal iterations on the 10th
>>>> variable, in a 10 species problem, and it works perfectly with 10 threads.
>>>>
>>>> So the problem is definitely that VecNorm is not thread safe.
>>>>
>>>> And I want to call SuperLU_dist, which uses threads, but I don't want
>>>> SuperLU to start using threads. Is there a way to tell SuperLU that there
>>>> are no threads but have PETSc use them?
>>>>
>>>>
>>>>   My interpretation and Satish's for many years is that SuperLU_DIST
>>>> has to be built with and use OpenMP in order to work with CUDA.
>>>>
>>>>   def formCMakeConfigureArgs(self):
>>>>     args = config.package.CMakePackage.formCMakeConfigureArgs(self)
>>>>     if self.openmp.found:
>>>>       self.usesopenmp = 'yes'
>>>>     else:
>>>>       args.append('-DCMAKE_DISABLE_FIND_PACKAGE_OpenMP=TRUE')
>>>>     if self.cuda.found:
>>>>       if not self.openmp.found:
>>>>         raise RuntimeError('SuperLU_DIST GPU code currently requires OpenMP. Use --with-openmp=1')
>>>>
>>>> But this could be ok. You use OpenMP and then it uses OpenMP
>>>> internally, each doing their own business (what could go wrong :-)).
>>>>
>>>> Have you tried it?
>>>>
>>>>   Barry
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> On Thu, Jan 21, 2021 at 5:19 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>>> OK, the problem is probably:
>>>>>
>>>>> PetscMPIInt PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_FUNNELED;
>>>>>
>>>>> There is an example that sets:
>>>>>
>>>>> PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;
>>>>>
>>>>> This is what I need.
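>>>>>
>>>>> A minimal sketch of that approach (set the PETSc global named above before
>>>>> PetscInitialize(), which is where MPI_Init_thread() gets called; the rest
>>>>> of the program is placeholder):
>>>>>
>>>>> #include <petscsys.h>
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>   PetscErrorCode ierr;
>>>>>
>>>>>   /* must be set before PetscInitialize() */
>>>>>   PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;
>>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>>>>   /* ... threaded fieldsplit solves ... */
>>>>>   ierr = PetscFinalize();
>>>>>   return ierr;
>>>>> }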
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 21, 2021 at 2:26 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 21, 2021 at 2:11 PM Matthew Knepley <knepley at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Jan 21, 2021 at 2:02 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>> On Thu, Jan 21, 2021 at 1:44 PM Matthew Knepley <knepley at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Jan 21, 2021 at 11:16 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, the problem is that each KSP solver is running in an OMP
>>>>>>>>>> thread (so at this point it only works for SELF, and it's Landau, so it is
>>>>>>>>>> all I need). It looks like MPI reductions called with a comm_self are not
>>>>>>>>>> thread safe (e.g., they could say: this is one proc, thus just copy send -->
>>>>>>>>>> recv; but they don't).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Instead of using SELF, how about Comm_dup() for each thread?
>>>>>>>>>
>>>>>>>>
>>>>>>>> OK, raw MPI_Comm_dup. I tried PetscCommDuplicate. Let me try this.
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>
>>>>>>> You would have to dup them all outside the OMP section, since it is
>>>>>>> not threadsafe. Then each thread uses one I think.
>>>>>>>
>>>>>>
>>>>>> Yea sure. I do it in SetUp.
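>>>>>>
>>>>>> A minimal sketch of that setup (names here are hypothetical; the dups are
>>>>>> done once, outside any OpenMP region, so each split's KSP gets its own
>>>>>> communicator and no two threads reduce on the same comm):
>>>>>>
>>>>>>   MPI_Comm comms[NSPLITS];
>>>>>>   KSP      ksps[NSPLITS];
>>>>>>   PetscInt i;
>>>>>>
>>>>>>   for (i = 0; i < NSPLITS; i++) {
>>>>>>     ierr = MPI_Comm_dup(MPI_COMM_SELF, &comms[i]);CHKERRQ(ierr);
>>>>>>     ierr = KSPCreate(comms[i], &ksps[i]);CHKERRQ(ierr);
>>>>>>   }
>>>>>>   /* later, inside the "#pragma omp parallel for": thread i calls KSPSolve(ksps[i], b[i], x[i]) */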
>>>>>>
>>>>>> Well, that finally worked to get *different Comms*, but I still get the
>>>>>> same problem. The iteration counts differ wildly. This is two species and
>>>>>> two threads (13 SNES its, and it is not deterministic). Way below is one
>>>>>> thread (8 its) with fairly uniform iteration counts.
>>>>>>
>>>>>> Maybe this MPI is just not thread safe at all. Let me look into it.
>>>>>> Thanks anyway,
>>>>>>
>>>>>>    0 SNES Function norm 4.974994975313e-03
>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>> 0x80017c60. Comms pc=0x67ad27c0 ksp=*0x7ffe1600* newcomm=0x8014b6e0
>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>> 0x7ffdabc0. Comms pc=0x67ad27c0 ksp=*0x7fff70d0* newcomm=0x7ffe9980
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 282
>>>>>>     1 SNES Function norm 1.836376279964e-05
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>> iterations 19
>>>>>>     2 SNES Function norm 3.059930074740e-07
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>> iterations 15
>>>>>>     3 SNES Function norm 4.744275398121e-08
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>> iterations 4
>>>>>>     4 SNES Function norm 4.014828563316e-08
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 456
>>>>>>     5 SNES Function norm 5.670836337808e-09
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>> iterations 2
>>>>>>     6 SNES Function norm 2.410421401323e-09
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>> iterations 18
>>>>>>     7 SNES Function norm 6.533948191791e-10
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 458
>>>>>>     8 SNES Function norm 1.008133815842e-10
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>> iterations 9
>>>>>>     9 SNES Function norm 1.690450876038e-11
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>> iterations 4
>>>>>>    10 SNES Function norm 1.336383986009e-11
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 463
>>>>>>    11 SNES Function norm 1.873022410774e-12
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 113
>>>>>>    12 SNES Function norm 1.801834606518e-13
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>> iterations 1
>>>>>>    13 SNES Function norm 1.004397317339e-13
>>>>>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE
>>>>>> iterations 13
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>     0 SNES Function norm 4.974994975313e-03
>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>> 0x6e265010. Comms pc=0x56450340 ksp=0x6e2168d0 newcomm=0x6e265090
>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>> 0x6e25bc40. Comms pc=0x56450340 ksp=0x6e22c1d0 newcomm=0x6e21e8f0
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 282
>>>>>>     1 SNES Function norm 1.836376279963e-05
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 380
>>>>>>     2 SNES Function norm 3.018499983019e-07
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 387
>>>>>>     3 SNES Function norm 1.826353175637e-08
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 391
>>>>>>     4 SNES Function norm 1.378600599548e-09
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 392
>>>>>>     5 SNES Function norm 1.077289085611e-10
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 394
>>>>>>     6 SNES Function norm 8.571891727748e-12
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 395
>>>>>>     7 SNES Function norm 6.897647643450e-13
>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>> iterations 395
>>>>>>     8 SNES Function norm 5.606434614114e-14
>>>>>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE
>>>>>> iterations 8
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>    Matt
>>>>>>>
>>>>>>>
>>>>>>>>   Matt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <
>>>>>>>>>> knepley at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It looks like PETSc is just too clever for me. I am trying to
>>>>>>>>>>>> get a different MPI_Comm into each block, but PETSc is thwarting me:
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It looks like you are using SELF. Is that what you want? Do you
>>>>>>>>>>> want a bunch of comms with the same group, but independent somehow? I am
>>>>>>>>>>> confused.
>>>>>>>>>>>
>>>>>>>>>>>    Matt
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>   if (jac->use_openmp) {
>>>>>>>>>>>>     ierr          =
>>>>>>>>>>>> KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>>>>>>>>>>>> PetscPrintf(PETSC_COMM_SELF,"In
>>>>>>>>>>>> PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p
>>>>>>>>>>>> %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
>>>>>>>>>>>>   } else {
>>>>>>>>>>>>     ierr          =
>>>>>>>>>>>> KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> produces:
>>>>>>>>>>>>
>>>>>>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>>>>>>> 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>>>>>>> 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>>>>>>
>>>>>>>>>>>> How can I work around this?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <bsmith at petsc.dev>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <mfadams at lbl.gov>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I put in a temporary hack to get the first Fieldsplit
>>>>>>>>>>>>>> apply to NOT use OMP and it sort of works.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every
>>>>>>>>>>>>>> solve so that is a big problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   It should definitely not be creating vectors "in every"
>>>>>>>>>>>>>> solve. But it does do lazy allocation of the needed restart vectors, which may
>>>>>>>>>>>>>> make it look like it is creating vectors in "every" solve.  You can
>>>>>>>>>>>>>> use -ksp_gmres_preallocate to force it to create all the restart vectors up
>>>>>>>>>>>>>> front at KSPSetUp().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, I ran the first solve w/o OMP and I see Vec dups of
>>>>>>>>>>>>> cuSparse Vecs in the 2nd solve.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Why is creating vectors "at every solve" a problem? It is
>>>>>>>>>>>>>> not thread safe I guess?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It dies when it looks at the options database, in a Free in
>>>>>>>>>>>>> the get-options method to be exact (see stacks).
>>>>>>>>>>>>>
>>>>>>>>>>>>> ======= Backtrace: =========
>>>>>>>>>>>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
>>>>>>>>>>>>>
>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richardson works except the convergence test gets confused,
>>>>>>>>>>>>>> presumably because MPI reductions with PETSC_COMM_SELF are not thread safe.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One fix for the norms might be to create each
>>>>>>>>>>>>>> subdomain solver with a different communicator.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    Yes, you could do that. It might actually be the correct
>>>>>>>>>>>>>> thing to do also; if you have multiple threads calling MPI reductions on the
>>>>>>>>>>>>>> same communicator, that would be a problem. Each KSP should get a new
>>>>>>>>>>>>>> MPI_Comm.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> OK. I will only do this.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>>>>> experiments lead.
>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>
>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>>> experiments lead.
>>>>>>>>> -- Norbert Wiener
>>>>>>>>>
>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>
>>>>>>
>>>>
>>>