[petsc-dev] Memory problem with OpenMP and Fieldsplit sub solvers

Mark Adams mfadams at lbl.gov
Thu Jan 21 17:37:22 CST 2021


This did not work. I verified that MPI_Init_thread is being called
correctly and that MPI returns that it supports this highest level of
thread safety.

I am going to ask ORNL.

And if I use:

-fieldsplit_i1_ksp_norm_type none
-fieldsplit_i1_ksp_max_it 300

for all 9 "i" variables, I can run normal iterations on the 10th variable,
in a 10 species problem, and it works perfectly with 10 threads.

So it is definitely that VecNorm is not thread safe.

And, I want to call SuperLU_dist, which uses threads, but I don't want
SuperLU to start using threads. Is there a way to tell superLU that there
are no threads but have PETSc use them?

Thanks,
Mark

On Thu, Jan 21, 2021 at 5:19 PM Mark Adams <mfadams at lbl.gov> wrote:

> OK, the problem is probably:
>
> PetscMPIInt PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_FUNNELED;
>
> There is an example that sets:
>
> PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;
>
> This is what I need.
>
>
>
>
> On Thu, Jan 21, 2021 at 2:26 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>>
>>
>> On Thu, Jan 21, 2021 at 2:11 PM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Thu, Jan 21, 2021 at 2:02 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> On Thu, Jan 21, 2021 at 1:44 PM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, Jan 21, 2021 at 11:16 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> Yes, the problem is that each KSP solver is running in an OMP thread
>>>>>> (So at this point it only works for SELF and its Landau so it is all I
>>>>>> need). It looks like MPI reductions called with a comm_self are not thread
>>>>>> safe (eg, the could say, this is one proc, thus, just copy send --> recv,
>>>>>> but they don't)
>>>>>>
>>>>>
>>>>> Instead of using SELF, how about Comm_dup() for each thread?
>>>>>
>>>>
>>>> OK, raw MPI_Comm_dup. I tried PetscCommDup. Let me this.
>>>> Thanks,
>>>>
>>>
>>> You would have to dup them all outside the OMP section, since it is not
>>> threadsafe. Then each thread uses one I think.
>>>
>>
>> Yea sure. I do it in SetUp.
>>
>> Well that worked to get *different Comms*, finally, I still get the same
>> problem. The number of iterations differ wildly. This two species and two
>> threads (13 SNES its that is not deterministic). Way below is one thread (8
>> its) and fairly uniform iteration counts.
>>
>> Maybe this MPI is just not thread safe at all. Let me look into it.
>> Thanks anyway,
>>
>>    0 SNES Function norm 4.974994975313e-03
>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x80017c60.
>> Comms pc=0x67ad27c0 ksp=*0x7ffe1600* newcomm=0x8014b6e0
>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7ffdabc0.
>> Comms pc=0x67ad27c0 ksp=*0x7fff70d0* newcomm=0x7ffe9980
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 282
>>     1 SNES Function norm 1.836376279964e-05
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>> iterations 19
>>     2 SNES Function norm 3.059930074740e-07
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>> iterations 15
>>     3 SNES Function norm 4.744275398121e-08
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>> iterations 4
>>     4 SNES Function norm 4.014828563316e-08
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 456
>>     5 SNES Function norm 5.670836337808e-09
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>> iterations 2
>>     6 SNES Function norm 2.410421401323e-09
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>> iterations 18
>>     7 SNES Function norm 6.533948191791e-10
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 458
>>     8 SNES Function norm 1.008133815842e-10
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>> iterations 9
>>     9 SNES Function norm 1.690450876038e-11
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>> iterations 4
>>    10 SNES Function norm 1.336383986009e-11
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 463
>>    11 SNES Function norm 1.873022410774e-12
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 113
>>    12 SNES Function norm 1.801834606518e-13
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>> iterations 1
>>    13 SNES Function norm 1.004397317339e-13
>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 13
>>
>>
>>
>>
>>     0 SNES Function norm 4.974994975313e-03
>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e265010.
>> Comms pc=0x56450340 ksp=0x6e2168d0 newcomm=0x6e265090
>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e25bc40.
>> Comms pc=0x56450340 ksp=0x6e22c1d0 newcomm=0x6e21e8f0
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 282
>>     1 SNES Function norm 1.836376279963e-05
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 380
>>     2 SNES Function norm 3.018499983019e-07
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 387
>>     3 SNES Function norm 1.826353175637e-08
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 391
>>     4 SNES Function norm 1.378600599548e-09
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 392
>>     5 SNES Function norm 1.077289085611e-10
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 394
>>     6 SNES Function norm 8.571891727748e-12
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 395
>>     7 SNES Function norm 6.897647643450e-13
>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>> iterations 395
>>     8 SNES Function norm 5.606434614114e-14
>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 8
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>>
>>>    Matt
>>>
>>>
>>>>   Matt
>>>>>
>>>>>
>>>>>> On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <knepley at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>> It looks like PETSc is just too clever for me. I am trying to get a
>>>>>>>> different MPI_Comm into each block, but PETSc is thwarting me:
>>>>>>>>
>>>>>>>
>>>>>>> It looks like you are using SELF. Is that what you want? Do you want
>>>>>>> a bunch of comms with the same group, but independent somehow? I am
>>>>>>> confused.
>>>>>>>
>>>>>>>    Matt
>>>>>>>
>>>>>>>
>>>>>>>>   if (jac->use_openmp) {
>>>>>>>>     ierr          =
>>>>>>>> KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>>>>>>>> PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit
>>>>>>>> with -------------- link: %p. Comms %p
>>>>>>>> %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
>>>>>>>>   } else {
>>>>>>>>     ierr          =
>>>>>>>> KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> produces:
>>>>>>>>
>>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>>> 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>>> 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>>
>>>>>>>> How can I work around this?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <bsmith at petsc.dev>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>> So I put in a temporary hack to get the first Fieldsplit apply to
>>>>>>>>>> NOT use OMP and it sort of works.
>>>>>>>>>>
>>>>>>>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every
>>>>>>>>>> solve so that is a big problem.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>   It should definitely not be creating vectors "in every" solve.
>>>>>>>>>> But it does do lazy allocation of needed restarted vectors which may make
>>>>>>>>>> it look like it is creating "every" vectors in every solve.  You can
>>>>>>>>>> use -ksp_gmres_preallocate to force it to create all the restart vectors up
>>>>>>>>>> front at KSPSetUp().
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well, I run the first solve w/o OMP and I see Vec dups in cuSparse
>>>>>>>>> Vecs in the 2nd solve.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>   Why is creating vectors "at every solve" a problem? It is not
>>>>>>>>>> thread safe I guess?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It dies when it looks at the options database, in a Free in the
>>>>>>>>> get-options method to be exact (see stacks).
>>>>>>>>>
>>>>>>>>> ======= Backtrace: =========
>>>>>>>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
>>>>>>>>>
>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Richardson works except the convergence test gets confused,
>>>>>>>>>> presumably because MPI reductions with PETSC_COMM_SELF is not threadsafe.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> One fix for the norms might be to create each subdomain solver
>>>>>>>>>> with a different communicator.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    Yes you could do that. It might actually be the correct thing
>>>>>>>>>> to do also, if you have multiple threads call MPI reductions on the same
>>>>>>>>>> communicator that would be a problem. Each KSP should get a new MPI_Comm.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK. I will only do this.
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20210121/289bc5c8/attachment-0001.html>


More information about the petsc-dev mailing list