[petsc-dev] Memory problem with OpenMP and Fieldsplit sub solvers

Mark Adams mfadams at lbl.gov
Sun Jan 24 11:03:52 CST 2021


Thanks Pierre, that works.

On Sun, Jan 24, 2021 at 11:42 AM Pierre Jolivet <pierre at joliv.et> wrote:

>
>
> On 24 Jan 2021, at 4:54 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> Hi Sherry, I have this running with OMP, with cuSparse solves (PETSc CPU
> factorizations)
>
> Building SuperLU_dist w/o _OPENMP was not easy for me.
>
>
> Expanding on Sherry’s answer on Thu Jan 21, it should be as easy as adding
> to your configure script
> '--download-superlu-cmake-arguments=-Denable_openmp=FALSE' '
> --download-superlu_dist-cmake-arguments=-Denable_openmp=FALSE'.
> Is that not easy enough, or is it not working?
>
> Thanks,
> Pierre
>
> We need to get a better way to do this. (Satish or Barry?)
>
> SuperLU works with one thread and two subdomains. With two threads I see
> the error appended below. So this is progress, in that before it was hanging.
>
> I set the solver up so that it does not use threads the first time it is
> called, so that the sub-solvers can get any lazy allocations done in serial.
> This is just to be safe: we do not use a Krylov method here, I don't believe
> "preonly" allocates any work vectors, and SuperLU does the symbolic
> factorization without threads.
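>
> For reference, here is a minimal sketch of what such a "serial first apply"
> gate could look like (the names and layout are illustrative only, not the
> actual PCFieldSplit code):
>
>   #include <petscksp.h>
>
>   /* Sketch: apply the split sub-solvers, doing the very first pass serially
>      so that any lazy allocations inside the sub-KSPs happen outside the
>      OpenMP region; later passes run one sub-solve per thread. */
>   static PetscErrorCode ApplySplitsThreaded(PetscInt nsplits, KSP *subksp, Vec *subb, Vec *subx, PetscBool *first)
>   {
>     PetscErrorCode ierr;
>     PetscInt       i;
>
>     PetscFunctionBegin;
>     if (*first) {                       /* serial warm-up pass */
>       for (i = 0; i < nsplits; i++) {
>         ierr = KSPSolve(subksp[i], subb[i], subx[i]);CHKERRQ(ierr);
>       }
>       *first = PETSC_FALSE;
>     } else {
>       #pragma omp parallel for
>       for (i = 0; i < nsplits; i++) {
>         /* no CHKERRQ: returning early from inside an OpenMP loop is not allowed */
>         KSPSolve(subksp[i], subb[i], subx[i]);
>       }
>     }
>     PetscFunctionReturn(0);
>   }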
>
> Let me know how you want to proceed.
>
> Thanks,
> Mark
>
> ijcusparse -dm_vec_type cuda' NC=2 |g energy
>   0) species-0: charge density= -1.6022862392985e+01 z-momentum=
> -3.4369550192576e-19 energy=  9.6063873494138e+04
>   0) species-1: charge density=  1.6029950760009e+01 z-momentum=
> -2.7844197929124e-18 energy=  9.6333444502318e+04
>  0) Total: charge density=  7.0883670236874e-03, momentum=
> -3.1281152948382e-18, energy=  1.9239731799646e+05 (m_i[0]/m_e = 1835.47,
> 92 cells)
> ex2:
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/externalpackages/git.superlu_dist/SRC/dSchCompUdt-cuda.c:157:
> pdgstrf: Assertion `jjj-1<nub' failed.
> [h16n13:21073] *** Process received signal ***
> [h16n13:21073] Signal: Aborted (6)
> [h16n13:21073] Signal code: User function (kill, sigsend, abort, etc.) (0)
> [h16n13:21073] [ 0] [0x2000000504d8]
> [h16n13:21073] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200020ef2094]
> [h16n13:21073] [ 2] /lib64/libc.so.6(+0x356d4)[0x200020ee56d4]
> [h16n13:21073] [ 3] /lib64/libc.so.6(__assert_fail+0x64)[0x200020ee57c4]
> [h16n13:21073] [ 4]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libsuperlu_dist.so.6(pdgstrf+0x3848)[0x2000022fe5d8]
> [h16n13:21073] [ 5]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libsuperlu_dist.so.6(pdgssvx+0x1220)[0x2000022dc4a8]
> [h16n13:21073] [ 6]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x9aff28)[0x200000a9ff28]
> [h16n13:21073] [ 7]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(MatLUFactorNumeric+0x144)[0x2000007d273c]
> [h16n13:21073] [ 8]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0xecffc4)[0x200000fbffc4]
> [h16n13:21073] [ 9]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PCSetUp+0x134)[0x20000107dd38]
> [h16n13:21073] [10]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPSetUp+0x9f8)[0x2000010b272c]
> [h16n13:21073] [11]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0xfc46f0)[0x2000010b46f0]
> [h16n13:21073] [12]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPSolve+0x20)[0x2000010b6fb8]
> [h16n13:21073] [13]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0xf1e4f8)[0x20000100e4f8]
> [h16n13:21073] [14]
> /sw/summit/gcc/6.4.0/lib64/libgomp.so.1(+0x1a51c)[0x200020e2a51c]
> [h16n13:21073] [15] /lib64/libpthread.so.0(+0x8b94)[0x200020e78b94]
> [h16n13:21073] [16] /lib64/libc.so.6(clone+0xe4)[0x200020fd85f4]
> [h16n13:21073] *** End of error message ***
> ERROR:  One or more process (first noticed rank 0) terminated with signal
> 6 (core dumped)
> make: [runasm] Error 134 (ignored)
>
> On Thu, Jan 21, 2021 at 11:57 PM Xiaoye S. Li <xsli at lbl.gov> wrote:
>
>> All the OpenMP calls are surrounded by
>>
>> #ifdef _OPENMP
>> ...
>> #endif
>>
>> You can disable OpenMP during the CMake installation with the following:
>>     -Denable_openmp=FALSE
>> (the default is true)
>>
>> (I think Satish knows how to do this with PETSc installation)
>>
>> -------
>> The main reason to use mixed MPI & OpenMP is lower memory consumption
>> compared to pure MPI.  Timewise it is probably only slightly faster. (I
>> think that's the case with many codes.)
>>
>>
>> Sherry
>>
>> On Thu, Jan 21, 2021 at 7:20 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>>
>>>
>>> On Thu, Jan 21, 2021 at 10:16 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>>
>>>> On Jan 21, 2021, at 9:11 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>> I have tried it and it hangs, but that is expected. This is not
>>>> something she has prepared for.
>>>>
>>>> I am working with Sherry on it.
>>>>
>>>> And she is fine with just one thread, and suggests using one thread when
>>>> SuperLU is itself running inside a thread.
>>>>
>>>> Now that I think about it, I don't understand why she needs OpenMP if
>>>> she can live with OMP_NUM_THREADS=1.
>>>>
>>>>
>>>>  It is very possible it was just a coding decision by one of her
>>>> students, and with a few ifdefs in her code she would not need the OpenMP,
>>>> but I don't have the time or energy to check her code and design decisions.
>>>>
>>>
>>> Oh yeah, there are OMP calls like omp_num_threads() that need something.
>>> There is probably an omp1.h file somewhere in the world, like our serial MPI.
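>>>
>>> Something like the following stub header would capture that idea (purely
>>> illustrative; no such file actually exists in PETSc or SuperLU_DIST):
>>>
>>>   /* omp_stub.h: serial fallbacks for the handful of OpenMP calls,
>>>      analogous in spirit to PETSc's serial MPI (mpiuni) */
>>>   #if defined(_OPENMP)
>>>   #include <omp.h>
>>>   #else
>>>   static inline int  omp_get_num_threads(void)  { return 1; }
>>>   static inline int  omp_get_max_threads(void)  { return 1; }
>>>   static inline int  omp_get_thread_num(void)   { return 0; }
>>>   static inline void omp_set_num_threads(int n) { (void)n; }
>>>   #endif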
>>>
>>>
>>>>
>>>>   Barry
>>>>
>>>>
>>>> Mark
>>>>
>>>>
>>>>
>>>> On Thu, Jan 21, 2021 at 9:30 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Jan 21, 2021, at 5:37 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>> This did not work. I verified that MPI_Init_thread is being called
>>>>> correctly and that MPI reports that it supports the highest level of
>>>>> thread safety.
>>>>>
>>>>> I am going to ask ORNL.
>>>>>
>>>>> And if I use:
>>>>>
>>>>> -fieldsplit_i1_ksp_norm_type none
>>>>> -fieldsplit_i1_ksp_max_it 300
>>>>>
>>>>> for all 9 "i" variables, I can run normal iterations on the 10th
>>>>> variable, in a 10 species problem, and it works perfectly with 10 threads.
>>>>>
>>>>> So it is definitely that VecNorm is not thread safe.
>>>>>
>>>>> And I want to call SuperLU_dist, which is built with threads, but I don't
>>>>> want SuperLU itself to start using threads. Is there a way to tell SuperLU
>>>>> that there are no threads but still have PETSc use them?
>>>>>
>>>>>
>>>>>   My interpretation and Satish's for many years is that SuperLU_DIST
>>>>> has to be built with and use OpenMP in order to work with CUDA.
>>>>>
>>>>>   def formCMakeConfigureArgs(self):
>>>>>     args = config.package.CMakePackage.formCMakeConfigureArgs(self)
>>>>>     if self.openmp.found:
>>>>>       self.usesopenmp = 'yes'
>>>>>     else:
>>>>>       args.append('-DCMAKE_DISABLE_FIND_PACKAGE_OpenMP=TRUE')
>>>>>     if self.cuda.found:
>>>>>       if not self.openmp.found:
>>>>>         raise RuntimeError('SuperLU_DIST GPU code currently requires OpenMP. Use --with-openmp=1')
>>>>>
>>>>> But this could be ok. You use OpenMP and then it uses OpenMP
>>>>> internally, each doing their own business (what could go wrong :-)).
>>>>>
>>>>> Have you tried it?
>>>>>
>>>>>   Barry
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>> On Thu, Jan 21, 2021 at 5:19 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> OK, the problem is probably:
>>>>>>
>>>>>> PetscMPIInt PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_FUNNELED;
>>>>>>
>>>>>> There is an example that sets:
>>>>>>
>>>>>> PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;
>>>>>>
>>>>>> This is what I need.
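>>>>>>
>>>>>> As a sketch, the pattern is to set that global before PetscInitialize(),
>>>>>> the way the example does (error checking kept minimal here):
>>>>>>
>>>>>>   #include <petscsys.h>
>>>>>>
>>>>>>   int main(int argc, char **argv)
>>>>>>   {
>>>>>>     PetscErrorCode ierr;
>>>>>>
>>>>>>     /* must be set before PetscInitialize() so that MPI_Init_thread() is
>>>>>>        asked for MPI_THREAD_MULTIPLE rather than the default FUNNELED */
>>>>>>     PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;
>>>>>>     ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
>>>>>>     /* ... build the solvers that will later be driven from OpenMP threads ... */
>>>>>>     ierr = PetscFinalize();
>>>>>>     return ierr;
>>>>>>   }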
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 21, 2021 at 2:26 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 21, 2021 at 2:11 PM Matthew Knepley <knepley at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Thu, Jan 21, 2021 at 2:02 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Jan 21, 2021 at 1:44 PM Matthew Knepley <knepley at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Jan 21, 2021 at 11:16 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, the problem is that each KSP solver is running in an OMP
>>>>>>>>>>> thread (so at this point it only works for SELF, and it's Landau, so it
>>>>>>>>>>> is all I need). It looks like MPI reductions called with a comm_self are
>>>>>>>>>>> not thread safe (e.g., they could say: this is one proc, thus just copy
>>>>>>>>>>> send --> recv, but they don't).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Instead of using SELF, how about Comm_dup() for each thread?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK, raw MPI_Comm_dup. I tried PetscCommDup. Let me try this.
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>
>>>>>>>> You would have to dup them all outside the OMP section, since it is
>>>>>>>> not threadsafe. Then each thread uses one I think.
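>>>>>>>>
>>>>>>>> A minimal sketch of that setup (the function and array names here are
>>>>>>>> illustrative, not PETSc API):
>>>>>>>>
>>>>>>>>   #include <petscksp.h>
>>>>>>>>
>>>>>>>>   /* Duplicate one communicator per split, serially, during setup; each
>>>>>>>>      OpenMP thread can then drive a KSP that reduces on its own comm. */
>>>>>>>>   static PetscErrorCode CreateSplitKSPs(PetscInt nsplits, KSP *subksp, MPI_Comm *split_comm)
>>>>>>>>   {
>>>>>>>>     PetscErrorCode ierr;
>>>>>>>>     PetscInt       i;
>>>>>>>>
>>>>>>>>     PetscFunctionBegin;
>>>>>>>>     for (i = 0; i < nsplits; i++) {   /* serial: MPI_Comm_dup is not threadsafe */
>>>>>>>>       ierr = MPI_Comm_dup(MPI_COMM_SELF, &split_comm[i]);CHKERRQ(ierr);
>>>>>>>>       ierr = KSPCreate(split_comm[i], &subksp[i]);CHKERRQ(ierr);
>>>>>>>>     }
>>>>>>>>     PetscFunctionReturn(0);
>>>>>>>>   }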
>>>>>>>>
>>>>>>>
>>>>>>> Yeah, sure. I do it in SetUp.
>>>>>>>
>>>>>>> Well, that worked to get *different Comms*, finally, but I still get the
>>>>>>> same problem. The number of iterations differs wildly. This is two species
>>>>>>> and two threads (13 SNES iterations, and it is not deterministic). Way
>>>>>>> below is one thread (8 iterations) with fairly uniform iteration counts.
>>>>>>>
>>>>>>> Maybe this MPI is just not thread safe at all. Let me look into it.
>>>>>>> Thanks anyway,
>>>>>>>
>>>>>>>    0 SNES Function norm 4.974994975313e-03
>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>> 0x80017c60. Comms pc=0x67ad27c0 ksp=*0x7ffe1600* newcomm=0x8014b6e0
>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>> 0x7ffdabc0. Comms pc=0x67ad27c0 ksp=*0x7fff70d0* newcomm=0x7ffe9980
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 282
>>>>>>>     1 SNES Function norm 1.836376279964e-05
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>>> iterations 19
>>>>>>>     2 SNES Function norm 3.059930074740e-07
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>>> iterations 15
>>>>>>>     3 SNES Function norm 4.744275398121e-08
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>>> iterations 4
>>>>>>>     4 SNES Function norm 4.014828563316e-08
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 456
>>>>>>>     5 SNES Function norm 5.670836337808e-09
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>>> iterations 2
>>>>>>>     6 SNES Function norm 2.410421401323e-09
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>>> iterations 18
>>>>>>>     7 SNES Function norm 6.533948191791e-10
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 458
>>>>>>>     8 SNES Function norm 1.008133815842e-10
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>>> iterations 9
>>>>>>>     9 SNES Function norm 1.690450876038e-11
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>>> iterations 4
>>>>>>>    10 SNES Function norm 1.336383986009e-11
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 463
>>>>>>>    11 SNES Function norm 1.873022410774e-12
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 113
>>>>>>>    12 SNES Function norm 1.801834606518e-13
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL
>>>>>>> iterations 1
>>>>>>>    13 SNES Function norm 1.004397317339e-13
>>>>>>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE
>>>>>>> iterations 13
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     0 SNES Function norm 4.974994975313e-03
>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>> 0x6e265010. Comms pc=0x56450340 ksp=0x6e2168d0 newcomm=0x6e265090
>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>> 0x6e25bc40. Comms pc=0x56450340 ksp=0x6e22c1d0 newcomm=0x6e21e8f0
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 282
>>>>>>>     1 SNES Function norm 1.836376279963e-05
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 380
>>>>>>>     2 SNES Function norm 3.018499983019e-07
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 387
>>>>>>>     3 SNES Function norm 1.826353175637e-08
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 391
>>>>>>>     4 SNES Function norm 1.378600599548e-09
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 392
>>>>>>>     5 SNES Function norm 1.077289085611e-10
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 394
>>>>>>>     6 SNES Function norm 8.571891727748e-12
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 395
>>>>>>>     7 SNES Function norm 6.897647643450e-13
>>>>>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL
>>>>>>> iterations 395
>>>>>>>     8 SNES Function norm 5.606434614114e-14
>>>>>>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE
>>>>>>> iterations 8
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>    Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>>   Matt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <
>>>>>>>>>>> knepley at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like PETSc is just too clever for me. I am trying to
>>>>>>>>>>>>> get a different MPI_Comm into each block, but PETSc is thwarting me:
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like you are using SELF. Is that what you want? Do you
>>>>>>>>>>>> want a bunch of comms with the same group, but independent somehow? I am
>>>>>>>>>>>> confused.
>>>>>>>>>>>>
>>>>>>>>>>>>    Matt
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>   if (jac->use_openmp) {
>>>>>>>>>>>>>     ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>>>>>>>>>>>>>     PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>     ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> produces:
>>>>>>>>>>>>>
>>>>>>>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>>>>>>>> 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link:
>>>>>>>>>>>>> 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>>>>>>>
>>>>>>>>>>>>> How can I work around this?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <bsmith at petsc.dev>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <mfadams at lbl.gov>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I put in a temporary hack to get the first Fieldsplit
>>>>>>>>>>>>>>> apply to NOT use OMP and it sort of works.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every
>>>>>>>>>>>>>>> solve so that is a big problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   It should definitely not be creating vectors "in every"
>>>>>>>>>>>>>>> solve. But it does do lazy allocation of needed restart vectors, which may
>>>>>>>>>>>>>>> make it look like it is creating vectors in every solve.  You can use
>>>>>>>>>>>>>>> -ksp_gmres_preallocate to force it to create all the restart vectors up
>>>>>>>>>>>>>>> front at KSPSetUp().
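>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   The programmatic equivalent of that option, if you prefer to set it in
>>>>>>>>>>>>>>> code, is roughly ("ksp" here stands for the GMRES sub-solver in question):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     /* allocate all GMRES restart vectors during KSPSetUp()
>>>>>>>>>>>>>>>        rather than lazily during the solves */
>>>>>>>>>>>>>>>     ierr = KSPGMRESSetPreAllocateVectors(ksp);CHKERRQ(ierr);
>>>>>>>>>>>>>>>     ierr = KSPSetUp(ksp);CHKERRQ(ierr);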
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, I run the first solve w/o OMP and I still see Vec dups of
>>>>>>>>>>>>>> cuSparse Vecs in the 2nd solve.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   Why is creating vectors "at every solve" a problem? It is
>>>>>>>>>>>>>>> not thread safe I guess?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It dies when it looks at the options database, in a Free in
>>>>>>>>>>>>>> the get-options method to be exact (see the stack below).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ======= Backtrace: =========
>>>>>>>>>>>>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Richardson works except the convergence test gets confused,
>>>>>>>>>>>>>>> presumably because MPI reductions on PETSC_COMM_SELF are not threadsafe.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One fix for the norms might be to create each
>>>>>>>>>>>>>>> subdomain solver with a different communicator.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    Yes, you could do that. It might actually be the correct
>>>>>>>>>>>>>>> thing to do anyway: if you have multiple threads calling MPI reductions
>>>>>>>>>>>>>>> on the same communicator, that would be a problem. Each KSP should get a
>>>>>>>>>>>>>>> new MPI_Comm.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> OK. I will only do this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> What most experimenters take for granted before they begin
>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>>>> their experiments lead.
>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>
>>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>>>> experiments lead.
>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>
>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>> experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>>
>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>