[petsc-dev] Memory problem with OpenMP and Fieldsplit sub solvers

Barry Smith bsmith at petsc.dev
Sun Jan 17 18:31:45 CST 2021


   Mark,

   There are several possible issues.

1) Calling PetscOptions routines inside threads. I looked quickly at the code and it seems like it should be ok, but perhaps not. This is one reason why putting things like PetscOptionsBegin inside a low-level creation routine such as VecCreate_SeqCUDA_Private is normally not done in PETSc. Eventually this needs to be moved or reworked.

2) PetscCUDAInitializeCheck is not thread safe. If it is being called for the first time by multiple threads, there can be trouble. So edit init.c, and under

#if defined(PETSC_HAVE_CUDA)
  ierr = PetscOptionsCheckCUDA(logView);CHKERRQ(ierr);
#endif

#if defined(PETSC_HAVE_HIP)
  ierr = PetscOptionsCheckHIP(logView);CHKERRQ(ierr);
#endif

put in

#if defined(PETSC_HAVE_THREADSAFETY)
  ierr = PetscCUDAInitializeCheck();CHKERRQ(ierr);
#endif

This will force the initialization to be done before any threads are used.

3) a million more possibilities

I would fix 2) first; if that does not solve the problem, comment out the options calls in 1) and see whether that resolves it.

Good luck,

  Barry




> On Jan 17, 2021, at 4:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> I have put the Fieldsplit additive solver loop in an OpenMP loop and it seems to work with CPU solvers. The VecScatters seem to work and the KSP solver seems to work on the CPU. This is an MPI-serial code.
> 
> It fails with a cuSparse Jacobi solver with the error below, on each thread (two threads/blocks in this example).
> I am not following this stack trace completely, e.g., is the error perhaps in VecInitializePackage?
> I don't understand the thread-safety mechanism (e.g., I'm not sure why the CPU solvers worked), but I am thinking that this mechanism did not get used properly in cuSparse. Understandable.
> 
> Can anyone shed any light on what might be going wrong here?
> 
> Thanks,
> Mark
> 
> ======= Backtrace: =========
> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x16b8b84)[0x2000017a8b84]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPSetUp+0x918)[0x2000016d8368]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x15ec754)[0x2000016dc754]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPSolve+0x44)[0x2000016df100]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x14ff5a8)[0x2000015ef5a8]
> /sw/summit/gcc/6.4.0/lib64/libgomp.so.1(GOMP_parallel+0x74)[0x200021711074]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x14efb28)[0x2000015dfb28]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PCApply+0x588)[0x200001685c70]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x16b6374)[0x2000017a6374]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x16b6738)[0x2000017a6738]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x15ed594)[0x2000016dd594]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPSolve+0x44)[0x2000016df100]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(SNESSolve_NEWTONLS+0x11f4)[0x20000189dcd8]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(SNESSolve+0x1b54)[0x2000018401a8]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x18b0d3c)[0x2000019a0d3c]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(TSStep+0x408)[0x2000019284a0]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(TSSolve+0x15cc)[0x20000192a334]
> ./ex2[0x1000a348]
> /lib64/libc.so.6(+0x25200)[0x2000217c5200]
> /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000217c53f4]
