<div dir="ltr">Also, this uses the branch: adams/mat-rap-blocksize<div>that has fixes to get the block sizes moved up in P'AP.</div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Tue, Feb 18, 2025 at 9:29 AM Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I've got a bug in pbjacobi that only shows up on<b> the Galerkin coarse grid </b>(I have not been able to reproduce it on a fine grid, at least), and in <b>parallel</b>, and on <b>GPUs</b>/kokkos.<div><br></div><div>I have modified ex55 to take P from GAMG and give it (one) to PCMG with Galerkin coarse grids, and solve (code and command lines appended).</div><div>I see this with ex56 (bs=3 & 6), ex55 (bs=2), but ex54 (bs=1) is fine (does pbjacobi switch to jacobi?)</div><div><br></div><div>With 4 processors I get these valgrind errors only on these bad solves (no false positives). 
Note that <b>only one process has errors</b>, and note some solver output before and after:</div><div><br></div><div>I'm going to keep digging, but ideas are welcome.</div><div>Thanks,</div><div>Mark</div><div><br></div><div>[0] <pc:gamg> PCSetUp_GAMG(): (null): 1) N=4, n data cols=2, nnz/row (ave)=4, 1 active pes<br>[0] <pc:gamg> PCSetUp_GAMG(): (null): 2 levels, operator complexity = 1.04<br>[0] <pc:gamg> PCSetUp_GAMG(): (null): PCSetUp_GAMG: call KSPChebyshevSetEigenvalues on level 0 (N=32) with emax = 2.26125 emin = 0.0198344<br>[0] <pc:gamg> PCSetUp_MG(): Using outer operators to define finest grid operator <br> because PCMGGetSmoother(pc,nlevels-1,&ksp);KSPSetOperators(ksp,...); was not called.<br>[0] <pc:mg> PCSetUp_MG(): Using outer operators to define finest grid operator <br> because PCMGGetSmoother(pc,nlevels-1,&ksp);KSPSetOperators(ksp,...); was not called.<br>==978424== Invalid read of size 16<br>==978424== at 0x42B9AFA2: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42AF0F12: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42C3EE7B: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42EEC474: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42B4B495: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42EC748F: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42B44781: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42CF766D: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42445504: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x42417A04: ??? 
(in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x42468730: cudaMemcpy (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x4896979: cuda_memcpy_wrapper<> (Kokkos_Cuda_Instance.hpp:365)<br>==978424== by 0x4896979: Kokkos::Impl::DeepCopyCuda(void*, void const*, unsigned long) (Kokkos_CudaSpace.cpp:62)<br>==978424== Address 0xcb722dfc is 1,644 bytes inside a block of size 1,652 alloc'd<br>==978424== at 0x4E0A926: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)<br>==978424== by 0x4E0AA69: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)<br>==978424== by 0x57E3BD9: PetscMallocAlign (mal.c:52)<br>==978424== by 0x57E892F: PetscTrMallocDefault (mtr.c:175)<br>==978424== by 0x57E5A77: PetscMallocA (mal.c:421)<br>==978424== by 0x63B02F5: MatInvertBlockDiagonal_SeqAIJ (aij.c:3333)<br>==978424== by 0x69B0599: MatInvertBlockDiagonal (matrix.c:10908)<br>==978424== by 0x629D585: MatInvertBlockDiagonal_MPIAIJ (mpiaij.c:2588)<br>==978424== by 0x69B0599: MatInvertBlockDiagonal (matrix.c:10908)<br>==978424== by 0x7C7F726: PCSetUp_PBJacobi_Host (pbjacobi.c:256)<br>==978424== by 0x7D737D8: PCSetUp_PBJacobi_Kokkos (pbjacobi_kok.kokkos.cxx:90)<br>==978424== by 0x7C803F8: PCSetUp_PBJacobi (pbjacobi.c:296)<br>==978424== <br>==978424== Invalid read of size 16<br>==978424== at 0x42B9AFB4: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42AF0F12: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42C3EE7B: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42EEC474: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42B4B495: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42EC748F: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42B44781: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42CF766D: ??? 
(in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42445504: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x42417A04: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x42468730: cudaMemcpy (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x4896979: cuda_memcpy_wrapper<> (Kokkos_Cuda_Instance.hpp:365)<br>==978424== by 0x4896979: Kokkos::Impl::DeepCopyCuda(void*, void const*, unsigned long) (Kokkos_CudaSpace.cpp:62)<br>==978424== Address 0xcb722e0c is 8 bytes after a block of size 1,652 alloc'd<br>==978424== at 0x4E0A926: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)<br>==978424== by 0x4E0AA69: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)<br>==978424== by 0x57E3BD9: PetscMallocAlign (mal.c:52)<br>==978424== by 0x57E892F: PetscTrMallocDefault (mtr.c:175)<br>==978424== by 0x57E5A77: PetscMallocA (mal.c:421)<br>==978424== by 0x63B02F5: MatInvertBlockDiagonal_SeqAIJ (aij.c:3333)<br>==978424== by 0x69B0599: MatInvertBlockDiagonal (matrix.c:10908)<br>==978424== by 0x629D585: MatInvertBlockDiagonal_MPIAIJ (mpiaij.c:2588)<br>==978424== by 0x69B0599: MatInvertBlockDiagonal (matrix.c:10908)<br>==978424== by 0x7C7F726: PCSetUp_PBJacobi_Host (pbjacobi.c:256)<br>==978424== by 0x7D737D8: PCSetUp_PBJacobi_Kokkos (pbjacobi_kok.kokkos.cxx:90)<br>==978424== by 0x7C803F8: PCSetUp_PBJacobi (pbjacobi.c:296)<br>==978424== <br>==978424== Invalid read of size 4<br>==978424== at 0x42B9B103: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42AF0F12: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42C3EE7B: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42EEC474: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42B4B495: ??? 
(in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42EC748F: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42B44781: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42CF766D: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42445504: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x42417A04: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x42468730: cudaMemcpy (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x4896979: cuda_memcpy_wrapper<> (Kokkos_Cuda_Instance.hpp:365)<br>==978424== by 0x4896979: Kokkos::Impl::DeepCopyCuda(void*, void const*, unsigned long) (Kokkos_CudaSpace.cpp:62)<br>==978424== Address 0xcb722e1c is 12 bytes after a block of size 1,664 in arena "client"<br>==978424== <br>==978424== Invalid read of size 4<br>==978424== at 0x42B9B107: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42AF0F12: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42C3EE7B: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42EEC474: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42B4B495: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42EC748F: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42B44781: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42CF766D: ??? (in /usr/lib64/libcuda.so.550.127.08)<br>==978424== by 0x42445504: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x42417A04: ??? 
(in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x42468730: cudaMemcpy (in /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/targets/x86_64-linux/lib/libcudart.so.12.2.53)<br>==978424== by 0x4896979: cuda_memcpy_wrapper<> (Kokkos_Cuda_Instance.hpp:365)<br>==978424== by 0x4896979: Kokkos::Impl::DeepCopyCuda(void*, void const*, unsigned long) (Kokkos_CudaSpace.cpp:62)<br>==978424== Address 0xcb722e1c is 12 bytes after a block of size 1,664 in arena "client"<br>==978424== <br> Residual norms for rap_mg_coarse_ solve.<br> 0 KSP Residual norm 1.118121970641e+01<br> 1 KSP Residual norm 3.575439993035e-01<br></div><div>[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>[0]PETSC ERROR: Diverged due to indefinite preconditioner, beta -0.00553023, betaold 5.27744<br></div><div><br></div><div><br></div><div>ksp/ex55:</div><div><br></div><div><br> <font face="monospace">PC pc;<br> PetscCall(KSPGetPC(ksp, &pc));<br> PC_MG *mg = (PC_MG *)pc->data;<br> PC_MG_Levels **mglevels = mg->levels;<br> Mat P = mglevels[mg->nlevels-1]->interpolate;<br> PetscCall(MatViewFromOptions(mglevels[mg->nlevels-1]->A, NULL, "-rap_mat_view"));<br> PetscCall(MatViewFromOptions(Amat, NULL, "-rap_mat_view"));<br> KSP ksp2;<br> PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp2));<br> PetscCall(KSPSetOptionsPrefix(ksp2, "rap_"));<br> PetscCall(KSPSetFromOptions(ksp2));<br> PetscCall(KSPGetPC(ksp2, &pc));<br> PetscCall(KSPSetOperators(ksp2, Amat, Amat));<br> PetscCall(PCMGSetGalerkin(pc, PC_MG_GALERKIN_PMAT));<br> PetscCall(PCMGSetInterpolation(pc, 1, P));<br> PetscCall(VecSet(bb, 1.0));<br> PetscCall(PetscLogStagePush(stage[1]));<br> PetscCall(KSPSolve(ksp2, bb, xx));<br> PetscCall(PetscLogStagePop());<br> PetscCall(PetscFinalize());<br> exit(12);<br></font><br><br> PetscCall(PetscLogStagePush(stage[1])); // original ex55 code<br> PetscCall(KSPSolve(ksp, bb, 
xx));</div><div><br></div><div><br></div><div><i>$ srun -n 4 valgrind --tool=memcheck --leak-check=no ./ex55 -ne 3 -pc_type gamg -rap_pc_type mg -rap_ksp_monitor -rap_mg_levels_pc_type jacobi -rap_mg_coarse_pc_type pbjacobi -rap_mg_coarse_ksp_monitor -options_left -rap_pc_mg_levels 2 -rap_mg_coarse_ksp_type cg -mat_type aijkokkos -fp_trap -ksp_monitor -rap_ksp_viewxx -info :pc,dm -rap_mg_coarse_ksp_error_if_not_converged </i></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div>
</blockquote></div>
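As a reading aid for the valgrind reports quoted above, here is a minimal sketch of the arithmetic behind the first two "Invalid read of size 16" entries. The helper name `overrun_bytes` is hypothetical (it is not part of PETSc, Kokkos, or valgrind); the numbers come from the log: a block of 1,652 bytes, one read starting 1,644 bytes inside it, and one starting 8 bytes after its end.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Returns how many bytes a read of `read_size` bytes, starting `offset`
 * bytes from the start of a block of `block_size` bytes, extends past the
 * end of that block. Returns 0 if the read fits entirely inside the block. */
size_t overrun_bytes(size_t offset, size_t read_size, size_t block_size)
{
  size_t read_end = offset + read_size; /* one past the last byte read */

  return read_end > block_size ? read_end - block_size : 0;
}
```

With the values from the log, `overrun_bytes(1644, 16, 1652)` gives 8 and `overrun_bytes(1652 + 8, 16, 1652)` gives 24. Under this reading, the host range read by `cudaMemcpy` appears to extend at least 24 bytes beyond the buffer allocated in `MatInvertBlockDiagonal_SeqAIJ`. The block size is the one valgrind reports, which may include allocator bookkeeping, so this only bounds the overrun and does not by itself identify which length is wrong.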