Any way to run with valgrind (or a HIP variant of valgrind)? It looks like a memory corruption issue, and tracking down exactly when the corruption begins is 3/4 of the way to finding the exact cause.

Are the crashes reproducible in the same place with identical runs?
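Untested, but something along these lines should give a first CPU-side memcheck pass on a couple of nodes (valgrind only instruments host code, so corruption that happens entirely inside device buffers will not show up directly; the srun layout and valgrind flags here are only suggestions, and <other options> stands in for the actual ex13 options):

    srun -N2 -n16 valgrind -q --tool=memcheck --num-callers=20 --log-file=valgrind-%p.log ./ex13 -dm_refine 5 <other options>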
On Jan 26, 2022, at 10:46 AM, Mark Adams <mfadams@lbl.gov> wrote:

I think it is an MPI bug. It works with GPU-aware MPI turned off.
I am sure Summit will be fine.
We have had users fix this error by switching their MPI.

On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang <junchao.zhang@gmail.com> wrote:

I don't know if this is due to bugs in the petsc/kokkos backend. See if you can run with 6 nodes (48 MPI ranks). If it fails, then run the same problem on Summit with 8 nodes to see if it still fails. If it does, it is likely a bug of our own.

--Junchao Zhang

On Wed, Jan 26, 2022 at 8:44 AM Mark Adams <mfadams@lbl.gov> wrote:

I am not able to reproduce this with a small problem; two nodes, or less refinement, works. This is from the 8-node test, the -dm_refine 5 version.
I see that it comes from PtAP.
This is on the fine grid. (I was thinking it could be on a reduced grid with idle processors, but no.)

[15]PETSC ERROR: Argument out of range
[15]PETSC ERROR: Key <= 0
[15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb GIT Date: 2022-01-25 09:20:51 -0500
[15]PETSC ERROR: /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
[15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --download-p4est=1 --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4 PETSC_ARCH=arch-olcf-crusher
[15]PETSC ERROR: #1 PetscTableFind() at /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
[15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
[15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
[15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
[15]PETSC ERROR: #5 MatAssemblyEnd() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
[15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
[15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
[15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
[15]PETSC ERROR: #9 MatProductSymbolic() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
[15]PETSC ERROR: #10 MatPtAP() at /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
[15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
[15]PETSC ERROR: #12 PCSetUp_GAMG() at /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
[15]PETSC ERROR: #13 PCSetUp() at /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
[15]PETSC ERROR: #14 KSPSetUp() at /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
[15]PETSC ERROR: #15 KSPSolve_Private() at /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
[15]PETSC ERROR: #16 KSPSolve() at /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
[15]PETSC ERROR: #17 SNESSolve_KSPONLY() at /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:51
[15]PETSC ERROR: #18 SNESSolve() at /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4810
[15]PETSC ERROR: #19 main() at ex13.c:169
[15]PETSC ERROR: PETSc Option Table entries:
[15]PETSC ERROR: -benchmark_it 10

On Wed, Jan 26, 2022 at 7:26 AM Mark Adams <mfadams@lbl.gov> wrote:

GPU-aware MPI is dying going from 1 to 8 nodes, 8 processes per node.
I will make a minimal reproducer, starting with 2 nodes, one process on each node.
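Something along these lines is what I have in mind, untested and not the final reproducer: a plain HIP + MPI program that hands device pointers straight to MPI_Send/MPI_Recv, so PETSc is out of the picture entirely. The file name, build line, and environment variable below are guesses; only the GTL library path is taken from the configure line in the trace above.

/* gpu_aware_check.c: hypothetical minimal GPU-aware MPI check.
 * Rank 0 fills a device buffer and sends it using the device pointer;
 * rank 1 receives into a device buffer, copies it back to the host,
 * and verifies the contents.
 *
 * Possible build/run on Crusher (GTL path from the configure line above;
 * MPICH_DIR and the exact link line depend on the loaded Cray modules):
 *   hipcc gpu_aware_check.c -o gpu_aware_check \
 *     -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpi \
 *     -L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa
 *   export MPICH_GPU_SUPPORT_ENABLED=1
 *   srun -N2 -n2 --ntasks-per-node=1 ./gpu_aware_check
 */
#include <mpi.h>
#include <hip/hip_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20) /* arbitrary test size; adjust as needed */

int main(int argc, char **argv)
{
  int     rank, i, nerr = 0;
  double *dbuf, *hbuf;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  hbuf = (double *)malloc(N * sizeof(double));
  if (hipMalloc((void **)&dbuf, N * sizeof(double)) != hipSuccess) {
    fprintf(stderr, "[%d] hipMalloc failed\n", rank);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
  if (rank == 0) {
    for (i = 0; i < N; i++) hbuf[i] = (double)i;
    hipMemcpy(dbuf, hbuf, N * sizeof(double), hipMemcpyHostToDevice);
    MPI_Send(dbuf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD); /* device pointer handed to MPI */
  } else if (rank == 1) {
    MPI_Recv(dbuf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    hipMemcpy(hbuf, dbuf, N * sizeof(double), hipMemcpyDeviceToHost);
    for (i = 0; i < N; i++)
      if (hbuf[i] != (double)i) nerr++;
    printf("GPU-aware send/recv of %d doubles: %s (%d bad entries)\n", N, nerr ? "FAIL" : "ok", nerr);
  }
  hipFree(dbuf);
  free(hbuf);
  MPI_Finalize();
  return 0;
}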
On Tue, Jan 25, 2022 at 10:19 PM Barry Smith <bsmith@petsc.dev> wrote:

So the MPI is killing you in going from 8 to 64. (The GPU flop rate scales almost perfectly, but the overall flop rate is only half of what it should be at 64.)

On Jan 25, 2022, at 9:24 PM, Mark Adams <mfadams@lbl.gov> wrote:

It looks like we have our instrumentation and job configuration in decent shape, so on to scaling with AMG.
Using multiple nodes, I got errors about table entries not found, which can be caused by a buggy MPI, and the problem does go away when I turn GPU-aware MPI off.
Jed's analysis, if I have this right, is that at 0.7 Tflop/s we are at about 35% of theoretical peak wrt memory bandwidth.
I run out of memory with the next step in this study (7 levels of refinement), with 2M equations per GPU. This seems low to me and we will see if we can fix this.
So this 0.7 Tflop/s is with only 1/4M equations, so 35% is not terrible.
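(For reference, a sketch of the arithmetic behind that kind of estimate: for a bandwidth-bound solve, the sustained memory bandwidth is roughly the achieved flop rate divided by the arithmetic intensity of the kernels, so fraction of peak ~ (F / I) / B_peak, with F the measured flop rate, I the flops performed per byte moved, and B_peak the node's peak memory bandwidth. The exact I and B_peak behind the 35% figure are Jed's numbers and are not in this thread.)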
Here are the solve times with 001, 008, and 064 nodes, and 5 or 6 levels of refinement.

out_001_kokkos_Crusher_5_1.txt:KSPSolve 10 1.0 1.2933e+00 1.0 4.13e+10 1.1 1.8e+05 8.4e+03 5.8e+02 3 87 86 78 48 100100100100100 248792 423857 6840 3.85e+02 6792 3.85e+02 100
out_001_kokkos_Crusher_6_1.txt:KSPSolve 10 1.0 5.3667e+00 1.0 3.89e+11 1.0 2.1e+05 3.3e+04 6.7e+02 2 87 86 79 48 100100100100100 571572 700002 7920 1.74e+03 7920 1.74e+03 100
out_008_kokkos_Crusher_5_1.txt:KSPSolve 10 1.0 1.9407e+00 1.0 4.94e+10 1.1 3.5e+06 6.2e+03 6.7e+02 5 87 86 79 47 100100100100100 1581096 3034723 7920 6.88e+02 7920 6.88e+02 100
out_008_kokkos_Crusher_6_1.txt:KSPSolve 10 1.0 7.4478e+00 1.0 4.49e+11 1.0 4.1e+06 2.3e+04 7.6e+02 2 88 87 80 49 100100100100100 3798162 5557106 9367 3.02e+03 9359 3.02e+03 100
out_064_kokkos_Crusher_5_1.txt:KSPSolve 10 1.0 2.4551e+00 1.0 5.40e+10 1.1 4.2e+07 5.4e+03 7.3e+02 5 88 87 80 47 100100100100100 11065887 23792978 8684 8.90e+02 8683 8.90e+02 100
out_064_kokkos_Crusher_6_1.txt:KSPSolve 10 1.0 1.1335e+01 1.0 5.38e+11 1.0 5.4e+07 2.0e+04 9.1e+02 4 88 88 82 49 100100100100100 24130606 43326249 11249 4.26e+03 11249 4.26e+03 100

On Tue, Jan 25, 2022 at 1:49 PM Mark Adams <mfadams@lbl.gov> wrote:

> Note that Mark's logs have been switching back and forth between -use_gpu_aware_mpi and changing number of ranks -- we won't have that information if we do manual timing hacks. This is going to be a routine thing we'll need on the mailing list and we need the provenance to go with it.

GPU-aware MPI crashes sometimes, so to be safe while debugging I had it off. It works fine here, so it has been on in the last tests.
Here is a comparison.
<tt.tar>