<div dir="ltr">Yes, the problem is that each KSP solver is running in an OMP thread (So at this point it only works for SELF and its Landau so it is all I need). It looks like MPI reductions called with a comm_self are not thread safe (eg, the could say, this is one proc, thus, just copy send --> recv, but they don't)</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">It looks like PETSc is just too clever for me. I am trying to get a different MPI_Comm into each block, but PETSc is thwarting me:</div></blockquote><div><br></div><div>It looks like you are using SELF. Is that what you want? Do you want a bunch of comms with the same group, but independent somehow? I am confused.</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div> if (jac->use_openmp) {<br> ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);<br>PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));<br> } else {<br> ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);<br> }<br></div><div><br></div><div>produces:</div><div><br></div><div>In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0<br>In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0<br></div><div><br></div><div>How can I work around this?</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br><div><br><blockquote type="cite"><div>On Jan 20, 2021, at 3:09 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:</div><br><div><div dir="ltr">So I put in a temporary hack to get the first Fieldsplit apply to NOT use OMP and it sort of works. <div><br></div><div>Preonly/lu is fine. GMRES calls vector creates/dups in every solve so that is a big problem. </div></div></div></blockquote><div><br></div> It should definitely not be creating vectors "in every" solve. But it does do lazy allocation of needed restarted vectors which may make it look like it is creating "every" vectors in every solve. You can use -ksp_gmres_preallocate to force it to create all the restart vectors up front at KSPSetUp(). </div></div></blockquote><div><br></div><div>Well, I run the first solve w/o OMP and I see Vec dups in cuSparse Vecs in the 2nd solve. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br></div><div> Why is creating vectors "at every solve" a problem? It is not thread safe I guess?</div></div></blockquote><div><br></div><div>It dies when it looks at the options database, in a Free in the get-options method to be exact (see stacks). </div><div><br></div><div>======= Backtrace: =========<br>/lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]<br>/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br><blockquote type="cite"><div><div dir="ltr"><div>Richardson works except the convergence test gets confused, presumably because MPI reductions with PETSC_COMM_SELF is not threadsafe.</div></div></div></blockquote><div><br></div><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>One fix for the norms might be to create each subdomain solver with a different communicator.</div></div></div></blockquote><div><br></div> Yes you could do that. It might actually be the correct thing to do also, if you have multiple threads call MPI reductions on the same communicator that would be a problem. Each KSP should get a new MPI_Comm. </div></div></blockquote><div><br></div><div>OK. I will only do this.</div><div><br></div></div></div>
</blockquote></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>
</blockquote></div>