<div dir="ltr"><div>Hi Barry:</div><div><br></div><div>Thanks! I can capture an issue from my local run, although I am not 100% sure this is what is causing the code to hang.</div><div><br></div><div>When I run with -pc_hypre_boomeramg_relax_type_all Chebyshev, valgrind reports a memory leak:</div><div><br></div><div>==4410== 192 bytes in 8 blocks are indirectly lost in loss record 1 of 5<br>==4410== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)<br>==4410== by 0x73FED84: hypre_HostMalloc (hypre_memory.c:192)<br>==4410== by 0x73FEE53: hypre_MAllocWithInit (hypre_memory.c:301)<br>==4410== by 0x73FEF1A: hypre_CAlloc (hypre_memory.c:338)<br>==4410== by 0x726E4C4: hypre_ParCSRRelax_Cheby_Setup (par_cheby.c:70)<br>==4410== by 0x7265A4C: hypre_BoomerAMGSetup (par_amg_setup.c:2738)<br>==4410== by 0x7240F96: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:52)<br>==4410== by 0x694FFC2: PCSetUp_HYPRE (hypre.c:322)<br>==4410== by 0x69DBE0F: PCSetUp (precon.c:923)<br>==4410== by 0x6B2BDDC: KSPSetUp (itfunc.c:381)<br>==4410== by 0x6B2DABF: KSPSolve (itfunc.c:612)</div><div><br></div><div>Best,</div><div><br></div><div>Mike<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Sep 29, 2019 at 9:24 AM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
If you have TotalView or DDT or some other parallel debugger, you can wait until the code is "hanging" and then send a signal to one or more of the processes to make them stop in the debugger, and from this get the stack trace. You'll have to figure out for your debugger how that is done.<br>
<br>
If you can start your 72-rank job in "interactive" mode, you can launch it with the options -start_in_debugger noxterm -debugger_nodes 0; then it will start the debugger only on the first rank. Now wait until it hangs, do a control-C, and then you can type bt to get the traceback.<br>
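For example, a sketch of the launch line (the MPI launcher and the executable name ./my_app are placeholders; adjust for your system and batch environment):<br>

```shell
# Start the job interactively with PETSc's debugger options.
# Only rank 0 gets a debugger attached (no xterm windows are spawned).
mpiexec -n 72 ./my_app -start_in_debugger noxterm -debugger_nodes 0

# When the run hangs: press Ctrl-C in the rank-0 debugger session,
# then at the (gdb) prompt type:
#   bt
# to print the stack trace at the point of the hang.
```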
<br>
Barry<br>
<br>
Note it is possible to run 72-rank jobs even on a laptop/workstation/non-cluster machine (so long as they don't use too much memory and don't take too long to reach the hang point), and then you can use the debugger as I indicated above.<br>
<br>
<br>
> On Sep 28, 2019, at 5:32 AM, Michael Wick via petsc-maint <<a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a>> wrote:<br>
> <br>
> I attached a debugger to my run. Interestingly, the code just hangs without throwing an error message. It uses 72 processors. I turned on the KSP monitor, and I can see it hangs either at the beginning or the end of a KSP iteration. I also used valgrind to debug my code on my local machine, which did not detect any issue. I use fgmres + fieldsplit, which is really a standard option.<br>
> <br>
> Do you have any suggestions?<br>
> <br>
> On Fri, Sep 27, 2019 at 8:17 PM Zhang, Junchao <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>> wrote:<br>
> How many MPI ranks did you use? If it is done on your desktop, you can just attach a debugger to an MPI process to see what is going on.<br>
> <br>
> --Junchao Zhang<br>
> <br>
> <br>
> On Fri, Sep 27, 2019 at 4:24 PM Michael Wick via petsc-maint <<a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a>> wrote:<br>
> Hi PETSc:<br>
> <br>
> I have been experiencing code stagnation at certain KSP iterations. This happens rather randomly, which means the code may stop in the middle of a KSP solve and hang there.<br>
> <br>
> I have used valgrind and detected nothing. I just wonder if you have any suggestions.<br>
> <br>
> Thanks!!!<br>
> M<br>
<br>
</blockquote></div>