[petsc-users] [petsc-maint] petsc ksp solver hangs

Smith, Barry F. bsmith at mcs.anl.gov
Mon Sep 30 01:50:46 CDT 2019


   This is just a memory leak in hypre; you might report it to them.  

   Memory leaks don't cause hangs.

   Barry


> On Sep 30, 2019, at 12:32 AM, Michael Wick <michael.wick.1980 at gmail.com> wrote:
> 
> Hi Barry:
> 
> Thanks! I can capture an issue in my local run, although I am not 100% sure that it is what causes the code to hang.
> 
> When I run with -pc_hypre_boomeramg_relax_type_all Chebyshev, valgrind captures a memory leak:
> 
> ==4410== 192 bytes in 8 blocks are indirectly lost in loss record 1 of 5
> ==4410==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4410==    by 0x73FED84: hypre_HostMalloc (hypre_memory.c:192)
> ==4410==    by 0x73FEE53: hypre_MAllocWithInit (hypre_memory.c:301)
> ==4410==    by 0x73FEF1A: hypre_CAlloc (hypre_memory.c:338)
> ==4410==    by 0x726E4C4: hypre_ParCSRRelax_Cheby_Setup (par_cheby.c:70)
> ==4410==    by 0x7265A4C: hypre_BoomerAMGSetup (par_amg_setup.c:2738)
> ==4410==    by 0x7240F96: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:52)
> ==4410==    by 0x694FFC2: PCSetUp_HYPRE (hypre.c:322)
> ==4410==    by 0x69DBE0F: PCSetUp (precon.c:923)
> ==4410==    by 0x6B2BDDC: KSPSetUp (itfunc.c:381)
> ==4410==    by 0x6B2DABF: KSPSolve (itfunc.c:612)
> 
> Best,
> 
> Mike
> 
> 
> On Sun, Sep 29, 2019 at 9:24 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>    If you have TotalView or DDT or some other parallel debugger you can wait until it is "hanging" and then send a signal to one or more of the processes to stop them, and from this get the stack trace. You'll have to figure out how that is done for your debugger.
> 
>    If you can start your 72-rank job in "interactive" mode, you can launch it with the options -start_in_debugger noxterm -debugger_nodes 0; it will then start the debugger only on the first rank. Wait until it hangs, press Ctrl-C, and then type bt to get the traceback.
> 
>   Barry
> 
>   Note it is possible to run 72-rank jobs even on a laptop, workstation, or other non-cluster machine (so long as they don't use too much memory or take too long to get to the hang point), and then you can use the debugger as I indicated above.
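
The interactive launch described above can be sketched as a small Python helper that assembles the command line. This is only an illustration: the function name debugger_launch_command, the launcher name "mpiexec" with its "-n" flag, and the executable "./my_app" are assumptions (your MPI installation may use a different launcher), while the two PETSc options are the ones quoted in the message.

```python
import shlex


def debugger_launch_command(executable, nranks=72, debugger_rank=0,
                            launcher="mpiexec"):
    """Build a launch line that starts the debugger on one rank only.

    `launcher` and its "-n" flag are assumptions about the local MPI setup.
    """
    return [
        launcher, "-n", str(nranks), executable,
        # PETSc runtime options taken from the message above:
        "-start_in_debugger", "noxterm",        # no separate xterm window
        "-debugger_nodes", str(debugger_rank),  # debugger on this rank only
    ]


if __name__ == "__main__":
    # "./my_app" is a placeholder for the real executable.
    print(shlex.join(debugger_launch_command("./my_app")))
```

Run the printed command from an interactive session; once the job hangs, press Ctrl-C in the debugger and type bt, as described in the message.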
> 
> 
> > On Sep 28, 2019, at 5:32 AM, Michael Wick via petsc-maint <petsc-maint at mcs.anl.gov> wrote:
> > 
> > I attached a debugger to my run. Interestingly, the code just hangs without throwing an error message. I use 72 processors. I turned on the KSP monitor, and I can see that it hangs at either the beginning or the end of a KSP iteration. I also used valgrind to debug my code on my local machine, and it did not detect any issue. I use fgmres + fieldsplit, which is really a standard option.
> > 
> > Do you have any suggestions on what to do?
> > 
> > On Fri, Sep 27, 2019 at 8:17 PM Zhang, Junchao <jczhang at mcs.anl.gov> wrote:
> > How many MPI ranks did you use? If it is done on your desktop, you can just attach a debugger to an MPI process to see what is going on.
> > 
> > --Junchao Zhang
> > 
> > 
> > On Fri, Sep 27, 2019 at 4:24 PM Michael Wick via petsc-maint <petsc-maint at mcs.anl.gov> wrote:
> > Hi PETSc:
> > 
> > I have been experiencing code stagnation at certain KSP iterations. This happens rather randomly, which means the code may stop in the middle of a KSP solve and hang there.
> > 
> > I have used valgrind and detected nothing. I just wonder if you have any suggestions.
> > 
> > Thanks!!!
> > M
> 


