[petsc-users] [petsc-maint] petsc ksp solver hangs

Michael Wick michael.wick.1980 at gmail.com
Mon Sep 30 00:32:01 CDT 2019


Hi Barry:

Thanks! I was able to capture an issue in my local run, although I am not
100% sure it is what is causing the code to hang.

When I run with -pc_hypre_boomeramg_relax_type_all Chebyshev, valgrind
captures a memory leak:

==4410== 192 bytes in 8 blocks are indirectly lost in loss record 1 of 5
==4410==    at 0x4C2FB55: calloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4410==    by 0x73FED84: hypre_HostMalloc (hypre_memory.c:192)
==4410==    by 0x73FEE53: hypre_MAllocWithInit (hypre_memory.c:301)
==4410==    by 0x73FEF1A: hypre_CAlloc (hypre_memory.c:338)
==4410==    by 0x726E4C4: hypre_ParCSRRelax_Cheby_Setup (par_cheby.c:70)
==4410==    by 0x7265A4C: hypre_BoomerAMGSetup (par_amg_setup.c:2738)
==4410==    by 0x7240F96: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:52)
==4410==    by 0x694FFC2: PCSetUp_HYPRE (hypre.c:322)
==4410==    by 0x69DBE0F: PCSetUp (precon.c:923)
==4410==    by 0x6B2BDDC: KSPSetUp (itfunc.c:381)
==4410==    by 0x6B2DABF: KSPSolve (itfunc.c:612)
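
For reference, the invocation looks roughly like this (the rank count and
executable name are placeholders, not my exact command):

   mpiexec -n 4 valgrind --leak-check=full --num-callers=20 \
       ./my_app -ksp_monitor \
       -pc_hypre_boomeramg_relax_type_all Chebyshev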

Best,

Mike


On Sun, Sep 29, 2019 at 9:24 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>    If you have TotalView or DDT or some other parallel debugger, you can
> wait until it is "hanging" and then send a signal to one or more of the
> processes to stop them, and from this get the stack trace. You'll have to
> figure out how that is done for your debugger.
>
>    If you can start your 72 rank job in "interactive" mode, you can launch
> it with the options -start_in_debugger noxterm -debugger_nodes 0; then it
> will start the debugger only on the first rank. Now wait until it hangs,
> hit control-C, and then type bt to get the traceback.
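>
>    For a 72 rank job that would look something like the following
> (mpiexec and ./app stand in for whatever launcher and executable you
> actually use):
>
>       mpiexec -n 72 ./app -start_in_debugger noxterm -debugger_nodes 0
>
>    Once it hangs, control-C in that terminal and type bt at the debugger
> prompt.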
>
>   Barry
>
>   Note it is possible to run 72 rank jobs even on a
> laptop/workstation/non-cluster (so long as they don't use too much memory
> or take too long to get to the hang point), and then you can use the
> debugger as I indicated above.
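>
>   With Open MPI, for example, you may need to allow oversubscription
> explicitly (the --oversubscribe flag is Open MPI specific; other MPI
> implementations may not require it):
>
>       mpiexec -n 72 --oversubscribe ./app <your options>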
>
>
> > On Sep 28, 2019, at 5:32 AM, Michael Wick via petsc-maint
> > <petsc-maint at mcs.anl.gov> wrote:
> >
> > I attached a debugger to my run. Interestingly, the code just hangs
> > without throwing an error message. I use 72 processors. I turned on the
> > KSP monitor, and I can see it hangs either at the beginning or the end of
> > a KSP iteration. I also used valgrind to debug my code on my local
> > machine, which did not detect any issue. I use fgmres + fieldsplit, which
> > is a really standard option.
> >
> > Do you have any suggestions on what to do?
> >
> > On Fri, Sep 27, 2019 at 8:17 PM Zhang, Junchao <jczhang at mcs.anl.gov>
> > wrote:
> > How many MPI ranks did you use? If it is run on your desktop, you can
> > just attach a debugger to an MPI process to see what is going on.
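> >
> > For example, with gdb (replace <pid> with the process id of one of the
> > ranks, found via ps or top):
> >
> >    gdb -p <pid>
> >    (gdb) bt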
> >
> > --Junchao Zhang
> >
> >
> > On Fri, Sep 27, 2019 at 4:24 PM Michael Wick via petsc-maint
> > <petsc-maint at mcs.anl.gov> wrote:
> > Hi PETSc:
> >
> > I have been experiencing code stagnation at certain KSP iterations. This
> > happens rather randomly: the code may stop in the middle of a KSP solve
> > and hang there.
> >
> > I have used valgrind and detected nothing. I just wonder if you have any
> > suggestions.
> >
> > Thanks!!!
> > M
>
>