[petsc-users] Code sometimes works, sometimes hangs when CPU usage is increased

TAY wee-beng zonexo at gmail.com
Thu Dec 24 22:37:02 CST 2015


Hi,

I tried running valgrind under MPI, but it aborts very early with an 
error message regarding PETSc initialization.
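
For reference, the valgrind invocation recommended in the PETSc FAQ is 
roughly the following (a sketch; ./a.out is the executable from the run 
below, 40 processes as in that run, and the usual solver options go at 
the end):

    mpiexec -n 40 valgrind --tool=memcheck -q --num-callers=20 \
        --log-file=valgrind.log.%p ./a.out -malloc off

Each rank then writes its report to its own valgrind.log.<pid> file, so 
the output from different ranks is not interleaved.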

I tried again, using a lower resolution.

GAMG works, but hypre BoomerAMG doesn't. Increasing the number of CPUs 
too far (80) also causes it to hang; 60 works fine.

My grid size is 98x169x169

But when I increase the resolution, GAMG stops working as well.

I tried increasing the number of CPUs, but it still doesn't work.

Previously, using a partition in the z direction only, it worked with 
both GAMG and hypre. So what could be the problem?

Thank you.

Yours sincerely,

TAY wee-beng

On 25/12/2015 12:33 AM, Matthew Knepley wrote:
> It sounds like you have memory corruption in a different part of the 
> code. Run in valgrind.
>
>   Matt
>
> On Thu, Dec 24, 2015 at 10:14 AM, TAY wee-beng <zonexo at gmail.com 
> <mailto:zonexo at gmail.com>> wrote:
>
>     Hi,
>
>     I have this strange error. I converted my CFD code from a
>     z-direction-only partition to a yz-direction partition. The code
>     works fine, but when I increase the number of CPUs, strange things
>     happen when solving the Poisson equation.
>
>     I increased the number of CPUs from 24 to 40.
>
>     Sometimes it works, sometimes it doesn't. When it doesn't, it just
>     hangs there with no output, or it gives the error below:
>
>     Using MPI_Barrier during debugging shows that it hangs at
>
>     call KSPSolve(ksp,b_rhs,xx,ierr).
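>
>     Roughly, the barrier check around that call looks like this (a
>     sketch; myid is assumed to hold the MPI rank and is not a variable
>     taken from the code above):
>
>     ! barriers plus rank-0 prints bracket the solve to show where it stalls
>     call MPI_Barrier(PETSC_COMM_WORLD,ierr)
>     if (myid == 0) print *, 'before KSPSolve'
>     call KSPSolve(ksp,b_rhs,xx,ierr)
>     call MPI_Barrier(PETSC_COMM_WORLD,ierr)
>     if (myid == 0) print *, 'after KSPSolve'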
>
>     I use hypre BoomerAMG and GAMG (-poisson_pc_gamg_agg_nsmooths 1
>     -poisson_pc_type gamg)
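>
>     (For reference, such an option prefix is typically attached to the
>     KSP before KSPSetFromOptions, roughly as sketched below; the
>     variable name ksp follows the call above, but these lines are an
>     assumption, not quoted from my code.)
>
>     ! attach the "poisson_" prefix so -poisson_* options reach this KSP
>     call KSPSetOptionsPrefix(ksp,'poisson_',ierr)
>     call KSPSetFromOptions(ksp,ierr)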
>
>
>     Why is this so random? Also, how do I debug this type of problem?
>
>
>     [32]PETSC ERROR:
>     ------------------------------------------------------------------------
>     [32]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>     Violation, probably memory access out of range
>     [32]PETSC ERROR: Try option -start_in_debugger or
>     -on_error_attach_debugger
>     [32]PETSC ERROR: or see
>     http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>     [32]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple
>     Mac OS X to find memory corruption errors
>     [32]PETSC ERROR: likely location of problem given in stack below
>     [32]PETSC ERROR: ---------------------  Stack Frames
>     ------------------------------------
>     [32]PETSC ERROR: Note: The EXACT line numbers in the stack are not
>     available,
>     [32]PETSC ERROR:       INSTEAD the line number of the start of the
>     function
>     [32]PETSC ERROR:       is given.
>     [32]PETSC ERROR: [32] HYPRE_SetupXXX line 174
>     /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/impls/hypre/hypre.c
>     [32]PETSC ERROR: [32] PCSetUp_HYPRE line 122
>     /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/impls/hypre/hypre.c
>     [32]PETSC ERROR: [32] PCSetUp line 945
>     /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/interface/precon.c
>     [32]PETSC ERROR: [32] KSPSetUp line 247
>     /home/wtay/Codes/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c
>     [32]PETSC ERROR: [32] KSPSolve line 510
>     /home/wtay/Codes/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c
>     [32]PETSC ERROR: --------------------- Error Message
>     --------------------------------------------------------------
>     [32]PETSC ERROR: Signal received
>     [32]PETSC ERROR: See
>     http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>     shooting.
>     [32]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02, 2015
>     [32]PETSC ERROR: ./a.out on a petsc-3.6.2_shared_gnu_debug named
>     n12-40 by wtay Thu Dec 24 17:01:51 2015
>     [32]PETSC ERROR: Configure options
>     --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --download-fblaslapack=1
>     --with-debugging=1 --download-hypre=1
>     --prefix=/home/wtay/Lib/petsc-3.6.2_shared_gnu_debug
>     --known-mpi-shared=1 --with-shared-libraries
>     --with-fortran-interfaces=1
>     [32]PETSC ERROR: #1 User provided function() line 0 in unknown file
>     --------------------------------------------------------------------------
>     MPI_ABORT was invoked on rank 32 in communicator MPI_COMM_WORLD
>     with errorcode 59.
>
>     -- 
>     Thank you.
>
>     Yours sincerely,
>
>     TAY wee-beng
>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener

