[petsc-users] Code sometimes work, sometimes hang when increase cpu usage

TAY wee-beng zonexo at gmail.com
Wed Dec 30 19:20:09 CST 2015


Hi,

I've been debugging and have removed some errors. The code now works on most 
cluster nodes but fails on 2 of them. The strange thing is that I'm using the 
same GNU compiler throughout; the only difference is that I am deploying to 
the newer nodes.

The newer nodes work with my old code, which is similar except that it 
partitions the domain only in the z direction. The new code partitions in 
both the y and z directions.
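
One sanity check for a yz partition like this is to confirm that the per-rank index ranges tile the grid exactly once, since a gap or overlap in ownership is one plausible source of errors that appear only at some process counts. The following is a minimal Python sketch of that check, not the actual CFD code: the helper names `split_1d`, `yz_blocks`, and `tiles_exactly`, the even-split rule, the 169 x 169 plane, and the 5 x 8 process grid are all assumptions for illustration.

```python
def split_1d(n, parts):
    """Split range(n) into `parts` contiguous [start, end) chunks.

    The first n % parts chunks get one extra cell (an assumed split rule).
    """
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def yz_blocks(ny, nz, py, pz):
    """Return one ((ys, ye), (zs, ze)) block per rank for a py x pz process grid."""
    return [(yb, zb) for zb in split_1d(nz, pz) for yb in split_1d(ny, py)]


def tiles_exactly(blocks, ny, nz):
    """Return True if every (y, z) cell is owned by exactly one block."""
    owners = [[0] * nz for _ in range(ny)]
    for (ys, ye), (zs, ze) in blocks:
        for j in range(ys, ye):
            for k in range(zs, ze):
                owners[j][k] += 1
    return all(count == 1 for row in owners for count in row)


if __name__ == "__main__":
    # Hypothetical case: a 169 x 169 yz plane over 40 ranks arranged as 5 x 8.
    blocks = yz_blocks(169, 169, 5, 8)
    print(len(blocks), tiles_exactly(blocks, 169, 169))
```

Running the same check with the real start and end indices from each rank, for every process count that fails, would show whether the decomposition itself is at fault before looking further into the solver.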

It fails in the Poisson equation solve. Is there any way I can find out 
why this is happening?

Thank you.

Yours sincerely,

TAY wee-beng

On 25/12/2015 10:29 PM, Matthew Knepley wrote:
> It appears that you have an uninitialized variable (or more than one). 
> When compiled with debugging, variables
> are normally initialized to zero.
>
>   Thanks,
>
>      Matt
>
> On Fri, Dec 25, 2015 at 5:41 AM, TAY wee-beng <zonexo at gmail.com>
> wrote:
>
>     Hi,
>
>     Sorry, there seem to be some problems with my valgrind. I have
>     rerun it with both the optimized and debug versions.
>
>     Thank you.
>
>     Yours sincerely,
>
>     TAY wee-beng
>
>     On 25/12/2015 12:42 PM, Barry Smith wrote:
>
>             On Dec 24, 2015, at 10:37 PM, TAY wee-beng
>             <zonexo at gmail.com> wrote:
>
>             Hi,
>
>             I tried valgrind with MPI but it aborts very early, with an
>             error message regarding PETSc initialize.
>
>            It shouldn't "abort"; it should print some error message and
>         continue. Please send all the output when running with valgrind.
>
>             It is possible you are solving a large enough problem that it
>         requires configuring with --with-64-bit-indices. Does that resolve
>         the problem?
>
>            Barry
>
>             I retried, using a lower resolution.
>
>             GAMG works, but BoomerAMG and hypre don't. Increasing the
>             cpu count too high (80) also causes it to hang. 60 works fine.
>
>             My grid size is 98x169x169
>
>             But when I increase the resolution, GAMG stops working again.
>
>             I tried increasing the cpu count but it still doesn't work.
>
>             Previously, using the single z direction partition, it worked
>             with GAMG and hypre. So what could be the problem?
>             Thank you.
>
>             Yours sincerely,
>
>             TAY wee-beng
>
>             On 25/12/2015 12:33 AM, Matthew Knepley wrote:
>
>                 It sounds like you have memory corruption in a
>                 different part of the code. Run in valgrind.
>
>                    Matt
>
>                 On Thu, Dec 24, 2015 at 10:14 AM, TAY wee-beng
>                 <zonexo at gmail.com> wrote:
>                 Hi,
>
>                 I have this strange error. I converted my CFD code
>                 from a z-direction-only partition to a yz-direction
>                 partition. The code works fine, but when I increase the
>                 cpu count, strange things happen when solving the Poisson
>                 equation.
>
>                 I increased the cpu count from 24 to 40.
>
>                 Sometimes it works, sometimes it doesn't. When it
>                 doesn't, it just hangs there with no output, or it
>                 gives the error below:
>
>                 Using MPI_Barrier during debug shows that it hangs at
>
>                 call KSPSolve(ksp,b_rhs,xx,ierr).
>
>                 I use hypre BoomerAMG and GAMG
>                 (-poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg)
>
>
>                 Why is this so random? Also, how do I debug this type
>                 of problem?
>
>
>                 [32]PETSC ERROR:
>                 ------------------------------------------------------------------------
>                 [32]PETSC ERROR: Caught signal number 11 SEGV:
>                 Segmentation Violation, probably memory access out of
>                 range
>                 [32]PETSC ERROR: Try option -start_in_debugger or
>                 -on_error_attach_debugger
>                 [32]PETSC ERROR: or see
>                 http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>                 [32]PETSC ERROR: or try http://valgrind.org on
>                 GNU/linux and Apple Mac OS X to find memory corruption
>                 errors
>                 [32]PETSC ERROR: likely location of problem given in
>                 stack below
>                 [32]PETSC ERROR: ---------------------  Stack Frames
>                 ------------------------------------
>                 [32]PETSC ERROR: Note: The EXACT line numbers in the
>                 stack are not available,
>                 [32]PETSC ERROR:       INSTEAD the line number of the
>                 start of the function
>                 [32]PETSC ERROR:       is given.
>                 [32]PETSC ERROR: [32] HYPRE_SetupXXX line 174
>                 /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/impls/hypre/hypre.c
>                 [32]PETSC ERROR: [32] PCSetUp_HYPRE line 122
>                 /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/impls/hypre/hypre.c
>                 [32]PETSC ERROR: [32] PCSetUp line 945
>                 /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/interface/precon.c
>                 [32]PETSC ERROR: [32] KSPSetUp line 247
>                 /home/wtay/Codes/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c
>                 [32]PETSC ERROR: [32] KSPSolve line 510
>                 /home/wtay/Codes/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c
>                 [32]PETSC ERROR: --------------------- Error Message
>                 --------------------------------------------------------------
>                 [32]PETSC ERROR: Signal received
>                 [32]PETSC ERROR: See
>                 http://www.mcs.anl.gov/petsc/documentation/faq.html
>                 for trouble shooting.
>                 [32]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02,
>                 2015
>                 [32]PETSC ERROR: ./a.out on a
>                 petsc-3.6.2_shared_gnu_debug named n12-40 by wtay Thu
>                 Dec 24 17:01:51 2015
>                 [32]PETSC ERROR: Configure options
>                 --with-mpi-dir=/opt/ud/openmpi-1.8.8/
>                 --download-fblaslapack=1 --with-debugging=1
>                 --download-hypre=1
>                 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_gnu_debug
>                 --known-mpi-shared=1 --with-shared-libraries
>                 --with-fortran-interfaces=1
>                 [32]PETSC ERROR: #1 User provided function() line 0
>                 in  unknown file
>                 --------------------------------------------------------------------------
>                 MPI_ABORT was invoked on rank 32 in communicator
>                 MPI_COMM_WORLD
>                 with errorcode 59.
>
>                 -- 
>                 Thank you.
>
>                 Yours sincerely,
>
>                 TAY wee-beng
>
>
>
>
>                 -- 
>                 What most experimenters take for granted before they
>                 begin their experiments is infinitely more interesting
>                 than any results to which their experiments lead.
>                 -- Norbert Wiener
>
