[petsc-users] Code sometimes work, sometimes hang when increase cpu usage
Barry Smith
bsmith at mcs.anl.gov
Thu Dec 24 22:42:36 CST 2015
> On Dec 24, 2015, at 10:37 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I tried valgrind under MPI, but it aborts very early with an error message about PETSc initialization.
It shouldn't "abort"; it should print an error message and continue. Please send all the output from the valgrind run.
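Something along these lines usually works; the launcher, process count, and trailing option list are placeholders to adapt to your setup:

   mpiexec -n 8 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./a.out -malloc off <your usual options>

The %p in --log-file produces one log per MPI rank, and -malloc off turns off PETSc's own memory wrapper so valgrind sees the underlying allocations.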
It is also possible you are solving a problem large enough to require PETSc configured with --with-64-bit-indices. Does that resolve the problem?
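If you try that, a reconfigure based on the options shown in your error output would look roughly like this (everything is copied from your existing build; only --with-64-bit-indices is new), after which your code must be rebuilt against the new install:

   ./configure --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --download-fblaslapack=1 --with-debugging=1 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_gnu_debug --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 --with-64-bit-indices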
Barry
>
> I retry again, using a lower resolution.
>
> GAMG works, but hypre's BoomerAMG doesn't. Increasing the CPU count too high (80) also causes it to hang; 60 works fine.
>
> My grid size is 98x169x169
>
> But when I increase the resolution, GAMG stops working as well.
>
> I tried increasing the CPU count, but it still doesn't work.
>
> Previously, with a single z-direction partition, it worked with both GAMG and hypre. So what could be the problem?
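> If the grid is managed through a PETSc DMDA (an assumption; the decomposition may equally well be hand-rolled), the two partitionings differ only in the process-grid arguments of DMDACreate3d. A rough Fortran sketch for the 98x169x169 grid, with da and ierr assumed declared:
>
>   ! yz partition: 1 process in x, PETSc chooses the y and z splits
>   call DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, &
>        DM_BOUNDARY_NONE, DMDA_STENCIL_STAR, 98, 169, 169, &
>        1, PETSC_DECIDE, PETSC_DECIDE, 1, 1, &
>        PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, da, ierr)
>   ! the old z-only partition would instead pass 1, 1, PETSC_DECIDE for the process grid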
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 25/12/2015 12:33 AM, Matthew Knepley wrote:
>> It sounds like you have memory corruption in a different part of the code. Run in valgrind.
>>
>> Matt
>>
>> On Thu, Dec 24, 2015 at 10:14 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>> Hi,
>>
>> I have a strange error. I converted my CFD code from a z-direction-only partition to a yz-direction partition. The code works fine, but when I increase the number of CPUs, strange things happen when solving the Poisson equation.
>>
>> I increased the CPU count from 24 to 40.
>>
>> Sometimes it works, sometimes it doesn't. When it doesn't, it just hangs there with no output, or it gives the error below:
>>
>> Using MPI_Barrier during debugging shows that it hangs at
>>
>> call KSPSolve(ksp,b_rhs,xx,ierr).
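>> A rough sketch of that bracketing (ksp, b_rhs, xx and ierr as in the code; rank and mpierr are assumed declared, with rank set from MPI_Comm_rank):
>>
>>   call MPI_Barrier(MPI_COMM_WORLD, mpierr)
>>   if (rank == 0) print *, 'all ranks reached KSPSolve'
>>   call KSPSolve(ksp, b_rhs, xx, ierr)
>>   ! check the error code so a failure shows up instead of turning into a silent hang
>>   if (ierr /= 0) print *, 'rank ', rank, ': KSPSolve returned ierr = ', ierr
>>   call MPI_Barrier(MPI_COMM_WORLD, mpierr)
>>   if (rank == 0) print *, 'all ranks returned from KSPSolve'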
>>
>> I use hypre BoomerAMG and GAMG (-poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg)
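>> The hypre run, keeping the same poisson_ prefix as above, would be along the lines of (the process count is just an example):
>>
>>   mpiexec -n 40 ./a.out -poisson_pc_type hypre -poisson_pc_hypre_type boomeramg -poisson_ksp_converged_reason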
>>
>>
>> Why is this so random? And how do I debug this type of problem?
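>> When it hangs rather than crashes, one generic way to see where each rank is stuck is to attach gdb to a few of the hung processes on their nodes and take backtraces:
>>
>>   gdb -p <pid of the hung a.out>
>>   (gdb) bt
>>
>> PETSc's -start_in_debugger and -on_error_attach_debugger options, mentioned in the error output below, are another route.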
>>
>>
>> [32]PETSC ERROR: ------------------------------------------------------------------------
>> [32]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [32]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [32]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [32]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [32]PETSC ERROR: likely location of problem given in stack below
>> [32]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>> [32]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [32]PETSC ERROR: INSTEAD the line number of the start of the function
>> [32]PETSC ERROR: is given.
>> [32]PETSC ERROR: [32] HYPRE_SetupXXX line 174 /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/impls/hypre/hypre.c
>> [32]PETSC ERROR: [32] PCSetUp_HYPRE line 122 /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/impls/hypre/hypre.c
>> [32]PETSC ERROR: [32] PCSetUp line 945 /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/interface/precon.c
>> [32]PETSC ERROR: [32] KSPSetUp line 247 /home/wtay/Codes/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c
>> [32]PETSC ERROR: [32] KSPSolve line 510 /home/wtay/Codes/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c
>> [32]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [32]PETSC ERROR: Signal received
>> [32]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [32]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02, 2015
>> [32]PETSC ERROR: ./a.out on a petsc-3.6.2_shared_gnu_debug named n12-40 by wtay Thu Dec 24 17:01:51 2015
>> [32]PETSC ERROR: Configure options --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --download-fblaslapack=1 --with-debugging=1 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_gnu_debug --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1
>> [32]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 32 in communicator MPI_COMM_WORLD
>> with errorcode 59.
>>
>> --
>> Thank you.
>>
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>