[petsc-users] signal received error; MatNullSpaceTest; Stokes flow solver with pc fieldsplit and schur complement

Bishesh Khanal bisheshkh at gmail.com
Thu Oct 17 09:26:19 CDT 2013


On Thu, Oct 17, 2013 at 3:47 PM, Bishesh Khanal <bisheshkh at gmail.com> wrote:

>
>
>
> On Thu, Oct 17, 2013 at 3:00 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>
>> Bishesh Khanal <bisheshkh at gmail.com> writes:
>> > The program crashes only for a bigger domain size. Even in the cluster,
>> it
>> > does not crash for the domain size up to a certain size.  So I need to
>> run
>> > in the debugger for the case when it crashes to get the stack trace from
>> > the SEGV, right ? I do not know how to attach a debugger when
>> submitting a
>> > job to the cluster if that is possible at all!
>>
>> Most machines allow you to get "interactive" sessions.  You can usually
>> run debuggers within those.  Some facilities also have commercial
>> debuggers.
>>
>
> Thanks, I'll have a look at that.
>
>
>>
>> > Or are you asking me to run the program in the debugger in my laptop
>> > for the biggest size ? (I have not tried running the code for the
>> > biggest size in my laptop fearing it might take forever)
>>
>> Your laptop probably doesn't have enough memory for that.
>>
>
> Yes, I tried it just a while ago, and that is what happened, I think. (Just
> to confirm, I have put the error message for this case at the very end of
> this reply.*)
>
>
>>
>> Can you try running on the cluster with one MPI rank per node?  We
>> should rule out simple out-of-memory problems, confirm that the code
>> executes correctly with MPICH, and finally figure out why it fails with
>> Open MPI (assuming that the previous hunch was correct).
>>
>>
I tried running on the cluster with one MPI rank per node on 4 nodes, and I
got the following errors (note: using valgrind and the cluster's Open MPI).
They appear at the very end, after the many usual valgrind "Conditional jump
or move ..." warnings, and might be interesting:
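
(Roughly, the launch line looks like the following; the exact mpiexec option
names may differ with other MPIs, so take this only as a sketch, and the
4-node, one-core-per-node request itself is made in the job script:
mpiexec -npernode 1 valgrind ./AdLemMain <my usual options>)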

mpiexec: killing job...

mpiexec: abort is already in progress...hit ctrl-c again to forcibly
terminate

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: ---------------------  Stack Frames
------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR:       INSTEAD the line number of the start of the function
[0]PETSC ERROR:       is given.
[0]PETSC ERROR: [0] MatSetValues_MPIAIJ line 505
/tmp/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
[0]PETSC ERROR: [0] MatSetValues line 1071
/tmp/petsc-3.4.3/src/mat/interface/matrix.c
[0]PETSC ERROR: [0] MatSetValuesLocal line 1935
/tmp/petsc-3.4.3/src/mat/interface/matrix.c
[0]PETSC ERROR: [0] DMCreateMatrix_DA_3d_MPIAIJ line 1051
/tmp/petsc-3.4.3/src/dm/impls/da/fdda.c
[0]PETSC ERROR: [0] DMCreateMatrix_DA line 627
/tmp/petsc-3.4.3/src/dm/impls/da/fdda.c
[0]PETSC ERROR: [0] DMCreateMatrix line 900
/tmp/petsc-3.4.3/src/dm/interface/dm.c
[0]PETSC ERROR: [0] KSPSetUp line 192
/tmp/petsc-3.4.3/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: [0] solveModel line 122
"unknowndirectory/"/epi/asclepios2/bkhanal/works/AdLemModel/src/PetscAdLemTaras3D.cxx
[0]PETSC ERROR: --------------------- Error Message
------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR:
/epi/asclepios2/bkhanal/works/AdLemModel/build/src/AdLemMain on a
arch-linux2-cxx-debug named nef002 by bkhanal Thu Oct 17 15:55:33 2013
[0]PETSC ERROR: Libraries linked from /epi/asclepios2/bkhanal/petscDebug/lib
[0]PETSC ERROR: Configure run at Wed Oct 16 14:18:48 2013
[0]PETSC ERROR: Configure options --with-mpi-dir=/opt/openmpi-gcc/current/
--with-shared-libraries --prefix=/epi/asclepios2/bkhanal/petscDebug
-download-f-blas-lapack=1 --download-metis --download-parmetis
--download-superlu_dist --download-scalapack --download-mumps
--download-hypre --with-clanguage=cxx
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: User provided function() line 0 in unknown directory
unknown file
==47363==
==47363== HEAP SUMMARY:
==47363==     in use at exit: 10,939,838,029 bytes in 8,091 blocks
==47363==   total heap usage: 1,936,963 allocs, 1,928,872 frees,
11,530,164,042 bytes allocated
==47363==

Does this mean it is crashing near MatSetValues_MPIAIJ?
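
(One thing I notice in the valgrind heap summary above: about 10.9 GB is
still allocated on this rank at exit, so even with a single rank per node the
matrix alone seems to need on the order of 10 GB per process.)

For context, the part of my code that the stack trace points at is essentially
the standard DMDA + KSP setup; below is a stripped-down sketch (grid size, dof
and stencil are only illustrative, not the exact numbers from solveModel)
showing the call that KSPSetUp() makes internally and that appears in the
stack, written against the PETSc 3.4 signature of DMCreateMatrix:

#include <petscdmda.h>
#include <petscmat.h>

int main(int argc, char **argv)
{
  DM             da;
  Mat            A;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  /* Several dof per grid point, star stencil of width 1; the global grid
     size below is only a placeholder. */
  ierr = DMDACreate3d(PETSC_COMM_WORLD,
                      DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE,
                      DMDA_STENCIL_STAR,
                      200, 240, 200,
                      PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                      4, 1, NULL, NULL, NULL, &da);CHKERRQ(ierr);

  /* KSPSetUp() calls this internally when a DM is attached to the KSP; for a
     3d DMDA it ends up in DMCreateMatrix_DA_3d_MPIAIJ, which preallocates the
     MPIAIJ matrix and inserts zeros through MatSetValuesLocal/MatSetValues --
     the routines listed in the stack trace above. */
  ierr = DMCreateMatrix(da, MATMPIAIJ, &A);CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

So if I read the stack correctly, the failure happens while that matrix is
being preallocated and filled, before my own assembly code runs.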



> I'm sorry, but I'm a complete beginner with MPI and clusters; what does
> one MPI rank per node mean, and what should I do to do that? My guess is
> that I set one core per node and use multiple nodes in my job script file?
> Or do I need to do something in the PETSc code?
>
> *Here is the error I get when running for the full domain size on my
> laptop:
> [3]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [3]PETSC ERROR: Out of memory. This could be due to allocating
> [3]PETSC ERROR: too large an object or bleeding by not properly
> [3]PETSC ERROR: destroying unneeded objects.
> [1]PETSC ERROR: Memory allocated 0 Memory used by process 1700159488
> [1]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [1]PETSC ERROR: Memory requested 6234924800!
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [1]PETSC ERROR: See docs/index.html for manual pages.
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: [2]PETSC ERROR: Memory allocated 0 Memory used by process
> 1695793152
> [2]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [2]PETSC ERROR: Memory requested 6223582208!
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [2]PETSC ERROR: See docs/index.html for manual pages.
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: src/AdLemMain on a arch-linux2-cxx-debug named edwards by
> bkhanal Thu Oct 17 15:19:22 2013
> [1]PETSC ERROR: Libraries linked from
> /home/bkhanal/Documents/softwares/petsc-3.4.3/arch-linux2-cxx-debug/lib
> [1]PETSC ERROR: Configure run at Wed Oct 16 15:13:05 2013
> [1]PETSC ERROR: Configure options --download-mpich
> -download-f-blas-lapack=1 --download-metis --download-parmetis
> --download-superlu_dist --download-scalapack --download-mumps
> --download-hypre --with-clanguage=cxx
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: PetscMallocAlign() line 46 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/sys/memory/mal.c
> src/AdLemMain on a arch-linux2-cxx-debug named edwards by bkhanal Thu Oct
> 17 15:19:22 2013
> [2]PETSC ERROR: Libraries linked from
> /home/bkhanal/Documents/softwares/petsc-3.4.3/arch-linux2-cxx-debug/lib
> [2]PETSC ERROR: Configure run at Wed Oct 16 15:13:05 2013
> [2]PETSC ERROR: Configure options --download-mpich
> -download-f-blas-lapack=1 --download-metis --download-parmetis
> --download-superlu_dist --download-scalapack --download-mumps
> --download-hypre --with-clanguage=cxx
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: PetscMallocAlign() line 46 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/sys/memory/mal.c
> [1]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3551 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/seq/aij.c
> [1]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/seq/aij.c
> [2]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3551 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/seq/aij.c
> [2]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/seq/aij.c
> [1]PETSC ERROR: MatMPIAIJSetPreallocation_MPIAIJ() line 3307 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> [1]PETSC ERROR: MatMPIAIJSetPreallocation() line 4015 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> [2]PETSC ERROR: MatMPIAIJSetPreallocation_MPIAIJ() line 3307 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> [2]PETSC ERROR: MatMPIAIJSetPreallocation() line 4015 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Out of memory. This could be due to allocating
> [0]PETSC ERROR: too large an object or bleeding by not properly
> [0]PETSC ERROR: destroying unneeded objects.
> [2]PETSC ERROR: [1]PETSC ERROR: DMCreateMatrix_DA_3d_MPIAIJ() line 1101 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/impls/da/fdda.c
> [1]PETSC ERROR: DMCreateMatrix_DA_3d_MPIAIJ() line 1101 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/impls/da/fdda.c
> [2]PETSC ERROR: DMCreateMatrix_DA() line 771 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/impls/da/fdda.c
> DMCreateMatrix_DA() line 771 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/impls/da/fdda.c
> [3]PETSC ERROR: Memory allocated 0 Memory used by process 1675407360
> [3]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [3]PETSC ERROR: Memory requested 6166659200!
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
> [3]PETSC ERROR: See docs/changes/index.html for recent updates.
> [3]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [3]PETSC ERROR: See docs/index.html for manual pages.
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: src/AdLemMain on a arch-linux2-cxx-debug named edwards by
> bkhanal Thu Oct 17 15:19:22 2013
> [3]PETSC ERROR: Libraries linked from
> /home/bkhanal/Documents/softwares/petsc-3.4.3/arch-linux2-cxx-debug/lib
> [3]PETSC ERROR: Configure run at Wed Oct 16 15:13:05 2013
> [3]PETSC ERROR: Configure options --download-mpich
> -download-f-blas-lapack=1 --download-metis --download-parmetis
> --download-superlu_dist --download-scalapack --download-mumps
> --download-hypre --with-clanguage=cxx
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: [1]PETSC ERROR: DMCreateMatrix() line 910 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/interface/dm.c
> [2]PETSC ERROR: DMCreateMatrix() line 910 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/interface/dm.c
> PetscMallocAlign() line 46 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/sys/memory/mal.c
> [3]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3551 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/seq/aij.c
> [3]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/seq/aij.c
> [1]PETSC ERROR: KSPSetUp() line 207 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/ksp/ksp/interface/itfunc.c
> [2]PETSC ERROR: KSPSetUp() line 207 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/ksp/ksp/interface/itfunc.c
> [3]PETSC ERROR: MatMPIAIJSetPreallocation_MPIAIJ() line 3307 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> [3]PETSC ERROR: MatMPIAIJSetPreallocation() line 4015 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> [3]PETSC ERROR: DMCreateMatrix_DA_3d_MPIAIJ() line 1101 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/impls/da/fdda.c
> [3]PETSC ERROR: DMCreateMatrix_DA() line 771 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/impls/da/fdda.c
> [3]PETSC ERROR: DMCreateMatrix() line 910 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/interface/dm.c
> [3]PETSC ERROR: KSPSetUp() line 207 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: solveModel() line 128 in
> "unknowndirectory/"/user/bkhanal/home/works/AdLemModel/src/PetscAdLemTaras3D.cxx
> [2]PETSC ERROR: solveModel() line 128 in
> "unknowndirectory/"/user/bkhanal/home/works/AdLemModel/src/PetscAdLemTaras3D.cxx
> [3]PETSC ERROR: solveModel() line 128 in
> "unknowndirectory/"/user/bkhanal/home/works/AdLemModel/src/PetscAdLemTaras3D.cxx
> [0]PETSC ERROR: Memory allocated 0 Memory used by process 1711476736
> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [0]PETSC ERROR: Memory requested 6292477952!
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: src/AdLemMain on a arch-linux2-cxx-debug named edwards by
> bkhanal Thu Oct 17 15:19:22 2013
> [0]PETSC ERROR: Libraries linked from
> /home/bkhanal/Documents/softwares/petsc-3.4.3/arch-linux2-cxx-debug/lib
> [0]PETSC ERROR: Configure run at Wed Oct 16 15:13:05 2013
> [0]PETSC ERROR: Configure options --download-mpich
> -download-f-blas-lapack=1 --download-metis --download-parmetis
> --download-superlu_dist --download-scalapack --download-mumps
> --download-hypre --with-clanguage=cxx
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: PetscMallocAlign() line 46 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/sys/memory/mal.c
> [0]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3551 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/seq/aij.c
> [0]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/seq/aij.c
> [0]PETSC ERROR: MatMPIAIJSetPreallocation_MPIAIJ() line 3307 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: MatMPIAIJSetPreallocation() line 4015 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: DMCreateMatrix_DA_3d_MPIAIJ() line 1101 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/impls/da/fdda.c
> [0]PETSC ERROR: DMCreateMatrix_DA() line 771 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/impls/da/fdda.c
> [0]PETSC ERROR: DMCreateMatrix() line 910 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/dm/interface/dm.c
> [0]PETSC ERROR: KSPSetUp() line 207 in
> /home/bkhanal/Documents/softwares/petsc-3.4.3/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: solveModel() line 128 in
> "unknowndirectory/"/user/bkhanal/home/works/AdLemModel/src/PetscAdLemTaras3D.cxx
> --9345:0:aspacem  Valgrind: FATAL: VG_N_SEGMENTS is too low.
> --9345:0:aspacem    Increase it and rebuild.  Exiting now.
> --9344:0:aspacem  Valgrind: FATAL: VG_N_SEGMENTS is too low.
> --9344:0:aspacem    Increase it and rebuild.  Exiting now.
> --9343:0:aspacem  Valgrind: FATAL: VG_N_SEGMENTS is too low.
> --9343:0:aspacem    Increase it and rebuild.  Exiting now.
> --9346:0:aspacem  Valgrind: FATAL: VG_N_SEGMENTS is too low.
> --9346:0:aspacem    Increase it and rebuild.  Exiting now.
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
>
>
>