[petsc-users] Error with parallel solve

Manav Bhatia bhatiamanav at gmail.com
Mon Apr 8 12:12:06 CDT 2019


Hi,
  
    I am running a nonlinear simulation with mesh refinement in libMesh. The code runs without issues on a Mac (it can run for days), but crashes on Linux (CentOS 6). On Linux I am using PETSc 3.11 with Open MPI 3.1.3 and GCC 8.2.

    I tried the -on_error_attach_debugger option, but it only gave me the output below. Does this output suggest anything to more experienced eyes?

    I am going to build a debug version of PETSc to figure out what is going wrong, and will share more detailed logs in a bit.

Regards,
Manav

[8]PETSC ERROR: ------------------------------------------------------------------------
[8]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[8]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[8]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[8]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[8]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[8]PETSC ERROR: to get more information on the crash.
[8]PETSC ERROR: User provided function() line 0 in  unknown file  
PETSC: Attaching gdb to /cavs/projects/brg_codes/users/bhatia/mast/mast_topology/opt/examples/structural/example_5/structural_example_5 of pid 2108 on display localhost:10.0 on machine Warhawk1.HPC.MsState.Edu
PETSC: Attaching gdb to /cavs/projects/brg_codes/users/bhatia/mast/mast_topology/opt/examples/structural/example_5/structural_example_5 of pid 2112 on display localhost:10.0 on machine Warhawk1.HPC.MsState.Edu
           0 :INTERNAL Error: recvd root arrowhead 
           0 :not belonging to me. IARR,JARR=       67525       67525
           0 :IROW_GRID,JCOL_GRID=           0           4
           0 :MYROW, MYCOL=           0           0
           0 :IPOSROOT,JPOSROOT=    92264688    92264688
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -99.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
