[petsc-users] Debug AOCreateBasic
Rongliang Chen
rongliang.chan at gmail.com
Mon Nov 11 23:06:23 CST 2013
Hi Jed,
I tried the MPICH version of PETSc on Janus (configured with the option
--download-mpich) and my code stopped at a different place. The error
message is below. Do you have any suggestions?
Regarding the core dump, I emailed the Janus administrators for help
about a week ago but have not received any reply yet.
Best,
Rongliang
----------------------------
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC
ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames
------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[0]PETSC ERROR: [0] PetscBinaryRead line 234
/projects/ronglian/soft/petsc-3.4.3/src/sys/fileio/sysio.c
[0]PETSC ERROR: [0] GetPieceData line 1096 readbinary3d.c
[0]PETSC ERROR: [0] DataReadAndSplitGeneric line 962 readbinary3d.c
[0]PETSC ERROR: [0] DataRead line 621 readbinary3d.c
[0]PETSC ERROR: [0] ReadBinary line 184 readbinary3d.c
[0]PETSC ERROR: [0] LoadGrid line 720 loadgrid3d.c
[0]PETSC ERROR: --------------------- Error Message
------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: ./fsi3d on a Janus-debug-64bit-mpich named node0718 by
ronglian Mon Nov 11 20:54:09 2013
[0]PETSC ERROR: Libraries linked from
/projects/ronglian/soft/petsc-3.4.3/Janus-debug-64bit-mpich/lib
[0]PETSC ERROR: Configure run at Mon Nov 11 20:49:25 2013
[0]PETSC ERROR: Configure options --known-level1-dcache-size=32768
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8
--known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
--known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
--known-sizeof-long-long=8 --known-sizeof-float=4
--known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8
--known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4
--known-mpi-long-double=1 --known-mpi-c-double-complex=1
--download-mpich=1 --download-blacs=1 --download-f-blas-lapack=1
--download-metis=1 --download-parmetis=1 --download-scalapack=1
--download-superlu_dist=1 --known-mpi-shared-libraries=0
--with-64-bit-indices --with-batch=1 --download-exodusii=1
--download-hdf5=1 --download-netcdf=1 --known-64-bit-blas-indices
--with-debugging=1
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: User provided function() line 0 in unknown directory
unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[unset]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC
ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
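For context, the stack trace points into PetscBinaryRead() called from
GetPieceData() in my readbinary3d.c. Roughly speaking, that part of the code
does something like the sketch below; the function name, arguments, and
layout here are only placeholders for my application code, not the actual
implementation:

#include <petscsys.h>

/* Rough sketch of the read pattern in GetPieceData() of readbinary3d.c;
   names and arguments are placeholders, not the real application code. */
static PetscErrorCode ReadPieceScalars(const char filename[], PetscInt n, PetscScalar *buf)
{
  PetscErrorCode ierr;
  int            fd;

  PetscFunctionBegin;
  /* Open the binary file for reading and get a raw file descriptor */
  ierr = PetscBinaryOpen(filename, FILE_MODE_READ, &fd);CHKERRQ(ierr);
  /* Read n values of type PETSC_SCALAR from the file into buf; the
     trace above shows the SIGTERM arriving inside a read like this. */
  ierr = PetscBinaryRead(fd, buf, n, PETSC_SCALAR);CHKERRQ(ierr);
  ierr = PetscBinaryClose(fd);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}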
On 11/07/2013 10:38 AM, Jed Brown wrote:
> Rongliang Chen <rongliang.chan at gmail.com> writes:
>
>> Hi Jed,
>>
>> I have not found a way to "dump core on selected ranks" yet, and I will
>> keep looking into it.
> Ask the administrators at your facility. There are a few common ways,
> but I'm not going to play a guessing game on the mailing list.
>
>> I ran my code with the option "-on_error_attach_debugger" and got the
>> following message:
>>
>> --------------------------------------------------------------------------
>> An MPI process has executed an operation involving a call to the
>> "fork()" system call to create a child process. Open MPI is currently
>> operating in a condition that could result in memory corruption or
>> other system errors; your MPI job may hang, crash, or produce silent
>> data corruption. The use of fork() (or system() or other calls that
>> create child processes) is strongly discouraged.
>>
>> The process that invoked fork was:
>>
>> Local host: node1529 (PID 3701)
>> MPI_COMM_WORLD rank: 0
>>
>> If you are *absolutely sure* that your application will successfully
>> and correctly survive a call to fork(), you may disable this warning
>> by setting the mpi_warn_on_fork MCA parameter to 0.
>> --------------------------------------------------------------------------
>> [node1529:03700] 13 more processes have sent help message
>> help-mpi-runtime.txt / mpi_init:warn-fork
>> [node1529:03700] Set MCA parameter "orte_base_help_aggregate" to 0 to
>> see all help / error messages
>> --------------------------------------------------------------------------
>>
>> Is this message useful for debugging?
> This just points to a possible technical problem with attaching a debugger
> in your environment, but you have to actually attach the debugger and poke
> around (stack trace, etc.).
>
> Can you create an interactive session and run your job from there?