[petsc-users] Debug AOCreateBasic
Matthew Knepley
knepley at gmail.com
Tue Nov 12 05:45:32 CST 2013
On Mon, Nov 11, 2013 at 11:06 PM, Rongliang Chen
<rongliang.chan at gmail.com> wrote:
> Hi Jed,
>
> I tried the MPICH version of PETSc on Janus (configured with the option
> --download-mpich) and my code stopped at a different place. The error
> message follows. Do you have any suggestions?
>
1) I believe you said that you ran under valgrind without errors, so we
guess that GetPieceData() is fine.
2) I think it is quite unlikely that there is an error in PetscBinaryRead()
(see the sketch below for the usual calling pattern).
3) A wrong file size or bad permissions should not cause a SEGV.
4) Thus to me it clearly looks like a driver issue with the filesystem. If
this is reproducible, it should be easy for the administrators of the
machine to look at, and this is definitely their job. Move up the
hierarchy now.
Matt
> For the core dump, I emailed the administrators of Janus for help about a
> week ago but have not gotten any reply yet.
>
> Best,
> Rongliang
>
> ----------------------------
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
> batch system) has told this process to end
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: [0] PetscBinaryRead line 234 /projects/ronglian/soft/petsc-3.4.3/src/sys/fileio/sysio.c
> [0]PETSC ERROR: [0] GetPieceData line 1096 readbinary3d.c
> [0]PETSC ERROR: [0] DataReadAndSplitGeneric line 962 readbinary3d.c
> [0]PETSC ERROR: [0] DataRead line 621 readbinary3d.c
> [0]PETSC ERROR: [0] ReadBinary line 184 readbinary3d.c
> [0]PETSC ERROR: [0] LoadGrid line 720 loadgrid3d.c
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Signal received!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: ./fsi3d on a Janus-debug-64bit-mpich named node0718 by
> ronglian Mon Nov 11 20:54:09 2013
> [0]PETSC ERROR: Libraries linked from /projects/ronglian/soft/petsc-3.4.3/Janus-debug-64bit-mpich/lib
> [0]PETSC ERROR: Configure run at Mon Nov 11 20:49:25 2013
> [0]PETSC ERROR: Configure options --known-level1-dcache-size=32768
> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8
> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8
> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8
> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1
> --known-mpi-c-double-complex=1 --download-mpich=1 --download-blacs=1
> --download-f-blas-lapack=1 --download-metis=1 --download-parmetis=1
> --download-scalapack=1 --download-superlu_dist=1
> --known-mpi-shared-libraries=0 --with-64-bit-indices --with-batch=1
> --download-exodusii=1 --download-hdf5=1 --download-netcdf=1
> --known-64-bit-blas-indices --with-debugging=1
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> [unset]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
> batch system) has told this process to end
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
>
>
> On 11/07/2013 10:38 AM, Jed Brown wrote:
>
>> Rongliang Chen <rongliang.chan at gmail.com> writes:
>>
>>> Hi Jed,
>>>
>>> I have not found a way to "dump core on selected ranks" yet, but I will
>>> keep looking for one.
>>>
>> Ask the administrators at your facility. There are a few common ways,
>> but I'm not going to play a guessing game on the mailing list.
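
(For what it is worth, one common way to "dump core on selected ranks" is
to raise the core-file limit only on the rank of interest early in the run;
whether the batch system actually keeps the core file is site-specific.
A minimal sketch, with the helper name and the choice of rank purely
illustrative:

    #include <sys/resource.h>   /* getrlimit, setrlimit, RLIMIT_CORE */
    #include <petscsys.h>

    /* Hypothetical helper: allow core dumps only on one chosen rank. */
    static PetscErrorCode EnableCoreDumpOnRank(MPI_Comm comm, PetscMPIInt target)
    {
      PetscErrorCode ierr;
      PetscMPIInt    rank;
      struct rlimit  rl;

      PetscFunctionBegin;
      ierr = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
      if (rank == target) {
        if (getrlimit(RLIMIT_CORE, &rl)) SETERRQ(PETSC_COMM_SELF, PETSC_ERR_SYS, "getrlimit() failed");
        rl.rlim_cur = rl.rlim_max;   /* raise the soft limit to the hard limit */
        if (setrlimit(RLIMIT_CORE, &rl)) SETERRQ(PETSC_COMM_SELF, PETSC_ERR_SYS, "setrlimit() failed");
      }
      PetscFunctionReturn(0);
    }

Calling something like this right after PetscInitialize() on the rank that
crashes would at least leave a core file to inspect, if the scheduler and
filesystem allow it.)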
>>
>>> I ran my code with the option "-on_error_attach_debugger" and got the
>>> following message:
>>>
>>> --------------------------------------------------------------------------
>>> An MPI process has executed an operation involving a call to the
>>> "fork()" system call to create a child process. Open MPI is currently
>>> operating in a condition that could result in memory corruption or
>>> other system errors; your MPI job may hang, crash, or produce silent
>>> data corruption. The use of fork() (or system() or other calls that
>>> create child processes) is strongly discouraged.
>>>
>>> The process that invoked fork was:
>>>
>>> Local host: node1529 (PID 3701)
>>> MPI_COMM_WORLD rank: 0
>>>
>>> If you are *absolutely sure* that your application will successfully
>>> and correctly survive a call to fork(), you may disable this warning
>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>> --------------------------------------------------------------------------
>>> [node1529:03700] 13 more processes have sent help message
>>> help-mpi-runtime.txt / mpi_init:warn-fork
>>> [node1529:03700] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>> --------------------------------------------------------------------------
>>>
>>> Is this message useful for debugging?
>>>
>> This is just a possible technical problem with attaching a debugger in
>> your environment, but you still have to actually attach the debugger and
>> poke around (stack trace, etc.).
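
(Concretely, once a rank is known to be stuck or its PID is printed as in
the Open MPI message above, attaching by hand from the compute node usually
looks something like this, using the reported node and PID just for
illustration:

    ssh node1529
    gdb -p 3701          # attach to the reported rank-0 process
    (gdb) bt             # print the stack trace

The exact mechanics depend on whether the batch system lets you ssh to the
allocated nodes.)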
>>
>> Can you create an interactive session and run your job from there?
>>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener