[petsc-users] parallel IO messages

Fande Kong fdkong.jd at gmail.com
Fri Nov 27 18:24:38 CST 2015


Hi Barry,

You are most likely right. I cannot be 100% sure because this happens
randomly; I have tried several tests, and all of them passed. Is there any
reason for the IO system to use SIGTRAP?

Thanks,

Fande,

On Fri, Nov 27, 2015 at 2:29 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   SIGTRAP is a way for a process to interact with itself or another process
> asynchronously. It is possible that somewhere in the tangle of HDF5/MPI
> IO/OS code that moves the data in parallel from MPI process memory to the
> hard disk, some of the code uses SIGTRAP. PETSc, by default, always traps
> SIGTRAP, treating it as an indication of an error condition. The
> "randomness" could come from the fact that, depending on how quickly the
> data moves from the MPI processes to the disk, the IO code may only
> sometimes actually raise a SIGTRAP. I could also be completely wrong, and
> the SIGTRAP may simply be triggered by errors in the IO system. Anyway, give
> my suggestion a try and see if it helps; there is nothing else you can do.
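For illustration only, and not something Barry proposes here: a minimal
runtime sketch of what ignoring SIGTRAP looks like in application code,
assuming only the standard C signal() API; PETSc's -no_signal_handler
runtime option is the coarser alternative, since it disables all of PETSc's
signal handling rather than just the SIGTRAP part.

/* Sketch (an assumption, not taken from this thread): after PetscInitialize()
 * has installed PETSc's signal handler, reset SIGTRAP alone to be ignored so
 * a trap raised inside the IO stack is not reported as a fatal error. */
#include <petscsys.h>
#include <signal.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
#if defined(SIGTRAP)
  signal(SIGTRAP, SIG_IGN);   /* keep PETSc's handlers for SIGSEGV, SIGFPE, ... */
#endif

  /* ... parallel HDF5 IO and the rest of the application ... */

  ierr = PetscFinalize();
  return ierr;
}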
>
>   Barry
>
>
>
>
> > On Nov 27, 2015, at 2:27 PM, Fande Kong <fdkong.jd at gmail.com> wrote:
> >
> > Thanks, Barry,
> >
> > I was also wondering why this happens randomly. Any explanation? If this
> > were something in PETSc, shouldn't it happen every time?
> >
> > Thanks,
> >
> > Fande Kong,
> >
> > On Fri, Nov 27, 2015 at 1:20 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Edit PETSC_ARCH/include/petscconf.h and add
> >
> > #if !defined(PETSC_MISSING_SIGTRAP)
> > #define PETSC_MISSING_SIGTRAP
> > #endif
> >
> > then do
> >
> > make gnumake
> >
> > It is possible that the system you are using uses SIGTRAP in managing
> > the IO; by making the change above you are telling PETSc to ignore
> > SIGTRAPs. Let us know how this works out.
> >
> >    Barry
> >
> >
> > > On Nov 27, 2015, at 1:05 PM, Fande Kong <fdkong.jd at gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > I implemented parallel IO for Vec and IS based on HDF5. I am testing
> > > this loader on a supercomputer. I occasionally (not always) encounter
> > > the following errors (using 8192 cores):
> > >
> > > [7689]PETSC ERROR:
> ------------------------------------------------------------------------
> > > [7689]PETSC ERROR: Caught signal number 5 TRAP
> > > [7689]PETSC ERROR: Try option -start_in_debugger or
> -on_error_attach_debugger
> > > [7689]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple
> Mac OS X to find memory corruption errors
> > > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile,
> link, and run
> > > [7689]PETSC ERROR: to get more information on the crash.
> > > [7689]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> > > [7689]PETSC ERROR: Signal received
> > > [7689]PETSC ERROR: See
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown
> > > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by
> fandek Fri Nov 27 11:26:30 2015
> > > [7689]PETSC ERROR: Configure options --with-clanguage=cxx
> --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1
> --download-parmetis=1 --download-metis=1 --with-netcdf=1
> --download-exodusii=1
> --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5
> --with-debugging=no --with-c2html=0 --with-64-bit-indices=1
> > > [7689]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> > > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application
> called MPI_Abort(MPI_COMM_WORLD, 59) - process 7689
> > > ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1
> in task 7689
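For context, a minimal sketch of the kind of HDF5-based Vec/IS loading
described above, assuming PETSc's built-in HDF5 viewer; the file name
"data.h5" and the object names "solution" and "partition" are hypothetical,
ISLoad with an HDF5 viewer is assumed to be available in this PETSc build,
and the actual custom loader may look quite different.

/* Sketch: read a named Vec and IS in parallel from one HDF5 file. */
#include <petscvec.h>
#include <petscis.h>
#include <petscviewerhdf5.h>

static PetscErrorCode LoadFromHDF5(MPI_Comm comm)
{
  PetscViewer    viewer;
  Vec            x;
  IS             is;
  PetscErrorCode ierr;

  ierr = PetscViewerHDF5Open(comm, "data.h5", FILE_MODE_READ, &viewer);CHKERRQ(ierr);

  /* The object name selects which HDF5 dataset is read. */
  ierr = VecCreate(comm, &x);CHKERRQ(ierr);
  ierr = PetscObjectSetName((PetscObject)x, "solution");CHKERRQ(ierr);
  ierr = VecLoad(x, viewer);CHKERRQ(ierr);

  ierr = ISCreate(comm, &is);CHKERRQ(ierr);
  ierr = PetscObjectSetName((PetscObject)is, "partition");CHKERRQ(ierr);
  ierr = ISLoad(is, viewer);CHKERRQ(ierr);

  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = ISDestroy(&is);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  return 0;
}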
> > >
> > > Make and configure logs are attached.
> > >
> > > Thanks,
> > >
> > > Fande Kong,
> > >
> >
> >
>
>