[petsc-users] parallel IO messages

Fande Kong fdkong.jd at gmail.com
Fri Nov 27 14:27:42 CST 2015


Thanks, Barry,

I was also wondering why this happens only randomly. Is there any
explanation? If this were something in PETSc, shouldn't it happen every time?

Thanks,

Fande Kong,

On Fri, Nov 27, 2015 at 1:20 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   Edit PETSC_ARCH/include/petscconf.h and add
>
> #if !defined(PETSC_MISSING_SIGTRAP)
> #define PETSC_MISSING_SIGTRAP
> #endif
>
> then do
>
> make gnumake
>
> It is possible that the system you are using uses SIGTRAP in managing the
> IO; by making the change above you are telling PETSc to ignore SIGTRAPs.
> Let us know how this works out.
>
>    Barry
>
>
> > On Nov 27, 2015, at 1:05 PM, Fande Kong <fdkong.jd at gmail.com> wrote:
> >
> > Hi all,
> >
> > I implemented parallel IO based on Vec and IS, using HDF5. I am testing
> > this loader on a supercomputer. I occasionally (not always) encounter the
> > following errors (using 8192 cores):
> >
> > [7689]PETSC ERROR: ------------------------------------------------------------------------
> > [7689]PETSC ERROR: Caught signal number 5 TRAP
> > [7689]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > [7689]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> > [7689]PETSC ERROR: to get more information on the crash.
> > [7689]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [7689]PETSC ERROR: Signal received
> > [7689]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown
> > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek Fri Nov 27 11:26:30 2015
> > [7689]PETSC ERROR: Configure options --with-clanguage=cxx --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-netcdf=1 --download-exodusii=1 --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 --with-debugging=no --with-c2html=0 --with-64-bit-indices=1
> > [7689]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 7689
> > ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in task 7689
> >
> > Make and configure logs are attached.
> >
> > Thanks,
> >
> > Fande Kong,
> >
> > <configure_log><make_log>
>
>