[petsc-dev] SuperLU failure with valgrind

Mark Adams mfadams at lbl.gov
Mon Oct 16 09:21:57 CDT 2017


FYI, I get this error on one processor with SuperLU under valgrind. Could
this just be a valgrind issue?

Mark

/Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 1 valgrind \
  --dsymutil=yes --leak-check=no --gen-suppressions=no --num-callers=20 \
  --error-limit=no ./ex48 -debug 2 -dim 2 -dm_refine 3 -ts_monitor \
  -implicit true -ts_type beuler -pc_type lu \
  -pc_factor_mat_solver_package superlu_dist -ksp_type preonly \
  -snes_monitor -snes_rtol 1.e-10 -snes_stol 1.e-10 -snes_converged_reason \
  -snes_atol 1.e-18 -snes_converged_reason -petscspace_order 2 \
  -petscspace_poly_tensor -ts_max_steps 1 -ts_dt 1.e-3 -eps 1.e-12 \
  -eta 0.001 -ves 0.005 -beta 0.01 -mu 0.0002 -dm_view hdf5:sol.h5 \
  -vec_view hdf5:sol.h5::append -dm_plex_periodic_cut \
  -y_periodicity PERIODIC -cells 2,4 -Jop 4.99 -line_dir 1,1 \
  -line_coord 3.14159265359,1.57079632679 -real_view :u.m:ascii_matlab \
  -fft_view :spectra.m:ascii_matlab
==63582== Memcheck, a memory error detector
==63582== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==63582== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==63582== Command: ./ex48 -debug 2 -dim 2 -dm_refine 3 -ts_monitor -implicit true -ts_type beuler -pc_type lu -pc_factor_mat_solver_package superlu_dist -ksp_type preonly -snes_monitor -snes_rtol 1.e-10 -snes_stol 1.e-10 -snes_converged_reason -snes_atol 1.e-18 -snes_converged_reason -petscspace_order 2 -petscspace_poly_tensor -ts_max_steps 1 -ts_dt 1.e-3 -eps 1.e-12 -eta 0.001 -ves 0.005 -beta 0.01 -mu 0.0002 -dm_view hdf5:sol.h5 -vec_view hdf5:sol.h5::append -dm_plex_periodic_cut -y_periodicity PERIODIC -cells 2,4 -Jop 4.99 -line_dir 1,1 -line_coord 3.14159265359,1.57079632679 -real_view :u.m:ascii_matlab -fft_view :spectra.m:ascii_matlab
==63582==
==63582== Syscall param msg->desc.port.name points to uninitialised byte(s)
==63582==    at 0x103FE134A: mach_msg_trap (in /usr/lib/system/libsystem_kernel.dylib)
==63582==    by 0x103FE0796: mach_msg (in /usr/lib/system/libsystem_kernel.dylib)
==63582==    by 0x103FDA485: task_set_special_port (in /usr/lib/system/libsystem_kernel.dylib)
==63582==    by 0x10817810E: _os_trace_create_debug_control_port (in /usr/lib/system/libsystem_trace.dylib)
==63582==    by 0x108178458: _libtrace_init (in /usr/lib/system/libsystem_trace.dylib)
==63582==    by 0x1036119DF: libSystem_initializer (in /usr/lib/libSystem.B.dylib)
==63582==    by 0x100034A1A: ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) (in /usr/lib/dyld)
==63582==    by 0x100034C1D: ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) (in /usr/lib/dyld)
==63582==    by 0x1000304A9: ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) (in /usr/lib/dyld)
==63582==    by 0x100030440: ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) (in /usr/lib/dyld)
==63582==    by 0x10002F523: ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) (in /usr/lib/dyld)
==63582==    by 0x10002F5B8: ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) (in /usr/lib/dyld)
==63582==    by 0x100021433: dyld::initializeMainExecutable() (in /usr/lib/dyld)
==63582==    by 0x1000258C5: dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) (in /usr/lib/dyld)
==63582==    by 0x100020248: dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*) (in /usr/lib/dyld)
==63582==    by 0x100020035: _dyld_start (in /usr/lib/dyld)
==63582==    by 0x3E: ???
==63582==    by 0x1080A84C2: ???
==63582==    by 0x1080A84C9: ???
==63582==    by 0x1080A84D0: ???
==63582==  Address 0x1080a60fc is on thread 1's stack
==63582==  in frame #2, created by task_set_special_port (???:)
==63582==
--63582-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--63582-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--63582-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
Jop=4.99
DeltaPrime=1.81627
eta=0.001
beta=0.01
mu=0.0002
ves=0.005
==63582== Warning: invalid file descriptor -1 in syscall read()
0) total perturbed mass = 0.
0 TS dt 0.001 time 0.
    0 SNES Function norm 5.917661770415e-01
==63582== Conditional jump or move depends on uninitialised value(s)
==63582==    at 0x103A5FAA8: MPIR_Process_status (mpiimpl.h:4394)
==63582==    by 0x103A6152F: MPIC_Waitall (helper_fns.c:774)
==63582==    by 0x1038E2A34: MPIR_Alltoall_intra (alltoall.c:369)
==63582==    by 0x1038E35E1: MPIR_Alltoall (alltoall.c:564)
==63582==    by 0x1038E37E6: MPIR_Alltoall_impl (alltoall.c:599)
==63582==    by 0x1037106AD: MPI_Alltoall (alltoall.c:722)
==63582==    by 0x10236EA7C: static_schedule (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x10239923C: pdgstrf (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x10237D696: pdgssvx_ABglobal (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x100AB1F02: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:423)
==63582==    by 0x10053AD98: MatLUFactorNumeric (matrix.c:3039)
==63582==    by 0x1012075CD: PCSetUp_LU (lu.c:131)
==63582==    by 0x10134D65B: PCSetUp (precon.c:924)
==63582==    by 0x101496E11: KSPSetUp (itfunc.c:378)
==63582==    by 0x101499143: KSPSolve (itfunc.c:609)
==63582==    by 0x1015F9410: SNESSolve_NEWTONLS (ls.c:224)
==63582==    by 0x101574290: SNESSolve (snes.c:4106)
==63582==    by 0x10179B43C: TS_SNESSolve (theta.c:176)
==63582==    by 0x10178F7CE: TSStep_Theta (theta.c:216)
==63582==    by 0x1016C1D62: TSStep (ts.c:4120)
==63582==
==63582== Conditional jump or move depends on uninitialised value(s)
==63582==    at 0x103A5FAA8: MPIR_Process_status (mpiimpl.h:4394)
==63582==    by 0x103A6152F: MPIC_Waitall (helper_fns.c:774)
==63582==    by 0x1038E5E88: MPIR_Alltoallv_intra (alltoallv.c:194)
==63582==    by 0x1038E67F9: MPIR_Alltoallv (alltoallv.c:339)
==63582==    by 0x1038E6A53: MPIR_Alltoallv_impl (alltoallv.c:376)
==63582==    by 0x103712112: MPI_Alltoallv (alltoallv.c:527)
==63582==    by 0x10236ECF1: static_schedule (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x10239923C: pdgstrf (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x10237D696: pdgssvx_ABglobal (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x100AB1F02: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:423)
==63582==    by 0x10053AD98: MatLUFactorNumeric (matrix.c:3039)
==63582==    by 0x1012075CD: PCSetUp_LU (lu.c:131)
==63582==    by 0x10134D65B: PCSetUp (precon.c:924)
==63582==    by 0x101496E11: KSPSetUp (itfunc.c:378)
==63582==    by 0x101499143: KSPSolve (itfunc.c:609)
==63582==    by 0x1015F9410: SNESSolve_NEWTONLS (ls.c:224)
==63582==    by 0x101574290: SNESSolve (snes.c:4106)
==63582==    by 0x10179B43C: TS_SNESSolve (theta.c:176)
==63582==    by 0x10178F7CE: TSStep_Theta (theta.c:216)
==63582==    by 0x1016C1D62: TSStep (ts.c:4120)
==63582==
==63582== Thread 2:
==63582== Invalid read of size 4
==63582==    at 0x10814A2B1: _pthread_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==63582==    by 0x10814A07C: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==63582==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==63582==
==63582== Invalid read of size 8
==63582==    at 0x1081489D6: pthread_getspecific (in /usr/lib/system/libsystem_pthread.dylib)
==63582==    by 0x100286A5B: PetscVSNPrintf (mprint.c:132)
==63582==    by 0x1002871A3: PetscVFPrintfDefault (mprint.c:241)
==63582==    by 0x10028A1E6: PetscFPrintf (mprint.c:546)
==63582==    by 0x1002A1BE9: PetscErrorPrintfDefault (errtrace.c:114)
==63582==    by 0x1002A3C5D: PetscSignalHandlerDefault (signal.c:135)
==63582==    by 0x1002A4A79: PetscSignalHandler_Private (signal.c:47)
==63582==    by 0x25805BDBD: ???
==63582==    by 0x10814A07C: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==63582==  Address 0x50 is not stack'd, malloc'd or (recently) free'd
==63582==
==63582==
==63582== Process terminating with default action of signal 11 (SIGSEGV)
==63582==  Access not within mapped region at address 0x50
==63582==    at 0x1081489D6: pthread_getspecific (in /usr/lib/system/libsystem_pthread.dylib)
==63582==    by 0x100286A5B: PetscVSNPrintf (mprint.c:132)
==63582==    by 0x1002871A3: PetscVFPrintfDefault (mprint.c:241)
==63582==    by 0x10028A1E6: PetscFPrintf (mprint.c:546)
==63582==    by 0x1002A1BE9: PetscErrorPrintfDefault (errtrace.c:114)
==63582==    by 0x1002A3C5D: PetscSignalHandlerDefault (signal.c:135)
==63582==    by 0x1002A4A79: PetscSignalHandler_Private (signal.c:47)
==63582==    by 0x25805BDBD: ???
==63582==    by 0x10814A07C: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==63582==  If you believe this happened as a result of a stack
==63582==  overflow in your program's main thread (unlikely but
==63582==  possible), you can try to increase the size of the
==63582==  main thread stack using the --main-stacksize= flag.
==63582==  The main thread stack size used in this run was 67104768.

valgrind: m_scheduler/scheduler.c:881 (void run_thread_for_a_while(HWord *, Int *, ThreadId, HWord, Bool)): Assertion 'VG_(stats__n_xindirs_32) == 0' failed.

host stacktrace:
==63582==    at 0x25804121C: ???
==63582==    by 0x258041587: ???
==63582==    by 0x25804156A: ???
==63582==    by 0x2580BB25F: ???
==63582==    by 0x2580B95EA: ???
==63582==    by 0x2580CA83B: ???
==63582==    by 0x2580CAAF8: ???

sched status:
  running_tid=3

Thread 1: status = VgTs_Yielding (lwpid 771)
==63582==    at 0x10239F9DE: dscatter_u (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x10239EF4F: pdgstrf (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x10237D696: pdgssvx_ABglobal (in /Users/markadams/Codes/petsc/arch-macosx-gnu-g/lib/libsuperlu_dist.5.1.3.dylib)
==63582==    by 0x100AB1F02: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:423)
==63582==    by 0x10053AD98: MatLUFactorNumeric (matrix.c:3039)
==63582==    by 0x1012075CD: PCSetUp_LU (lu.c:131)
==63582==    by 0x10134D65B: PCSetUp (precon.c:924)
==63582==    by 0x101496E11: KSPSetUp (itfunc.c:378)
==63582==    by 0x101499143: KSPSolve (itfunc.c:609)
==63582==    by 0x1015F9410: SNESSolve_NEWTONLS (ls.c:224)
==63582==    by 0x101574290: SNESSolve (snes.c:4106)
==63582==    by 0x10179B43C: TS_SNESSolve (theta.c:176)
==63582==    by 0x10178F7CE: TSStep_Theta (theta.c:216)
==63582==    by 0x1016C1D62: TSStep (ts.c:4120)
==63582==    by 0x1016C56A3: TSSolve (ts.c:4374)
==63582==    by 0x100004E0E: main (ex48.c:1061)

Thread 2: status = VgTs_Yielding (lwpid 4099)
==63582==    at 0x1081489D6: pthread_getspecific (in /usr/lib/system/libsystem_pthread.dylib)
==63582==    by 0x100286A5B: PetscVSNPrintf (mprint.c:132)
==63582==    by 0x1002871A3: PetscVFPrintfDefault (mprint.c:241)
==63582==    by 0x10028A1E6: PetscFPrintf (mprint.c:546)
==63582==    by 0x1002A1BE9: PetscErrorPrintfDefault (errtrace.c:114)
==63582==    by 0x1002A3C5D: PetscSignalHandlerDefault (signal.c:135)
==63582==    by 0x1002A4A79: PetscSignalHandler_Private (signal.c:47)
==63582==    by 0x25805BDBD: ???
==63582==    by 0x10814A07C: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)

Thread 3: status = VgTs_Runnable (lwpid 3843)
==63582==    at 0x10814A070: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.
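One way to separate likely platform noise from the real fault is to suppress the two reports that appear before the crash and rerun. This is an untested sketch that assumes, without confirmation, that the mach_msg report comes from macOS dyld/libsystem startup and that the MPIR_Process_status reports are internal to MPICH rather than caused by SuperLU_DIST. The file name macos-mpich.supp and the suppression names are made up for illustration; the frame names and the syscall parameter are copied from the log above.

```shell
#!/bin/sh
set -e

# Hypothetical file name; any path works.
SUPP=macos-mpich.supp

# Write a Valgrind suppression file covering two of the reports above
# (assumed, not confirmed, to be noise):
#   1. the macOS startup mach_msg syscall report (Memcheck:Param,
#      whose extra line is the offending syscall parameter name)
#   2. MPICH's MPIR_Process_status conditional-jump report (Memcheck:Cond)
# The quoted 'EOF' prevents the shell from expanding anything in the body.
cat > "$SUPP" <<'EOF'
{
   macos_task_set_special_port
   Memcheck:Param
   msg->desc.port.name
   fun:mach_msg_trap
   fun:mach_msg
   fun:task_set_special_port
}
{
   mpich_process_status_waitall
   Memcheck:Cond
   fun:MPIR_Process_status
   fun:MPIC_Waitall
}
EOF

echo "wrote $SUPP"
```

The run can then be repeated with `--suppressions=macos-mpich.supp` added to the valgrind options, plus PETSc's `-no_signal_handler` so that the SIGSEGV in thread 2 is reported by Valgrind at the faulting frame instead of passing through PetscSignalHandlerDefault. If the invalid read in _pthread_wqthread still shows up on a thread the program never created, that would lean toward a Valgrind-on-macOS problem rather than a SuperLU_DIST bug, but that reading is a guess.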