[petsc-dev] "thread safe"

Mark Adams mfadams at lbl.gov
Fri Feb 20 20:22:24 CST 2015


OK, I have a code set up to test it, so feel free to make a branch and I can
test it.
Mark

On Fri, Feb 20, 2015 at 7:13 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   Mark,
>
>    Yes, after looking at the code it does make sense. The reason is that
> Matt made me "improve" the -xxx_converged_reason to use viewers; but in
> your case there will be multiple threads (each associated with different
> KSP objects) each monkeying with the same (default) viewer thus possibly
> corrupting it.
>
>    I'll have to think a little bit about the best way to keep the
> functionality but be thread safe.
>
>   Barry
>
> > On Feb 20, 2015, at 5:57 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > Barry,
> >
> > We had a problem with the thread-safe version and found, by pure luck,
> that apparently if we use -ksp_converged_reason we get a segv-type failure.
> Does this sound sensible?
> >
> > I can give you an executable and environment to run this on Edison if
> that is useful.
> >
> > Thanks,
> > Mark
> >
> >
> > On Tue, Feb 17, 2015 at 9:27 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   You need to configure with --with-threadsafety and --with-log=0 and
> --with-debugging=0
> >
> >   Eventually we'll support at least the debugging with thread safety.
> >
> >   Barry
> >
> > Not sure about that strange message from the Cray system.
> >
> >
> > > On Feb 17, 2015, at 8:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > > We have been testing master with a code that calls PETSc serial LU
> solvers from threads.  I have seen system messages with OMP (see way below)
> and Robert (cc'ed) reported this useful stack trace.
> > >
> > > > I have not modified my (non-thread) build.  Perhaps I need to, or are
> there PETSc runtime options?
> > >
> > > This is a Cray XC30 with Intel.
> > >
> > > Thanks,
> > > Mark
> > >
> > > > [116]PETSC ERROR: Object is in wrong state
> > > [116]PETSC ERROR: Logging event had unbalanced begin/end pairs
> > > [116]PETSC ERROR: See
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > [116]PETSC ERROR: Petsc Development GIT revision:
> v3.5.3-1570-gcaf1481  GIT Date: 2015-02-07 17:34:17 -0600
> > > [116]PETSC ERROR: ./xgca_petsc36_col on a arch-xc30-opt64-intel named
> nid05975 by rhager Tue Feb 17 10:46:32 2015
> > > > [116]PETSC ERROR: Configure options --COPTFLAGS="-fast -no-ipo"
> --CXXOPTFLAGS="-fast -no-ipoi" --FOPTFLAGS="-fast -no-ipo" --download-hypre
> --download-superlu_dist --download-parmetis --download-metis --with-ssl=0
> --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC
> --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn
> --with-fortranlib-autodetect=0
> --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.13/intel/140/
> --with-shared-libraries=0 --with-x=0 --with-mpiexec=aprun LIBS=-lstdc++
> --with-64-bit-indices PETSC_ARCH=arch-xc30-opt64-intel
> PETSC_DIR=/global/u2/m/madams/petsc_master
> > > [116]PETSC ERROR: #1 PetscLogEventEndDefault() line 694 in
> /global/u2/m/madams/petsc_master/src/sys/logging/utils/eventlog.c
> > > [116]PETSC ERROR: #2 MatLUFactorSymbolic() line 2894 in
> /global/u2/m/madams/petsc_master/src/mat/interface/matrix.c
> > > [116]PETSC ERROR: #3 PCSetUp_LU() line 127 in
> /global/u2/m/madams/petsc_master/src/ksp/pc/impls/factor/lu/lu.c
> > > [116]PETSC ERROR: #4 PCSetUp() line 918 in
> /global/u2/m/madams/petsc_master/src/ksp/pc/interface/precon.c
> > > [116]PETSC ERROR: #5 KSPSetUp() line 306 in
> /global/u2/m/madams/petsc_master/src/ksp/ksp/interface/itfunc.c
> > > [116]PETSC ERROR: #6 KSPSolve() line 503 in
> /global/u2/m/madams/petsc_master/src/ksp/ksp/interface/itfunc.c
> > >
> > >
> > > Other error message:
> > >
> > >
> > > OMP: Error #13: Assertion failure at kmp_runtime.c(1588).
> > > OMP: Hint: Please submit a bug report with this message, compile and
> run commands used, and machine configuration info including native compiler
> and operating system versions. Faster response will be obtained by
> including all program sources. For information on submitting this issue,
> please see http://www.intel.com/software/products/support/.
> > > _pmiu_daemon(SIGCHLD): [NID 05979] [c7-3c0s6n3] [Tue Feb 17 15:14:43
> 2015] PE RANK 23 exit signal Killed
> > > _pmiu_daemon(SIGCHLD): [NID 05976] [c7-3c0s6n0] [Tue Feb 17 15:14:43
> 2015] PE RANK 10 exit signal Killed
> > > [NID 05979] 2015-02-17 15:14:43 Apid 10147992: initiated application
> termination
> > > [NID 05979] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 239]. Please contact admin for details. Killing
> pid 18637(xgca)
> > > [NID 05976] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 73]. Please contact admin for details. Killing
> pid 15380(xgca)
> > > [NID 05984] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 34636(xgca)
> > > [NID 05988] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 59]. Please contact admin for details. Killing
> pid 38496(xgca)
> > > [NID 06019] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 11132(xgca)
> > > [NID 05980] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 8320(xgca)
> > > [NID 05993] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 46182(xgca)
> > > [NID 06020] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 249]. Please contact admin for details. Killing
> pid 23753(xgca)
> > > [NID 05987] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 87]. Please contact admin for details. Killing
> pid 11254(xgca)
> > > [NID 05986] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 41]. Please contact admin for details. Killing
> pid 6630(xgca)
> > > [NID 05981] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 31]. Please contact admin for details. Killing
> pid 10520(xgca)
> > > [NID 05999] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 7]. Please contact admin for details. Killing
> pid 1843(xgca)
> > > [NID 05985] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 26498(xgca)
> > > [NID 05998] 2015-02-17 15:14:43 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 209]. Please contact admin for details. Killing
> pid 20387(xgca)
> > > [NID 05994] 2015-02-17 15:14:53 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 39462(xgca)
> > > [NID 05983] 2015-02-17 15:14:53 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 18598(xgca)
> > > [NID 05995] 2015-02-17 15:14:54 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 42322(xgca)
> > > [NID 05996] 2015-02-17 15:14:54 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 34248(xgca)
> > > [NID 05978] 2015-02-17 15:14:55 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 9483(xgca)
> > > [NID 05975] 2015-02-17 15:14:56 Apid 10147992: Cray HSN detected
> critical error 0x4416[ptag 0]. Please contact admin for details. Killing
> pid 11470(xgca)
> > > Application 10147992 exit codes: 137
> > > Application 10147992 exit signals: Killed
> > > Application 10147992 resources: utime ~2194s, stime ~199s, Rss
> ~488560, inblocks ~908164, outblocks ~2571652
> > >
> >
> >
>
>
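
The hazard Barry describes above (several threads, each with its own KSP, all writing through the same default viewer) can be illustrated with a small sketch in plain C with POSIX threads. This is not PETSc code and not necessarily how PETSc addresses the problem: SharedViewer, viewer_print_line, and run_shared_viewer_demo are hypothetical names made up for the illustration, and the mutex is only one possible way to serialize access. The thread reports OpenMP threads; pthreads are used here purely to keep the example self-contained.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define VIEWER_CAPACITY 65536

/* Hypothetical stand-in for a shared default viewer: one buffer, one cursor. */
typedef struct {
  char            buf[VIEWER_CAPACITY];
  size_t          len;
  pthread_mutex_t lock;
} SharedViewer;

typedef struct {
  SharedViewer *viewer;
  int           solver_id;
  int           nreports;
} WorkerArgs;

/* Append one complete line while holding the lock. Without the lock, two
   threads could read the same v->len and overwrite each other's text.
   Lines that do not fit in the buffer are silently dropped. */
static void viewer_print_line(SharedViewer *v, const char *line)
{
  size_t n = strlen(line);
  pthread_mutex_lock(&v->lock);
  if (v->len + n < VIEWER_CAPACITY) {
    memcpy(v->buf + v->len, line, n);
    v->len += n;
    v->buf[v->len] = '\0';
  }
  pthread_mutex_unlock(&v->lock);
}

/* Each worker plays the role of one solver reporting its converged reason. */
static void *worker(void *p)
{
  WorkerArgs *a = (WorkerArgs *)p;
  char        line[64];
  for (int i = 0; i < a->nreports; i++) {
    snprintf(line, sizeof line, "solver %d converged\n", a->solver_id);
    viewer_print_line(a->viewer, line);
  }
  return NULL;
}

/* Count newline-terminated lines that are exactly "solver <id> converged". */
static int count_intact_lines(const char *buf, int nthreads)
{
  int         count = 0;
  const char *p = buf;
  while (*p) {
    const char *nl = strchr(p, '\n');
    if (!nl) break; /* trailing partial line is not intact */
    size_t n = (size_t)(nl - p);
    char   tmp[64];
    int    id = -1, used = -1;
    if (n < sizeof tmp) {
      memcpy(tmp, p, n);
      tmp[n] = '\0';
      if (sscanf(tmp, "solver %d converged%n", &id, &used) == 1 &&
          used == (int)n && id >= 0 && id < nthreads) count++;
    }
    p = nl + 1;
  }
  return count;
}

/* Run nthreads workers against one viewer; return the number of intact
   lines, or -1 on bad arguments or allocation failure. */
int run_shared_viewer_demo(int nthreads, int nreports)
{
  enum { MAX_THREADS = 16 };
  pthread_t  tid[MAX_THREADS];
  WorkerArgs args[MAX_THREADS];
  int        created = 0;

  if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
  SharedViewer *v = calloc(1, sizeof *v);
  if (!v) return -1;
  pthread_mutex_init(&v->lock, NULL);

  for (int t = 0; t < nthreads; t++) {
    args[t] = (WorkerArgs){v, t, nreports};
    if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0) break;
    created++;
  }
  for (int t = 0; t < created; t++) pthread_join(tid[t], NULL);

  int intact = count_intact_lines(v->buf, nthreads);
  pthread_mutex_destroy(&v->lock);
  free(v);
  return intact;
}
```

Calling run_shared_viewer_demo(4, 100) from a main() and comparing the result with 400 checks that every report arrived whole. Removing the lock/unlock pair in viewer_print_line turns the update of len into a data race, which is the kind of corruption suspected with -ksp_converged_reason.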