[petsc-users] Error with SuperLU_DIST (mkl related?)

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Sun Jan 1 07:04:39 CST 2017


Thanks!

Both the filename and the #defines are OK now.
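
For anyone who wants to repeat this check, here is a minimal sketch, not part of PETSc or SuperLU_DIST. It assumes the three SUPERLU_DIST_*_VERSION macros appear in include/superlu_defs.h exactly as quoted in the thread below, and that the shared library is named libsuperlu_dist.so.X.Y.Z. The helper names header_version and soname_version are hypothetical, and the sample strings are copied from the quoted messages rather than read from a real installation.

```python
import re


def header_version(header_text):
    """Return (major, minor, patch) from the SUPERLU_DIST_*_VERSION #defines."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCH"):
        pattern = r"^\s*#\s*define\s+SUPERLU_DIST_%s_VERSION\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("SUPERLU_DIST_%s_VERSION not found" % name)
        parts.append(int(match.group(1)))
    return tuple(parts)


def soname_version(filename):
    """Return (major, minor, patch) from a name like libsuperlu_dist.so.5.1.3."""
    match = re.search(r"libsuperlu_dist\.so\.(\d+)\.(\d+)\.(\d+)$", filename)
    if match is None:
        raise ValueError("no full version suffix in %r" % filename)
    return tuple(int(group) for group in match.groups())


# Sample input: the stale values reported in the quoted thread.
sample_header = """\
#define SUPERLU_DIST_MAJOR_VERSION     5
#define SUPERLU_DIST_MINOR_VERSION     1
#define SUPERLU_DIST_PATCH_VERSION     0
"""
sample_lib = "/opt/petsc-master_debug/lib/libsuperlu_dist.so.5.1.0"

# The tag requested with --download-superlu_dist-commit=v5.1.3.
expected = (5, 1, 3)

print("header matches tag: ", header_version(sample_header) == expected)
print("library matches tag:", soname_version(sample_lib) == expected)
```

On a real installation, the header text would be read from the installed include/superlu_defs.h and the library name taken from the lib directory listing.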

Eric



On 2016-12-31 at 16:18, Xiaoye S. Li wrote:
> I just updated version string in git repo and tarball.
>
> Sherry
>
> On Sat, Dec 31, 2016 at 10:39 AM, Satish Balay <balay at mcs.anl.gov> wrote:
>
>     Ok - one more place superlu_dist stores version number - that
>     needs updating with every release.
>
>     cc:ing Sherry
>
>     Satish
>
>     On Sat, 31 Dec 2016, Eric Chamberland wrote:
>
>     > I think there is definitely a problem.
>     >
>     > After looking at the files installed either from the petsc-master tarball
>     > or from the manual configure I just did with
>     > --download-superlu_dist-commit=v5.1.3, the file include/superlu_defs.h
>     > has these values:
>     >
>     > #define SUPERLU_DIST_MAJOR_VERSION     5
>     > #define SUPERLU_DIST_MINOR_VERSION     1
>     > #define SUPERLU_DIST_PATCH_VERSION     0
>     >
>     > What's wrong?
>     >
>     > Eric
>     >
>     >
>     > On 2016-12-31 at 13:26, Eric Chamberland wrote:
>     > > Ah ok, I see!  Here look at the file name in the configure.log:
>     > >
>     > > Install the project...
>     > > /usr/bin/cmake -P cmake_install.cmake
>     > > -- Install configuration: "DEBUG"
>     > > -- Installing: /opt/petsc-master_debug/lib/libsuperlu_dist.so.5.1.0
>     > > -- Installing: /opt/petsc-master_debug/lib/libsuperlu_dist.so.5
>     > >
>     > > It is saying 5.1.0, but in fact you are right: it is 5.1.3 that is
>     > > downloaded!!! :)
>     > >
>     > > And FWIW, the nightly automatic compilation of PETSc starts within a
>     > > brand new and empty directory each night...
>     > >
>     > > Thanks to both of you again! :)
>     > >
>     > > Eric
>     > >
>     > >
>     > > On 2016-12-31 at 13:17, Satish Balay wrote:
>     > > > ===============================================================================
>     > > >                            Trying to download
>     > > > git://https://github.com/xiaoyeli/superlu_dist for SUPERLU_DIST
>     > > > ===============================================================================
>     > > > Executing: git clone https://github.com/xiaoyeli/superlu_dist /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-master-debug/arch-linux2-c-debug/externalpackages/git.superlu_dist
>     > > > stdout: Cloning into '/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-master-debug/arch-linux2-c-debug/externalpackages/git.superlu_dist'...
>     > > >                      Looking for SUPERLU_DIST at git.superlu_dist,
>     > > > hg.superlu_dist or a directory starting with ['superlu_dist']
>     > > >                      Found a copy of SUPERLU_DIST in git.superlu_dist
>     > > > Executing: ['git', 'rev-parse', '--git-dir']
>     > > > stdout: .git
>     > > > Executing: ['git', 'cat-file', '-e', 'v5.1.3^{commit}']
>     > > > Executing: ['git', 'rev-parse', 'v5.1.3']
>     > > > stdout: 7306f704c6c8d5113def649b76def3c8eb607690
>     > > > Executing: ['git', 'stash']
>     > > > stdout: No local changes to save
>     > > > Executing: ['git', 'clean', '-f', '-d', '-x']
>     > > > Executing: ['git', 'checkout', '-f',
>     > > > '7306f704c6c8d5113def649b76def3c8eb607690']
>     > > > <<<<<<<<
>     > > >
>     > > > Per log below - it's using 5.1.3. Why did you think you got 5.1.0?
>     > > >
>     > > > Satish
>     > > >
>     > > > On Sat, 31 Dec 2016, Eric Chamberland wrote:
>     > > >
>     > > > > Hi,
>     > > > >
>     > > > > ok I will test with 5.1.3 with the option you gave me
>     > > > > (--download-superlu_dist-commit=v5.1.3).
>     > > > >
>     > > > > But from what you and Matthew said, I should have 5.1.3 with
>     > > > > petsc-master, but
>     > > > > the last night log shows me library file name 5.1.0:
>     > > > >
>     > > > >
>     > > > > http://www.giref.ulaval.ca/~cmpgiref/petsc-master-debug/2016.12.31.02h00m01s_configure.log
>     > > > >
>     > > > >
>     > > > > So I am a bit confused: why did I get 5.1.0 last night? (I use the
>     > > > > petsc-master tarball; is that the reason?)
>     > > > >
>     > > > > Thanks,
>     > > > >
>     > > > > Eric
>     > > > >
>     > > > >
>     > > > > On 2016-12-31 at 11:52, Satish Balay wrote:
>     > > > > > On Sat, 31 Dec 2016, Eric Chamberland wrote:
>     > > > > >
>     > > > > > > Hi,
>     > > > > > >
>     > > > > > > I am just starting to debug a bug encountered with, and only with,
>     > > > > > > SuperLU_Dist combined with MKL on a 2-process validation test.
>     > > > > > >
>     > > > > > > (the same test works fine with MUMPS on 2 processes).
>     > > > > > >
>     > > > > > > I just noticed that the SuperLU_Dist version installed by the PETSc
>     > > > > > > configure script is 5.1.0 and the latest SuperLU_DIST is 5.1.3.
>     > > > > > If you use petsc-master - it will install 5.1.3 by default.
>     > > > > > > Before going further, I just want to ask:
>     > > > > > >
>     > > > > > > Is there any specific reason to stick to 5.1.0?
>     > > > > > We don't usually upgrade externalpackage versions in PETSc releases
>     > > > > > [unless it's tested to work and fixes known bugs]. There could be API
>     > > > > > changes - or build changes that can potentially conflict.
>     > > > > >
>     > > > > > From what I know - 5.1.3 should work with petsc-3.7 [it fixes a
>     > > > > > couple of bugs].
>     > > > > >
>     > > > > > You might be able to do the following with petsc-3.7 [with git
>     > > > > > externalpackage repos]
>     > > > > >
>     > > > > > --download-superlu_dist --download-superlu_dist-commit=v5.1.3
>     > > > > >
>     > > > > > Satish
>     > > > > >
>     > > > > > > Here is some more information:
>     > > > > > >
>     > > > > > > On process 2 I have this printed in stdout:
>     > > > > > >
>     > > > > > > Intel MKL ERROR: Parameter 6 was incorrect on entry to DTRSM .
>     > > > > > >
>     > > > > > > and in stderr:
>     > > > > > >
>     > > > > > > Test.ProblemeEFGen.opt: malloc.c:2369: sysmalloc: Assertion
>     > > > > > > `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) -
>     > > > > > > __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) ||
>     > > > > > > ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof
>     > > > > > > (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) &
>     > > > > > > ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) &&
>     > > > > > > ((unsigned long) old_end & pagemask) == 0)' failed.
>     > > > > > > [saruman:15771] *** Process received signal ***
>     > > > > > >
>     > > > > > > This is the 7th call to KSPSolve in the same execution. Here is the
>     > > > > > > last KSPView:
>     > > > > > >
>     > > > > > > KSP Object:(o_slin) 2 MPI processes
>     > > > > > >     type: preonly
>     > > > > > >     maximum iterations=10000, initial guess is zero
>     > > > > > >     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>     > > > > > >     left preconditioning
>     > > > > > >     using NONE norm type for convergence test
>     > > > > > > PC Object:(o_slin) 2 MPI processes
>     > > > > > >     type: lu
>     > > > > > >       LU: out-of-place factorization
>     > > > > > >       tolerance for zero pivot 2.22045e-14
>     > > > > > >       matrix ordering: natural
>     > > > > > >       factor fill ratio given 0., needed 0.
>     > > > > > >         Factored matrix follows:
>     > > > > > >           Mat Object:    2 MPI processes
>     > > > > > >             type: mpiaij
>     > > > > > >             rows=382, cols=382
>     > > > > > >             package used to perform factorization: superlu_dist
>     > > > > > >             total: nonzeros=0, allocated nonzeros=0
>     > > > > > >             total number of mallocs used during MatSetValues calls =0
>     > > > > > >               SuperLU_DIST run parameters:
>     > > > > > >                 Process grid nprow 2 x npcol 1
>     > > > > > >                 Equilibrate matrix TRUE
>     > > > > > >                 Matrix input mode 1
>     > > > > > >                 Replace tiny pivots FALSE
>     > > > > > >                 Use iterative refinement FALSE
>     > > > > > >                 Processors in row 2 col partition 1
>     > > > > > >                 Row permutation LargeDiag
>     > > > > > >                 Column permutation METIS_AT_PLUS_A
>     > > > > > >                 Parallel symbolic factorization FALSE
>     > > > > > >                 Repeated factorization SamePattern
>     > > > > > >     linear system matrix = precond matrix:
>     > > > > > >     Mat Object:  (o_slin)  2 MPI processes
>     > > > > > >       type: mpiaij
>     > > > > > >       rows=382, cols=382
>     > > > > > >       total: nonzeros=4458, allocated nonzeros=4458
>     > > > > > >       total number of mallocs used during MatSetValues calls =0
>     > > > > > >         using I-node (on process 0) routines: found 109 nodes, limit used is 5
>     > > > > > >
>     > > > > > > I know this information is not enough to help debug, but I would
>     > > > > > > like to know if PETSc guys will upgrade to 5.1.3 before trying to
>     > > > > > > debug anything.
>     > > > > > >
>     > > > > > > Thanks,
>     > > > > > > Eric
>     > > > > > >
>     > > > > > >
>     > > > >
>     >
>     >
>     >
>
>
