[petsc-users] Error with SuperLU_DIST (mkl related?)
Satish Balay
balay at mcs.anl.gov
Sat Dec 31 12:39:10 CST 2016
OK - so that is one more place where superlu_dist stores its version number, and it needs updating with every release.
cc:ing Sherry
Satish
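
A minimal sketch (not part of PETSc or SuperLU_DIST) of how the mismatch discussed in the quoted messages could be detected automatically: it reads the three version macros from superlu_defs.h-style text and compares them with the requested git tag and with the installed library file name. The function names and SAMPLE_HEADER are illustrative assumptions; the sample values are copied from this thread.

```python
import re

# Matches lines such as "#define SUPERLU_DIST_MINOR_VERSION 1".
_MACRO = re.compile(
    r"^\s*#define\s+SUPERLU_DIST_(MAJOR|MINOR|PATCH)_VERSION\s+(\d+)",
    re.MULTILINE,
)


def header_version(header_text):
    """Return (major, minor, patch) read from superlu_defs.h-style text."""
    found = {name: int(value) for name, value in _MACRO.findall(header_text)}
    missing = [k for k in ("MAJOR", "MINOR", "PATCH") if k not in found]
    if missing:
        raise ValueError("version macros not found: " + ", ".join(missing))
    return (found["MAJOR"], found["MINOR"], found["PATCH"])


def tag_version(tag):
    """Turn a git tag such as 'v5.1.3' into (5, 1, 3)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError("unrecognized tag: " + tag)
    return tuple(int(part) for part in match.groups())


def soname_version(filename):
    """Extract (major, minor, patch) from a name like libsuperlu_dist.so.5.1.0."""
    match = re.search(r"\.so\.(\d+)\.(\d+)\.(\d+)$", filename)
    if match is None:
        raise ValueError("no three-part version suffix in: " + filename)
    return tuple(int(part) for part in match.groups())


# Hypothetical sample input: the macro values reported in this thread.
SAMPLE_HEADER = """\
#define SUPERLU_DIST_MAJOR_VERSION 5
#define SUPERLU_DIST_MINOR_VERSION 1
#define SUPERLU_DIST_PATCH_VERSION 0
"""

requested = tag_version("v5.1.3")
in_header = header_version(SAMPLE_HEADER)
in_soname = soname_version("/opt/petsc-master_debug/lib/libsuperlu_dist.so.5.1.0")

if in_header != requested:
    print("header reports", in_header, "but the checked-out tag is", requested)
if in_soname != requested:
    print("library file name reports", in_soname, "but the tag is", requested)
```

A check like this only compares strings; it says nothing about which sources were actually compiled, which in this thread were the v5.1.3 ones.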
On Sat, 31 Dec 2016, Eric Chamberland wrote:
> I think there is definitely a problem.
>
> After looking at the files installed either from petsc-master tarball or the
> manual configure I just did with --download-superlu_dist-commit=v5.1.3, the
> file include/superlu_defs.h has these values:
>
> #define SUPERLU_DIST_MAJOR_VERSION 5
> #define SUPERLU_DIST_MINOR_VERSION 1
> #define SUPERLU_DIST_PATCH_VERSION 0
>
> What's wrong?
>
> Eric
>
>
> On 2016-12-31 at 13:26, Eric Chamberland wrote:
> > Ah ok, I see! Here look at the file name in the configure.log:
> >
> > Install the project...
> > /usr/bin/cmake -P cmake_install.cmake
> > -- Install configuration: "DEBUG"
> > -- Installing: /opt/petsc-master_debug/lib/libsuperlu_dist.so.5.1.0
> > -- Installing: /opt/petsc-master_debug/lib/libsuperlu_dist.so.5
> >
> > It is saying 5.1.0, but in fact you are right: it is 5.1.3 that is
> > downloaded!!! :)
> >
> > And FWIW, the nightly automatic compilation of PETSc starts within a brand
> > new and empty directory each night...
> >
> > Thanks to both of you again! :)
> >
> > Eric
> >
> >
> > On 2016-12-31 at 13:17, Satish Balay wrote:
> > > ===============================================================================
> > > Trying to download
> > > git://https://github.com/xiaoyeli/superlu_dist for SUPERLU_DIST
> > > ===============================================================================
> > > Executing: git clone
> > > https://github.com/xiaoyeli/superlu_dist
> > > /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-master-debug/arch-linux2-c-debug/externalpackages/git.superlu_dist
> > > stdout: Cloning into
> > > '/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-master-debug/arch-linux2-c-debug/externalpackages/git.superlu_dist'...
> > > Looking for SUPERLU_DIST at git.superlu_dist,
> > > hg.superlu_dist or a directory starting with ['superlu_dist']
> > > Found a copy of SUPERLU_DIST in git.superlu_dist
> > > Executing: ['git', 'rev-parse', '--git-dir']
> > > stdout: .git
> > > Executing: ['git', 'cat-file', '-e', 'v5.1.3^{commit}']
> > > Executing: ['git', 'rev-parse', 'v5.1.3']
> > > stdout: 7306f704c6c8d5113def649b76def3c8eb607690
> > > Executing: ['git', 'stash']
> > > stdout: No local changes to save
> > > Executing: ['git', 'clean', '-f', '-d', '-x']
> > > Executing: ['git', 'checkout', '-f',
> > > '7306f704c6c8d5113def649b76def3c8eb607690']
> > > <<<<<<<<
> > >
> > > Per log below - it's using 5.1.3. Why did you think you got 5.1.0?
> > >
> > > Satish
> > >
> > > On Sat, 31 Dec 2016, Eric Chamberland wrote:
> > >
> > > > Hi,
> > > >
> > > > ok I will test with 5.1.3 with the option you gave me
> > > > (--download-superlu_dist-commit=v5.1.3).
> > > >
> > > > But from what you and Matthew said, I should have 5.1.3 with
> > > > petsc-master, yet last night's log shows the library file name
> > > > as 5.1.0:
> > > >
> > > > http://www.giref.ulaval.ca/~cmpgiref/petsc-master-debug/2016.12.31.02h00m01s_configure.log
> > > >
> > > >
> > > > So I am a bit confused: Why did I get 5.1.0 last night? (I use the
> > > > petsc-master tarball, is it the reason?)
> > > >
> > > > Thanks,
> > > >
> > > > Eric
> > > >
> > > >
> > > > On 2016-12-31 at 11:52, Satish Balay wrote:
> > > > > On Sat, 31 Dec 2016, Eric Chamberland wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am just starting to debug a bug encountered with and only with
> > > > > > SuperLU_Dist
> > > > > > combined with MKL on a 2 processes validation test.
> > > > > >
> > > > > > (the same test works fine with MUMPS on 2 processes).
> > > > > >
> > > > > > I just noticed that the SuperLU_Dist version installed by PETSc
> > > > > > configure
> > > > > > script is 5.1.0 and the latest SuperLU_DIST is 5.1.3.
> > > > > If you use petsc-master - it will install 5.1.3 by default.
> > > > > > Before going further, I just want to ask:
> > > > > >
> > > > > > Is there any specific reason to stick to 5.1.0?
> > > > > We don't usually upgrade externalpackage version in PETSc releases
> > > > > > [unless it's tested to work and fixes known bugs]. There could be API
> > > > > changes - or build changes that can potentially conflict.
> > > > >
> > > > > > From what I know - 5.1.3 should work with petsc-3.7 [it fixes a
> > > > > > couple of bugs].
> > > > >
> > > > > You might be able to do the following with petsc-3.7 [with git
> > > > > externalpackage repos]
> > > > >
> > > > > > --download-superlu_dist --download-superlu_dist-commit=v5.1.3
> > > > >
> > > > > Satish
> > > > >
> > > > > > Here is some more information:
> > > > > >
> > > > > > On process 2 I have this printed in stdout:
> > > > > >
> > > > > > Intel MKL ERROR: Parameter 6 was incorrect on entry to DTRSM .
> > > > > >
> > > > > > and in stderr:
> > > > > >
> > > > > > > Test.ProblemeEFGen.opt: malloc.c:2369: sysmalloc: Assertion
> > > > > > > `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) -
> > > > > > > __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) ||
> > > > > > > ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof
> > > > > > > (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) &
> > > > > > > ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) &&
> > > > > > > ((unsigned long) old_end & pagemask) == 0)' failed.
> > > > > > [saruman:15771] *** Process received signal ***
> > > > > >
> > > > > > > This is the 7th call to KSPSolve in the same execution. Here is the
> > > > > > > last KSPView:
> > > > > >
> > > > > > KSP Object:(o_slin) 2 MPI processes
> > > > > > type: preonly
> > > > > > maximum iterations=10000, initial guess is zero
> > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > > > > > left preconditioning
> > > > > > using NONE norm type for convergence test
> > > > > > PC Object:(o_slin) 2 MPI processes
> > > > > > type: lu
> > > > > > LU: out-of-place factorization
> > > > > > tolerance for zero pivot 2.22045e-14
> > > > > > matrix ordering: natural
> > > > > > factor fill ratio given 0., needed 0.
> > > > > > Factored matrix follows:
> > > > > > Mat Object: 2 MPI processes
> > > > > > type: mpiaij
> > > > > > rows=382, cols=382
> > > > > > package used to perform factorization: superlu_dist
> > > > > > total: nonzeros=0, allocated nonzeros=0
> > > > > > > total number of mallocs used during MatSetValues calls =0
> > > > > > SuperLU_DIST run parameters:
> > > > > > Process grid nprow 2 x npcol 1
> > > > > > Equilibrate matrix TRUE
> > > > > > Matrix input mode 1
> > > > > > Replace tiny pivots FALSE
> > > > > > Use iterative refinement FALSE
> > > > > > Processors in row 2 col partition 1
> > > > > > Row permutation LargeDiag
> > > > > > Column permutation METIS_AT_PLUS_A
> > > > > > Parallel symbolic factorization FALSE
> > > > > > Repeated factorization SamePattern
> > > > > > linear system matrix = precond matrix:
> > > > > > Mat Object: (o_slin) 2 MPI processes
> > > > > > type: mpiaij
> > > > > > rows=382, cols=382
> > > > > > total: nonzeros=4458, allocated nonzeros=4458
> > > > > > total number of mallocs used during MatSetValues calls =0
> > > > > > > using I-node (on process 0) routines: found 109 nodes, limit used is 5
> > > > > >
> > > > > > > I know this information is not enough to help debug, but before
> > > > > > > trying to debug anything I would like to know whether the PETSc
> > > > > > > developers will upgrade to 5.1.3.
> > > > > >
> > > > > > Thanks,
> > > > > > Eric
> > > > > >
> > > > > >
> > > >
>
>
>