[petsc-users] Error with SuperLU_DIST (mkl related?)
Satish Balay
balay at mcs.anl.gov
Sat Dec 31 10:52:47 CST 2016
On Sat, 31 Dec 2016, Eric Chamberland wrote:
> Hi,
>
> I am just starting to debug a bug that occurs only with SuperLU_DIST
> combined with MKL, on a 2-process validation test.
>
> (the same test works fine with MUMPS on 2 processes).
>
> I just noticed that the SuperLU_DIST version installed by the PETSc configure
> script is 5.1.0, while the latest SuperLU_DIST is 5.1.3.
If you use petsc-master, it will install 5.1.3 by default.
>
> Before going further, I just want to ask:
>
> Is there any specific reason to stick to 5.1.0?
We don't usually upgrade externalpackage versions in PETSc releases
[unless the new version is tested to work and fixes known bugs]. There could be API
changes, or build changes that can potentially conflict.
From what I know, 5.1.3 should work with petsc-3.7 [it fixes a couple of bugs].
You might be able to do the following with petsc-3.7 [with git externalpackage repos]
--download-superlu_dist --download-superlu_dist-commit=v5.1.3
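Here is a fuller sketch of that configure invocation. It assumes a petsc-3.7 source tree and network access, so that configure can clone the SuperLU_DIST git repository. The MKL option (`--with-blaslapack-dir=$MKLROOT`) and the METIS/ParMETIS downloads are assumptions about your existing build. Reuse whatever BLAS/LAPACK, MPI and METIS options you already configure with instead.

```shell
# Sketch only: run from the top of a petsc-3.7 source tree (PETSC_DIR).
# Assumes network access so configure can clone the SuperLU_DIST git repo.
# ASSUMPTION: MKL is located via $MKLROOT; substitute your existing
# BLAS/LAPACK options if your current build finds MKL differently.
# SuperLU_DIST needs METIS/ParMETIS; keep them however you already get them.
./configure \
  --with-blaslapack-dir=$MKLROOT \
  --download-metis --download-parmetis \
  --download-superlu_dist \
  --download-superlu_dist-commit=v5.1.3
# Then run the "make ... all" command that configure prints when it finishes.
```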
Satish
> Here is some more information:
>
> On process 2 I have this printed in stdout:
>
> Intel MKL ERROR: Parameter 6 was incorrect on entry to DTRSM .
>
> and in stderr:
>
> Test.ProblemeEFGen.opt: malloc.c:2369: sysmalloc: Assertion `(old_top ==
> (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof
> (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size)
> >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk,
> fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) &&
> ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
> [saruman:15771] *** Process received signal ***
>
> This is the 7th call to KSPSolve in the same execution. Here is the last
> KSPView:
>
> KSP Object:(o_slin) 2 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object:(o_slin) 2 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
> Factored matrix follows:
> Mat Object: 2 MPI processes
> type: mpiaij
> rows=382, cols=382
> package used to perform factorization: superlu_dist
> total: nonzeros=0, allocated nonzeros=0
> total number of mallocs used during MatSetValues calls =0
> SuperLU_DIST run parameters:
> Process grid nprow 2 x npcol 1
> Equilibrate matrix TRUE
> Matrix input mode 1
> Replace tiny pivots FALSE
> Use iterative refinement FALSE
> Processors in row 2 col partition 1
> Row permutation LargeDiag
> Column permutation METIS_AT_PLUS_A
> Parallel symbolic factorization FALSE
> Repeated factorization SamePattern
> linear system matrix = precond matrix:
> Mat Object: (o_slin) 2 MPI processes
> type: mpiaij
> rows=382, cols=382
> total: nonzeros=4458, allocated nonzeros=4458
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 109 nodes, limit used is 5
>
> I know this information is not enough to help debug, but I would like to know
> whether the PETSc developers will upgrade to 5.1.3 before I try to debug anything.
>
> Thanks,
> Eric
>
>
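For comparison runs like the MUMPS one mentioned above, here is a hedged sketch of runtime options that differ only in the factorization package. It assumes the application calls KSPSetFromOptions() on the solver with the `o_slin_` prefix shown in the KSPView. `-pc_factor_mat_solver_package` is the petsc-3.7 spelling; later releases renamed it `-pc_factor_mat_solver_type`. `./your_app` is a hypothetical stand-in for the test executable.

```shell
# Hypothetical invocation: replace ./your_app with the real test binary.
# Same setup as the KSPView above (preonly + LU), factoring with SuperLU_DIST:
mpiexec -n 2 ./your_app -o_slin_ksp_type preonly -o_slin_pc_type lu \
  -o_slin_pc_factor_mat_solver_package superlu_dist -o_slin_ksp_view

# Identical run, but factoring with MUMPS instead:
mpiexec -n 2 ./your_app -o_slin_ksp_type preonly -o_slin_pc_type lu \
  -o_slin_pc_factor_mat_solver_package mumps -o_slin_ksp_view
```

Keeping every other option identical isolates the factorization package as the only difference between the failing and the working run.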