[petsc-users] Error with SuperLU_DIST (mkl related?)

Matthew Knepley knepley at gmail.com
Sat Dec 31 10:51:47 CST 2016


On Sat, Dec 31, 2016 at 9:53 AM, Eric Chamberland <
Eric.Chamberland at giref.ulaval.ca> wrote:

> Hi,
>
> I am just starting to debug a problem that occurs only when
> SuperLU_DIST is combined with MKL, on a 2-process validation test.
>
> (the same test works fine with MUMPS on 2 processes).
>
> I just noticed that the SuperLU_Dist version installed by PETSc configure
> script is 5.1.0 and the latest SuperLU_DIST is 5.1.3.
>
> Before going further, I just want to ask:
>
> Is there any specific reason to stick to 5.1.0?
>

Can you debug in 'master', which has 5.1.3, including an important bug
fix?
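[For context, PETSc's configure script can download and build its pinned
SuperLU_DIST version. A hedged sketch of the relevant invocation (the
`--download-*` flags are real configure options, but the exact set needed
may vary by PETSc version and existing installation):

```shell
# Sketch: rebuild PETSc (e.g. from the 'master' branch) so that configure
# downloads and builds its pinned SuperLU_DIST, plus the (Par)METIS
# orderings SuperLU_DIST uses. Other site-specific flags omitted.
./configure --download-superlu_dist \
            --download-metis \
            --download-parmetis
```

Rebuilding this way ensures the SuperLU_DIST version matches what the
PETSc branch was tested against.]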

   Matt


>
> Here is some more information:
>
> On process 2, this is printed on stdout:
>
> Intel MKL ERROR: Parameter 6 was incorrect on entry to DTRSM .
>
> and in stderr:
>
> Test.ProblemeEFGen.opt: malloc.c:2369: sysmalloc: Assertion `(old_top ==
> (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof
> (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long)
> (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk,
> fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1)))
> && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)'
> failed.
> [saruman:15771] *** Process received signal ***
>
> This is the 7th call to KSPSolve in the same execution. Here is the last
> KSPView:
>
> KSP Object:(o_slin) 2 MPI processes
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object:(o_slin) 2 MPI processes
>   type: lu
>     LU: out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: natural
>     factor fill ratio given 0., needed 0.
>       Factored matrix follows:
>         Mat Object:         2 MPI processes
>           type: mpiaij
>           rows=382, cols=382
>           package used to perform factorization: superlu_dist
>           total: nonzeros=0, allocated nonzeros=0
>           total number of mallocs used during MatSetValues calls =0
>             SuperLU_DIST run parameters:
>               Process grid nprow 2 x npcol 1
>               Equilibrate matrix TRUE
>               Matrix input mode 1
>               Replace tiny pivots FALSE
>               Use iterative refinement FALSE
>               Processors in row 2 col partition 1
>               Row permutation LargeDiag
>               Column permutation METIS_AT_PLUS_A
>               Parallel symbolic factorization FALSE
>               Repeated factorization SamePattern
>   linear system matrix = precond matrix:
>   Mat Object:  (o_slin)   2 MPI processes
>     type: mpiaij
>     rows=382, cols=382
>     total: nonzeros=4458, allocated nonzeros=4458
>     total number of mallocs used during MatSetValues calls =0
>       using I-node (on process 0) routines: found 109 nodes, limit used is
> 5
>
> I know this information is not enough to debug the problem, but I would
> like to know whether the PETSc team will upgrade to 5.1.3 before I try
> to debug anything further.
>
> Thanks,
> Eric
>
>
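[For reference, a solver configuration like the one in the KSPView above
is typically selected with runtime options. A hedged sketch, using the
2016-era option name `-pc_factor_mat_solver_package` and the executable
name taken from the stderr log; the actual launch command for this test
suite may differ:

```shell
# Sketch: run on 2 processes with a direct LU solve via SuperLU_DIST,
# matching the KSPView above (preonly KSP + LU PC + superlu_dist factor).
mpiexec -n 2 ./Test.ProblemeEFGen.opt \
    -ksp_type preonly \
    -pc_type lu \
    -pc_factor_mat_solver_package superlu_dist
```

Switching `superlu_dist` to `mumps` here reproduces the working MUMPS
configuration mentioned above for comparison.]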


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener