<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sat, Dec 31, 2016 at 9:53 AM, Eric Chamberland <span dir="ltr"><<a href="mailto:Eric.Chamberland@giref.ulaval.ca" target="_blank">Eric.Chamberland@giref.ulaval.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
I am just starting to debug a problem that occurs only with SuperLU_DIST combined with MKL, on a 2-process validation test.<br>
<br>
(The same test works fine with MUMPS on 2 processes.)<br>
<br>
I just noticed that the SuperLU_DIST version installed by the PETSc configure script is 5.1.0, while the latest SuperLU_DIST release is 5.1.3.<br>
<br>
Before going further, I just want to ask:<br>
<br>
Is there any specific reason to stick to 5.1.0?<br></blockquote><div><br></div><div>Can you debug in 'master' which does have 5.1.3, including an important bug fix?</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Here is some more information:<br>
<br>
On process 2, the following is printed to stdout:<br>
<br>
Intel MKL ERROR: Parameter 6 was incorrect on entry to DTRSM .<br>
<br>
and in stderr:<br>
<br>
Test.ProblemeEFGen.opt: malloc.c:2369: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.<br>
[saruman:15771] *** Process received signal ***<br>
<br>
This happens on the 7th call to KSPSolve in the same execution. Here is the last KSPView output:<br>
<br>
KSP Object:(o_slin) 2 MPI processes<br>
type: preonly<br>
maximum iterations=10000, initial guess is zero<br>
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
left preconditioning<br>
using NONE norm type for convergence test<br>
PC Object:(o_slin) 2 MPI processes<br>
type: lu<br>
LU: out-of-place factorization<br>
tolerance for zero pivot 2.22045e-14<br>
matrix ordering: natural<br>
factor fill ratio given 0., needed 0.<br>
Factored matrix follows:<br>
Mat Object: 2 MPI processes<br>
type: mpiaij<br>
rows=382, cols=382<br>
package used to perform factorization: superlu_dist<br>
total: nonzeros=0, allocated nonzeros=0<br>
total number of mallocs used during MatSetValues calls =0<br>
SuperLU_DIST run parameters:<br>
Process grid nprow 2 x npcol 1<br>
Equilibrate matrix TRUE<br>
Matrix input mode 1<br>
Replace tiny pivots FALSE<br>
Use iterative refinement FALSE<br>
Processors in row 2 col partition 1<br>
Row permutation LargeDiag<br>
Column permutation METIS_AT_PLUS_A<br>
Parallel symbolic factorization FALSE<br>
Repeated factorization SamePattern<br>
linear system matrix = precond matrix:<br>
Mat Object: (o_slin) 2 MPI processes<br>
type: mpiaij<br>
rows=382, cols=382<br>
total: nonzeros=4458, allocated nonzeros=4458<br>
total number of mallocs used during MatSetValues calls =0<br>
using I-node (on process 0) routines: found 109 nodes, limit used is 5<br>
<br>
I know this information is not enough to debug the problem, but I would like to know whether the PETSc developers plan to upgrade to 5.1.3 before I try to debug anything.<br>
<br>
Thanks,<br>
Eric<br>
<br>
</blockquote></div>
</div></div>