[petsc-users] Error with SuperLU_DIST (mkl related?)

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Sat Dec 31 12:26:47 CST 2016


Ah ok, I see!  Here, look at the file name in the configure.log:

Install the project...
/usr/bin/cmake -P cmake_install.cmake
-- Install configuration: "DEBUG"
-- Installing: /opt/petsc-master_debug/lib/libsuperlu_dist.so.5.1.0
-- Installing: /opt/petsc-master_debug/lib/libsuperlu_dist.so.5

It says 5.1.0, but in fact you are right: it is 5.1.3 that gets 
downloaded!!! :)

And FWIW, the nightly automatic compilation of PETSc starts in a 
brand new, empty directory each night...

Thanks to both of you again! :)

Eric


On 2016-12-31 at 13:17, Satish Balay wrote:
>                        ===============================================================================
>                            Trying to download git://https://github.com/xiaoyeli/superlu_dist for SUPERLU_DIST
>                        ===============================================================================
>                      
> Executing: git clone https://github.com/xiaoyeli/superlu_dist /pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-master-debug/arch-linux2-c-debug/externalpackages/git.superlu_dist
> stdout: Cloning into '/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-master-debug/arch-linux2-c-debug/externalpackages/git.superlu_dist'...
>                      Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with ['superlu_dist']
>                      Found a copy of SUPERLU_DIST in git.superlu_dist
> Executing: ['git', 'rev-parse', '--git-dir']
> stdout: .git
> Executing: ['git', 'cat-file', '-e', 'v5.1.3^{commit}']
> Executing: ['git', 'rev-parse', 'v5.1.3']
> stdout: 7306f704c6c8d5113def649b76def3c8eb607690
> Executing: ['git', 'stash']
> stdout: No local changes to save
> Executing: ['git', 'clean', '-f', '-d', '-x']
> Executing: ['git', 'checkout', '-f', '7306f704c6c8d5113def649b76def3c8eb607690']
> <<<<<<<<
>
> Per the log below - it's using 5.1.3. Why did you think you got 5.1.0?
>
> Satish
>
> On Sat, 31 Dec 2016, Eric Chamberland wrote:
>
>> Hi,
>>
>> ok, I will test 5.1.3 with the option you gave me
>> (--download-superlu_dist-commit=v5.1.3).
>>
>> But from what you and Matthew said, I should have 5.1.3 with petsc-master, yet
>> last night's log shows me the library file name for 5.1.0:
>>
>> http://www.giref.ulaval.ca/~cmpgiref/petsc-master-debug/2016.12.31.02h00m01s_configure.log
>>
>> So I am a bit confused: why did I get 5.1.0 last night? (I use the
>> petsc-master tarball; is that the reason?)
>>
>> Thanks,
>>
>> Eric
>>
>>
>> On 2016-12-31 at 11:52, Satish Balay wrote:
>>> On Sat, 31 Dec 2016, Eric Chamberland wrote:
>>>
>>>> Hi,
>>>>
>>>> I am just starting to debug a bug encountered with, and only with,
>>>> SuperLU_DIST combined with MKL, on a 2-process validation test.
>>>>
>>>> (The same test works fine with MUMPS on 2 processes.)
>>>>
>>>> I just noticed that the SuperLU_DIST version installed by the PETSc configure
>>>> script is 5.1.0, while the latest SuperLU_DIST is 5.1.3.
>>> If you use petsc-master - it will install 5.1.3 by default.
>>>> Before going further, I just want to ask:
>>>>
>>>> Is there any specific reason to stick to 5.1.0?
>>> We don't usually upgrade external package versions in PETSc releases
>>> [unless it's tested to work and fixes known bugs]. There could be API
>>> changes - or build changes - that can potentially conflict.
>>>
>>> From what I know - 5.1.3 should work with petsc-3.7 [it fixes a couple of
>>> bugs].
>>>
>>> You might be able to do the following with petsc-3.7 [with the git
>>> externalpackage repos]:
>>>
>>> --download-superlu_dist --download-superlu_dist-commit=v5.1.3
>>>
>>> Satish
>>>
>>>> Here is some more information:
>>>>
>>>> On process 2 I have this printed in stdout:
>>>>
>>>> Intel MKL ERROR: Parameter 6 was incorrect on entry to DTRSM .
>>>>
>>>> and in stderr:
>>>>
>>>> Test.ProblemeEFGen.opt: malloc.c:2369: sysmalloc: Assertion `(old_top ==
>>>> (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof
>>>> (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size)
>>>> >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))
>>>> +((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) &&
>>>> ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
>>>> [saruman:15771] *** Process received signal ***
>>>>
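
For reference, the DTRSM in the MKL error above is the Fortran-interface BLAS
triangular solve that SuperLU_DIST calls internally. Below is a minimal sketch
of its calling sequence with the argument positions numbered; assuming MKL
counts parameters in this Fortran order, "parameter 6" is n, the number of
columns of the right-hand-side matrix. The dtrsm_ symbol name (trailing
underscore) and plain int for Fortran INTEGER are assumptions about a typical
Fortran-to-C binding, not something taken from the log.

/* Sketch only: reference-BLAS calling sequence for DTRSM, declared from C.
 * The numbered comments map an MKL "Parameter N was incorrect" message back
 * to an argument. */
#include <stdio.h>

extern void dtrsm_(const char   *side,    /*  1: 'L' or 'R'                     */
                   const char   *uplo,    /*  2: 'U' or 'L'                     */
                   const char   *transa,  /*  3: 'N', 'T' or 'C'                */
                   const char   *diag,    /*  4: 'U' or 'N'                     */
                   const int    *m,       /*  5: rows of B                      */
                   const int    *n,       /*  6: columns of B  <-- parameter 6  */
                   const double *alpha,   /*  7: scalar alpha                   */
                   const double *a,       /*  8: triangular matrix A            */
                   const int    *lda,     /*  9: leading dimension of A         */
                   double       *b,       /* 10: right-hand sides B (becomes X) */
                   const int    *ldb);    /* 11: leading dimension of B         */

int main(void)
{
  /* Solve A*X = B with a 2x2 lower-triangular A (column-major) and one RHS. */
  double a[4]  = {1.0, 0.5, 0.0, 1.0};
  double b[2]  = {2.0, 3.0};
  double alpha = 1.0;
  int    m = 2, n = 1, lda = 2, ldb = 2;

  dtrsm_("L", "L", "N", "N", &m, &n, &alpha, a, &lda, b, &ldb);
  printf("x = [%g, %g]\n", b[0], b[1]);   /* expect [2, 2] */
  return 0;
}
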
>>>> This is the 7th call to KSPSolve in the same execution. Here is the last
>>>> KSPView:
>>>>
>>>> KSP Object:(o_slin) 2 MPI processes
>>>>     type: preonly
>>>>     maximum iterations=10000, initial guess is zero
>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>     left preconditioning
>>>>     using NONE norm type for convergence test
>>>> PC Object:(o_slin) 2 MPI processes
>>>>     type: lu
>>>>       LU: out-of-place factorization
>>>>       tolerance for zero pivot 2.22045e-14
>>>>       matrix ordering: natural
>>>>       factor fill ratio given 0., needed 0.
>>>>         Factored matrix follows:
>>>>           Mat Object:         2 MPI processes
>>>>             type: mpiaij
>>>>             rows=382, cols=382
>>>>             package used to perform factorization: superlu_dist
>>>>             total: nonzeros=0, allocated nonzeros=0
>>>>             total number of mallocs used during MatSetValues calls =0
>>>>               SuperLU_DIST run parameters:
>>>>                 Process grid nprow 2 x npcol 1
>>>>                 Equilibrate matrix TRUE
>>>>                 Matrix input mode 1
>>>>                 Replace tiny pivots FALSE
>>>>                 Use iterative refinement FALSE
>>>>                 Processors in row 2 col partition 1
>>>>                 Row permutation LargeDiag
>>>>                 Column permutation METIS_AT_PLUS_A
>>>>                 Parallel symbolic factorization FALSE
>>>>                 Repeated factorization SamePattern
>>>>     linear system matrix = precond matrix:
>>>>     Mat Object:  (o_slin)   2 MPI processes
>>>>       type: mpiaij
>>>>       rows=382, cols=382
>>>>       total: nonzeros=4458, allocated nonzeros=4458
>>>>       total number of mallocs used during MatSetValues calls =0
>>>>         using I-node (on process 0) routines: found 109 nodes, limit used
>>>>         is 5
>>>>
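
For context on the KSPView above, here is a minimal sketch of how this solver
configuration - a preonly KSP with an LU PC factored by SuperLU_DIST - is
typically selected through the PETSc 3.7-era C API. The function name
SolveWithSuperLUDist and the assumption that A, b, x are an assembled MPIAIJ
matrix and compatible vectors are mine for illustration;
PCFactorSetMatSolverPackage is the 3.7 name of the call that was later renamed
PCFactorSetMatSolverType.

#include <petscksp.h>

/* Minimal sketch (PETSc 3.7-era API): direct LU solve through SuperLU_DIST,
 * matching the KSPView output above (ksp type preonly, pc type lu). */
static PetscErrorCode SolveWithSuperLUDist(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPCreate(PetscObjectComm((PetscObject)A),&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);            /* "type: preonly" */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);                    /* "type: lu" */
  ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);                /* allow -ksp_view etc. */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The same selection can be made at run time with -ksp_type preonly -pc_type lu
-pc_factor_mat_solver_package superlu_dist (the 3.7 option name), which also
makes it easy to swap in mumps for the comparison mentioned above.
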
>>>> I know this information is not enough to help debug, but I would like to
>>>> know whether the PETSc team will upgrade to 5.1.3 before I try to debug
>>>> anything further.
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>>
>>


