[petsc-dev] Error during KSPDestroy

Alexander Grayver agrayver at gfz-potsdam.de
Tue May 8 02:00:40 CDT 2012


On 08.05.2012 08:59, Alexander Grayver wrote:
> Barry,
>
> Not it works.

I mean NOW, sorry :)

> Thanks everybody!
>
> On 07.05.2012 22:36, Barry Smith wrote:
>>     Alexander
>>
>>     Satish and I have determined the problem (took some valgrind and 
>> debugger work). We were not allocating enough "workspace" for one of 
>> the work arrays passed to zgesvd(). We have fixed it in petsc-dev. 
>> You should be able to run hg pull -u, then recompile the libraries
>> with make cmake, then relink and run your example.
>>
>>      Thank you for your patience.
>>
>>       Barry
>>
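Barry's fix concerns workspace sizing for zgesvd(). The sketch below is a minimal illustration in Python. The helper names are hypothetical, not PETSc or LAPACK functions. It computes the minimum workspace sizes that the LAPACK reference documentation states for zgesvd() and checks caller-provided lengths against them before the call. The thread does not say which of PETSc's arrays was too short, so the example only shows the documented minimums.

```python
# Minimum workspace sizes for LAPACK zgesvd(), per the LAPACK reference docs:
#   LWORK >= max(1, 2*min(M,N) + max(M,N))   (complex WORK array)
#   RWORK has dimension 5*min(M,N)           (real RWORK array)
# Helper names are hypothetical; they are not PETSc or LAPACK functions.

def zgesvd_min_lwork(m, n):
    """Smallest legal LWORK for zgesvd on an m-by-n matrix."""
    return max(1, 2 * min(m, n) + max(m, n))

def zgesvd_rwork_len(m, n):
    """Required length of the real workspace RWORK."""
    return 5 * min(m, n)

def zgesvd_workspace_ok(m, n, lwork, lrwork):
    """True if both caller-allocated workspaces meet the documented minimums."""
    return lwork >= zgesvd_min_lwork(m, n) and lrwork >= zgesvd_rwork_len(m, n)

if __name__ == "__main__":
    n = 31  # e.g. a square matrix of order 31
    print(zgesvd_min_lwork(n, n), zgesvd_rwork_len(n, n))  # 93 155
    # An RWORK of only n reals is too short for zgesvd:
    print(zgesvd_workspace_ok(n, n, lwork=5 * n, lrwork=n))  # False
```

Checking both arrays up front turns a silent out-of-bounds write into an explicit failure. A caller can also pass LWORK = -1 to get LAPACK's optimal WORK size, but that query does not report RWORK. RWORK still has to be sized from the 5*min(M,N) rule.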
>> On May 7, 2012, at 9:10 AM, Alexander Grayver wrote:
>>
>>> On 07.05.2012 15:04, Barry Smith wrote:
>>>>     I am also running complex.
>>>>
>>>>      Look in the file dlasq2.f (it will be in the externalpackages
>>>> subdirectory of the PETSc directory). Look at line 215; this is
>>>> where valgrind has a problem. In my copy
>>>>
>>>>        END IF
>>>> *
>>>> *     Check for negative data and compute sums of q's and e's.
>>>> *<------ this is line 215
>>>>        Z( 2*N ) = ZERO
>>>>
>>>> it is a comment, which is not good. Is line 215 also a comment in
>>>> your copy of dlasq2.f?
>>> Barry,
>>>
>>> *
>>> *     Rearrange data for locality: Z=(q1,qq1,e1,ee1,q2,qq2,e2,ee2,...).
>>> *
>>>       DO 30 K = 2*N, 2, -2
>>>          Z( 2*K ) = ZERO      !<----------- LINE 215
>>>          Z( 2*K-1 ) = Z( K )
>>>          Z( 2*K-2 ) = ZERO
>>>          Z( 2*K-3 ) = Z( K-1 )
>>>    30 CONTINUE
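The quoted loop runs K from 2*N down to 2 in steps of -2, so its first iteration already stores into Z(4*N). The sketch below is a minimal Python model of the loop's index pattern. It uses 1-based indices like the Fortran, and the function name is hypothetical. The model shows that the loop writes exactly Z(1) through Z(4*N), which matches dlasq2's documented requirement that Z have dimension 4*N. One reading consistent with the valgrind output is this: if the array handed down from zgesvd() is shorter than that, the very first store, at line 215, lands out of bounds.

```python
# Python model of the dlasq2.f loop quoted above (1-based Fortran indices):
#       DO 30 K = 2*N, 2, -2
#          Z( 2*K )   = ZERO
#          Z( 2*K-1 ) = Z( K )
#          Z( 2*K-2 ) = ZERO
#          Z( 2*K-3 ) = Z( K-1 )
#    30 CONTINUE
# It only records which Z entries the loop writes; it does not copy any data.

def dlasq2_rearrange_writes(n):
    """Return the set of 1-based Z indices written by the rearrange loop."""
    written = set()
    for k in range(2 * n, 1, -2):  # K = 2N, 2N-2, ..., 2
        written.update((2 * k, 2 * k - 1, 2 * k - 2, 2 * k - 3))
    return written

if __name__ == "__main__":
    idx = dlasq2_rearrange_writes(5)
    print(min(idx), max(idx), len(idx))  # 1 20 20
```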
>>>
>>> In the valgrind log you can see that it complains about the following
>>> lines as well:
>>>
>>> ==9009== Invalid write of size 8
>>> ==9009==    at 0x10651D5: dlasq2_ (dlasq2.f:215)
>>> ==9009==    by 0x1064683: dlasq1_ (dlasq1.f:135)
>>> ==9009==    by 0x104EB3F: zbdsqr_ (zbdsqr.f:225)
>>> ==9009==    by 0x1023B74: zgesvd_ (zgesvd.f:2040)
>>> ==9009==    by 0xD38725: KSPComputeExtremeSingularValues_GMRES (gmreig.c:46)
>>> ==9009==    by 0xCB3CC7: KSPComputeExtremeSingularValues (itfunc.c:47)
>>> ==9009==    by 0x406DF2: main (solveTest.c:47)
>>> ==9009==  Address 0x6ef5d88 is 8 bytes before a block of size 832 alloc'd
>>> ==9009==    at 0x4C2786E: memalign (vg_replace_malloc.c:581)
>>> ==9009==    by 0x47E3CB: PetscMallocAlign (mal.c:30)
>>> ==9009==    by 0xD2E286: KSPSetUp_GMRES (gmres.c:73)
>>> ==9009==    by 0xCB5464: KSPSetUp (itfunc.c:239)
>>> ==9009==    by 0xCB6E56: KSPSolve (itfunc.c:402)
>>> ==9009==    by 0x406DDB: main (solveTest.c:46)
>>> ==9009==
>>> ==9009== Invalid write of size 8
>>> ==9009==    at 0x1065204: dlasq2_ (dlasq2.f:216)
>>> ....
>>> ==9009==
>>> ==9009== Invalid write of size 8
>>> ==9009==    at 0x1065223: dlasq2_ (dlasq2.f:217)
>>> ....
>>> ==9009==
>>> ==9009== Invalid write of size 8
>>> ==9009==    at 0x1065255: dlasq2_ (dlasq2.f:218)
>>> ....
>>>
>>> All further output is also related to the Z array.
>>> Hard to believe this is a LAPACK problem... I tried 3
>>> implementations on 2 machines.
>>> I have a bad feeling it's a stupid mistake of mine somewhere... :)
>>>
>>> Just in case: I run Ubuntu 11.1, and PETSc is configured like this
>>> with the default gcc compiler:
>>> ./configure --with-petsc-arch=mpich-gcc-complex-debug-c 
>>> --download-f-blas-lapack --with-precision=double 
>>> --with-scalar-type=complex --download-mpich
>>>
>>>> There are two possible causes I can think of for your problem:
>>>>
>>>> 1) PETSc does not allocate enough workspace for zgesvd(), or
>>>> 2) the BLAS/LAPACK routines have a bug where they sometimes access
>>>> memory outside their workspace.
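A guard (canary) region is one generic way to tell these two causes apart. Give the routine its workspace plus a few sentinel-filled slots, then check afterwards whether any sentinel was overwritten. Suppose the guard survives when the buffer has the documented size but is clobbered at the caller's original size: then the caller under-allocated. If the guard is clobbered even at the documented size, the routine itself overruns. The Python sketch below models the idea and rests on several assumptions. `fake_kernel` is a hypothetical stand-in for a LAPACK call, and both the guard length of 4 and the sentinel value are arbitrary choices.

```python
from array import array

SENTINEL = -12345.0  # arbitrary marker value, assumed never written legitimately
GUARD = 4            # number of guard slots after the usable workspace (an assumption)

def alloc_guarded(n):
    """Allocate n usable doubles followed by GUARD sentinel-filled doubles."""
    return array("d", [0.0] * n + [SENTINEL] * GUARD)

def guard_intact(buf, n):
    """Return True if nothing was written into the guard slots after the first n."""
    return all(buf[i] == SENTINEL for i in range(n, n + GUARD))

def fake_kernel(buf, count):
    """Hypothetical stand-in for a LAPACK routine that fills `count` workspace entries."""
    for i in range(count):
        buf[i] = float(i)

if __name__ == "__main__":
    required = 4 * 5                          # e.g. 4*N doubles for N = 5
    short = alloc_guarded(required - 2)       # caller under-allocates by 2
    fake_kernel(short, required)
    print(guard_intact(short, required - 2))  # False: the kernel ran past the end
    exact = alloc_guarded(required)
    fake_kernel(exact, required)
    print(guard_intact(exact, required))      # True
```

In Python, an overrun beyond the guard itself raises IndexError. In C, the same overrun would silently corrupt neighboring memory, which is why a tool like valgrind is needed to catch it there.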
>>>>
>>>>
>>>>     Satish,
>>>>
>>>>       Can you try the same build options on a Linux machine as
>>>> close to Alexander's as we have and see if you can reproduce this?
>>>>
>>>>
>>>>     Barry
>>>>
>>>>
>>>>
>>>> On May 7, 2012, at 2:16 AM, Alexander Grayver wrote:
>>>>
>>>>> On 06.05.2012 22:24, Barry Smith wrote:
>>>>>>    Alexander,
>>>>>>
>>>>>>       I cannot reproduce this on my Mac with 3 different
>>>>>> BLAS/LAPACK implementations.
>>>>> Barry,
>>>>>
>>>>> I'm surprised. I ran it on my home PC with Ubuntu and PETSc
>>>>> configured from scratch as follows:
>>>>> --download-mpich --with-fortran-interfaces=1 --download-scalapack 
>>>>> --download-blacs --with-scalar-type=complex --download-blas-lapack 
>>>>> --with-precision=double
>>>>>
>>>>> And it's still there.
>>>>> Please note that all my numbers are complex.
>>>>>
>>>>>>       Could you please run the case below but with
>>>>>> --download-f-blas-lapack (you forgot the -f last time)? Send us
>>>>>> the valgrind results. This will tell us the exact line number in
>>>>>> dlasq3() that is triggering the bad read.
>>>>> I did:
>>>>> ./configure --with-petsc-arch=openmpi-intel-complex-debug-c 
>>>>> --download-scalapack --download-blacs --download-f-blas-lapack 
>>>>> --with-precision=double --with-scalar-type=complex
>>>>>
>>>>> Then I ran the program under valgrind. The first message from the log:
>>>>>
>>>>> ==27656== Invalid write of size 8
>>>>> ==27656==    at 0x15A8E9E: dlasq2_ (dlasq2.f:215)
>>>>> ==27656==    by 0x15A83A4: dlasq1_ (dlasq1.f:135)
>>>>> ==27656==    by 0x158ACEC: zbdsqr_ (zbdsqr.f:225)
>>>>> ==27656==    by 0x154EC27: zgesvd_ (zgesvd.f:2038)
>>>>> ==27656==    by 0x695DD3: KSPComputeExtremeSingularValues_GMRES (gmreig.c:46)
>>>>> ==27656==    by 0x69DD76: KSPComputeExtremeSingularValues (itfunc.c:47)
>>>>> ==27656==    by 0x44E98C: main (solveTest.c:62)
>>>>> ==27656==  Address 0xfad2d98 is 8 bytes before a block of size 832 alloc'd
>>>>> ==27656==    at 0x4C25D66: memalign (vg_replace_malloc.c:694)
>>>>> ==27656==    by 0x4B642B: PetscMallocAlign (mal.c:30)
>>>>> ==27656==    by 0x687775: KSPSetUp_GMRES (gmres.c:73)
>>>>> ==27656==    by 0x69FE4A: KSPSetUp (itfunc.c:239)
>>>>> ==27656==    by 0x6A2058: KSPSolve (itfunc.c:402)
>>>>> ==27656==    by 0x44E969: main (solveTest.c:61)
>>>>>
>>>>> Please find full log attached.
>>>>>
>>>>>>      Barry
>>>>>>
>>>>>>
>>>>>> On May 6, 2012, at 9:16 AM, Alexander Grayver wrote:
>>>>>>
>>>>>>> On 06.05.2012 15:34, Matthew Knepley wrote:
>>>>>>>> On Sun, May 6, 2012 at 9:24 AM, Alexander Grayver <agrayver at gfz-potsdam.de> wrote:
>>>>>>>> Hm, valgrind gives a lot of output like that (see full log in 
>>>>>>>> previous message):
>>>>>>>>
>>>>>>>> Can you run this with --download-f-blas-lapack? This sounds 
>>>>>>>> much more like an MKL bug.
>>>>>>> I did:
>>>>>>> --download-scalapack --download-blacs --download-blas-lapack 
>>>>>>> --with-precision=double --with-scalar-type=complex
>>>>>>>
>>>>>>> The error is still there. I checked "ldd solveTest"; MKL is
>>>>>>> definitely not used. I guess this is not an MKL problem:
>>>>>>>
>>>>>>> ==13600== Invalid read of size 8
>>>>>>> ==13600==    at 0x58636AF: dlasq3_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600==    by 0x5862C84: dlasq2_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600==    by 0x5861F2C: dlasq1_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600==    by 0x571A479: zbdsqr_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600==    by 0x57466A7: zgesvd_ (in /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600==    by 0x694687: KSPComputeExtremeSingularValues_GMRES (gmreig.c:46)
>>>>>>> ==13600==    by 0x69C62A: KSPComputeExtremeSingularValues (itfunc.c:47)
>>>>>>> ==13600==    by 0x44E02C: main (solveTest.c:62)
>>>>>>> ==13600==  Address 0x10826b90 is 16 bytes before a block of size 832 alloc'd
>>>>>>> ==13600==    at 0x4C25D66: memalign (vg_replace_malloc.c:694)
>>>>>>> ==13600==    by 0x4B5ACB: PetscMallocAlign (mal.c:30)
>>>>>>> ==13600==    by 0x686181: KSPSetUp_GMRES (gmres.c:73)
>>>>>>> ==13600==    by 0x69E6FE: KSPSetUp (itfunc.c:239)
>>>>>>> ==13600==    by 0x6A090C: KSPSolve (itfunc.c:402)
>>>>>>> ==13600==    by 0x44E009: main (solveTest.c:61)
>>>>>>>
>>>>>>> The weird thing is that it gives the correct result, so zgesvd
>>>>>>> works fine.
>>>>>>>
>>>>>>> Also, running this program with 10 iterations under valgrind
>>>>>>> doesn't produce an error. The log above is with 100 iterations.
>>>>>>> Without valgrind the error is always there.
>>>>>>>
>>>>>>> -- 
>>>>>>> Regards,
>>>>>>> Alexander
>>>>>>>
>>>>> -- 
>>>>> Regards,
>>>>> Alexander
>>>>>
>>>>> <valgrind.zip>
>>>
>>> -- 
>>> Regards,
>>> Alexander
>>>
>
>


-- 
Regards,
Alexander



