[petsc-dev] Error during KSPDestroy
Alexander Grayver
agrayver at gfz-potsdam.de
Tue May 8 02:00:40 CDT 2012
On 08.05.2012 08:59, Alexander Grayver wrote:
> Barry,
>
> Not it works.
I mean NOW, sorry :)
> Thanks everybody!
>
> On 07.05.2012 22:36, Barry Smith wrote:
>> Alexander
>>
>> Satish and I have determined the problem (took some valgrind and
>> debugger work). We were not allocating enough "workspace" for one of
>> the work arrays passed to zgesvd(). We have fixed it in petsc-dev.
>> You should be able to do an hg pull -u, then recompile the libraries
>> with make cmake, then relink and run your example.
>>
>> Thank you for your patience.
>>
>> Barry
>>
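For reference, LAPACK documents minimum sizes for both zgesvd() work arrays. The complex WORK array needs LWORK >= max(1, 2*min(M,N) + max(M,N)) entries, and the real RWORK array needs 5*min(M,N) entries. The C sketch below only checks a proposed allocation against those documented minimums. The helper names are hypothetical, and this is not the code PETSc uses in gmreig.c or the actual petsc-dev fix.

```c
#include <stddef.h>

/* Hypothetical helpers (not PETSc code): minimum workspace sizes for
 * LAPACK's ZGESVD, taken from the LAPACK reference documentation:
 *   WORK  (complex): LWORK >= max(1, 2*min(M,N) + max(M,N))
 *   RWORK (real):    dimension 5*min(M,N)                    */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }
static size_t max_sz(size_t a, size_t b) { return a > b ? a : b; }

/* Minimum number of complex entries in WORK. */
size_t zgesvd_min_lwork(size_t m, size_t n) {
    return max_sz(1, 2 * min_sz(m, n) + max_sz(m, n));
}

/* Required number of real entries in RWORK. */
size_t zgesvd_rwork_len(size_t m, size_t n) {
    return 5 * min_sz(m, n);
}

/* Returns 1 if both caller-provided buffers meet the documented minimums. */
int zgesvd_workspace_ok(size_t m, size_t n, size_t lwork, size_t rwork_len) {
    return lwork >= zgesvd_min_lwork(m, n) &&
           rwork_len >= zgesvd_rwork_len(m, n);
}
```

Calling zgesvd() with LWORK = -1 performs a workspace query and returns the optimal (usually larger) LWORK in WORK(1); the values above are only the floor. RWORK has no query, so it has to be sized from the formula.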
>> On May 7, 2012, at 9:10 AM, Alexander Grayver wrote:
>>
>>> On 07.05.2012 15:04, Barry Smith wrote:
>>>> I am also running complex.
>>>>
>>>> Look in the file dlasq2.f (it will be in the externalpackages
>>>> subdirectory of the PETSc directory). Look at line 215; this is
>>>> where valgrind has a problem. In my copy
>>>>
>>>> END IF
>>>> *
>>>> * Check for negative data and compute sums of q's and e's.
>>>> *<------ this is line 215
>>>> Z( 2*N ) = ZERO
>>>>
>>>> It is a comment, which is not good. Is line 215 also a comment in
>>>> your copy of dlasq2.f?
>>> Barry,
>>>
>>> *
>>> * Rearrange data for locality: Z=(q1,qq1,e1,ee1,q2,qq2,e2,ee2,...).
>>> *
>>> DO 30 K = 2*N, 2, -2
>>> Z( 2*K ) = ZERO
>>> !<----------- LINE 215
>>> Z( 2*K-1 ) = Z( K )
>>> Z( 2*K-2 ) = ZERO
>>> Z( 2*K-3 ) = Z( K-1 )
>>> 30 CONTINUE
>>>
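The loop quoted above spreads the interleaved q's and e's across the Z array, so Z must hold 4*N entries (DLASQ2 documents Z with dimension 4*N). The C translation below is a sketch for illustration, assuming the first 2*n entries hold q1,e1,q2,e2,... on entry; the function name and the explicit length check are hypothetical additions, not part of LAPACK. Its four assignments appear to correspond to the writes valgrind flags at dlasq2.f:215-218, so a Z buffer shorter than 4*N would let those writes run past the end of the buffer.

```c
#include <stddef.h>

/* Hypothetical 0-based C translation of DLASQ2's "rearrange data for
 * locality" loop. On entry z[0..2n-1] holds q1,e1,q2,e2,...; on exit
 * z[0..4n-1] holds q1,qq1,e1,ee1,... with the qq/ee slots set to zero.
 * Returns -1 without writing anything if z has fewer than 4*n entries. */
int rearrange_for_locality(double *z, size_t len, size_t n) {
    if (n == 0) return 0;
    if (len < 4 * n) return -1;       /* the loop touches Z(1)..Z(4N) */
    for (size_t k = 2 * n; k >= 2; k -= 2) {
        /* Fortran Z(i) maps to C z[i-1]. */
        z[2 * k - 1] = 0.0;           /* Z(2K)   = ZERO   */
        z[2 * k - 2] = z[k - 1];      /* Z(2K-1) = Z(K)   */
        z[2 * k - 3] = 0.0;           /* Z(2K-2) = ZERO   */
        z[2 * k - 4] = z[k - 2];      /* Z(2K-3) = Z(K-1) */
    }
    return 0;
}
```

Because the loop runs from the top of the array downward, each source value Z(K) or Z(K-1) is read before any later iteration overwrites it, which is what makes the in-place expansion safe when the buffer is large enough.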
>>> In the valgrind log you can see that it complains about the following lines
>>> as well:
>>>
>>> ==9009== Invalid write of size 8
>>> ==9009== at 0x10651D5: dlasq2_ (dlasq2.f:215)
>>> ==9009== by 0x1064683: dlasq1_ (dlasq1.f:135)
>>> ==9009== by 0x104EB3F: zbdsqr_ (zbdsqr.f:225)
>>> ==9009== by 0x1023B74: zgesvd_ (zgesvd.f:2040)
>>> ==9009== by 0xD38725: KSPComputeExtremeSingularValues_GMRES
>>> (gmreig.c:46)
>>> ==9009== by 0xCB3CC7: KSPComputeExtremeSingularValues (itfunc.c:47)
>>> ==9009== by 0x406DF2: main (solveTest.c:47)
>>> ==9009== Address 0x6ef5d88 is 8 bytes before a block of size 832
>>> alloc'd
>>> ==9009== at 0x4C2786E: memalign (vg_replace_malloc.c:581)
>>> ==9009== by 0x47E3CB: PetscMallocAlign (mal.c:30)
>>> ==9009== by 0xD2E286: KSPSetUp_GMRES (gmres.c:73)
>>> ==9009== by 0xCB5464: KSPSetUp (itfunc.c:239)
>>> ==9009== by 0xCB6E56: KSPSolve (itfunc.c:402)
>>> ==9009== by 0x406DDB: main (solveTest.c:46)
>>> ==9009==
>>> ==9009== Invalid write of size 8
>>> ==9009== at 0x1065204: dlasq2_ (dlasq2.f:216)
>>> ....
>>> ==9009==
>>> ==9009== Invalid write of size 8
>>> ==9009== at 0x1065223: dlasq2_ (dlasq2.f:217)
>>> ....
>>> ==9009==
>>> ==9009== Invalid write of size 8
>>> ==9009== at 0x1065255: dlasq2_ (dlasq2.f:218)
>>> ....
>>>
>>> All further output is also related to the Z array.
>>> Hard to believe this is a LAPACK problem... I tried 3
>>> implementations on 2 machines.
>>> I have a bad feeling it's my own stupid mistake somewhere... :)
>>>
>>> Just in case, I'm running Ubuntu 11.1 and PETSc is configured like this
>>> with default gcc compiler:
>>> ./configure --with-petsc-arch=mpich-gcc-complex-debug-c
>>> --download-f-blas-lapack --with-precision=double
>>> --with-scalar-type=complex --download-mpich
>>>
>>>> There are two possible causes I can think of for your problem:
>>>>
>>>> 1) PETSc does not allocate enough work space for zgesvd() or
>>>> 2) the BLAS/LAPACK routines have a bug where they sometimes access
>>>> out of their work space.
>>>>
>>>>
>>>> Satish,
>>>>
>>>> Can you try the same build options on a Linux machine as
>>>> close to Alexander as we have and see if you can reproduce this?
>>>>
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>> On May 7, 2012, at 2:16 AM, Alexander Grayver wrote:
>>>>
>>>>> On 06.05.2012 22:24, Barry Smith wrote:
>>>>>> Alexander,
>>>>>>
>>>>>> I cannot reproduce this on my mac with 3 different
>>>>>> blas/lapack.
>>>>> Barry,
>>>>>
>>>>> I'm surprised. I ran it on my home PC with ubuntu and PETSc
>>>>> configured from scratch as follows:
>>>>> --download-mpich --with-fortran-interfaces=1 --download-scalapack
>>>>> --download-blacs --with-scalar-type=complex --download-blas-lapack
>>>>> --with-precision=double
>>>>>
>>>>> And it's still there.
>>>>> Please note that all my numbers are complex.
>>>>>
>>>>>> Could you please run the case below but with
>>>>>> --download-f-blas-lapack (you forgot the -f last time)? Send us
>>>>>> the valgrind results. This will tell us the exact line number in
>>>>>> dlasq3() that is triggering the bad read.
>>>>> I did:
>>>>> ./configure --with-petsc-arch=openmpi-intel-complex-debug-c
>>>>> --download-scalapack --download-blacs --download-f-blas-lapack
>>>>> --with-precision=double --with-scalar-type=complex
>>>>>
>>>>> And then valgrind program. The first message from log:
>>>>>
>>>>> ==27656== Invalid write of size 8
>>>>> ==27656== at 0x15A8E9E: dlasq2_ (dlasq2.f:215)
>>>>> ==27656== by 0x15A83A4: dlasq1_ (dlasq1.f:135)
>>>>> ==27656== by 0x158ACEC: zbdsqr_ (zbdsqr.f:225)
>>>>> ==27656== by 0x154EC27: zgesvd_ (zgesvd.f:2038)
>>>>> ==27656== by 0x695DD3: KSPComputeExtremeSingularValues_GMRES
>>>>> (gmreig.c:46)
>>>>> ==27656== by 0x69DD76: KSPComputeExtremeSingularValues
>>>>> (itfunc.c:47)
>>>>> ==27656== by 0x44E98C: main (solveTest.c:62)
>>>>> ==27656== Address 0xfad2d98 is 8 bytes before a block of size 832
>>>>> alloc'd
>>>>> ==27656== at 0x4C25D66: memalign (vg_replace_malloc.c:694)
>>>>> ==27656== by 0x4B642B: PetscMallocAlign (mal.c:30)
>>>>> ==27656== by 0x687775: KSPSetUp_GMRES (gmres.c:73)
>>>>> ==27656== by 0x69FE4A: KSPSetUp (itfunc.c:239)
>>>>> ==27656== by 0x6A2058: KSPSolve (itfunc.c:402)
>>>>> ==27656== by 0x44E969: main (solveTest.c:61)
>>>>>
>>>>> Please find full log attached.
>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>>> On May 6, 2012, at 9:16 AM, Alexander Grayver wrote:
>>>>>>
>>>>>>> On 06.05.2012 15:34, Matthew Knepley wrote:
>>>>>>>> On Sun, May 6, 2012 at 9:24 AM, Alexander
>>>>>>>> Grayver<agrayver at gfz-potsdam.de> wrote:
>>>>>>>> Hm, valgrind gives a lot of output like that (see full log in
>>>>>>>> previous message):
>>>>>>>>
>>>>>>>> Can you run this with --download-f-blas-lapack? This sounds
>>>>>>>> much more like an MKL bug.
>>>>>>> I did:
>>>>>>> --download-scalapack --download-blacs --download-blas-lapack
>>>>>>> --with-precision=double --with-scalar-type=complex
>>>>>>>
>>>>>>> The error is still there. I checked "ldd solveTest", mkl is not
>>>>>>> used for sure. This is not an MKL problem I guess:
>>>>>>>
>>>>>>> ==13600== Invalid read of size 8
>>>>>>> ==13600== at 0x58636AF: dlasq3_ (in
>>>>>>> /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600== by 0x5862C84: dlasq2_ (in
>>>>>>> /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600== by 0x5861F2C: dlasq1_ (in
>>>>>>> /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600== by 0x571A479: zbdsqr_ (in
>>>>>>> /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600== by 0x57466A7: zgesvd_ (in
>>>>>>> /usr/local/lib/liblapack.so.3.2.2)
>>>>>>> ==13600== by 0x694687: KSPComputeExtremeSingularValues_GMRES
>>>>>>> (gmreig.c:46)
>>>>>>> ==13600== by 0x69C62A: KSPComputeExtremeSingularValues
>>>>>>> (itfunc.c:47)
>>>>>>> ==13600== by 0x44E02C: main (solveTest.c:62)
>>>>>>> ==13600== Address 0x10826b90 is 16 bytes before a block of size
>>>>>>> 832 alloc'd
>>>>>>> ==13600== at 0x4C25D66: memalign (vg_replace_malloc.c:694)
>>>>>>> ==13600== by 0x4B5ACB: PetscMallocAlign (mal.c:30)
>>>>>>> ==13600== by 0x686181: KSPSetUp_GMRES (gmres.c:73)
>>>>>>> ==13600== by 0x69E6FE: KSPSetUp (itfunc.c:239)
>>>>>>> ==13600== by 0x6A090C: KSPSolve (itfunc.c:402)
>>>>>>> ==13600== by 0x44E009: main (solveTest.c:61)
>>>>>>>
>>>>>>> The weird thing is that it gives the correct result, so zgesvd
>>>>>>> works fine.
>>>>>>>
>>>>>>> And also running this program with 10 iterations in valgrind
>>>>>>> doesn't produce the error. The log above is with 100 iterations.
>>>>>>> Without valgrind the error is always there.
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Alexander
>>>>>>>
>>>>> --
>>>>> Regards,
>>>>> Alexander
>>>>>
>>>>> <valgrind.zip>
>>>
>>> --
>>> Regards,
>>> Alexander
>>>
>
>
--
Regards,
Alexander
More information about the petsc-dev
mailing list