[petsc-users] MatScale returns different results depending on matrix size

Roland Richter roland.richter at ntnu.no
Thu Jan 7 02:15:16 CST 2021


Hei,

I think I found a solution, though no explanation for it yet. In my
CMakeLists.txt I was linking the following libraries:

    ${EIGEN3_LIBRARIES}
    ${GSL_LIBRARIES}
    ${FFTW_LIBRARIES}
    /opt/fftw3/lib64/libfftw3_mpi.so
    gfortran
    /opt/intel/mkl/lib/intel64/libmkl_rt.so
    ${PETSC_LIBRARY_DIRS}/libpetsc.so
    Boost::filesystem
    Boost::mpi
    Boost::program_options
    Boost::serialization
    ${X11_LIBRARIES}
    OpenMP::OpenMP_CXX

with

    ${FFTW_LIBRARIES}

containing

    libfftw3.so
    libfftw3_omp.so

Apparently the issue is a clash between the OpenMP libraries and
mkl_rt. After removing mkl_rt the program works as expected, and the
same holds after removing OpenMP::OpenMP_CXX and libfftw3_omp.so
instead. As long as both are present in the link list, I get wrong
results. I will now take a deeper look at what causes that behavior,
but for the moment the problem appears to be solved.

Thank you very much for your help!

Regards,

Roland

On 06.01.21 at 19:44, Barry Smith wrote:
>
> $ ./main -start_in_debugger noxterm
> PETSC: Attaching lldb to ./main of pid 17914 on
> Barry-Smiths-MacBook-Pro.local
> (lldb) process attach --pid 17914
> warning: (x86_64)
> /Users/barrysmith/soft/clang-ifort/lib/libmpifort.12.dylib empty dSYM
> file detected, dSYM was created with an executable with no debug info.
> warning: (x86_64)
> /Users/barrysmith/soft/clang-ifort/lib/libmpi.12.dylib empty dSYM file
> detected, dSYM was created with an executable with no debug info.
> warning: (x86_64)
> /Users/barrysmith/soft/clang-ifort/lib/libpmpi.12.dylib empty dSYM
> file detected, dSYM was created with an executable with no debug info.
> Process 17914 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>     frame #0:
> 0x00007fff733cb756 libsystem_kernel.dylib`__semwait_signal + 10
> libsystem_kernel.dylib`__semwait_signal:
> -> 0x7fff733cb756 <+10>: jae    0x7fff733cb760            ; <+20>
>    0x7fff733cb758 <+12>: movq   %rax, %rdi
>    0x7fff733cb75b <+15>: jmp    0x7fff733ca22d            ; cerror
>    0x7fff733cb760 <+20>: retq   
> Target 0: (main) stopped.
>
> Executable module set to
> "/Users/barrysmith/Src/petsc/src/ksp/ksp/tutorials/main".
> Architecture set to: x86_64h-apple-macosx-.
> (lldb) b MatScale
> Breakpoint 1: where = libpetsc.3.014.dylib`MatScale + 48 at
> matrix.c:5281:3, address = 0x0000000102f62090
> (lldb) c
>
>
> 1.0230000000000000e+03 
> Process 17914 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
>     frame #0:
> 0x0000000102f62090 libpetsc.3.014.dylib`MatScale(mat=0x00007fa5040c9870,
> a=0.01 + 0i) at matrix.c:5281:3
>   5278 {
>   5279  PetscErrorCode ierr;
>   5280
>
> Target 0: (main) stopped.
> (lldb) c
>
> 0001e+01 1.0230000000000000e+01 
> [ 0]32 bytes PetscPushErrorHandler() line 161 in
> /Users/barrysmith/Src/petsc/src/sys/error/err.c
> Process 17914 exited with status = 0 (0x00000000) 
>
> I see no issue. Code is, of course, built with complex.
>
> Barry
>
>
>> On Jan 6, 2021, at 11:31 AM, Roland Richter <roland.richter at ntnu.no
>> <mailto:roland.richter at ntnu.no>> wrote:
>>
>> Hei,
>>
>> I removed all dependencies on armadillo and other packages that are
>> not strictly necessary, and attached both CMakeLists.txt and the main
>> file. Even though PETSc is now the only dependency, I still see the
>> same issue. For a scaling factor of 0.1 and a matrix size of
>> [1024, 1024] it works fine; for a scaling factor of 0.01 on the same
>> matrix, the apparent scaling factor is suddenly 1e-8.
>>
>> Thank you for your help!
>>
>> Regards,
>>
>> Roland
>>
>>> On 06.01.21 at 17:36, Matthew Knepley wrote:
>>> On Wed, Jan 6, 2021 at 11:05 AM Roland Richter
>>> <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> wrote:
>>>
>>>     Hei,
>>>
>>>     I ran the program in both versions using "valgrind
>>>     --tool=memcheck --leak-check=full --show-leak-kinds=all <binary>
>>>     -malloc_debug". I got
>>>
>>>     ==3059== LEAK SUMMARY:
>>>     ==3059==    definitely lost: 12,916 bytes in 32 blocks
>>>     ==3059==    indirectly lost: 2,415 bytes in 2 blocks
>>>     ==3059==      possibly lost: 0 bytes in 0 blocks
>>>     ==3059==    still reachable: 103,511 bytes in 123 blocks
>>>     ==3059==         suppressed: 0 bytes in 0 blocks
>>>
>>>     but none of the leaks is related to the scaling-function itself.
>>>
>>>     Did I miss something here?
>>>
>>> Here is my analysis. It is certainly the case that MatScale() does
>>> not mysteriously scale by other numbers.
>>> It is used all over the place in tests, and in the code. Your test
>>> requires another package. Thus, it seems
>>> reasonable to guess that a bad interaction with that package (memory
>>> overwrite, conflicting layout or format, etc.)
>>> is responsible for the behavior you see.
>>>
>>>   Thanks,
>>>
>>>      Matt 
>>>
>>>     Thanks!
>>>
>>>     On 06.01.21 at 15:26, Matthew Knepley wrote:
>>>>     On Wed, Jan 6, 2021 at 2:41 AM Roland Richter
>>>>     <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> wrote:
>>>>
>>>>         Hei,
>>>>
>>>>         I added one additional function to the code:
>>>>
>>>>         void test_scaling_petsc_pointer(const Mat &in_mat,
>>>>                                         Mat &out_mat,
>>>>                                         const PetscScalar &scaling_factor) {
>>>>             MatCopy (in_mat, out_mat, SAME_NONZERO_PATTERN);
>>>>             PetscScalar *mat_ptr;
>>>>             MatDenseGetArray (out_mat, &mat_ptr);
>>>>             PetscInt r_0, r_1;
>>>>             MatGetLocalSize (out_mat, &r_0, &r_1);
>>>>             for (PetscInt i = 0; i < r_0 * r_1; ++i)
>>>>                 mat_ptr[i] *= scaling_factor;
>>>>             // return the array before assembly
>>>>             MatDenseRestoreArray (out_mat, &mat_ptr);
>>>>
>>>>             MatAssemblyBegin (out_mat, MAT_FINAL_ASSEMBLY);
>>>>             MatAssemblyEnd (out_mat, MAT_FINAL_ASSEMBLY);
>>>>         }
>>>>
>>>>         When replacing the test function test_scaling_petsc() with
>>>>         test_scaling_petsc_pointer(), everything works as it
>>>>         should, but I do not understand why.
>>>>
>>>>         Do you have any suggestions?
>>>>
>>>>     The easiest explanation is that you have a memory overwrite in
>>>>     the code somewhere. Barry's suggestion to use
>>>>     valgrind is good.
>>>>
>>>>        Matt 
>>>>
>>>>         Thanks!
>>>>
>>>>
>>>>         On 05.01.21 at 15:24, Roland Richter wrote:
>>>>>
>>>>>         Hei,
>>>>>
>>>>>         the code I attached to the original mail should work out
>>>>>         of the box, but requires armadillo and PETSc to
>>>>>         compile/run. Armadillo stores the data in column-major
>>>>>         order, and therefore I am transposing the matrices before
>>>>>         and after transferring using .st().
>>>>>
>>>>>         Thank you for your help!
>>>>>
>>>>>         Regards,
>>>>>
>>>>>         Roland
>>>>>
>>>>>         On 05.01.21 at 15:21, Matthew Knepley wrote:
>>>>>>         On Tue, Jan 5, 2021 at 7:57 AM Roland Richter
>>>>>>         <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>>
>>>>>>         wrote:
>>>>>>
>>>>>>             Hei,
>>>>>>
>>>>>>             I would like to scale a given matrix by a fixed
>>>>>>             scalar value, and therefore want to use MatScale().
>>>>>>             However, I observed behavior that depends on the size
>>>>>>             of the matrix, and I am currently not sure why.
>>>>>>
>>>>>>             When running the attached code, I intend to divide
>>>>>>             all elements in the
>>>>>>             matrix by a constant factor of 10. If I have three or
>>>>>>             fewer rows and
>>>>>>             1024 columns, I get the expected result. If I have
>>>>>>             four or more rows
>>>>>>             (with the same number of columns), suddenly my
>>>>>>             scaling factor seems to
>>>>>>             be 0.01 instead of 0.1 for the PETSc-matrix. The
>>>>>>             armadillo-based matrix
>>>>>>             still behaves as expected.
>>>>>>
>>>>>>
>>>>>>         1) It looks like you assume the storage in your armadillo
>>>>>>         matrix is row major. I would be surprised if this was true.
>>>>>>
>>>>>>         2) I think it is unlikely that there is a problem with
>>>>>>         MatScale, so I would guess either you have a memory overwrite
>>>>>>         or are misinterpreting your output. If you send something
>>>>>>         I can run, I will figure out which it is.
>>>>>>
>>>>>>           Thanks,
>>>>>>
>>>>>>              Matt
>>>>>>          
>>>>>>
>>>>>>             I currently do not understand that behavior, but do
>>>>>>             not see any problems
>>>>>>             with the code either. Are there any possible
>>>>>>             explanations for that behavior?
>>>>>>
>>>>>>             Thank you very much,
>>>>>>
>>>>>>             regards,
>>>>>>
>>>>>>             Roland Richter
>>>>>>
>>>>>>
>>>>>>
>>>>>>         -- 
>>>>>>         What most experimenters take for granted before they
>>>>>>         begin their experiments is infinitely more interesting
>>>>>>         than any results to which their experiments lead.
>>>>>>         -- Norbert Wiener
>>>>>>
>>>>>>         https://www.cse.buffalo.edu/~knepley/
>>>>>>         <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>>
>>>>
>>>>     -- 
>>>>     What most experimenters take for granted before they begin
>>>>     their experiments is infinitely more interesting than any
>>>>     results to which their experiments lead.
>>>>     -- Norbert Wiener
>>>>
>>>>     https://www.cse.buffalo.edu/~knepley/
>>>>     <http://www.cse.buffalo.edu/~knepley/>
>>>
>>>
>>>
>>> -- 
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>> <main.cpp><CMakeLists.txt>
>


More information about the petsc-users mailing list