[petsc-users] Rather different matrix product results on multiple processes

Stefano Zampini stefano.zampini at gmail.com
Wed Apr 21 05:54:34 CDT 2021


Here you go: https://gitlab.com/petsc/petsc/-/merge_requests/3903. We can
discuss the issue on GitLab.

Thanks
Stefano

On Wed, 21 Apr 2021 at 13:39, Stefano Zampini <
stefano.zampini at gmail.com> wrote:

> Peder
>
> I have slightly modified your code, and I confirm the bug.
> The bug is not in the MatMatTransposeMult operation; it is in the HDF5
> reader. I will soon open an MR with the code and a discussion of the issues.
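>
> For reference, a minimal sketch of the HDF5 loading path involved (the
> file and dataset names here are placeholders, not necessarily those in
> your code):
>
>   #include <petscviewerhdf5.h>
>
>   PetscErrorCode ierr;
>   PetscViewer    viewer;
>   Mat            data_mat;
>   ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "data.h5", FILE_MODE_READ, &viewer); CHKERRQ(ierr);
>   ierr = MatCreate(PETSC_COMM_WORLD, &data_mat); CHKERRQ(ierr);
>   ierr = MatSetType(data_mat, MATDENSE); CHKERRQ(ierr);
>   /* MatLoad looks up the HDF5 dataset by the object's name */
>   ierr = PetscObjectSetName((PetscObject)data_mat, "data"); CHKERRQ(ierr);
>   ierr = MatLoad(data_mat, viewer); CHKERRQ(ierr);
>   ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);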
>
> Thanks for reporting the issue
> Stefano
>
> On Wed, 21 Apr 2021 at 12:22, Peder Jørgensgaard Olesen via
> petsc-users <petsc-users at mcs.anl.gov> wrote:
>
>> Dear Hong
>>
>>
>> Thank you for your reply.
>>
>>
>> However, I have a hunch that the issue goes beyond the minor differences
>> that might arise from floating-point computation order.
>>
>>
>> Writing the product matrix to a binary file using MatView() and
>> inspecting the output shows very different entries depending on the number
>> of processes. Here are the first three rows and columns of the product
>> matrix obtained in a sequential run:
>>
>> 2.58348   1.68202   1.66302
>> 1.68202   4.27506   1.91897
>> 1.66302   1.91897   2.70028
>>
>>
>> - and the corresponding part of the product matrix obtained on one node
>> (40 processes):
>>
>> 4.43536   2.17261   0.16430
>> 2.17261   4.53224   2.53210
>> 0.16430   2.53210   4.73234
>>
>>
>> The parallel result is not even close to the sequential one. Trying
>> different numbers of processes produces yet different results.
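>>
>> For completeness, a minimal sketch of how such a binary dump can be
>> written (file and variable names here are illustrative):
>>
>>   PetscErrorCode ierr;
>>   PetscViewer    bv;
>>   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "corr.bin", FILE_MODE_WRITE, &bv); CHKERRQ(ierr);
>>   ierr = MatView(corr_mat, bv); CHKERRQ(ierr);
>>   ierr = PetscViewerDestroy(&bv); CHKERRQ(ierr);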
>>
>>
>> Also, the eigenvectors that I subsequently determine using a SLEPc
>> solver do not form a proper basis for the column space of the data
>> matrix, as they must - which is hardly a surprise given the variability of
>> results indicated above - except when the code is run on just a single
>> process. Forming such a basis is central to the intended application, and
>> given that it would need to work on rather large data sets, running on a
>> single process is hardly a viable solution.
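>>
>> For context, the eigensolve is along these lines (a sketch with
>> illustrative settings, not my exact solver configuration):
>>
>>   PetscErrorCode ierr;
>>   EPS            eps;
>>   ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
>>   ierr = EPSSetOperators(eps, corr_mat, NULL); CHKERRQ(ierr);
>>   /* R = D*D^T is symmetric, so a Hermitian eigenproblem */
>>   ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
>>   ierr = EPSSetFromOptions(eps); CHKERRQ(ierr);
>>   ierr = EPSSolve(eps); CHKERRQ(ierr);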
>>
>>
>> Best regards
>>
>> Peder
>> ------------------------------
>> *From:* Zhang, Hong <hzhang at mcs.anl.gov>
>> *Sent:* 19 April 2021 18:34:31
>> *To:* petsc-users at mcs.anl.gov; Peder Jørgensgaard Olesen
>> *Subject:* Re: Rather different matrix product results on multiple processes
>>
>> Peder,
>> I tested your code on a linux machine. I got
>> $ ./acorr_mwe
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0473e+03
>>
>> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv
>> (this is the default algorithm)
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0363e+03
>>
>> mpiexec -n 20 ./acorr_mwe
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0897e+03
>>
>> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0363e+03
>>
>> I use petsc's 'main' branch (same as the latest release). You can remove
>> the MatAssemblyBegin/End calls after MatMatTransposeMult():
>> ierr = MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX,
>>                            PETSC_DEFAULT, &corr_mat); CHKERRQ(ierr);
>> /* unnecessary: the product matrix is returned already assembled */
>> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>>
>> The communication patterns of the parallel implementation lead to a
>> different order of floating-point operations, and thus to a slightly
>> different matrix norm of R.
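>>
>> As a tiny standalone illustration (plain C, not from your code) of how
>> evaluation order changes a double-precision sum:
>>
>> #include <stdio.h>
>> int main(void)
>> {
>>   double a = 1.0e16, b = -1.0e16, c = 1.0;
>>   /* (a+b)+c gives 1, but a+(b+c) gives 0: b+c rounds back to -1.0e16 */
>>   printf("%g %g\n", (a + b) + c, a + (b + c));
>>   return 0;
>> }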
>> Hong
>>
>> ------------------------------
>> *From:* petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of Peder
>> Jørgensgaard Olesen via petsc-users <petsc-users at mcs.anl.gov>
>> *Sent:* Monday, April 19, 2021 7:57 AM
>> *To:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> *Subject:* [petsc-users] Rather different matrix product results on
>> multiple processes
>>
>>
>> Hello,
>>
>>
>> When computing a matrix product of the type R = D*D^T using
>> MatMatTransposeMult(), I get rather different results depending on
>> the number of processes. In one example, using a data set that is
>> small compared to the intended application, I get Frobenius norms
>> |R| = 1.047e3 on a single process, 1.0363e3 on a single HPC node
>> (40 cores), and 9.7307e2 on two nodes.
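>>
>> The call in question looks essentially like this (a sketch; the
>> variable names are illustrative rather than copied from the attached MWE):
>>
>>   PetscErrorCode ierr;
>>   Mat            R;
>>   PetscReal      nrm;
>>   /* R = D*D^T; PETSC_DEFAULT leaves the fill estimate to PETSc */
>>   ierr = MatMatTransposeMult(D, D, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &R); CHKERRQ(ierr);
>>   ierr = MatNorm(R, NORM_FROBENIUS, &nrm); CHKERRQ(ierr);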
>>
>>
>> I have ascertained that the single-process result is indeed the correct
>> one (i.e., the eigenvectors of R form a proper basis for the columns of D),
>> so naturally I'd love to be able to reproduce this result across different
>> parallel setups. How might I achieve this?
>>
>>
>> I'm attaching MWE code and the data set used for the example.
>>
>>
>> Thanks in advance!
>>
>>
>> Best Regards
>>
>>
>> Peder Jørgensgaard Olesen
>>
>> PhD Student, Turbulence Research Lab
>>
>> Dept. of Mechanical Engineering
>>
>> Technical University of Denmark
>>
>> Niels Koppels Allé
>>
>> Bygning 403, Rum 105
>>
>> DK-2800 Kgs. Lyngby
>>
>
>
> --
> Stefano
>


-- 
Stefano