[petsc-users] Rather different matrix product results on multiple processes

Wed Apr 21 05:39:01 CDT 2021

Peder

I have slightly modified your code and can confirm the bug.
The bug is not in the MatMatTransposeMult operation; it is in the HDF5
reader. I will soon open an MR with the code and a discussion of the issues.

Thanks for reporting the issue
Stefano

On Wed, Apr 21, 2021 at 12:22 Peder Jørgensgaard Olesen via
petsc-users <petsc-users at mcs.anl.gov> wrote:

> Dear Hong
>
>
> Thank you for your reply.
>
>
> I suspect, however, that the issue goes beyond the minor differences that
> might arise from the order of floating-point operations.
>
>
> Writing the product matrix to a binary file with MatView() and inspecting
> the output reveals very different entries depending on the number of
> processes. Here are the first three rows and columns of the product matrix
> from a sequential run:
>
> 2.58348   1.68202   1.66302
> 1.68202   4.27506   1.91897
> 1.66302   1.91897   2.70028
>
>
> and the corresponding block of the product matrix obtained on one node
> (40 processes):
>
> 4.43536   2.17261   0.16430
> 2.17261   4.53224   2.53210
> 0.16430   2.53210   4.73234
>
>
> The parallel result is not even close to the sequential one, and other
> numbers of processes produce yet other results.
>
>
> Also, the eigenvectors that I subsequently compute with a SLEPc solver
> do not form a proper basis for the column space of the data matrix, as
> they must, except when the code is run on a single process. Given the
> variability of the results above, this is hardly a surprise. Forming such
> a basis is central to the intended application, and since it must work on
> rather large data sets, running on a single process is not a viable
> solution.
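The requirement that the eigenvectors of R span the column space of D follows directly from R = D D^T. The short argument below assumes real-valued data. Any eigenvector of R with a nonzero eigenvalue is a combination of the columns of D. Since rank(D D^T) = rank(D), and R is symmetric (so its eigenvectors can be chosen orthonormal), those eigenvectors form an orthonormal basis of col(D):

```latex
% R = D D^T is symmetric positive semidefinite.
R v = \lambda v,\quad \lambda \neq 0
\;\Longrightarrow\;
v = \frac{1}{\lambda} R v
  = D \left( \frac{1}{\lambda} D^{\mathsf T} v \right) \in \operatorname{col}(D),
\qquad
\operatorname{rank}(D D^{\mathsf T}) = \operatorname{rank}(D).
```

A corrupted R, for example one built from a misread D, generally breaks this property. That is consistent with the failure Peder describes.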
>
>
> Best regards
>
> Peder
> ------------------------------
> *From:* Zhang, Hong <hzhang at mcs.anl.gov>
> *Sent:* 19 April 2021 18:34:31
> *To:* petsc-users at mcs.anl.gov; Peder Jørgensgaard Olesen
> *Subject:* Re: Rather different matrix product results on multiple processes
>
> Peder,
> I tested your code on a Linux machine and got:
> $ ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0473e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv (default)
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> mpiexec -n 20 ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0897e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> I used the PETSc 'main' branch (same as the latest release). You can remove
> the MatAssemblyBegin/End calls after MatMatTransposeMult():
> ierr = MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX,
>                            PETSC_DEFAULT, &corr_mat); CHKERRQ(ierr);
> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>
> The communication patterns of the parallel implementation lead to a
> different order of floating-point operations, and thus to a slightly
> different norm of R.
> Hong
>
> ------------------------------
> *From:* petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of Peder
> Jørgensgaard Olesen via petsc-users <petsc-users at mcs.anl.gov>
> *Sent:* Monday, April 19, 2021 7:57 AM
> *To:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* [petsc-users] Rather different matrix product results on multiple processes
>
>
> Hello,
>
>
> When computing a matrix product of the form R = D D^T using
> MatMatTransposeMult(), I get rather different results depending on
> the number of processes. In one example, using a data set that is small
> compared to those of the intended application, I get Frobenius norms
> |R| = 1.047e3 on a single process, 1.0363e3 on a single HPC node (40 cores),
> and 9.7307e2 on two nodes.
>
>
> I have ascertained that the single-process result is indeed the correct
> one (i.e., the eigenvectors of R form a proper basis for the columns of D),
> so naturally I'd love to be able to reproduce this result across different
> parallel setups. How might I achieve this?
>
>
> I'm attaching a minimal working example (MWE) and the data set used for the example.
>
>
> Thanks in advance!
>
>
> Best Regards
>
>
> Peder Jørgensgaard Olesen
> PhD Student, Turbulence Research Lab
> Dept. of Mechanical Engineering
> Technical University of Denmark
> Niels Koppels Allé
> Building 403, Room 105
> DK-2800 Kgs. Lyngby
>

-- 
Stefano