[petsc-users] Rather different matrix product results on multiple processes
Hapla Vaclav
vaclav.hapla at erdw.ethz.ch
Thu Jun 3 05:04:23 CDT 2021
Dear Peder
The problem with the HDF5 MATDENSE loader should now be fixed in the main branch.
Your datafile is now stored at https://gitlab.com/petsc/datafiles/-/blob/master/matrices/hdf5/sample_data.h5 if you are ok with that, and is used in the src/mat/tests/ex84.c test.
Let me note that
1) PETSc uses "Fortran storage convention" (= column-major) for dense matrices. However, HDF5 uses "C storage convention" (= row-major), assuming that the last listed dimension is the fastest-changing dimension and the first-listed dimension is the slowest changing.
[https://support.hdfgroup.org/HDF5/doc/UG/HDF5_Users_Guide-Responsive%20HTML5/index.html#t=HDF5_Users_Guide%2FDataspaces%2FHDF5_Dataspaces_and_Partial_I_O.htm%3Frhhlterm%3D%2522last%2520listed%2520dimension%2522%26rhsyns%3D%2520]
Hence, we decided to store dense matrices "transposed", i.e. dimension 0 is columns and dimension 1 is rows. So a matrix whose h5dump is
DATASET "B" {
   DATATYPE  H5T_IEEE_F64LE
   DATASPACE  SIMPLE { ( 3, 4 ) / ( 3, 4 ) }
   DATA {
   (0,0): 1, 4, 7, 10,
   (1,0): 2, 5, 8, 11,
   (2,0): 3, 6, 9, 12
   }
}
will be loaded as
[  1  2  3
   4  5  6
   7  8  9
  10 11 12 ]
Another reason for this is to simplify compatibility with MATLAB. The real/imaginary part of complex numbers is the last dimension in any case.
Please check whether your dataset should be transposed.
2) There was a bug (wrong interpretation of dimensions if the "MATLAB_class" attribute was not present), resolved in merge request 4044 <https://gitlab.com/petsc/petsc/-/merge_requests/4044>.
3) Complex numbers were not really supported, which is now fixed in the same MR.
4) Unfortunately, there is currently no MatView() implementation for MATDENSE and HDF5, which would easily show what the datafile should look like. I hope to fix this soon as well.
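
To make the storage convention in item 1 concrete, here is a minimal sketch in plain C (no PETSc or HDF5 calls). The helper name dense_entry and its parameters are hypothetical, introduced only for illustration. It assumes the dataset has been read into a flat buffer in HDF5's row-major order, with dimension 0 holding the columns and dimension 1 holding the rows, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PETSc or HDF5.
 *
 * buf holds an HDF5 dataset of dims (ncols, nrows) in row-major order, so
 * the last dimension (the matrix row index) changes fastest. That is the
 * same memory layout as a column-major nrows x ncols matrix, so entry
 * (row, col) sits at offset col * nrows + row. */
double dense_entry(const double *buf, size_t nrows, size_t ncols,
                   size_t row, size_t col)
{
    assert(row < nrows && col < ncols);
    return buf[col * nrows + row];
}
```

For dataset "B" above, the flat buffer is 1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12 with nrows = 4 and ncols = 3, and dense_entry(buf, 4, 3, 1, 2) picks out the 6 in the second row, third column of the loaded matrix. No data is moved; only the interpretation of the two dimensions changes.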
Sorry for the delay and thanks for reporting.
Vaclav Hapla
On 22 Apr 2021, at 11:05, Peder Jørgensgaard Olesen via petsc-users <petsc-users at mcs.anl.gov> wrote:
Dear Stefano and Jose
Thank you for your replies. Using SVD works like a charm. I'll try to do some trickery to work around the HDF5 reader bug.
Best regards
Peder
________________________________
From: Jose E. Roman <jroman at dsic.upv.es>
Sent: 21 April 2021 14:24:38
To: Peder Jørgensgaard Olesen
Cc: petsc-users at mcs.anl.gov; Stefano Zampini
Subject: Re: [petsc-users] Rather different matrix product results on multiple processes
Independently of the bug mentioned by Stefano, you may want to consider using SLEPc's SVD instead of EPS. The left singular vectors of D are the eigenvectors of D*D'; see chapter 4 of the SLEPc users manual. The default solver 'cross' gives you the flexibility to compute the product D*D' explicitly or not, and to build the transpose explicitly or not.
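
The relation between the left singular vectors of D and the eigenvectors of D*D' can be checked in a few lines. The notation here is an assumption for illustration and does not come from the thread: D = U Σ V^T is the thin SVD with singular values σ_i, and u_i is the i-th column of U.

```latex
\begin{aligned}
D &= U \Sigma V^{T}, \qquad U^{T} U = I, \quad V^{T} V = I, \\
D D^{T} &= U \Sigma V^{T} V \Sigma^{T} U^{T} = U \Sigma^{2} U^{T}, \\
(D D^{T})\, u_i &= \sigma_i^{2}\, u_i .
\end{aligned}
```

So each left singular vector u_i is an eigenvector of D*D' with eigenvalue σ_i^2, which is why the SVD can give the same basis without requiring the product to be formed explicitly.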
Jose
> On 21 Apr 2021, at 12:54, Stefano Zampini <stefano.zampini at gmail.com> wrote:
>
> Here it is: https://gitlab.com/petsc/petsc/-/merge_requests/3903. We can discuss the issue on GitLab.
>
> Thanks
> Stefano
>
> On Wed, 21 Apr 2021 at 13:39, Stefano Zampini <stefano.zampini at gmail.com> wrote:
> Peder
>
> I have slightly modified your code and I confirm the bug.
> The bug is not in the MatMatTranspose operation; it is in the HDF5 reader. I will soon open an MR with the code and a discussion of the issues.
>
> Thanks for reporting the issue
> Stefano
>
> On Wed, 21 Apr 2021 at 12:22, Peder Jørgensgaard Olesen via petsc-users <petsc-users at mcs.anl.gov> wrote:
> Dear Hong
>
>
>
> Thank you for your reply.
>
>
>
> I have a hunch that the issue goes beyond the minor differences that might arise from floating-point computation order, however.
>
>
>
> Writing the product matrix to a binary file using MatView() and inspecting the output shows very different entries depending on the number of processes. Here are the first three rows and columns of the product matrix obtained in a sequential run:
>
> 2.58348 1.68202 1.66302
> 1.68202 4.27506 1.91897
> 1.66302 1.91897 2.70028
>
>
>
> - and the corresponding part of the product matrix obtained on one node (40 processes):
>
> 4.43536 2.17261 0.16430
> 2.17261 4.53224 2.53210
> 0.16430 2.53210 4.73234
>
>
>
> The parallel result is not even close to the sequential one. Trying different numbers of processes produces yet different results.
>
>
>
> Also, the eigenvectors that I subsequently determine using a SLEPc solver do not form a proper basis for the column space of the data matrix, as they must, except when the code is run on just a single process. That is hardly a surprise given the variability of results indicated above. Forming such a basis is central to the intended application, and given that it would need to work on rather large data sets, running on a single process is hardly a viable solution.
>
>
>
> Best regards
>
> Peder
>
> From: Zhang, Hong <hzhang at mcs.anl.gov>
> Sent: 19 April 2021 18:34:31
> To: petsc-users at mcs.anl.gov; Peder Jørgensgaard Olesen
> Subject: Re: Rather different matrix product results on multiple processes
>
> Peder,
> I tested your code on a linux machine. I got
> $ ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0473e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv (default)
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> mpiexec -n 20 ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0897e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> I used the PETSc 'main' branch (same as the latest release). You can remove the MatAssemblyBegin/End calls after MatMatTransposeMult():
> MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &corr_mat);
> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>
> The communication patterns of the parallel implementation lead to a different order of floating-point operations, and thus to a slightly different matrix norm of R.
> Hong
>
> From: petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of Peder Jørgensgaard Olesen via petsc-users <petsc-users at mcs.anl.gov>
> Sent: Monday, April 19, 2021 7:57 AM
> To: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> Subject: [petsc-users] Rather different matrix product results on multiple processes
>
> Hello,
>
> When computing a matrix product of the type R = D*D^T using MatMatTransposeMult(), I find I get rather different results depending on the number of processes. In one example, using a data set that is small compared to the application, I get Frobenius norms |R| = 1.047e3 on a single process, 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on two nodes.
>
> I have ascertained that the single process result is indeed the correct one (i.e., eigenvectors of R form a proper basis for the columns of D), so naturally I'd love to be able to reproduce this result across different parallel setups. How might I achieve this?
>
> I'm attaching MWE code and the data set used for the example.
>
> Thanks in advance!
>
> Best Regards
>
> Peder Jørgensgaard Olesen
> PhD Student, Turbulence Research Lab
> Dept. of Mechanical Engineering
> Technical University of Denmark
> Niels Koppels Allé
> Bygning 403, Rum 105
> DK-2800 Kgs. Lyngby
>
>
> --
> Stefano
>
>
> --
> Stefano