Dear Peder,

<div class="">The problem with HDF5 MATDENSE loader should be fixed now in the main branch.</div>
<div class=""><br class="">
</div>
<div class="">Your datafile is now stored at <a href="https://gitlab.com/petsc/datafiles/-/blob/master/matrices/hdf5/sample_data.h5" class="">https://gitlab.com/petsc/datafiles/-/blob/master/matrices/hdf5/sample_data.h5</a> if you are ok with that, and is used
in the src/mat/tests/ex84.c test.</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">Let me note that</div>
<div class="">1) PETSc uses "Fortran storage convention" (= column-major) for dense matrices. However, HDF5 uses "C storage convention" (= row-major), assuming that the last listed dimension is the fastest-changing dimension and the first-listed dimension is
the slowest changing.</div>
<div class="">[<a href="https://support.hdfgroup.org/HDF5/doc/UG/HDF5_Users_Guide-Responsive%20HTML5/index.html#t=HDF5_Users_Guide%2FDataspaces%2FHDF5_Dataspaces_and_Partial_I_O.htm%3Frhhlterm%3D%2522last%2520listed%2520dimension%2522%26rhsyns%3D%2520" class="">https://support.hdfgroup.org/HDF5/doc/UG/HDF5_Users_Guide-Responsive%20HTML5/index.html#t=HDF5_Users_Guide%2FDataspaces%2FHDF5_Dataspaces_and_Partial_I_O.htm%3Frhhlterm%3D%2522last%2520listed%2520dimension%2522%26rhsyns%3D%2520</a>]</div>
<div class=""><br class="">
</div>
<div class="">Hence, we decide to store dense matrices "transposed", i.e. dimension 0 is columns and dimension 1 is rows. So a matrix whose h5dump is</div>
<div class="">
<div class=""> DATASET "B" {</div>
<div class=""> DATATYPE H5T_IEEE_F64LE</div>
<div class=""> DATASPACE SIMPLE { ( 3, 4 ) / ( 3, 4 ) }</div>
<div class=""> DATA {</div>
<div class=""> (0,0): 1, 4, 7, 10,</div>
<div class=""> (1,0): 2, 5, 8, 11,</div>
<div class=""> (2,0): 3, 6, 9, 12</div>
<div class=""> }</div>
</div>
<div class=""><br class="">
</div>
<div class="">will be loaded as</div>
<div class="">
<div class=""> [ 1 2 3</div>
<div class=""> 4 5 6</div>
<div class=""> 7 8 9</div>
<div class=""> 10 11 12]</div>
</div>
<div class=""><br class="">
</div>
<div class="">Another reason for this is to simplify compatibility with MATLAB. Real/imaginary part of complex numbers is the last dimension in any case.</div>
<div class=""><br class="">
</div>
<div class="">Please check whether your dataset should be transposed.</div>
<div class=""><br class="">
</div>
<div class="">2) There was a bug - wrong interpretation of dimensions if "MATLAB_class" attribute was not present - <span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">resolved in the </span><a href="https://gitlab.com/petsc/petsc/-/merge_requests/4044" class="">merge
request 4044</a><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">.</span></div>
<div class=""><br class="">
</div>
<div class="">3) Complex numbers were not really supported which is now fixed in the same MR.</div>
<div class=""><br class="">
</div>
<div class="">4) An unfortunate thing is there is currently no MatView() implementation for MATDENSE and HDF5 which would easily show how the datafile should look like. I hope to fix this soon as well.</div>
<div class=""><br class="">
</div>
<div class="">Sorry for the delay and thanks for reporting.</div>
<div class=""><br class="">
</div>
<div class="">Vaclav Hapla</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 22 Apr 2021, at 11:05, Peder Jørgensgaard Olesen via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" class="">petsc-users@mcs.anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" style="caret-color: rgb(0, 0, 0); font-family: Menlo-Regular; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size: 12pt; font-family: Calibri, Helvetica, sans-serif;" class="">
<div style="margin-top: 0px; margin-bottom: 0px;" class="">Dear Stefano and Jose</div>
<div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class="">
</div>
<div style="margin-top: 0px; margin-bottom: 0px;" class="">Thank you for your replies. Using SVD works like a charm. I'll try to do some trickery to work around the HDF5 reader bug.</div>
<div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class="">
</div>
<div style="margin-top: 0px; margin-bottom: 0px;" class="">Best regards</div>
<div style="margin-top: 0px; margin-bottom: 0px;" class="">Peder</div>
</div>

________________________________
From: Jose E. Roman <jroman@dsic.upv.es>
Sent: 21 April 2021 14:24:38
To: Peder Jørgensgaard Olesen
Cc: petsc-users@mcs.anl.gov; Stefano Zampini
Subject: Re: [petsc-users] Rather different matrix product results on multiple processes
<font size="2" style="caret-color: rgb(0, 0, 0); font-family: Menlo-Regular; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="font-size: 10pt;" class="">
<div class="PlainText">Independently of the bug mentioned by Stefano, you may want to consider using SLEPc's SVD instead of EPS. Left singular vectors of D are equal to eigenvectors of D*D', see chapter 4 of SLEPc's users manual. The default solver 'cross'
gives you flexibility to compute the product D*D' explicitly or not, and build the transpose explicitly or not.<br class="">
<br class="">
Jose<br class="">
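
For reference, a minimal sketch of this approach, assuming a dense Mat D has already been loaded (as in the MWE) and using the usual ierr/CHKERRQ error-checking pattern; the variable names are illustrative, and -svd_cross_explicitmatrix controls whether the product is formed explicitly:

#include <slepcsvd.h>

SVD            svd;
PetscReal      sigma;
Vec            u, v;
PetscErrorCode ierr;

ierr = MatCreateVecs(D, &v, &u); CHKERRQ(ierr);   /* v: right, u: left singular vector */
ierr = SVDCreate(PETSC_COMM_WORLD, &svd); CHKERRQ(ierr);
ierr = SVDSetOperator(svd, D); CHKERRQ(ierr);
ierr = SVDSetType(svd, SVDCROSS); CHKERRQ(ierr);  /* the default 'cross' solver */
ierr = SVDSetFromOptions(svd); CHKERRQ(ierr);     /* e.g. -svd_cross_explicitmatrix */
ierr = SVDSolve(svd); CHKERRQ(ierr);
/* the leading left singular vector of D is an eigenvector of D*D' */
ierr = SVDGetSingularTriplet(svd, 0, &sigma, u, v); CHKERRQ(ierr);
ierr = SVDDestroy(&svd); CHKERRQ(ierr);
ierr = VecDestroy(&u); CHKERRQ(ierr);
ierr = VecDestroy(&v); CHKERRQ(ierr);
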
<br class="">
<br class="">
> On 21 Apr 2021, at 12:54, Stefano Zampini <stefano.zampini@gmail.com> wrote:
>
> Here you have, https://gitlab.com/petsc/petsc/-/merge_requests/3903. We can discuss the issue on GitLab.
>
> Thanks
> Stefano
>
> On Wed, 21 Apr 2021 at 13:39, Stefano Zampini <stefano.zampini@gmail.com> wrote:
> Peder,
>
> I have slightly modified your code and I confirm the bug.
> The bug is not in the MatMatTransposeMult() operation; it is in the HDF5 reader. I will soon open an MR with the code and a discussion of the issues.
>
> Thanks for reporting the issue.
> Stefano
>
> On Wed, 21 Apr 2021 at 12:22, Peder Jørgensgaard Olesen via petsc-users <petsc-users@mcs.anl.gov> wrote:
> Dear Hong,
>
> Thank you for your reply.
>
> I have a hunch that the issue goes beyond the minor differences that might arise from the order of floating-point computations, however.
>
> Writing the product matrix to a binary file using MatView() and inspecting the output shows very different entries depending on the number of processes. Here are the first three rows and columns of the product matrix obtained in a sequential run:
>
>    2.58348  1.68202  1.66302
>    1.68202  4.27506  1.91897
>    1.66302  1.91897  2.70028
>
> and the corresponding part of the product matrix obtained on one node (40 processes):
>
>    4.43536  2.17261  0.16430
>    2.17261  4.53224  2.53210
>    0.16430  2.53210  4.73234
>
> The parallel result is not even close to the sequential one. Trying different numbers of processes produces yet different results.
>
> Also, the eigenvectors that I subsequently determine using a SLEPc solver do not form a proper basis for the column space of the data matrix, as they must; this is hardly a surprise given the variability indicated above. The exception is when the code is run on just a single process. Forming such a basis is central to the intended application, and given that it would need to work on rather large data sets, running on a single process is hardly a viable solution.
>
> Best regards,
> Peder
>
> From: Zhang, Hong <hzhang@mcs.anl.gov>
> Sent: 19 April 2021 18:34:31
> To: petsc-users@mcs.anl.gov; Peder Jørgensgaard Olesen
> Subject: Re: Rather different matrix product results on multiple processes
>
> Peder,
> I tested your code on a Linux machine. I got:
>
> $ ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0473e+03
>
> $ mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv   (the default)
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> $ mpiexec -n 20 ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0897e+03
>
> $ mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> I use the petsc 'main' branch (same as the latest release). You can remove the MatAssemblyBegin/End calls after MatMatTransposeMult():
>
> ierr = MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &corr_mat); CHKERRQ(ierr);
> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>
> The communication patterns of the parallel implementation lead to a different order of floating-point computations, and thus to a slightly different matrix norm of R.
> Hong
>
> From: petsc-users <petsc-users-bounces@mcs.anl.gov> on behalf of Peder Jørgensgaard Olesen via petsc-users <petsc-users@mcs.anl.gov>
> Sent: Monday, April 19, 2021 7:57 AM
> To: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
> Subject: [petsc-users] Rather different matrix product results on multiple processes
>
> Hello,
>
> When computing a matrix product of the type R = D*D^T using MatMatTransposeMult(), I find that I get rather different results depending on the number of processes. In one example, using a data set that is small compared to the intended application, I get Frobenius norms |R| = 1.047e3 on a single process, 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on two nodes.
>
> I have ascertained that the single-process result is indeed the correct one (i.e., the eigenvectors of R form a proper basis for the columns of D), so naturally I'd love to be able to reproduce this result across different parallel setups. How might I achieve this?
>
> I'm attaching the MWE code and the data set used for the example.
>
> Thanks in advance!
>
> Best regards,
> Peder Jørgensgaard Olesen
> PhD Student, Turbulence Research Lab
> Dept. of Mechanical Engineering
> Technical University of Denmark
> Niels Koppels Allé
> Bygning 403, Rum 105
> DK-2800 Kgs. Lyngby
>
> --
> Stefano
>
> --
> Stefano