[petsc-users] Rather different matrix product results on multiple processes
Zhang, Hong
hzhang at mcs.anl.gov
Mon Apr 19 11:34:31 CDT 2021
Peder,
I tested your code on a Linux machine and got:
$ ./acorr_mwe
Data matrix norm: 5.0538e+01
Autocorrelation matrix norm: 1.0473e+03

$ mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via allgatherv   (the default)
Data matrix norm: 5.0538e+01
Autocorrelation matrix norm: 1.0363e+03

$ mpiexec -n 20 ./acorr_mwe
Data matrix norm: 5.0538e+01
Autocorrelation matrix norm: 1.0897e+03

$ mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
Data matrix norm: 5.0538e+01
Autocorrelation matrix norm: 1.0363e+03
I used the petsc 'main' branch (same behavior as the latest release). You can remove the MatAssemblyBegin/End calls after MatMatTransposeMult(); the product matrix is returned fully assembled:
ierr = MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &corr_mat); CHKERRQ(ierr);
//ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);  /* not needed */
//ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);    /* not needed */
The communication pattern of the parallel implementation leads to a different order of floating-point operations, and thus to a slightly different matrix norm for R. Floating-point addition is not associative, so reductions performed in a different order generally round differently.
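As a standalone illustration (plain C, nothing PETSc-specific; a sketch, not taken from your MWE), summing the same three numbers in two different orders:

#include <stdio.h>

int main(void)
{
    double a = 1.0e16, b = -1.0e16, c = 1.0;
    double left  = (a + b) + c;  /* 0 + 1 -> 1 */
    double right = a + (b + c);  /* 1e16 + (-1e16) -> 0; the 1 is lost to rounding in (b + c) */
    printf("(a + b) + c = %g\n", left);   /* prints 1 */
    printf("a + (b + c) = %g\n", right);  /* prints 0 */
    return 0;
}

A parallel reduction effectively changes the parenthesization across processes, so small differences of this kind accumulate in every entry of R.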
Hong
________________________________
From: petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of Peder Jørgensgaard Olesen via petsc-users <petsc-users at mcs.anl.gov>
Sent: Monday, April 19, 2021 7:57 AM
To: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: [petsc-users] Rather different matrix product results on multiple processes
Hello,
When computing a matrix product of the type R = D*D^T using MatMatTransposeMult(), I find that I get rather different results depending on the number of processes. In one example, using a data set that is small compared to the full application, I get Frobenius norms |R| = 1.047e3 on a single process, 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on two nodes.
I have ascertained that the single process result is indeed the correct one (i.e., eigenvectors of R form a proper basis for the columns of D), so naturally I'd love to be able to reproduce this result across different parallel setups. How might I achieve this?
I'm attaching MWE code and the data set used for the example.
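For reference, the core of the MWE looks roughly like this (a sketch only; the binary file name "data_mat.dat" and the viewer setup are placeholders, the actual code and data are in the attachment):

#include <petscmat.h>

int main(int argc, char **argv)
{
    Mat            D, R;
    PetscViewer    viewer;
    PetscReal      norm_D, norm_R;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

    /* Load the dense data matrix D from a PETSc binary file (placeholder name) */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "data_mat.dat", FILE_MODE_READ, &viewer); CHKERRQ(ierr);
    ierr = MatCreate(PETSC_COMM_WORLD, &D); CHKERRQ(ierr);
    ierr = MatSetType(D, MATDENSE); CHKERRQ(ierr);
    ierr = MatLoad(D, viewer); CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);

    /* R = D * D^T */
    ierr = MatMatTransposeMult(D, D, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &R); CHKERRQ(ierr);

    /* Frobenius norms of D and R, as printed in the runs above */
    ierr = MatNorm(D, NORM_FROBENIUS, &norm_D); CHKERRQ(ierr);
    ierr = MatNorm(R, NORM_FROBENIUS, &norm_R); CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Data matrix norm: %.4e\n", (double)norm_D); CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Autocorrelation matrix norm: %.4e\n", (double)norm_R); CHKERRQ(ierr);

    ierr = MatDestroy(&R); CHKERRQ(ierr);
    ierr = MatDestroy(&D); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
}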
Thanks in advance!
Best Regards
Peder Jørgensgaard Olesen
PhD Student, Turbulence Research Lab
Dept. of Mechanical Engineering
Technical University of Denmark
Niels Koppels Allé
Bygning 403, Rum 105
DK-2800 Kgs. Lyngby