[petsc-users] Why does CHOLMOD solution time differ between PETSc and MATLAB?
Barry Smith
bsmith at mcs.anl.gov
Sun Nov 23 12:12:03 CST 2014
> On Nov 23, 2014, at 11:52 AM, Victor Magri <victor.antonio.magri at gmail.com> wrote:
>
> On Fri, Nov 21, 2014 at 4:49 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> Here is my conjecture (more than a guess, but less than a full statement of fact). CHOLMOD uses a "supernodal" algorithm, which means that it works on dense blocks of the factored matrix and can take advantage of BLAS 2 and 3 operations (unlike almost everything in PETSc).
>
> I thought that PETSc would take more advantage of BLAS 2.
Generally not, because BLAS 2 is dense matrix times vector, while PETSc usually doesn't use dense matrices.
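As a rough illustration (this is not CHOLMOD's actual code; the routine name supernode_update and the variables L21, A22, m, k are made up), the heart of a supernodal Cholesky is a dense trailing-block update, which is a single BLAS 3 call, so a highly tuned BLAS such as MKL speeds it up directly:

   /* Sketch only: one supernodal trailing-block update A22 -= L21 * L21^T.
      L21 is m by k (column-major) and A22 is m by m with only its lower
      triangle referenced; the whole update is one BLAS 3 (dsyrk) call. */
   #include <cblas.h>

   static void supernode_update(int m, int k, const double *L21, double *A22)
   {
     cblas_dsyrk(CblasColMajor, CblasLower, CblasNoTrans,
                 m, k, -1.0, L21, m, 1.0, A22, m);
   }

The sparse AIJ kernels in PETSc, by contrast, work a row at a time on scattered entries and cannot be blocked this way.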
>
> 1) the MKL BLAS/LAPACK are likely much faster than the Fortran versions
>
> Your conjecture was right! I compiled PETSc to use mkl.so from my MATLAB folder and ran the tests again. The computational time was pretty much the same as MATLAB's (see attached).
Great.
Performance for the various orderings can be slightly counterintuitive if all you count is flops. Here is the info for the default ordering:
Common.fl 1.81877e+10 (flop count from most recent analysis)
Common.lnz 4.46748e+07 (fundamental nz in L)
MatSolve 1 1.0 1.3265e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 7 0 0 0 0 0
MatCholFctrSym 1 1.0 6.5606e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 32 0 0 0 0 34 0 0 0 0 0
MatCholFctrNum 1 1.0 1.1193e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 54 0 0 0 0 57 0 0 0 0 0
MatGetOrdering 1 1.0 3.8588e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
Here is the info for the nested dissection ordering:
Common.fl 1.09806e+10 (flop count from most recent analysis)
Common.lnz 3.31035e+07 (fundamental nz in L)
MatSolve 1 1.0 1.1116e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0
MatCholFctrSym 1 1.0 6.3994e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 27 0 0 0 0 29 0 0 0 0 0
MatCholFctrNum 1 1.0 8.5032e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 36 0 0 0 0 38 0 0 0 0 0
MatGetOrdering 1 1.0 5.2249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 22 0 0 0 0 23 0 0 0 0 0
Note that nested dissection requires only 1.09806e+10/1.81877e+10 = 60% of the flops, but the total time for ordering, factoring, and solving is higher since the ordering alone takes 0.522 seconds.
Note also that the solve time with nested dissection is 1.1116e-01/1.3265e-01 = 84% of the default's, so if one has many solves (but only one ordering and factorization) then nested dissection would be faster.
Did you run with the default PETSc Cholesky? I'd be interested in seeing how long that takes.
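For reference, here is a minimal sketch (error checking omitted; the function name select_cholesky is just for illustration) of making the same choices through the PETSc API instead of the command line; dropping the PCFactorSetMatSolverPackage() call gives the default PETSc Cholesky, and PCFactorSetMatOrderingType() corresponds to -pc_factor_mat_ordering_type:

   #include <petscksp.h>

   /* ksp is an already-created KSP whose operators have been set, as in ex10.c */
   void select_cholesky(KSP ksp)
   {
     PC pc;

     KSPSetType(ksp, KSPPREONLY);                        /* -ksp_type preonly */
     KSPGetPC(ksp, &pc);
     PCSetType(pc, PCCHOLESKY);                          /* -pc_type cholesky */
     /* Remove the next call to use the default PETSc Cholesky instead of CHOLMOD */
     PCFactorSetMatSolverPackage(pc, MATSOLVERCHOLMOD);  /* -pc_factor_mat_solver_package cholmod */
     PCFactorSetMatOrderingType(pc, MATORDERINGND);      /* -pc_factor_mat_ordering_type nd */
   }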
Barry
>
> 2) it is possible the MKL BLAS/LAPACK used by MATLAB use multiple threads to take advantage of multiple cores on the hardware.
>
> At least on my computer, no. I opened the system monitor to see whether MATLAB would use more than one thread, and it didn't.
>
> I think CHOLMOD, used by MATLAB backslash operator, uses AMD (approximate min. degree) ordering scheme by default
>
> Yes, I looked it up and it really does.
>
> For PETSc + MATLAB, use the option -pc_factor_mat_ordering_type qmd
>
> The computation time got bigger! I was expecting that reordering would make it smaller. Maybe it doesn't because this matrix comes from a 5-point discretization stencil... I also tried RCM and ND, but the best time came from the natural ordering.
>
> Thank you so much for helping, Barry and Abhyankar.
>
>
> On Fri, Nov 21, 2014 at 6:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > On Nov 21, 2014, at 1:46 PM, Abhyankar, Shrirang G. <abhyshr at mcs.anl.gov> wrote:
> >
> > I think this may have to do with the matrix reordering scheme. I think CHOLMOD, used by the MATLAB backslash operator, uses the AMD (approximate min. degree) ordering scheme by default. PETSc's default reordering scheme is nested dissection. For PETSc + MATLAB, use the option -pc_factor_mat_ordering_type qmd
>
> This is possible and definitely worth trying with different orderings, but I checked the nonzeros in the matrix and the L factor for both approaches and they are very similar.
>
> PETSc call:
> Common.fl 1.81877e+10 (flop count from most recent analysis)
> Common.lnz 4.46748e+07 (fundamental nz in L)
>
> MatLab call:
> flop count: 1.8188e+10
> nnz(L): 4.4675e+07
>
> Unless I am misunderstanding something, both factors likely look very similar.
>
> Barry
>
>
> >
> > Shri
> >
> > On Nov 21, 2014, at 11:23 AM, "Victor Magri" <victor.antonio.magri at gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I'm solving a linear system which comes from the discretization of an elliptic equation. The mesh has 1 million control volumes and there are two more constraints for the system.
> >>
> >> First, I solved it in MATLAB using the backslash command, which in turn calls CHOLMOD since the matrix is SPD. Using the tic and toc functions, I could see that it takes about 2s for the system to be solved (see attached for more details).
> >>
> >> I wrote both matrix and rhs in PETSc binary format and solved the linear system using the example problem defined in ksp/examples/tutorials/ex10.c. The command line options were:
> >>
> >> mpiexec -n 1 ./ex10 -f A1mi.dat -rhs rhs1mi.dat -log_summary -ksp_monitor -ksp_view -ksp_type preonly -pc_type cholesky -pc_factor_mat_solver_package cholmod
> >>
> >> The configuration parameters for CHOLMOD in PETSc and MATLAB seem the same, so I was expecting a similar computation time, but it took about 13s to solve in PETSc (see attached for more details).
> >>
> >> Why does the CHOLMOD solution time differ between them?
> >>
> >> If someone wants to try it for themselves, here are the binaries for the matrix and the rhs, respectively:
> >> https://www.dropbox.com/s/q9s6mlrmv1qdxon/A1mi.dat?dl=0
> >> https://www.dropbox.com/s/2zadcbjg5d9bycy/rhs1mi.dat?dl=0
> >>
> >>
> >>
> >> <PetscOutput.dat>
> >> <MatlabOutput.dat>
>
>
> <rcm.dat><qmd.dat><nd.dat><natural.dat>