[petsc-users] Performance of mumps vs. Intel Pardiso

Hong hzhang at mcs.anl.gov
Mon Jun 27 20:40:57 CDT 2016


Faraz:
Direct sparse solvers are generally not scalable; they are used for
ill-conditioned problems that cannot be solved by iterative methods.

Can you try sequential symbolic factorization instead of parallel, i.e.,
use the MUMPS default '-mat_mumps_icntl_28 1'?
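For example (a sketch; './my_app' and the process count are placeholders):

    mpiexec -n 24 ./my_app -ksp_type preonly -pc_type cholesky \
        -pc_factor_mat_solver_package mumps -mat_mumps_icntl_28 1 -log_summary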

Hong

> Thanks for the quick response. Here are the log_summary results for 24, 48,
> and 72 cpus:
>
> 24 cpus
> ======
> MatSolve               1 1.0 1.8100e+00 1.0 0.00e+00 0.0 7.0e+02 7.4e+04 3.0e+00  0  0 68  3  9   0  0 68  3  9     0
> MatCholFctrSym         1 1.0 4.6683e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  6  0  0  0 15   6  0  0  0 15     0
> MatCholFctrNum         1 1.0 5.8129e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 78  0  0  0  0  78  0  0  0  0     0
>
> 48 cpus
> ======
> MatSolve               1 1.0 1.4915e+00 1.0 0.00e+00 0.0 1.6e+03 3.3e+04 3.0e+00  0  0 68  3  9   0  0 68  3  9     0
> MatCholFctrSym         1 1.0 5.3486e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  9  0  0  0 15   9  0  0  0 15     0
> MatCholFctrNum         1 1.0 4.0803e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 71  0  0  0  0  71  0  0  0  0     0
>
> 72 cpus
> ======
> MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
> MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
> MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
>
> Does this look normal, or is something off here? Regarding the reordering
> algorithm of Pardiso: at this time I do not know much about it. I will do
> some research and see what I can learn. However, I believe MUMPS only has
> two options:
>
>         -mat_mumps_icntl_29     - ICNTL(29): parallel ordering 1 = ptscotch, 2 = parmetis
>
> I have tried both and do not see any speed difference. Or are you
> referring to some other kind of reordering?
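> For reference, the two orderings can be timed back to back (a sketch;
> './my_app' and the process count are placeholders):
>
>   mpiexec -n 48 ./my_app -pc_type cholesky -pc_factor_mat_solver_package mumps \
>       -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 1 -log_summary   # ptscotch
>   mpiexec -n 48 ./my_app -pc_type cholesky -pc_factor_mat_solver_package mumps \
>       -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 -log_summary   # parmetis
>
> (ICNTL(29) only takes effect when parallel analysis is selected with ICNTL(28)=2.)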
>
>
> --------------------------------------------
> On Mon, 6/27/16, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>  Subject: Re: [petsc-users] Performance of mumps vs. Intel Pardiso
>  To: "Faraz Hussain" <faraz_hussain at yahoo.com>
>  Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
>  Date: Monday, June 27, 2016, 5:50 PM
>
>
>     These are the only lines that matter:
>
>  MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
>  MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
>  MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
>
>  Look at the log summary for 24 and 48 processes. How are the symbolic and
>  numeric parts scaling with the number of processes?
>
>  Things that could affect the performance a lot: Is the symbolic
>  factorization done in parallel? What reordering is used? If Pardiso is
>  using a reordering that is better for this matrix and has (much) lower
>  fill, that could explain why it is so much faster.
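>
>  One way to see which ordering MUMPS chose and its estimated fill (a
>  sketch; exact print levels are per the MUMPS manual) is to raise the
>  MUMPS diagnostic output, and with sequential analysis the ordering can
>  be selected explicitly:
>
>    -mat_mumps_icntl_4 3     (MUMPS prints analysis statistics, including estimated fill)
>    -mat_mumps_icntl_7 5     (with sequential analysis, request the METIS ordering)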
>
>  Perhaps correspond with the MUMPS developers on what MUMPS options might
>  make it faster.
>
>    Barry
>
>
>  > On Jun 27, 2016, at 5:39 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
>  >
>  > I am struggling to understand why mumps is so much slower than the
>  > Intel Pardiso solver for my simple test matrix (a 3 million x 3 million
>  > sparse symmetric matrix with ~1000 non-zero entries per row).
>  >
>  > My compute nodes have 24 cpus each. Intel Pardiso solves it in 120
>  > seconds using all 24 cpus of one node. With Mumps I get:
>  >
>  > 24 cpus - 765 seconds
>  > 48 cpus - 401 seconds
>  > 72 cpus - 344 seconds
>  > beyond 72 cpus, no speed improvement.
>  >
>  > I am attaching the -log_summary to see if there is something wrong in
>  > how I am solving the problem. I am really hoping mumps will be faster
>  > when using more cpus. Otherwise I will have to abort my exploration of
>  > mumps! <log_summary.o265103>
>