[petsc-users] Performance of mumps vs. Intel Pardiso
Hong
hzhang at mcs.anl.gov
Tue Jun 28 13:40:50 CDT 2016
Faraz :
The results look reasonable to me. I guess you are measuring strong scaling,
i.e., fixed problem size while increasing the number of cpus. How large is your matrix?
> Thanks, the solve times are faster after I tried sequential symbolic
> factorization instead of parallel. However, they are still slower than
> Pardiso with 24 cpus (120 seconds). I am not sure if it is a configuration
> issue on my end or a limitation of mumps?
>
How do you run Pardiso with 24 cpus?
> Is it possible for someone else to solve my matrix to verify they get the
> same times? If not, I will contact mumps developers to see if I can send
> them my matrix to benchmark.
>
The mumps developers would give you better suggestions.
Hong
>
> --------------------------------------------
> On Mon, 6/27/16, Hong <hzhang at mcs.anl.gov> wrote:
>
> Subject: Re: [petsc-users] Performance of mumps vs. Intel Pardiso
> To: "Faraz Hussain" <faraz_hussain at yahoo.com>
> Cc: "Barry Smith" <bsmith at mcs.anl.gov>, "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> Date: Monday, June 27, 2016, 8:40 PM
>
> Faraz: Direct sparse solvers are generally not scalable -- they are used
> for ill-conditioned problems which cannot be solved by iterative methods.
>
> Can you try sequential symbolic factorization instead of parallel, i.e.,
> use the mumps default '-mat_mumps_icntl_28 1'?
> Hong
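For concreteness, the setting Hong suggests is passed on the PETSc command line at run time. A hypothetical launch might look like the following; the MPI launcher, process count, and `./my_solver` executable are placeholders, not part of the original thread:

```shell
# Select MUMPS Cholesky through PETSc, force sequential symbolic
# factorization (ICNTL(28)=1), and collect the performance summary.
# "./my_solver" is a hypothetical executable name.
mpiexec -n 24 ./my_solver \
    -pc_type cholesky \
    -pc_factor_mat_solver_package mumps \
    -mat_mumps_icntl_28 1 \
    -log_summary
```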
> Thanks for the quick response. Here are the log_summary for 24, 48 and 72 cpus:
>
> 24 cpus
> ======
> MatSolve               1 1.0 1.8100e+00 1.0 0.00e+00 0.0 7.0e+02 7.4e+04 3.0e+00  0  0 68  3  9   0  0 68  3  9     0
> MatCholFctrSym         1 1.0 4.6683e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  6  0  0  0 15   6  0  0  0 15     0
> MatCholFctrNum         1 1.0 5.8129e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 78  0  0  0  0  78  0  0  0  0     0
>
> 48 cpus
> ======
> MatSolve               1 1.0 1.4915e+00 1.0 0.00e+00 0.0 1.6e+03 3.3e+04 3.0e+00  0  0 68  3  9   0  0 68  3  9     0
> MatCholFctrSym         1 1.0 5.3486e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  9  0  0  0 15   9  0  0  0 15     0
> MatCholFctrNum         1 1.0 4.0803e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 71  0  0  0  0  71  0  0  0  0     0
>
> 72 cpus
> ======
> MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
> MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
> MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
>
> Does this look normal or is something off here?
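To make the trend in these numbers easier to see, the stage times can be tabulated and compared (values transcribed from the -log_summary excerpts above); this is just a reading aid, not PETSc output:

```python
# Per-stage times (seconds) transcribed from the -log_summary excerpts above.
times = {
    24: {"MatCholFctrSym": 46.683, "MatCholFctrNum": 581.29, "MatSolve": 1.8100},
    48: {"MatCholFctrSym": 53.486, "MatCholFctrNum": 408.03, "MatSolve": 1.4915},
    72: {"MatCholFctrSym": 184.39, "MatCholFctrNum": 339.69, "MatSolve": 7.7200},
}

for ncpu, t in sorted(times.items()):
    total = sum(t.values())
    num_frac = t["MatCholFctrNum"] / total
    print(f"{ncpu} cpus: numeric factorization is {num_frac:.0%} of "
          f"{total:.0f} s total; symbolic = {t['MatCholFctrSym']:.0f} s")
```

Note that the symbolic factorization grows from about 47 s on 24 cpus to about 184 s on 72 cpus, while the numeric factorization improves only from about 581 s to about 340 s -- consistent with Hong's advice to try sequential analysis.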
> Regarding the reordering algorithm of Pardiso: at this time I do not know
> much about that. I will do some research and see what I can learn. However,
> I believe mumps only has two options:
>
> -mat_mumps_icntl_29 - ICNTL(29): parallel ordering, 1 = ptscotch, 2 = parmetis
>
> I have tried both and do not see any speed difference. Or are you referring
> to some other kind of reordering?
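A side note on orderings, hedged since it goes beyond what was said in the thread: ICNTL(29) only selects the ordering used by the *parallel* analysis. When the analysis is sequential (ICNTL(28)=1), MUMPS chooses its fill-reducing ordering through ICNTL(7) instead, which PETSc exposes as -mat_mumps_icntl_7. A hypothetical run trying a METIS ordering (the executable name is a placeholder):

```shell
# With sequential analysis, ICNTL(7) picks the fill-reducing ordering
# (5 = METIS in MUMPS); "./my_solver" is a placeholder executable.
mpiexec -n 24 ./my_solver \
    -pc_type cholesky -pc_factor_mat_solver_package mumps \
    -mat_mumps_icntl_28 1 -mat_mumps_icntl_7 5
```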
>
> --------------------------------------------
> On Mon, 6/27/16, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Subject: Re: [petsc-users] Performance of mumps vs. Intel Pardiso
> To: "Faraz Hussain" <faraz_hussain at yahoo.com>
> Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> Date: Monday, June 27, 2016, 5:50 PM
>
> These are the only lines that matter:
>
> MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
> MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
> MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
>
> Look at the log summary for 24 and 48 processes. How are the symbolic and
> numeric parts scaling with the number of processes?
>
> Things that could affect the performance a lot: Is the symbolic
> factorization done in parallel? What reordering is used? If Pardiso is
> using a reordering that is better for this matrix and has (much) lower
> fill, that could explain why it is so much faster.
>
> Perhaps correspond with the MUMPS developers on what MUMPS options might
> make it faster.
>
> Barry
>
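One way to follow up on Barry's fill question is to raise the MUMPS output level so that the analysis statistics, including the estimated number of factor entries, are printed and can be compared against Pardiso's ordering. This assumes the standard PETSc option name for ICNTL(4); the executable name is a placeholder:

```shell
# ICNTL(4)=2 asks MUMPS to print errors, warnings, and main statistics,
# including the estimated factor size from the analysis phase.
mpiexec -n 24 ./my_solver -pc_type cholesky \
    -pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 2
```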
>
> > On Jun 27, 2016, at 5:39 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
> >
> > I am struggling to understand why mumps is so much slower than the Intel
> > Pardiso solver for my simple test matrix (a 3 million x 3 million sparse
> > symmetric matrix with ~1000 non-zero entries per row).
> >
> > My compute nodes have 24 cpus each. Intel Pardiso solves it in 120
> > seconds using all 24 cpus of one node. With Mumps I get:
> >
> > 24 cpus - 765 seconds
> > 48 cpus - 401 seconds
> > 72 cpus - 344 seconds
> > beyond 72 cpus, no speed improvement.
> >
> > I am attaching the -log_summary to see if there is something wrong in
> > how I am solving the problem. I am really hoping mumps will be faster
> > when using more cpus. Otherwise I will have to abort my exploration of
> > mumps! <log_summary.o265103>
>
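The reported timings translate into the following strong-scaling speedups relative to the 24-cpu run -- a quick check using only the numbers quoted above:

```python
# MUMPS wall-clock times (seconds) reported in the message above.
mumps_times = {24: 765.0, 48: 401.0, 72: 344.0}
pardiso_24 = 120.0  # Intel Pardiso on 24 cpus, for comparison

for ncpu in sorted(mumps_times):
    speedup = mumps_times[24] / mumps_times[ncpu]
    efficiency = speedup / (ncpu / 24)
    print(f"{ncpu} cpus: speedup {speedup:.2f}x, "
          f"parallel efficiency {efficiency:.0%}, "
          f"{mumps_times[ncpu] / pardiso_24:.1f}x slower than Pardiso on 24 cpus")
```

So 24 -> 48 cpus scales quite well (roughly 95% efficiency), but 48 -> 72 adds little, and even the 72-cpu MUMPS run remains about 2.9x slower than Pardiso on a single 24-cpu node.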
>