[petsc-users] Performance of mumps vs. Intel Pardiso

Faraz Hussain faraz_hussain at yahoo.com
Tue Jun 28 13:09:55 CDT 2016


Thanks, the solve times are faster after I tried sequential symbolic factorization instead of parallel. However, they are still slower than Pardiso with 24 cpus (120 seconds). I am not sure if it is a configuration issue on my end or a limitation of mumps.

Is it possible for someone else to solve my matrix to verify they get the same times? If not, I will contact mumps developers to see if I can send them my matrix to benchmark.
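For anyone trying to reproduce the comparison, the runs in this thread correspond to options along these lines. This is only a sketch: the executable name is a placeholder, and the solver-package option spelling matches the PETSc releases of that era.

```shell
# Sketch of the MUMPS Cholesky run discussed in this thread.
# "./my_app" is a placeholder for the actual PETSc application.
mpiexec -n 24 ./my_app \
    -pc_type cholesky \
    -pc_factor_mat_solver_package mumps \
    -mat_mumps_icntl_28 1 \
    -log_summary
# -mat_mumps_icntl_28 1 selects sequential (rather than parallel)
# symbolic factorization, as suggested by Hong below.
```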

24 cpus
======
MatCholFctrSym         1 1.0 1.0325e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 16  0  0  0 15  16  0  0  0 $
MatCholFctrNum         1 1.0 4.2542e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 67  0  0  0  0  67  0  0  0 $

48 cpus
======
MatCholFctrSym         1 1.0 4.3957e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 12  0  0  0 15  12  0  0  0 $
MatCholFctrNum         1 1.0 2.5982e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 70  0  0  0  0  70  0  0  0 $

72 cpus
======
MatCholFctrSym         1 1.0 4.3596e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 12  0  0  0 15  12  0  0  0 $
MatCholFctrNum         1 1.0 2.2195e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 59  0  0  0  0  59  0  0  0 $

240 cpus
=======
MatCholFctrSym         1 1.0 4.7354e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 13  0  0  0 15  13  0  0  0 $
MatCholFctrNum         1 1.0 1.8543e+02 1.0 0.00e+00 0.0 0.0e+00 


--------------------------------------------
On Mon, 6/27/16, Hong <hzhang at mcs.anl.gov> wrote:

 Subject: Re: [petsc-users] Performance of mumps vs. Intel Pardiso
 To: "Faraz Hussain" <faraz_hussain at yahoo.com>
 Cc: "Barry Smith" <bsmith at mcs.anl.gov>, "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
 Date: Monday, June 27, 2016, 8:40 PM
 
 Faraz:
 Direct sparse solvers are generally not scalable -- they are used for ill-conditioned problems which cannot be solved by iterative methods.
 Can you try sequential symbolic factorization instead of parallel, i.e., use the mumps default '-mat_mumps_icntl_28 1'?
 Hong
 
 Thanks for the quick response. Here are the log_summary for 24, 48 and 72 cpus:
 
 
 
 24 cpus
 ======
 MatSolve               1 1.0 1.8100e+00 1.0 0.00e+00 0.0 7.0e+02 7.4e+04 3.0e+00  0  0 68  3  9   0  0 68  3  9     0
 MatCholFctrSym         1 1.0 4.6683e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  6  0  0  0 15   6  0  0  0 15     0
 MatCholFctrNum         1 1.0 5.8129e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 78  0  0  0  0  78  0  0  0  0     0
 
 48 cpus
 ======
 MatSolve               1 1.0 1.4915e+00 1.0 0.00e+00 0.0 1.6e+03 3.3e+04 3.0e+00  0  0 68  3  9   0  0 68  3  9     0
 MatCholFctrSym         1 1.0 5.3486e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  9  0  0  0 15   9  0  0  0 15     0
 MatCholFctrNum         1 1.0 4.0803e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 71  0  0  0  0  71  0  0  0  0     0
 
 72 cpus
 ======
 MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
 MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
 MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
 
 Does this look normal or is something off here?
 Regarding the reordering algorithm of Pardiso: at this time I do not know much about that. I will do some research and see what I can learn. However, I believe Mumps only has two options:
 
         -mat_mumps_icntl_29     - ICNTL(29): parallel ordering 1 = ptscotch, 2 = parmetis
 
 I have tried both and do not see any speed difference. Or are you referring to some other kind of reordering?
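 As a sketch, the two parallel orderings exposed through ICNTL(29) can be tried like this; the executable name is a placeholder, and only the ordering option differs between the runs:
 
 ```shell
 # ICNTL(29) chooses the ordering tool used during parallel analysis.
 mpiexec -n 48 ./my_app -mat_mumps_icntl_29 1   # PT-Scotch ordering
 mpiexec -n 48 ./my_app -mat_mumps_icntl_29 2   # ParMETIS ordering
 ```
 
 Note that ICNTL(29) only takes effect when the analysis (symbolic factorization) itself is run in parallel; with sequential analysis MUMPS uses its sequential ordering choices instead.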
 
 
 
 
 
 --------------------------------------------
 On Mon, 6/27/16, Barry Smith <bsmith at mcs.anl.gov> wrote:
 
  Subject: Re: [petsc-users] Performance of mumps vs. Intel Pardiso
  To: "Faraz Hussain" <faraz_hussain at yahoo.com>
  Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
  Date: Monday, June 27, 2016, 5:50 PM
 
 
     These are the only lines that matter:
 
  MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
  MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
  MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
 
   Look at the log summary for 24 and 48 processes. How are the symbolic and numeric parts scaling with the number of processes?
 
   Things that could affect the performance a lot: Is the symbolic factorization done in parallel? What reordering is used? If Pardiso is using a reordering that is better for this matrix and has (much) lower fill, that could explain why it is so much faster.
 
    Perhaps correspond with the MUMPS developers on what MUMPS options might make it faster.
 
     Barry
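 
  The fill question raised above can be checked directly: MUMPS reports the ordering it chose and its estimated factor fill when its print level is raised via ICNTL(4). A sketch (placeholder executable name):
 
  ```shell
  # ICNTL(4) = 3 raises MUMPS verbosity so the analysis phase prints
  # statistics, including the estimated number of factor entries,
  # which can be compared against Pardiso's reported fill.
  mpiexec -n 24 ./my_app -mat_mumps_icntl_4 3
  ```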
 
 
 
 
 
  > On Jun 27, 2016, at 5:39 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
  >
  > I am struggling to understand why mumps is so much slower than the Intel Pardiso solver for my simple test matrix (a 3 million x 3 million sparse symmetric matrix with ~1000 non-zero entries per row).
  >
  > My compute nodes have 24 cpus each. Intel Pardiso solves it in 120 seconds using all 24 cpus of one node. With Mumps I get:
  >
  > 24 cpus - 765 seconds
  > 48 cpus - 401 seconds
  > 72 cpus - 344 seconds
  > beyond 72 cpus, no speed improvement.
  >
  > I am attaching the -log_summary to see if there is something wrong in how I am solving the problem. I am really hoping mumps will be faster when using more cpus. Otherwise I will have to abort my exploration of mumps! <log_summary.o265103>
 
 
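For scale, a back-of-envelope estimate (plain arithmetic, double-precision values assumed) of the storage implied by the matrix described in this thread:

```python
# Storage estimate for a 3M x 3M matrix with ~1000 nonzeros per row,
# as described in the original message.
n = 3_000_000          # rows/columns
nnz_per_row = 1000     # ~1000 nonzero entries per row
bytes_per_value = 8    # double precision

full_gb = n * nnz_per_row * bytes_per_value / 1e9
half_gb = full_gb / 2  # if only one triangle of the symmetric matrix is stored

print(f"values, full pattern: {full_gb:.0f} GB")   # -> 24 GB
print(f"values, one triangle: {half_gb:.0f} GB")   # -> 12 GB
```

The Cholesky factor itself needs considerably more than this because of fill-in, which is why the choice of reordering (and the resulting fill) discussed above matters so much for both memory and factorization time.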

