[petsc-users] Performance of mumps vs. Intel Pardiso

Barry Smith bsmith at mcs.anl.gov
Mon Jun 27 17:50:48 CDT 2016


   These are the only lines that matter

MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
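
   From the percentage-of-total-time columns in those lines, the symbolic and numeric Cholesky factorizations account for roughly 29% + 53% = 82% of the run time, while the MatSolve itself is about 1%, so it is the factorization that has to scale.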

Look at the log summaries for 24 and 48 processes. How do the symbolic and numeric factorization times scale with the number of processes?
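
For example, something along these lines would produce comparable logs (a sketch: the executable name ./ex_solve and the matrix file matrix.dat are placeholders for whatever you actually run):

   mpiexec -n 24 ./ex_solve -f matrix.dat -pc_type cholesky \
       -pc_factor_mat_solver_package mumps -log_summary log24.txt
   mpiexec -n 48 ./ex_solve -f matrix.dat -pc_type cholesky \
       -pc_factor_mat_solver_package mumps -log_summary log48.txt

and then compare the MatCholFctrSym and MatCholFctrNum rows between the two files.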

Several things could affect the performance a lot. Is the symbolic factorization done in parallel? What reordering is used? If Pardiso is using a reordering that is better suited to this matrix and produces (much) lower fill, that could explain why it is so much faster.
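
If it helps, the relevant MUMPS controls are exposed as PETSc runtime options; a sketch of the ones I mean (not a tuned recommendation, the values worth trying depend on the matrix):

   # sequential analysis with a chosen ordering, ICNTL(7): 3 = Scotch, 4 = PORD, 5 = Metis
   -mat_mumps_icntl_7 5
   # or parallel analysis, ICNTL(28) = 2, with ParMetis ordering, ICNTL(29) = 2
   -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2
   # raise the MUMPS output level, ICNTL(4), to see the fill estimates and the ordering actually used
   -mat_mumps_icntl_4 2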

 Perhaps correspond with the MUMPS developers about what MUMPS options might make it faster.

  Barry


> On Jun 27, 2016, at 5:39 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
> 
> I am struggling to understand why mumps is so much slower than the Intel Pardiso solver for my simple test matrix (a 3 million x 3 million sparse symmetric matrix with ~1000 non-zero entries per row).
> 
> My compute nodes have 24 cpus each. Intel Pardiso solves it in 120 seconds using all 24 cpus of one node. With Mumps I get:
> 
> 24 cpus - 765 seconds
> 48 cpus - 401 seconds
> 72 cpus - 344 seconds
> Beyond 72 cpus there is no further speed improvement.
> 
> I am attaching the -log_summary output so you can see if there is something wrong in how I am solving the problem. I am really hoping mumps will be faster when using more cpus. Otherwise I will have to abort my exploration of mumps! <log_summary.o265103>


