[petsc-users] Performance of mumps vs. Intel Pardiso
Hong
hzhang at mcs.anl.gov
Mon Jun 27 22:36:03 CDT 2016
Barry:
>
> The symbolic factorization is taking more time with more processes while
> the numerical factorization is taking less time. So the symbolic
> factorization is limiting the scalability. Note that the numerical times
> are not great but at least they get better.
>
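To put numbers on that observation, here is a minimal sketch that computes speedup and parallel efficiency relative to the 24-process run. The times are copied from the MatCholFctrSym and MatCholFctrNum lines of the quoted log summaries; the helper name scaling and the choice of 24 processes as the baseline are my own assumptions for illustration, not anything PETSc provides.

```python
# Times in seconds, copied from the quoted -log_summary output
# (MatCholFctrSym = symbolic, MatCholFctrNum = numeric).
timings = {
    24: {"symbolic": 46.683, "numeric": 581.29},
    48: {"symbolic": 53.486, "numeric": 408.03},
    72: {"symbolic": 184.39, "numeric": 339.69},
}


def scaling(timings, phase, base=24):
    """Return {nprocs: (speedup, efficiency)} relative to `base` processes.

    Hypothetical helper for illustration only.
    """
    t0 = timings[base][phase]
    result = {}
    for nprocs, times in sorted(timings.items()):
        speedup = t0 / times[phase]
        # Ideal speedup over the baseline is nprocs / base.
        efficiency = speedup * base / nprocs
        result[nprocs] = (speedup, efficiency)
    return result


for phase in ("symbolic", "numeric"):
    for nprocs, (speedup, efficiency) in scaling(timings, phase).items():
        print(f"{phase:8s} {nprocs:3d} procs: "
              f"speedup {speedup:4.2f}, efficiency {efficiency:4.2f}")
```

A speedup below 1.0 for the symbolic phase means that phase got slower as processes were added, which is the pattern described above.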
Parallel symbolic factorization seems troublesome. I use it only when
the sequential version runs out of memory.
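
For reference, MUMPS selects sequential versus parallel analysis with ICNTL(28) (1 = sequential, 2 = parallel). The ordering is then chosen with ICNTL(7) in the sequential case (for example 5 = Metis) or ICNTL(29) in the parallel case (1 = ptscotch, 2 = parmetis). PETSc exposes these as -mat_mumps_icntl_<n>. The sketch below only assembles the option string; the function name mumps_analysis_options is hypothetical, and which settings actually help for this matrix is something to test, not something I can promise.

```python
def mumps_analysis_options(parallel=False, ordering=None):
    """Build PETSc command-line options for the MUMPS analysis phase.

    Hypothetical helper for illustration only.
    parallel: False -> ICNTL(28)=1 (sequential), True -> ICNTL(28)=2 (parallel).
    ordering: value for ICNTL(7) (sequential) or ICNTL(29) (parallel);
              None leaves the MUMPS default in place.
    """
    opts = [("-mat_mumps_icntl_28", 2 if parallel else 1)]
    if ordering is not None:
        key = "-mat_mumps_icntl_29" if parallel else "-mat_mumps_icntl_7"
        opts.append((key, ordering))
    return " ".join(f"{key} {value}" for key, value in opts)


# Sequential analysis with Metis ordering (ICNTL(7) = 5).
print(mumps_analysis_options(parallel=False, ordering=5))
# Parallel analysis with parmetis ordering (ICNTL(29) = 2).
print(mumps_analysis_options(parallel=True, ordering=2))
```

The resulting string can be appended to the usual run command, so the symbolic factorization time of a sequential run can be compared against the parallel timings quoted below.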
Hong
>
> > On Jun 27, 2016, at 7:59 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
> >
> > Thanks for the quick response. Here are the log_summary results for 24, 48
> > and 72 cpus:
> >
> > 24 cpus
> > ======
> > MatSolve 1 1.0 1.8100e+00 1.0 0.00e+00 0.0 7.0e+02 7.4e+04 3.0e+00 0 0 68 3 9 0 0 68 3 9 0
> > MatCholFctrSym 1 1.0 4.6683e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 6 0 0 0 15 6 0 0 0 15 0
> > MatCholFctrNum 1 1.0 5.8129e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 78 0 0 0 0 78 0 0 0 0 0
> >
> > 48 cpus
> > ======
> > MatSolve 1 1.0 1.4915e+00 1.0 0.00e+00 0.0 1.6e+03 3.3e+04 3.0e+00 0 0 68 3 9 0 0 68 3 9 0
> > MatCholFctrSym 1 1.0 5.3486e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 9 0 0 0 15 9 0 0 0 15 0
> > MatCholFctrNum 1 1.0 4.0803e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 71 0 0 0 0 71 0 0 0 0 0
> >
> > 72 cpus
> > ======
> > MatSolve 1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00 1 0 68 2 9 1 0 68 2 9 0
> > MatCholFctrSym 1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29 0 0 0 15 29 0 0 0 15 0
> > MatCholFctrNum 1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53 0 0 0 0 53 0 0 0 0 0
> >
> > Does this look normal or is something off here? Regarding the reordering
> > algorithm of Pardiso, at this time I do not know much about it. I will do
> > some research and see what I can learn. However, I believe Mumps only has
> > two options:
> >
> > -mat_mumps_icntl_29 - ICNTL(29): parallel ordering 1 = ptscotch, 2 = parmetis
> >
> > I have tried both and do not see any speed difference. Or are you
> > referring to some other kind of reordering?
> >
> >
> > --------------------------------------------
> > On Mon, 6/27/16, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > Subject: Re: [petsc-users] Performance of mumps vs. Intel Pardiso
> > To: "Faraz Hussain" <faraz_hussain at yahoo.com>
> > Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> > Date: Monday, June 27, 2016, 5:50 PM
> >
> >
> > These are the only lines that matter:
> >
> > MatSolve 1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00 1 0 68 2 9 1 0 68 2 9 0
> > MatCholFctrSym 1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29 0 0 0 15 29 0 0 0 15 0
> > MatCholFctrNum 1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53 0 0 0 0 53 0 0 0 0 0
> >
> > Look at the log summary for 24 and 48 processes. How are the symbolic
> > and numeric parts scaling with the number of processes?
> >
> > Things that could affect the performance a lot: Is the symbolic
> > factorization done in parallel? What reordering is used? If Pardiso is
> > using a reordering that is better for this matrix and has (much) lower
> > fill, that could explain why it is so much faster.
> >
> > Perhaps correspond with the MUMPS developers on what MUMPS options
> > might make it faster.
> >
> > Barry
> >
> >
> >> On Jun 27, 2016, at 5:39 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
> >>
> >> I am struggling to understand why mumps is so much slower than the
> >> Intel Pardiso solver for my simple test matrix (a 3 million x 3 million
> >> sparse symmetric matrix with ~1000 non-zero entries per row).
> >>
> >> My compute nodes have 24 cpus each. Intel Pardiso solves it in
> >> 120 seconds using all 24 cpus of one node. With Mumps I get:
> >>
> >> 24 cpus - 765 seconds
> >> 48 cpus - 401 seconds
> >> 72 cpus - 344 seconds
> >> beyond 72 cpus no speed improvement.
> >>
> >> I am attaching the -log_summary to see if there is something wrong in
> >> how I am solving the problem. I am really hoping mumps will be faster
> >> when using more cpus. Otherwise I will have to abort my exploration of
> >> mumps! <log_summary.o265103>
>
>
