[petsc-users] Performance of mumps vs. Intel Pardiso
Hong
hzhang at mcs.anl.gov
Tue Jun 28 13:40:50 CDT 2016
Faraz :
The results look reasonable to me. I guess you are measuring strong scaling,
i.e., fixed problem size while increasing the number of cpus. How large is your matrix?
> Thanks, the solve times are faster after I tried sequential symbolic
> factorization instead of parallel. However, they are still slower than
> Pardiso with 24 cpus (120 seconds). I am not sure if it is a configuration
> issue on my end or a limitation of mumps?
>
How do you run Pardiso with 24 cpus?
> Is it possible for someone else to solve my matrix to verify they get the
> same times? If not, I will contact mumps developers to see if I can send
> them my matrix to benchmark.
>
The mumps developers would give you better suggestions.
Hong
>
> --------------------------------------------
> On Mon, 6/27/16, Hong <hzhang at mcs.anl.gov> wrote:
>
> Subject: Re: [petsc-users] Performance of mumps vs. Intel Pardiso
> To: "Faraz Hussain" <faraz_hussain at yahoo.com>
> Cc: "Barry Smith" <bsmith at mcs.anl.gov>, "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> Date: Monday, June 27, 2016, 8:40 PM
>
> Faraz: Direct sparse solvers are generally not scalable -- they are used
> for ill-conditioned problems which cannot be solved by iterative methods.
>
> Can you try sequential symbolic factorization instead of parallel, i.e.,
> use the mumps default '-mat_mumps_icntl_28 1'?
> Hong
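For concreteness, the setting Hong suggests is passed on the PETSc command line at run time. A hypothetical launch might look like the following; the MPI launcher, process count, and `./my_solver` executable are placeholders, not part of the original thread:

```shell
# Select MUMPS Cholesky through PETSc, force sequential symbolic
# factorization (ICNTL(28)=1), and collect the performance summary.
# "./my_solver" is a hypothetical executable name.
mpiexec -n 24 ./my_solver \
    -pc_type cholesky \
    -pc_factor_mat_solver_package mumps \
    -mat_mumps_icntl_28 1 \
    -log_summary
```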
> Thanks for the quick response. Here are the log_summary for 24, 48 and 72 cpus:
>
> 24 cpus
> ======
> MatSolve               1 1.0 1.8100e+00 1.0 0.00e+00 0.0 7.0e+02 7.4e+04 3.0e+00  0  0 68  3  9   0  0 68  3  9     0
> MatCholFctrSym         1 1.0 4.6683e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  6  0  0  0 15   6  0  0  0 15     0
> MatCholFctrNum         1 1.0 5.8129e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 78  0  0  0  0  78  0  0  0  0     0
>
> 48 cpus
> ======
> MatSolve               1 1.0 1.4915e+00 1.0 0.00e+00 0.0 1.6e+03 3.3e+04 3.0e+00  0  0 68  3  9   0  0 68  3  9     0
> MatCholFctrSym         1 1.0 5.3486e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  9  0  0  0 15   9  0  0  0 15     0
> MatCholFctrNum         1 1.0 4.0803e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 71  0  0  0  0  71  0  0  0  0     0
>
> 72 cpus
> ======
> MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
> MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
> MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
>
> Does this look normal or is something off here?
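To make the trend in these numbers easier to see, the stage times can be tabulated and compared (values transcribed from the -log_summary excerpts above); this is just a reading aid, not PETSc output:

```python
# Per-stage times (seconds) transcribed from the -log_summary excerpts above.
times = {
    24: {"MatCholFctrSym": 46.683, "MatCholFctrNum": 581.29, "MatSolve": 1.8100},
    48: {"MatCholFctrSym": 53.486, "MatCholFctrNum": 408.03, "MatSolve": 1.4915},
    72: {"MatCholFctrSym": 184.39, "MatCholFctrNum": 339.69, "MatSolve": 7.7200},
}

for ncpu, t in sorted(times.items()):
    total = sum(t.values())
    num_frac = t["MatCholFctrNum"] / total
    print(f"{ncpu} cpus: numeric factorization is {num_frac:.0%} of "
          f"{total:.0f} s total; symbolic = {t['MatCholFctrSym']:.0f} s")
```

Note that the symbolic factorization grows from about 47 s on 24 cpus to about 184 s on 72 cpus, while the numeric factorization improves only from about 581 s to about 340 s -- consistent with Hong's advice to try sequential analysis.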
> Regarding the reordering algorithm of Pardiso: at this time I do not know
> much about that. I will do some research and see what I can learn. However,
> I believe mumps only has two options:
>
> -mat_mumps_icntl_29 - ICNTL(29): parallel ordering, 1 = ptscotch, 2 = parmetis
>
> I have tried both and do not see any speed difference. Or are you referring
> to some other kind of reordering?
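A side note on orderings, hedged since it goes beyond what was said in the thread: ICNTL(29) only selects the ordering used by the *parallel* analysis. When the analysis is sequential (ICNTL(28)=1), MUMPS chooses its fill-reducing ordering through ICNTL(7) instead, which PETSc exposes as -mat_mumps_icntl_7. A hypothetical run trying a METIS ordering (the executable name is a placeholder):

```shell
# With sequential analysis, ICNTL(7) picks the fill-reducing ordering
# (5 = METIS in MUMPS); "./my_solver" is a placeholder executable.
mpiexec -n 24 ./my_solver \
    -pc_type cholesky -pc_factor_mat_solver_package mumps \
    -mat_mumps_icntl_28 1 -mat_mumps_icntl_7 5
```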
>
> --------------------------------------------
> On Mon, 6/27/16, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Subject: Re: [petsc-users] Performance of mumps vs. Intel Pardiso
> To: "Faraz Hussain" <faraz_hussain at yahoo.com>
> Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> Date: Monday, June 27, 2016, 5:50 PM
>
> These are the only lines that matter:
>
> MatSolve               1 1.0 7.7200e+00 1.1 0.00e+00 0.0 2.6e+03 2.0e+04 3.0e+00  1  0 68  2  9   1  0 68  2  9     0
> MatCholFctrSym         1 1.0 1.8439e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 29  0  0  0 15  29  0  0  0 15     0
> MatCholFctrNum         1 1.0 3.3969e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 53  0  0  0  0  53  0  0  0  0     0
>
> Look at the log summary for 24 and 48 processes. How are the symbolic and
> numeric parts scaling with the number of processes?
>
> Things that could affect the performance a lot: Is the symbolic
> factorization done in parallel? What reordering is used? If Pardiso is
> using a reordering that is better for this matrix and has (much) lower
> fill, that could explain why it is so much faster.
>
> Perhaps correspond with the MUMPS developers on what MUMPS options might
> make it faster.
>
> Barry
>
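One way to follow up on Barry's fill question is to raise the MUMPS output level so that the analysis statistics, including the estimated number of factor entries, are printed and can be compared against Pardiso's ordering. This assumes the standard PETSc option name for ICNTL(4); the executable name is a placeholder:

```shell
# ICNTL(4)=2 asks MUMPS to print errors, warnings, and main statistics,
# including the estimated factor size from the analysis phase.
mpiexec -n 24 ./my_solver -pc_type cholesky \
    -pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 2
```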
>
> > On Jun 27, 2016, at 5:39 PM, Faraz Hussain <faraz_hussain at yahoo.com> wrote:
> >
> > I am struggling to understand why mumps is so much slower than the Intel
> > Pardiso solver for my simple test matrix (a 3 million x 3 million sparse
> > symmetric matrix with ~1000 non-zero entries per row).
> >
> > My compute nodes have 24 cpus each. Intel Pardiso solves it in 120
> > seconds using all 24 cpus of one node. With Mumps I get:
> >
> > 24 cpus - 765 seconds
> > 48 cpus - 401 seconds
> > 72 cpus - 344 seconds
> > beyond 72 cpus, no speed improvement.
> >
> > I am attaching the -log_summary to see if there is something wrong in
> > how I am solving the problem. I am really hoping mumps will be faster
> > when using more cpus. Otherwise I will have to abort my exploration of
> > mumps! <log_summary.o265103>
>
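The reported timings translate into the following strong-scaling speedups relative to the 24-cpu run -- a quick check using only the numbers quoted above:

```python
# MUMPS wall-clock times (seconds) reported in the message above.
mumps_times = {24: 765.0, 48: 401.0, 72: 344.0}
pardiso_24 = 120.0  # Intel Pardiso on 24 cpus, for comparison

for ncpu in sorted(mumps_times):
    speedup = mumps_times[24] / mumps_times[ncpu]
    efficiency = speedup / (ncpu / 24)
    print(f"{ncpu} cpus: speedup {speedup:.2f}x, "
          f"parallel efficiency {efficiency:.0%}, "
          f"{mumps_times[ncpu] / pardiso_24:.1f}x slower than Pardiso on 24 cpus")
```

So 24 -> 48 cpus scales quite well (roughly 95% efficiency), but 48 -> 72 adds little, and even the 72-cpu MUMPS run remains about 2.9x slower than Pardiso on a single 24-cpu node.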
>