[petsc-dev] Wrapper for WSMP

Jack Poulson jack.poulson at gmail.com
Tue Aug 16 22:35:46 CDT 2011


I have seen several cases of people emailing the MUMPS and PETSc lists
recently complaining about only being able to solve relatively small 3d
problems on TACC machines (particularly for Helmholtz). I also noticed that
some of the 3d WSMP numbers in the link below were about as fast as what I
was seeing from MUMPS and SuperLU_Dist on 2d problems. As you (somewhat
sarcastically) said, it's much easier to make your package look good by not
making your competition look good, so either WSMP is that much faster, or
something is going terribly wrong on the TACC machines. Either way, actually
testing WSMP should clear things up, and hopefully provide a significant
speedup. Otherwise, I will have to roll up my sleeves and write a sparse
direct solver.
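
For concreteness, once a wrapper is registered, switching between the packages
for testing should only take something like the following (a rough sketch using
current petsc-dev names, with error checking omitted; A, b, and x are assumed to
be an assembled parallel matrix and its vectors, and the "wsmp" package name is
hypothetical until a wrapper actually exists):

  #include <petscksp.h>

  /* run-time equivalent:
       -ksp_type preonly -pc_type lu
       -pc_factor_mat_solver_package mumps|superlu_dist|wsmp            */
  KSP ksp;
  PC  pc;
  KSPCreate(PETSC_COMM_WORLD,&ksp);
  KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);
  KSPSetType(ksp,KSPPREONLY);                /* no Krylov iterations, just the direct solve */
  KSPGetPC(ksp,&pc);
  PCSetType(pc,PCLU);
  PCFactorSetMatSolverPackage(pc,"mumps");   /* or "superlu_dist", or eventually "wsmp" */
  KSPSetFromOptions(ksp);                    /* let the command line override the package */
  KSPSolve(ksp,b,x);
  KSPDestroy(&ksp);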

Jack

On Tue, Aug 16, 2011 at 10:22 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   You can avoid lying by seriously tuning UP your code while tuning down
> your competitors' code in testing. We do this all the time when comparing
> PETSc with Trilinos :-) Just kidding. Comparisons are always a dangerous
> business.
>
>   Barry
>
> On Aug 16, 2011, at 10:18 PM, Jack Poulson wrote:
>
> > On Tue, Aug 16, 2011 at 9:35 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > On Aug 16, 2011, at 5:14 PM, Jack Poulson wrote:
> >
> > > Hello all,
> > >
> > > I am working on a project that requires very fast sparse direct solves
> and MUMPS and SuperLU_Dist haven't been cutting it. From what I've read,
> when properly tuned, WSMP is significantly faster, particularly with
> multiple right-hand sides on large machines. The obvious drawback is that
> it's not open source, but the binaries seem to be readily available for most
> platforms.
> > >
> > > Before I reinvent the wheel, I would like to check if anyone has
> already done some work on adding it into PETSc. If not, its interface is
> quite similar to MUMPS's, and I should be able to mirror most of that code. On
> the other hand, there are a large number of platform-specific details that
> need to be handled, so keeping things both portable and fast might be a
> challenge. It seems that the CSC storage format should be used since it is
> required for Hermitian matrices.
> > >
> > > Thanks,
> > > Jack
> >
> >  Jack,
> >
> >   By all means do it. That would be a nice thing to have. But be aware
> that the WSMP folks have a reputation for exaggerating how much better their
> software is, so don't be surprised if after all that work it is not much
> better.
> >
> >
> > Good to know. I was somewhat worried about that, but perhaps it is a
> matter of getting all of the tuning parameters right. The manual does
> mention that performance is significantly degraded without tuning. I would
> sincerely hope no one would outright lie in their publications, e.g., this
> one:
> > http://portal.acm.org/citation.cfm?id=1654061
> >
> >   BTW: are you solving with many right-hand sides? Maybe before you muck
> with WSMP we should figure out how to get you access to the multiple
> right-hand-side support of MUMPS (I don't know if SuperLU_Dist has it) so you
> can speed up your current computations a good amount? Currently PETSc's
> MatMatSolve() calls a separate solve for each right-hand side with MUMPS.
> >
> >   Barry
> >
> >
> > I will eventually need to solve against many right-hand sides, but for
> now I am solving against one and it is still taking too long; in fact, not
> only does it take too long, but memory per core increases for a fixed problem
> size as I increase the number of MPI processes (for both SuperLU_Dist and
> MUMPS). This was occurring for quasi-2d Helmholtz problems over a couple
> hundred cores. My only plausible explanation for this behavior is that each
> process's communication buffers grow in proportion to the total number of
> processes, but I stress that this is just a guess. I tried reading through the
> MUMPS code and quickly gave up.
> >
> > Another problem with MUMPS is that it requires the entire set of right-hand
> sides to reside on the root process; that will clearly not work for a
> billion degrees of freedom with several hundred RHSs. WSMP gets this part
> right and actually distributes those vectors.
> >
> > Jack
>
>
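
To make the CSC comment above concrete: for the Hermitian case only one
triangle of the matrix is passed, so a tiny example of the layout in 0-based
compressed sparse column form looks like this (purely illustrative; WSMP's
exact index base and calling convention should be taken from its manual):

  #include <petscsys.h>

  /* Lower triangle of the 3x3 Hermitian matrix
       [  4     1-2i    0  ]
       [ 1+2i    3      5i ]
       [  0     -5i     6  ]
     stored column by column (complex build, PETSC_i = sqrt(-1)). */
  PetscInt    colptr[4] = {0, 2, 4, 5};  /* column j owns entries colptr[j] .. colptr[j+1]-1 */
  PetscInt    rowind[5] = {0, 1,  1, 2,  2};
  PetscScalar values[5] = {4.0, 1.0 + 2.0*PETSC_i,   /* column 0 */
                           3.0, -5.0*PETSC_i,        /* column 1 */
                           6.0};                     /* column 2 */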
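
Regarding the multiple right-hand-side discussion above: the factored-matrix
path a wrapper would plug into looks roughly like the sketch below (mlocal, M,
and nrhs are placeholder sizes, A is the assembled parallel matrix, error
checking is omitted, and argument details differ a bit between PETSc versions).
The point is that the dense right-hand-side block B is distributed row-wise
across the processes rather than stored entirely on the root:

  #include <petscmat.h>

  Mat           F,B,X;
  IS            rowperm,colperm;
  MatFactorInfo info;

  /* LU factor A through the MUMPS interface */
  MatGetFactor(A,"mumps",MAT_FACTOR_LU,&F);
  MatGetOrdering(A,MATORDERINGNATURAL,&rowperm,&colperm);  /* MUMPS reorders internally */
  MatFactorInfoInitialize(&info);
  MatLUFactorSymbolic(F,A,rowperm,colperm,&info);
  MatLUFactorNumeric(F,A,&info);

  /* nrhs right-hand sides as a dense matrix whose rows follow A's row distribution */
  MatCreateMPIDense(PETSC_COMM_WORLD,mlocal,PETSC_DECIDE,M,nrhs,PETSC_NULL,&B);
  /* ... fill and assemble B ... */
  MatDuplicate(B,MAT_DO_NOT_COPY_VALUES,&X);

  /* today this loops over the columns of B, calling MatSolve() once per column */
  MatMatSolve(F,B,X);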