[petsc-dev] KSP automatic reordering
Matthew Knepley
knepley at gmail.com
Tue Sep 18 20:45:36 CDT 2012
On Tue, Sep 18, 2012 at 8:24 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> On Sep 18, 2012, at 8:09 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>
> > We should make it brain-dead simple for KSP to reorder internally and
> > run the solve in a low-bandwidth ordering.
>
> I've always had very difficult philosophical issues with doing this.
> It messes up the layering of KSP outside of the PC/matrix; I have the same
> issues with having KSP diagonally scale the system before solving it. We
> have this really ugly chunk of code in KSPSolve() to do the diagonal scaling
>
> /* diagonal scale RHS if called for */
> if (ksp->dscale) {
>   ierr = VecPointwiseMult(ksp->vec_rhs,ksp->vec_rhs,ksp->diagonal);CHKERRQ(ierr);
>   /* second time in, but matrix was scaled back to original */
>   if (ksp->dscalefix && ksp->dscalefix2) {
>     Mat mat,pmat;
>
>     ierr = PCGetOperators(ksp->pc,&mat,&pmat,PETSC_NULL);CHKERRQ(ierr);
>     ierr = MatDiagonalScale(pmat,ksp->diagonal,ksp->diagonal);CHKERRQ(ierr);
>     if (mat != pmat) {ierr = MatDiagonalScale(mat,ksp->diagonal,ksp->diagonal);CHKERRQ(ierr);}
>   }
>
>   /* scale initial guess */
>   if (!ksp->guess_zero) {
>     if (!ksp->truediagonal) {
>       ierr = VecDuplicate(ksp->diagonal,&ksp->truediagonal);CHKERRQ(ierr);
>       ierr = VecCopy(ksp->diagonal,ksp->truediagonal);CHKERRQ(ierr);
>       ierr = VecReciprocal(ksp->truediagonal);CHKERRQ(ierr);
>     }
>     ierr = VecPointwiseMult(ksp->vec_sol,ksp->vec_sol,ksp->truediagonal);CHKERRQ(ierr);
>   }
> }
>
> But it is nasty because it changes the convergence tests and the monitor
> routines (they report the residual norms in the scaled system, not the
> original). Also, does it unscale the matrix after the solve, or leave it
> scaled when it is never going to be used again? The scaling can screw
> up algebraic multigrid methods. Does the scaling affect Eisenstat-Walker-type
> convergence tests for Newton's method? It is nasty code, hard to
> follow, and hard for users to fully appreciate.
>
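For concreteness, a sketch of what the symmetric diagonal scaling above
amounts to, assuming D = diag(A) as in KSPSetDiagonalScale(): the solve
becomes

    D^{-1/2} A D^{-1/2} (D^{1/2} x) = D^{-1/2} b

so the monitors and convergence tests see the scaled residual
D^{-1/2} (b - A x), whose norm can differ substantially from ||b - A x||.
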
> We could do the same ugly hacked-up thing for reordering: use a
> partitioner between processes and a low-bandwidth ordering within each
> process, inside KSPSolve()/KSPSetUp().
>
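A minimal sketch of the between-process half of that idea, using PETSc's
MatPartitioning interface; TwoLevelReorderSketch is a hypothetical helper,
and the within-process RCM step is only indicated, not implemented:

#include <petscmat.h>

/* Sketch only: repartition rows across processes and produce the new
   global numbering; error handling abbreviated. */
PetscErrorCode TwoLevelReorderSketch(Mat A,IS *newnumbering)
{
  MatPartitioning part;
  IS              ispart;
  PetscErrorCode  ierr;

  PetscFunctionBegin;
  /* decide a new owning process for every row, using the matrix graph */
  ierr = MatPartitioningCreate(PETSC_COMM_WORLD,&part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part,A);CHKERRQ(ierr);
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); /* e.g. -mat_partitioning_type parmetis */
  ierr = MatPartitioningApply(part,&ispart);CHKERRQ(ierr);
  /* convert the partition into a new contiguous global numbering */
  ierr = ISPartitioningToNumbering(ispart,newnumbering);CHKERRQ(ierr);
  ierr = ISDestroy(&ispart);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);
  /* within each process, MatGetOrdering(...,MATORDERINGRCM,...) on the
     redistributed local block would then supply the low-bandwidth ordering */
  PetscFunctionReturn(0);
}
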
> It would be nice to have a cleaner/clearer abstract model in terms of the
> software layering to handle this. For example, I played with the idea of a
> "diagonal scaling" PC and a "reordering" PC that does the change and then
> has its own KSP inside for the solve. Thus you'd run with -ksp_type preonly
> -pc_type reorder -reorder_ksp_type gmres -reorder_pc_type ilu, etc. But that
> seems a bit pedantic, and it is annoying that you have to put all your
> "true" solver options behind a prefix.
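
A minimal sketch of what such a "reordering" PC could look like as a
PCSHELL, using the calling sequences of the era (sequential case); there is
no -pc_type reorder in PETSc, and names like ReorderCtx are invented for
illustration:

#include <petscksp.h>

typedef struct {
  KSP inner;   /* the "true" solver, configured with the reorder_ prefix */
  IS  perm;    /* low-bandwidth (RCM) row ordering */
  Mat Aperm;   /* permuted operator P A P^T */
  Vec bperm;   /* work vector in the permuted ordering */
} ReorderCtx;

static PetscErrorCode PCSetUp_Reorder(PC pc)
{
  ReorderCtx     *ctx;
  Mat            A;
  IS             cperm;
  MPI_Comm       comm;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PCShellGetContext(pc,(void**)&ctx);CHKERRQ(ierr);
  ierr = PCGetOperators(pc,&A,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatGetOrdering(A,MATORDERINGRCM,&ctx->perm,&cperm);CHKERRQ(ierr);
  ierr = MatPermute(A,ctx->perm,cperm,&ctx->Aperm);CHKERRQ(ierr);    /* P A P^T */
  ierr = ISDestroy(&cperm);CHKERRQ(ierr);
  ierr = MatGetVecs(ctx->Aperm,PETSC_NULL,&ctx->bperm);CHKERRQ(ierr);
  ierr = PetscObjectGetComm((PetscObject)pc,&comm);CHKERRQ(ierr);
  ierr = KSPCreate(comm,&ctx->inner);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ctx->inner,"reorder_");CHKERRQ(ierr);   /* -reorder_ksp_type etc. */
  ierr = KSPSetOperators(ctx->inner,ctx->Aperm,ctx->Aperm,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ctx->inner);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

static PetscErrorCode PCApply_Reorder(PC pc,Vec b,Vec x)
{
  ReorderCtx     *ctx;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PCShellGetContext(pc,(void**)&ctx);CHKERRQ(ierr);
  ierr = VecCopy(b,ctx->bperm);CHKERRQ(ierr);
  ierr = VecPermute(ctx->bperm,ctx->perm,PETSC_FALSE);CHKERRQ(ierr); /* b -> P b */
  ierr = KSPSolve(ctx->inner,ctx->bperm,x);CHKERRQ(ierr);            /* solve in the permuted ordering */
  ierr = VecPermute(x,ctx->perm,PETSC_TRUE);CHKERRQ(ierr);           /* map the solution back: x -> P^T x */
  PetscFunctionReturn(0);
}

Hooked up via PCShellSetSetUp() and PCShellSetApply(), running with
-ksp_type preonly -pc_type shell would then behave like the hypothetical
-pc_type reorder above, with the inner solver picking up
-reorder_ksp_type gmres -reorder_pc_type ilu.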
>
> Jed, what is your solution?
>
Why not make it part of the matrix? For the minute, assume we are using a
DM. Then the matrix has the nonzero pattern already. We can use an option
to compute a fill-reducing ordering and either permute the matrix directly,
or just apply the permutation to the vectors on the way in and out. This
insulates it from the solver completely.
Matt
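
A minimal sketch of that matrix-level flow with existing primitives
(MatGetOrdering()/MatPermute()/VecPermute(), sequential case), permuting
the vectors on the way in and out; SolveInLowBandwidthOrdering is a
hypothetical wrapper, not a PETSc API:

#include <petscksp.h>

PetscErrorCode SolveInLowBandwidthOrdering(Mat A,Vec b,Vec x)
{
  IS             rperm,cperm;
  Mat            Aperm;
  Vec            bperm;
  KSP            ksp;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatGetOrdering(A,MATORDERINGRCM,&rperm,&cperm);CHKERRQ(ierr); /* low-bandwidth ordering */
  ierr = MatPermute(A,rperm,cperm,&Aperm);CHKERRQ(ierr);               /* P A P^T */
  ierr = VecDuplicate(b,&bperm);CHKERRQ(ierr);
  ierr = VecCopy(b,bperm);CHKERRQ(ierr);
  ierr = VecPermute(bperm,rperm,PETSC_FALSE);CHKERRQ(ierr);            /* permute on the way in */
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,Aperm,Aperm,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,bperm,x);CHKERRQ(ierr);                          /* solver sees only the new ordering */
  ierr = VecPermute(x,rperm,PETSC_TRUE);CHKERRQ(ierr);                 /* permute back on the way out */
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&bperm);CHKERRQ(ierr);
  ierr = MatDestroy(&Aperm);CHKERRQ(ierr);
  ierr = ISDestroy(&rperm);CHKERRQ(ierr);
  ierr = ISDestroy(&cperm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}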
> Barry
>
> > The Matrix Market orderings are often so contrived that performance
> > numbers are nearly meaningless.
> >
> > On Tue, Sep 18, 2012 at 8:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > Good paper, worth reading:
> > http://www.epcc.ed.ac.uk/wp-content/uploads/2011/11/PramodKumbhar.pdf
> >
> >
> > On Sep 18, 2012, at 7:46 PM, C. Bergström <cbergstrom at pathscale.com> wrote:
> >
> > >
> > > Hi
> > >
> > > I'm hoping someone with some spare cycles and patience is willing to
> > > help test a nightly ENZO build with PETSc.
> > >
> > > Here's the nightly, which won't require a key (it will ask, but it's
> > > optional):
> > > http://c591116.r16.cf2.rackcdn.com/enzo/nightly/Linux/enzo-2012-09-18-installer.run
> > >
> > > For BLAS we're testing against this (and in the future we will ship our
> > > own built version):
> > > https://github.com/xianyi/OpenBLAS/
> > > ----------
> > > I'm specifically looking for feedback on the GPGPU side of this and on
> > > performance. The reason anyone would care: we've put a lot of work into
> > > performance for memory-bound kernels, with predictable and the lowest
> > > latency. (We don't generate any PTX; we go direct to bare-metal codegen
> > > tied to our own very small runtime. We officially only support Tesla
> > > 2050/2070 cards at this time, but ping me if you have another card you
> > > can test with.)
> > >
> > > You can replace nvcc with pathcu (we don't support the nvcc flags):
> > >
> > > pathcu -c foo.cu      # CUDA (bugs found should be fixed quickly, but
> > >                       # expect bugs - Thrust and CuSP testing also in progress)
> > > pathcc/f90 -hmpp      # OpenHMPP
> > > pathcc/f90 -openacc   # OpenACC (the flag will be changed to -acc soon)
> > >
> > > For more details, documentation, and/or bug reports please email me
> > > directly.
> > >
> > > Cheers,
> > >
> > > Christopher
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener