On Tue, Sep 18, 2012 at 8:24 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
On Sep 18, 2012, at 8:09 PM, Jed Brown <<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>> wrote:<br>
<br>
> We should make it brain-dead simple for KSP to reorder internally and run the solve in a low-bandwidth ordering.<br>
<br>
I've always had very difficult philosophical issues with doing this. It messes up the layering of KSP outside of the PC/matrix; I have the same issues with having KSP diagonally scale the system before solving it. We have this really ugly chunk of code in KSPSolve() to do the diagonal scaling<br>
<br>
/* diagonal scale RHS if called for */<br>
if (ksp->dscale) {<br>
ierr = VecPointwiseMult(ksp->vec_rhs,ksp->vec_rhs,ksp->diagonal);CHKERRQ(ierr);<br>
/* second time in, but matrix was scaled back to original */<br>
if (ksp->dscalefix && ksp->dscalefix2) {<br>
Mat mat,pmat;<br>
<br>
ierr = PCGetOperators(ksp->pc,&mat,&pmat,PETSC_NULL);CHKERRQ(ierr);<br>
ierr = MatDiagonalScale(pmat,ksp->diagonal,ksp->diagonal);CHKERRQ(ierr);<br>
if (mat != pmat) {ierr = MatDiagonalScale(mat,ksp->diagonal,ksp->diagonal);CHKERRQ(ierr);}<br>
}<br>
<br>
/* scale initial guess */<br>
if (!ksp->guess_zero) {<br>
if (!ksp->truediagonal) {<br>
ierr = VecDuplicate(ksp->diagonal,&ksp->truediagonal);CHKERRQ(ierr);<br>
ierr = VecCopy(ksp->diagonal,ksp->truediagonal);CHKERRQ(ierr);<br>
ierr = VecReciprocal(ksp->truediagonal);CHKERRQ(ierr);<br>
}<br>
ierr = VecPointwiseMult(ksp->vec_sol,ksp->vec_sol,ksp->truediagonal);CHKERRQ(ierr);<br>
}<br>
}<br>
<br>
But it is nasty because it changes the convergence tests and the monitor routines (they report the residual norms in the scaled system, not the original). Also, does it unscale the matrix after the solve (or leave it scaled in case it is never going to be used again)? The scaling can screw up algebraic multigrid methods. Does the scaling affect Eisenstat-Walker type convergence tests for Newton's method? It is nasty code, hard to follow and hard for users to fully appreciate.<br>
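<br>
To make the residual-norm point concrete, here is a minimal sketch in plain C (not PETSc). It assumes the symmetric scaling implied by the snippet above: the matrix becomes D A D, the right-hand side D b, and the initial guess D^{-1} x. The function names residual, norm2_squared and scale_system are illustrative only. Under that assumption the scaled residual is D r, so a monitor looking at the scaled system sees a different norm than the true one.<br>

```c
#include <stddef.h>

/* r = b - A*x for a dense, row-major n x n matrix A. */
void residual(size_t n, const double *A, const double *x,
              const double *b, double *r)
{
  for (size_t i = 0; i < n; i++) {
    double s = b[i];
    for (size_t j = 0; j < n; j++) s -= A[i * n + j] * x[j];
    r[i] = s;
  }
}

/* Squared 2-norm of v (squared to avoid needing libm for sqrt). */
double norm2_squared(size_t n, const double *v)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += v[i] * v[i];
  return s;
}

/* Symmetric diagonal scaling with D = diag(d):
 *   As = D A D,  bs = D b,  xs = D^{-1} x.
 * Assumes every d[i] is nonzero. Inputs are left untouched. */
void scale_system(size_t n, const double *d,
                  const double *A, const double *b, const double *x,
                  double *As, double *bs, double *xs)
{
  for (size_t i = 0; i < n; i++) {
    for (size_t j = 0; j < n; j++) As[i * n + j] = d[i] * A[i * n + j] * d[j];
    bs[i] = d[i] * b[i];
    xs[i] = x[i] / d[i];
  }
}
```

For example, with A = [4 1; 1 3], b = (1, 2), x = (1, 1) and d = (0.5, 0.25), the true residual is (-4, -2) while the residual of the scaled system is (-2, -0.5), so any tolerance applied to the scaled norm is testing a different quantity.<br>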
<br>
We could do the same ugly, hacked-up thing for reordering: use a partitioner between processes and a low-bandwidth ordering within each process, inside KSPSolve()/KSPSetUp().<br>
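<br>
Stripped of the PETSc layering, reordering inside the solve amounts to permuting on the way in, solving, and permuting back on the way out. A minimal dense sketch in plain C follows; it is not PETSc code, the names dense_solve and solve_reordered and the MAXN bound are illustrative assumptions, and Gaussian elimination stands in for whatever KSP/PC would really be used.<br>

```c
#include <assert.h>
#include <stddef.h>

#define MAXN 16 /* illustrative bound so the sketch can use stack storage */

/* Solve the dense n x n system M y = c by Gaussian elimination with
 * partial pivoting. M and c are overwritten. Returns 0 on success,
 * -1 if a zero pivot is encountered. */
int dense_solve(size_t n, double *M, double *c, double *y)
{
  for (size_t k = 0; k < n; k++) {
    size_t piv = k;
    for (size_t i = k + 1; i < n; i++) {
      double a = M[i * n + k] < 0 ? -M[i * n + k] : M[i * n + k];
      double best = M[piv * n + k] < 0 ? -M[piv * n + k] : M[piv * n + k];
      if (a > best) piv = i;
    }
    if (M[piv * n + k] == 0.0) return -1;
    if (piv != k) { /* swap rows k and piv of M and c */
      for (size_t j = 0; j < n; j++) {
        double t = M[k * n + j]; M[k * n + j] = M[piv * n + j]; M[piv * n + j] = t;
      }
      double t = c[k]; c[k] = c[piv]; c[piv] = t;
    }
    for (size_t i = k + 1; i < n; i++) {
      double f = M[i * n + k] / M[k * n + k];
      for (size_t j = k; j < n; j++) M[i * n + j] -= f * M[k * n + j];
      c[i] -= f * c[k];
    }
  }
  for (size_t i = n; i-- > 0;) { /* back substitution */
    double s = c[i];
    for (size_t j = i + 1; j < n; j++) s -= M[i * n + j] * y[j];
    y[i] = s / M[i * n + i];
  }
  return 0;
}

/* Solve A x = b in the ordering given by perm: row/column i of the
 * permuted system is row/column perm[i] of the original. A and b are
 * left untouched; x is returned in the ORIGINAL ordering. */
int solve_reordered(size_t n, const double *A, const double *b,
                    const size_t *perm, double *x)
{
  double B[MAXN * MAXN], c[MAXN], y[MAXN];
  assert(n <= MAXN);
  for (size_t i = 0; i < n; i++) { /* permute on the way in */
    for (size_t j = 0; j < n; j++) B[i * n + j] = A[perm[i] * n + perm[j]];
    c[i] = b[perm[i]];
  }
  if (dense_solve(n, B, c, y) != 0) return -1;
  for (size_t i = 0; i < n; i++) x[perm[i]] = y[i]; /* permute on the way out */
  return 0;
}
```

The caller only ever sees A, b and x in the original ordering; the open question in this thread is which layer should own the two permutation loops.<br>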
<br>
It would be nice to have a cleaner/clearer abstract model in terms of the software layering to handle this. For example I played with the idea of a "diagonal scaling" PC and "reordering" PC that<br>
does the change and then has its own KSP inside for the solve. Thus you'd run with -ksp_type preonly -pc_type reorder -reorder_ksp_type gmres -reorder_pc_type ilu etc. But that seems a bit pedantic<br>
and annoying that you have to put all your "true" solver options with a prefix.<br>
<br>
Jed, what is your solution?<br></blockquote><div><br></div><div>Why not make it part of the matrix? For the minute, assume we are using a DM. Then the</div><div>matrix has the nonzero pattern already. We can use an option to compute a fill-reducing ordering</div>
<div>and either permute it directly, or just apply the permutations on the way in and out. This insulates it from</div><div>the solver completely.</div><div><br></div><div>   Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Barry<br>
<br>
<br>
> The Matrix Market orderings are often so contrived that performance numbers are nearly meaningless.<br>
<br>
><br>
> On Tue, Sep 18, 2012 at 8:05 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
><br>
> Good paper, <a href="http://www.epcc.ed.ac.uk/wp-content/uploads/2011/11/PramodKumbhar.pdf" target="_blank">http://www.epcc.ed.ac.uk/wp-content/uploads/2011/11/PramodKumbhar.pdf</a>, worth reading<br>
><br>
><br>
> On Sep 18, 2012, at 7:46 PM, C. Bergström <<a href="mailto:cbergstrom@pathscale.com">cbergstrom@pathscale.com</a>> wrote:<br>
><br>
> ><br>
> > Hi<br>
> ><br>
> > I'm hoping someone with some spare cycles and patience is willing to help test a nightly ENZO build with petsc.<br>
> ><br>
> > Here's the nightly which won't require a key (It will ask, but it's optional)<br>
> > <a href="http://c591116.r16.cf2.rackcdn.com/enzo/nightly/Linux/enzo-2012-09-18-installer.run" target="_blank">http://c591116.r16.cf2.rackcdn.com/enzo/nightly/Linux/enzo-2012-09-18-installer.run</a><br>
> ><br>
> > For BLAS we're testing against this (and in the future will ship our own built version)<br>
> > <a href="https://github.com/xianyi/OpenBLAS/" target="_blank">https://github.com/xianyi/OpenBLAS/</a><br>
> > ----------<br>
> > I'm specifically looking for feedback on the GPGPU side of this and on performance. The reason anyone would care: we've put a lot of work into performance for memory-bound kernels, predictable latency, and lowest latency. (We don't generate any PTX and go direct to bare-metal codegen tied to our own very small runtime. We officially only support Tesla 2050/2070 cards at this time, but ping me if you have another card you can test with.)<br>
> ><br>
> > You can replace nvcc with pathcu (We don't support the nvcc flags)<br>
> ><br>
> > pathcu -c foo.cu # CUDA (Bugs found should be fixed quickly, but expect bugs - Thrust and CuSP testing also in progress)<br>
> > pathcc/f90 -hmpp # OpenHMPP<br>
> > pathcc/f90 -openacc # OpenACC and the flag will be changed to -acc soon<br>
> ><br>
> > For more details, documentation, and/or bug reports, please email me directly.<br>
> ><br>
> > Cheers,<br>
> ><br>
> ><br>
> > Christopher<br>
> ><br>
> ><br>
><br>
><br>
<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>