We should make it brain-dead simple for KSP to reorder internally and run the solve in a low-bandwidth ordering. The Matrix Market orderings are often so contrived that performance numbers measured in them are nearly meaningless.<br><br><div class="gmail_quote">

On Tue, Sep 18, 2012 at 8:05 PM, Barry Smith <span dir="ltr">&lt;<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

   Good paper, <a href="http://www.epcc.ed.ac.uk/wp-content/uploads/2011/11/PramodKumbhar.pdf" target="_blank">http://www.epcc.ed.ac.uk/wp-content/uploads/2011/11/PramodKumbhar.pdf</a>, worth reading<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

On Sep 18, 2012, at 7:46 PM, C. Bergström &lt;<a href="mailto:cbergstrom@pathscale.com">cbergstrom@pathscale.com</a>&gt; wrote:<br>

<br>

><br>

> Hi<br>

><br>

> I'm hoping someone with some spare cycles and patience is willing to help test a nightly ENZO build with PETSc.<br>

><br>

> Here's the nightly, which won't require a key (it will ask, but it's optional):<br>

> <a href="http://c591116.r16.cf2.rackcdn.com/enzo/nightly/Linux/enzo-2012-09-18-installer.run" target="_blank">http://c591116.r16.cf2.rackcdn.com/enzo/nightly/Linux/enzo-2012-09-18-installer.run</a><br>

><br>

> For BLAS we're testing against this (and in the future will ship our own built version)<br>

> <a href="https://github.com/xianyi/OpenBLAS/" target="_blank">https://github.com/xianyi/OpenBLAS/</a><br>

> ----------<br>

> I'm specifically looking for feedback on the GPGPU side of this and on performance.  Why anyone would care: we've put a lot of work into performance for memory-bound kernels, predictable latency, and the lowest possible latency.  (We don't generate any PTX; we go direct to bare-metal codegen tied to our own very small runtime.  We officially support only Tesla 2050/2070 cards at this time, but ping me if you have another card you can test with.)<br>


><br>

> You can replace nvcc with pathcu (We don't support the nvcc flags)<br>

><br>

> pathcu -c foo.cu # CUDA (Bugs found should be fixed quickly, but expect bugs; Thrust and CuSP testing also in progress)<br>

> pathcc/f90 -hmpp # OpenHMPP<br>

> pathcc/f90 -openacc # OpenACC (the flag will be changed to -acc soon)<br>

><br>

> For more details, documentation, and/or bug reports, please email me directly.<br>

><br>

> Cheers,<br>

><br>

><br>

> Christopher<br>

><br>

><br>

<br>

</div></div></blockquote></div><br>
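To make the point about orderings concrete, here is a minimal, self-contained sketch (plain Python, not PETSc code) of reverse Cuthill-McKee applied to a deliberately scrambled 1D chain. The helper names `bandwidth`, `reverse_cuthill_mckee`, and `scrambled_chain` are made up for illustration, and the scrambled numbering is only an assumed stand-in for a contrived Matrix Market ordering, not taken from any real matrix.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest |pos[i] - pos[j]| over all edges (i, j) under the given ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[i] - pos[j]) for i in adj for j in adj[i]), default=0)


def reverse_cuthill_mckee(adj):
    """Return a reverse Cuthill-McKee ordering of an undirected graph.

    adj maps each vertex to the set of its neighbours, i.e. the
    off-diagonal nonzero pattern of a structurally symmetric matrix.
    """
    degree = {v: len(adj[v]) for v in adj}
    visited = set()
    order = []
    # Handle every connected component, seeding each from a low-degree vertex.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, taking unvisited neighbours by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" in reverse Cuthill-McKee
    return order


def scrambled_chain(n):
    """Tridiagonal (1D chain) pattern numbered 0, n-1, 1, n-2, ... along the chain."""
    labels = []
    lo, hi = 0, n - 1
    while lo <= hi:
        labels.append(lo)
        if hi != lo:
            labels.append(hi)
        lo += 1
        hi -= 1
    adj = {v: set() for v in range(n)}
    for a, b in zip(labels, labels[1:]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


if __name__ == "__main__":
    adj = scrambled_chain(10)
    natural = list(range(10))
    rcm = reverse_cuthill_mckee(adj)
    print("bandwidth in the given numbering:", bandwidth(adj, natural))
    print("bandwidth after RCM:", bandwidth(adj, rcm))
```

In PETSc itself the comparable step would presumably go through `MatGetOrdering` with `MATORDERINGRCM` followed by `MatPermute`; having KSP do that automatically is the part suggested above and is not shown here.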