[petsc-dev] Clarification on Inodes in PETSc

Tue Dec 21 18:43:10 CST 2010

On Wed, Dec 22, 2010 at 00:44, Chetan Jhurani <chetan.jhurani at gmail.com> wrote:

> I need a confirmation that my understanding of Inodes
> in PETSc is correct.  I want to reuse the Inode information
> already computed in PETSc in functions I'd write outside it.
>
>
What would you be doing?

> As I see from the code, Inode algorithms are used when
> adjacent rows in AIJ structure have identical column locations
> for non-zeros.  These locations are always kept in memory for
> all the rows but when doing sparse operations (multiply,
> factorize) column indices from only 1 row for each set
> of Inodes are used.  I assume there is no MatMatMult using
> Inodes because there isn't much benefit if the second
> matrix has a large number of columns.
>
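
To make the description above concrete, here is a rough Python sketch (not
PETSc code) of grouping adjacent CSR rows with identical column patterns and
then doing a multiply that reads the column indices once per group.  The
function names, the `limit` cap and the tiny 3x3 matrix are made up for
illustration; the real implementation differs in detail.

```python
def find_inodes(row_ptr, col_idx, limit=5):
    """Group adjacent CSR rows that share an identical column pattern.

    Returns the list of group sizes, which sum to the number of rows.
    `limit` caps the rows per group (an illustrative choice, not a PETSc value).
    """
    n = len(row_ptr) - 1
    sizes = []
    i = 0
    while i < n:
        pattern = col_idx[row_ptr[i]:row_ptr[i + 1]]
        j = i + 1
        # Extend the group while the next row has the same column indices.
        while (j < n and j - i < limit
               and col_idx[row_ptr[j]:row_ptr[j + 1]] == pattern):
            j += 1
        sizes.append(j - i)
        i = j
    return sizes


def inode_matvec(row_ptr, col_idx, vals, sizes, x):
    """Compute y = A x, reading column indices once per group of rows."""
    y = [0.0] * (len(row_ptr) - 1)
    row = 0
    for size in sizes:
        # Column indices of the first row stand in for the whole group.
        cols = col_idx[row_ptr[row]:row_ptr[row + 1]]
        gathered = [x[c] for c in cols]  # gather entries of x once per group
        for r in range(row, row + size):
            start = row_ptr[r]
            y[r] = sum(vals[start + k] * xk for k, xk in enumerate(gathered))
        row += size
    return y


# Hypothetical 3x3 example: rows 0 and 1 share columns {0, 1, 2}; row 2 has only column 2.
row_ptr = [0, 3, 6, 7]
col_idx = [0, 1, 2, 0, 1, 2, 2]
vals = [4.0, 1.0, 2.0, 1.0, 4.0, 3.0, 5.0]
sizes = find_inodes(row_ptr, col_idx)
y = inode_matvec(row_ptr, col_idx, vals, sizes, [1.0, 2.0, 3.0])
print(sizes, y)
```

Note that the column indices are still stored for every row, as in the
description above; the saving in this sketch is that the inner loop loads
indices and gathers from x once per group rather than once per row.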

> 1. Are the statements above correct?
> 2. Is there anything crucial I'm missing?
> 3. I'll do benchmarking for my problems, but is there
>   some publicly available data of speedups when
>   Inodes are enabled?
>

It is briefly discussed in

@article{gropp2000globalized,
  title={{Globalized Newton-Krylov-Schwarz algorithms and software for
parallel implicit CFD}},
  author={Gropp, W. and Keyes, D. and McInnes, L.C. and Tidriri, M.D.},
  journal={International Journal of High Performance Computing
Applications},
  volume={14},
  number={2},
  pages={102},
  year={2000}
}

and there is more discussion of blocking in

@inproceedings{gropp2000pmt,
  author = {Gropp, William D. and Kaushik, Dinesh K. and Keyes, David E. and
Smith, Barry},
  title = {Performance modeling and tuning of an unstructured mesh CFD
application},
  booktitle = {Supercomputing '00: Proceedings of the 2000 ACM/IEEE
conference on Supercomputing (CDROM)},
  year = {2000},
  isbn = {0-7803-9802-5},
  pages = {34},
  location = {Dallas, Texas, United States},
  publisher = {IEEE Computer Society},
  address = {Washington, DC, USA},
}

These numbers are still fairly reliable, but hardware has changed somewhat,
the factored matrix format has changed, and prefetch instructions have been
added.  Additionally, everything is at least somewhat problem-specific, so
you should run your own benchmarks.  I usually see about a 50% speedup when
using inodes for a multi-component problem.  If you can make your problem
use a fixed block size, BAIJ can give you an extra 50% or so.

This is for a 3D 2-component symmetric problem I ran recently; the AIJ
columns use inodes.

http://i.imgur.com/DSl0H.png

If you use unstructured grids, it is well worth your time to compute a
low-bandwidth ordering for the unknowns (RCM is usually pretty good).  That
can make a big throughput difference and often improves the effectiveness
of incomplete factorization as well.
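
For a sense of what a low-bandwidth ordering does, here is a small
self-contained Python sketch of reverse Cuthill-McKee on an adjacency list.
It is a simplification: it starts each component from a minimum-degree vertex
rather than searching for a pseudo-peripheral one, and the function names and
the 5-vertex graph are made up for illustration.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest |pos[i] - pos[j]| over all edges (i, j) under `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[i] - pos[j]) for i in adj for j in adj[i]), default=0)


def reverse_cuthill_mckee(adj):
    """Return a reverse Cuthill-McKee ordering of the vertices of `adj`.

    `adj` maps each vertex to a list of its neighbours (symmetric graph).
    """
    degree = {v: len(adj[v]) for v in adj}
    visited = set()
    order = []
    # Start each connected component from a minimum-degree vertex.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting low-degree neighbours first.
            for w in sorted(adj[v], key=lambda u: (degree[u], u)):
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    order.reverse()  # the "reverse" in RCM
    return order


# Hypothetical example: the path 0-3-1-4-2 with a poor natural numbering.
adj = {0: [3], 1: [3, 4], 2: [4], 3: [0, 1], 4: [1, 2]}
natural = sorted(adj)
rcm = reverse_cuthill_mckee(adj)
print(bandwidth(adj, natural), bandwidth(adj, rcm))
```

On this toy graph the reordering brings every pair of coupled unknowns next to
each other; on a real unstructured mesh the gain depends on the mesh and the
original numbering.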