[petsc-dev] Clarification on Inodes in PETSc
Jed Brown
jed at 59A2.org
Tue Dec 21 18:43:10 CST 2010
On Wed, Dec 22, 2010 at 00:44, Chetan Jhurani <chetan.jhurani at gmail.com> wrote:
> I need a confirmation that my understanding of Inodes
> in PETSc is correct. I want to reuse the Inode information
> already computed in PETSc in functions I'd write outside it.
>
>
What would you be doing?
> As I see from the code, Inode algorithms are used when
> adjacent rows in AIJ structure have identical column locations
> for non-zeros. These locations are always kept in memory for
> all the rows but when doing sparse operations (multiply,
> factorize) column indices from only 1 row for each set
> of Inodes are used. I assume there is no MatMatMult using
> Inodes because there isn't much benefit if the second
> matrix has a large number of columns.
>
> 1. Are the statements above correct?
> 2. Is there anything crucial I'm missing?
> 3. I'll do benchmarking for my problems, but is there
> some publicly available data of speedups when
> Inodes are enabled?
>
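A minimal sketch of the scheme described in the quoted question, in plain C and independent of PETSc. Consecutive rows whose column index lists match are grouped into an inode, and the multiply then loads each column index and each x entry once per group instead of once per row. The CSR struct, the function names and the INODE_MAX cap are hypothetical choices for illustration. PETSc's Mat_SeqAIJ internals and its actual inode kernels differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INODE_MAX 5 /* illustrative cap on rows per group (hypothetical) */

/* Compressed sparse row storage, the layout AIJ uses for a local matrix.
   Hypothetical struct for this sketch, not PETSc's internal type. */
typedef struct {
  int m;              /* number of rows */
  const int *rowptr;  /* length m+1 */
  const int *colidx;  /* length rowptr[m] */
  const double *val;  /* length rowptr[m] */
} CSR;

/* Group consecutive rows with identical column patterns.
   Writes each group's row count to sizes[] (needs room for m entries)
   and returns the number of groups. */
int find_inodes(const CSR *A, int *sizes) {
  int ngroups = 0, i = 0;
  while (i < A->m) {
    int start = A->rowptr[i];
    int len = A->rowptr[i + 1] - start;
    int j = i + 1;
    /* extend the group while row j has the same length and column indices */
    while (j < A->m && j - i < INODE_MAX &&
           A->rowptr[j + 1] - A->rowptr[j] == len &&
           (len == 0 || memcmp(A->colidx + start, A->colidx + A->rowptr[j],
                               (size_t)len * sizeof(int)) == 0))
      j++;
    sizes[ngroups++] = j - i;
    i = j;
  }
  return ngroups;
}

/* y = A*x using the groups found above: column indices are read only
   from the first row of each group, values are still read row by row. */
void inode_matvec(const CSR *A, const int *sizes, int ngroups,
                  const double *x, double *y) {
  int row = 0;
  for (int g = 0; g < ngroups; g++) {
    int s = sizes[g];
    assert(s >= 1 && s <= INODE_MAX);
    int start = A->rowptr[row];
    int len = A->rowptr[row + 1] - start;
    const int *cols = A->colidx + start; /* shared by all s rows */
    double sum[INODE_MAX] = {0};
    for (int k = 0; k < len; k++) {
      double xk = x[cols[k]]; /* one index and one x load per group */
      for (int r = 0; r < s; r++)
        sum[r] += A->val[A->rowptr[row + r] + k] * xk;
    }
    for (int r = 0; r < s; r++) y[row + r] = sum[r];
    row += s;
  }
}
```

The grouping is computed once after assembly and reused for every multiply. Since values remain stored per row, the savings come from index and x traffic, not from the values themselves.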
It is briefly discussed in
@article{gropp2000globalized,
title={{Globalized Newton-Krylov-Schwarz algorithms and software for
parallel implicit CFD}},
author={Gropp, W. and Keyes, D. and McInnes, L.C. and Tidriri, M.D.},
journal={International Journal of High Performance Computing
Applications},
volume={14},
number={2},
pages={102},
year={2000}
}
and there is more discussion of blocking in
@inproceedings{gropp2000pmt,
author = {Gropp, William D. and Kaushik, Dinesh K. and Keyes, David E. and
Smith, Barry},
title = {Performance modeling and tuning of an unstructured mesh CFD
application},
booktitle = {Supercomputing '00: Proceedings of the 2000 ACM/IEEE
conference on Supercomputing (CDROM)},
year = {2000},
isbn = {0-7803-9802-5},
pages = {34},
location = {Dallas, Texas, United States},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}
These numbers are still fairly reliable, but hardware has changed somewhat,
the factored matrix format has changed, and prefetch instructions have been
added. Additionally, everything is at least somewhat problem-specific so
you should run your own benchmarks. I usually see about a 50% speedup when
using inodes for a multi-component problem. If you can make your problem
use a fixed block size, BAIJ can give you an extra 50% or so.
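To illustrate where the extra gain from a fixed block size comes from, here is a generic block CSR (BSR) multiply. It assumes every nonzero is a dense bs-by-bs block stored row-major, so only one column index is stored and loaded per block, and the small fixed-size inner loops are easy for a compiler to unroll. The BSR struct and bsr_matvec are hypothetical names. This is not PETSc's BAIJ code, which uses its own tuned kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Block CSR storage with a fixed block size bs (hypothetical layout).
   Each nonzero block holds bs*bs values in row-major order. */
typedef struct {
  int mb;              /* number of block rows */
  int bs;              /* block size, e.g. number of field components */
  const int *browptr;  /* length mb+1 */
  const int *bcolidx;  /* one block-column index per block */
  const double *val;   /* length browptr[mb]*bs*bs */
} BSR;

/* y = A*x for a BSR matrix; x and y have mb*bs entries. */
void bsr_matvec(const BSR *A, const double *x, double *y) {
  const int bs = A->bs;
  assert(bs > 0);
  for (int ib = 0; ib < A->mb; ib++) {
    double *yb = y + (size_t)ib * bs;
    for (int r = 0; r < bs; r++) yb[r] = 0.0;
    for (int k = A->browptr[ib]; k < A->browptr[ib + 1]; k++) {
      const double *blk = A->val + (size_t)k * bs * bs;
      const double *xb = x + (size_t)A->bcolidx[k] * bs; /* one index per block */
      for (int r = 0; r < bs; r++)
        for (int c = 0; c < bs; c++)
          yb[r] += blk[r * bs + c] * xb[c];
    }
  }
}
```

For a multi-component problem, bs would typically be the number of fields per grid point, for example bs = 2 in the 2-component case.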
This plot is for a 3D, 2-component symmetric problem I ran recently; the AIJ
columns use inodes.
http://i.imgur.com/DSl0H.png
If you use unstructured grids, it is well worth your time to compute a
low-bandwidth ordering for the unknowns (RCM is usually pretty good); that
can make a big difference in throughput and often improves the effectiveness
of incomplete factorization as well.
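A minimal reverse Cuthill-McKee sketch on a symmetric adjacency graph in CSR form, with a helper that measures the bandwidth an ordering produces. As a simplification, it starts each connected component from an unvisited vertex of minimum degree rather than searching for a pseudo-peripheral vertex. rcm and bandwidth are hypothetical names. Within PETSc itself, an RCM ordering is available as MATORDERINGRCM through MatGetOrdering.

```c
#include <assert.h>
#include <stdlib.h>

static int degree(const int *rowptr, int v) { return rowptr[v + 1] - rowptr[v]; }

/* Reverse Cuthill-McKee: perm[k] = original index of the vertex placed at k. */
void rcm(int n, const int *rowptr, const int *adj, int *perm) {
  char *visited = calloc((size_t)n, 1);
  if (!visited && n > 0) abort();
  int head = 0, tail = 0;
  while (tail < n) {
    /* start each component from an unvisited vertex of minimum degree */
    int start = -1;
    for (int v = 0; v < n; v++)
      if (!visited[v] && (start < 0 || degree(rowptr, v) < degree(rowptr, start)))
        start = v;
    visited[start] = 1;
    perm[tail++] = start;
    /* breadth-first search, queueing neighbors by increasing degree */
    while (head < tail) {
      int v = perm[head++];
      int first = tail;
      for (int k = rowptr[v]; k < rowptr[v + 1]; k++) {
        int w = adj[k];
        if (!visited[w]) { visited[w] = 1; perm[tail++] = w; }
      }
      for (int i = first + 1; i < tail; i++) { /* insertion sort by degree */
        int w = perm[i], j = i;
        while (j > first && degree(rowptr, perm[j - 1]) > degree(rowptr, w)) {
          perm[j] = perm[j - 1];
          j--;
        }
        perm[j] = w;
      }
    }
  }
  free(visited);
  for (int i = 0, j = n - 1; i < j; i++, j--) { /* reverse the order */
    int t = perm[i]; perm[i] = perm[j]; perm[j] = t;
  }
}

/* Largest |new(i) - new(j)| over all edges (i, j) under ordering perm. */
int bandwidth(int n, const int *rowptr, const int *adj, const int *perm) {
  int *pos = malloc((size_t)n * sizeof(int)); /* pos[old] = new */
  if (!pos && n > 0) abort();
  for (int k = 0; k < n; k++) pos[perm[k]] = k;
  int bw = 0;
  for (int v = 0; v < n; v++)
    for (int k = rowptr[v]; k < rowptr[v + 1]; k++) {
      int d = abs(pos[v] - pos[adj[k]]);
      if (d > bw) bw = d;
    }
  free(pos);
  return bw;
}
```

On a scrambled path graph, for instance, the natural numbering can have bandwidth 3 while the RCM ordering brings it down to 1, which keeps the entries of x touched by neighboring rows close together in memory.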