<div class="gmail_quote">On Wed, Dec 22, 2010 at 00:44, Chetan Jhurani <span dir="ltr"><<a href="mailto:chetan.jhurani@gmail.com">chetan.jhurani@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div id=":566">I need a confirmation that my understanding of Inodes<br>
in PETSc is correct. I want to reuse the Inode information<br>
already computed in PETSc in functions I'd write outside it.<br>
<br></div></blockquote><div><br></div><div>What would you be doing?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div id=":566">
> As I see from the code, Inode algorithms are used when adjacent rows
> in the AIJ structure have identical column locations for their
> nonzeros. These locations are always kept in memory for all rows, but
> sparse operations (multiply, factorize) use the column indices from
> only one row of each Inode set. I assume there is no MatMatMult using
> Inodes because there isn't much benefit when the second matrix has a
> large number of columns.
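
If the goal is to reuse the layout PETSc has already computed, the
inode count and sizes are exposed through MatInodeGetInodeSizes(). A
minimal sketch (the helper name is mine; it assumes an assembled
SeqAIJ matrix):

  #include <petscmat.h>

  /* Print the inode layout of an assembled SeqAIJ matrix.  The sizes
     array is owned by PETSc, so do not free it. */
  PetscErrorCode DumpInodeLayout(Mat A)
  {
    PetscInt       node_count = 0,limit = 0,i;
    PetscInt       *sizes     = PETSC_NULL;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatInodeGetInodeSizes(A,&node_count,&sizes,&limit);CHKERRQ(ierr);
    if (!sizes) PetscFunctionReturn(0); /* matrix is not using inodes */
    for (i=0; i<node_count; i++) {
      ierr = PetscPrintf(PETSC_COMM_SELF,"inode %D has %D rows\n",i,sizes[i]);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }
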
> 1. Are the statements above correct?
> 2. Is there anything crucial I'm missing?
> 3. I'll do benchmarking for my problems, but is there some publicly
> available data of speedups when Inodes are enabled?

It is briefly discussed in

@article{gropp2000globalized,
  title   = {{Globalized Newton-Krylov-Schwarz algorithms and software for parallel implicit CFD}},
  author  = {Gropp, W. and Keyes, D. and McInnes, L. C. and Tidriri, M. D.},
  journal = {International Journal of High Performance Computing Applications},
  volume  = {14},
  number  = {2},
  pages   = {102},
  year    = {2000}
}

and there is more discussion of blocking in

@inproceedings{gropp2000pmt,
  author    = {Gropp, William D. and Kaushik, Dinesh K. and Keyes, David E. and Smith, Barry},
  title     = {Performance modeling and tuning of an unstructured mesh CFD application},
  booktitle = {Supercomputing '00: Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (CDROM)},
  year      = {2000},
  isbn      = {0-7803-9802-5},
  pages     = {34},
  location  = {Dallas, Texas, United States},
  publisher = {IEEE Computer Society},
  address   = {Washington, DC, USA}
}

These numbers are still fairly reliable, but hardware has changed
somewhat, the factored matrix format has changed, and prefetch
instructions have been added. Additionally, everything is at least
somewhat problem-specific, so you should run your own benchmarks. I
usually see about a 50% speedup from inodes on a multi-component
problem. If you can make your problem use a fixed block size, BAIJ can
give you an extra 50% or so.
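
For a quick comparison on your own problem, these variants can be
toggled from the command line, assuming the matrix is created with
MatCreate() and MatSetFromOptions() ("app" stands in for your
executable):

  ./app -mat_type aij               # the default; inodes on
  ./app -mat_type aij -mat_no_inode # inode handling disabled
  ./app -mat_type baij              # blocked format; block size given
                                    # in MatSeqBAIJSetPreallocation()

Comparing the first two isolates the benefit of the inodes themselves.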

This is for a 3D, 2-component symmetric problem I ran recently; the AIJ
columns use inodes.

http://i.imgur.com/DSl0H.png

If you use unstructured grids, it is well worth your time to compute a
low-bandwidth ordering for the unknowns (RCM is usually pretty good);
that can make a big difference in throughput and often improves the
effectiveness of incomplete factorization as well.
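
A sketch of what I mean, for a sequential assembled matrix; "rcm" is
the ordering name MatGetOrdering() understands. (If you only want the
reordering inside an incomplete factorization, the option
-pc_factor_mat_ordering_type rcm does it without touching your code.)

  #include <petscmat.h>

  /* Permute a matrix into reverse Cuthill-McKee order.  The permuted
     matrix is returned in *Aperm; permute the right-hand side
     consistently, e.g. with VecPermute(). */
  PetscErrorCode PermuteRCM(Mat A,Mat *Aperm)
  {
    IS             rperm,cperm;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatGetOrdering(A,"rcm",&rperm,&cperm);CHKERRQ(ierr);
    ierr = MatPermute(A,rperm,cperm,Aperm);CHKERRQ(ierr);
    ierr = ISDestroy(rperm);CHKERRQ(ierr); /* ISDestroy(&rperm) in petsc-3.2 and later */
    ierr = ISDestroy(cperm);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }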