<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 1, 2015 at 4:34 PM, Barry Smith <span dir="ltr">&lt;<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div id=":3yt" class="" style="overflow:hidden">  I suggest you run the following experiment; run with ONE process but use -pc_type bjacobi -sub_pc_type ilu -pc_bjacobi_blocks &lt;blocks&gt; where you use for &lt;blocks&gt; 1 up to 24 and then get the number of iterations needed for each (don't worry about the time it takes, this is done for understanding of the convergence). Send the table of<br></div></blockquote></div><br><div class="gmail_default" style="color:rgb(7,55,99)">Hi Barry,</div><div class="gmail_default" style="color:rgb(7,55,99)"><br></div><div class="gmail_default" style="color:rgb(7,55,99)">Thanks for the feedback! I switched to SuperLU_dist (-ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist) and saw an immediate speedup for every processor count. I'll try your experiment and see what comes up.</div><div class="gmail_default" style="color:rgb(7,55,99)"><br></div><div class="gmail_default" style="color:rgb(7,55,99)">Thanks!</div><br><br clear="all"><div><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div>--</div><div><br></div><div>José Abell </div><div>PhD Candidate</div><div>Computational Geomechanics Group</div><div><span style="font-size:12.7272720336914px">Dept. of Civil and Environmental Engineering</span><br></div><div>UC Davis</div><div><br></div></div></div></div></div></div></div>

</div></div>