> On Aug 8, 2025, at 4:28 PM, Matthew Knepley <knepley@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 12:22 PM Frank Bramkamp <bramkamp@nsc.liu.se> wrote:
>
>> Dear PETSc Team,
>>
>> I have the following general questions.
>> When using GMRES with classical Gram-Schmidt, you have an optional refinement.
>>
>> The default is not to use any refinement, is that correct?
>
> Yes. It has been this way at least since 2013, judging from the log.
>
>> So you use the standard classical Gram-Schmidt
>> without any additional refinement step.
>>
>> I originally thought that you used one refinement step, if needed, as the default. Or was that changed at some point,
>> and I am remembering it from older settings?
>
> It is available, but we changed the default since the refinement was almost never happening.
>
>> With refinement one also needs about twice the MPI communication, if refinement is done in each iteration,
>> which can be a bit more costly due to the MPI_Allreduce for the dot products.
>
> It is the reduction that is expensive on large runs.
>
>> I solve the compressible Navier-Stokes equations with turbulence. In which cases, in your experience, should one
>> rather use a refinement step? So far I have typically used the default options, which have seemed to work well for me.
>
> If it is converging, everything is likely fine. You can always check the true residual to detect breakdown, or turn on refinement if necessary with the option.
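
For completeness: the runtime option is -ksp_gmres_cgs_refinement_type <refine_never, refine_ifneeded, refine_always>, and -ksp_monitor_true_residual prints the true residual at each iteration. Below is a minimal sketch of selecting the refinement from code instead of the options database (the solve itself is elided; the explicit orthogonalization call is only for clarity, since classical Gram-Schmidt is already the GMRES default).

#include <petscksp.h>

int main(int argc, char **argv)
{
  KSP ksp;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetType(ksp, KSPGMRES));
  /* classical Gram-Schmidt, refining only when the orthogonality estimate says it is needed;
     the default is KSP_GMRES_CGS_REFINEMENT_NEVER */
  PetscCall(KSPGMRESSetOrthogonalization(ksp, KSPGMRESClassicalGramSchmidtOrthogonalization));
  PetscCall(KSPGMRESSetCGSRefinementType(ksp, KSP_GMRES_CGS_REFINEMENT_IFNEEDED));
  PetscCall(KSPSetFromOptions(ksp)); /* command-line options still override these calls */
  /* ... KSPSetOperators() and KSPSolve() as usual ... */
  PetscCall(KSPDestroy(&ksp));
  PetscCall(PetscFinalize());
  return 0;
}
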
>> For classical Gram-Schmidt one also has to compute multiple dot products, which is VecMDot in PETSc, if I see this correctly.
>
> Yes.
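
The same call is available from user code, and all of the dot products against one vector are done with a single reduction. A small standalone sketch (the sizes and values here are just placeholders):

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec         x, *V;
  PetscScalar h[4];
  PetscInt    i;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetSizes(x, PETSC_DECIDE, 100)); /* placeholder size */
  PetscCall(VecSetFromOptions(x));
  PetscCall(VecSet(x, 1.0));
  /* per the explanation further below, the default VecDuplicateVecs() draws all four arrays from one allocation */
  PetscCall(VecDuplicateVecs(x, 4, &V));
  for (i = 0; i < 4; i++) PetscCall(VecSet(V[i], (PetscScalar)(i + 1)));
  /* h[i] = x . V[i] for all i at once, i.e. one MPI_Allreduce */
  PetscCall(VecMDot(x, 4, V, h));
  for (i = 0; i < 4; i++) PetscCall(PetscPrintf(PETSC_COMM_WORLD, "h[%d] = %g\n", (int)i, (double)PetscRealPart(h[i])));
  PetscCall(VecDestroyVecs(4, &V));
  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize());
  return 0;
}
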
>> Do you by default use BLAS GEMV to compute multiple dot products at once, or do you typically prefer your
>> own custom implementation? I think you also have a version where you manually unroll the loops by a factor of 4 or so.
>> I am not quite sure what the typical default for this is, or in which situations one would not use GEMV.
>
> It turns out that Junchao did extensive tests on this last year (I believe). By default, we use GEMV for MDot,

Frank,

A little more detail on how this is done and how to take advantage of it.

VecMultiDot_Seq_GEMV() checks whether the array addresses of the vectors all have the same offset from the previous vector's array. If they do, Xgemv() can be used. Remember that Xgemv() is a Fortran routine whose two-dimensional array arguments are stored in column-major order, with the lda argument indicating the allocated length of each column (which can be larger than the length of the column actually used). So Xgemv() can only be used if it is "as if" all the columns share the same lda.

Now, if the vectors were obtained by separate calls to VecCreate(), their array addresses would have no relationship to each other, because each array came from a separate malloc(), so one could not take advantage of Xgemv(). But KSPGMRES (and friends) obtain their vectors inside KSPSolve_GMRES() with VecDuplicateVecs(), and the default VecDuplicateVecs() uses a single malloc() for all the vector arrays it creates, so each vector's array sits immediately after the previous vector's array. Hence all the arrays have the same offset from the previous one, and Xgemv() can be used.

Finally, to take maximum advantage of Xgemv(), you want KSPGMRES to allocate all of its work vectors together (for example, with the default restart of 30 you would want all 30 vectors allocated together, so that Xgemv() can be applied to all 30 at once). You can ensure this with the option -ksp_gmres_preallocate. We do not do this by default because, with a good preconditioner, one often converges before even reaching the restart; instead we allocate GMRES_DELTA_DIRECTIONS (which is 10) vectors at a time, as needed. It turns out that calling Xgemv() three times with 10 arrays each is not much slower than calling it once with 30 arrays, so turning on -ksp_gmres_preallocate does not buy you much, though you might see a nonzero improvement in the time. So if you are hunting for that final 1 percent of time, make sure you have a good restart number for your problem (that is the important part, and I only know how to find it by experimentation) and use -ksp_gmres_preallocate; a short code sketch of this is included further below.

  Barry

> but not for MAXPY (since performance there was worse with the vendor GEMV). It is difficult (or impossible) to tell which algorithm will be better without testing, so there are options to select each one.
>
>   Thanks,
>
>      Matt
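
Here is the sketch mentioned above: setting the restart and preallocating the GMRES work vectors from code rather than with the command-line options. KSPGMRESSetRestart() and KSPGMRESSetPreAllocateVectors() are the programmatic counterparts of -ksp_gmres_restart and -ksp_gmres_preallocate; the restart of 60 below is only a placeholder, since the right value is problem-dependent and found by experiment.

#include <petscksp.h>

int main(int argc, char **argv)
{
  KSP ksp;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetType(ksp, KSPGMRES));
  PetscCall(KSPGMRESSetRestart(ksp, 60));        /* 60 is a placeholder; tune the restart by experiment */
  PetscCall(KSPGMRESSetPreAllocateVectors(ksp)); /* allocate all restart vectors in one block up front */
  PetscCall(KSPSetFromOptions(ksp));             /* command-line options can still override these */
  /* ... KSPSetOperators() and KSPSolve() as usual ... */
  PetscCall(KSPDestroy(&ksp));
  PetscCall(PetscFinalize());
  return 0;
}
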
>> Thanks a lot. I wish you a nice weekend, Frank
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than
> any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/