<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><br class=""></div> On one system you get this error, on another system with the identical code and test case you do not get the error?<div class=""><br class=""></div><div class=""> You get it with three iterative methods but not with MUMPS?<br class=""><div class=""><br class=""></div><div class="">Barry</div><div class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo <<a href="mailto:ajaramillopalma@gmail.com" class="">ajaramillopalma@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Hello Barry, Matthew, thanks for the replies !</div><div class=""><br class=""></div><div class="">Yes, it is our custom code, and it also happens when setting -pc_type bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues.</div><div class=""><br class=""></div><div class="">Running the ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any problem.<br class=""><br class=""></div><div class="">To reproduce the situation on my computer, I was able to reproduce the error for a small case and -pc_type bjacobi. 
For that particular case, when running on the cluster the error appears at the very last iteration:</div><div class=""><br class=""></div><div class="">=====<br class=""></div><div class=""> 27 KSP Residual norm 8.230378644666e-06 <br class="">[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br class="">[0]PETSC ERROR: Invalid argument<br class="">[0]PETSC ERROR: Scalar value must be same on all processes, argument # 3</div><div class="">====</div><div class=""><br class=""></div><div class="">whereas running on my computer the error is not raised and convergence is reached instead:</div><div class=""><br class=""></div><div class="">====<br class="">Linear interp_ solve converged due to CONVERGED_RTOL iterations 27</div><div class="">====</div><div class=""><br class=""></div><div class="">I will run valgrind to look for possible memory corruption.</div><div class=""><br class=""></div><div class="">thank you<br class=""></div><div class="">Alfredo<br class=""></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Aug 24, 2020 at 9:00 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank" class="">bsmith@petsc.dev</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div dir="auto" class=""><div class=""><br class=""></div>  Oh yes, it could happen with NaN. 
<div class=""><br class=""></div><div class="">  KSPGMRESClassicalGramSchmidtOrthogonalization() calls KSPCheckDot(ksp,lhh[j]); so it should detect any NaN that appears and set ksp->convergedreason, but the call to VecMAXPY() is still made before returning and hence produces the error message.</div><div class=""><br class=""></div><div class="">  We should short-circuit the orthogonalization as soon as it sees a NaN/Inf and return immediately for GMRES to clean up and produce a very useful error message. </div><div class=""><br class=""></div><div class="">  Alfredo,</div><div class=""><br class=""></div><div class="">   It is also possible that the hypre preconditioners are producing a NaN because your matrix is too difficult for them to handle, but it would be odd for that to happen after many iterations.</div><div class=""><br class=""></div><div class="">   As I suggested before, run with -pc_type bjacobi to see if you get the same problem.</div><div class=""><br class=""></div><div class="">  Barry</div><div class=""><br class=""></div><div class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Aug 24, 2020, at 6:38 PM, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank" class="">knepley@gmail.com</a>> wrote:</div><br class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class="">On Mon, Aug 24, 2020 at 6:27 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank" class="">bsmith@petsc.dev</a>> wrote:<br class=""></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br class="">
Alfredo,<br class="">
<br class="">
   This should never happen. The input to the VecMAXPY in GMRES is computed via VecMDot, which produces the same result on all processes.<br class="">
<br class="">
If you run with -pc_type bjacobi does it also happen?<br class="">
<br class="">
   Is this your custom code, or does it also happen in PETSc examples, such as src/snes/tutorials/ex19 -da_refine 5?<br class="">
<br class="">
Could be memory corruption, can you run under valgrind?<br class=""></blockquote><div class=""><br class=""></div><div class="">Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it.</div><div class=""><br class=""></div><div class=""> Thanks,</div><div class=""><br class=""></div><div class=""> Matt</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Barry<br class="">
<br class="">
<br class="">
> On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo <<a href="mailto:ajaramillopalma@gmail.com" target="_blank" class="">ajaramillopalma@gmail.com</a>> wrote:<br class="">
> <br class="">
> Dear PETSc developers,<br class="">
> <br class="">
> I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options:<br class="">
> <br class="">
> -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor<br class="">
> <br class="">
> If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores the following error appears after some number of iterations:<br class="">
> <br class="">
> [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3<br class="">
> <br class="">
> related to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Can you please give me some insight on this?<br class="">
> <br class="">
> best regards<br class="">
> Alfredo<br class="">
> &lt;Screenshot from 2020-08-24 17-57-52.png&gt;<br class="">
<br class="">
</blockquote></div><br clear="all" class=""><div class=""><br class=""></div>-- <br class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class="">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br class="">-- Norbert Wiener</div><div class=""><br class=""></div><div class=""><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank" class="">https://www.cse.buffalo.edu/~knepley/</a><br class=""></div></div></div></div></div></div></div></div>
</div></blockquote></div><br class=""></div></div></div></blockquote></div>
</div></blockquote></div><br class=""></div></div></body></html>