<div dir="ltr">Thank you; that fixed the problem. I added an<div><br><div>else</div><div><div>{</div><div> PetscCall(VecCUDAReplaceArray(v, NULL));</div><div>}</div></div><div><br></div><div>Thanks,</div><div>Sreeram</div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 17, 2023 at 12:09 PM Barry Smith &lt;<a href="mailto:bsmith@petsc.dev">bsmith@petsc.dev</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br></div> So the "bug" is not as ginormous as I originally thought. It will never produce incorrect results, but it can result in the errors you received.<div><br></div><div> The problem is </div><div><br></div><div><div>if (row_rank == 0)</div><div> {</div><div> PetscCall(VecCUDAReplaceArray(v, d_a));</div><div> }</div><div><br></div><div>The place/replacearray routines are actually collective and need to be called by all MPI processes that own a vector, regardless of the local size. This is because the call can invalidate the previously known norm values that have been cached in the vector. If the norm values are invalidated on some MPI processes but not others, you will get the error you have seen.</div><div><br></div><div> Barry</div><div><br></div><div> I will prepare a branch with better documentation and clearer error handling for this situation.</div><div><br></div><div><br></div><div><br></div><div><br><blockquote type="cite"><div>On Nov 16, 2023, at 6:30 PM, Barry Smith &lt;<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>&gt; wrote:</div><br><div><div><div><br></div> Congratulations, you have found a ginormous bug in PETSc! 
Thanks for the detailed information on the problem.<div><br></div><div> I will post a fix shortly.</div><div><br></div><div> Barry</div><div><br id="m_274998358867683436lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite"><div>On Nov 16, 2023, at 6:19 PM, Sreeram R Venkat &lt;<a href="mailto:srvenkat@utexas.edu" target="_blank">srvenkat@utexas.edu</a>&gt; wrote:</div><br><div><div dir="ltr"><div dir="ltr" class="gmail_signature"><div dir="ltr"><div style="color:rgb(34,34,34)">I have a program which reads a vector from a file into an array, and then uses that array to create a PETSc Vec object. The Vec is defined on the global communicator, but not all processes actually contain entries of it. For example, suppose we have 4 processors, and the vector is of size 10. Rank 0 will contain entries 0-4 and Rank 1 will contain entries 5-9. Ranks 2 and 3 will not have any entries of the Vec.</div><div style="color:rgb(34,34,34)"><br></div><div style="color:rgb(34,34,34)">This Vec is then used as an input to other parts of the code, and those work fine. However, if I try to take the norm of the Vec with VecNorm(), I get the error</div><div style="color:rgb(34,34,34)"><br></div><div style="color:rgb(34,34,34)">`MPI_Allreduce() called in different locations (code lines) on different processors`<br></div><div style="color:rgb(34,34,34)"><br></div><div style="color:rgb(34,34,34)">The stack trace shows that ranks 0 and 1 (from the above example) are still in the VecNorm() function while ranks 2 and 3 have moved on to a later part of the code. If I add a PetscBarrier() after the VecNorm(), I find that the program hangs. </div><div style="color:rgb(34,34,34)"><br></div><div style="color:rgb(34,34,34)">The funny thing is that part of the code duplicates the Vec with VecDuplicate() and assigns to the duplicated vector the result of some computations. The duplicated Vec has the same layout as the original Vec, but taking VecNorm() on the duplicated Vec works fine. 
If I use VecCopy(), however, the copied Vec also causes VecNorm() to hang. I've printed out the original Vec, and there are no corrupted/NaN entries.</div><div style="color:rgb(34,34,34)"><br></div><div style="color:rgb(34,34,34)">I have a temporary workaround where I perturb the original Vec slightly before copying it to another Vec. This causes the program to successfully terminate.</div><div style="color:rgb(34,34,34)"><br></div><div style="color:rgb(34,34,34)">Any advice on how to get VecNorm() working with the original Vec?</div><div style="color:rgb(34,34,34)"><br></div><div style="color:rgb(34,34,34)">Thanks,</div><div style="color:rgb(34,34,34)">Sreeram</div></div></div></div>
</div></blockquote></div><br></div></div></div></blockquote></div><br></div></div></blockquote></div>