<div dir="ltr"><div dir="ltr">On Sat, Feb 25, 2023 at 3:11 PM Mike Michell <<a href="mailto:mi.mike1021@gmail.com">mi.mike1021@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">My apologies for the late follow-up. There was a time conflict. <br><br>A simple example code illustrating the issue I mentioned is attached. The sample code: (1) loads a grid onto a DM, (2) computes the vertex-wise control volume for each node in a median-dual way, (3) performs a halo exchange among procs so that each has complete control-volume values, and (4) prints the field to a .vtu file. As a check, the computed control volume is also compared with the PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see lines 771-793). <br><br>Back to the original problem: I get a proper control-volume field with PETSc 3.18.4, the latest stable release. However, if I use PETSc from the main repo, it gives a strange control-volume field. Something is clearly wrong around the parallel boundaries, so I suspect the halo communication went wrong. To help illustrate, a comparison snapshot is also attached. I guess some part of my code is no longer compatible with PETSc, unless there is a bug in the library. Could I get comments on it?<br></div></div></blockquote><div><br></div><div>I can run your example. The numbers I get for "median-dual volume" do not match the "PETSc volume", and the PETSc volume is correct. Moreover, the median-dual numbers change from run to run, which suggests a memory fault. I compiled it with AddressSanitizer, and it found an error:<br><br> Number of physical boundary edge ... 4 0 <br> Number of physical and parallel boundary edge ... 
4 1 <br> Number of parallel boundary edge ... 0 1 <br>=================================================================<br>==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8<br>READ of size 8 at 0x60300000f0f0 thread T0<br> #0 0x10cf702a7 in MAIN__ test.F90:657<br> #1 0x10cf768ee in main test.F90:43<br> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)<br><br>0x60300000f0f0 is located 0 bytes to the right of 32-byte region [0x60300000f0d0,0x60300000f0f0)<br>allocated by thread T0 here:<br><br>=================================================================<br>==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8<br>READ of size 8 at 0x603000022d40 thread T0<br> #0 0x1068e12a7 in MAIN__ test.F90:657<br> #1 0x1068e78ee in main test.F90:43<br> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)<br><br>0x603000022d40 is located 0 bytes to the right of 32-byte region [0x603000022d20,0x603000022d40)<br>allocated by thread T0 here:<br> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f)<br> #1 0x1068dba71 in MAIN__ test.F90:499<br> #2 0x1068e78ee in main test.F90:43<br> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)<br><br>SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__<br>Shadow bytes around the buggy address:</div><div><br></div><div>which corresponds to</div><div><br> ! midpoint of median-dual face for inner face<br> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell<br> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell<br></div><div><br></div><div>and these were allocated here</div><div><br></div><div> allocate(xc(ncell))<br> allocate(yc(ncell))<br></div><div><br></div><div>Hopefully the error is straightforward to see now: the read at test.F90:657 is just past the end of the ncell-sized array, so nc1 or nc2 must exceed ncell.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Thanks,<br>Mike<br></div><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br>On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Sat, Feb 18, 2023 at 12:00 PM Mike Michell <<a href="mailto:mi.mike1021@gmail.com" target="_blank">mi.mike1021@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">As a follow-up, I tested: <div><br></div><div>(1) The tarball for v3.18.4 from the PETSc GitLab (<a href="https://gitlab.com/petsc/petsc/-/tree/v3.18.4" target="_blank">https://gitlab.com/petsc/petsc/-/tree/v3.18.4</a>) has no issue with DMPlex halo exchange. This version works as I expect.</div><div>(2) A clone of the main branch (git clone <a href="https://gitlab.com/petsc/petsc.git" target="_blank">https://gitlab.com/petsc/petsc.git</a>) has issues with DMPlex halo exchange. Something is suspicious about the main branch, related to the DMPlex halo. The solution field I get is not correct, but it works fine with 1 proc. </div><div><br></div><div>Does anyone have any comments on this issue? 
I am curious whether other DMPlex users see this problem with halo exchange. FYI, I do not declare ghost layers for the halo exchange. </div></div></div></blockquote><div><br></div><div>There should not have been any changes there, and there are definitely tests for this.</div><div><br></div><div>It would be great if you could send something that failed. I could fix it and add it as a test.</div></div></div></blockquote><div><br></div><div>Just to follow up, we have tests of the low-level communication (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have tests that use halo exchange for PDE calculations, for example SNES tutorials ex12, ex13, and ex62. The convergence rates would be off if the halo exchange were wrong. Is there an example similar to your code that fails on your installation? Or is there a way to run your code?</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>Thanks,</div><div>Mike</div></div><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br>Dear PETSc team,<div><br></div><div>I am using PETSc for Fortran with DMPlex. 
I have been using this version of PETSc:</div><div>>>git rev-parse origin</div><div>>>995ec06f924a86c4d28df68d1fdd6572768b0de1</div><div>>>git rev-parse FETCH_HEAD</div><div>>>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b</div><div><br></div><div>There had been no issues before the one with the VTK viewer, which Jed fixed today (<a href="https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735" rel="noreferrer" target="_blank">https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735</a>). </div><div><br></div><div>Since that MR has been merged into the main repo, I pulled the latest version of PETSc (in fact, I cloned it from scratch). But if I use the same configure options as before and run my code, there is an issue with the halo exchange. The code runs without an error message, but it gives a wrong solution field. I guess the issue is related to the graph partitioner or the halo exchange, because if I run the code with 1 proc the solution is correct. I only updated the version of PETSc; there was no change in my own code. Could I get any comments on the issue? I was wondering whether there have been many changes in the halo exchange or the graph partitioning & distribution parts of DMPlex.</div><div><br></div><div>Thanks,</div><div>Mike</div></div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>