[petsc-users] DMPlex Halo Communication or Graph Partitioner Issue

Mike Michell mi.mike1021 at gmail.com
Sun Feb 26 10:32:15 CST 2023


This is what I get from petsc main, which is not correct:
Overall volume computed from median-dual ...
   6.37050098781844
Overall volume computed from PETSc ...
   3.15470053800000


This is what I get from petsc 3.18.4, which is correct:
Overall volume computed from median-dual ...
   3.15470053800000
Overall volume computed from PETSc ...
   3.15470053800000


If there is a problem in the code, it is also strange to me that petsc
3.18.4 gives the correct answer.

Thanks,
Mike


> On Sun, Feb 26, 2023 at 11:19 AM Mike Michell <mi.mike1021 at gmail.com>
> wrote:
>
>> Which version of petsc did you test? With petsc 3.18.4, the median-dual
>> volume gives the same value as the one PETSc computes with
>> DMPlexComputeCellGeometryFVM().
>>
>
> This is only an accident of the data layout. The code you sent writes over
> memory in the local Fortran arrays.
>
>   Thanks,
>
>      Matt
>
>
>>
>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell <mi.mike1021 at gmail.com>
>>> wrote:
>>>
>>>> My apologies for the late follow-up; there was a scheduling conflict.
>>>>
>>>> A simple example code related to the issue I mentioned is attached
>>>> here. The sample code: (1) loads the grid into a dm, (2) computes a
>>>> vertex-wise control volume for each node in a median-dual way, (3) does
>>>> a halo exchange among procs so that the control volume values are
>>>> complete, and (4) writes the field out as a .vtu file. As a check, the
>>>> computed control volume is also compared with the PETSc-computed
>>>> control volume from DMPlexComputeCellGeometryFVM() (see lines 771-793).
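>>>>
>>>> For reference, the usual accumulate-then-scatter pattern for step (3)
>>>> looks roughly like the sketch below (a minimal sketch, assuming a local
>>>> vector vloc and a global vector vglob built from the same section; the
>>>> names are placeholders, not the exact lines from the attached file):
>>>>
>>>>      ! add the partial control volumes from every rank that touches a
>>>>      ! shared vertex into the global vector ...
>>>>      call DMLocalToGlobalBegin(dm, vloc, ADD_VALUES, vglob, ierr)
>>>>      call DMLocalToGlobalEnd(dm, vloc, ADD_VALUES, vglob, ierr)
>>>>      ! ... then copy the completed sums back into the local halo entries
>>>>      call DMGlobalToLocalBegin(dm, vglob, INSERT_VALUES, vloc, ierr)
>>>>      call DMGlobalToLocalEnd(dm, vglob, INSERT_VALUES, vloc, ierr)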
>>>>
>>>> Back to the original problem: I get a proper control volume field
>>>> with PETSc 3.18.4, which is the latest stable release. However, if I use
>>>> PETSc from the main repo, it gives a strange control volume field.
>>>> Something is clearly wrong around the parallel boundaries, so I think
>>>> something went wrong with the halo communication. To help illustrate, a
>>>> comparison snapshot is also attached. I guess a certain part of my code
>>>> is no longer compatible with PETSc, unless there is a bug in the
>>>> library. Could I get comments on this?
>>>>
>>>
>>> I can run your example. The numbers I get for "median-dual volume" do
>>> not match the "PETSc volume", and the PETSc volume is correct. Moreover,
>>> the median-dual numbers change, which suggests a memory fault. I compiled
>>> it using address sanitizer, and it found an error:
>>>
>>>  Number of physical boundary edge ...            4           0
>>>  Number of physical and parallel boundary edge ...            4           0
>>>  Number of parallel boundary edge ...            0           0
>>>  Number of physical boundary edge ...            4           1
>>>  Number of physical and parallel boundary edge ...            4           1
>>>  Number of parallel boundary edge ...            0           1
>>> =================================================================
>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address
>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8
>>> READ of size 8 at 0x603000022d40 thread T0
>>> =================================================================
>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address
>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8
>>> READ of size 8 at 0x60300000f0f0 thread T0
>>>     #0 0x10cf702a7 in MAIN__ test.F90:657
>>>     #1 0x10cf768ee in main test.F90:43
>>>     #0 0x1068e12a7 in MAIN__ test.F90:657
>>>     #1 0x1068e78ee in main test.F90:43
>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>
>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region
>>> [0x60300000f0d0,0x60300000f0f0)
>>> allocated by thread T0 here:
>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>
>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region
>>> [0x603000022d20,0x603000022d40)
>>> allocated by thread T0 here:
>>>     #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f)
>>>     #1 0x1068dba71 in MAIN__ test.F90:499
>>>     #2 0x1068e78ee in main test.F90:43
>>>     #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>
>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__
>>> Shadow bytes around the buggy address:
>>>
>>> which corresponds to
>>>
>>>      ! midpoint of median-dual face for inner face
>>>      axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell
>>>      axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell
>>>
>>> and these were allocated here
>>>
>>>  allocate(xc(ncell))
>>>  allocate(yc(ncell))
>>>
>>> Hopefully the error is straightforward to see now.
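>>>
>>> One way to make that read safe is to size the cell arrays over the
>>> whole height-0 stratum (a minimal sketch, assuming nc1/nc2 can also
>>> refer to overlap/ghost cells; cStart, cEnd, and ncell_tot are
>>> illustrative names, not taken from the attached code):
>>>
>>>      PetscInt cStart, cEnd, ncell_tot
>>>      ! all local cells on this rank, owned and overlap alike
>>>      call DMPlexGetHeightStratum(dm, 0, cStart, cEnd, ierr)
>>>      ncell_tot = cEnd - cStart
>>>      allocate(xc(ncell_tot))
>>>      allocate(yc(ncell_tot))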
>>>
>>>   Thanks,
>>>
>>>     Matt
>>>
>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>>
>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell <mi.mike1021 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> As a follow-up, I tested two versions:
>>>>>>>
>>>>>>> (1) The v3.18.4 tarball from the petsc gitlab (
>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue with the
>>>>>>> DMPlex halo exchange. This version works as I expect.
>>>>>>> (2) A clone of the main branch (git clone https://gitlab.com/petsc/petsc.git)
>>>>>>> has issues with the DMPlex halo exchange. Something is suspicious in
>>>>>>> the main branch related to the DMPlex halo: the solution field I get
>>>>>>> is not correct, although it works fine with 1 proc.
>>>>>>>
>>>>>>> Does anyone have any comments on this issue? I am curious whether
>>>>>>> other DMPlex users see any problem with halo exchange. FYI, I do not
>>>>>>> declare ghost layers for the halo exchange.
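>>>>>>>
>>>>>>> (To be concrete about "no ghost layers": the mesh is distributed with
>>>>>>> zero overlap, roughly as in the sketch below; migSF and dmDist are
>>>>>>> illustrative names, not the exact lines from my code.)
>>>>>>>
>>>>>>>      PetscSF migSF
>>>>>>>      DM      dmDist
>>>>>>>      ! overlap = 0, i.e. no ghost-cell layer; halo values at shared
>>>>>>>      ! points then come only through the point SF
>>>>>>>      call DMPlexDistribute(dm, 0, migSF, dmDist, ierr)
>>>>>>>      if (dmDist .ne. PETSC_NULL_DM) then
>>>>>>>         call PetscSFDestroy(migSF, ierr)
>>>>>>>         call DMDestroy(dm, ierr)
>>>>>>>         dm = dmDist
>>>>>>>      end if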
>>>>>>>
>>>>>>
>>>>>> There should not have been any changes there and there are definitely
>>>>>> tests for this.
>>>>>>
>>>>>> It would be great if you could send something that failed. I could
>>>>>> fix it and add it as a test.
>>>>>>
>>>>>
>>>>> Just to follow up, we have tests of the low-level communication (Plex
>>>>> tests ex1, ex12, ex18, ex29, and ex31), and we have tests that use
>>>>> halo exchange for PDE calculations, for example SNES tutorials ex12,
>>>>> ex13, and ex62. The convergence rates would be off if the halo
>>>>> exchange were wrong. Is there an example similar to your code that is
>>>>> failing on your installation? Or is there a way to run your code?
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>      Matt
>>>>>
>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>>> Dear PETSc team,
>>>>>>>>
>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this
>>>>>>>> version of PETSc:
>>>>>>>> >>git rev-parse origin
>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1
>>>>>>>> >>git rev-parse FETCH_HEAD
>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b
>>>>>>>>
>>>>>>>> There had been no issues before the one with the VTK viewer, which
>>>>>>>> Jed fixed today (
>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735
>>>>>>>> ).
>>>>>>>>
>>>>>>>> Since that MR has been merged into the main repo, I pulled the
>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if
>>>>>>>> I use the same configure options as before and run my code, there is
>>>>>>>> an issue with the halo exchange. The code runs without an error
>>>>>>>> message, but it gives a wrong solution field. I guess the issue is
>>>>>>>> related to the graph partitioner or the halo exchange, because if I
>>>>>>>> run the code with 1 proc, the solution is correct. I only updated the
>>>>>>>> version of PETSc and made no change in my own code. Could I get any
>>>>>>>> comments on the issue? I was wondering if there have been many
>>>>>>>> changes in the halo exchange or the graph partitioning & distribution
>>>>>>>> parts of DMPlex.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mike
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>