[petsc-users] DMPlex Halo Communication or Graph Partitioner Issue

Matthew Knepley knepley at gmail.com
Sun Feb 26 14:04:55 CST 2023


On Sun, Feb 26, 2023 at 2:07 PM Mike Michell <mi.mike1021 at gmail.com> wrote:

> I cannot agree with this argument unless you have also tested with the
> petsc 3.18.4 tarball from https://petsc.org/release/install/download/.
> If the library has an issue, it is trivial that I will see an error from my code.
>
> I ran my code with valgrind and see no error with petsc 3.18.4.
> You can test my code with valgrind or address sanitizer using
> petsc-3.18.4.tar.gz from https://petsc.org/release/install/download/.
> I expect you will see no error.
>
>
> Let me ask my question differently:
> Has any change been made to DMPlexMarkBoundaryFaces() recently? I found
> that the latest petsc does not recognize a parallel (but not physical)
> boundary as a boundary for a distributed dm (line 235 of my example code).
> Because of this, you saw the error from the arrays:
>

The behavior of DMPlexMarkBoundaryFaces() was changed 3 months ago:


https://gitlab.com/petsc/petsc/-/commit/429fa399fc3cd6fd42f3ca9697415d505b9dce5d

I did update the documentation for that function:

  Note:
  This function will use the point `PetscSF` from the input `DM` to exclude
  points on the partition boundary from being marked, unless the
  partition overlap is greater than zero. If you also wish to mark the
  partition boundary, you can use `DMSetPointSF()` to temporarily set it to
  NULL, and then reset it to the original object after the call.

The reason is that, when the function is called in parallel, a label that
also marks the partition boundary is not suitable for applying boundary
conditions. If you want to restore the prior behavior, you can use:

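{
  PetscSF sf;

  /* Borrow the point SF and take a reference, so that detaching it
     from the DM below cannot destroy it */
  PetscCall(DMGetPointSF(dm, &sf));
  PetscCall(PetscObjectReference((PetscObject)sf));
  /* With no point SF, partition-boundary faces are marked as well */
  PetscCall(DMSetPointSF(dm, NULL));
  PetscCall(DMPlexMarkBoundaryFaces(dm, val, label));
  /* Reattach the original SF and drop the extra reference */
  PetscCall(DMSetPointSF(dm, sf));
  PetscCall(PetscSFDestroy(&sf));
}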

  Thanks,

     Matt

>      ! midpoint of median-dual face for inner face
>      axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell
>      axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell
>
> and these were allocated here
>
>  allocate(xc(ncell))
>  allocate(yc(ncell))
>
> as you pointed out. Or has any change been made to the distribution of the dm over procs?
>
> Thanks,
> Mike
>
>
>> On Sun, Feb 26, 2023 at 11:32 AM Mike Michell <mi.mike1021 at gmail.com>
>> wrote:
>>
>>> This is what I get from petsc main which is not correct:
>>> Overall volume computed from median-dual ...
>>>    6.37050098781844
>>> Overall volume computed from PETSc ...
>>>    3.15470053800000
>>>
>>>
>>> This is what I get from petsc 3.18.4 which is correct:
>>> Overall volume computed from median-dual ...
>>>    3.15470053800000
>>> Overall volume computed from PETSc ...
>>>    3.15470053800000
>>>
>>>
>>> If there is a problem in the code, it is also strange for me that petsc
>>> 3.18.4 gives the correct answer
>>>
>>
>> As I said, this can happen due to different layouts in memory. If you run
>> it under valgrind, or address sanitizer, you will see
>> that there is a problem.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Thanks,
>>> Mike
>>>
>>>
>>>> On Sun, Feb 26, 2023 at 11:19 AM Mike Michell <mi.mike1021 at gmail.com>
>>>> wrote:
>>>>
>>>>> Which version of petsc did you test? With petsc 3.18.4, the median-dual
>>>>> volume matches the value PETSc gives from DMPlexComputeCellGeometryFVM().
>>>>>
>>>>
>>>> This is only an accident of the data layout. The code you sent writes
>>>> over memory in the local Fortran arrays.
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>
>>>>>
>>>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell <mi.mike1021 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> My apologies for the late follow-up. There was a time conflict.
>>>>>>>
>>>>>>> A simple example code related to the issue I mentioned is attached
>>>>>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise
>>>>>>> control volume for each node in a median-dual way, (3) halo exchange among
>>>>>>> procs to have complete control volume values, and (4) print out its field
>>>>>>> as a .vtu file. To make sure, the computed control volume is also compared
>>>>>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see
>>>>>>> lines 771-793).
>>>>>>>
>>>>>>> Back to the original problem, I can get a proper control volume
>>>>>>> field with PETSc 3.18.4, which is the latest stable release. However, if I
>>>>>>> use PETSc from the main repo, it gives a strange control volume field.
>>>>>>> Something is certainly strange around the parallel boundaries, thus I think
>>>>>>> something went wrong with halo communication. To help understand, a
>>>>>>> comparing snapshot is also attached. I guess a certain part of my code is
>>>>>>> no longer compatible with PETSc unless there is a bug in the library. Could
>>>>>>> I get comments on it?
>>>>>>>
>>>>>>
>>>>>> I can run your example. The numbers I get for "median-dual volume" do
>>>>>> not match the "PETSc volume", and the PETSc volume is correct. Moreover,
>>>>>> the median-dual numbers change, which suggests a memory fault. I compiled
>>>>>> it using address sanitizer, and it found an error:
>>>>>>
>>>>>>  Number of physical boundary edge ...            4           0
>>>>>>  Number of physical and parallel boundary edge ...            4           0
>>>>>>  Number of parallel boundary edge ...            0           0
>>>>>>  Number of physical boundary edge ...            4           1
>>>>>>  Number of physical and parallel boundary edge ...            4           1
>>>>>>  Number of parallel boundary edge ...            0           1
>>>>>> =================================================================
>>>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address
>>>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8
>>>>>> READ of size 8 at 0x603000022d40 thread T0
>>>>>> =================================================================
>>>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address
>>>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8
>>>>>> READ of size 8 at 0x60300000f0f0 thread T0
>>>>>>     #0 0x10cf702a7 in MAIN__ test.F90:657
>>>>>>     #1 0x10cf768ee in main test.F90:43
>>>>>>     #0 0x1068e12a7 in MAIN__ test.F90:657
>>>>>>     #1 0x1068e78ee in main test.F90:43
>>>>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>>
>>>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region
>>>>>> [0x60300000f0d0,0x60300000f0f0)
>>>>>> allocated by thread T0 here:
>>>>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>>
>>>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region
>>>>>> [0x603000022d20,0x603000022d40)
>>>>>> allocated by thread T0 here:
>>>>>>     #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f)
>>>>>>     #1 0x1068dba71 in MAIN__ test.F90:499
>>>>>>     #2 0x1068e78ee in main test.F90:43
>>>>>>     #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>>
>>>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__
>>>>>> Shadow bytes around the buggy address:
>>>>>>
>>>>>> which corresponds to
>>>>>>
>>>>>>      ! midpoint of median-dual face for inner face
>>>>>>      axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell
>>>>>>      axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell
>>>>>>
>>>>>> and these were allocated here
>>>>>>
>>>>>>  allocate(xc(ncell))
>>>>>>  allocate(yc(ncell))
>>>>>>
>>>>>> Hopefully the error is straightforward to see now.
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>     Matt
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley <knepley at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell <
>>>>>>>>> mi.mike1021 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> As a follow-up, I tested:
>>>>>>>>>>
>>>>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab (
>>>>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on
>>>>>>>>>> DMPlex halo exchange. This version works as I expect.
>>>>>>>>>> (2) Clone main branch (git clone
>>>>>>>>>> https://gitlab.com/petsc/petsc.git) has issues with DMPlex halo
>>>>>>>>>> exchange. Something is suspicious about this main branch, related to DMPlex
>>>>>>>>>> halo. The solution field I got is not correct. But it works okay with
>>>>>>>>>> 1-proc.
>>>>>>>>>>
>>>>>>>>>> Does anyone have any comments on this issue? I am curious if
>>>>>>>>>> other DMPlex users have no problem regarding halo exchange. FYI, I do not
>>>>>>>>>> declare ghost layers for halo exchange.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There should not have been any changes there and there are
>>>>>>>>> definitely tests for this.
>>>>>>>>>
>>>>>>>>> It would be great if you could send something that failed. I could
>>>>>>>>> fix it and add it as a test.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Just to follow up, we have tests of the low-level communication
>>>>>>>> (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have
>>>>>>>> tests that use halo exchange for PDE calculations, for example SNES
>>>>>>>> tutorial ex12, ex13, ex62. The convergence rates
>>>>>>>> should be off if the halo exchange were wrong. Is there any example
>>>>>>>> similar to your code that is failing on your installation?
>>>>>>>> Or is there a way to run your code?
>>>>>>>>
>>>>>>>>   Thanks,
>>>>>>>>
>>>>>>>>      Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>>   Thanks,
>>>>>>>>>
>>>>>>>>>      Matt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>
>>>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this
>>>>>>>>>>> version of PETSc:
>>>>>>>>>>> >>git rev-parse origin
>>>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1
>>>>>>>>>>> >>git rev-parse FETCH_HEAD
>>>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b
>>>>>>>>>>>
>>>>>>>>>>> There has been no issue, before the one with VTK viewer, which
>>>>>>>>>>> Jed fixed today (
>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735
>>>>>>>>>>> ).
>>>>>>>>>>>
>>>>>>>>>>> Since that MR has been merged into the main repo, I pulled the
>>>>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if I use
>>>>>>>>>>> the same configure options as before, and run my code, then there is an
>>>>>>>>>>> issue with halo exchange. The code runs without error message, but it gives
>>>>>>>>>>> wrong solution field. I guess the issue I have is related to graph
>>>>>>>>>>> partitioner or halo exchange part. This is because if I run the code with
>>>>>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and
>>>>>>>>>>> there was no change in my own code. Could I get any comments on the issue?
>>>>>>>>>>> I was wondering if there have been many changes in halo exchange or graph
>>>>>>>>>>> partitioning & distributing part related to DMPlex.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Mike
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>>> experiments lead.
>>>>>>>>> -- Norbert Wiener
>>>>>>>>>
>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>