[petsc-users] DMPlex Halo Communication or Graph Partitioner Issue

Mike Michell mi.mike1021 at gmail.com
Sun Feb 26 18:30:41 CST 2023


I will. Thank you very much!


> On Sun, Feb 26, 2023 at 6:56 PM Mike Michell <mi.mike1021 at gmail.com>
> wrote:
>
>> Okay, that was the part of the code incompatible with the latest petsc.
>> Thank you for the support.
>> I also need to call DMPlexLabelComplete() for the vertices on the
>> parallel and physical boundaries, but I get an error
>> if DMPlexLabelComplete() is called before DMSetPointSF(dm, sf).
>>
>> So I do:
>> {
>>   PetscSF sf;
>>
>>   PetscCall(DMGetPointSF(dm, &sf));
>>   PetscCall(DMSetPointSF(dm, NULL));
>>   PetscCall(DMPlexMarkBoundaryFaces(dm, val, label));
>>   PetscCall(DMSetPointSF(dm, sf));
>>   PetscCall(DMPlexLabelComplete(dm, label));
>> }
>>
>> I believe this flow is okay since the label is already marked on the
>> parallel boundary. But could you please confirm that calling
>> DMPlexLabelComplete() after DMSetPointSF(dm, sf) does not create problems
>> when marking vertices on the parallel boundary?
>>
>
> Yes. The idea is the following. A DMPlex is a collection of serial meshes.
> When an SF is specified, this identifies some mesh points
> on one process with those on another. There are no limits to the
> identification, so you can for instance have overlap.
>
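> For reference, a minimal sketch (not from this thread; it assumes dm is an
> already-distributed DMPlex) of how the point SF can be inspected to see which
> local points are identified with points owned by other ranks:
>
> {
>   PetscSF            sf;
>   PetscInt           nroots, nleaves;
>   const PetscInt    *ilocal;
>   const PetscSFNode *iremote;
>
>   PetscCall(DMGetPointSF(dm, &sf));
>   /* leaf i, i.e. local point ilocal[i] (or point i if ilocal is NULL), is identified
>      with point iremote[i].index owned by rank iremote[i].rank */
>   PetscCall(PetscSFGetGraph(sf, &nroots, &nleaves, &ilocal, &iremote));
>   /* or simply dump the whole graph */
>   PetscCall(PetscSFView(sf, PETSC_VIEWER_STDOUT_WORLD));
> }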
>
>> The solution I get from the code with the above flow is now correct and
>> has no errors with the latest petsc, but I want to double-check.
>>
>
> Excellent. I think it might be possible to do what you want in less code.
> If this is something that you want to pursue at some point,
> please send me an overview email about the calculations you are doing.
>
>   Thanks,
>
>      Matt
>
>
>> Thanks,
>> Mike
>>
>>
>>> On Sun, Feb 26, 2023 at 2:07 PM Mike Michell <mi.mike1021 at gmail.com>
>>> wrote:
>>>
>>>> I cannot agree with this argument unless you also test with the petsc
>>>> 3.18.4 tarball from https://petsc.org/release/install/download/.
>>>> If the library had an issue, I would trivially see an error from my
>>>> code.
>>>>
>>>> I ran my code with valgrind and see no error with petsc
>>>> 3.18.4. You can test my code with valgrind or address sanitizer against
>>>> this version of petsc-3.18.4.tar.gz from
>>>> https://petsc.org/release/install/download/. I expect you will see no
>>>> error.
>>>>
>>>>
>>>> Let me ask my question differently:
>>>> Has any change been made to DMPlexMarkBoundaryFaces() recently? I found
>>>> that the latest petsc does not recognize parallel (but not physical)
>>>> boundaries as boundaries for a distributed dm (line 235 of my example code).
>>>> Because of this, you saw the error from the arrays:
>>>>
>>>
>>> The behavior of DMPlexMarkBoundaryFaces() was changed 3 months ago:
>>>
>>>
>>> https://gitlab.com/petsc/petsc/-/commit/429fa399fc3cd6fd42f3ca9697415d505b9dce5d
>>>
>>> I did update the documentation for that function
>>>
>>>   Note:
>>>   This function will use the point `PetscSF` from the input `DM` to
>>> exclude points on the partition boundary from being marked, unless the
>>>   partition overlap is greater than zero. If you also wish to mark the
>>> partition boundary, you can use `DMSetPointSF()` to temporarily set it to
>>>   NULL, and then reset it to the original object after the call.
>>>
>>> The reason is that, if the partition boundary is included when called in parallel,
>>> the label is no longer suitable for applying boundary conditions. If you want to restore the prior behavior,
>>> you can use:
>>>
>>> {
>>>   PetscSF sf;
>>>
>>>   PetscCall(DMGetPointSF(dm, &sf));
>>>   PetscCall(DMSetPointSF(dm, NULL));
>>>   PetscCall(DMPlexMarkBoundaryFaces(dm, val, label));
>>>   PetscCall(DMSetPointSF(dm, sf));
>>> }
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>>      ! midpoint of median-dual face for inner face
>>>>      axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell
>>>>      axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell
>>>>
>>>> and these were allocated here
>>>>
>>>>  allocate(xc(ncell))
>>>>  allocate(yc(ncell))
>>>>
>>>> as you pointed out. Or has any change been made to the distribution of the dm over procs?
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>>
>>>>> On Sun, Feb 26, 2023 at 11:32 AM Mike Michell <mi.mike1021 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> This is what I get from petsc main which is not correct:
>>>>>> Overall volume computed from median-dual ...
>>>>>>    6.37050098781844
>>>>>> Overall volume computed from PETSc ...
>>>>>>    3.15470053800000
>>>>>>
>>>>>>
>>>>>> This is what I get from petsc 3.18.4 which is correct:
>>>>>> Overall volume computed from median-dual ...
>>>>>>    3.15470053800000
>>>>>> Overall volume computed from PETSc ...
>>>>>>    3.15470053800000
>>>>>>
>>>>>>
>>>>>> If there is a problem in the code, it is also strange to me that
>>>>>> petsc 3.18.4 gives the correct answer.
>>>>>>
>>>>>
>>>>> As I said, this can happen due to different layouts in memory. If you
>>>>> run it under valgrind, or address sanitizer, you will see
>>>>> that there is a problem.
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>      Matt
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>>> On Sun, Feb 26, 2023 at 11:19 AM Mike Michell <mi.mike1021 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Which version of petsc did you test? With petsc 3.18.4, the median-dual
>>>>>>>> volume gives the same value as the PETSc one from DMPlexComputeCellGeometryFVM().
>>>>>>>>
>>>>>>>
>>>>>>> This is only an accident of the data layout. The code you sent
>>>>>>> writes over memory in the local Fortran arrays.
>>>>>>>
>>>>>>>   Thanks,
>>>>>>>
>>>>>>>      Matt
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell <
>>>>>>>>> mi.mike1021 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> My apologies for the late follow-up. There was a time conflict.
>>>>>>>>>>
>>>>>>>>>> A simple example code related to the issue I mentioned is
>>>>>>>>>> attached here. The sample code does the following: (1) loads the grid into a dm, (2) computes
>>>>>>>>>> a vertex-wise control volume for each node in a median-dual way, (3) performs a halo
>>>>>>>>>> exchange among procs to obtain complete control volume values, and (4) prints
>>>>>>>>>> out the field as a .vtu file. As a check, the computed control volume is
>>>>>>>>>> also compared with the PETSc-computed control volume via
>>>>>>>>>> DMPlexComputeCellGeometryFVM() (see lines 771-793).
>>>>>>>>>>
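>>>>>>>>>> For reference, step (3) can be done with the usual pattern of accumulating the
>>>>>>>>>> shared-vertex contributions into a global Vec and scattering the completed values
>>>>>>>>>> back. A minimal sketch (in C, assuming a PetscSection with one dof per vertex has
>>>>>>>>>> been set on the dm; this is the generic pattern, not the attached code) would be:
>>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>>   Vec lvol, gvol;
>>>>>>>>>>
>>>>>>>>>>   PetscCall(DMGetLocalVector(dm, &lvol));  /* per-rank partial control volumes */
>>>>>>>>>>   PetscCall(DMGetGlobalVector(dm, &gvol));
>>>>>>>>>>   /* ... fill lvol with the locally computed median-dual contributions ... */
>>>>>>>>>>   PetscCall(VecZeroEntries(gvol));
>>>>>>>>>>   PetscCall(DMLocalToGlobalBegin(dm, lvol, ADD_VALUES, gvol)); /* sum shared vertices */
>>>>>>>>>>   PetscCall(DMLocalToGlobalEnd(dm, lvol, ADD_VALUES, gvol));
>>>>>>>>>>   PetscCall(DMGlobalToLocalBegin(dm, gvol, INSERT_VALUES, lvol)); /* complete halo values */
>>>>>>>>>>   PetscCall(DMGlobalToLocalEnd(dm, gvol, INSERT_VALUES, lvol));
>>>>>>>>>>   PetscCall(DMRestoreGlobalVector(dm, &gvol));
>>>>>>>>>>   PetscCall(DMRestoreLocalVector(dm, &lvol));
>>>>>>>>>> }
>>>>>>>>>>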
>>>>>>>>>> Back to the original problem, I can get a proper control volume
>>>>>>>>>> field with PETSc 3.18.4, which is the latest stable release. However, if I
>>>>>>>>>> use PETSc from the main repo, it gives a strange control volume field.
>>>>>>>>>> Something is certainly strange around the parallel boundaries, so I think
>>>>>>>>>> something went wrong with the halo communication. To help understanding, a
>>>>>>>>>> comparison snapshot is also attached. I guess a certain part of my code is
>>>>>>>>>> no longer compatible with PETSc, unless there is a bug in the library. Could
>>>>>>>>>> I get comments on this?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can run your example. The numbers I get for "median-dual volume"
>>>>>>>>> do not match the "PETSc volume", and the PETSc volume is correct. Moreover,
>>>>>>>>> the median-dual numbers change, which suggests a memory fault. I compiled
>>>>>>>>> it using address sanitizer, and it found an error:
>>>>>>>>>
>>>>>>>>>  Number of physical boundary edge ...            4           0
>>>>>>>>>  Number of physical and parallel boundary edge ...            4
>>>>>>>>>         0
>>>>>>>>>  Number of parallel boundary edge ...            0           0
>>>>>>>>>  Number of physical boundary edge ...            4           1
>>>>>>>>>  Number of physical and parallel boundary edge ...            4
>>>>>>>>>         1
>>>>>>>>>  Number of parallel boundary edge ...            0           1
>>>>>>>>> =================================================================
>>>>>>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address
>>>>>>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8
>>>>>>>>> READ of size 8 at 0x603000022d40 thread T0
>>>>>>>>> =================================================================
>>>>>>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address
>>>>>>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8
>>>>>>>>> READ of size 8 at 0x60300000f0f0 thread T0
>>>>>>>>>     #0 0x10cf702a7 in MAIN__ test.F90:657
>>>>>>>>>     #1 0x10cf768ee in main test.F90:43
>>>>>>>>>     #0 0x1068e12a7 in MAIN__ test.F90:657
>>>>>>>>>     #1 0x1068e78ee in main test.F90:43
>>>>>>>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>>>>>
>>>>>>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region
>>>>>>>>> [0x60300000f0d0,0x60300000f0f0)
>>>>>>>>> allocated by thread T0 here:
>>>>>>>>>     #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>>>>>
>>>>>>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region
>>>>>>>>> [0x603000022d20,0x603000022d40)
>>>>>>>>> allocated by thread T0 here:
>>>>>>>>>     #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f)
>>>>>>>>>     #1 0x1068dba71 in MAIN__ test.F90:499
>>>>>>>>>     #2 0x1068e78ee in main test.F90:43
>>>>>>>>>     #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>>>>>>>>
>>>>>>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in
>>>>>>>>> MAIN__
>>>>>>>>> Shadow bytes around the buggy address:
>>>>>>>>>
>>>>>>>>> which corresponds to
>>>>>>>>>
>>>>>>>>>      ! midpoint of median-dual face for inner face
>>>>>>>>>      axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell
>>>>>>>>>      axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell
>>>>>>>>>
>>>>>>>>> and these were allocated here
>>>>>>>>>
>>>>>>>>>  allocate(xc(ncell))
>>>>>>>>>  allocate(yc(ncell))
>>>>>>>>>
>>>>>>>>> Hopefully the error is straightforward to see now.
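>>>>>>>>>
>>>>>>>>> In case it is useful, a minimal sketch (in C; an assumption about how the per-cell
>>>>>>>>> arrays could be sized, not something taken from the attached code) of querying the
>>>>>>>>> local cell range directly from the DMPlex, so that every cell index appearing in a
>>>>>>>>> face support stays inside the allocation:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   PetscInt cStart, cEnd, ncell;
>>>>>>>>>
>>>>>>>>>   /* cells are the height-0 stratum of a DMPlex */
>>>>>>>>>   PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
>>>>>>>>>   ncell = cEnd - cStart; /* local cell count; face supports index into [cStart, cEnd) */
>>>>>>>>> }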
>>>>>>>>>
>>>>>>>>>   Thanks,
>>>>>>>>>
>>>>>>>>>     Matt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley <
>>>>>>>>>>> knepley at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell <
>>>>>>>>>>>> mi.mike1021 at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> As a follow-up, I tested:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (1) The tarball for v3.18.4 downloaded from the petsc gitlab (
>>>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue
>>>>>>>>>>>>> with DMPlex halo exchange. This version works as I expect.
>>>>>>>>>>>>> (2) The main branch cloned with git (git clone
>>>>>>>>>>>>> https://gitlab.com/petsc/petsc.git) has issues with DMPlex
>>>>>>>>>>>>> halo exchange. Something is suspicious about this main branch, related to
>>>>>>>>>>>>> the DMPlex halo. The solution field I get is not correct, but it works okay
>>>>>>>>>>>>> with 1 proc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone have any comments on this issue? I am curious whether
>>>>>>>>>>>>> other DMPlex users have no problems with halo exchange. FYI, I do not
>>>>>>>>>>>>> declare ghost layers for the halo exchange.
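>>>>>>>>>>>>>
>>>>>>>>>>>>> For reference, a minimal sketch (in C; dm and the overlap value are illustrative,
>>>>>>>>>>>>> not from my code) of what requesting a layer of ghost (overlap) cells at
>>>>>>>>>>>>> distribution time would look like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   DM dmDist = NULL;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /* overlap = 1 requests one layer of ghost cells across partition boundaries */
>>>>>>>>>>>>>   PetscCall(DMPlexDistribute(dm, 1, NULL, &dmDist));
>>>>>>>>>>>>>   if (dmDist) { /* NULL if no distribution happened, e.g. on a single rank */
>>>>>>>>>>>>>     PetscCall(DMDestroy(&dm));
>>>>>>>>>>>>>     dm = dmDist;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }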
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> There should not have been any changes there and there are
>>>>>>>>>>>> definitely tests for this.
>>>>>>>>>>>>
>>>>>>>>>>>> It would be great if you could send something that failed. I
>>>>>>>>>>>> could fix it and add it as a test.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Just to follow up, we have tests of the low-level communication
>>>>>>>>>>> (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have
>>>>>>>>>>> tests that use halo exchange for PDE calculations, for example
>>>>>>>>>>> SNES tutorials ex12, ex13, and ex62. The convergence rates
>>>>>>>>>>> would be off if the halo exchange were wrong. Is there any
>>>>>>>>>>> example similar to your code that is failing on your installation?
>>>>>>>>>>> Or is there a way to run your code?
>>>>>>>>>>>
>>>>>>>>>>>   Thanks,
>>>>>>>>>>>
>>>>>>>>>>>      Matt
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>   Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>>      Matt
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using
>>>>>>>>>>>>>> this version of PETSc:
>>>>>>>>>>>>>> >>git rev-parse origin
>>>>>>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1
>>>>>>>>>>>>>> >>git rev-parse FETCH_HEAD
>>>>>>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There had been no issues before the one with the VTK viewer,
>>>>>>>>>>>>>> which Jed fixed today (
>>>>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735
>>>>>>>>>>>>>> ).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since that MR has been merged into the main repo, I pulled
>>>>>>>>>>>>>> the latest version of PETSc (basically I cloned it from scratch). But if I
>>>>>>>>>>>>>> use the same configure options as before and run my code, then there is
>>>>>>>>>>>>>> an issue with the halo exchange. The code runs without an error message, but it
>>>>>>>>>>>>>> gives a wrong solution field. I guess the issue I have is related to the graph
>>>>>>>>>>>>>> partitioner or the halo exchange part, because if I run the code with
>>>>>>>>>>>>>> 1 proc, the solution is correct. I only updated the version of PETSc, and
>>>>>>>>>>>>>> there was no change in my own code. Could I get any comments on the issue?
>>>>>>>>>>>>>> I was wondering if there have been many changes to the halo exchange or the graph
>>>>>>>>>>>>>> partitioning & distribution parts related to DMPlex.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> What most experimenters take for granted before they begin
>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>>>> their experiments lead.
>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>
>>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>>>>> experiments lead.
>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>
>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>>> experiments lead.
>>>>>>>>> -- Norbert Wiener
>>>>>>>>>
>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>