[petsc-dev] Checkpoint-restart with DMPlex objects

Matthew Knepley knepley at gmail.com
Tue Dec 18 07:30:13 CST 2018


On Tue, Dec 18, 2018 at 8:28 AM Matthew Knepley <knepley at gmail.com> wrote:

> On Tue, Dec 18, 2018 at 6:54 AM Hapla Vaclav <vaclav.hapla at erdw.ethz.ch>
> wrote:
>
>>
>>
>> On 17 Dec 2018, at 20:36, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Mon, Dec 17, 2018 at 12:11 PM Lawrence Mitchell <wence at gmx.li> wrote:
>>
>>>
>>> > On 17 Dec 2018, at 11:56, Hapla Vaclav <vaclav.hapla at erdw.ethz.ch>
>>> wrote:
>>> >
>>> > Matt, great that you reminded me of this email. I actually completely
>>> missed it at the time.
>>> >
>>> >> On 14 Dec 2018, at 19:54, Matthew Knepley via petsc-dev <
>>> petsc-dev at mcs.anl.gov> wrote:
>>>
>>> [...]
>>>
>>> >> I would like:
>>> >>
>>> >> - To be able to dump the DMPlex, and fields, on N processes
>>> >>
>>> >> I think the current HDF5 does what you want.
>>> >>
>>> >> - To be able to load the DMPlex, and fields, on P processes.  In the
>>> first instance, to get things going, I am happy if P=1.
>>> >>
>>> >> I think this also works with arbitrary P, although the testing can be
>>> described as extremely thin.
>>> >
>>> > I think we need to be much more precise here. First off, there are now
>>> three HDF5 formats:
>>> > 1) PETSC_VIEWER_HDF5_PETSC - store Plex graph serialization
>>> > 2) PETSC_VIEWER_HDF5_XDMF - store XDMF-compatible representation of
>>> vertices and cells
>>> > 3) PETSC_VIEWER_HDF5_VIZ - slightly extends 2) with some stuff for
>>> visualization; you perhaps understand it better
>>> >
>>> > PETSC_VIEWER_DEFAULT/PETSC_VIEWER_NATIVE mean store all three above. I
>>> think what Lawrence calls Native should be 1).
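>>> >
>>> > For concreteness, selecting one of these formats before DMView() looks
>>> > roughly like this (a minimal sketch; file name and mode are illustrative):
>>> >
>>> >   PetscViewer viewer;
>>> >   PetscViewerHDF5Open(PETSC_COMM_WORLD, "mesh.h5", FILE_MODE_WRITE, &viewer);
>>> >   /* pick PETSC_VIEWER_HDF5_PETSC, _XDMF, _VIZ, or PETSC_VIEWER_NATIVE */
>>> >   PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_PETSC);
>>> >   DMView(dm, viewer);          /* dm assumed to be an existing DMPlex */
>>> >   PetscViewerPopFormat(viewer);
>>> >   PetscViewerDestroy(&viewer);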
>>> >
>>> > Format 1) is currently written in parallel but loaded sequentially:
>>> >
>>> https://bitbucket.org/petsc/petsc/src/fbb1886742ac2bbe3b4d1df09bff9724d3fee060/src/dm/impls/plex/plexhdf5.c#lines-834
>>> >
>>> > I don't understand how it can work correctly for a distributed mesh
>>> while the Point SF (connecting partitions) is not stored, as far as I can
>>> see. I think there's even no PetscSFView_HDF5(). I will check it more
>>> deeply soon.
>>> >
>>> > Format 2) is the one for which I implemented parallel DMLoad().
>>> > Unfortunately, I can't declare it bulletproof until we declare
>>> parallel DMPlexInterpolate() 100% working. I did quite some work towards
>>> it in
>>> >
>>> https://bitbucket.org/petsc/petsc/pull-requests/1227/dmplexintepolate-fix-orientation-of-faces/
>>> > but as stated in the PR summary, there are still some examples failing
>>> because of a wrong Point SF, which is partly fixed in
>>> knepley/fix-plex-interpolate-sf but does not seem to be finished yet. Matt,
>>> is there any chance you could look at it at some point in the near future?
>>> >
>>> > I think for Lawrence's purposes, 2) can be used to read the initial
>>> mesh file but for checkpointing 1) seems to be better ATM because it dumps
>>> everything, including interpolated edges & faces, labels, and perhaps some
>>> additional information.
>>>
>>> OK, so I guess there are two different things going on here:
>>>
>>> 1. Store the data you need to reconstruct a DMPlex
>>>
>>> 2. Store the data you need to have a DMPlex viewable via XDMF.
>>>
>>> 3. Store the data you need to reconstruct a DMPlex AND have it viewable
>>> via XDMF.
>>>
>>> For checkpointing-only purposes, I only really need 1; for viz purposes,
>>> one only needs 2; ideally, one would not separate viz and checkpointing
>>> files if there is sufficient overlap of data (I think there is), which
>>> needs 3.
>>>
>>> > I will nevertheless keep on working to improve 2) so that it can store
>>> edges & faces & labels in an XDMF-compatible way.
>>> >
>>> >>
>>> >> For dumping, I think I can do DMView(dm) in PETSc "native" format,
>>> and that will write out the topology in a global numbering.
>>> >>
>>> >> I would use HDF5.
>>> >>
>>> >> For the field coefficients, I can just VecView(vec).  But there does
>>> not appear to be any way of saving the Section so that I can actually
>>> attach those coefficients to points in the mesh.
>>> >>
>>> >> Hmm, I will check this right now. If it does not exist, I will write
>>> it.
>>> >
>>> > No, it certainly doesn't exist. There is only ASCII view implemented.
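>>> >
>>> > So at the moment a checkpoint write can only cover the mesh and the raw
>>> > coefficients, roughly (viewer set up as an HDF5 viewer as above):
>>> >
>>> >   DMView(dm, viewer);    /* topology, coordinates, labels, ...      */
>>> >   VecView(vec, viewer);  /* field coefficients                      */
>>> >   /* PetscSectionView(section, viewer) has only an ASCII implementation,
>>> >      so the layout attaching dofs to points is not in the HDF5 file yet. */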
>>> >
>>> >>
>>> >> I can do PetscSectionCreateGlobalSection(section), so that I have
>>> the global numbering for offsets, but presumably for the point numbering, I
>>> need to convert the local chart into global point numbers using
>>> DMPlexCreatePointNumbering?
>>> >>
>>> >> No, all Sections use local points. We do not use global point numbers
>>> anywhere in Plex.
>>> >
>>> > True. DMPlex is partition-wise sequential. The only thing which
>>> connects the submeshes is the Point SF.
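>>> >
>>> > (For inspection, that SF is directly available, e.g.
>>> >
>>> >   PetscSF pointSF;
>>> >   DMGetPointSF(dm, &pointSF);  /* borrowed reference, do not destroy */
>>> >   PetscSFView(pointSF, PETSC_VIEWER_STDOUT_WORLD);
>>> >
>>> > but, as said above, it is not part of the dump.)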
>>>
>>> OK, so I think I misunderstood what the dump format looks like then. For a
>>> parallel store/load cycle where I go from N to P processes, what must I do?
>>>
>>> If I understand correctly the dump on N processes contains:
>>>
>>> For each process, in process-local numbering
>>>
>>>  - The DMPlex topology on that process
>>>
>>> Now, given that the only thing that connects these local pieces of the
>>> DM together is the point SF, as Vaclav says, it must be the case that a
>>> reloadable dump file contains that information.
>>>
>>
>> No, the dump contains a completely consistent serial DM. Now I remember
>> why parallel load is not implemented :)
>> We demand that the dump look identical from any number of processes for
>> all PETSc objects. Thus we compute a global renumbering
>> and dump everything using that numbering.
>>
>>
>> Oh I see. I misunderstood this. It is clearer to me now -
>> DMPlexCreatePointNumbering() is employed in DMPlexView_HDF5_Internal().
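>>
>> So the write side boils down to something like this (untested sketch, just
>> to check my understanding):
>>
>>   IS globalPointNumbers;
>>   DMPlexCreatePointNumbering(dm, &globalPointNumbers);
>>   /* global point numbers; if I read it right, points owned by another
>>      process show up as negative (encoded) values */
>>   /* ... write the topology using this numbering ... */
>>   ISDestroy(&globalPointNumbers);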
>>
>>
>> Now, when we load in parallel, we need to use the new parallel loading
>> from Michael.
>>
>>
>> What exactly do you mean?
>> If I remember correctly, Michael implemented a parallel MED loader and it
>> uses DMPlexCreateFromCellListParallel() just as my XDMF-HDF5 reader does.
>> Is this function what you mean by "the new parallel loading"?
>>
>
> Yes, exactly. You are doing it right. We just need to extend that.
> Actually, I think we should probably just store an attribute for
> interpolated meshes, and interpolate on load. This is much simpler, requires
> less storage, and makes everything uniform. What do you think?
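>
> Something like the following, where the attribute name and location are
> made up just to illustrate:
>
>   /* on save: record whether the mesh was interpolated */
>   PetscBool interpolated = PETSC_TRUE;
>   PetscViewerHDF5WriteAttribute(viewer, "/topology", "interpolated",
>                                 PETSC_BOOL, &interpolated);
>   /* on load: after reading the flag back, recreate edges/faces */
>   if (interpolated) {
>     DM idm;
>     DMPlexInterpolate(dm, &idm);
>     DMDestroy(&dm);
>     dm = idm;
>   }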
>

Hmm, the problematic part is the labels. How do we make sure we are
labeling the right edge/face?

  Matt


>   Thanks,
>
>     Matt
>
>
>> Thanks
>>
>> Vaclav
>>
>> I have not yet written that, but it should
>> be straightforward :) So the below is not really right. We need to call
>> parallel load for the topology. Then we need code that
>> loads the labels and uses the migration SF to redistribute them, but I
>> think that code already exists for redistribution, so we
>> just hijack it.
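>>
>> The redistribution code I have in mind is essentially what
>> DMPlexDistribute() already does, sketched:
>>
>>   DM      dmParallel;
>>   PetscSF migrationSF;
>>   DMPlexDistribute(dm, 0, &migrationSF, &dmParallel);
>>   /* labels travel with the topology here; the migration SF can then be
>>      reused to push Sections/Vecs, e.g. via DMPlexDistributeField() */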
>>
>>   Matt
>>
>>
>>> OK, so to dump a field so that we can reload it we must need:
>>>
>>> - topology (in local numbering)
>>> - point SF (connecting the local pieces of the topology together)
>>> - Vector (dofs), presumably in local layout to make things easier
>>> - Section describing the vector layout (local numbering)
>>>
>>> So to load, I do:
>>>
>>> 1. Load and distribute the topology, and construct the new point SF (this
>>> presumably gives me a "migration SF" that maps from old points to new points).
>>>
>>> 2. Broadcast the Section over migration SF so that we know how many dofs
>>> belong to each point in the new topology
>>>
>>> 3. Broadcast the Vec over the migration SF to get the dofs to the right
>>> place (sketched below).
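>>>
>>> I guess steps 2 and 3 amount to something like this, if I understand the
>>> API (variable names are mine):
>>>
>>>   PetscSection newSection;
>>>   Vec          newVec;
>>>   PetscSectionCreate(PetscObjectComm((PetscObject)newdm), &newSection);
>>>   VecCreate(PetscObjectComm((PetscObject)newdm), &newVec);
>>>   /* pushes both the layout and the dofs over the migration SF */
>>>   DMPlexDistributeField(olddm, migrationSF, oldSection, oldVec,
>>>                         newSection, newVec);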
>>>
>>> Whenever I think of this on paper it seems "easy", but then I
>>> occasionally try to sit down and do it and immediately get lost, so I am
>>> normally missing something.
>>>
>>> What am I missing this time?
>>>
>>> Lawrence
>>>
>>>
>>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

