[petsc-dev] Checkpoint-restart with DMPlex objects

Hapla Vaclav vaclav.hapla at erdw.ethz.ch
Mon Dec 17 04:56:39 CST 2018


Matt, great that you brought this email up again. I actually completely missed it at the time.

On 14 Dec 2018, at 19:54, Matthew Knepley via petsc-dev <petsc-dev at mcs.anl.gov> wrote:

On Fri, Jul 20, 2018 at 5:34 AM Lawrence Mitchell <wence at gmx.li> wrote:
Dear petsc-dev,

I'm once again revisiting doing "proper" checkpoint-restart cycles.  I would like to leverage the existing PETSc stuff for this as much as possible, but I am a bit lost as to what is implemented, and what is missing.

I have:

- A (distributed) DMPlex defining the topology

- Some number of fields defined on this topology, each described by:

  - a Section (with a point permutation set)
  - a Vec of coefficients
  - Some extra information that describes what the coefficients mean, but let's assume I know how to handle that.

(Aside, for Vecs with a block size > 1, I actually have a section that indexes the blocks, which probably means I need to unroll into an unblocked version first).

Sections are usually {point --> {dofs}}. I am not sure how it would use blocks instead of points.
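For reference, a minimal sketch of such a {point --> dofs} Section attached to a DM, using current naming (the one-dof-per-vertex layout is just a placeholder):

#include <petscdmplex.h>

/* Sketch: a {point -> dofs} Section over the Plex chart, attached as the DM's
   default (local) Section; the one-dof-per-vertex layout is illustrative only. */
static PetscErrorCode AttachScalarSection(DM dm)
{
  PetscSection s;
  PetscInt     pStart, pEnd, vStart, vEnd, p;

  PetscFunctionBeginUser;
  PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dm), &s));
  PetscCall(DMPlexGetChart(dm, &pStart, &pEnd));
  PetscCall(DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd)); /* depth 0 = vertices */
  PetscCall(PetscSectionSetChart(s, pStart, pEnd));
  for (p = vStart; p < vEnd; ++p) PetscCall(PetscSectionSetDof(s, p, 1));
  PetscCall(PetscSectionSetUp(s));
  PetscCall(DMSetLocalSection(dm, s)); /* DMSetDefaultSection() in older releases */
  PetscCall(PetscSectionDestroy(&s));
  PetscFunctionReturn(PETSC_SUCCESS);
}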

I would like:

- To be able to dump the DMPlex, and fields, on N processes

I think the current HDF5 does what you want.

- To be able to load the DMPlex, and fields, on P processes.  In the first instance, to get things going, I am happy if P=1.

I think this also works with arbitrary P, although the testing can be described as extremely thin.

I think we need to be much more precise here. First off, there are now three HDF5 formats:
1) PETSC_VIEWER_HDF5_PETSC - stores the Plex graph serialization
2) PETSC_VIEWER_HDF5_XDMF - stores an XDMF-compatible representation of vertices and cells
3) PETSC_VIEWER_HDF5_VIZ - slightly extends 2) with some extra information for visualization; you perhaps understand it better than I do

PETSC_VIEWER_DEFAULT/PETSC_VIEWER_NATIVE means storing all three of the above. I think what Lawrence calls "native" should be 1).
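For concreteness, choosing among these is just a matter of pushing the format on the viewer before DMView(); a minimal sketch (the file name is arbitrary, and PETSc must be configured with HDF5):

/* Sketch: select the HDF5 flavour, then view the mesh. */
PetscViewer viewer;

PetscCall(PetscViewerHDF5Open(PETSC_COMM_WORLD, "checkpoint.h5", FILE_MODE_WRITE, &viewer));
PetscCall(PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_PETSC)); /* or PETSC_VIEWER_HDF5_XDMF / _VIZ */
PetscCall(DMView(dm, viewer));        /* with PETSC_VIEWER_DEFAULT, all of the above get written */
PetscCall(PetscViewerPopFormat(viewer));
PetscCall(PetscViewerDestroy(&viewer));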

Format 1) is currently written in parallel but loaded sequentially:
https://bitbucket.org/petsc/petsc/src/fbb1886742ac2bbe3b4d1df09bff9724d3fee060/src/dm/impls/plex/plexhdf5.c#lines-834

I don't understand how it can work correctly for a distributed mesh when the Point SF (connecting the partitions) is not stored, as far as I can see. I think there is not even a PetscSFView_HDF5(). I will check this more deeply soon.

Format 2) is the one for which I implemented the parallel DMLoad().
Unfortunately, I can't declare it bulletproof until we declare parallel DMPlexInterpolate() 100% working. I did quite a bit of work towards that in
https://bitbucket.org/petsc/petsc/pull-requests/1227/dmplexintepolate-fix-orientation-of-faces/
but as stated in the PR summary, there are still some examples failing because of a wrong Point SF, which is partly fixed in knepley/fix-plex-interpolate-sf, but it seems that is not yet finished. Matt, is there any chance you could look at it at some point in the near future?

I think for Lawrence's purposes, 2) can be used to read the initial mesh file, but for checkpointing, 1) seems better at the moment because it dumps everything, including interpolated edges & faces, labels, and perhaps some additional information.

I will nevertheless keep working to improve 2) so that it can store edges & faces & labels in an XDMF-compatible way.


For dumping, I think I can do DMView(dm) in PETSc "native" format, and that will write out the topology in a global numbering.

I would use HDF5.

For the field coefficients, I can just VecView(vec).  But there does not appear to be any way of saving the Section so that I can actually attach those coefficients to points in the mesh.

Hmm, I will check this right now. If it does not exist, I will write it.

No, it certainly doesn't exist. Only an ASCII view is implemented.
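The Vec side does work today, though; roughly like this (a sketch: `viewer` is the HDF5 viewer from the write example above, and the group/dataset names "/fields" and "temperature" are arbitrary):

/* Sketch: dump the coefficients next to the mesh under a named dataset. */
PetscCall(PetscObjectSetName((PetscObject)vec, "temperature"));
PetscCall(PetscViewerHDF5PushGroup(viewer, "/fields"));
PetscCall(VecView(vec, viewer));
PetscCall(PetscViewerHDF5PopGroup(viewer));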


I can do PetscSectionCreateGlobalSection(section), so that I have the global numbering for offsets, but presumably for the point numbering, I need to convert the local chart into global point numbers using DMPlexCreatePointNumbering?

No, all Sections use local points. We do not use global point numbers anywhere in Plex.

True. DMPlex is partition-wise sequential. The only thing which connects the submeshes is the Point SF.
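In other words, the local Section is what one would store; the global one is always derivable from it plus the Point SF, e.g. (a sketch):

/* Sketch: both Sections use local point numbers; the global Section (global offsets,
   negative dof/offset encodings for points owned elsewhere) is rebuilt internally
   from the local Section plus the Point SF, so only the local one needs storing. */
PetscSection lsec, gsec;
PetscSF      pointSF;

PetscCall(DMGetPointSF(dm, &pointSF));
PetscCall(DMGetLocalSection(dm, &lsec));    /* {local point -> (ndof, local offset)} */
PetscCall(DMGetGlobalSection(dm, &gsec));   /* derived from lsec + pointSF on demand */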


For load, I can do DMLoad(dm), which only loads on rank-0 for now.

I do not think that is true.

Actually, it is true if we are talking about PETSC_VIEWER_HDF5_PETSC.
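For completeness, the load path for that format looks roughly like this (a sketch; file and format names as in the write example above):

DM          dm;
PetscViewer viewer;

PetscCall(PetscViewerHDF5Open(PETSC_COMM_WORLD, "checkpoint.h5", FILE_MODE_READ, &viewer));
PetscCall(DMCreate(PETSC_COMM_WORLD, &dm));
PetscCall(DMSetType(dm, DMPLEX));
PetscCall(PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_PETSC));
PetscCall(DMLoad(dm, viewer));        /* currently the Plex is read on rank 0 for this format */
PetscCall(PetscViewerPopFormat(viewer));
PetscCall(PetscViewerDestroy(&viewer));
/* ...then DMPlexDistribute() to spread the mesh back over the communicator */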


Then VecLoad for the coefficients, and (presumably) a putative PetscSectionLoad so that I can associate coefficients to points in the topology.

Okay, I am remembering more now. I just use the PetscDS to automatically create the Section, and pray that it matches the Vec that we saved. This is a real design limitation: since the Section is associated with the DM, not the Vec, you have to assume all stored Vecs were created with the default Section in the DM. Now, if this is true, then you could just load the Section, set it as the default for the DM, and load the Vec.
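So the restart flow would be roughly as below, with the Section load being the missing (putative) piece; everything else is existing API, and the field name "temperature" is just an example:

/* Sketch of the restart flow described above; PetscSectionLoad() is the putative,
   not-yet-existing piece, so that line is only a placeholder for the missing step. */
PetscSection s;
Vec          vec;

PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, &s));
/* PetscSectionLoad(s, viewer);   <-- putative, does not exist yet */
PetscCall(DMSetLocalSection(dm, s));              /* make it the DM's default Section */
PetscCall(PetscSectionDestroy(&s));
PetscCall(DMCreateGlobalVector(dm, &vec));
PetscCall(PetscObjectSetName((PetscObject)vec, "temperature")); /* must match the stored dataset name */
PetscCall(VecLoad(vec, viewer));                  /* only correct if the Section matches the stored layout */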

So it feels like the crucial part of this is a native (HDF5) based viewing for PetscSection.

Yep. I thought this worked, but maybe I only decided it should work.

For each section (I'm ignoring fields, because those are just sections anyway), there are two explicit, and one implicit, pieces of information:

1. The number of dofs per point

Yep.

2. The offset of the dof (this is the global numbering)

No, you should store the local Section. You use the SF for the DM to automatically get the global section.

True, see above.


3 (implicit). The association of the local chart numbering to the global chart.

Above.

Saving the first two is easy; how best to save the last so that I can load easily? My thought is, for each point in the section chart, to save a three-tuple: (global-point-number, dofs-per-point, global-dof-number).
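Concretely, I imagine assembling those tuples from the global Section and DMPlexCreatePointNumbering() along these lines (a sketch, assuming the Section chart coincides with the Plex chart, as it does for DM-created Sections):

/* Sketch: collect (global point number, dofs per point, global dof offset) for each
   locally owned point; writing them out is then just an ISView()/VecView() of a flat array. */
IS              globalPointIS;
const PetscInt *gpoint;
PetscSection    gsec;
PetscInt        pStart, pEnd, p;

PetscCall(DMGetGlobalSection(dm, &gsec));
PetscCall(DMPlexCreatePointNumbering(dm, &globalPointIS)); /* unowned points come out as -(gp+1) */
PetscCall(ISGetIndices(globalPointIS, &gpoint));
PetscCall(DMPlexGetChart(dm, &pStart, &pEnd));
for (p = pStart; p < pEnd; ++p) {
  PetscInt dof, off;

  PetscCall(PetscSectionGetDof(gsec, p, &dof));
  PetscCall(PetscSectionGetOffset(gsec, p, &off));
  if (dof <= 0) continue;  /* global Section: unowned points carry negative dof */
  /* record the tuple (gpoint[p - pStart], dof, off) for output */
}
PetscCall(ISRestoreIndices(globalPointIS, &gpoint));
PetscCall(ISDestroy(&globalPointIS));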

Hmm, I have to think a little bit more. Where are you at with this now?

  Matt

Then, I can easily read this data in and correctly map from the on-disk coefficients into any section I might have built on the newly loaded DM.

Does this all sound reasonable?  Or have I missed either an existing implementation, or other issues?

Cheers,

Lawrence


I think there should be no big problem implementing PetscSection{View,Load}_HDF5(). I would suggest making use of ISView() to store the integer arrays (as I did with matrices). The same goes for PetscSF.
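Something along these lines, i.e. a hypothetical PetscSectionView_HDF5() that dumps the per-point dof and offset arrays of the local Section as named ISs (the function name, group name, and dataset names are of course just placeholders):

#include <petscdmplex.h>
#include <petscviewerhdf5.h>

/* Hypothetical sketch: store the local Section as two named integer datasets. */
static PetscErrorCode PetscSectionView_HDF5_Sketch(PetscSection s, PetscViewer viewer)
{
  MPI_Comm  comm;
  IS        dofIS, offIS;
  PetscInt  pStart, pEnd, p, n, *dofs, *offs;

  PetscFunctionBeginUser;
  PetscCall(PetscObjectGetComm((PetscObject)s, &comm));
  PetscCall(PetscSectionGetChart(s, &pStart, &pEnd));
  n = pEnd - pStart;
  PetscCall(PetscMalloc2(n, &dofs, n, &offs));
  for (p = pStart; p < pEnd; ++p) {
    PetscCall(PetscSectionGetDof(s, p, &dofs[p - pStart]));
    PetscCall(PetscSectionGetOffset(s, p, &offs[p - pStart]));
  }
  PetscCall(ISCreateGeneral(comm, n, dofs, PETSC_COPY_VALUES, &dofIS));
  PetscCall(ISCreateGeneral(comm, n, offs, PETSC_COPY_VALUES, &offIS));
  PetscCall(PetscObjectSetName((PetscObject)dofIS, "atlasDof"));
  PetscCall(PetscObjectSetName((PetscObject)offIS, "atlasOff"));
  PetscCall(PetscViewerHDF5PushGroup(viewer, "/section"));
  PetscCall(ISView(dofIS, viewer));
  PetscCall(ISView(offIS, viewer));
  PetscCall(PetscViewerHDF5PopGroup(viewer));
  PetscCall(ISDestroy(&dofIS));
  PetscCall(ISDestroy(&offIS));
  PetscCall(PetscFree2(dofs, offs));
  PetscFunctionReturn(PETSC_SUCCESS);
}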

Note that it seems to me XDMF could possibly also store those coefficients (mapped by a PetscSection on the PETSc side):
http://www.xdmf.org/index.php/XDMF_Model_and_Format#Attribute
I would be thankful for feedback on this.

Vaclav



--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
