[petsc-dev] [petsc-maint #60349] Saving multiple Vec in the same binary file

Fri Jan 7 10:31:10 CST 2011

Maybe that's a good opportunity to gauge interest in something I've been
working on for my own codes, but which could be used in PETSc. It's not
quite ready even in its separate form, but starting to work okay on my side.

It's basically an object model for C, and in many senses quite similar to
PetscObject. (I haven't looked at all the intricacies of PetscObject, but what I
have is quite flexible, so I'm pretty sure things could be made to work.)
There are "classes", think for Vec, Mat. So a Vec object is an instance of
the Vec class, and has its own C type (a pointer to a struct). Classes can
have subclasses a.k.a. types, that is they can at run-time internally be
switched between different implementations, think a dense or a sparse
matrix, etc. There's create(), destroy() and ref counting.
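
A minimal sketch of the class / type / ref-counting idea described above. All
names here (struct obj, obj_set_type(), obj_get(), obj_put(), the "max" and
"sum" types) are hypothetical and made up for illustration; this is neither
the mrc_obj code nor the PETSc API, just one way the pattern can look in C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; not the mrc_obj or PETSc API. */
struct obj;

/* One "type" (subclass): a named table of implementation hooks. */
struct obj_ops {
  const char *name;
  double (*norm)(const struct obj *o);
};

struct obj {
  int refcount;
  const struct obj_ops *ops;  /* current implementation, switchable at run time */
  double data[3];
};

/* Implementation 1: largest absolute value. */
static double norm_max(const struct obj *o) {
  double m = 0.0;
  for (int i = 0; i < 3; i++) {
    double a = o->data[i] < 0 ? -o->data[i] : o->data[i];
    if (a > m) m = a;
  }
  return m;
}

/* Implementation 2: sum of absolute values. */
static double norm_sum(const struct obj *o) {
  double s = 0.0;
  for (int i = 0; i < 3; i++)
    s += o->data[i] < 0 ? -o->data[i] : o->data[i];
  return s;
}

/* Registry of known types, terminated by a NULL name. */
static const struct obj_ops obj_types[] = {
  { "max", norm_max },
  { "sum", norm_sum },
  { NULL, NULL },
};

/* Create an object with one reference and the first type as default. */
struct obj *obj_create(void) {
  struct obj *o = calloc(1, sizeof(*o));
  if (!o) return NULL;
  o->refcount = 1;
  o->ops = &obj_types[0];
  return o;
}

/* Switch the implementation by name; returns 0 on success, -1 if unknown. */
int obj_set_type(struct obj *o, const char *type) {
  for (const struct obj_ops *t = obj_types; t->name; t++) {
    if (strcmp(t->name, type) == 0) { o->ops = t; return 0; }
  }
  return -1;
}

/* Take an additional reference. */
struct obj *obj_get(struct obj *o) { o->refcount++; return o; }

/* Drop one reference; frees the object on the last one.
 * Returns the number of references that remain. */
int obj_put(struct obj *o) {
  int left = --o->refcount;
  if (left == 0) free(o);
  return left;
}

/* Class-level method: dispatches to whatever type is currently set. */
double obj_norm(const struct obj *o) { return o->ops->norm(o); }
```

A caller would do obj_create(), optionally obj_set_type(o, "sum"), call
obj_norm(o), and finally obj_put(o). The caller only ever sees the class-level
functions; which implementation runs is decided by the ops pointer.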

Okay, so that sounds exactly like petsc (it pretty much is). But there's
more: There are descriptors for classes and subclasses (types), which for
the most part of course are just structs -- but those descriptors list
certain members of the struct, for example parameters. So after parameters
are registered, the object model knows about them and can do certain things
without the need for additional repetitive code, e.g. parameter parsing from
the command line and a "view" method that will print out the values. In
fact, these descriptors help in saving and restoring the state of an object
to a file, too. To give you some idea, the whole thing looks somewhat like
(not too useful, since a class normally wants its own methods in addition to
the standard ones)

// in the header

extern struct mrc_class mrc_class_mrc_io;

MRC_OBJ_DEFINE_STANDARD_METHODS(mrc_io, struct mrc_io);
// class specific methods go here

// private header

struct mrc_io {
  struct mrc_obj obj;
  char *outdir;
  char *basename;
};

// implementation

static void
mrc_io_init()
{
#ifdef HAVE_HDF5_H
  libmrc_io_register_xdmf();
#endif
  libmrc_io_register_ascii();
  libmrc_io_register_combined();
}

#define VAR(x) (void *)offsetof(struct mrc_io, x)
static struct param mrc_io_descr[] = {
  { "outdir"          , VAR(outdir)       , PARAM_STRING(".")      },
  { "basename"        , VAR(basename)     , PARAM_STRING("run")    },
  {},
};
#undef VAR

struct mrc_class mrc_class_mrc_io = {
  .name         = "mrc_io",
  .size         = sizeof(struct mrc_io),
  .param_descr  = mrc_io_descr,
  .init         = mrc_io_init,
  .setup        = _mrc_io_setup,
};

which is basically all that is needed to provide you with a mrc_io class,
though it is not too useful until you add in your own methods. The class will have standard
methods create(), destroy(), view(), set_type(), set_from_options(), setup()
etc. setup() in the example above is actually overridden, though not shown.
As it is in C, there's some ugliness in casting types involved when
implementing / overriding methods, but not too bad (IMO).

Users now can set an option with 'mrc_io_set_param_string(io, "basename",
"myname");' Of course that's not as fast as a direct assignment (which
requires knowledge of the internal struct though), but setting parameters is
not normally performance critical in the first place. Of course a given
class or subclass can implement their own methods which bypass "pass the
option name as string" if needed for some reason. "--mrc_io_basename myname"
will also automatically work (this was the primary goal; the
..._set_param_<type>() methods are just a side benefit which should
alleviate the need to write custom code for every integer / float / choice /
... parameter). I realize there is sometimes the need to customize what
happens when a certain parameter is set rather than just assigning the
value; that wouldn't be a problem to add. The descriptor table would also be
the place to add the help texts.
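
To make the mechanism concrete, here is a self-contained sketch of how a
descriptor table built with offsetof() can drive a generic by-name setter,
command-line parsing, and a "view" method. Everything in it (struct io,
enum ptype, set_param(), set_from_options(), view()) is a hypothetical
stand-in written for this illustration, not the actual mrc_obj definitions,
and it assumes options come as "--<prefix>_<name> <value>" pairs.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the actual mrc_obj definitions. */
enum ptype { PT_INT, PT_STRING };

struct param {
  const char *name;
  size_t offset;      /* offsetof() the member inside the object struct */
  enum ptype type;
};

struct io {
  int verbose;
  const char *basename;
};

/* Descriptor table: the only per-class code needed for parameters. */
const struct param io_descr[] = {
  { "verbose",  offsetof(struct io, verbose),  PT_INT    },
  { "basename", offsetof(struct io, basename), PT_STRING },
  { NULL, 0, PT_INT },
};

/* Generic setter: look up the parameter by name and store the parsed value.
 * Strings are stored by pointer here (the caller keeps them alive); a real
 * implementation would more likely copy them. Returns 0, or -1 if unknown. */
int set_param(void *obj, const struct param *descr,
              const char *name, const char *value) {
  for (const struct param *p = descr; p->name; p++) {
    if (strcmp(p->name, name) != 0) continue;
    char *member = (char *)obj + p->offset;
    switch (p->type) {
    case PT_INT:    *(int *)member = atoi(value); break;
    case PT_STRING: *(const char **)member = value; break;
    }
    return 0;
  }
  return -1;
}

/* Scan argv for "--<prefix>_<name> <value>" pairs and apply the ones that
 * match a descriptor entry. Returns the number of parameters set. */
int set_from_options(void *obj, const struct param *descr,
                     const char *prefix, int argc, char **argv) {
  size_t plen = strlen(prefix);
  int n = 0;
  for (int i = 1; i + 1 < argc; i++) {
    const char *a = argv[i];
    if (strncmp(a, "--", 2) != 0) continue;
    a += 2;
    if (strncmp(a, prefix, plen) != 0 || a[plen] != '_') continue;
    if (set_param(obj, descr, a + plen + 1, argv[i + 1]) == 0) {
      n++;
      i++;  /* skip the value we just consumed */
    }
  }
  return n;
}

/* Generic "view": print every registered parameter with its current value. */
void view(const void *obj, const struct param *descr, FILE *f) {
  for (const struct param *p = descr; p->name; p++) {
    const char *member = (const char *)obj + p->offset;
    if (p->type == PT_INT)
      fprintf(f, "%s = %d\n", p->name, *(const int *)member);
    else
      fprintf(f, "%s = %s\n", p->name, *(const char *const *)member);
  }
}
```

With this, set_param(&io, io_descr, "basename", "myname") and a command line
containing "--io_basename myname" end up in the same code path, and adding a
parameter means adding one line to the table. A help text or a per-parameter
callback would be extra fields in struct param.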

Not all of the following is finished, but there's more: Objects can have a
name assigned (and should have a name assigned, maybe that should even be
mandatory). The name is automatically used for e.g. parsing the command
line, so the option above then might become "--my_io_name_basename ...".
Objects can be organized in a hierarchy (but they don't have to be). If a PC
has a KSP as its parent, the option name will automatically be
"--ksp_pc_...", or "--<ksp_name>_<pc_name>_..." if the objects have been
named. Objects can also have references to other objects (a Vec may have a
reference to a DA, but it wouldn't want to be its parent). This kind of
state can be written to disk (currently in HDF5, but the interface is more
generic). Standard state is written and restored automatically, e.g.,
parameters, children and parent objs as well as references to other objects.
For a given class or subclass, one only needs to add in the code which
writes the remaining specific state. There's certainly a caveat here about
initialization/setup order when rereading such a thing from disk, which can
be tricky and not happen as automatically as one would like.
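
The option-prefix rule for hierarchies can be sketched as follows. This is
only an illustration under my own assumptions: struct hobj and
obj_option_prefix() are hypothetical names, the separator is assumed to be
"_", and an unnamed object is assumed to fall back to its class name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal object: class name, optional instance name,
 * optional parent. Not the actual mrc_obj layout. */
struct hobj {
  const char *class_name;
  const char *name;            /* NULL if the object has not been named */
  const struct hobj *parent;   /* NULL for a root object */
};

/* Build "<root>_..._<obj>" into buf, walking up the parent chain first so
 * that the outermost object comes first. Each object contributes its name
 * if it has one, otherwise its class name.
 * Returns the string length, or -1 if buf is too small. */
int obj_option_prefix(const struct hobj *o, char *buf, size_t size) {
  int len = 0;
  if (o->parent) {
    len = obj_option_prefix(o->parent, buf, size);
    if (len < 0) return -1;
  }
  const char *part = o->name ? o->name : o->class_name;
  size_t room = size - (size_t)len;
  int n = snprintf(buf + len, room, "%s%s", len > 0 ? "_" : "", part);
  if (n < 0 || (size_t)n >= room) return -1;
  return len + n;
}
```

For a pc whose parent is a ksp this yields "ksp_pc" when both are unnamed, so
a parameter "type" would be looked up as "--ksp_pc_type". References that are
not parent links (the Vec-to-DA case) simply do not take part in the walk.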

Everything can be overridden, so I'm quite sure the model can deal with
everything that is currently done in petsc, but it of course only makes
sense to go there if much of petsc follows a standard model, which I think
it does. Then a lot of redundant code can be removed and replaced by a
somewhat self-describing object type which takes over general housekeeping
and improves consistency, while leaving the fast paths untouched.

--Kai

On Fri, Jan 7, 2011 at 10:29 AM, Matthew Knepley <petsc-maint at mcs.anl.gov> wrote:

> I have bitten the HDF5 bullet. From Python, it is great since you can use
> PyTables, which is an excellent package. I am still getting the PETSc
> support there, but the Vec output is fine.
>
>    Matt
>
>
> On Fri, Jan 7, 2011 at 3:48 AM, Jed Brown <jed at 59a2.org> wrote:
>
>> On Thu, Jan 6, 2011 at 16:04, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>> > I guess one could store some kind of a catalog in the .info file that
>>> would allow reloading object by name in any order. Do you have a feeling
>>> that anybody other than me could be interested?
>>>
>>
>> In the long term, we need something better.  PETSc binary formats can't
>> carry enough metadata to be usefully plottable without user interaction.  I
>> think I'm still in favor of a SQLite backend (for simplicity and
>> versatility) indexing binary files written using MPI-IO.  Most visualization
>> and metadata tasks should be answerable in 1-line SQL queries instead of
>> custom page-long graph traversals.
>>
>> >>> On a different but not unrelated topic, Imtiaz who had started looking
>>> at the vtk IO had to leave LSU after our international services office
>>> screwed up his immigration paperwork... I am going to work with another
>>> student, Matt Kemp, on this project. Matt happens to also work part time for
>>> ANL and will be at MCS next week. I'll ask him to stop by and introduce
>>> himself to the group. He is also a good python / web programmer so I was
>>> thinking of asking him to have a look at item 3 of the proposed project
>>> list: "Converting PetscLogViewPython() to generate JSON instead and
>>> developing Python parsers for quickly generating nice tables of performance
>>> details from runs or groups of runs." Is this still open?
>>>
>>
>> Yes, but have a look at petscplot (
>> https://github.com/jedbrown/petscplot/wiki/PETSc-Plot) for some ideas on
>> plotting.  (petscplot parses plain ASCII output and creates a few plot
>> styles, see https://github.com/jedbrown/tme-ice/blob/master/make.sh#L5 for
>> more example invocations.)
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>

-- 
Kai Germaschewski
Assistant Professor, Dept of Physics / Space Science Center
University of New Hampshire, Durham, NH 03824
office: Morse Hall 245E
phone:  +1-603-862-2912
fax: +1-603-862-2771