[MOAB-dev] adding 'read and broadcast' to HDF5 reader
Rob Latham
robl at mcs.anl.gov
Fri Oct 19 15:37:00 CDT 2012
On Fri, Oct 19, 2012 at 01:30:46PM -0700, Mark Miller wrote:
> Not sure how much this helps, but the newest versions of the HDF5 library
> support reading a file into memory (one I/O operation); then proc 0 can
> broadcast that buffer (a single broadcast) and other procs can 'open' that
> buffer of bytes as an HDF5 file. So, in theory, with minimal changes to
> MOAB, it's possible to 'spoof' MOAB into thinking each processor did the
> read anyway. One problem: I think this feature works for whole files
> only. So, if the tables MOAB needs to read this way are self-contained
> in a single file, it could work. Otherwise, it's not much help...
>
> This is the 'file image' feature of HDF5.
I'll take a look at that approach, but on BlueGene pulling in an
entire file may not be a viable option. These processors only need
one piece of a larger file. In virtual node mode I only have 512
MiB in total to work with.
==rob
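[For the archive: the file-image pattern Mark describes, one read on rank 0, one broadcast, then every rank opening the bytes in place, might look roughly like the sketch below. It assumes HDF5 >= 1.8.9, which added H5LTopen_file_image, and a file small enough to hold in memory on every rank; error checking is mostly omitted.]

```c
/* Sketch of HDF5's "file image" feature combined with MPI_Bcast:
 * rank 0 reads the whole file once; every rank then opens the
 * broadcast buffer as if it were a file on disk. Assumes the file
 * fits in memory (and its size fits in an int: < 512 MiB here). */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>
#include <hdf5_hl.h>

hid_t open_file_image_bcast(const char *path, MPI_Comm comm)
{
    int rank;
    long long size = 0;
    char *buf = NULL;

    MPI_Comm_rank(comm, &rank);
    if (rank == 0) {                        /* one I/O operation */
        FILE *f = fopen(path, "rb");
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        rewind(f);
        buf = malloc(size);
        if (fread(buf, 1, size, f) != (size_t)size)
            MPI_Abort(comm, 1);
        fclose(f);
    }
    MPI_Bcast(&size, 1, MPI_LONG_LONG, 0, comm);
    if (rank != 0)
        buf = malloc(size);
    MPI_Bcast(buf, (int)size, MPI_BYTE, 0, comm);   /* single broadcast */

    /* Each rank 'opens' the in-memory bytes as an HDF5 file; with
     * DONT_COPY the library uses (and later frees) our buffer directly. */
    return H5LTopen_file_image(buf, (size_t)size, H5LT_FILE_IMAGE_DONT_COPY);
}
```

The returned hid_t behaves like any handle from H5Fopen, so downstream HDF5 reads proceed unchanged; the whole-file limitation Mark mentions is exactly what collides with the 512 MiB virtual-node limit, though.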
> Mark
>
> On Fri, 2012-10-19 at 15:16 -0500, Iulian Grindeanu wrote:
> > Hello Rob,
> > I think that change has to happen in src/parallel/ReadParallel.cpp
> > I am not sure yet though, Tim would confirm that
> >
> > Iulian
> >
> >
> > ______________________________________________________________________
> > Tim knows all this, but for the rest of the list, here's the
> > short story:
> >
> > MOAB's HDF5 reader and writer have a problem on BlueGene: they
> > collectively read in initial conditions or write output, and run
> > out of memory. This out-of-memory condition comes from MOAB doing
> > all the right things -- using HDF5, using collective I/O -- but the
> > MPI-IO library on Intrepid goes and consumes too much memory.
> >
> > I've got one approach to deal with the MPI-IO memory issue for
> > writes. This approach would sort of work for the reads, but what
> > is really needed is for rank 0 to do the read and broadcast the
> > result to everyone.
> >
> > So, I'm looking for a little help understanding MOAB's read side
> > of the code. Conceptually, all processes read the table of
> > entities.
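[The read-and-broadcast shape being asked for is itself small; as a rough sketch, independent of where it would hook into MOAB (read_entity_table() below is a hypothetical placeholder, not MOAB's real reader):]

```c
/* Rank 0 reads one table; everyone else receives it via MPI_Bcast.
 * read_entity_table() is a hypothetical placeholder for the HDF5
 * calls MOAB's reader actually makes -- it is not a MOAB function. */
#include <stdlib.h>
#include <mpi.h>

extern void read_entity_table(char **data, int *nbytes);  /* hypothetical */

void load_table_bcast(MPI_Comm comm, char **data, int *nbytes)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    if (rank == 0)
        read_entity_table(data, nbytes);   /* only rank 0 touches the file */

    MPI_Bcast(nbytes, 1, MPI_INT, 0, comm);
    if (rank != 0)
        *data = malloc(*nbytes);
    MPI_Bcast(*data, *nbytes, MPI_BYTE, 0, comm);
}
```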
> >
> > A fairly small 'mbconvert' job will run out of memory:
> >
> > 512 nodes, 2048 processors:
> >
> > ======
> > NODES=512
> > CORES=$(($NODES * 4))
> > cd /intrepid-fs0/users/robl/scratch/moab-test
> >
> > cqsub -t 15 -m vn -p SSSPP \
> >     -e MPIRUN_LABEL=1:BG_COREDUMPONEXIT=1 \
> >     -n $NODES -c $CORES \
> >     /home/robl/src/moab-svn/build/tools/mbconvert \
> >     -O CPUTIME -O PARALLEL_GHOSTS=3.0.1 -O PARALLEL=READ_PART \
> >     -O PARALLEL_RESOLVE_SHARED_ENTS -O PARTITION -t \
> >     -o CPUTIME -o PARALLEL=WRITE_PART \
> >     /intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_${CORES}.h5m \
> >     /intrepid-fs0/users/robl/scratch/moab/8mtet_ng-${CORES}-out.h5m
> > ======
> >
> > I'm kind of stumbling around ReadHDF5::load_file and
> > ReadHDF5::load_file_partial trying to find the spot where a
> > collection of tags is read into memory. Instead of having all
> > processors do the read, I'd like to have just one processor read
> > and then send the tag data to the other processors.
> >
> > First, do I remember the basic MOAB concept correctly: that early
> > on, every process reads the exact same tables out of the (in this
> > case HDF5) file?
> >
> > If I want rank 0 to do all the work and send data to other ranks,
> > where's the best place to slip that in? It's been a while since I
> > did anything non-trivial in C++, so some of these data structures
> > are kind of Greek to me.
> >
> > thanks
> > ==rob
> >
> > --
> > Rob Latham
> > Mathematics and Computer Science Division
> > Argonne National Lab, IL USA
> >
> >
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA