[MOAB-dev] adding 'read and broadcast' to HDF5 reader
Rob Latham
robl at mcs.anl.gov
Fri Oct 19 15:37:00 CDT 2012
On Fri, Oct 19, 2012 at 01:30:46PM -0700, Mark Miller wrote:
> Not sure how much this helps, but the newest versions of the HDF5 library
> support reading a file into memory (one I/O operation); then proc 0 can
> broadcast that buffer (a single broadcast) and other procs can 'open' that
> buffer of bytes as an HDF5 file. So, in theory, with minimal changes to
> MOAB, it's possible to 'spoof' MOAB into thinking each processor did the
> read anyway. One problem: I think this feature works for whole files
> only. So, if the tables MOAB needs to read this way are self-contained
> in a single file, it could work. Otherwise, it's not much help...
>
> This is the 'file image' feature of HDF5.
I'll take a look at that approach, but on BlueGene pulling in an
entire file may not be a viable option. These processors only need
one piece of a larger file. In virtual node mode I only have 512
MiB in total to work with.
==rob
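[For the archive: the file-image pattern Mark describes, one read on rank 0, one broadcast, then every rank opening the bytes in place, might look roughly like the sketch below. It assumes HDF5 >= 1.8.9, which added H5LTopen_file_image, and a file small enough to hold in memory on every rank; error checking is mostly omitted.]

```c
/* Sketch of HDF5's "file image" feature combined with MPI_Bcast:
 * rank 0 reads the whole file once; every rank then opens the
 * broadcast buffer as if it were a file on disk. Assumes the file
 * fits in memory (and its size fits in an int: < 512 MiB here). */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>
#include <hdf5_hl.h>

hid_t open_file_image_bcast(const char *path, MPI_Comm comm)
{
    int rank;
    long long size = 0;
    char *buf = NULL;

    MPI_Comm_rank(comm, &rank);
    if (rank == 0) {                        /* one I/O operation */
        FILE *f = fopen(path, "rb");
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        rewind(f);
        buf = malloc(size);
        if (fread(buf, 1, size, f) != (size_t)size)
            MPI_Abort(comm, 1);
        fclose(f);
    }
    MPI_Bcast(&size, 1, MPI_LONG_LONG, 0, comm);
    if (rank != 0)
        buf = malloc(size);
    MPI_Bcast(buf, (int)size, MPI_BYTE, 0, comm);   /* single broadcast */

    /* Each rank 'opens' the in-memory bytes as an HDF5 file; with
     * DONT_COPY the library uses (and later frees) our buffer directly. */
    return H5LTopen_file_image(buf, (size_t)size, H5LT_FILE_IMAGE_DONT_COPY);
}
```

The returned hid_t behaves like any handle from H5Fopen, so downstream HDF5 reads proceed unchanged; the whole-file limitation Mark mentions is exactly what collides with the 512 MiB virtual-node limit, though.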
> Mark
>
> On Fri, 2012-10-19 at 15:16 -0500, Iulian Grindeanu wrote:
> > Hello Rob,
> > I think that change has to happen in src/parallel/ReadParallel.cpp
> > I am not sure yet though, Tim would confirm that
> >
> > Iulian
> >
> >
> > ______________________________________________________________________
> > Tim knows all this, but for the rest of the list, here's the
> > short story:
> >
> > MOAB's HDF5 reader and writer have a problem on BlueGene: they
> > collectively read in initial conditions or write output, and run
> > out of memory. This out-of-memory condition comes from MOAB doing
> > all the right things -- using HDF5, using collective I/O -- but the
> > MPI-IO library on Intrepid goes and consumes too much memory.
> >
> > I've got one approach to deal with the MPI-IO memory issue for
> > writes. This approach would sort of work for the reads, but what
> > is really needed is for rank 0 to do the read and broadcast the
> > result to everyone.
> >
> > So, I'm looking for a little help understanding MOAB's read side
> > of the code. Conceptually, all processes read the table of
> > entities.
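[The read-and-broadcast shape being asked for is itself small; as a rough sketch, independent of where it would hook into MOAB (read_entity_table() below is a hypothetical placeholder, not MOAB's real reader):]

```c
/* Rank 0 reads one table; everyone else receives it via MPI_Bcast.
 * read_entity_table() is a hypothetical placeholder for the HDF5
 * calls MOAB's reader actually makes -- it is not a MOAB function. */
#include <stdlib.h>
#include <mpi.h>

extern void read_entity_table(char **data, int *nbytes);  /* hypothetical */

void load_table_bcast(MPI_Comm comm, char **data, int *nbytes)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    if (rank == 0)
        read_entity_table(data, nbytes);   /* only rank 0 touches the file */

    MPI_Bcast(nbytes, 1, MPI_INT, 0, comm);
    if (rank != 0)
        *data = malloc(*nbytes);
    MPI_Bcast(*data, *nbytes, MPI_BYTE, 0, comm);
}
```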
> >
> > A fairly small 'mbconvert' job will run out of memory:
> >
> > 512 nodes, 2048 processors:
> >
> > ======
> > NODES=512
> > CORES=$(($NODES * 4))
> > cd /intrepid-fs0/users/robl/scratch/moab-test
> >
> > cqsub -t 15 -m vn -p SSSPP \
> >     -e MPIRUN_LABEL=1:BG_COREDUMPONEXIT=1 \
> >     -n $NODES -c $CORES \
> >     /home/robl/src/moab-svn/build/tools/mbconvert \
> >     -O CPUTIME -O PARALLEL_GHOSTS=3.0.1 -O PARALLEL=READ_PART \
> >     -O PARALLEL_RESOLVE_SHARED_ENTS -O PARTITION -t \
> >     -o CPUTIME -o PARALLEL=WRITE_PART \
> >     /intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_${CORES}.h5m \
> >     /intrepid-fs0/users/robl/scratch/moab/8mtet_ng-${CORES}-out.h5m
> > ======
> >
> > I'm kind of stumbling around ReadHDF5::load_file and
> > ReadHDF5::load_file_partial trying to find the spot where a
> > collection of tags is read into memory. Instead of having all
> > processors do the read, I'd like to have just one processor read
> > and then send the tag data to the other processors.
> >
> > First, do I remember the basic MOAB concept correctly: that early
> > on, every process reads the exact same tables out of the (in this
> > case HDF5) file?
> >
> > If I want rank 0 to do all the work and send data to other ranks,
> > where's the best place to slip that in? It's been a while since I
> > did anything non-trivial in C++, so some of these data structures
> > are kind of Greek to me.
> >
> > thanks
> > ==rob
> >
> > --
> > Rob Latham
> > Mathematics and Computer Science Division
> > Argonne National Lab, IL USA
> >
> >
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA