[MOAB-dev] adding 'read and broadcast' to HDF5 reader

Fri Oct 19 15:33:17 CDT 2012

On Fri, Oct 19, 2012 at 03:16:46PM -0500, Iulian Grindeanu wrote:
> Hello Rob, 
> I think that change has to happen in src/parallel/ReadParallel.cpp.
> I am not sure yet, though; Tim can confirm that.

Interesting.  What is this POPT_BCAST option?  

I don't want to change all of MOAB into 'rank 0 does I/O' --
obviously that's not going to scale.

But for some of these inputs we are looking at datasets that are not
all that big:

/intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_2048.h5m
has a 7995686 x 4 "connectivity" dataset, but I know from talking with
Jason that you are only pulling one column out of this array, so about
61 MiB.
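
The arithmetic behind that figure can be sketched as follows. This is a
back-of-the-envelope check, not something taken from the file itself: the
8-byte value size is an assumption (the 61 MiB number is consistent with
it), and the variable names are made up for illustration.

```shell
# Rows in the "connectivity" dataset, as quoted above.
rows=7995686
# ASSUMPTION: each value is stored as an 8-byte integer.
bytes_per_value=8

# Size of a single column, in bytes and in whole MiB (integer division).
column_bytes=$((rows * bytes_per_value))
column_mib=$((column_bytes / 1048576))

echo "$column_bytes bytes, about $column_mib MiB"
# prints: 63965488 bytes, about 61 MiB
```

Under the same assumption, all four columns together would be four times
that, which is still small enough for a single rank to hold.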

==rob

>
> ----- Original Message -----
>
> | Tim knows all this but for the rest of the list, here's the short
> | story:
>
> | MOAB's HDF5 reader and writer have a problem on BlueGene where they
> | will collectively read in initial conditions or write output, and run
> | out of memory. This out-of-memory condition comes from MOAB doing all
> | the right things -- using HDF5, using collective I/O -- but the MPI-IO
> | library on Intrepid goes and consumes too much memory.
>
> | I've got one approach to deal with the MPI-IO memory issue for writes.
> | This approach would sort of work for the reads, but what is really
> | needed is for rank 0 to do the read and broadcast the result to
> | everyone.
>
> | So, I'm looking for a little help understanding MOAB's read side of
> | the code. Conceptually, all processes read the table of entities.
>
> | A fairly small 'mbconvert' job will run out of memory:
>
> | 512 nodes, 2048 processors:
>
> | ======
> | NODES=512
> | CORES=$(($NODES * 4))
> | cd /intrepid-fs0/users/robl/scratch/moab-test
> |
> | cqsub -t 15 -m vn -p SSSPP -e MPIRUN_LABEL=1:BG_COREDUMPONEXIT=1 \
> | -n $NODES -c $CORES /home/robl/src/moab-svn/build/tools/mbconvert \
> | -O CPUTIME -O PARALLEL_GHOSTS=3.0.1 -O PARALLEL=READ_PART \
> | -O PARALLEL_RESOLVE_SHARED_ENTS -O PARTITION -t \
> | -o CPUTIME -o PARALLEL=WRITE_PART \
> | /intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_${CORES}.h5m \
> | /intrepid-fs0/users/robl/scratch/moab/8mtet_ng-${CORES}-out.h5m
> | ======
>
> | I'm kind of stumbling around ReadHDF5::load_file and
> | ReadHDF5::load_file_partial trying to find a spot where a collection
> | of tags are read into memory. I'd like to, instead of having all
> | processors do the read, have just one processor read and then send the
> | tag data to the other processors.
>
> | First, do I remember the basic MOAB concept correctly: that early on
> | every process reads the exact same tables out of the (in this case
> | HDF5) file?
>
> | If I want rank 0 to do all the work and send data to other ranks,
> | where's the best place to slip that in? It's been a while since I did
> | anything non-trivial in C++, so some of these data structures are kind
> | of Greek to me.
>
> | thanks
> | ==rob
>
> | --
> | Rob Latham
> | Mathematics and Computer Science Division
> | Argonne National Lab, IL USA
>

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA