[MOAB-dev] adding 'read and broadcast' to HDF5 reader

Iulian Grindeanu iulian at mcs.anl.gov
Fri Oct 19 16:31:40 CDT 2012



----- Original Message -----

| On Fri, Oct 19, 2012 at 01:30:46PM -0700, Mark Miller wrote:

| > Not sure how much this helps, but the newest versions of the HDF5
| > library support reading a file into memory (one I/O operation); then
| > proc 0 can broadcast that buffer (a single broadcast) and other procs
| > can 'open' that buffer of bytes as an HDF5 file. So, in theory, with
| > minimal changes to MOAB, it's possible to 'spoof' MOAB into thinking
| > each processor did the read anyway. One problem: I think this feature
| > works for whole files only. So, if the tables MOAB needs to read this
| > way are self-contained in a single file, it could work. Otherwise,
| > it's not much help...
| >
| > This is the 'file image' feature of HDF5.
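
For reference, a rough (untested) sketch of that pattern: rank 0 reads the
raw bytes, broadcasts them, and every rank opens its copy with
H5LTopen_file_image(). It needs HDF5 >= 1.8.9 built with the high-level
(hdf5_hl) library, and it assumes the whole file fits in memory on each rank:

======
#include <hdf5.h>
#include <hdf5_hl.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Open 'path' as an HDF5 file on every rank of 'comm', touching the
 * file system from rank 0 only.  Returns a file id usable like one
 * from H5Fopen().  Error handling omitted for brevity. */
hid_t open_bcast_image(const char *path, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    long size = 0;
    char *buf = NULL;

    if (rank == 0) {                       /* the one real I/O operation */
        FILE *f = fopen(path, "rb");
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        fseek(f, 0, SEEK_SET);
        buf = (char *) malloc(size);
        fread(buf, 1, size, f);
        fclose(f);
    }

    MPI_Bcast(&size, 1, MPI_LONG, 0, comm);
    if (rank != 0)
        buf = (char *) malloc(size);
    MPI_Bcast(buf, size, MPI_BYTE, 0, comm);   /* assumes size < 2 GiB */

    /* With flags = 0 the library makes its own copy of the image, so
     * our buffer can be freed right away (briefly costs 2x the size). */
    hid_t file = H5LTopen_file_image(buf, (size_t) size, 0);
    free(buf);
    return file;
}
======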


| I'll take a look at that approach, but on BlueGene pulling in an

| entire file may not be a viable option. These processors only need

| one piece of a larger file. In virtual node mode I only have 512

| MiB in total to work with.


| ==rob


My assumption is that the "file image" feature can be used for a portion of
the file; obviously there are files that do not fit on one proc (or in 512
MiB).
So Mark is probably suggesting that the "header/tags/set" part of the HDF5
read happen on one proc, and that the rest of the processors "think" they
read it directly from the file, while in fact they are reading it from the
buffer (file image). Am I wrong in my understanding?
Right now, the HDF5 reader in MOAB has to read on each processor the header
plus something more, like some set information. I am not sure what exactly
is read by each processor, at a minimum; I will look into the code and try
to figure it out.
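
If the feature really does work for whole files only, maybe one workaround
is for rank 0 to copy just the needed objects into an in-memory HDF5 file
(core driver, no backing store) and broadcast that smaller image instead.
A rough, untested sketch; "/tstt/sets" is only my guess at what the reader
would need:

======
#include <hdf5.h>
#include <stdlib.h>

/* Rank 0 only: copy the needed objects out of the real file into a fresh
 * in-memory HDF5 file (core driver, no backing store), then extract that
 * file's image so it can be broadcast.  "/tstt/sets" is a guess, not
 * necessarily what MOAB's reader needs.  Error handling omitted. */
char *make_partial_image(const char *path, ssize_t *img_size)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_core(fapl, 1 << 20, 0);    /* 0: never touch the disk */
    hid_t mem_file = H5Fcreate("unused", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Let H5Ocopy create the intermediate "/tstt" group on the fly. */
    hid_t lcpl = H5Pcreate(H5P_LINK_CREATE);
    H5Pset_create_intermediate_group(lcpl, 1);

    hid_t src = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    H5Ocopy(src, "/tstt/sets", mem_file, "/tstt/sets", H5P_DEFAULT, lcpl);
    H5Fclose(src);

    /* First call sizes the image, second call fills it. */
    H5Fflush(mem_file, H5F_SCOPE_GLOBAL);
    *img_size = H5Fget_file_image(mem_file, NULL, 0);
    char *img = (char *) malloc(*img_size);
    H5Fget_file_image(mem_file, img, (size_t) *img_size);

    H5Fclose(mem_file);
    H5Pclose(lcpl);
    H5Pclose(fapl);
    return img;   /* broadcast, then open with H5LTopen_file_image() */
}
======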

Iulian 



| > Mark
| >
| > On Fri, 2012-10-19 at 15:16 -0500, Iulian Grindeanu wrote:
| > > Hello Rob,
| > > I think that change has to happen in src/parallel/ReadParallel.cpp.
| > > I am not sure yet, though; Tim would confirm that.
| > >
| > > Iulian
| > >
| > > ______________________________________________________________________

| > > Tim knows all this, but for the rest of the list, here's the
| > > short story:
| > >
| > > MOAB's HDF5 reader and writer have a problem on BlueGene where
| > > they will collectively read in initial conditions or write output,
| > > and run out of memory. This out-of-memory condition comes from
| > > MOAB doing all the right things -- using HDF5, using collective
| > > I/O -- but the MPI-IO library on Intrepid goes and consumes too
| > > much memory.
| > >
| > > I've got one approach to deal with the MPI-IO memory issue for
| > > writes. This approach would sort of work for the reads, but what
| > > is really needed is for rank 0 to do the read and broadcast the
| > > result to everyone.
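
As an aside, a rough (untested) sketch of that rank-0 read-and-broadcast
pattern for a single table; the element type (double) is an assumption, and
error handling is omitted:

======
#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

/* Read one table on rank 0 and broadcast the decoded values, so MPI-IO
 * never sees a collective read.  Assumes the table holds doubles and
 * fits in memory on every rank. */
double *read_table_bcast(const char *fname, const char *dset_name,
                         long *count, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);
    long n = 0;
    double *data = NULL;

    if (rank == 0) {
        hid_t file  = H5Fopen(fname, H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t dset  = H5Dopen(file, dset_name, H5P_DEFAULT);
        hid_t space = H5Dget_space(dset);
        n = (long) H5Sget_simple_extent_npoints(space);
        data = (double *) malloc(n * sizeof(double));
        H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                H5P_DEFAULT, data);
        H5Sclose(space);
        H5Dclose(dset);
        H5Fclose(file);
    }

    MPI_Bcast(&n, 1, MPI_LONG, 0, comm);
    if (rank != 0)
        data = (double *) malloc(n * sizeof(double));
    MPI_Bcast(data, (int) n, MPI_DOUBLE, 0, comm);  /* assumes n < 2^31 */

    *count = n;
    return data;
}
======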

| > >
| > > So, I'm looking for a little help understanding MOAB's read side
| > > of the code. Conceptually, all processes read the table of
| > > entities.
| > >
| > > A fairly small 'mbconvert' job will run out of memory:
| > >
| > > 512 nodes, 2048 processors:
| > >

| > > ======
| > > NODES=512
| > > CORES=$(($NODES * 4))
| > > cd /intrepid-fs0/users/robl/scratch/moab-test
| > >
| > > cqsub -t 15 -m vn -p SSSPP \
| > >   -e MPIRUN_LABEL=1:BG_COREDUMPONEXIT=1 \
| > >   -n $NODES -c $CORES \
| > >   /home/robl/src/moab-svn/build/tools/mbconvert \
| > >   -O CPUTIME -O PARALLEL_GHOSTS=3.0.1 -O PARALLEL=READ_PART \
| > >   -O PARALLEL_RESOLVE_SHARED_ENTS -O PARTITION -t \
| > >   -o CPUTIME -o PARALLEL=WRITE_PART \
| > >   /intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_${CORES}.h5m \
| > >   /intrepid-fs0/users/robl/scratch/moab/8mtet_ng-${CORES}-out.h5m
| > > ======

| > >

| > > I'm kind of stumbling around ReadHDF5::load_file and
| > > ReadHDF5::load_file_partial trying to find the spot where a
| > > collection of tags is read into memory. Instead of having all
| > > processors do the read, I'd like to have just one processor read
| > > and then send the tag data to the other processors.
| > >
| > > First, do I remember the basic MOAB concept correctly: that early
| > > on, every process reads the exact same tables out of the (in this
| > > case HDF5) file?
| > >
| > > If I want rank 0 to do all the work and send data to the other
| > > ranks, where's the best place to slip that in? It's been a while
| > > since I did anything non-trivial in C++, so some of these data
| > > structures are kind of Greek to me.
| > >
| > > thanks
| > > ==rob
| > >

| > > --
| > > Rob Latham
| > > Mathematics and Computer Science Division
| > > Argonne National Lab, IL USA
| > >
| > >


| --
| Rob Latham
| Mathematics and Computer Science Division
| Argonne National Lab, IL USA


