[MOAB-dev] adding 'read and broadcast' to HDF5 reader

Fri Oct 19 16:40:26 CDT 2012

----- Original Message -----

| On Fri, Oct 19, 2012 at 03:16:46PM -0500, Iulian Grindeanu wrote:
| > Hello Rob,
| > I think that change has to happen in src/parallel/ReadParallel.cpp
| > I am not sure yet though, Tim would confirm that
|
| Interesting. What is this POPT_BCAST option?
| I don't want to change all of moab into 'rank 0 does i/o' --
| obviously that's not going to scale.

This is the read-and-broadcast option which, as you said, does not scale.

| But for some of these inputs we are looking at datasets that are not
| all that big:
| /intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_2048.h5m
| has a 7995686 x 4 "connectivity" dataset, but I know from talking with
| jason that you are only pulling one column out of this array, so 61
| MiBytes.

These are the connectivity arrays for ~8 million tetrahedral elements. We do not read them on all processors. Each processor
first has to find out which subset of elements it needs to read; the set information should then be enough to decide which portion
of the connectivity array has to be read on each processor.

Iulian 

| ==rob
| >
| > ----- Original Message -----
| >
| > | Tim knows all this but for the rest of the list, here's the short
| > | story:
| >
| > | MOAB's HDF5 reader and writer have a problem on BlueGene where it
| > | will collectively read in initial conditions or write output, and
| > | run out of memory. This out-of-memory condition comes from MOAB
| > | doing all the right things -- using HDF5, using collective I/O --
| > | but the MPI-IO library on Intrepid goes and consumes too much
| > | memory.
| >
| > | I've got one approach to deal with the MPI-IO memory issue for
| > | writes. This approach would sort of work for the reads, but what
| > | is really needed is for rank 0 to do the read and broadcast the
| > | result to everyone.
| >
| > | So, I'm looking for a little help understanding MOAB's read side
| > | of the code. Conceptually, all processes read the table of
| > | entities.
| >
| > | A fairly small 'mbconvert' job will run out of memory:
| >
| > | 512 nodes, 2048 processors:
| >
| > | ======

| > | NODES=512
| > | CORES=$(($NODES * 4))
| > | cd /intrepid-fs0/users/robl/scratch/moab-test
| >
| > | cqsub -t 15 -m vn -p SSSPP -e MPIRUN_LABEL=1:BG_COREDUMPONEXIT=1 \
| > | -n $NODES -c $CORES /home/robl/src/moab-svn/build/tools/mbconvert \
| > | -O CPUTIME -O PARALLEL_GHOSTS=3.0.1 -O PARALLEL=READ_PART \
| > | -O PARALLEL_RESOLVE_SHARED_ENTS -O PARTITION -t \
| > | -o CPUTIME -o PARALLEL=WRITE_PART /intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_${CORES}.h5m \
| > | /intrepid-fs0/users/robl/scratch/moab/8mtet_ng-${CORES}-out.h5m
| > | ======

| > | I'm kind of stumbling around ReadHDF5::load_file and
| > | ReadHDF5::load_file_partial trying to find a spot where a
| > | collection of tags are read into memory. I'd like to, instead of
| > | having all processors do the read, have just one processor read
| > | and then send the tag data to the other processors.
| >
| > | First, do I remember the basic MOAB concept correctly: that early
| > | on every process reads the exact same tables out of the (in this
| > | case HDF5) file?
| >
| > | If I want rank 0 to do all the work and send data to other ranks,
| > | where's the best place to slip that in? It's been a while since I
| > | did anything non-trivial in C++, so some of these data structures
| > | are kind of greek to me.
| >
| > | thanks
| > | ==rob
| >
| > | --
| > | Rob Latham
| > | Mathematics and Computer Science Division
| > | Argonne National Lab, IL USA

| --
| Rob Latham
| Mathematics and Computer Science Division
| Argonne National Lab, IL USA
