[MOAB-dev] adding 'read and broadcast' to HDF5 reader
Iulian Grindeanu
iulian at mcs.anl.gov
Fri Oct 19 15:16:46 CDT 2012
Hello Rob,
I think that change has to happen in src/parallel/ReadParallel.cpp.
I am not sure yet, though; Tim would confirm that.
Iulian
----- Original Message -----
| Tim knows all this but for the rest of the list, here's the short
| story:
| MOAB's HDF5 reader and writer have a problem on BlueGene: when they
| collectively read initial conditions or write output, they run out of
| memory. This out-of-memory condition comes from MOAB doing all the
| right things -- using HDF5, using collective I/O -- but the MPI-IO
| library on Intrepid consumes too much memory.
| I've got one approach to deal with the MPI-IO memory issue for writes.
| This approach would sort of work for the reads, but what is really
| needed is for rank 0 to do the read and broadcast the result to
| everyone.
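| To be concrete about the pattern I mean, here's a minimal sketch (not
| MOAB code; the file name and the 1-D integer dataset are made up and
| just stand in for whatever table the reader actually wants): rank 0
| opens the file serially, reads the dataset, and broadcasts the size
| and the buffer to everyone else.
| ======
| #include <mpi.h>
| #include <hdf5.h>
| #include <vector>
|
| // Rank 0 reads a 1-D dataset of native ints serially; every other rank
| // gets a copy via MPI_Bcast instead of touching the file at all.
| std::vector<int> read_and_bcast(const char* filename, const char* dset_name,
|                                 MPI_Comm comm)
| {
|   int rank;
|   MPI_Comm_rank(comm, &rank);
|
|   long long count = 0;
|   std::vector<int> buffer;
|
|   if (rank == 0) {
|     hid_t file  = H5Fopen(filename, H5F_ACC_RDONLY, H5P_DEFAULT);
|     hid_t dset  = H5Dopen2(file, dset_name, H5P_DEFAULT);
|     hid_t space = H5Dget_space(dset);
|     hsize_t dims[1];
|     H5Sget_simple_extent_dims(space, dims, NULL);  // assumes a rank-1 dataset
|     count = (long long)dims[0];
|     buffer.resize(count);
|     H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
|             &buffer[0]);
|     H5Sclose(space);
|     H5Dclose(dset);
|     H5Fclose(file);
|   }
|
|   // Everyone learns the size, then receives the data from rank 0.
|   MPI_Bcast(&count, 1, MPI_LONG_LONG, 0, comm);
|   if (rank != 0)
|     buffer.resize(count);
|   MPI_Bcast(&buffer[0], (int)count, MPI_INT, 0, comm);
|   return buffer;
| }
| ======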
| So, I'm looking for a little help understanding MOAB's read side of
| the code. Conceptually, all processes read the table of entities.
| A fairly small 'mbconvert' job will run out of memory:
| 512 nodes, 2048 processors:
| ======
| NODES=512
| CORES=$(($NODES * 4))
| cd /intrepid-fs0/users/robl/scratch/moab-test
| cqsub -t 15 -m vn -p SSSPP -e MPIRUN_LABEL=1:BG_COREDUMPONEXIT=1 \
|   -n $NODES -c $CORES /home/robl/src/moab-svn/build/tools/mbconvert \
|   -O CPUTIME -O PARALLEL_GHOSTS=3.0.1 -O PARALLEL=READ_PART \
|   -O PARALLEL_RESOLVE_SHARED_ENTS -O PARTITION -t \
|   -o CPUTIME -o PARALLEL=WRITE_PART \
|   /intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_${CORES}.h5m \
|   /intrepid-fs0/users/robl/scratch/moab/8mtet_ng-${CORES}-out.h5m
| ======
| I'm stumbling around ReadHDF5::load_file and
| ReadHDF5::load_file_partial trying to find the spot where a collection
| of tags is read into memory. Instead of having all processors do the
| read, I'd like to have just one processor read and then send the tag
| data to the other processors.
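| In other words, for the places where (if I understand correctly) every
| rank reads exactly the same data, I'd like to slip in something shaped
| roughly like the following wherever the collective H5Dread happens
| today. The name and signature here are made up; it assumes every rank
| already holds the dataset and selection handles and has sized its
| buffer, which is exactly the part of the code I'm asking about.
| ======
| #include <mpi.h>
| #include <hdf5.h>
|
| // Hypothetical drop-in: rank 0 does an independent H5Dread, then
| // broadcasts the raw bytes; 'bytes' is however large the caller already
| // sized the buffer. None of these names come from ReadHDF5 itself.
| int read_on_root_and_bcast(hid_t dset, hid_t mem_type, hid_t mem_space,
|                            hid_t file_space, void* buffer, size_t bytes,
|                            MPI_Comm comm)
| {
|   int rank, failed = 0;
|   MPI_Comm_rank(comm, &rank);
|
|   if (rank == 0) {
|     // Plain (non-collective) transfer property list on rank 0 only.
|     herr_t rc = H5Dread(dset, mem_type, mem_space, file_space,
|                         H5P_DEFAULT, buffer);
|     failed = (rc < 0);
|   }
|
|   // Agree on whether the read worked, then ship the data.
|   MPI_Bcast(&failed, 1, MPI_INT, 0, comm);
|   if (!failed)
|     MPI_Bcast(buffer, (int)bytes, MPI_BYTE, 0, comm);
|   return failed;
| }
| ======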
| First, do I remember the basic MOAB concept correctly: that early on
| every process reads the exact same tables out of the (in this case
| HDF5) file?
| If I want rank 0 to do all the work and send data to other ranks,
| where's the best place to slip that in? It's been a while since I did
| anything non-trivial in C++, so some of these data structures are kind
| of Greek to me.
| thanks
| ==rob
| --
| Rob Latham
| Mathematics and Computer Science Division
| Argonne National Lab, IL USA