[MOAB-dev] scaling on intrepid
Rob Latham
robl at mcs.anl.gov
Fri Jan 11 14:25:24 CST 2013
On Fri, Jan 11, 2013 at 12:53:24PM -0600, Tim Tautges wrote:
> Hi Rob,
> Thanks for getting back to that. Would be interesting to see
> whether the fixes to ROMIO work on 32k or 64k procs, that's where we
> weren't able to get any reads to run reliably.
Yup, that's where I'm headed ... but I'm stuck at 8k right now.
> For write, could you set the -o DEBUG_IO=2 option, and send me the
> output? That will give us more info. It seems like the code is
> getting hung up in the root-only read & bcast metadata stuff, but
> the debug output will confirm that.
OK, will queue that up and get back to you.
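For reference, I'll just add it next to the other write options in the
mbconvert invocation below, assuming DEBUG_IO rides along with the rest
of the -o flags:

   -o CPUTIME -o DEBUG_IO=2 -o PARALLEL=WRITE_PART ...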
Is moab-dev an OK place for this kind of discussion, or should I take
this off-list?
==rob
> - tim
>
> On 01/11/2013 10:09 AM, Rob Latham wrote:
> >I guess intrepid is old news and we should be looking at Mira now...
> >
> >Distressingly long ago, I was working with Tim and Jason on scaling
> >moab on intrepid. What would happen with the stock MPI-IO library is
> >that MOAB would feed HDF5 a request, HDF5 would build up a complicated
> >MPI-IO workload, and the MPI-IO library on Intrepid would consume too
> >much memory and fail.
> >
> >I came up with a scheme to fit ROMIO parameters to the available
> >memory. This scheme seems to be working ok for reads and I'm able to
> >scale up to 8k MPI processes without manually setting any hints
> >(except for the one that says "size this parameter automatically").
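> >
> >For anyone following along: the ROMIO_HINTS file is plain text, one
> >"hint_name value" pair per line. A minimal sketch with placeholder
> >values, not the exact knobs I'm using:
> >
> >  romio_cb_read   enable
> >  romio_cb_write  enable
> >  cb_buffer_size  4194304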
> >
> >The write step is presently causing me some grief, and it does not
> >immediately look like the problem is in MPI-IO.
> >
> >I was hoping I could run the experiment scenario by some moab folks as a sanity
> >check to make sure I am still driving MOAB in a correct and useful way.
> >
> >I've been working with mbconvert like this:
> >
> >NODES=2048
> >CORES=$(($NODES * 4))
> >
> ># work from scratch: the home file system is read-only
> >cd /intrepid-fs0/users/robl/scratch/moab-test
> >
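> ># -m vn requests virtual-node mode, 4 MPI ranks per node (hence
> ># CORES=NODES*4); -e exports the colon-separated environment list,
> ># including ROMIO_HINTS so ROMIO picks up the hints file above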
> >cqsub -t 30 -m vn -p BGQtools_esp -e ROMIO_HINTS=/home/robl/src/moab-svn/experiments/romio_hints:MPIRUN_LABEL=1:BG_COREDUMPONEXIT=1 \
> > -n $NODES -c $CORES /home/robl/src/moab-svn/build/tools/mbconvert\
> > -O CPUTIME -O PARALLEL_GHOSTS=3.0.1 -O PARALLEL=READ_PART \
> > -O PARALLEL_RESOLVE_SHARED_ENTS -O PARTITION -t \
> > -o CPUTIME -o PARALLEL=WRITE_PART /intrepid-fs0/users/tautges/persistent/meshes/2bricks/nogeom/64bricks_8mtet_ng_rib_${CORES}.h5m \
> > /intrepid-fs0/users/robl/scratch/moab/8mtet_ng-${CORES}-out.h5m
> >
> >[ Unsurprisingly (since it crashed in the middle of write)
> >/intrepid-fs0/users/robl/scratch/moab/8mtet_ng-8192-out.h5m exists,
> >but is only about 7k big and is not recognized as an HDF5 file. ]
> >
> >I'm using moab-svn r5930.
> >
> >The program terminates with "killed with signal 15", but the job has
> >only run for 20 minutes, and I asked for 30. I'll resubmit for 60
> >minutes.
> >
> >I get this much output:
> >stdout[0] Parallel Read times:
> >stdout[0] 47.6997 PARALLEL READ PART
> >stdout[0] 0.284176 PARALLEL RESOLVE_SHARED_ENTS
> >stdout[0] 1.98761 PARALLEL EXCHANGE_GHOSTS
> >stdout[0] 1.79284 PARALLEL RESOLVE_SHARED_SETS
> >stdout[0] 50.0319 PARALLEL TOTAL
> >stdout[0] real: 50.4s
> >stdout[0] user: 50.4s
> >stdout[0] system: 0.0s
> >
> >(That's some pretty awful performance: 50 seconds to read 317 MiB? I'll get
> >to that next, once I've got things actually running at all.)
> >
> >I dumped 8192 core files and stitched them together with
> >coreprocessor. The backtrace is not very helpful.
> >
> >- Everyone gets to this function:
> >moab::WriteHDF5::write_file_impl(char const*, bool,
> > moab::FileOptions const&, unsigned int const*, int,
> > std::vector<std::string, std::allocator<std::string> > const&,
> > moab::TagInfo* const*, int, int)
> >
> >- All but ten make it to MPI_Bcast (I think it's the "send ID to every
> >  proc" bcast at WriteHDF5Parallel.cpp:1058).
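> >
> >That's consistent with the root-only read & bcast pattern: roughly, and
> >with hypothetical names, the shape is
> >
> >  if (rank == 0)
> >      build_id_table(ids, count);  /* serial metadata step on root */
> >  MPI_Bcast(ids, count, MPI_LONG, 0, MPI_COMM_WORLD);
> >
> >so whenever the serial step on rank 0 stalls, every other rank piles up
> >waiting in the matching bcast.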
> >
> >==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA