[MOAB-dev] DMMoabLoadFromFile() - parallel performance issue

Vijay S. Mahadevan vijay.m at gmail.com
Mon Dec 14 18:24:39 CST 2015


James,

I haven't tested your case locally yet, but those numbers look
suspicious. We have seen the HDF5 I/O slow down badly at up to 4000
procs when loading mesh files that are over O(100) GB, but even there
the timing plateaus at a couple of minutes, not 45 minutes. So your
simple test certainly should not see such severe performance
degradation.

DMMoabLoadFromFile internally just calls moab->load_file, which is
primarily a load factory that invokes ReadHDF5. So the bulk of the
behavior comes down to ReadHDF5 and HDF5 itself, and you can time that
path directly, without the DM wrapper, to rule the PETSc layer in or
out.
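Here is a minimal sketch of such a direct call (my own, untested, with
a placeholder file name rather than anything from your test case); the
CPUTIME and DEBUG_IO options make ReadHDF5 print per-phase
diagnostics:

    #include "moab/Core.hpp"
    #include <mpi.h>
    #include <iostream>

    int main(int argc, char** argv)
    {
      MPI_Init(&argc, &argv);
      moab::Core mb;
      /* Read only this rank's partition of a partitioned mesh, and
         ask ReadHDF5 to print timing for each read phase. */
      const char* opts = "PARALLEL=READ_PART;"
                         "PARTITION=PARALLEL_PARTITION;"
                         "PARALLEL_RESOLVE_SHARED_ENTS;"
                         "CPUTIME;DEBUG_IO=1";
      double t0 = MPI_Wtime();
      /* "input.h5m" is a placeholder for your partitioned mesh. */
      moab::ErrorCode rval = mb.load_file("input.h5m", 0, opts);
      double t1 = MPI_Wtime();
      if (rval != moab::MB_SUCCESS)
        std::cerr << "load_file failed" << std::endl;
      else
        std::cout << "Time to read file: " << (t1 - t0)
                  << " [sec.]" << std::endl;
      MPI_Finalize();
      return 0;
    }

If the direct call shows the same slowdown, the problem is below the
DM layer, in ReadHDF5 or HDF5 itself.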

With that said, here are some questions to better understand what you
are doing here:

1) What version of HDF5 are you using, and how is it configured?
Debug or optimized? (A quick run-time check is sketched after this
list.)
2) Is MOAB configured in optimized mode?
3) What compiler and MPI version are you using on your machine, so
that we can tell whether it's a compiler-flag issue?
4) What are your machine's characteristics? Cluster or large-scale
machine? GPFS, Lustre, or some other file system?
5) Can you work with a MOAB branch? We have a PR currently under
review that should give you fine-grained profiling data during the
read. Take a look at [1].
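
For question 1, a trivial check (again an untested sketch of mine) is
to print the HDF5 version and build mode from within the application:

    /* Prints the run-time HDF5 library version; H5_HAVE_PARALLEL is
       defined by the HDF5 public headers only when the library was
       built with MPI-IO support. */
    #include <hdf5.h>
    #include <cstdio>

    int main()
    {
      unsigned maj, min, rel;
      H5get_libversion(&maj, &min, &rel);
      std::printf("HDF5 %u.%u.%u\n", maj, min, rel);
    #ifdef H5_HAVE_PARALLEL
      std::printf("parallel (MPI-IO) HDF5 build\n");
    #else
      std::printf("serial HDF5 build\n");
    #endif
      return 0;
    }

You can also grep the lib/libhdf5.settings file in the HDF5 install,
which records the configure flags and build mode.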

Let us know the answers to some of these. Meanwhile, we will also try
to run your test case to check the I/O performance and see whether
your results are reproducible to some degree.

Vijay

[1] https://bitbucket.org/fathomteam/moab/pull-requests/170/genlargemesh-corrections/diff

On Mon, Dec 14, 2015 at 6:25 PM, WARNER, JAMES E. (LARC-D309)
<james.e.warner at nasa.gov> wrote:
> Hi Vijay & Iulian,
>
> Hope you are doing well! I have a question regarding some strange behavior
> we’re seeing with the DMMoabLoadFromFile() function…
>
> After doing some recent profiling of our MOAB-based finite element code, we
> noticed that we are spending a disproportionate amount of CPU time within
> the DMMoabLoadFromFile() function, which gets slower, or at best stays
> constant, as we increase the number of processors. We also recently
> attempted a scalability test with ~30M FEM nodes on 500 processors, which
> hung in DMMoabLoadFromFile() for about 45 minutes before we killed the job.
> We then re-ran the test on one processor and it completed successfully in
> several seconds.
>
> To reproduce the problem we’re seeing, we wrote a test case (attached here)
> that simply loads a smaller mesh with approximately 16K nodes and prints the
> run time. When I run the code on an increasing number of processors, I get
> something like:
>
> NP=1: Time to read file: 0.0416839 [sec.]
> NP=2: Time to read file: 1.42497 [sec.]
> NP=4: Time to read file: 1.13678 [sec.]
> NP=8: Time to read file: 1.0475 [sec.]
>
> If it is relevant/helpful: we are using the mbpart tool to partition the
> mesh. Do you have any ideas about why we are not seeing scalability here?
> Any thoughts/tips would be appreciated! Let me know if you would like any
> more information.
>
> Thanks,
> Jim

