[MOAB-dev] [Fathom] Problems reading large meshes in MOAB in parallel

Tim Tautges tautges at mcs.anl.gov
Tue Mar 23 07:56:12 CDT 2010


Let's talk about this at the fathom mtg today, along with the BGP stuff Alvaro is running.

- tim

Dmitry Karpeev wrote:
> If anybody has any idea what may be going on here (described in detail below),
> I'd really appreciate the insight.  I am trying to summarize this as
> concisely as possible,
> yet provide sufficient detail.  If more info is needed, I'm ready to offer it.
> 
> I have been experiencing trouble with reading large meshes into MOAB
> in parallel.
> Since there are several machines involved (Linux laptop, ANL clusters:
> cosmea, fusion),
> several code revisions, and several read modes (bcast_delete,
> read_part, read_delete, bcast),
> which generate various problems, I want to focus on a limited subset,
> to describe the observed
> performance and isolate the problems.
> 
> Machine(s):
> I have temporarily abandoned running performance benchmarks on fusion, since
> that machine has a broken autoconf system (libtool is missing) and
> building updated revisions
> there is time consuming and not easily automated (a configure script
> has to be prepared elsewhere,
> transferred to fusion, etc).  This leaves cosmea as the only parallel
> machine I'm using, at least for now.
> 
> Code:
> Also, I am currently using codes based on two MOAB revisions:
> "stable", based on an old rev. 3556 and "unstable", based on rev. 3668.
> The rationale for using two codes is that I used to have problems with hanging
> 
> Read modes:
> I'm focusing on:
>  -  bcast_delete, as the most memory intensive (procs have to be able
> to allocate
>     the entire mesh, receive it from the root proc and then delete the
> nonlocal portions)
>  -  read_part, as the most scalable.
> 
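> (For concreteness, the two modes correspond roughly to load_file option
> strings like the ones below; this is only a sketch assuming the current
> moab::Core option names, not the exact code in mbparallelcomm_test.)
> 
>     // BCAST_DELETE: the root proc reads the whole file and broadcasts it;
>     // every proc temporarily holds the full mesh before deleting the parts
>     // outside its partition.
>     const char* bcast_delete_opts =
>         "PARALLEL=BCAST_DELETE;PARTITION=PARALLEL_PARTITION";
> 
>     // READ_PART: each proc reads only its own partition sets, so per-proc
>     // memory stays roughly proportional to the local piece of the mesh.
>     const char* read_part_opts =
>         "PARALLEL=READ_PART;PARTITION=PARALLEL_PARTITION";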
> 
> Meshes:
> I'm using two different meshes -- the "small" and the "large":
>   - "small" is the tjunc6 mesh, partitioned into 64 pieces (I haven't
> run on more than 64 procs)
>   - "large" is the 64bricks_1mhex mesh, partitioned into 1024 pieces.
> Both partitions have been generated using mbzoltan with Alvaro's help.
> A brief note: larger meshes, such as 64bricks_4mhex and (even more so)
> 64bricks_16mhex
> cause a bad_alloc(), so I'm not even attempting to read them at the moment.
> 
> I'm running both the "stable" and "unstable" versions on tjunc6_64.h5m and
> 64bricks_1mhex_1024.h5m
> with the equivalent of mbparallelcomm_test 0 3 1 <meshfile>
> PARALLEL_PARTITION (bcast_delete)
> or mbparallelcomm_test -2 3 1 <meshfile> PARALLEL_PARTITION (read_part).
> In particular, I'm not looking at the effects of using mpi-io and other options.
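> (Roughly, the equivalent direct MOAB calls would look like the sketch below.
> This assumes the current moab::Core API and option names; the actual driver
> is mbparallelcomm_test, so the details certainly differ.)
> 
>     #include "moab/Core.hpp"
>     #include <mpi.h>
>     #include <iostream>
>     #include <string>
> 
>     int main(int argc, char** argv) {
>       MPI_Init(&argc, &argv);
>       moab::Core mb;
>       moab::EntityHandle fileset;
>       mb.create_meshset(moab::MESHSET_SET, fileset);
>       // bcast_delete; swap in PARALLEL=READ_PART to test the other mode.
>       const char* opts = "PARALLEL=BCAST_DELETE;PARTITION=PARALLEL_PARTITION;"
>                          "PARALLEL_RESOLVE_SHARED_ENTS";
>       moab::ErrorCode rval =
>           mb.load_file("64bricks_1mhex_1024.h5m", &fileset, opts);
>       if (moab::MB_SUCCESS != rval) {
>         std::string msg;
>         mb.get_last_error(msg);
>         std::cerr << "load_file failed: " << msg << std::endl;
>       }
>       MPI_Finalize();
>       return 0;
>     }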
> 
> The results are these: both "stable" and "unstable" codes have no
> problem reading tjunc6_64
> in any mode, but they both fail on 64bricks_1mhex_1024.  Below is the
> description of the failure,
> which depends on the read mode:
> 
> =============================================================================
> bcast_delete:
> Essentially the same type of failure appears to occur when using both
> stable and unstable codes,
> running on 4 or 8 procs, with the equivalent of the following command
> (modulo the location of files):
> mpiexec -np 4 mbparallelcomm_test 0 3 1 64bricks_1mhex_1024.h5m
> PARALLEL_PARTITION
> It appears that the mesh is read in, but the resolution of shared
> entities causes a problem.
> From the rather cryptic output below, I'm not sure what sort of
> problem this may be.
> ------------------------------------------------------------------------------------------------------------------------
> Using MPI from /gfs/software/software/mvapich2/1.0-2008-02-06-intel-shlib
> Running on 1 nodes
> 4 cores per node, one MPI process per core
> for a total of 4 MPI processes
> .................................................
> PBS nodefile:
> n018
> n018
> n018
> n018
> .................................................
> mpd ring nodefile:
> n018:4
> .................................................
> Running mpdboot ...
> done
> Running mpdtrace ...
> n018
> done
> Running mpdringtest ...
> time for 1 loops = 0.000300884246826 seconds
> done
> Using MPIEXEC_CMD=/gfs/software/software/mvapich2/1.0-2008-02-06-intel-shlib/bin/mpiexec
> -n 4
> Commencing parallel run 64bricks_1mhex_1024 of executable
> /home/karpeev/fathom/moab/stable/build/parallel/mbparallelcomm_test
> Couldn't read mesh; error message:
> Failed in step PARALLEL RESOLVE_SHARED_ENTS
> 
> Failed to find new entity in send list.
> Trouble getting remote handles when packing entities.
> Failed to pack entities from a sequence.
> Packing entities failed.
> Trouble resolving shared entity remote handles.
> 
> application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
> Finished ........................................
> Running mpdallexit ...
> done
> 
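> (For reference, the failing "PARALLEL RESOLVE_SHARED_ENTS" step is MOAB's
> shared-entity resolution; the same operation can also be invoked directly
> through ParallelComm.  A minimal sketch, assuming the current API and with
> guessed dimension arguments rather than whatever the test actually passes:)
> 
>     #include "moab/Core.hpp"
>     #include "moab/ParallelComm.hpp"
>     #include <mpi.h>
> 
>     // Resolve interface entities by hand after a read that skipped the
>     // PARALLEL_RESOLVE_SHARED_ENTS option; 'mb' and 'fileset' are the
>     // instance and file set from the load sketch above.
>     moab::ErrorCode resolve_by_hand(moab::Core& mb, moab::EntityHandle fileset) {
>       moab::ParallelComm pcomm(&mb, MPI_COMM_WORLD);
>       // Resolve 3d entities that share 2d-and-lower interface entities
>       // across procs; this is the stage reporting "Failed to find new
>       // entity in send list" in the output above.
>       return pcomm.resolve_shared_ents(fileset, 3, 2);
>     }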
> ===========================================================================================
> read_part:
> The "unstable" code generates essentially the same error here as the
> one that occurs with bcast_delete.
> The "stable" coce, however, doesn't fail outright, but reads 0
> entities on all procs:
> --------------------------------------------------------------------------------------------------------------------------------------------------
> Using MPI from /gfs/software/software/mvapich2/1.0-2008-02-06-intel-shlib
> Running on 1 nodes
> 4 cores per node, one MPI process per core
> for a total of 4 MPI processes
> .................................................
> PBS nodefile:
> n032
> n032
> n032
> n032
> .................................................
> mpd ring nodefile:
> n032:4
> .................................................
> Running mpdboot ...
> done
> Running mpdtrace ...
> n032
> done
> Running mpdringtest ...
> time for 1 loops = 0.000304937362671 seconds
> done
> Using MPIEXEC_CMD=/gfs/software/software/mvapich2/1.0-2008-02-06-intel-shlib/bin/mpiexec
> -n 4
> Commencing parallel run 64bricks_1mhex_1024 of executable
> /home/karpeev/fathom/moab/stable/build/parallel/mbparallelcomm_test
> Proc 0 iface entities:
>     0 0d iface entities.
>     0 1d iface entities.
>     0 2d iface entities.
>     0 3d iface entities.
>     (0 verts adj to other iface ents)
> Proc 1 iface entities:
>     0 0d iface entities.
>     0 1d iface entities.
>     0 2d iface entities.
>     0 3d iface entities.
>     (0 verts adj to other iface ents)
> Proc 2 iface entities:
>     0 0d iface entities.
>     0 1d iface entities.
>     0 2d iface entities.
>     0 3d iface entities.
>     (0 verts adj to other iface ents)
> Proc 3 iface entities:
>     0 0d iface entities.
>     0 1d iface entities.
>     0 2d iface entities.
>     0 3d iface entities.
>     (0 verts adj to other iface ents)
> Proc 0: Success.
> Proc 1: Success.
> Proc 2: Success.
> Proc 3: Success.
> Finished ........................................
> Running mpdallexit ...
> done
> ============================================================================
> My guess is that the "unstable" code is trying to do the "right" thing
> in both bcast_delete and read_part
> cases, but fails (e.g., running out of memory?).  The "stable" code
> does something similar in the bcast_delete
> case, but behaves incorrectly (reads zero entity sets) in the read_part case.
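> (A very rough back-of-the-envelope check on the memory theory; the entity
> counts below are guessed from the file name 64bricks_1mhex, not measured,
> and a 64-bit build with 8-byte handles is assumed:)
> 
>     #include <cstdio>
> 
>     int main() {
>       const double n_hex = 1.0e6;                   // assumed ~1M hexes
>       const double n_vtx = 1.06e6;                  // assumed ~1M vertices
>       const double conn   = n_hex * 8 * 8;          // 8 handles/hex, 8 B each
>       const double coords = n_vtx * 3 * 8;          // x,y,z doubles per vertex
>       // Under bcast_delete every proc briefly holds the whole mesh plus the
>       // MPI pack buffer, so the per-proc peak is several times this number.
>       std::printf("core mesh ~ %.0f MB\n", (conn + coords) * 1e-6);
>       return 0;
>     }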
> 
> Anybody have any idea about what's going on?
> 
> Thanks!
> Dmitry.
> _______________________________________________
> Fathom mailing list
> Fathom at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/fathom
> 

-- 
================================================================
"You will keep in perfect peace him whose mind is
   steadfast, because he trusts in you."               Isaiah 26:3

              Tim Tautges            Argonne National Laboratory
          (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
          phone: (608) 263-8485      1500 Engineering Dr.
            fax: (608) 263-4499      Madison, WI 53706


