[MOAB-dev] Help for migration of cells between MPI subdomains with moab
Grindeanu, Iulian R.
iulian at mcs.anl.gov
Tue Jun 15 10:56:41 CDT 2021
Hello Olivier,
In ParCommGraph we use pack/unpack methods;
for example in send mesh parts
....
ents.merge( verts );
ParallelComm::Buffer* buffer = new ParallelComm::Buffer( ParallelComm::INITIAL_BUFF_SIZE );
buffer->reset_ptr( sizeof( int ) );
rval = pco->pack_buffer( ents, false, true, false, -1, buffer );
...
ierr = MPI_Isend( buffer->mem_ptr, size_pack, MPI_CHAR, receiver_proc, 2, jcomm,
&sendReqs[indexReq] ); // we have to use joint communicator
in receive mesh:
ierr = MPI_Recv( buffer->mem_ptr, size_pack, MPI_CHAR, sender1, 2, jcomm, &status );
if( 0 != ierr )
{
...
rval = pco->unpack_buffer( buffer->buff_ptr, false, -1, -1, L1hloc, L1hrem, L1p, L2hloc, L2hrem, L2p,
entities_vec );
...
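Putting the two fragments together, a rough (untested) sketch of the whole round trip for a limited set of cells could look like the code below. It only uses the calls shown above plus plain MPI; the helper names (send_cells / receive_cells), the joint communicator jcomm, and the ranks receiver_proc / sender_proc are placeholders, and I am assuming Buffer::get_current_size() as the accessor for the packed size; error handling is minimal.

#include "moab/Core.hpp"
#include "moab/ParallelComm.hpp"
#include <mpi.h>
#include <vector>

using namespace moab;

// sender side (process A): pack the cells plus their vertices and ship them
ErrorCode send_cells( Interface* mb, ParallelComm* pco, MPI_Comm jcomm, Range& cells_to_move,
                      int receiver_proc, MPI_Request& sendReq )
{
    Range ents = cells_to_move, verts;
    ErrorCode rval = mb->get_adjacencies( cells_to_move, 0, false, verts, Interface::UNION );
    if( MB_SUCCESS != rval ) return rval;
    ents.merge( verts );  // vertices travel together with the cells

    ParallelComm::Buffer* buffer = new ParallelComm::Buffer( ParallelComm::INITIAL_BUFF_SIZE );
    buffer->reset_ptr( sizeof( int ) );
    rval = pco->pack_buffer( ents, false, true, false, -1, buffer );
    if( MB_SUCCESS != rval ) return rval;

    int size_pack = buffer->get_current_size();  // assumed accessor for the packed size
    int ierr = MPI_Isend( buffer->mem_ptr, size_pack, MPI_CHAR, receiver_proc, 2, jcomm, &sendReq );
    // note: buffer must stay alive until the Isend completes (free it after MPI_Wait)
    return ( 0 == ierr ) ? MB_SUCCESS : MB_FAILURE;
}

// receiver side (process B): receive the packed buffer and instantiate the entities
ErrorCode receive_cells( ParallelComm* pco, MPI_Comm jcomm, int sender_proc, Range& new_ents )
{
    MPI_Status status;
    MPI_Probe( sender_proc, 2, jcomm, &status );
    int size_pack;
    MPI_Get_count( &status, MPI_CHAR, &size_pack );  // size of the incoming packed message

    ParallelComm::Buffer* buffer = new ParallelComm::Buffer( size_pack );
    int ierr = MPI_Recv( buffer->mem_ptr, size_pack, MPI_CHAR, sender_proc, 2, jcomm, &status );
    if( 0 != ierr ) return MB_FAILURE;

    std::vector< std::vector< EntityHandle > > L1hloc, L1hrem;
    std::vector< std::vector< int > > L1p;
    std::vector< EntityHandle > L2hloc, L2hrem;
    std::vector< unsigned int > L2p;
    std::vector< EntityHandle > entities_vec;

    buffer->reset_ptr( sizeof( int ) );
    ErrorCode rval = pco->unpack_buffer( buffer->buff_ptr, false, -1, -1, L1hloc, L1hrem, L1p,
                                         L2hloc, L2hrem, L2p, entities_vec );
    delete buffer;
    if( MB_SUCCESS != rval ) return rval;

    // collect the handles of the entities just created on the receiver
    for( size_t i = 0; i < entities_vec.size(); i++ )
        new_ents.insert( entities_vec[i] );
    return MB_SUCCESS;
}

After this, the sender still has to delete the cells (and any now-unused vertices) that moved away, before the parallel tags are recomputed.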
So this is what I think should happen for a particular set of entities.
I will work on an example.
ParCommGraph actually encapsulates the "communication graph", but we have only ever used it for complete meshes, never in this "limited" scenario.
It would be cool to expand the capability, in the sense that maybe at some time step you have a global idea of what needs to go where, and you use the infrastructure to build the "graph". This graph info is mostly local: each receiver needs to know only the senders it receives from, and each sender needs to know only the receivers it needs to send to. For our full meshes that info is decided by the partitioner (Zoltan methods, or trivial partition).
In your case, the decision is made by your solver or some other criteria.
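To make "mostly local" concrete: on each process the graph boils down to something like the small structure sketched below (plain C++ on the application side, not an existing ParCommGraph API; the names are hypothetical).

#include "moab/Range.hpp"
#include <map>
#include <vector>

// Local view of the "communication graph" on one process: a sender only knows
// the receivers it must send to (and which cells), a receiver only knows the
// ranks it expects data from.
struct LocalMigrationGraph
{
    std::map< int, moab::Range > to_send;  // receiver rank -> cells leaving this process
    std::vector< int > expected_senders;   // ranks this process will receive cells from
};

// Filled by whatever decides the migration (Zoltan methods, a trivial partition,
// or, in your case, the solver's own criteria), for example:
//   graph.to_send[3].insert( cell_handle );   // this cell goes to rank 3
//   graph.expected_senders.push_back( 7 );    // rank 7 will send us something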
Iulian
________________________________________
From: Olivier Jamond <olivier.jamond at cea.fr>
Sent: Monday, June 14, 2021 9:35 AM
To: Grindeanu, Iulian R.; Vijay S. Mahadevan
Cc: moab-dev at mcs.anl.gov
Subject: Re: [MOAB-dev] Help for migration of cells between MPI subdomains with moab
Ok, I am ok with this global procedure for migrating cells, which is:
1) remove the ghosts and parallel tags
2) do the migration: pack -> delete -> MPI send/recv -> unpack
3) recompute the parallel tags and ghost layers (see the sketch just below)
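For step 3, what I have in mind (if I read the ParallelComm headers correctly, and assuming the vertex global ids survive the migration) is roughly the sketch below, for our 2D test case:

#include "moab/ParallelComm.hpp"

// Sketch of step 3 only: recompute the parallel state once the migrated cells
// exist on their new owners and have been deleted on the old ones (2D mesh assumed).
moab::ErrorCode recompute_parallel_state( moab::ParallelComm* pcomm )
{
    // re-identify the shared vertices/edges and reset the parallel tags
    moab::ErrorCode rval = pcomm->resolve_shared_ents( 0, /*resolve_dim*/ 2, /*shared_dim*/ 1 );
    if( moab::MB_SUCCESS != rval ) return rval;

    // rebuild one layer of ghost cells, bridged through vertices, storing remote handles
    return pcomm->exchange_ghost_cells( /*ghost_dim*/ 2, /*bridge_dim*/ 0, /*num_layers*/ 1,
                                        /*addl_ents*/ 0, /*store_remote_handles*/ true );
}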
But, at this time I have not managed to get step 2 working... I
struggled with different functions ('high-level' send/recv_entities,
send_recv_entities, 'low-level' pack/unpack_entities), but without
getting it to work...
For the global number of entities, I think that 2 billion is ok for a
couple of years, but we plan to do more... What should be done in this case?
Many thanks,
Olivier
On 14/06/2021 16:11, Grindeanu, Iulian R. wrote:
> Yes Olivier,
> Initially you have one distribution of the entities, after a parallel read, or creation of the mesh in parallel, for example as in the GenLargeMesh example;
>
> In order to "resolve" the parallel model, we have to call the "resolve_shared_ents" method, either as part of the read, or after creation of the mesh in memory;
>
> In this initial stage, the mesh is distributed by entities, for example by using the PARALLEL_PARTITION tag; vertices are already duplicated, because the cell definitions need them; "resolve_shared_ents" will simply identify which vertices are shared between processes, and will set the parallel tags on them (and on lower-dimensional entities if they exist); those tags identify, for example, the remote entity handle for the shared vertex on the other processes;
>
> If you need ghost layers, at this initial stage, another round of communications is needed between processes, to create those ghost layers on each process; all those ghost layers need to have their parallel tags correctly defined for a correct "state" of the mesh.
>
> After this initial state, you decide to move a few elements, from process A to process B, for example; Or maybe you need multiple moves;
> Some of these elements were maybe ghosts already, so maybe you do not have to move them, maybe you already have them instanced on the new processes; Keeping track of everything will be error prone;
>
> This is why I suggest starting from a clean state; remove all the parallel tags, and all ghost elements; do the migration from A to B, which means pack elements from A, delete what is not needed on A anymore, send to B using MPI, unpack on B; Then recompute all the parallel tags, and create new ghost layers if needed;
>
>
> parallel tags are complicated because some entities can be multishared (ghost cells can be multishared too). Simply shared info is stored in dense tags, multishared info is stored in sparse tags. We also encountered problems when partitions are "thin": if you need multiple ghost layers, they may encroach on neighboring partitions that were not "neighbors" initially;
>
>
> Another question is about how many global entities do you have? GenLargeMesh assumes you can create in memory more than 2 billion vertices, in which case using the GLOBAL_ID tag for tracking globally is not enough; In that case, in order to track vertices correctly between processes that share them, we have to use another tag that can handle 2^64 vertices :)
>
> I am thinking that I will write an example describing this process; we may need to change the API, for example to offer a method to clean all parallel tags in memory.
>
> Iulian
>
>
> ________________________________________
> From: Olivier Jamond <olivier.jamond at cea.fr>
> Sent: Monday, June 14, 2021 2:38 AM
> To: Grindeanu, Iulian R.; Vijay S. Mahadevan
> Cc: moab-dev at mcs.anl.gov
> Subject: Re: [MOAB-dev] Help for migration of cells between MPI subdomains with moab
>
> Hello Iulian,
>
> I am not sure I understand perfectly what you mean: I feel that my
> problem is not just about the tags for the parallel status, but I also
> need to move entities between processes in the first place. What I have
> in mind, as you suggest, is that after migrating the entities between
> processes, I would have to call 'resolve_shared_ents' to update the
> parallel status tags (assuming that the vertices global ids have been
> preserved during migration...). Does this make sense?
>
> Yes, in the parts of the numerical model where a numerical method
> involving stencils (such as finite-volumes) is used, the ghost layers
> will have to be recomputed at least "around" the cells that have been
> migrated.
>
> Thanks a lot for helping!
>
> Best regards,
> Olivier
>
> On 12/06/2021 02:37, Grindeanu, Iulian R. wrote:
>> Hello Olivier,
>> Now thinking more about your scenario, my suggestion is to recompute the parallel tags at every stage (rebalancing of the mesh)
>>
>> Adding, resetting, and updating the parallel tags is an error-prone process; it would be better to start from a clean state at every rebalancing, even if you move just a few elements from some processes to others.
>> Will you have to recompute ghost layers, etc ?
>>
>> Everything will happen in memory, and computing the parallel tags is done with "resolve shared entities" type methods, that happen after a parallel distributed read, for example, or when migrating a full mesh to a different set of processes.
>>
>> The parallel info is stored in many tags, some are sparse, some are dense, and setting and resetting those for a limited set of cells/vertices will not be much faster than just recomputing all
>>
>> It is important to keep global ids tags consistent on vertices, and on cells to keep track of them; partitions are cell-based partitions, and vertices and lower dim entities will be shared.
>>
>> we can discuss more
>> Iulian
>>
>>
>> ________________________________________
>> From: Olivier Jamond <olivier.jamond at cea.fr>
>> Sent: Friday, June 11, 2021 12:01 PM
>> To: Vijay S. Mahadevan
>> Cc: Grindeanu, Iulian R.; moab-dev at mcs.anl.gov
>> Subject: Re: [MOAB-dev] Help for migration of cells between MPI subdomains with moab
>>
>>> We can read in MOAB mesh files using different approaches in parallel.
>>> But my understanding is that you need to move entities that are
>>> in memory, right?
>> Yes exactly!
>>> Understood. The iMOAB interface implementation that I mentioned
>>> earlier is targeted at moving the entire mesh topology from one set of
>>> processes to another. Not just a subset - though it very well could be
>>> based on a provided list of entities or a meshset containing the
>>> subset. You can look at iMOAB_SendMesh/iMOAB_ReceiveMesh [1] for an
>>> implementation to send mesh pieces between processes. There may be
>>> enough logic there that can be extracted and exposed through a new API
>>> in ParallelComm if needed. Here is a sample routine [2] that sends the
>>> local part of the mesh to a target process. These may not be directly
>>> usable for your use-case without buying fully into the iMOAB
>>> interface, but hopefully they give you an example to move forward.
>>>
>>> Do let us know if you have further questions. We may have better
>>> suggestions after looking at your example.
>>>
>>> Vijay
>>>
>>> [1] https://ftp.mcs.anl.gov/pub/fathom/moab-docs/iMOAB_8cpp_source.html
>>> [2] https://ftp.mcs.anl.gov/pub/fathom/moab-docs/classmoab_1_1ParCommGraph.html#ae13a1f5915695ffd602274b635644125
>> I will have a look at these two functions that I haven't spotted yet!
>>
>> Thanks again, we'll keep in touch!
>> Olivier
>>
>>
>>> On Fri, Jun 11, 2021 at 11:57 AM Olivier Jamond <olivier.jamond at cea.fr> wrote:
>>>> Hi Vijay and Iulian,
>>>>
>>>> Thanks for your answers. I attach my simple toy code to this mail.
>>>> Please tell me if you get it this time!
>>>>
>>>> I just looked at the iMesh interface. Do you carry out migration through
>>>> files on disk with the write/load_mesh functions?
>>>>
>>>> In some of our cases, we plan to do quite 'high frequency' load
>>>> balancing, mainly for fast transient explicit dynamic simulations. In
>>>> this plan, we balance the mesh quite often (using Zoltan also), but at
>>>> each rebalancing, the set of cells which have to be migrated is a small
>>>> part of the cells involved in the calculation.
>>>>
>>>> We do not use the iMesh/ITAPS interface, so no problem with that! In
>>>> fact we wrap in our 'Mesh' class a moab::Core instance and a
>>>> moab::ParallelComm instance.
>>>>
>>>> Thanks a lot for your help,
>>>> Best regards,
>>>> Olivier
>>>>
>>>> On 11/06/2021 17:19, Vijay S. Mahadevan wrote:
>>>>> Hi Olivier,
>>>>>
>>>>> As Iulian mentioned, there is a lot of infrastructure written for our
>>>>> Climate workflows to migrate a mesh from a process set A to a process
>>>>> set B, where A and B can have overlaps or can be completely disjoint.
>>>>> And during the process of migration, we can do repartitioning as well
>>>>> since typically n(A) != n(B). Currently, this functionality is
>>>>> available and heavily tested with the iMOAB interface, as it is
>>>>> consumed by a Fortran90-based code. However, if you are interested in this for
>>>>> arbitrary meshes, we can work with you to update that interface more
>>>>> generally to support all dimensions. Note that this iMOAB migration
>>>>> implementation currently works for all unstructured meshes on a sphere
>>>>> (so the dimension is hard-coded in places).
>>>>>
>>>>> I suggest you not use the iMesh/ITAPS framework as this is a legacy
>>>>> interface and we are not too keen on supporting it going forward.
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>>
>>>>> On Fri, Jun 11, 2021 at 10:52 AM Grindeanu, Iulian R.
>>>>> <iulian at mcs.anl.gov> wrote:
>>>>>> Hello Olivier,
>>>>>> Migration of meshes with ParallelComm methods is not a well-tested piece of code; it is rather old;
>>>>>> Your attachment did not go through, can you send it again to me directly?
>>>>>>
>>>>>> I would be happy to assist; in principle, the data needs to be packed, then sent/received using MPI, then unpacked;
>>>>>> We do use the packing and unpacking quite extensively in currently tested code; we now prefer the migration of full meshes from one set of processes to another set, while repartitioning on the fly using Zoltan-based methods (more exactly, using the newer iMOAB interface)
>>>>>>
>>>>>> I am pretty sure ParallelComm methods will work with minor fixes.
>>>>>>
>>>>>> We also support an iMesh / ITAPS based framework, which could assist in the migration of elements, but that is just another layer / API; the heavy-duty work is still performed by ParallelComm
>>>>>>
>>>>>> Thanks,
>>>>>> Iulian
>>>>>>
>>>>>> ________________________________________
>>>>>> From: moab-dev <moab-dev-bounces at mcs.anl.gov> on behalf of Olivier Jamond <olivier.jamond at cea.fr>
>>>>>> Sent: Thursday, June 10, 2021 7:32 AM
>>>>>> To: moab-dev at mcs.anl.gov
>>>>>> Subject: [MOAB-dev] Help for migration of cells between MPI subdomains with moab
>>>>>>
>>>>>> Dear Moab developers,
>>>>>>
>>>>>> I am working for the french CEA
>>>>>> (https://en.wikipedia.org/wiki/French_Alternative_Energies_and_Atomic_Energy_Commission)
>>>>>> on the development of a new software dedicated to HPC simulations of
>>>>>> various mechanical scenarios (structures, fluid, FSI, other couplings,
>>>>>> ...) using mesh-based methods (finite-elements, finite-volumes,
>>>>>> hybrid-discontinuous-galerkin, ...).
>>>>>>
>>>>>> This new code uses moab to manage the topological connections between
>>>>>> geometrical entities in a distributed memory context.
>>>>>>
>>>>>> For several reasons, especially for dynamic load balancing, we will need
>>>>>> to be able to migrate cells between MPI processes during calculations.
>>>>>> But at this time I am quite stuck on this with moab... So I wonder if
>>>>>> maybe you could help me with that...
>>>>>>
>>>>>> I struggled for some time with the moab::ParallelComm class to try to
>>>>>> migrate a cell from one process to another, but without success. I
>>>>>> attached to this email a very simple toy piece of code which illustrates
>>>>>> that. In this simple program, I construct a basic mesh with 2 quads on
>>>>>> the proc0, and I would like to migrate one of these quads to proc1, and
>>>>>> then migrate it back to proc0. As I wrote in some comments in this code,
>>>>>> I tried using different functions ('high-level' send/recv_entities,
>>>>>> send_recv_entities, 'low-level' pack/unpack_entities), but without
>>>>>> success yet...
>>>>>>
>>>>>> I would really appreciate it if you could help me with that and maybe take
>>>>>> a look at the attached simple code.
>>>>>>
>>>>>> Best regards,
>>>>>> Olivier Jamond
>>>>>>
>>>>>>