[itaps-parallel] reasons against part=mesh instance
Mark Beall
mbeall at simmetrix.com
Thu Apr 22 11:47:09 CDT 2010
On Apr 22, 2010, at 10:39 AM, Tim Tautges wrote:
>
>
> Mark Beall wrote:
>> I don't see how having part=mesh instance and having more than one
>> part per process implies multiple threads. Given the current
>> interface, that might seem to be true, since the mesh instance is
>> passed in as the "context" of all the collective calls and basically
>> means "all of the mesh on this process"; but it would just be a
>> matter of having something else that means the same thing.
>
> Well, that implies then that iMeshP interacts with multiple iMesh
> instances, right? That's substantially different from the way both
> FMDB and MOAB have been designed up to this point.
My understanding was that FMDB is similar to our implementation, but
that understanding could be wrong.
> It also implies that iMesh instances are thread safe, which isn't a
> guaranteed thing by any means (though MOAB is).
Actually, it's exactly the opposite: it allows you to use multiple
threads without the iMesh instances being thread safe, since each
thread only ever touches its own instance.
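To make that concrete, here is a minimal sketch of what I mean (plain
pthreads; the Instance struct is just a stand-in for an
iMesh_Instance, not real API). Each thread owns exactly one instance,
so nothing inside the implementation ever needs a lock:

    #include <pthread.h>
    #include <stdio.h>

    /* Stand-in for an iMesh_Instance; invented for illustration. */
    typedef struct Instance {
        int  part_id;
        long num_elements;
    } Instance;

    static void *work_on_part(void *arg)
    {
        Instance *inst = arg;  /* this thread's private instance */
        /* All mesh calls here would use only 'inst'; no state is
         * shared between threads, so no internal locking is needed. */
        printf("part %d: %ld elements\n", inst->part_id,
               inst->num_elements);
        return NULL;
    }

    int main(void)
    {
        Instance parts[2] = { {0, 100000}, {1, 100000} };
        pthread_t tid[2];
        for (int i = 0; i < 2; ++i)
            pthread_create(&tid[i], NULL, work_on_part, &parts[i]);
        for (int i = 0; i < 2; ++i)
            pthread_join(tid[i], NULL);
        return 0;
    }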
> The deeper issue here, though, is that you don't want to have
> multiple copies of entities local to a process, both for memory and
> for logistical reasons.
>
> Memory-wise, the amount of memory per core is decreasing or holding
> steady; BGP and Q have / will have around 512 MB/core. Memory
> accounts for over half the cost of these large machines, I think, so
> increasing that isn't a low-cost solution.
What would be a typical number of elements per core? If I'm at
100,000 elements per core, then I'm using a few tens of MB for the
mesh, and copying the boundary is going to be a tiny fraction of
that. If the partitions are smaller, the fraction spent on copying
the boundary goes up, but then the total mesh memory is itself a tiny
fraction of the memory available per core.
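To put rough numbers on that (the 300 bytes/element and 5% boundary
fraction below are assumptions for a linear tet mesh, not measured
figures):

    #include <stdio.h>

    int main(void)
    {
        const long   elems_per_core = 100000;  /* elements per core  */
        const double bytes_per_elem = 300.0;   /* assumed, w/ verts  */
        const double boundary_frac  = 0.05;    /* assumed: ~5% of
                                                  elements on part
                                                  boundaries         */
        double mesh_mb = elems_per_core * bytes_per_elem / 1e6;
        double copy_mb = mesh_mb * boundary_frac;
        printf("mesh:   %.1f MB per core\n", mesh_mb);  /* ~30 MB    */
        printf("copies: %.1f MB per core\n", copy_mb);  /* ~1.5 MB   */
        /* Both are well under the ~512 MB/core cited for BG/P & Q. */
        return 0;
    }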
> Logistically, we've designed iMeshP such that each partition has a
> 1-1 association with an MPI communicator. There are multiple
> situations where one wants to communicate with different groups of
> processes, from the same instance. I've pointed out the radiation
> transport application already; I can name a few others if you're
> interested. Sure, we can make wholesale changes to iMeshP based on
> different assumptions, but then there are a few other decisions I'd
> like to revisit.
I guess I don't see any reason that would have to change.
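For instance (sketch only; this assumes the draft's
iMeshP_createPartitionAll(instance, communicator, partition, err)
signature is roughly right), you could still create two partitions
over different communicators:

    #include <mpi.h>
    #include "iMeshP.h"

    void make_two_partitions(iMesh_Instance instance)
    {
        int err, rank;
        MPI_Comm sub;
        iMeshP_PartitionHandle whole, half;

        /* One partition over every process in the job... */
        iMeshP_createPartitionAll(instance, MPI_COMM_WORLD,
                                  &whole, &err);

        /* ...and a second over half of them, e.g. for a radiation
         * transport sweep that involves only some processes. */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &sub);
        iMeshP_createPartitionAll(instance, sub, &half, &err);

        /* Whatever stands in for 'instance' under part=mesh instance
         * could be passed here instead; the 1-1 association between
         * a partition and its communicator is untouched. */
    }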
>> Along the same lines, since an
>> iMeshP_PartitionHandle is associated with a single iMesh_Instance,
>> why is it even necessary to pass both into those functions? The
>> mesh instance is redundant, since there is only one that can be
>> the correct one, right?
>
> We have not assumed that a partition knows about the instance.
> That's kind of like STL container iterators not knowing about the
> container itself. If we had language-specific wrappers, the iMesh
> instance would likely be the class in C++; it's that way in the
> Python interface we've developed at UW.
What harm would there be in having the partition know about the
instance? It would make a cleaner interface and avoid a source of
errors from people passing in the wrong instance. Seems like a better
choice to me.
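As a hypothetical illustration (these names are invented, not a
proposal for the actual header), the partition handle could simply
carry its instance, so a collective call couldn't be handed a
mismatched pair:

    #include "iMesh.h"

    /* Invented for illustration: a partition that remembers the one
     * instance it belongs to. */
    typedef struct HypoPartition {
        iMesh_Instance instance;  /* the only instance that can be
                                     the correct one */
        /* ... the implementation's partition data ... */
    } HypoPartition;

    /* A sync call would then take just the partition and recover the
     * instance internally instead of trusting the caller. */
    void hypo_syncAll(HypoPartition *part, int *err)
    {
        iMesh_Instance inst = part->instance;  /* never the wrong one */
        (void)inst;  /* ... collective work over the partition ... */
        *err = 0;
    }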
> - tim
>
>> mark
>> On Apr 21, 2010, at 12:03 PM, Tim Tautges wrote:
>>> Some of the following argument depends on how partitions are
>>> handled. However, assuming the partition is also associated with
>>> a mesh instance (with the partition in each mesh instance
>>> coordinated with those on other instances over the parallel
>>> job)... using one part per instance, and multiple instances on a
>>> given process, implies multiple threads of control, since many of
>>> the iMeshP functions are collective calls. Single threads of
>>> control are far and away the most common mode for running codes at
>>> scale right now, and I assert will continue to be for some time
>>> (5-10 yrs).
>>>
>>> Also, for the mode where an application is providing an iMeshP
>>> implementation on top of its data structure so it can use services
>>> implemented on iMeshP, I think the restriction of one part per
>>> instance means that these apps will always restrict themselves to
>>> one part per process. I think your application is of this type.
>>> So, I'd much rather have this be a restriction of your application
>>> than a behavior bound at the interface level. In fact, the latter
>>> almost guarantees that MOAB will only support one part per
>>> process, since that will cover 99% of the use cases.
>>>
>>> I think this goes back again to the runtime notion of a part being
>>> confused with the unit of partitioning that's stored with /
>>> associated with the mesh. Maybe that's just a narrow view, though,
>>> since I'm pretty sure Mark S. disagrees with that.
>>>
>>> - tim
>>>
>>> Mark Beall wrote:
>>>> All,
>>>> I was thinking over the call we had on Monday, specifically about
>>>> what arguments were made against part=mesh instance. The only
>>>> really compelling argument I recall (and sorry if I don't
>>>> remember others; that's why I'm writing this email) was Tim's
>>>> example of the overhead in partitioning a mesh into 100,000
>>>> partitions with 8 elements each.
>>>> Well, it kind of struck me that Tim's example, while relevant in
>>>> terms of the percentage of overhead, isn't really that relevant in
>>>> terms of total memory. The initial mesh there would be 800,000
>>>> elements, maybe a few hundred MB. Even with much more than 100%
>>>> overhead, I could easily do that on my laptop. Given that I can
>>>> buy a computer with 96 GB of memory today for about $8000 (192 GB
>>>> for $17000) (a Dell 7500 with 3rd-party memory, in case you're
>>>> curious), you could add a couple of zeros to the number of
>>>> partitions for that mesh before it should become an issue for
>>>> someone who will be running that simulation on a supercomputer
>>>> costing a few hundred million dollars.
>>>> What were the other compelling arguments against part=mesh
>>>> instance?
>>>> mark
>>>
>
> --
> ================================================================
> "You will keep in perfect peace him whose mind is
> steadfast, because he trusts in you." Isaiah 26:3
>
> Tim Tautges Argonne National Laboratory
> (tautges at mcs.anl.gov) (telecommuting from UW-Madison)
> phone: (608) 263-8485 1500 Engineering Dr.
> fax: (608) 263-4499 Madison, WI 53706
>