itaps-parallel reasons against part=mesh instance

Thu Apr 22 09:39:26 CDT 2010

Mark Beall wrote:
> I don't see how having part=mesh instance and having more than one part 
> per process implies multiple threads. Given the current interface, that 
> would seem to be true since the mesh instance is passed in as the 
> "context" of all the collective calls since it basically means "all of 
> the mesh on this process", it would just be a matter of having something 
> else that means the same thing. 

Well, that implies that iMeshP interacts with multiple iMesh instances, right?  That's substantially different from 
the way both FMDB and MOAB have been designed up to this point.  It also implies that iMesh instances are thread safe, 
which is by no means guaranteed (though MOAB's is).

The deeper issue here, though, is that you don't want to have multiple copies of entities local to a process, both for 
memory and for logistical reasons.

Memory-wise, the amount of memory per core is decreasing or holding steady; Blue Gene/P has, and Blue Gene/Q will have, 
around 512 MB/core.  Memory accounts for over half the cost of these large machines, I think, so increasing it isn't a 
low-cost solution.
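
To put rough numbers on the memory point, here is a minimal Python sketch.  The 512 MB/core budget is the figure quoted 
above; the entity counts, the 200 bytes per entity, and the function name are assumptions chosen for illustration, not 
measurements from MOAB or FMDB.

```python
MB = 1024 * 1024
CORE_BUDGET_BYTES = 512 * MB  # per-core memory figure quoted above


def entity_memory_bytes(unique_entities, shared_entities, copies_per_shared,
                        bytes_per_entity):
    """Bytes used when each shared entity is stored `copies_per_shared` times."""
    stored = unique_entities + shared_entities * copies_per_shared
    return stored * bytes_per_entity


if __name__ == "__main__":
    # All values below are assumed, for illustration only.
    for copies in (1, 2):
        used = entity_memory_bytes(800_000, 200_000, copies, 200)
        print(f"shared entities stored {copies}x: "
              f"{used / MB:.1f} MB of {CORE_BUDGET_BYTES // MB} MB")
```

With these assumed numbers, storing the shared entities twice costs 20% more memory for the same mesh.  The actual 
penalty depends on how many entities are shared between parts on the same process.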

Logistically, we've designed iMeshP such that each partition has a one-to-one association with an MPI communicator.  
There are multiple situations where one wants to communicate with different groups of processes from the same instance.  
I've pointed out the radiation transport application already; I can name a few others if you're interested.  Sure, we 
can make wholesale changes to iMeshP based on different assumptions, but then there are a few other decisions I'd like 
to revisit.
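
The arrangement described above can be sketched roughly as follows.  This is not iMeshP code: the class names, the 
`comm_ranks` field (a stand-in for an MPI communicator's process group), and the helper function are all hypothetical.  
The point is only that one instance stores each entity once while several partitions, each tied to its own 
communicator, refer to those entities by id.

```python
from dataclasses import dataclass, field


@dataclass
class MeshInstance:
    """Stand-in for one iMesh instance: each entity is stored exactly once."""
    entities: dict = field(default_factory=dict)  # entity id -> entity data


@dataclass(frozen=True)
class Partition:
    """Stand-in for a partition: bound to exactly one communicator."""
    comm_ranks: frozenset  # placeholder for an MPI communicator's process group
    local_parts: tuple     # parts on this process, each a tuple of entity ids


def local_entity_ids(partition):
    """All entity ids referenced by the partition's parts on this process."""
    return {eid for part in partition.local_parts for eid in part}


if __name__ == "__main__":
    mesh = MeshInstance({eid: f"element {eid}" for eid in range(8)})
    # Two partitions of the same instance, each over a different process group.
    all_ranks = Partition(frozenset(range(4)), ((0, 1, 2, 3), (4, 5, 6, 7)))
    subgroup = Partition(frozenset({0, 1}), ((0, 1, 2, 3, 4, 5, 6, 7),))
    # Both partitions refer to the same stored entities; nothing is copied.
    assert local_entity_ids(all_ranks) == local_entity_ids(subgroup)
    print(len(mesh.entities), "entities stored once, used by 2 partitions")
```

Note that the first partition in the example has two parts on the process, which is the case part=mesh instance would 
force into two separate instances.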

> Along the same lines, since an
> iMeshP_PartitionHandle is associated with a single iMesh_Instance why is 
> it even necessary to pass both into those functions, the mesh instance 
> is redundant since there is only one that can be the correct one, right?
> 

We have not assumed that a partition knows about the instance.  That's kind of like STL container iterators not knowing 
about the container itself.  If we had language-specific wrappers, the iMesh instance would likely be the class in C++; 
it's that way in the Python interface we've developed at UW.
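
A minimal sketch of that distinction, using hypothetical names throughout (none of these are actual iMesh or iMeshP 
functions, and this is not the UW Python interface): the partition handle is a bare key that means nothing without its 
instance, so a C-style call takes both, while an object-style wrapper makes the instance the class and the extra 
argument disappears.

```python
class MeshInstance:
    """Stand-in for an iMesh instance; it owns the partition data."""

    def __init__(self):
        self._partitions = {}   # handle -> list of parts
        self._next_handle = 1

    def create_partition(self, parts):
        handle = self._next_handle      # the handle is a bare integer
        self._next_handle += 1
        self._partitions[handle] = list(parts)
        return handle


def get_num_parts(instance, partition_handle):
    """C-style call: the handle alone is meaningless, so pass the instance too."""
    return len(instance._partitions[partition_handle])


class Mesh:
    """Object-style wrapper: the instance becomes the class."""

    def __init__(self):
        self._instance = MeshInstance()

    def create_partition(self, parts):
        return self._instance.create_partition(parts)

    def num_parts(self, partition_handle):
        return get_num_parts(self._instance, partition_handle)


if __name__ == "__main__":
    mesh = Mesh()
    handle = mesh.create_partition([[0, 1], [2, 3], [4]])
    print(mesh.num_parts(handle))
```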

- tim

> mark
> 
> 
> On Apr 21, 2010, at 12:03 PM, Tim Tautges wrote:
> 
>> Some of the following argument depends on how partitions are handled.  
>> However, assuming the partition is also associated with a mesh 
>> instance (with the partition in each mesh instance coordinated with 
>> those on other instances over the parallel job)... using one part per 
>> instance, and multiple instances on a given process, implies multiple 
>> threads of control, since many of the iMeshP functions are collective 
>> calls.  Single threads of control are far and away the most common 
>> mode for running codes at scale right now, and I assert will continue 
>> to be for some time (5-10 yrs).
>>
>> Also, for the mode where an application is providing an iMeshP 
>> implementation on top of its data structure so it can use services 
>> implemented on iMeshP, I think the restriction of one part per 
>> instance means that these apps will always restrict themselves to one 
>> part per process.  I think your application is of this type.  So, I'd 
>> much rather have this be a restriction of your application than a 
>> behavior bound at the interface level.  In fact, the latter almost 
>> guarantees that MOAB will only support one part per process, since 
>> that will cover 99% of the use cases.
>>
>> I think this goes back again to the runtime notion of a part being 
>> confused with the unit of partitioning that's stored with / associated 
>> to the mesh.  Maybe that's just a narrow view, though, since I'm 
>> pretty sure Mark S. disagrees with that.
>>
>> - tim
>>
>> Mark Beall wrote:
>>> All,
>>> I was thinking over the call we had on Monday, specifically about 
>>> what arguments were made against part=mesh instance. The only really 
>>> compelling argument I recall (and sorry if I don't remember others, 
>>> that's why I'm writing this email) was Tim's example of the overhead 
>>> in partitioning a mesh into 100,000 partitions with 8 elements each.
>>> Well, it kind of struck me that Tim's example, while relevant in 
>>> terms of the percentage of overhead, isn't really that relevant in 
>>> terms of total memory. The initial mesh there would be 800,000 
>>> elements, maybe a few hundred MB. Even with much more than 100% 
>>> overhead, I could easily do that on my laptop. Given that I can buy a 
>>> computer with 96 GB of memory today for about $8000 (192 GB for 
>>> $17000) (a Dell 7500 with 3rd party memory, in case you're curious), 
>>> you could add a couple zeros to the number of partitions for that 
>>> mesh before it should become an issue for someone that will be 
>>> running that simulation on a supercomputer costing a few hundred 
>>> million dollars.
>>> What were the other compelling arguments against part=mesh instance?
>>> mark
>>
>> -- 
>> ================================================================
>> "You will keep in perfect peace him whose mind is
>>  steadfast, because he trusts in you."               Isaiah 26:3
>>
>>             Tim Tautges            Argonne National Laboratory
>>         (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
>>         phone: (608) 263-8485      1500 Engineering Dr.
>>           fax: (608) 263-4499      Madison, WI 53706
>>
> 
> 
