[itaps-parallel] reasons against part=mesh instance
Mark Beall
mbeall at simmetrix.com
Thu Apr 22 11:47:09 CDT 2010
On Apr 22, 2010, at 10:39 AM, Tim Tautges wrote:
>
>
> Mark Beall wrote:
>> I don't see how having part=mesh instance and having more than one
>> part per process implies multiple threads. Given the current
>> interface, that might seem to be true, since the mesh instance is
>> passed in as the "context" of all the collective calls and basically
>> means "all of the mesh on this process"; but it would just be a
>> matter of having something else that means the same thing.
>
> Well, that implies then that iMeshP interacts with multiple iMesh
> instances, right? That's substantially different from the way both
> FMDB and MOAB have been designed up to this point.
My understanding was that FMDB is similar to our implementation, but
that understanding could be wrong.
> It also implies that iMesh instances are thread safe, which isn't a
> guaranteed thing by any means (though MOAB is).
Actually, it's exactly the opposite: it allows you to use multiple
threads without the iMesh instances being thread safe, since each
thread only ever touches its own instance.
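To make that concrete, here is a minimal sketch of what I mean (plain
pthreads; the Instance struct is just a stand-in for an
iMesh_Instance, not real API). Each thread owns exactly one instance,
so nothing inside the implementation ever needs a lock:

    #include <pthread.h>
    #include <stdio.h>

    /* Stand-in for an iMesh_Instance; invented for illustration. */
    typedef struct Instance {
        int  part_id;
        long num_elements;
    } Instance;

    static void *work_on_part(void *arg)
    {
        Instance *inst = arg;  /* this thread's private instance */
        /* All mesh calls here would use only 'inst'; no state is
         * shared between threads, so no internal locking is needed. */
        printf("part %d: %ld elements\n", inst->part_id,
               inst->num_elements);
        return NULL;
    }

    int main(void)
    {
        Instance parts[2] = { {0, 100000}, {1, 100000} };
        pthread_t tid[2];
        for (int i = 0; i < 2; ++i)
            pthread_create(&tid[i], NULL, work_on_part, &parts[i]);
        for (int i = 0; i < 2; ++i)
            pthread_join(tid[i], NULL);
        return 0;
    }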
> The deeper issue here, though, is that you don't want to have
> multiple copies of entities local to a process, both for memory and
> for logistical reasons.
>
> Memory-wise, the amount of memory per core is decreasing or holding
> steady; BGP and Q have / will have around 512 MB/core. Memory
> accounts for over half the cost of these large machines, I think, so
> increasing that isn't a low-cost solution.
What would be a typical number of elements per core? If I'm at
100,000 elements per core, then I'm using a few tens of MB for the
mesh, and copying the boundary is going to be a tiny fraction of
that. If the partitions are smaller, the fraction spent on copying
the boundary goes up, but then the total mesh memory is itself a tiny
fraction of the memory available per core.
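To put rough numbers on that (the 300 bytes/element and 5% boundary
fraction below are assumptions for a linear tet mesh, not measured
figures):

    #include <stdio.h>

    int main(void)
    {
        const long   elems_per_core = 100000;  /* elements per core  */
        const double bytes_per_elem = 300.0;   /* assumed, w/ verts  */
        const double boundary_frac  = 0.05;    /* assumed: ~5% of
                                                  elements on part
                                                  boundaries         */
        double mesh_mb = elems_per_core * bytes_per_elem / 1e6;
        double copy_mb = mesh_mb * boundary_frac;
        printf("mesh:   %.1f MB per core\n", mesh_mb);  /* ~30 MB    */
        printf("copies: %.1f MB per core\n", copy_mb);  /* ~1.5 MB   */
        /* Both are well under the ~512 MB/core cited for BG/P & Q. */
        return 0;
    }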
> Logistically, we've designed iMeshP such that each partition has a
> 1-1 association with an MPI communicator. There are multiple
> situations where one wants to communicate with different groups of
> processes, from the same instance. I've pointed out the radiation
> transport application already; I can name a few others if you're
> interested. Sure, we can make wholesale changes to iMeshP based on
> different assumptions, but then there are a few other decisions I'd
> like to revisit.
I guess I don't see any reason that would have to change.
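For instance (sketch only; this assumes the draft's
iMeshP_createPartitionAll(instance, communicator, partition, err)
signature is roughly right), you could still create two partitions
over different communicators:

    #include <mpi.h>
    #include "iMeshP.h"

    void make_two_partitions(iMesh_Instance instance)
    {
        int err, rank;
        MPI_Comm sub;
        iMeshP_PartitionHandle whole, half;

        /* One partition over every process in the job... */
        iMeshP_createPartitionAll(instance, MPI_COMM_WORLD,
                                  &whole, &err);

        /* ...and a second over half of them, e.g. for a radiation
         * transport sweep that involves only some processes. */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &sub);
        iMeshP_createPartitionAll(instance, sub, &half, &err);

        /* Whatever stands in for 'instance' under part=mesh instance
         * could be passed here instead; the 1-1 association between
         * a partition and its communicator is untouched. */
    }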
>> Along the same lines, since an
>> iMeshP_PartitionHandle is associated with a single iMesh_Instance,
>> why is it even necessary to pass both into those functions? The
>> mesh instance is redundant, since there is only one that can be
>> the correct one, right?
>
> We have not assumed that a partition knows about the instance.
> That's kind of like STL container iterators not knowing about the
> container itself. If we had language-specific wrappers, the iMesh
> instance would likely be the class in C++; it's that way in the
> Python interface we've developed at UW.
What harm would there be in having the partition know about the
instance? It would make a cleaner interface and avoid a source of
errors from people passing in the wrong instance. Seems like a better
choice to me.
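As a hypothetical illustration (these names are invented, not a
proposal for the actual header), the partition handle could simply
carry its instance, so a collective call couldn't be handed a
mismatched pair:

    #include "iMesh.h"

    /* Invented for illustration: a partition that remembers the one
     * instance it belongs to. */
    typedef struct HypoPartition {
        iMesh_Instance instance;  /* the only instance that can be
                                     the correct one */
        /* ... the implementation's partition data ... */
    } HypoPartition;

    /* A sync call would then take just the partition and recover the
     * instance internally instead of trusting the caller. */
    void hypo_syncAll(HypoPartition *part, int *err)
    {
        iMesh_Instance inst = part->instance;  /* never the wrong one */
        (void)inst;  /* ... collective work over the partition ... */
        *err = 0;
    }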
> - tim
>
>> mark
>> On Apr 21, 2010, at 12:03 PM, Tim Tautges wrote:
>>> Some of the following argument depends on how partitions are
>>> handled. However, assuming the partition is also associated with
>>> a mesh instance (with the partition in each mesh instance
>>> coordinated with those on other instances over the parallel
>>> job)... using one part per instance, and multiple instances on a
>>> given process, implies multiple threads of control, since many of
>>> the iMeshP functions are collective calls. Single threads of
>>> control are far and away the most common mode for running codes at
>>> scale right now, and I assert will continue to be for some time
>>> (5-10 yrs).
>>>
>>> Also, for the mode where an application is providing an iMeshP
>>> implementation on top of its data structure so it can use services
>>> implemented on iMeshP, I think the restriction of one part per
>>> instance means that these apps will always restrict themselves to
>>> one part per process. I think your application is of this type.
>>> So, I'd much rather have this be a restriction of your application
>>> than a behavior bound at the interface level. In fact, the latter
>>> almost guarantees that MOAB will only support one part per
>>> process, since that will cover 99% of the use cases.
>>>
>>> I think this goes back again to the runtime notion of a part being
>>> confused with the unit of partitioning that's stored with /
>>> associated with the mesh. Maybe that's just a narrow view, though,
>>> since I'm pretty sure Mark S. disagrees with that.
>>>
>>> - tim
>>>
>>> Mark Beall wrote:
>>>> All,
>>>> I was thinking over the call we had on Monday, specifically about
>>>> what arguments were made against part=mesh instance. The only
>>>> really compelling argument I recall (and sorry if I don't
>>>> remember others; that's why I'm writing this email) was Tim's
>>>> example of the overhead in partitioning a mesh into 100,000
>>>> partitions with 8 elements each.
>>>> Well, it kind of struck me that Tim's example, while relevant in
>>>> terms of the percentage of overhead, isn't really that relevant in
>>>> terms of total memory. The initial mesh there would be 800,000
>>>> elements, maybe a few hundred MB. Even with much more than 100%
>>>> overhead, I could easily do that on my laptop. Given that I can
>>>> buy a computer with 96 GB of memory today for about $8000 (192 GB
>>>> for $17000) (a Dell 7500 with 3rd-party memory, in case you're
>>>> curious), you could add a couple of zeros to the number of
>>>> partitions for that mesh before it should become an issue for
>>>> someone who will be running that simulation on a supercomputer
>>>> costing a few hundred million dollars.
>>>> What were the other compelling arguments against part=mesh
>>>> instance?
>>>> mark
>>>
>
> --
> ================================================================
> "You will keep in perfect peace him whose mind is
> steadfast, because he trusts in you." Isaiah 26:3
>
> Tim Tautges Argonne National Laboratory
> (tautges at mcs.anl.gov) (telecommuting from UW-Madison)
> phone: (608) 263-8485 1500 Engineering Dr.
> fax: (608) 263-4499 Madison, WI 53706
>