[MPICH] Availability of the Driller library
Darius Buntinas
buntinas at mcs.anl.gov
Tue Sep 25 11:31:48 CDT 2007
Your explanation makes sense, but what I forgot to say in my last email
is that I would like to avoid overriding the default memory allocators.
Instead, I would like to remap sections of memory on demand, e.g., for
each MPI_Send operation.
Overriding malloc, mmap, brk, and sbrk works fine for most codes, but
there are always a few for which it doesn't, and I'm just thinking about
how to handle those.
Thanks,
-d
On 09/24/2007 08:33 PM, Jean-Marc Saffroy wrote:
> On Mon, 24 Sep 2007, Darius Buntinas wrote:
>
>>>> Is there a way to split a vma and share only part of it? That would
>>>> be interesting as well.
>>>
>>> Hmmm that would be possible, but the cost of sharing part or all of a
>>> given memory region is roughly the same, so why would you want to do
>>> this?
>>
>> Well, I was thinking that in MPI, when a call to, say, MPI_Send is
>> made, the process is not allowed to access the buffer being sent until
>> the call returns. So, if the remapping were done inside MPI_Send, we
>> wouldn't have to worry about other threads modifying the data. That
>> assumes that the rest of the segment (the part not being remapped)
>> would not have to be copied.
>
> Segment copies only occur at initialization time, for all existing
> segments that can possibly be shared. Later on, when new segments are
> requested by the application (through malloc, which calls overloaded
> versions of sbrk or mmap), they are created as memory mapped files. So
> at any time after initialization, the process should have most of its
> memory already inside files, and another process can mmap these files in
> its own address space when needed.
>
> When a process calls send (resp. recv), and the buffer is in a mapped
> file, then the receiving (resp. sending) process can use the API to
> retrieve the file descriptor and map it, and then do the recv (resp.
> send) with a single memcpy. The file descriptor and memory mapping can
> (and should, for performance) be cached by the receiving process until
> further notice from the owner process (eg. until free calls munmap which
> destroys the mapping).
>
> Now if several threads want to send buffers that lie inside the same
> segment, then there is no need to split this segment or remap only parts
> of it in receiving processes. The segment can be mapped once entirely in
> the process, and all threads only need to care about their own data
> inside it.
>
> Supporting multiple threads will require other kinds of precautions:
> - Driller initialization in a multithreaded process will be
> challenging, because a segment must not be written to between the time
> it is copied to a file and the time that file is mapped
> - global structures (process map tree, map cache tree) need mutual
> exclusion
> - threads should use different sockets to exchange with the fdproxy, or
> mutual exclusion should be used
> - dlmalloc has locking but, according to its own comments, it is not
> very efficient; using a more thread-friendly allocator (such as hoard?)
> would be an option
> ... and possibly other tricks.
>
>
> I hope this makes things a bit clearer now.
>