[MPICH] Availability of the Driller library

Jean-Marc Saffroy saffroy at gmail.com
Mon Sep 24 20:33:34 CDT 2007


On Mon, 24 Sep 2007, Darius Buntinas wrote:

>>> Is there a way to split a vma and share only part of it?  That would 
>>> be interesting as well.
>> 
>> Hmmm that would be possible, but the cost of sharing part or all of a 
>> given memory region is roughly the same, so why would you want to do 
>> this?
>
> Well I was thinking that in MPI, when a call to, say MPI_Send is made, 
> the process is not allowed to access the buffer being sent until the 
> call returns. So, I was thinking that if the remapping were done in 
> MPI_Send, then we wouldn't have to worry about other threads modifying 
> the data. That assumes that the rest of the segment (not being 
> remapped) would not have to be copied.

Segment copies only occur at initialization time, for all existing 
segments that can possibly be shared. Later on, when new segments are 
requested by the application (through malloc, which calls overloaded 
versions of sbrk or mmap), they are created as memory-mapped files. So at 
any time after initialization, the process should have most of its memory 
already inside files, and another process can mmap these files in its own 
address space when needed.
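
A minimal sketch of the file-backed segment idea, in Python for brevity. It
is not Driller's code: the function names and the use of a temporary file
are assumptions made for illustration, and both mappings live in one
process here, standing in for the owner and a peer process.

```python
import mmap
import os
import tempfile


def create_file_backed_segment(size):
    """Create a temporary file of `size` bytes and map it shared, read/write."""
    fd, path = tempfile.mkstemp(prefix="segment-")
    os.ftruncate(fd, size)
    # On Unix, mmap.mmap defaults to MAP_SHARED with read/write protection.
    segment = mmap.mmap(fd, size)
    return fd, path, segment


def map_peer_segment(path, size):
    """What a peer would do: open the same file and map it read-only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(fd, size, prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def demo_shared_segment():
    fd, path, owner = create_file_backed_segment(mmap.PAGESIZE)
    try:
        owner[0:5] = b"hello"  # the owner writes into its segment
        peer = map_peer_segment(path, mmap.PAGESIZE)
        data = peer[0:5]       # the peer sees the data through its own mapping
        peer.close()
        return data
    finally:
        owner.close()
        os.close(fd)
        os.unlink(path)
```

Calling demo_shared_segment() returns the bytes written by the owner, read
back through the second mapping.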

When a process calls send (resp. recv), and the buffer is in a mapped 
file, then the receiving (resp. sending) process can use the API to 
retrieve the file descriptor and map it, and then do the recv (resp. send) 
with a single memcpy. The file descriptor and memory mapping can (and 
should, for performance) be cached by the receiving process until further 
notice from the owner process (e.g. until free calls munmap, which destroys 
the mapping).
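
The caching described above could look roughly like the following sketch.
The names MappingCache, recv_into and invalidate are hypothetical, and a
file path stands in for the file descriptor that would really be retrieved
through the API. The slice assignment plays the role of the memcpy.

```python
import mmap
import os
import tempfile


class MappingCache:
    """Keeps one read-only mapping per segment file until it is invalidated."""

    def __init__(self):
        self._maps = {}  # path -> mmap object

    def get(self, path, size):
        mapping = self._maps.get(path)
        if mapping is None:
            fd = os.open(path, os.O_RDONLY)
            try:
                mapping = mmap.mmap(fd, size, prot=mmap.PROT_READ)
            finally:
                os.close(fd)
            self._maps[path] = mapping
        return mapping

    def invalidate(self, path):
        """Called on notice from the owner, e.g. after it unmaps the segment."""
        mapping = self._maps.pop(path, None)
        if mapping is not None:
            mapping.close()


def recv_into(cache, path, seg_size, offset, length, dest):
    """Copy `length` bytes at `offset` of the peer's segment into `dest`."""
    src = cache.get(path, seg_size)
    dest[:length] = src[offset:offset + length]


def demo_recv():
    fd, path = tempfile.mkstemp(prefix="segment-")
    try:
        os.write(fd, b"0123456789abcdef")  # pretend this is the sender's segment
        cache = MappingCache()
        dest = bytearray(4)
        recv_into(cache, path, 16, 10, 4, dest)
        reused = cache.get(path, 16) is cache.get(path, 16)
        cache.invalidate(path)
        return bytes(dest), reused
    finally:
        os.close(fd)
        os.unlink(path)
```

The second value returned by demo_recv() shows that repeated lookups reuse
the cached mapping instead of mapping the file again.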

Now if several threads want to send buffers that lie inside the same 
segment, then there is no need to split this segment or remap only parts 
of it in receiving processes. The segment can be mapped once, in its 
entirety, in each receiving process, and each thread only needs to care 
about its own data inside it.
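
As an illustration of this point (a sketch with made-up names, not taken
from Driller), several threads can copy their own ranges out of a single
mapping of the whole segment:

```python
import mmap
import tempfile
import threading


def copy_ranges_concurrently(segment, ranges):
    """Each thread copies its own (offset, length) range from one mapping."""
    results = [None] * len(ranges)

    def worker(index, offset, length):
        results[index] = segment[offset:offset + length]

    threads = [
        threading.Thread(target=worker, args=(i, offset, length))
        for i, (offset, length) in enumerate(ranges)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def demo_threads():
    with tempfile.TemporaryFile() as f:
        f.write(b"AAAABBBBCCCC")
        f.flush()
        # The segment is mapped once, in full; no per-buffer remapping.
        segment = mmap.mmap(f.fileno(), 12, prot=mmap.PROT_READ)
        try:
            return copy_ranges_concurrently(segment, [(0, 4), (4, 4), (8, 4)])
        finally:
            segment.close()
```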

Supporting multiple threads will require other kinds of precautions:
  - Driller initialization in a multithreaded process will be challenging, 
because a segment must not be written to between the time it is copied to 
a file and the time the file is mapped
  - global structures (process map tree, map cache tree) need mutual 
exclusion
  - threads should use different sockets to exchange with the fdproxy, or 
mutual exclusion should be used
  - dlmalloc has locking but, according to its own comments, it is not very 
efficient; using a more thread-friendly allocator (such as Hoard?) would 
be an option
... and possibly other tricks.
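
For the mutual exclusion on global structures, a minimal sketch could be a
lock around a shared table. The class name and the use of a plain dict
instead of a tree are assumptions for illustration only.

```python
import threading


class LockedMapRegistry:
    """A shared table of mappings guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def get_or_create(self, key, factory):
        # The lock makes lookup and insertion one atomic step, so two
        # threads cannot both create an entry for the same segment.
        with self._lock:
            if key not in self._entries:
                self._entries[key] = factory()
            return self._entries[key]

    def remove(self, key):
        with self._lock:
            return self._entries.pop(key, None)


def demo_registry(num_threads=8):
    registry = LockedMapRegistry()
    calls = []

    def factory():
        calls.append(1)  # runs while the registry lock is held
        return object()

    results = [None] * num_threads

    def worker(index):
        results[index] = registry.get_or_create("segment-0", factory)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Number of factory calls, number of distinct objects handed out.
    return len(calls), len({id(r) for r in results})
```

A single coarse lock is the simplest choice; finer-grained locking would
only be worth it if this table turned out to be contended.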


I hope this makes things a bit clearer now.

-- 
saffroy at gmail.com



