[MPICH] Availability of the Driller library

Darius Buntinas buntinas at mcs.anl.gov
Tue Sep 25 11:31:48 CDT 2007


Your explanation makes sense, but what I forgot to say in my last email 
is that I would like to avoid overriding the default memory allocators. 
Instead, I would like to remap sections of memory on demand, e.g., for 
each MPI_Send operation.

Overriding malloc, mmap, brk, and sbrk works fine for most codes, but 
there are always a few that don't work, and I'm just thinking about how 
to handle those.

Thanks,
-d

On 09/24/2007 08:33 PM, Jean-Marc Saffroy wrote:
> On Mon, 24 Sep 2007, Darius Buntinas wrote:
> 
>>>> Is there a way to split a vma and share only part of it?  That would 
>>>> be interesting as well.
>>>
>>> Hmmm that would be possible, but the cost of sharing part or all of a 
>>> given memory region is roughly the same, so why would you want to do 
>>> this?
>>
>> Well I was thinking that in MPI, when a call to, say MPI_Send is made, 
>> the process is not allowed to access the buffer being sent until the 
>> call returns. So, I was thinking that if the remapping were done in 
>> MPI_Send, then we wouldn't have to worry about other threads modifying 
>> the data. That assumes that the rest of the segment (the part not 
>> being remapped) would not have to be copied.
> 
> Segment copies only occur at initialization time, for all existing 
> segments that can possibly be shared. Later on, when new segments are 
> requested by the application (through malloc, which calls overloaded 
> versions of sbrk or mmap), they are created as memory mapped files. So 
> at any time after initialization, the process should have most of its 
> memory already inside files, and another process can mmap these files in 
> its own address space when needed.
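The scheme described above, where a segment is backed by a file so that a peer process can later mmap the same pages, can be sketched roughly as follows. This is a hypothetical illustration, not Driller's implementation; the function name and the shm_open path are made up for the example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Illustrative sketch: create a len-byte file-backed segment and
   return a writable mapping of it, or MAP_FAILED on error.
   *fd_out receives the backing descriptor; another process given
   this fd could mmap the same pages into its own address space. */
static char *make_file_backed_segment(size_t len, int *fd_out)
{
    int fd = shm_open("/driller_demo_seg", O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return (char *)MAP_FAILED;
    shm_unlink("/driller_demo_seg");  /* keep the fd, drop the name */

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return (char *)MAP_FAILED;
    }

    /* MAP_SHARED makes writes visible through every mapping of
       the same file, which is what enables the single-memcpy
       send/recv described below. */
    char *seg = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (seg == MAP_FAILED)
        close(fd);
    else
        *fd_out = fd;
    return seg;
}
```

With segments created this way, "sharing" a buffer amounts to handing the file descriptor to the peer, rather than copying the data.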
> 
> When a process calls send (resp. recv), and the buffer is in a mapped 
> file, then the receiving (resp. sending) process can use the API to 
> retrieve the file descriptor and map it, and then do the recv (resp. 
> send) with a single memcpy. The file descriptor and memory mapping can 
> (and should, for performance) be cached by the receiving process until 
> further notice from the owner process (e.g., until free calls munmap, 
> which destroys the mapping).
> 
> Now if several threads want to send buffers that lie inside the same 
> segment, then there is no need to split this segment or remap only parts 
> of it in receiving processes. The segment can be mapped once entirely in 
> the process, and all threads only need to care about their own data 
> inside it.
> 
> Supporting multiple threads will require other kinds of precautions:
>  - Driller initialization in a multithreaded process will be 
> challenging, because of the requirement that a segment is not written to 
> after its copy to a file and before the file is mapped
>  - global structures (process map tree, map cache tree) need mutual 
> exclusion
>  - threads should use different sockets to exchange with the fdproxy, or 
> mutual exclusion should be used
>  - dlmalloc has locking but, according to its own comments, it is not 
> very efficient; using a more thread-friendly allocator (such as Hoard?) 
> would be an option
> ... and possibly other tricks.
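Passing a file descriptor between processes, as the fdproxy exchange above requires, is done over a Unix-domain socket with an SCM_RIGHTS control message. A minimal sketch of the two sides (generic POSIX/Linux technique, not Driller's fdproxy protocol):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a Unix-domain socket.
   Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;   /* kernel duplicates the fd */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent with send_fd(); -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

Because recvmsg and sendmsg interleave freely on a shared socket only if messages stay intact, the per-thread-socket versus mutual-exclusion trade-off mentioned above is exactly about serializing these exchanges.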
> 
> 
> I hope this makes things a bit clearer now.
> 