[MPICH] Availability of the Driller library
Jean-Marc Saffroy
saffroy at gmail.com
Mon Sep 24 20:33:34 CDT 2007
On Mon, 24 Sep 2007, Darius Buntinas wrote:
>>> Is there a way to split a vma and share only part of it? That would
>>> be interesting as well.
>>
>> Hmmm that would be possible, but the cost of sharing part or all of a
>> given memory region is roughly the same, so why would you want to do
>> this?
>
> Well I was thinking that in MPI, when a call to, say MPI_Send is made,
> the process is not allowed to access the buffer being sent until the
> call returns. So, I was thinking that if the remapping were done in
> MPI_Send, then we wouldn't have to worry about other threads modifying
> the data. That assumes, that the rest of the segment (not being
> remapped) would not have to be copied.
Segment copies only occur at initialization time, for all existing
segments that can possibly be shared. Later on, when the application
requests new segments (through malloc, which calls overloaded versions
of sbrk or mmap), they are created as memory-mapped files. So at
any time after initialization, the process should have most of its memory
already inside files, and another process can mmap these files in its own
address space when needed.
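To illustrate segments that live in files, here is a minimal sketch in C. It is not Driller's code: segment_create, segment_attach and segment_demo are hypothetical names, and the /tmp path is an assumption. Both mappings are made in one process for brevity, whereas in practice the second mmap would happen in a peer process that has obtained the file descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a new segment of len bytes backed by a file.
 * Returns the mapping and stores the backing fd in *fd_out, or NULL on error. */
void *segment_create(size_t len, int *fd_out)
{
    char path[] = "/tmp/segment-XXXXXX";   /* assumed location */
    int fd;
    void *addr;

    assert(len > 0 && fd_out != NULL);
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                          /* the open fd keeps the file alive */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return addr;
}

/* Hypothetical helper: what a peer does once it holds the fd. */
void *segment_attach(int fd, size_t len)
{
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/* Writes through the owner's mapping and reads through the second one.
 * Returns 0 if the second mapping sees the data. */
int segment_demo(void)
{
    const size_t len = 4096;
    int fd = -1;
    int ok;
    char *owner = segment_create(len, &fd);
    char *peer;

    if (owner == NULL)
        return -1;
    peer = segment_attach(fd, len);
    if (peer == NULL) {
        munmap(owner, len);
        close(fd);
        return -1;
    }
    strcpy(owner, "hello");
    ok = strcmp(peer, "hello") == 0;
    munmap(peer, len);
    munmap(owner, len);
    close(fd);
    return ok ? 0 : -1;
}
```

Because both mappings use MAP_SHARED on the same file, a write through one is visible through the other without any further copy.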
When a process calls send (resp. recv), and the buffer is in a mapped
file, then the receiving (resp. sending) process can use the API to
retrieve the file descriptor and map it, and then do the recv (resp. send)
with a single memcpy. The file descriptor and memory mapping can (and
should, for performance) be cached by the receiving process until further
notice from the owner process (e.g. until free calls munmap, which
destroys the mapping).
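The receive path and the mapping cache described above could look roughly like the following sketch in C. Everything here is an assumption made for illustration: the names map_cache_get, map_cache_drop and recv_from_segment, the integer segment id, and the fixed-size array (a tree would be used in practice). How the file descriptor reaches the receiver is left out; the demo simply uses a temporary file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define CACHE_SLOTS 16

struct map_entry { int seg_id; void *base; size_t len; };
static struct map_entry map_cache[CACHE_SLOTS];   /* base == NULL: free slot */

/* Return the cached mapping for seg_id, mapping fd on a cache miss. */
void *map_cache_get(int seg_id, int fd, size_t len)
{
    struct map_entry *slot = NULL;
    void *base;
    int i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (map_cache[i].base != NULL && map_cache[i].seg_id == seg_id)
            return map_cache[i].base;              /* hit: no new mmap */
        if (map_cache[i].base == NULL && slot == NULL)
            slot = &map_cache[i];
    }
    if (slot == NULL)
        return NULL;                               /* full; no eviction here */
    base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return NULL;
    slot->seg_id = seg_id;
    slot->base = base;
    slot->len = len;
    return base;
}

/* Drop a mapping once the owner says the segment is gone. */
void map_cache_drop(int seg_id)
{
    int i;
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (map_cache[i].base != NULL && map_cache[i].seg_id == seg_id) {
            munmap(map_cache[i].base, map_cache[i].len);
            map_cache[i].base = NULL;
        }
    }
}

/* Receive n bytes found at offset off in the sender's segment:
 * one (possibly cached) mapping, then a single memcpy. */
int recv_from_segment(void *dst, int seg_id, int fd,
                      size_t seg_len, size_t off, size_t n)
{
    const char *base;

    if (off > seg_len || n > seg_len - off)
        return -1;
    base = map_cache_get(seg_id, fd, seg_len);
    if (base == NULL)
        return -1;
    memcpy(dst, base + off, n);
    return 0;
}

/* Returns 0 if the data is received and the second lookup hits the cache. */
int recv_demo(void)
{
    const size_t seg_len = 4096;
    char buf[8] = {0};
    FILE *f = tmpfile();
    int fd, rc;

    if (f == NULL)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t)seg_len) != 0 ||
        pwrite(fd, "payload", 8, 100) != 8) {
        fclose(f);
        return -1;
    }
    rc = recv_from_segment(buf, 1, fd, seg_len, 100, sizeof buf);
    if (rc == 0 && strcmp(buf, "payload") != 0)
        rc = -1;
    if (rc == 0 && map_cache_get(1, fd, seg_len) != map_cache_get(1, fd, seg_len))
        rc = -1;
    map_cache_drop(1);
    fclose(f);
    return rc;
}
```

The point of the cache is that only the first receive from a given segment pays for mmap; later receives reduce to the memcpy.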
Now if several threads want to send buffers that lie inside the same
segment, then there is no need to split this segment or remap only parts
of it in receiving processes. The segment can be mapped once entirely in
the process, and all threads only need to care about their own data inside
it.
Supporting multiple threads will require other kinds of precautions:
- Driller initialization in a multithreaded process will be challenging,
because a segment must not be written to between the time it is copied
to a file and the time the file is mapped;
- global structures (process map tree, map cache tree) need mutual
exclusion;
- threads should either use different sockets to talk to the fdproxy, or
share one socket under mutual exclusion;
- dlmalloc has locking but, according to its own comments, it is not very
efficient; using a more thread-friendly allocator (such as Hoard?) would
be an option;
... and possibly other tricks.
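As an illustration of the mutual exclusion point, here is a small sketch in C using a pthread mutex around a shared table. The table, the names seg_table_add and seg_table_count, and the flat array are all hypothetical stand-ins; the actual global structures are trees and may well need finer-grained locking.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

#define MAX_SEGMENTS 1024
#define NTHREADS 4
#define ADDS_PER_THREAD 8

/* Hypothetical stand-in for a global structure shared by all threads. */
static struct { int ids[MAX_SEGMENTS]; size_t count; } seg_table;
static pthread_mutex_t seg_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a segment id; every access to the table holds the lock. */
int seg_table_add(int id)
{
    int rc = -1;

    pthread_mutex_lock(&seg_table_lock);
    if (seg_table.count < MAX_SEGMENTS) {
        seg_table.ids[seg_table.count++] = id;
        rc = 0;
    }
    pthread_mutex_unlock(&seg_table_lock);
    return rc;
}

size_t seg_table_count(void)
{
    size_t n;

    pthread_mutex_lock(&seg_table_lock);
    n = seg_table.count;
    pthread_mutex_unlock(&seg_table_lock);
    return n;
}

static void *worker(void *arg)
{
    int first = *(int *)arg;
    int i;

    for (i = 0; i < ADDS_PER_THREAD; i++)
        seg_table_add(first + i);
    return NULL;
}

/* Returns 0 if no concurrent insertion was lost. */
int seg_table_demo(void)
{
    pthread_t threads[NTHREADS];
    int first[NTHREADS];
    size_t before = seg_table_count();
    int started = 0;
    int i;

    for (i = 0; i < NTHREADS; i++) {
        first[i] = i * 100;
        if (pthread_create(&threads[i], NULL, worker, &first[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    if (started != NTHREADS)
        return -1;
    return seg_table_count() - before == NTHREADS * ADDS_PER_THREAD ? 0 : -1;
}
```

A single lock like this is the simplest option; the same pattern would apply to a socket shared between threads when talking to the fdproxy.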
I hope this makes things a bit clearer now.
--
saffroy at gmail.com