[mpich2-dev] MPI_Alloc_mem ignores info argument and fails to register memory

James Dinan dinan at mcs.anl.gov
Tue Sep 6 19:15:16 CDT 2011


Hi Jeff,

I've created a ticket for this:

http://trac.mcs.anl.gov/projects/mpich2/ticket/1526

  ~Jim.

On 9/6/11 6:42 PM, Jeff Hammond wrote:
> I don't think I have Trac developer rights on MPICH2 yet.
>
> In any case, I am reasonably confident I can implement a patch at the
> device level for BG but that isn't useful here.  Someone with more
> experience inside of ch3 will have to figure out where to setup an
> interface for registration.  Maybe your efforts will be informative in
> this regard.
>
> Jeff
>
> On Wed, Sep 7, 2011 at 1:30 AM, Howard Pritchard<howardp at cray.com>  wrote:
>> Hi Jeff,
>>
>> You'll be happy to know that Cray already has an internal RFE filed
>> against the Cray MPICH2 for exactly this kind of support.
>> So it's in the queue.
>>
>> Maybe you should open a Trac ticket against the Argonne MPICH2?
>>
>> Howard
>>
>> Jeff Hammond wrote:
>>> I would like to be able to use an info argument to instruct
>>> MPI_Alloc_mem to register (pin) buffers in order to maximize RMA
>>> performance on networks that support or require registration.
>>> Currently, no MPICH2-derived implementation I have investigated
>>> (MPICH2, MVAPICH2, BGP-MPI) even considers the info argument, and
>>> therefore none has the opportunity to optimize RMA using
>>> RMA-oriented buffers.  Instead, the first RMA call with any buffer
>>> pays the registration overhead, which Jim Dinan has demonstrated to
>>> have a noticeable impact on performance relative to ARMCI, as well
>>> as relative to a simulation of what would happen if MPI_Alloc_mem
>>> did what I consider the right thing, namely pre-registering buffers.
>>>
>>> On the other hand, the ultra-modern and extremely well-designed
>>> OpenMPI parses the info argument and provides an implementation of
>>> preregistration when it is desired.  Note that this comment is only an
>>> attempt to troll Pavan and should not be taken too seriously, although
>>> I do think that OpenMPI is doing the right thing by providing the user
>>> the option of helping MPI make an intelligent decision internally.
>>>
>>> The following are the comparative call paths of the two MPI
>>> implementations under consideration:
>>>
>>> MPICH2 trunk:
>>>
>>> int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)
>>> void *MPID_Alloc_mem( size_t size, MPID_Info *info_ptr )
>>> void *MPIDI_Alloc_mem( size_t size, MPID_Info *info_ptr )
>>> MPIU_Malloc(size);
>>>
>>> OpenMPI 1.4.3:
>>>
>>> int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)
>>> void *mca_mpool_base_alloc(size_t size, ompi_info_t *info)
>>> <stuff that actually does memory registration in appropriate cases>
>>>
>>> On a related subject, at the EPFL-CECAM workshop I participated in
>>> this week, a CP2K developer commented that MPI RMA performance would
>>> be better if MPICH2-derived implementations like Cray MPI for Gemini
>>> took an info argument, as IBM MPI does, allowing the user to request
>>> immediate firing of e.g. Put, rather than the
>>> wait-until-the-last-minute-and-pack-it approach currently employed
>>> in CH3 (I haven't read the source, but multiple MPICH2 developers
>>> have said this is the case).  Modern networks are very unlike
>>> Ethernet in their ability to handle rapid injection of many small
>>> packets (Cray Gemini is a perfect example), and therefore RMA should
>>> be flexible enough to accommodate an implementation tuned for such
>>> an "Ethernot" network.  I know from a direct implementation of
>>> noncontiguous operations in DCMF that packing is unsuitable in many
>>> cases, particularly when the user wants true passive-target progress
>>> without user interrupts.  This is in fact the use case of my
>>> collaborator at Juelich.
>>>
>>> Anyway, neither of my points is particularly new information to Jim
>>> and Pavan, but I wanted to summarize it all here now that I have more
>>> specific information to add, particularly the apparent superiority of
>>> OpenMPI to MPICH2 in one particular instance :-)
>>>
>>> Best,
>>>
>>> Jeff
>>>
>>
>>
>> --
>> Howard Pritchard
>> Software Engineering
>> Cray, Inc.
>>
>
>
>

