[mpich2-dev] MPI_Alloc_mem ignores info argument and fails to register memory
Jeff Hammond
jhammond at alcf.anl.gov
Tue Sep 6 18:42:36 CDT 2011
I don't think I have Trac developer rights on MPICH2 yet.
In any case, I am reasonably confident I can implement a patch at the
device level for BG but that isn't useful here. Someone with more
experience inside ch3 will have to figure out where to set up an
interface for registration. Maybe your efforts will be informative in
this regard.
Jeff
On Wed, Sep 7, 2011 at 1:30 AM, Howard Pritchard <howardp at cray.com> wrote:
> Hi Jeff,
>
> You'll be happy to know that cray already has an internal RFE filed
> against the cray mpich2 for exactly this kind of support.
> So it's in the queue.
>
> Maybe you should open a Trac ticket against Argonne MPICH2?
>
> Howard
>
> Jeff Hammond wrote:
>> I would like to be able to use an info argument to instruct
>> MPI_Alloc_mem to register pinned buffers in order to maximize
>> performance of RMA on networks that support/require this. Currently,
>> no MPICH2-derived implementation I have investigated (MPICH2,
>> MVAPICH2, BGP-MPI) even considers the info argument, and therefore has
>> no opportunity to optimize RMA using RMA-oriented buffers. Instead,
>> the first RMA call on a given buffer incurs registration overhead,
>> which Jim Dinan has demonstrated to have a noticeable impact on
>> performance relative to ARMCI, as well as relative to a simulation of
>> what would happen if MPI_Alloc_mem did what I consider the right
>> thing, namely returned pre-registered buffers.
>>
>> On the other hand, the ultra-modern and extremely well-designed
>> OpenMPI parses the info argument and provides an implementation of
>> preregistration when it is desired. Note that this comment is only an
>> attempt to troll Pavan and should not be taken too seriously, although
>> I do think that OpenMPI is doing the right thing by providing the user
>> the option of helping MPI make an intelligent decision internally.
>>
>> The following are the comparative call paths of the two MPI
>> implementations under consideration:
>>
>> MPICH2 trunk:
>>
>> int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)
>> void *MPID_Alloc_mem( size_t size, MPID_Info *info_ptr )
>> void *MPIDI_Alloc_mem( size_t size, MPID_Info *info_ptr )
>> MPIU_Malloc(size);
>>
>> OpenMPI 1.4.3:
>>
>> int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)
>> void *mca_mpool_base_alloc(size_t size, ompi_info_t *info)
>> <stuff that actually does memory registration in appropriate cases>
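To sketch what the missing ch3 piece could look like: the MPICH2-internal types and the registration hook below are stand-ins I invented (plain structs and malloc in place of MPID_Info and MPIU_Malloc), not real MPICH2 interfaces, but they show the shape of an MPIDI_Alloc_mem that consults the info object instead of discarding it:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the MPICH2-internal MPID_Info; illustrative only. */
typedef struct MPID_Info {
    const char *key, *value;
    struct MPID_Info *next;
} MPID_Info;

/* Hypothetical hook: a real patch would call the device/netmod memory
 * registration routine (e.g. DCMF on BG, uGNI on Gemini) here. */
static int register_memory(void *ptr, size_t size)
{
    printf("registering %p (%zu bytes)\n", ptr, size);
    return 0;
}

void *MPIDI_Alloc_mem(size_t size, MPID_Info *info_ptr)
{
    void *ptr = malloc(size); /* stands in for MPIU_Malloc */
    if (ptr == NULL)
        return NULL;
    /* Walk the info object looking for a (hypothetical) registration
     * hint, instead of ignoring info_ptr as trunk currently does. */
    for (MPID_Info *i = info_ptr; i != NULL; i = i->next) {
        if (i->key && i->value
            && strcmp(i->key, "alloc_mem_pre_register") == 0
            && strcmp(i->value, "true") == 0) {
            register_memory(ptr, size);
            break;
        }
    }
    return ptr;
}
```

The point is only that the device-independent layer needs to expose a registration entry point; which hints to honor would stay a per-device decision.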
>>
>> On a related subject, at the EPFL-CECAM workshop I participated in
>> this week, a CP2K developer commented that MPI RMA performance would
>> be better if MPICH2-derived implementations such as CrayMPI for
>> Gemini took an info argument, as IBM-MPI does, allowing the user to
>> request immediate firing of e.g. Put, rather than the
>> wait-until-the-last-minute-and-pack-it approach currently employed in
>> CH3 (I haven't read the source but multiple MPICH2 developers have
>> said that this is the case). Modern networks are very unlike Ethernet
>> in their ability to handle rapid injection of many small packets (Cray
>> Gemini is a perfect example), and therefore the RMA layer should be
>> flexible enough to accommodate an implementation suited to an Ethernot
>> (that is, non-Ethernet) network. I
>> know from a direct implementation of noncontiguous operations in DCMF
>> that packing is unsuitable in many cases, particularly when the user
>> wants true passive-target progress without user interrupts. This is
>> actually the use case of my collaborator at Juelich.
>>
>> Anyway, neither of my points is particularly new information to Jim
>> and Pavan, but I wanted to summarize it all here now that I have more
>> specific information to add, particularly the apparent superiority of
>> OpenMPI to MPICH2 in one particular instance :-)
>>
>> Best,
>>
>> Jeff
>>
>
>
> --
> Howard Pritchard
> Software Engineering
> Cray, Inc.
>
--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/index.php/User:Jhammond