[mpich-discuss] Poor scaling of MPI_WIN_CREATE?

Jed Brown jedbrown at mcs.anl.gov
Wed May 30 11:08:36 CDT 2012


On Wed, May 30, 2012 at 11:03 AM, Jim Dinan <dinan at mcs.anl.gov> wrote:

> Hi Tim,
>
> How often are you creating windows?  As Jed mentioned, this is expected to
> be fairly expensive and synchronizing on most systems.  The Cray XE has
> some special sauce that can make this cheap if you go through DMAPP
> directly,


Isn't the whole point of a "vendor optimized MPI" that they would have done
this? Is there a semantic reason why MPI_Win_create() cannot be implemented
in this fast way using DMAPP?


> but if you want your performance tuning to be portable, taking window
> creation off the critical path would be a good change to make.
>
>  ~Jim.
>
>
> On 5/30/12 10:48 AM, Timothy Stitt wrote:
>
>> Thanks Jeff...you provided some good suggestions. I'll consult the DMAPP
>> documentation and also go back to the code to see if I can reuse window
>> buffers in some way.
>>
>> Would you happen to have links to the DMAPP docs on hand? I couldn't
>> seem to find any tutorials, etc., after a quick browse.
>>
>> Cheers,
>>
>> Tim.
>>
>> On May 30, 2012, at 11:40 AM, Jeff Hammond wrote:
>>
>>> If you don't care about portability, translating from MPI-2 RMA to
>>> DMAPP is mostly trivial and you can eliminate collective window
>>> creation altogether. However, I will note that my experience getting
>>> MPI and DMAPP to inter-operate properly on XE6 (Hopper, in fact) was
>>> terrible. And yes, I did everything the NERSC documentation and Cray
>>> told me to do.
>>>
>>> I wonder if you can reduce the time spent in MPI_WIN_CREATE by calling
>>> it less often. Can you not allocate the window once and keep reusing
>>> it? You might need to restructure your code to reuse the underlying
>>> local buffers, but that isn't too complicated in some cases.
>>>
>>> Best,
>>>
>>> Jeff
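
For what it's worth, a minimal sketch of the restructuring Jeff describes,
assuming fence synchronization and a receive buffer that stays allocated
across iterations (all names, sizes, and target arrays below are
placeholders, not taken from the actual code):

    /* assumes <mpi.h> and an already-initialized MPI environment */
    #define NSLOTS   128     /* placeholder: slots exposed to neighbors      */
    #define NTARGETS 8       /* placeholder: ranks this process writes to    */

    double   recv_buf[NSLOTS], send_val[NTARGETS];
    int      target_rank[NTARGETS];      /* filled in by the application */
    MPI_Aint target_slot[NTARGETS];
    int      nsteps = 100;               /* placeholder iteration count  */
    MPI_Win  win;

    /* setup, once per run: expose the persistent receive buffer */
    MPI_Win_create(recv_buf, NSLOTS * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* every iteration: only synchronization and puts, no window creation */
    for (int step = 0; step < nsteps; step++) {
        MPI_Win_fence(0, win);
        for (int n = 0; n < NTARGETS; n++)
            MPI_Put(&send_val[n], 1, MPI_DOUBLE, target_rank[n],
                    target_slot[n], 1, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);    /* remote data now visible in recv_buf */
        /* ... consume recv_buf, refill send_val for the next step ... */
    }

    /* teardown, once per run */
    MPI_Win_free(&win);

The collective window creation and free then happen once per run rather
than once per exchange, so only the fence synchronization stays on the
critical path.
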
>>>
>>> On Wed, May 30, 2012 at 10:36 AM, Jed Brown <jedbrown at mcs.anl.gov>
>>> wrote:
>>>
>>>> On Wed, May 30, 2012 at 10:29 AM, Timothy Stitt <Timothy.Stitt.9 at nd.edu>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am currently trying to improve the scaling of a CFD code on some
>>>>> Cray machines at NERSC (I believe Cray systems leverage MPICH2 for
>>>>> their MPI communications, hence the posting to this list), and I am
>>>>> running into some scalability issues with the MPI_WIN_CREATE() routine.
>>>>>
>>>>> To cut a long story short, the CFD code requires each process to
>>>>> receive values from some neighboring processes. Unfortunately, each
>>>>> process doesn't know in advance who its neighbors will be.
>>>>>
>>>>
>>>>
>>>> How often do the neighbors change? By what mechanism?
>>>>
>>>>
>>>>> To overcome this, we exploit the one-sided MPI_PUT() routine to
>>>>> communicate data from neighbors directly.
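
A minimal sketch of that kind of exchange, assuming fence synchronization
and a window created anew for each exchange (buffer names, sizes, and the
target arrays are placeholders, not from the actual code):

    /* assumes <mpi.h>, an initialized MPI environment, and placeholder
       declarations for recv_buf, send_val, target_rank, target_slot,
       NSLOTS, and NTARGETS (not shown) */
    MPI_Win win;
    MPI_Win_create(recv_buf, NSLOTS * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);   /* collective */
    MPI_Win_fence(0, win);                       /* start the RMA epoch */
    for (int n = 0; n < NTARGETS; n++)
        MPI_Put(&send_val[n], 1, MPI_DOUBLE, target_rank[n],
                target_slot[n], 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);      /* neighbor data now visible in recv_buf */
    MPI_Win_free(&win);         /* also collective */

If the window is created and freed like this around every exchange, both
collectives land on the critical path of each iteration, which would be
consistent with the profile below.
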
>>>>>
>>>>> Recent profiling at 256, 512, and 1024 processes shows that the
>>>>> MPI_WIN_CREATE routine is starting to dominate the walltime and reduce
>>>>> our scalability quite rapidly. For instance, the %walltime spent in
>>>>> MPI_WIN_CREATE increases with process count as follows:
>>>>>
>>>>> 256 cores - 4.0%
>>>>> 512 cores - 9.8%
>>>>> 1024 cores - 24.3%
>>>>>
>>>>
>>>>
>>>> The current implementation of MPI_Win_create uses an Allgather, which
>>>> is synchronizing and relatively expensive.
>>>>
>>>>
>>>>>
>>>>> I was wondering if anyone in the MPICH2 community had any advice on
>>>>> how one can improve the performance of MPI_WIN_CREATE? Or maybe
>>>>> someone has a better strategy for communicating the data that bypasses
>>>>> the (poorly scaling?) MPI_WIN_CREATE routine.
>>>>>
>>>>> Thanks in advance for any help you can provide.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tim.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> Argonne Leadership Computing Facility
>>> University of Chicago Computation Institute
>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>
>>> http://www.linkedin.com/in/jeffhammond
>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>>
>>
>> Tim Stitt, PhD (User Support Manager)
>>
>> Center for Research Computing | University of Notre Dame |
>> P.O. Box 539, Notre Dame, IN 46556 | Phone: 574-631-5287 | Email:
>> tstitt at nd.edu
>>
>>
>>
>>
>>
>