[mpich-discuss] Poor scaling of MPI_WIN_CREATE?

Jeff Hammond jhammond at alcf.anl.gov
Wed May 30 10:40:08 CDT 2012


If you don't care about portability, translating from MPI-2 RMA to
DMAPP is mostly trivial and you can eliminate collective window
creation altogether.  However, I will note that my experience getting
MPI and DMAPP to interoperate properly on the XE6 (Hopper, in fact)
was terrible.  And yes, I did everything the NERSC documentation and
Cray told me to do.

I wonder if you can reduce the time spent in MPI_WIN_CREATE by calling
it less often.  Can you not create the window once and keep reusing
it?  You might need to restructure your code so that it reuses the
underlying local buffers, but in many cases that is not especially
complicated.  A rough sketch of the pattern is below.
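Here is a minimal C sketch of what I mean, not your actual code:
HALO_COUNT, the ring-style "neighbor", and the fixed step count are
stand-ins for whatever neighbor bookkeeping your application really
does.  The point is that the collective, synchronizing window creation
happens once, and each timestep only pays for the fences and the puts.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: create the RMA window once over a persistent buffer and
 * reuse it every "timestep", instead of calling the collective
 * MPI_Win_create inside the loop.  HALO_COUNT and the ring-style
 * neighbor choice are placeholders. */
#define HALO_COUNT 8

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *halo = calloc(HALO_COUNT, sizeof(double));
    MPI_Win win;

    /* One collective window creation for the whole run. */
    MPI_Win_create(halo, (MPI_Aint)(HALO_COUNT * sizeof(double)),
                   (int)sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    for (int step = 0; step < 10; step++) {
        double val = rank + 0.1 * step;
        int target = (rank + 1) % size;   /* stand-in "neighbor" */

        MPI_Win_fence(0, win);            /* open an epoch */
        MPI_Put(&val, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);            /* close it; halo[] is now valid */

        /* ... compute with halo[] here ... */
    }

    if (rank == 0)
        printf("rank 0 received %g from its neighbor\n", halo[0]);

    MPI_Win_free(&win);
    free(halo);
    MPI_Finalize();
    return 0;
}

If the fences themselves become a bottleneck at scale, passive-target
synchronization (MPI_Win_lock/MPI_Win_unlock) on the same long-lived
window is another option, but the window-creation cost goes away
either way.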

Best,

Jeff

On Wed, May 30, 2012 at 10:36 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> On Wed, May 30, 2012 at 10:29 AM, Timothy Stitt <Timothy.Stitt.9 at nd.edu>
> wrote:
>>
>> Hi all,
>>
>> I am currently trying to improve the scaling of a CFD code on some Cray
>> machines at NERSC (I believe Cray systems leverage mpich2 for their MPI
>> communications, hence the posting to this list) and I am running into some
>> scalability issues with the MPI_WIN_CREATE() routine.
>>
>> To cut a long story short, the CFD code requires each process to receive
>> values from some neighboring processes. Unfortunately, each process does
>> not know in advance which processes its neighbors will be.
>
>
> How often do the neighbors change? By what mechanism?
>
>>
>> To overcome this we exploit the one-sided MPI_PUT() routine to communicate
>> data from neighbors directly.
>>
>> Recent profiling at 256, 512 and 1024 processes shows that the
>> MPI_WIN_CREATE routine is starting to dominate the walltime and to erode
>> our scalability quite rapidly. For instance, the percentage of walltime
>> spent in MPI_WIN_CREATE grows with the process count as follows:
>>
>> 256 cores - 4.0%
>> 512 cores - 9.8%
>> 1024 cores - 24.3%
>
>
> The current implementation of MPI_Win_create uses an Allgather, which is
> synchronizing and relatively expensive.
>
>>
>>
>> I was wondering whether anyone in the MPICH2 community has any advice on
>> how to improve the performance of MPI_WIN_CREATE, or perhaps a better
>> strategy for communicating the data that bypasses the (poorly scaling?)
>> MPI_WIN_CREATE routine.
>>
>> Thanks in advance for any help you can provide.
>>
>> Regards,
>>
>> Tim.



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond

