<div class="gmail_quote">On Wed, May 30, 2012 at 11:03 AM, Jim Dinan <span dir="ltr"><<a href="mailto:dinan@mcs.anl.gov" target="_blank">dinan@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Tim,<br>
<br>
How often are you creating windows? As Jed mentioned, this is expected to be fairly expensive and synchronizing on most systems. The Cray XE has some special sauce that can make this cheap if you go through DMAPP directly,</blockquote>
<div><br></div><div>Isn't the whole point of a "vendor optimized MPI" that they would have done this? Is there a semantic reason why MPI_Win_create() cannot be implemented in this fast way using DMAPP?</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> but if you want your performance tuning to be portable, taking window creation off the critical path would be a good change to make.<br>
<br>
~Jim.<div class="im"><br>
<br>
On 5/30/12 10:48 AM, Timothy Stitt wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
Thanks Jeff...you provided some good suggestions. I'll consult the DMAPP<br>
documentation and also go back to the code to see if I can reuse window<br>
buffers in some way.<br>
<br>
Would you happen to have links to the DMAPP docs on hand? I couldn't<br>
seem to find any tutorials etc. after a quick browse.<br>
<br>
Cheers,<br>
<br>
Tim.<br>
<br>
On May 30, 2012, at 11:40 AM, Jeff Hammond wrote:<br>
<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
If you don't care about portability, translating from MPI-2 RMA to<br>
DMAPP is mostly trivial and you can eliminate collective window<br>
creation altogether. However, I will note that my experience getting<br>
MPI and DMAPP to inter-operate properly on XE6 (Hopper, in fact) was<br>
terrible. And yes, I did everything the NERSC documentation and Cray<br>
told me to do.<br>
<br>
I wonder if you can reduce the time spent in MPI_WIN_CREATE by calling<br>
it less often. Can you not allocate the window once and keep reusing<br>
it? You might need to restructure your code to reuse the underlying<br>
local buffers, but in many cases that isn't complicated.<br>
<br>
Best,<br>
<br>
Jeff<br>
<br>
On Wed, May 30, 2012 at 10:36 AM, Jed Brown <<a href="mailto:jedbrown@mcs.anl.gov" target="_blank">jedbrown@mcs.anl.gov</a>> wrote:<br></div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
On Wed, May 30, 2012 at 10:29 AM, Timothy Stitt<br></div>
<<a href="mailto:Timothy.Stitt.9@nd.edu" target="_blank">Timothy.Stitt.9@nd.edu</a>><div><div class="h5"><br>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Hi all,<br>
<br>
I am currently trying to improve the scaling of a CFD code on some Cray<br>
machines at NERSC (I believe Cray systems leverage MPICH2 for their MPI<br>
communications, hence the posting to this list) and I am running into<br>
some scalability issues with the MPI_WIN_CREATE() routine.<br>
<br>
To cut a long story short, the CFD code requires each process to receive<br>
values from some neighboring processes. Unfortunately, each process<br>
doesn't know who its neighbors should be in advance.<br>
</blockquote>
<br>
<br>
How often do the neighbors change? By what mechanism?<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
To overcome this we exploit the one-sided MPI_PUT() routine to<br>
communicate data from neighbors directly.<br>
<br>
Recent profiling at 256, 512 and 1024 processes shows that the<br>
MPI_WIN_CREATE routine is starting to dominate the walltime and<br>
reduce our scalability quite rapidly. For instance, the %walltime for<br>
MPI_WIN_CREATE over various process counts increases as follows:<br>
<br>
256 cores - 4.0%<br>
512 cores - 9.8%<br>
1024 cores - 24.3%<br>
</blockquote>
<br>
<br>
The current implementation of MPI_Win_create uses an Allgather which is<br>
synchronizing and relatively expensive.<br>
<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
<br>
<br>
I was wondering if anyone in the MPICH2 community had any advice on how<br>
one can improve the performance of MPI_WIN_CREATE? Or maybe someone has<br>
a better strategy for communicating the data that bypasses the (poorly<br>
scaling?) MPI_WIN_CREATE routine.<br>
<br>
Thanks in advance for any help you can provide.<br>
<br>
Regards,<br>
<br>
Tim.<br>
_______________________________________________<br>
mpich-discuss mailing list <a href="mailto:mpich-discuss@mcs.anl.gov" target="_blank">mpich-discuss@mcs.anl.gov</a><br></div></div>
<div class="im"><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
</div></blockquote><div class="im">
<br>
<br>
<br>
</div><div class="im">
</div></blockquote><div class="im">
<br>
<br>
<br>
--<br>
Jeff Hammond<br>
Argonne Leadership Computing Facility<br>
University of Chicago Computation Institute<br>
</div><a href="mailto:jhammond@alcf.anl.gov" target="_blank">jhammond@alcf.anl.gov</a> / <a href="tel:%28630%29%20252-5381" value="+16302525381" target="_blank">(630) 252-5381</a><div class="im">
<br>
<a href="http://www.linkedin.com/in/jeffhammond" target="_blank">http://www.linkedin.com/in/jeffhammond</a><br>
<a href="https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond" target="_blank">https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond</a><br>
</div></blockquote>
<br>
<b>Tim Stitt</b>, PhD (User Support Manager).<div class="im"><br>
Center for Research Computing | University of Notre Dame |<br>
P.O. Box 539, Notre Dame, IN 46556 | Phone: <a href="tel:574-631-5287" value="+15746315287" target="_blank">574-631-5287</a> | Email:<br>
</div><a href="mailto:tstitt@nd.edu" target="_blank">tstitt@nd.edu</a><div class="im"><br>
<br>
<br>
<br>
</div></blockquote><div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br>