[mpich-discuss] How expensive is MPI_Win_create() compared to memcpy()?

Jed Brown jedbrown at mcs.anl.gov
Tue Dec 20 17:21:42 CST 2011

On Tue, Dec 20, 2011 at 16:51, Dave Goodell <goodell at mcs.anl.gov> wrote:

> No, although it's not likely to be fixed within the next few months
> either.  Some of this can become a bit better with MPI-3 RMA, although
> possibly not for your use case.
> I'm not sure that 100% of the synchronization can be eliminated.
> But we can almost certainly fix all of the memory scalability issues,
> given enough software development effort.

If there were a way to create a window, post, and start without imposing a
hard synchronization, it would be useful. Alternatively, if we could
"re-seat" a window by giving it different memory...

> > 2. How expensive should I consider this operation to be? Are there
> > micro-benchmark results scaling out to 10k-100k cores somewhere?
>
> Take a look at page 20 of this IBM slide deck that I found with a little
> bit of googling:
> http://www.scc.acad.bg/articles/library/BLue%20Gene%20P/MPI%20Collective%20Communications%20on%20The%20Blue%20Gene%20P.pdf
> It does show microbenchmark performance for Blue Gene/P for a variety of
> message sizes.  I'm guessing it's 16k processes based on the labels from
> the other plots.  Unfortunately it doesn't show Allgather performance for
> the worst case, a non-MPI_COMM_WORLD, non-rectangular communicator.  I'd
> say though that you are looking at something on the order of 1000 to 10000
> us for this data size.

Thanks. That is roughly the time it takes to memcpy a few megabytes, which is
a reasonably typical data volume. I won't stress over getting both variants
implemented right away, but I think persistent windows with a copy will be
used more frequently. That variant also gives me the opportunity to do my own
packing (hidden from the user).
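
As a rough illustration of what such packing could look like, here is a
minimal sketch. It assumes the user's data consists of equally sized blocks at
a fixed stride, copied into a contiguous staging buffer that a persistent
window would expose. The name pack_strided and the layout are hypothetical and
not taken from the thread or from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `nblocks` blocks of `blocklen` bytes, spaced
 * `stride` bytes apart in `src`, into the contiguous buffer `dst`.
 * `dst` must hold at least nblocks * blocklen bytes.
 * Returns the number of bytes written to `dst`. */
size_t pack_strided(void *dst, const void *src,
                    size_t nblocks, size_t blocklen, size_t stride)
{
    unsigned char *out = dst;
    const unsigned char *in = src;

    /* Blocks must not overlap in the source layout. */
    assert(stride >= blocklen);

    for (size_t i = 0; i < nblocks; i++) {
        memcpy(out + i * blocklen, in + i * stride, blocklen);
    }
    return nblocks * blocklen;
}
```

Because the copy goes through a routine like this anyway, the staging buffer
can have whatever layout suits the window, independent of the user's layout.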

> I'm sure that someone at ALCF could point us towards some more useful data
> for BG/P and BG/Q.  They would also have hard numbers on memcpy performance.

Memcpy is easy because it is essentially the STREAM Copy benchmark.
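
For a quick local number, a minimal sketch of a memcpy bandwidth measurement
might look like the following. The function name memcpy_gb_per_s is
hypothetical, and the timing uses the standard C clock() (CPU time), which is
an assumption made for portability rather than anything STREAM itself does.
Note that this counts bytes copied, whereas STREAM counts both the read and
the write traffic for its Copy kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Volatile sink so the compiler cannot discard the copies. */
static volatile unsigned char sink;

/* Hypothetical helper: copy `bytes` bytes `reps` times and return the
 * observed rate in GB/s (1e9 bytes copied per second of CPU time).
 * Returns -1.0 on allocation failure or if the run was too short to time. */
double memcpy_gb_per_s(size_t bytes, int reps)
{
    assert(bytes > 0 && reps > 0);

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not timed. */
    memset(src, 1, bytes);
    memset(dst, 0, bytes);

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) {
        src[0] = (unsigned char)r;   /* make each copy distinct */
        memcpy(dst, src, bytes);
        sink = dst[bytes - 1];       /* consume the result */
    }
    clock_t t1 = clock();

    free(src);
    free(dst);

    double seconds = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (seconds <= 0.0) {
        return -1.0;
    }
    return (double)bytes * (double)reps / seconds / 1e9;
}
```

The buffer should be much larger than the last-level cache if the goal is a
memory-bound figure comparable to STREAM.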

> > I'll end up implementing both versions, but it would be nice to know how
> > urgent it is likely to be and have a guess for where to place the threshold.
>
> I think it will be hard for us to predict with any certainty and will
> depend substantially on the relative performance between the system's
> network and memcpy.  A Blue Gene system will have a very different ratio
> than an Intel cluster, for example.

Yup, but it's still nice to know which orders of magnitude to look at.
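
To make the orders of magnitude concrete, here is a small sketch of the
crossover arithmetic, assuming the only costs that matter are a fixed window
setup time and a constant memcpy rate. The function names and the simple cost
model are hypothetical; real thresholds would need measurements on the target
machine.

```c
#include <assert.h>

/* Bytes that memcpy can move in the time one window setup takes.
 * setup_us:       estimated window setup cost in microseconds
 * copy_gb_per_s:  memcpy rate in GB/s (1e9 bytes per second)
 * 1 us * 1 GB/s = 1e-6 s * 1e9 B/s = 1000 bytes. */
double crossover_bytes(double setup_us, double copy_gb_per_s)
{
    assert(setup_us >= 0.0 && copy_gb_per_s > 0.0);
    return setup_us * copy_gb_per_s * 1000.0;
}

/* Returns 1 if copying into a persistent window is estimated to be cheaper
 * than creating a new window for a message of `message_bytes` bytes. */
int prefer_copy(double message_bytes, double setup_us, double copy_gb_per_s)
{
    return message_bytes < crossover_bytes(setup_us, copy_gb_per_s);
}
```

With a placeholder copy rate of 2 GB/s (an assumed value, not a measurement),
the 1000 to 10000 us range quoted above corresponds to a crossover of 2 MB to
20 MB.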