[mpich2-commits] r9510 - mpich2/trunk/src/mpid/ch3/src
goodell at mcs.anl.gov
Mon Feb 20 23:42:51 CST 2012
Author: goodell
Date: 2012-02-20 23:42:50 -0600 (Mon, 20 Feb 2012)
New Revision: 9510
Modified:
mpich2/trunk/src/mpid/ch3/src/ch3u_eager.c
Log:
update comment about potential short msg optimization
Modified: mpich2/trunk/src/mpid/ch3/src/ch3u_eager.c
===================================================================
--- mpich2/trunk/src/mpid/ch3/src/ch3u_eager.c 2012-02-21 05:42:41 UTC (rev 9509)
+++ mpich2/trunk/src/mpid/ch3/src/ch3u_eager.c 2012-02-21 05:42:50 UTC (rev 9510)
@@ -444,6 +444,14 @@
/* Copy the payload. We could optimize this if recv_data_sz & 0x3 == 0
(copy (recv_data_sz >> 2) ints, inline that since data size is
currently limited to 4 ints */
+ /* We actually could optimize this a lot of ways, including just
+ * putting a memcpy here. Modern compilers will inline fast
+ * versions of the memcpy here (__builtin_memcpy, etc). Another
+ * option is a classic word-copy loop with a switch block at the end
+ * for a remainder. Alternatively a Duff's device loop could work.
+ * Any replacement should be profile driven, and no matter what
+ * we're likely to pick something suboptimal for at least one
+ * compiler out there. [goodell@ 2012-02-10] */
{
unsigned char const * restrict p =
(unsigned char *)eagershort_pkt->data;
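
For readers comparing the options named in the new comment, here is a minimal sketch of the three candidates: a plain memcpy, a word-copy loop with a switch for the remainder, and a Duff's device loop. It is not code from ch3u_eager.c. The function names (copy_memcpy, copy_words, copy_duff) and their signatures are hypothetical, chosen only for illustration, and the word loop goes through memcpy on a uint32_t temporary so that it makes no alignment assumption about the buffers. As the comment says, which variant is fastest would have to be decided by profiling on the compilers of interest.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Option 1: plain memcpy. Compilers commonly inline this for small sizes. */
void copy_memcpy(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Option 2: copy whole 32-bit words, then handle the 0..3 leftover bytes
 * with a switch. The per-word memcpy avoids unaligned-access and aliasing
 * problems; compilers typically reduce it to a single load and store. */
void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t words = n / sizeof(uint32_t);

    while (words-- > 0) {
        uint32_t w;
        memcpy(&w, src, sizeof w);
        memcpy(dst, &w, sizeof w);
        src += sizeof w;
        dst += sizeof w;
    }

    switch (n % sizeof(uint32_t)) {
    case 3: dst[2] = src[2]; /* fall through */
    case 2: dst[1] = src[1]; /* fall through */
    case 1: dst[0] = src[0]; /* fall through */
    default: break;
    }
}

/* Option 3: Duff's device, a byte copy unrolled four times. The switch
 * jumps into the middle of the loop body to absorb the remainder first. */
void copy_duff(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t passes;

    if (n == 0)
        return;

    passes = (n + 3) / 4;
    switch (n % 4) {
    case 0: do { *dst++ = *src++; /* fall through */
    case 3:      *dst++ = *src++; /* fall through */
    case 2:      *dst++ = *src++; /* fall through */
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

All three produce the same bytes for any n, so a benchmark can swap one for another without changing behavior.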