[mpich2-commits] r9510 - mpich2/trunk/src/mpid/ch3/src

goodell at mcs.anl.gov
Mon Feb 20 23:42:51 CST 2012


Author: goodell
Date: 2012-02-20 23:42:50 -0600 (Mon, 20 Feb 2012)
New Revision: 9510

Modified:
   mpich2/trunk/src/mpid/ch3/src/ch3u_eager.c
Log:
update comment about potential short msg optimization

Modified: mpich2/trunk/src/mpid/ch3/src/ch3u_eager.c
===================================================================
--- mpich2/trunk/src/mpid/ch3/src/ch3u_eager.c	2012-02-21 05:42:41 UTC (rev 9509)
+++ mpich2/trunk/src/mpid/ch3/src/ch3u_eager.c	2012-02-21 05:42:50 UTC (rev 9510)
@@ -444,6 +444,14 @@
  	    /* Copy the payload. We could optimize this if recv_data_sz & 0x3 == 0 
 	       (copy (recv_data_sz >> 2) ints, inline that since data size is 
 	       currently limited to 4 ints */
+            /* We actually could optimize this a lot of ways, including just
+             * putting a memcpy here.  Modern compilers will inline fast
+             * versions of the memcpy here (__builtin_memcpy, etc).  Another
+             * option is a classic word-copy loop with a switch block at the end
+             * for a remainder.  Alternatively a Duff's device loop could work.
+             * Any replacement should be profile driven, and no matter what
+             * we're likely to pick something suboptimal for at least one
+             * compiler out there. [goodell@ 2012-02-10] */
 	    {
 		unsigned char const * restrict p = 
 		    (unsigned char *)eagershort_pkt->data;
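
The three alternatives named in the new comment can be sketched roughly as
follows. This is an illustrative sketch only, not code from ch3u_eager.c: the
function names (copy_memcpy, copy_words_switch, copy_duff) are hypothetical,
and the 4-byte word size is an assumption taken from the "4 ints" limit
mentioned in the older comment. As the commit comment says, which variant is
fastest would have to be settled by profiling on each compiler.

```c
#include <stddef.h>
#include <string.h>

/* Option 1: plain memcpy. Compilers often inline small copies like this,
 * but whether they do is compiler- and flag-dependent. */
static void copy_memcpy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Option 2: word-copy loop with a switch block for the remainder.
 * Each 4-byte word is moved with a fixed-size memcpy so the sketch stays
 * valid for unaligned buffers; the switch handles the trailing 0..3 bytes. */
static void copy_words_switch(unsigned char *dst, const unsigned char *src,
                              size_t n)
{
    size_t words = n / 4;

    while (words--) {
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
    switch (n & 0x3) {
    case 3: dst[2] = src[2]; /* fall through */
    case 2: dst[1] = src[1]; /* fall through */
    case 1: dst[0] = src[0]; /* fall through */
    default: break;
    }
}

/* Option 3: Duff's device, a byte copy unrolled four times. The switch
 * jumps into the middle of the loop body to handle n % 4 on the first pass. */
static void copy_duff(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t iters;

    if (n == 0)
        return;
    iters = (n + 3) / 4;
    switch (n % 4) {
    case 0: do { *dst++ = *src++; /* fall through */
    case 3:      *dst++ = *src++; /* fall through */
    case 2:      *dst++ = *src++; /* fall through */
    case 1:      *dst++ = *src++;
            } while (--iters > 0);
    }
}
```

All three functions copy exactly n bytes and leave the rest of the
destination untouched, so they are interchangeable from the caller's point of
view; they differ only in the code the compiler is likely to generate.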


