[MPICH] FreeBSD and the ch3:smm channel?

Steve Kargl sgk at troutmask.apl.washington.edu
Wed Jan 31 17:24:43 CST 2007


On Wed, Jan 31, 2007 at 04:49:59PM -0600, Darius Buntinas wrote:
> 
> First, what's the execution time of some of the codes you generally run on 
> those machines for the different configurations?

My single-processor rip code takes roughly 15 minutes of CPU time
to perform a computation.  The ripmp code performs the same computation
under mpich2; with ch3:sock it reports about 39 seconds when spread
across the 24 effective cpus.  Wall clock is roughly 41-42 seconds.
There is very little communication (i.e., the master sends initial data
to the slaves and then sits and waits for a single send from each slave).
It may be just some start-up latency that I now see with nemesis.
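
For what it's worth, 15 minutes of serial CPU time (~900 s) against 41-42
seconds of wall clock works out to roughly a 22x speedup on 24 effective
cpus, so the job is essentially compute-bound and any difference between
the channels should show up mainly as start-up cost.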

> Just to check message latency and bandwidth you can run netpipe that comes 
> with the MPICH2 distribution:
> 
> #substitute ${MPICH2_SRCDIR} with the location of the source directory
> cp ${MPICH2_SRCDIR}/test/mpi/basic/GetOpt.* .
> cp ${MPICH2_SRCDIR}/test/mpi/basic/netmpi.c .
> 
> #assuming MPICH2 bin is in your path
> mpicc GetOpt.c netmpi.c -o netmpi
> 
> #just to see whether it's running on the same node or not
> mpiexec -n 2 hostname

node10:kargl[246] mpiexec -n 2 hostname
node10.cimu.org
node10.cimu.org

> mpiexec -n 2 ./netmpi

Some selected output:

0: node10.cimu.org
1: node10.cimu.org
Latency: 0.000000749
Sync Time: 0.000002478
Now starting main loop
  0:         0 bytes 1048576 times -->    0.00 Mbps in 0.000000754 sec
  1:         1 bytes 1048576 times -->    8.44 Mbps in 0.000000904 sec
 10:        20 bytes 163950 times -->  181.48 Mbps in 0.000000841 sec
 20:        63 bytes 157433 times -->  534.17 Mbps in 0.000000900 sec
 30:       194 bytes 175328 times -->  1430.46 Mbps in 0.000001035 sec
 40:       764 bytes 93751 times -->  3780.93 Mbps in 0.000001542 sec
 50:      2047 bytes 43727 times -->  5420.73 Mbps in 0.000002881 sec
 60:      6146 bytes 26032 times -->  7221.15 Mbps in 0.000006493 sec
 70:     24572 bytes 8300 times -->  7731.53 Mbps in 0.000024247 sec
 80:     65535 bytes 1847 times -->  7298.06 Mbps in 0.000068510 sec
 90:    196610 bytes  914 times -->  8200.11 Mbps in 0.000182926 sec
100:    786428 bytes  256 times -->  6540.57 Mbps in 0.000917347 sec
110:   2097151 bytes   51 times -->  6542.41 Mbps in 0.002445581 sec
120:   6291458 bytes   22 times -->  6638.42 Mbps in 0.007230639 sec
130:  25165820 bytes    6 times -->  6914.70 Mbps in 0.027766923 sec
140:  67108863 bytes    3 times -->  6939.06 Mbps in 0.073785186 sec
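
A quick sanity check on the units: at the 196610-byte row, 196610 bytes x 8
bits / 0.000182926 s is about 8.60e9 bits/s, which divided by 2^20 matches
the reported 8200 "Mbps", so the Mbps column appears to use 2^20-based
megabits.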


> Try this for the different configurations, and with both processes on the 
> same node and on different nodes.

netmpi may be the app I need for some network testing.
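
If I read the mpd documentation correctly, a machinefile should let me pin
both processes to one node or split them across two.  Something like the
following (node11.cimu.org is just a stand-in for a second node, and the
hostname:ncpus machinefile format is from my reading of the mpich2 user's
guide, so double-check it):

  # both processes on node10 (exercises the shared-memory path)
  echo "node10.cimu.org:2" > mf.same
  mpiexec -machinefile mf.same -n 2 ./netmpi

  # one process per node (exercises the network path)
  printf "node10.cimu.org:1\nnode11.cimu.org:1\n" > mf.split
  mpiexec -machinefile mf.split -n 2 ./netmpi

Running both layouts against the ch3:sock and ch3:nemesis builds should
cover the configurations you mention.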

> It's possible that your codes aren't that sensitive to latency or 
> bandwidth, so you won't see much of a difference.  Also, if certain 
> processes communicate more than others, it would be beneficial, especially 
> with nemesis, to put them on the same node.  By default, mpd will 
> distribute them in round-robin, so process 0 and 1 will be on separate 
> nodes.
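
If the hostname:ncpus entries work the way I think they do (consecutive
ranks fill each host's count before the next entry is used), that
round-robin default can be overridden so that heavily communicating ranks
share a node and nemesis keeps their traffic in shared memory.  A sketch,
with placeholder host names and counts:

  # ranks 0-1 on node10, ranks 2-3 on node11, and so on
  printf "node10.cimu.org:2\nnode11.cimu.org:2\n" > mf.grouped
  mpiexec -machinefile mf.grouped -n 4 ./ripmp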

My colleague's code is the heavier communicator.  After my last email,
I received the following from him concerning his code and nemesis.

--------------------------------------------------------------------------
Steve,

Recompiling seems to have worked well!  Now it runs much faster on multiple
cores on the same node.  It seems to be working.  Thanks a lot.

F 

|
|Franco,
|
|Can you recompile all of your codes to test the shared memory 
|method?  I think you may need to link against the new 
|libraries to get this to work.
|
|--
|Steve
--------------------------------------------------------------------------

If he's happy, I'm happy.
Thanks for your help.  If you would like some future testing
of nemesis in a development version of mpich2, feel free to
drop me a note.  

-- 
Steve