[MPICH] FreeBSD and the ch3:smm channel?
Steve Kargl
sgk at troutmask.apl.washington.edu
Wed Jan 31 17:24:43 CST 2007
On Wed, Jan 31, 2007 at 04:49:59PM -0600, Darius Buntinas wrote:
>
> First, what's the execution time of some of the codes you generally run on
> those machines for the different configurations?
My single processor rip code takes roughly 15 minutes of CPU time
to perform a computation.  The ripmp code does the same computation
with mpich2, and with ch3:sock it reports about 39 seconds when
spread across the 24 effective CPUs; wall clock is roughly 41-42
seconds.  There is very little communication (i.e., the master sends
initial data to the slaves and then sits and waits for a single send
from each slave).  It may just be some start-up latency that I now
see with nemesis.
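For reference, the communication pattern is roughly the sketch below
(the names and the "work" loop are made up; this is not the actual
rip/ripmp source): the master sends one block of data to each slave,
then blocks until it has received one result from every slave.

/* master_slave.c -- minimal sketch of the pattern described above */
#include <mpi.h>
#include <stdio.h>

#define NDATA 1024

int main(int argc, char **argv)
{
    int i, rank, nprocs, peer;
    double data[NDATA], result;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        /* Master: one send per slave, then one receive per slave. */
        for (i = 0; i < NDATA; i++)
            data[i] = (double) i;
        for (peer = 1; peer < nprocs; peer++)
            MPI_Send(data, NDATA, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        for (peer = 1; peer < nprocs; peer++) {
            MPI_Recv(&result, 1, MPI_DOUBLE, peer, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("result from rank %d: %g\n", peer, result);
        }
    } else {
        /* Slave: receive the data, do the work, send back one result. */
        MPI_Recv(data, NDATA, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        result = 0.0;
        for (i = 0; i < NDATA; i++)
            result += data[i];          /* stand-in for the real work */
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}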
> Just to check message latency and bandwidth, you can run netpipe, which comes
> with the MPICH2 distribution:
>
> #substitute ${MPICH2_SRCDIR} with the location of the source directory
> cp ${MPICH2_SRCDIR}/test/mpi/basic/GetOpt.* .
> cp ${MPICH2_SRCDIR}/test/mpi/basic/netmpi.c .
>
> #assuming MPICH2 bin is in your path
> mpicc GetOpt.c netmpi.c -o netmpi
>
> #just to see whether it's running on the same node or not
> mpiexec -n 2 hostname
node10:kargl[246] mpiexec -n 2 hostname
node10.cimu.org
node10.cimu.org
> mpiexec -n 2 ./netmpi
Some selected output:
0: node10.cimu.org
1: node10.cimu.org
Latency: 0.000000749
Sync Time: 0.000002478
Now starting main loop
0: 0 bytes 1048576 times --> 0.00 Mbps in 0.000000754 sec
1: 1 bytes 1048576 times --> 8.44 Mbps in 0.000000904 sec
10: 20 bytes 163950 times --> 181.48 Mbps in 0.000000841 sec
20: 63 bytes 157433 times --> 534.17 Mbps in 0.000000900 sec
30: 194 bytes 175328 times --> 1430.46 Mbps in 0.000001035 sec
40: 764 bytes 93751 times --> 3780.93 Mbps in 0.000001542 sec
50: 2047 bytes 43727 times --> 5420.73 Mbps in 0.000002881 sec
60: 6146 bytes 26032 times --> 7221.15 Mbps in 0.000006493 sec
70: 24572 bytes 8300 times --> 7731.53 Mbps in 0.000024247 sec
80: 65535 bytes 1847 times --> 7298.06 Mbps in 0.000068510 sec
90: 196610 bytes 914 times --> 8200.11 Mbps in 0.000182926 sec
100: 786428 bytes 256 times --> 6540.57 Mbps in 0.000917347 sec
110: 2097151 bytes 51 times --> 6542.41 Mbps in 0.002445581 sec
120: 6291458 bytes 22 times --> 6638.42 Mbps in 0.007230639 sec
130: 25165820 bytes 6 times --> 6914.70 Mbps in 0.027766923 sec
140: 67108863 bytes 3 times --> 6939.06 Mbps in 0.073785186 sec
> Try this for the different configurations, and with both processes on the
> same node and on different nodes.
netmpi may be the app I need for some network testing.
> It's possible that your codes aren't that sensitive to latency or
> bandwidth, so you won't see much of a difference. Also, if certain
> processes communicate more than others, it would be beneficial, especially
> with nemesis, to put them on the same node. By default, mpd will
> distribute them round-robin, so processes 0 and 1 will be on separate
> nodes.
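A quick way to see where mpd actually put each rank, from inside an
MPI program rather than via "mpiexec -n 2 hostname", is something
like the sketch below (whereami.c is my name for it; it's not part
of the distribution):

/* whereami.c -- print the node each rank landed on */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);
    printf("rank %d is on %s\n", rank, name);
    MPI_Finalize();
    return 0;
}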
My colleague's code is the heavier communicator. After my last email,
I received the following from him concerning his code and nemesis.
--------------------------------------------------------------------------
Steve,
recompiling seems to have worked well!  Now it runs much faster on multiple
cores on the same node!  It seems to be working.  Thanks a lot.
F
|
|Franco,
|
|Can you recompile all of your codes to test the shared memory
|method? I think you may need to link against the new
|libraries to get this to work.
|
|--
|Steve
--------------------------------------------------------------------------
If he's happy, I'm happy.
Thanks for your help. If you would like some future testing
of nemesis in a development version of mpich2, feel free to
drop me a note.
--
Steve