[MPICH] FreeBSD and the ch3:smm channel?
Steve Kargl
sgk at troutmask.apl.washington.edu
Wed Jan 31 16:20:46 CST 2007
On Wed, Jan 31, 2007 at 03:40:25PM -0600, Darius Buntinas wrote:
>
> On Wed, 31 Jan 2007, Steve Kargl wrote:
>
> >On Wed, Jan 31, 2007 at 01:34:24PM -0600, Darius Buntinas wrote:
> >>
> >>We're working on fixing this the right way, but in the meantime, as
> >>long as you don't need to use --enable-fast, edit the file
> >>src/mpid/ch3/channels/nemesis/setup_channel.args and remove everything
> >>after, and including, the line that starts with "eval" (i.e., the eval
> >>line and the whole for loop). Then give configure a try again.
> >>
> >>Let me know if this helps.
> >>
> >
> >That appears to work. I can build and install mpich2 with
> >the nemesis device. Unfortunately, it doesn't appear to
> >help performance on the SMP systems.
> >
>
> What kind of latency/bandwidth are you getting, and what kind of machine
> are you running on?
>
The cluster topology can be seen at
http://troutmask.apl.washington.edu/~kargl/hpc.html
The nodes are connected with gigabit Ethernet using standard TCP/IP
packets. I tried jumbo frames, but that also seemed to reduce
performance. Each node has 2 dual-core Opteron processors (i.e.,
4 effective CPUs per node). I was hoping the ch3:smm (or
ch3:nemesis) channel would improve communication between processes
on the same node.
I'm still trying to determine the best way to measure the latency
for our codes. All I have at the moment are anecdotal measurements
(i.e., wall-clock time and Fortran's cpu_time). My code is a standard
master-slave algorithm, and ch3:sock works well for it. My colleague
uses a scatter-gather algorithm, and communication appears to be
killing his performance. If I switch us to ch3:nemesis, performance
appears to go down for both codes.
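For the scatter-gather code, I have something like the sketch below
in mind for timing just the communication with MPI_Wtime (a rough
sketch only, not my colleague's actual code; the program name, buffer
size, and root rank are all made up):

program timed_gather
  implicit none
  include 'mpif.h'
  integer, parameter :: n = 100000        ! elements per process (made up)
  integer :: ierr, rank, nprocs
  double precision :: t0, t1
  double precision :: chunk(n)
  double precision, allocatable :: all(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  chunk = dble(rank)
  allocate(all(n*nprocs))

  ! Time only the communication so the two channels can be compared
  ! directly; the barrier keeps the start synchronized across ranks.
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  t0 = MPI_Wtime()
  call MPI_Gather(chunk, n, MPI_DOUBLE_PRECISION, &
                  all,   n, MPI_DOUBLE_PRECISION, &
                  0, MPI_COMM_WORLD, ierr)
  t1 = MPI_Wtime()
  if (rank == 0) print '(a,f10.6)', 'gather time (s): ', t1 - t0

  deallocate(all)
  call MPI_Finalize(ierr)
end program timed_gather

Running the same binary under ch3:sock and ch3:nemesis should give a
direct comparison of the time spent in the collective.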
With my code built against ch3:nemesis, on an otherwise idle cluster, I do
$ mpiexec -n 24 ./ripmp
and top(1) immediately shows:
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
4079 kargl 1 96 0 33880K 10276K select 0 0:01 0.00% python2.4
54225 kargl 1 96 0 32960K 9524K select 0 0:00 0.00% python2.4
54228 kargl 1 96 0 34148K 10548K select 0 0:00 0.00% python2.4
54227 kargl 1 96 0 34148K 10548K select 0 0:00 0.00% python2.4
54226 kargl 1 96 0 34148K 10548K select 3 0:00 0.00% python2.4
54229 kargl 1 96 0 34148K 10548K select 0 0:00 0.00% python2.4
54231 kargl 1 4 0 7156K 2304K sbwait 0 0:00 0.00% ripmp
54232 kargl 1 4 0 7156K 2304K sbwait 2 0:00 0.00% ripmp
54230 kargl 1 4 0 7156K 2304K sbwait 0 0:00 0.00% ripmp
54233 kargl 1 4 0 7156K 2304K sbwait 3 0:00 0.00% ripmp
ripmp is my code, and the python2.4 jobs are from mpiexec. The ripmp
jobs remain in the sbwait state for at least 45 seconds, then the state
changes to accept and back to sbwait. After 60+ seconds the ripmp jobs
suddenly start to run:
54233 kargl 1 112 0 40860K 3176K RUN 0 0:43 84.52% ripmp
54230 kargl 1 112 0 40836K 3192K CPU3 1 0:41 84.27% ripmp
54231 kargl 1 112 0 40836K 3200K CPU2 3 0:40 83.98% ripmp
54232 kargl 1 112 0 40836K 3196K CPU1 2 0:38 83.78% ripmp
With ch3:sock, the ripmp jobs start running almost immediately and
actually reach 98% CPU utilization.
If you have a suggestion on how to measure the difference in
latency between ch3:sock and ch3:nemesis, I'll try to gather
some numbers.
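In the meantime, I could run a simple two-rank ping-pong under each
channel, along the lines of the sketch below (not code from either of
our applications; the repetition count is arbitrary):

program pingpong
  implicit none
  include 'mpif.h'
  integer, parameter :: nreps = 10000     ! arbitrary repetition count
  integer :: ierr, rank, i, status(MPI_STATUS_SIZE)
  integer :: buf
  double precision :: t0, t1

  ! Intended to be run with exactly two ranks.
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  buf = 0
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  t0 = MPI_Wtime()
  do i = 1, nreps
     if (rank == 0) then
        call MPI_Send(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
        call MPI_Recv(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, status, ierr)
     else if (rank == 1) then
        call MPI_Recv(buf, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
        call MPI_Send(buf, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr)
     end if
  end do
  t1 = MPI_Wtime()

  ! Half the round-trip time is the one-way latency.
  if (rank == 0) print '(a,f8.2,a)', 'one-way latency: ', &
       (t1 - t0) / (2.0d0*nreps) * 1.0d6, ' microseconds'

  call MPI_Finalize(ierr)
end program pingpong

I'd compile it with mpif90 (assuming the Fortran bindings were built)
and run "mpiexec -n 2 ./pingpong", first with both ranks on one node
and then with the ranks on different nodes.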
--
Steve