Nemesis will be the default in 1.1, which is coming soon. We should have made
it the default earlier, but it didn't support MPI-2 dynamic processes and it
wasn't passing the entire, extensive set of tests.

Rajeev

________________________________
From: owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov] On Behalf Of Robert Kubrick
Sent: Friday, July 04, 2008 12:27 PM
To: mpich-discuss@mcs.anl.gov
Subject: Re: [mpich-discuss] core 2 quad and other multiple core processors

I wonder why ch3:nemesis or ch3:ssm isn't the default in MPICH. Why
ch3:sock?

Robert

On Jul 2, 2008, at 10:59 PM, Rajeev Thakur wrote:

For best performance, configure with --with-device=ch3:nemesis. It will use
the Nemesis device within MPICH2, which communicates using shared memory
within a node and TCP across nodes.
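
(For a typical source build that would be roughly "./configure
--with-device=ch3:nemesis --prefix=<install dir>" followed by "make" and
"make install"; the prefix is just a placeholder, adjust it for your setup.)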

Rajeev

________________________________
From: owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov] On Behalf Of Ariovaldo de Souza Junior
Sent: Wednesday, July 02, 2008 3:15 PM
To: mpich-discuss@mcs.anl.gov
Subject: [mpich-discuss] core 2 quad and other multiple core processors

Hello everybody!

I'm really a newbie at clustering, so I have some, let's say, stupid
questions. When I start a job like "mpiexec -l -n 6 ./cpi" on my small
cluster of (so far) 6 core 2 quad machines, I'm sending 1 process to each
node, right? Assuming that's correct, will each process use only 1 core of
each node? How can I make 1 process use the whole processing capacity of the
processor, all 4 cores? Is there a way to do this, or will I always use just
one core per process? If I change the submission to "mpiexec -l -n 24 ./cpi",
will the same program then run as 24 processes, 4 per node (perhaps
simultaneously) and one process per core, right?

I'm asking all this because I find it a bit strange that the processing time
increases every time I add one more process, when in my mind it should be the
opposite. Here are some examples:

mpiexec -n 1 ./cpi      wall clock time = 0.000579
mpiexec -n 2 ./cpi      wall clock time = 0.002442
mpiexec -n 3 ./cpi      wall clock time = 0.004568
mpiexec -n 4 ./cpi      wall clock time = 0.005150
mpiexec -n 5 ./cpi      wall clock time = 0.008923
mpiexec -n 6 ./cpi      wall clock time = 0.009309
mpiexec -n 12 ./cpi     wall clock time = 0.019445
mpiexec -n 18 ./cpi     wall clock time = 0.032204
mpiexec -n 24 ./cpi     wall clock time = 0.045413
mpiexec -n 48 ./cpi     wall clock time = 0.089815
mpiexec -n 96 ./cpi     wall clock time = 0.218894
mpiexec -n 192 ./cpi    wall clock time = 0.492870

So, as you can all see, the more processes I add, the more time it takes,
which makes me think that MPI is running the same test 192 times in the last
case, and that this is why the time increases. Is it correct that MPI ran the
same test 192 times? Or did it divide the work into 192 pieces, compute the
pieces, and then gather the results and assemble the output again? I would
really like to understand this relationship between processor count, process
count, and run time.
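
In case it helps, this is roughly what I understand cpi to be doing (a
simplified sketch based on the cpi.c example that ships with MPICH2, so the
details below may not match it exactly):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int n = 10000;              /* number of rectangles, fixed */
        int myid, numprocs, i;
        double mypi, pi, h, sum, x, t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        t0 = MPI_Wtime();
        /* rank 0 decides n; every process gets the same value */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* each rank integrates only every numprocs-th rectangle */
        h = 1.0 / (double)n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* partial sums are combined on rank 0 */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        t1 = MPI_Wtime();

        if (myid == 0)
            printf("pi is approximately %.16f, wall clock time = %f\n",
                   pi, t1 - t0);

        MPI_Finalize();
        return 0;
    }

If that reading is right, then each process computes only a 1/numprocs share
of a fixed amount of work, so the answer and the total work stay the same no
matter how many processes I start.
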
I have the feeling that my questions are a bit "poor" and really those of a
newbie, but the answers will help me use other programs that need MPI to run.

Thanks to all!

Ari - UFAM - Brazil