Yes, with MPI_Barrier it runs perfectly. I have another question: is it normal, for the same task, to get 100 seconds of parallel processing over a 10/100 network, 20-30 seconds on a multiprocessor machine over the sock channel, and 2-3 seconds over shared memory?

<br><br>Is such a big difference normal?<br><br>Thanks.<br><br><div><span class="gmail_quote">On 3/14/07, <b class="gmail_sendername">Rajeev Thakur</b> &lt;<a href="mailto:thakur@mcs.anl.gov">thakur@mcs.anl.gov</a>

&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div>

<div dir="ltr" align="left"><span><font color="#0000ff" face="Arial" size="2">With 200,000 iterations, the sleep(1) probably causes some 
skew that makes the bcast and gather go out of sync. Another experiment to 
try is to add an MPI_Barrier either at the beginning or at the end of the loop (for 
each iteration), with the sleep(1) still there.</font></span></div>

<div dir="ltr" align="left"><span><font color="#0000ff" face="Arial" size="2"></font></span>&nbsp;</div>

<div dir="ltr" align="left"><span><font color="#0000ff" face="Arial" size="2">Rajeev</font></span></div>

<div dir="ltr" align="left"><span>&nbsp;</span></div><br>

<blockquote style="border-left: 2px solid rgb(0, 0, 255); padding-left: 5px; margin-left: 5px; margin-right: 0px;">

  <div dir="ltr" align="left" lang="en-us">

  <hr>

  <font face="Tahoma" size="2"><span class="q"><b>From:</b> Bruno Simioni 

  [mailto:<a href="mailto:brunosimioni@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">brunosimioni@gmail.com</a>] <br></span><b>Sent:</b> Wednesday, March 14, 2007 

  12:54 PM<br><b>To:</b> Rajeev Thakur<br><b>Subject:</b> Re: [MPICH2 Req #3260] 

  Re: [MPICH] About ch3:nemesis.<br></font><br></div><div><span class="e" id="q_1115198ee37d987b_3">

<div></div>Hey Rajeev,<br><br>About your questions:<br><br>Yes, about 200000 
  iterations.<br><br>With a small number of iterations I can&#39;t see the problem; 
  it only becomes noticeable because a large number of iterations accumulates 
  it. <br><br>Dummy computation: I&#39;ll put in a for() loop later; right now the lab 
  is busy. Do you think the sleeping thread is what causes the 
  delay? Because the same program running on only one machine is much faster 
  than over the network. <br><br>Bruno.<br><br>

  <div><span class="gmail_quote">On 3/14/07, <b class="gmail_sendername">Rajeev 

  Thakur</b> &lt;<a href="mailto:thakur@mcs.anl.gov" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)"> thakur@mcs.anl.gov</a>&gt; 

  wrote:</span> 

  <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

    <div>

    <div dir="ltr" align="left"><span><font color="#0000ff" face="Arial" size="2">Are you 

    running for a large number of iterations of the for() loop? What happens if 

    you run just 1 iteration or a small number of iterations (say 5)? Also, what 

    happens if you replace the sleep(1) with some dummy computation that takes 1 

    sec?</font></span></div>
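A minimal sketch of such a dummy computation in portable C, assuming CPU-bound arithmetic is an acceptable stand-in for the real work; busy_compute is a hypothetical name. Unlike a sleep, it keeps the process running, which is closer to what the finished algorithm will do. Note that the Win32 Sleep() used in Bruno's code below takes its argument in milliseconds, so Sleep(1) requests about 1 ms, not 1 s. clock() reports processor time on POSIX systems (the Microsoft C runtime reports elapsed wall-clock time instead); for a busy loop either is close to the elapsed time.

```c
#include <time.h>

/* Keeps the CPU busy for roughly `seconds` (as measured by clock()),
 * standing in for real computation. Returns the accumulated value so the
 * compiler cannot discard the work. */
double busy_compute(double seconds)
{
    const clock_t start = clock();
    const clock_t budget = (clock_t)(seconds * CLOCKS_PER_SEC);
    double acc = 0.0;

    while (clock() - start < budget) {
        /* A small batch of dummy arithmetic between clock checks. */
        for (int k = 1; k <= 1000; k++)
            acc += 1.0 / k;
    }
    return acc;
}
```

For example, busy_compute(1.0) can take the place of sleep(1) in the worker loop.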

    <div dir="ltr" align="left"><span><font color="#0000ff" face="Arial" size="2"></font></span>&nbsp;</div>

    <div dir="ltr" align="left"><span><font color="#0000ff" face="Arial" size="2">Rajeev</font></span></div>

    <div dir="ltr" align="left"><span></span>&nbsp;</div><br>

    <blockquote style="border-left: 2px solid rgb(0, 0, 255); padding-left: 5px; margin-left: 5px; margin-right: 0px;">

      <div dir="ltr" align="left" lang="en-us">

      <hr>

      <font face="Tahoma" size="2"><b>From:</b> Bruno Simioni [mailto:<a href="mailto:brunosimioni@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">brunosimioni@gmail.com</a>] <br><b>Sent:

</b> Tuesday, March 

      13, 2007 9:37 PM<br><b>To:</b> Darius Buntinas<br><b>Cc:</b> <a href="mailto:mpich2-maint@mcs.anl.gov" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">mpich2-maint@mcs.anl.gov</a><br><b>Subject:

</b> [MPICH2 Req 

      #3260] Re: [MPICH] About ch3:nemesis.<br><b>Importance:</b> 

      High<br></font><br></div>

      <div><span>

<div></div>Hi!<br><br>Yes, you&#39;re correct. My problem is the 
      second situation.<br><br>3 nodes, with one processor per node and one 
      process per processor.<br><br>The program does not use MPI_Recv or MPI_Send, 
      but MPI_Gather and MPI_Bcast:<br><br>if (myid == 0)<br>{<br>
&nbsp;&nbsp;&nbsp;&nbsp;/* stuff... */<br>
&nbsp;&nbsp;&nbsp;&nbsp;for (...)<br>
&nbsp;&nbsp;&nbsp;&nbsp;{<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* Receive rx from every process in the communicator (r holds one double per process). */<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;MPI_Gather(&amp;rx, 1, MPI_DOUBLE, r, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate fr, using the values gathered into r. */<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* Send fr (nn+1 doubles) to everybody. */<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;MPI_Bcast(fr, nn+1, MPI_DOUBLE, 0, MPI_COMM_WORLD);<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate something and write file. */<br>
&nbsp;&nbsp;&nbsp;&nbsp;}<br>
&nbsp;&nbsp;&nbsp;&nbsp;MPI_Finalize();<br>
}<br>
else<br>
{<br>
&nbsp;&nbsp;&nbsp;&nbsp;/* stuff... */<br>
&nbsp;&nbsp;&nbsp;&nbsp;for (...)<br>
&nbsp;&nbsp;&nbsp;&nbsp;{<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* Send rx to root. */<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;MPI_Gather(&amp;rx, 1, MPI_DOUBLE, r, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate something. */<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sleep(1); /* Placeholder for work that is not written yet; later it will be replaced by a for() loop. See results below. */<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* Receive fr from root. */<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;MPI_Bcast(fr, nn+1, MPI_DOUBLE, 0, MPI_COMM_WORLD);<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate something and write file. */<br>
&nbsp;&nbsp;&nbsp;&nbsp;}<br>
&nbsp;&nbsp;&nbsp;&nbsp;MPI_Finalize();<br>
}<br>}<br><br><br>Basically, that is the code.<br><br>And these are the

results:<br><br>I timed the main for() loop to estimate and 
      compare the parallel performance.<br><br>3 processes on 3 
      machines: 77 s<br>3 processes on the same 
      machine, sock channel: 109 s<br><br>3 processes on 3 
      machines, with the Sleep(1) line: 1565 s<br>3 processes 
      on the same machine, sock channel, with the Sleep(1) line: 
      295 s<br><br>How can these results be explained?<br><br>In case the 
      Sleep(1) line is unclear: the algorithm is not finished yet and many 
      operations are still missing, so I put the Sleep(1) in their place 
      for testing.<br><br>It appears that if a node spends a long time 
      without communicating, it takes a long time to get communication going again, right? 

      <br><br>Bruno.<br><br><br>
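Taking the figure of about 200,000 iterations that Bruno confirms in his later reply, the extra time the Sleep(1) line adds per iteration can be estimated from the timings above:

```latex
\[
\Delta_{\text{3 machines}} = \frac{1565\ \text{s} - 77\ \text{s}}{200\,000} \approx 7.4\ \text{ms},
\qquad
\Delta_{\text{1 machine, sock}} = \frac{295\ \text{s} - 109\ \text{s}}{200\,000} \approx 0.93\ \text{ms}
\]
```

On the single machine the added cost is close to 1 ms per iteration, roughly what one Sleep(1) call requests (the Win32 Sleep() argument is in milliseconds), whereas across the network each iteration costs about 7 ms more.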

      <div><span class="gmail_quote">On 3/13/07, <b class="gmail_sendername">Darius 

      Buntinas</b> &lt;<a href="mailto:buntinas@mcs.anl.gov" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">buntinas@mcs.anl.gov</a>&gt; wrote:</span> 

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>Just 
        so I understand, the master is doing something like 
        this:<br><br>MPI_Send(small msg to slave)<br>MPI_Recv(small answer from 
        slave)<br><br>If the slave does something like 
        this:<br><br>MPI_Recv(small msg from master) <br>/* no processing 
        */<br>MPI_Send(small answer to master)<br><br>the time for the master to 
        complete the send and receive is relatively<br>short.&nbsp;&nbsp;But if 
        the slave does something like this:<br><br>MPI_Recv(small msg from 
        master) <br>sleep(120)<br>MPI_Send(small answer to master)<br><br>then 
        the time for the master to complete the send and receive is 
        much<br>longer than the first case plus 120 seconds.<br><br>Is this 
        right?<br><br>How many nodes are you using? <br>How many processors does 
        each node have?<br>How many processes are running on each 
        node?<br><br>If you can send us the simplest program that demonstrates 
        this behavior, we<br>can take a look at it.<br><br>Darius<br><br>On Tue,

13 Mar 2007, Bruno Simioni wrote:<br><br>&gt; Hey 
        Darius,<br>&gt;<br>&gt; Thank you for your help. That really cleared up the 
        concept for me.<br>&gt;<br>&gt; There is another thing.<br>&gt;<br>&gt; 
        It is related to speed and performance. <br>&gt;<br>&gt; In my 
        programs, I noticed that if I send TCP packets across the network<br>&gt; 
        several times, one after another, without any delay, the communication 
        runs<br>&gt; perfectly, but if some node does a complex computation that 
        takes a while, <br>&gt; the communication has a long 
        delay.<br>&gt;<br>&gt; For example:<br>&gt;<br>&gt; The master sends 
        packets to a node several times. The node processes some small<br>&gt; 
        thing and answers the master by sending a packet. (OK, the communication is 
        <br>&gt; perfect.)<br>&gt;<br>&gt; The problem situation:<br>&gt;<br>&gt; 
        The master sends a packet to a node. The node processes for a long time and 
        answers.<br>&gt; The answer takes the processing time plus some extra 
        time, a kind of <br>&gt; overhead. It sounds like something &quot;halted&quot; 
        the network and, when requested,<br>&gt; &quot;turns it back on&quot; 
        again.<br>&gt;<br>&gt; Am I correct?<br>&gt;<br>&gt; I&#39;m using Windows 
        XP and the latest version of MPICH2. <br>&gt;<br>&gt; 
        Thanks.<br>&gt;<br>&gt; Bruno, from Brazil.<br>&gt;<br>&gt; On 3/13/07,

        Darius Buntinas &lt;<a href="mailto:buntinas@mcs.anl.gov" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">buntinas@mcs.anl.gov</a>&gt; 

        wrote:<br>&gt;&gt;<br>&gt;&gt;<br>&gt;&gt;&nbsp;&nbsp;A channel is a 

        communication method. For example, the default channel, 

        <br>&gt;&gt;&nbsp;&nbsp;sock, uses tcp sockets for communication, while 

        the shm channel<br>&gt;&gt;&nbsp;&nbsp;communicates using 

        shared-memory.<br>&gt;&gt;<br>&gt;&gt;&nbsp;&nbsp;Nemesis is a channel 

        that uses shared-memory to communicate within a node, 

        <br>&gt;&gt;&nbsp;&nbsp;and a network to communicate between 

        nodes.&nbsp;&nbsp;Currently Nemesis supports<br>&gt;&gt;&nbsp;&nbsp;tcp, 

        gm, mx, and elan networks.&nbsp;&nbsp;(Eventually these will be 

        selectable at<br>&gt;&gt;&nbsp;&nbsp;runtime, but for now the network 

        has to be selected when MPICH2 is 

        <br>&gt;&gt;&nbsp;&nbsp;compiled.)<br>&gt;&gt;<br>&gt;&gt;&nbsp;&nbsp;Does 

        that 

        help?<br>&gt;&gt;<br>&gt;&gt;&nbsp;&nbsp;-d<br>&gt;&gt;<br>&gt;&gt;&nbsp;&nbsp;On 

        Tue, 13 Mar 2007, Bruno Simioni wrote:<br>&gt;&gt;<br>&gt;&gt; 

&gt;&nbsp;&nbsp;Hi!<br>&gt;&gt; &gt;<br>&gt;&gt; &gt;&nbsp;&nbsp;Can 
        anyone explain to me what a channel is in MPICH2, and what it is 
        used for?<br>&gt;&gt; &gt;<br>&gt;&gt; &gt;&nbsp;&nbsp;The 
        next question: what is the nemesis channel?<br>&gt;&gt; &gt;<br>&gt;&gt; 
        &gt;&nbsp;&nbsp;Thanks.<br>&gt;&gt; &gt;<br>&gt;&gt; 

        &gt;<br>&gt;&gt;<br>&gt;<br>&gt;<br>&gt;<br>&gt;<br></blockquote></div><br><br clear="all"><br>-- <br>Bruno. 

  </span></div></blockquote></div></blockquote></div><br><br clear="all"><br>-- 

  <br>Bruno. </span></div></blockquote></div>

</blockquote></div> <br clear="all"> --  Bruno.