https://www.mcs.anl.gov/petsc/documentation/faq.html#computers

In particular, looking at the results of the parallel run I see

Average time to get PetscTime(): 3.933e-07
Average time for MPI_Barrier(): 0.00498015
Average time for zero size MPI_Send(): 0.000194207

So the communication times are huge: 4.9 milliseconds for a synchronization across the twenty nodes. A millisecond is an eternity in parallel computing. It is not clear to me that this system is appropriate for tightly coupled parallel simulations.

  Barry
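[For reference, a minimal standalone sketch of how averages like the MPI_Barrier() and zero-size MPI_Send() numbers above can be measured with plain MPI, independent of PETSc and libmesh. This is not PETSc's own benchmark; the file name, repetition count, and choice of rank pairing are arbitrary illustrations.]

/* latency.c - rough MPI latency check; build with mpicc and run with,
 * e.g., mpiexec -n 640 ./latency (the name and -n value are illustrative). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int           rank, size, i;
  const int     reps = 100;
  double        t0, barrier = 0.0, send = 0.0;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* average time per MPI_Barrier() call */
  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (i = 0; i < reps; i++) MPI_Barrier(MPI_COMM_WORLD);
  barrier = (MPI_Wtime() - t0) / reps;

  /* zero-size ping-pong between the first and last rank; half the
     round-trip time approximates the one-way send latency */
  if (size > 1 && rank == 0) {
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
      MPI_Send(NULL, 0, MPI_BYTE, size - 1, 0, MPI_COMM_WORLD);
      MPI_Recv(NULL, 0, MPI_BYTE, size - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    send = (MPI_Wtime() - t0) / (2.0 * reps);
  } else if (size > 1 && rank == size - 1) {
    for (i = 0; i < reps; i++) {
      MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }
  }

  if (rank == 0) printf("avg MPI_Barrier(): %g s, zero-size send latency: %g s\n", barrier, send);
  MPI_Finalize();
  return 0;
}

If a bare test like this also shows milliseconds per barrier, the slowdown is likely in the network/MPI configuration (for example, the MPI inside the container not actually using InfiniBand) rather than in PETSc or libmesh.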
On Feb 3, 2021, at 2:40 PM, Luciano Siqueira <luciano.siqueira@usp.br> wrote:
<div class=""><p class="">Here are the (attached) output of -log_view for both cases. The
beginning of the files has some info from the libmesh app.<br class="">
</p><p class="">Running in 1 node, 32 cores: 01_node_log_view.txt</p><p class="">Running in 20 nodes, 32 cores each (640 cores in total):
01_node_log_view.txt</p><p class="">Thanks!</p><p class="">Luciano.<br class="">
</p>
<div class="moz-cite-prefix">Em 03/02/2021 16:43, Matthew Knepley
escreveu:<br class="">
</div>
<blockquote type="cite" cite="mid:CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com" class="">
<meta http-equiv="content-type" content="text/html; charset=UTF-8" class="">
<div dir="ltr" class="">
<div dir="ltr" class="">On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira
<<a href="mailto:luciano.siqueira@usp.br" moz-do-not-send="true" class="">luciano.siqueira@usp.br</a>>
wrote:<br class="">
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hello,<br class="">
<br class="">
I'm evaluating the performance of an application in a
distributed <br class="">
environment and I notice that it's much slower when running
in many <br class="">
nodes/cores when compared to a single node with a fewer
cores.<br class="">
<br class="">
When running the application in 20 nodes, the Main Stage
time reported <br class="">
in PETSc's log is up to 10 times slower than it is when
running the same <br class="">
application in only 1 node, even with fewer cores per node.<br class="">
<br class="">
The application I'm running is an example code provided by
libmesh:<br class="">
<br class="">
<a href="http://libmesh.github.io/examples/introduction_ex4.html" rel="noreferrer" target="_blank" moz-do-not-send="true" class="">http://libmesh.github.io/examples/introduction_ex4.html</a><br class="">
<br class="">
The application runs inside a Singularity container, with
openmpi-4.0.3 <br class="">
and PETSc 3.14.3. The distributed processes are managed by
slurm <br class="">
17.02.11 and each node is equipped with two Intel CPU Xeon
E5-2695v2 Ivy <br class="">
Bridge (12c @2,4GHz) and 128Gb of RAM, all communications
going through <br class="">
infiniband.<br class="">
<br class="">
My questions are: Is the slowdown expected? Should the
application be <br class="">
specially tailored to work well in distributed environments?<br class="">
<br class="">
Also, where (maybe in PETSc documentation/source-code) can I
find <br class="">
information on how PETSc handles MPI communications? Do the
KSP solvers <br class="">
favor one-to-one process communication over broadcast
messages or <br class="">
vice-versa? I suspect inter-process communication must be
the cause of <br class="">
the poor performance when using many nodes, but not as much
as I'm seeing.<br class="">
<br class="">
Thank you in advance!<br class="">
</blockquote>
<div class=""><br class="">
</div>
<div class="">We can't say anything about the performance without some
data. Please send us the output</div>
<div class="">of -log_view for both cases.</div>
<div class=""><br class="">
</div>
<div class=""> Thanks,</div>
<div class=""><br class="">
</div>
<div class=""> Matt</div>
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Luciano.<br class="">
<br class="">
</blockquote>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
<01_node_log_view.txt>
<20_node_log_view.txt>