<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">


    <br>

    <div class="moz-cite-prefix">On 10/10/2017 12:47 PM, Mark Adams

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CADOhEh72OR-T_8rqMzK8HSMO7rTWQFph=roY8dC10cRkRTVF9w@mail.gmail.com">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote"><span class=""></span><br>

            <span class=""></span>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span

                class="">

              </span>   What are you comparing? Are you using, say, 32 MPI
              processes and 2 threads, or 16 MPI processes and 4 threads?
              How are you controlling the number of OpenMP threads: with an
              OpenMP environment variable? Which parts of the run time
              are you comparing? You should just run with -log_view and
              compare the times for PCApply() and PCSetUp() between, say, 64
              MPI processes/1 thread and 32 MPI processes/2 threads, and
              send us the output for those two cases.<br>

            </blockquote>

            <div><br>

            </div>

            <div>These folks don't use many MPI processes. I'm not sure

              what the optimal configuration is with Chombo-Crunch when

              using all of Cori.</div>

            <div><br>

            </div>

            <div>Baky: how many MPI processes per socket are you aiming

              for on Cori-KNL?</div>

            <div> </div>

          </div>

        </div>

      </div>

    </blockquote>

    Right now I am testing it on a single KNL node, going from flat MPI
    (64+1, i.e. 64 MPI processes with 1 thread each) to 2+32 for comparison.<br>
    But as you can see from the plot in the previous mail, we have a
    sweet spot at the 16+4 point; we then scale that configuration
    accordingly when running with 8k nodes.<br>
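    <br>
    A minimal sketch of that single-node sweep, assuming the 64 cores implied
    by the 64+1 case and only splits where processes times threads fills the
    node. The helper name hybrid_splits is purely illustrative, and setting
    OMP_NUM_THREADS is just one possible way to control the thread count; the
    actual launch command is not shown here:<br>
    <pre>
```python
# Enumerate MPI-process + OpenMP-thread splits that fill one node.
# Assumption: 64 cores per node, as implied by the flat "64+1" run.
CORES_PER_NODE = 64


def hybrid_splits(cores=CORES_PER_NODE, min_ranks=2):
    """Return (ranks, threads) pairs with ranks * threads == cores.

    hybrid_splits is a hypothetical helper, not part of any library.
    """
    splits = []
    # Try every thread count that still leaves at least min_ranks processes.
    for threads in range(1, cores // min_ranks + 1):
        if cores % threads == 0:
            splits.append((cores // threads, threads))
    return splits


if __name__ == "__main__":
    for ranks, threads in hybrid_splits():
        # Label each case in the same "ranks+threads" notation as the mail.
        print(f"{ranks}+{threads}: OMP_NUM_THREADS={threads}, {ranks} MPI processes")
```
    </pre>
    Each printed line is one configuration to run with -log_view, so that the
    same events can be compared across cases.<br>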

    <br>


    <blockquote type="cite"

cite="mid:CADOhEh72OR-T_8rqMzK8HSMO7rTWQFph=roY8dC10cRkRTVF9w@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <span class=""><br>

                ><br>

                > It seems that it made no difference, so perhaps I

                am doing something wrong or my build is not configured

                right.<br>

                ><br>

                > Do you have any example that makes use of threads
                when running hybrid and shows an advantage?<br>

                <br>

              </span>   There is no reason to think that using threads
              on KNL is faster than just using MPI processes. Despite
              what the NERSC/LBL web pages may say, just because a
              website says something doesn't make it true.<br>

              <div class="HOEnZb">

                <div class="h5"><br>

                  <br>

                  ><br>

                  > I'd like to test it and make sure that my libs
                  are configured correctly before I start to investigate
                  it further.<br>

                  ><br>

                  ><br>

                  > Thanks,<br>

                  ><br>

                  > Baky<br>

                  ><br>

                  ><br>

                  <br>

                </div>

              </div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

    </blockquote>

    <br>

  </body>

</html>