On Feb 18, 2021, at 6:10 AM, Matthew Knepley <knepley@gmail.com> wrote:

> On Thu, Feb 18, 2021 at 3:09 AM Roland Richter <roland.richter@ntnu.no> wrote:
>
> Hei,
>
> that was the reason for the increased run times. When removing #pragma omp
> parallel for, my loop took ~18 s. When changing it to #pragma omp parallel
> for num_threads(2) or #pragma omp parallel for num_threads(4) (on an
> i7-6700), the loop took ~16 s, but when increasing it to #pragma omp
> parallel for num_threads(8), the loop took 28 s.
>
> Editorial: This is a reason I think OpenMP is inappropriate as a tool for
> parallel computing (many people disagree). It makes resource management
> difficult for the user and impossible for a library.

It is possible to control these things properly with the modern OpenMP APIs, but, like MPI implementations, this can require some mucking around that a beginner would not know about, and the default settings can be terrible. MPI implementations are no better; their default bindings are generally horrible. (A short sketch of these controls is included below.)

  Barry

>   Thanks,
>
>      Matt
>
> Regards,
>
> Roland
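
A minimal sketch of the thread-count and placement controls Barry mentions above (an editorial example, not code from the thread; it assumes the GCC/libgomp toolchain used elsewhere in this discussion, and the file name set_threads.cpp is arbitrary):

    // Editorial sketch: modern OpenMP controls for thread count and placement.
    // Build:  g++ -O2 -fopenmp set_threads.cpp
    // The same effect is available without code changes via the environment:
    //   OMP_NUM_THREADS=4 OMP_PROC_BIND=close OMP_PLACES=cores ./a.out
    #include <cstdio>
    #include <omp.h>

    int main() {
        omp_set_num_threads(4);  // programmatic equivalent of OMP_NUM_THREADS=4

        // proc_bind(close) asks for threads to be packed onto nearby places (cores).
        #pragma omp parallel proc_bind(close)
        {
            #pragma omp single
            std::printf("threads = %d, proc_bind policy = %d\n",
                        omp_get_num_threads(), (int)omp_get_proc_bind());
        }
        return 0;
    }

Running with OMP_DISPLAY_ENV=true additionally prints the settings actually in effect at program start, which is an easy way to see what the defaults on a given machine really are.
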
<div class="">Am 17.02.21 um 18:51 schrieb Matthew
Knepley:<br class="">
</div>
<blockquote type="cite" class="">
<div dir="ltr" class="">Jed, is it possible that this is an
oversubscription penalty from bad OpenMP settings? <said by a
person who knows less about OpenMP than cuneiform>
<div class=""><br class="">
</div>
<div class=""> Thanks,</div>
<div class=""><br class="">
</div>
<div class=""> Matt</div>
</div>
<br class="">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Feb 17, 2021 at 12:11
PM Roland Richter <<a href="mailto:roland.richter@ntnu.no" target="_blank" class="">roland.richter@ntnu.no</a>> wrote:<br class="">
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="">
<div class="">My PetscScalar is complex double (i.e. even higher
penalty), but my matrix has a size of 8kk elements, so
that should not an issue.<br class="">
Regards,<br class="">
Roland
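
For scale (editorial arithmetic, assuming "8kk" means 8 x 10^6 entries): a complex double takes 16 bytes, so each such matrix is roughly 128 MB. That is far larger than the i7-6700's roughly 8 MB of last-level cache, and far above the ~10,000-entry break-even for the cost of an `omp parallel` region that Jed gives below.
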
> ______________________________________________________________________
> From: Jed Brown <jed@jedbrown.org>
> Sent: Wednesday, 17 February 2021 17:49:49
> To: Roland Richter; PETSc
> Subject: Re: [petsc-users] Explicit linking to OpenMP results in performance
> drop and wrong results
>
<font size="2" class=""><span style="font-size:10pt" class="">
<div class="">Roland Richter <<a href="mailto:roland.richter@ntnu.no" target="_blank" class="">roland.richter@ntnu.no</a>>
writes:<br class="">
<br class="">
>> Hei,
>>
>> I replaced the linking line with
>>
>> /usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd
>> -DMKL_LP64 -m64
>> CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
>> bin/armadillo_with_PETSc
>> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
>> -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
>> -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
>> /opt/boost/lib/libboost_filesystem.so.1.72.0
>> /opt/boost/lib/libboost_mpi.so.1.72.0
>> /opt/boost/lib/libboost_program_options.so.1.72.0
>> /opt/boost/lib/libboost_serialization.so.1.72.0
>> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>> /opt/petsc_release/lib/libpetsc.so
>> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
>>
>> and now the results are correct. Nevertheless, when comparing the loop in
>> lines 26-28 of the file test_scaling.cpp
>>
>>     #pragma omp parallel for
>>     for (int i = 0; i < r_0 * r_1; ++i)
>>         *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);
>>
>> the version without #pragma omp parallel for is significantly faster
>> (i.e. 18 s vs 28 s) compared to the version with omp. Why is there still
>> such a big difference?
>
> Sounds like you're using a profile to attribute time? Each `omp parallel`
> region incurs a cost ranging from about a microsecond to 10 or more
> microseconds, depending on architecture, number of threads, and OpenMP
> implementation. Your loop (for double precision) operates at around 8
> entries per clock cycle (depending on architecture) if the operands are in
> cache, so the loop size r_0 * r_1 should be at least 10000 just to pay off
> the cost of `omp parallel`.
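
To make the amortization argument concrete, here is a rough editorial sketch (not from the thread) that times one serial pass and one OpenMP pass of the same scaled-copy loop; the names r_0, r_1, in_mat_ptr, out_mat_ptr, and scaling_factor follow the snippet quoted above, and the sizes are made up to roughly match the 8kk-element matrix mentioned earlier:

    // Editorial sketch: compare one serial pass and one OpenMP pass of the
    // scaled-copy loop discussed above.
    // Build:  g++ -O2 -march=native -fopenmp bench_scaling.cpp
    #include <complex>
    #include <cstddef>
    #include <cstdio>
    #include <vector>
    #include <omp.h>

    int main() {
        const int r_0 = 2000, r_1 = 4000;   // ~8e6 entries, roughly as in the thread
        const std::complex<double> scaling_factor(2.0, 0.0);
        std::vector<std::complex<double>> in(static_cast<std::size_t>(r_0) * r_1,
                                             std::complex<double>(1.0, 1.0));
        std::vector<std::complex<double>> out(in.size());
        const std::complex<double>* in_mat_ptr = in.data();
        std::complex<double>* out_mat_ptr = out.data();

        double t = omp_get_wtime();
        for (int i = 0; i < r_0 * r_1; ++i)
            *(out_mat_ptr + i) = *(in_mat_ptr + i) * scaling_factor;
        std::printf("serial:   %.4f s\n", omp_get_wtime() - t);

        t = omp_get_wtime();
        #pragma omp parallel for
        for (int i = 0; i < r_0 * r_1; ++i)
            *(out_mat_ptr + i) = *(in_mat_ptr + i) * scaling_factor;
        std::printf("parallel: %.4f s\n", omp_get_wtime() - t);
        return 0;
    }

A single pass over 8e6 complex entries should take on the order of tens of milliseconds on this class of machine, so the 18 s / 28 s totals quoted in the thread presumably accumulate over many repetitions of the loop.
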
</blockquote></div><br clear="all" class=""><div class=""><br class=""></div>-- <br class=""><div dir="ltr" class="gmail_signature"><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class="">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br class="">-- Norbert Wiener</div><div class=""><br class=""></div><div class=""><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank" class="">https://www.cse.buffalo.edu/~knepley/</a><br class=""></div></div></div></div></div></div></div></div>
</div></blockquote></div><br class=""></body></html>