<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi,</p>
<p>that was the reason for the increased run times. Without
#pragma omp parallel for, my loop took ~18 seconds. With
#pragma omp parallel for num_threads(2) or #pragma omp
parallel for num_threads(4) (on an i7-6700), the loop took ~16 s,
but with #pragma omp parallel for num_threads(8)
the loop took 28 s.</p>
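<p>For anyone who wants to reproduce such a comparison, here is a minimal sketch of how the loop could be timed for different thread counts. The helper name time_scaling and its parameters are hypothetical and not taken from test_scaling.cpp. It assumes the data is an array of complex doubles and that the code is built with OpenMP enabled (e.g. -fopenmp with GCC). For context, the i7-6700 has 4 physical cores and 8 hardware threads.</p>

```cpp
#include <cassert>
#include <chrono>
#include <complex>
#include <vector>

// Hypothetical helper (not from test_scaling.cpp): scales `in` into `out`
// using `threads` OpenMP threads and returns the elapsed wall-clock time
// in seconds. Without OpenMP support (no -fopenmp) the pragma is ignored,
// the loop runs serially, and `threads` has no effect.
double time_scaling(const std::vector<std::complex<double>>& in,
                    std::vector<std::complex<double>>& out,
                    std::complex<double> scaling_factor, int threads) {
    (void)threads;  // avoids an unused-parameter warning in serial builds
    const long n = static_cast<long>(in.size());
    out.resize(in.size());
    const std::complex<double>* in_ptr = in.data();
    std::complex<double>* out_ptr = out.data();

    const auto start = std::chrono::steady_clock::now();
#pragma omp parallel for num_threads(threads)
    for (long i = 0; i < n; ++i) {
        out_ptr[i] = in_ptr[i] * scaling_factor;
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

<p>Calling this with 1, 2, 4 and 8 threads on the same input, and repeating each measurement a few times, gives numbers that can be compared directly. A single run may be dominated by start-up effects.</p>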
<p>Regards,</p>
<p>Roland<br>
</p>
<div class="moz-cite-prefix">On 17.02.21 at 18:51, Matthew
Knepley wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div dir="ltr">Jed, is it possible that this is an
oversubscription penalty from bad OpenMP settings? &lt;said by a
person who knows less about OpenMP than cuneiform&gt;
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Feb 17, 2021 at 12:11
PM Roland Richter &lt;<a href="mailto:roland.richter@ntnu.no"
moz-do-not-send="true">roland.richter@ntnu.no</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>My PetscScalar is complex double (i.e. even higher
penalty), but my matrix has a size of 8kk (8 million) elements, so
that should not be an issue.<br>
Regards,<br>
Roland
<hr style="display:inline-block;width:98%">
<div id="gmail-m_2709721564415767467x_divRplyFwdMsg"
dir="ltr"><font style="font-size:11pt" face="Calibri,
sans-serif" color="#000000"><b>From:</b> Jed Brown &lt;<a
href="mailto:jed@jedbrown.org" target="_blank"
moz-do-not-send="true">jed@jedbrown.org</a>&gt;<br>
<b>Sent:</b> Wednesday, 17 February 2021 17:49:49<br>
<b>To:</b> Roland Richter; PETSc<br>
<b>Subject:</b> Re: [petsc-users] Explicit linking to
OpenMP results in performance drop and wrong results</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt">
<div>Roland Richter &lt;<a
href="mailto:roland.richter@ntnu.no" target="_blank"
moz-do-not-send="true">roland.richter@ntnu.no</a>&gt;
writes:<br>
<br>
> Hi,<br>
><br>
> I replaced the linking line with<br>
><br>
> /usr/lib64/mpi/gcc/openmpi3/bin/mpicxx
-march=native -fopenmp-simd<br>
> -DMKL_LP64 -m64<br>
>
CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o
-o<br>
> bin/armadillo_with_PETSc <br>
>
-Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib<br>
> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so
-lgfortran <br>
> -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed
-lmkl_intel_lp64<br>
> -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm
-ldl<br>
> /opt/boost/lib/libboost_filesystem.so.1.72.0<br>
> /opt/boost/lib/libboost_mpi.so.1.72.0<br>
> /opt/boost/lib/libboost_program_options.so.1.72.0<br>
> /opt/boost/lib/libboost_serialization.so.1.72.0<br>
> /opt/fftw3/lib64/libfftw3.so
/opt/fftw3/lib64/libfftw3_mpi.so<br>
> /opt/petsc_release/lib/libpetsc.so<br>
> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so<br>
><br>
> and now the results are correct. Nevertheless,
when comparing the loop<br>
> in lines 26-28 of the file test_scaling.cpp<br>
><br>
> #pragma omp parallel for<br>
> for(int i = 0; i &lt; r_0 * r_1; ++i)<br>
> *(out_mat_ptr + i) = (*(in_mat_ptr + i)
* scaling_factor);<br>
><br>
> the version without #pragma omp parallel for is
significantly faster<br>
> (i.e. 18 s vs 28 s) than the version with
OpenMP. Why is there<br>
> still such a big difference?<br>
<br>
Sounds like you're using a profiler to attribute time?
Each `omp parallel` region incurs a cost ranging from
about a microsecond to 10 or more microseconds
depending on architecture, number of threads, and
OpenMP implementation. Your loop (for double
precision) operates at around 8 entries per clock
cycle (depending on architecture) if the operands are
in cache so the loop size r_0 * r_1 should be at least
10000 just to pay off the cost of `omp parallel`.<br>
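<br>
A minimal sketch of one way to act on this estimate, using the OpenMP if clause so that the parallel region is only used for sufficiently large loops. The names scale_array and kMinParallelSize are hypothetical, and the value 10000 is just the rough break-even size quoted above, not a measured number. The right threshold depends on the machine and the OpenMP runtime.<br>

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Hypothetical threshold taken from the rough estimate above. It is an
// assumption for illustration and should be tuned by measurement.
constexpr long kMinParallelSize = 10000;

// Multiplies every entry of `in` by `factor` and stores the result in `out`.
// When built with OpenMP (e.g. -fopenmp), the loop is run by multiple
// threads only if n >= kMinParallelSize; for smaller n, or without OpenMP,
// it runs serially.
void scale_array(const std::complex<double>* in, std::complex<double>* out,
                 long n, std::complex<double> factor) {
#pragma omp parallel for if (n >= kMinParallelSize)
    for (long i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}
```

For the loop in question, this would be called as scale_array(in_mat_ptr, out_mat_ptr, r_0 * r_1, scaling_factor), assuming both pointers refer to arrays of complex doubles.<br>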
</div>
</span></font>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before
they begin their experiments is infinitely more
interesting than any results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>