<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Dear Matthew,<br>

    <br>

    using either "-build_twosided allreduce" or "-build_twosided

    redscatter" workarounds the Intel MPI internal error!<br>

    <br>

    Besides, with Intel MPI, the weak scaling figures look like:<br>

    <br>

    **preconditioner set up**<br>

    [0.8865120411, 2.598261833, 3.780511856]<br>

    <br>

    **PCG stage**<br>

    [0.5701429844, 1.771002054, 2.292135954]<br>

    <br>

    **number of PCG iterations**<br>

    [9,14,15]<br>

    <br>

    which now I consider much more reasonable, in particular, for the

    preconditioner set-up stage, compared to the ones<br>

    that I obtained with OpenMPI 1.10.7 and "-build_twosided ibarrier",

    below.<br>

    <br>

    Best regards,<br>

     Alberto.<br>

    <br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 19/11/18 12:26, Matthew Knepley

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAMYG4GmtjzFvzY6eLhzGwkohqQzvpHvD1eu8ZqVrw5N1Ziyx1A@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div dir="ltr">

          <div class="gmail_quote">

            <div dir="ltr">On Mon, Nov 19, 2018 at 5:52 AM "Alberto F.

              Martín" <<a moz-do-not-send="true"

                href="mailto:amartin@cimne.upc.edu">amartin@cimne.upc.edu</a>>

              wrote:<br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"> Dear Mark, Dear Matthew, <br>

                <br>

                in order to discard load imbalance as the cause of the

                reported weak scaling issue in the GAMG preconditioner <br>

                set-up stage (as you said, we were feeding GAMG with a

                suboptimal mesh distribution, having empty processors, <br>

                among others), we simplified the weak scaling test by

                considering the standard body-fitted trilinear (Q1) FE

                discretization <br>

                of the 3D Poisson problem on a unit cube discretized

                with a <b><i>uniform, structured hexahedral mesh,</i></b>

                <i><b>partitioned</b></i><i><b><br>

                  </b></i><i><b>optimally (by hand) among processors</b></i>,

                with a fixed load/core of 30**3 hexahedra/core. Thus,

                all processors have the same load<br>

                (up-to strong Dirichlet boundary conditions on the

                subdomains touching the global boundary), and the

                edge-cut is minimum.<br>

                <br>

                We used the following GAMG preconditioner options:<br>

                <br>

                -pc_type gamg<br>

                -pc_gamg_type agg<br>

                -pc_gamg_est_ksp_type cg<br>

                -mg_levels_esteig_ksp_type cg<br>

                -mg_coarse_sub_pc_type cholesky<br>

                -mg_coarse_sub_pc_factor_mat_ordering_type nd<br>

                -pc_gamg_process_eq_limit 50<br>

                -pc_gamg_square_graph 10<br>

                -pc_gamg_agg_nsmooths 1<br>

                <br>

                The results that we obtained for 48 (4x4x3 subdomains), 

                10,368 (24x24x18 subdomains),  and 16,464 <br>

                (28x28x21 subdomains) CPU cores are as follows:<br>

                <br>

                **preconditioner set up**<br>

                [0.9844961860, <b>7.017674042</b>, <b>12.10154881</b>]<br>

                <br>

                **PCG stage**<br>

                [0.5849160422, 1.515251888, 1.859617710]<br>

                <br>

                **number of PCG iterations**<br>

                [9,14,15]<br>

                <br>

                As you can observe, <b>there is still a significant

                  time increase when scaling the problem from 48 to

                  10K/16K MPI tasks</b><b><br>

                </b><b>for the preconditioner setup stage. </b>This

                time increase is not as significant for the PCG stage.</div>

            </blockquote>

            <div><br>

            </div>

            <div>Actually, its almost perfect for PCG, given that</div>

            <div><br>

            </div>

            <div>  14/9 * 0.98   = 1.52 (Eff 100%)</div>

            <div>  15/14 * 1.52 = 1.63 (Eff 88%)</div>

            <div><br>

            </div>

            <div>Mark would have better comments on the scalability of

              the setup stage.</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><b> </b>Please find attached the

                combined <br>

                output of -ksp_view and -log_view for these three points

                of the weak scaling curve.<br>

                <br>

                Given these results, I am starting to suspect that

                something within the underlying software + hardware

                stack might be<br>

                responsible for this. I am using OpenMPI 1.10.7 + Intel

                compilers version 18.0. The underlying supercomputer is

                MN-IV at <br>

                BSC (<a moz-do-not-send="true"

                  class="gmail-m_7803405081365825348moz-txt-link-freetext"

href="https://www.bsc.es/marenostrum/marenostrum/technical-information"

                  target="_blank">https://www.bsc.es/marenostrum/marenostrum/technical-information</a>).

                Have you ever conducted a weak scaling test<br>

                of GAMG with OpenMPI on a similar computer architecture?

                Can you share your experience with us? <br>

                (versions tested, outcome, etc.)<br>

                <br>

                We also tried an alternative MPI library, Intel(R) MPI

                Library for Linux* OS, Version 2018 Update 4 Build

                20180823 (id: 18555),<br>

                <b>without success. </b>For this MPI library, the

                preconditioner set-up stage crashes (find attached stack

                frames, and internal MPI library<br>

                errors) for the largest two core counts (it did not

                crash for the 48 CPU cores case), while it did not crash

                with OpenMPI 1.10.7.<br>

                Have you ever experienced errors like the ones<br>

                attached? Is there anyway to set up PETSc such that the

                subroutine that crashes is replaced by an alternative

                implementation of<br>

                the same concept? (this would be just a workaround). </div>

            </blockquote>

            <div><br>

            </div>

            <div>It certainly feels like an MPI (or driver) bug:</div>

            <div><br>

            </div>

            <div>

              <pre class="gmail-aLF-aPX-K0-aPE" style="font-family:"Courier New",Courier,monospace,arial,sans-serif;margin-top:0px;margin-bottom:0px;white-space:pre-wrap;word-wrap:break-word;color:rgb(0,0,0);font-size:14px">libmpi.so.12       00007F460E3AEBC8  PMPIDI_CH3I_Progr     Unknown  Unknown

libmpi.so.12       00007F460E72B90C  MPI_Testall           Unknown  Unknown

libpetsc.so.3.9.0  00007F4607391FFE  PetscCommBuildTwo     Unknown  Unknown

</pre>

              <br class="gmail-Apple-interchange-newline">

            </div>

            <div>You can try another variant using</div>

            <div><br>

            </div>

            <div>  -build_twosided <allreduce|ibarrier|redscatter></div>

            <div><br>

            </div>

            <div>I think ibarrier is currently the default if supported,

              but -help should tell you.</div>

            <div><br>

            </div>

            <div>  Thanks,</div>

            <div><br>

            </div>

            <div>    Matt</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF">It might be a BUG in the Intel MPI

                library, although I cannot confirm it. We also got <br>

                these errors with the unfitted FEM+space-filling curves

                version of our code.<br>

                <br>

                Thanks a lot for your help and valuable feedback!<br>

                Best regards,<br>

                 Alberto.<br>

                <br>

                <br>

                <br>

                <br>

                <br>

                <br>

                <br>

                <br>

                <br>

                <div class="gmail-m_7803405081365825348moz-cite-prefix">On

                  08/11/18 17:29, Mark Adams wrote:<br>

                </div>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_quote">

                      <blockquote class="gmail_quote" style="margin:0px

                        0px 0px 0.8ex;border-left:1px solid

                        rgb(204,204,204);padding-left:1ex">

                        <div bgcolor="#FFFFFF"><br>

                          I did not configured PETSc with ParMetis

                          support. Should I? <br>

                          <br>

                          I figured it out when I tried to use

                          "-pc_gamg_repartition". PETSc complained that

                          it was not compiled with ParMetis support.</div>

                      </blockquote>

                      <div><br>

                      </div>

                      <div>You need ParMetis, or some parallel mesh

                        partitioner, configured to use repartitioning. I

                        would guess that "-pc_gamg_repartition" would

                        not help and might hurt, because it just does

                        the coarse grids, not the fine grid. But it is

                        worth a try. Just configure with

                        --download-parmetis</div>

                      <div><br>

                      </div>

                      <div>The problem is that you are using space

                        filling curves on the background grid and are

                        getting empty processors. Right?  The mesh setup

                        phase is not super optimized, but your times</div>

                      <div><br>

                      </div>

                      <div>And you said in your attachment that you

                        added the near null space, but just the constant

                        vector. I trust you mean the three translational

                        rigid body modes. That is the default and so you

                        should not see any difference. If you added one

                        vector of all 1s then that would be bad. You

                        also want the rotational rigid body modes. Now,

                        you are converging pretty well and if your

                        solution does not have much rotation in it the

                        the rotational modes are not needed, but they

                        are required for optimality in general.</div>

                      <div> <br>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <br>

                <pre class="gmail-m_7803405081365825348moz-signature" cols="72">-- 

Alberto F. Martín-Huertas

Senior Researcher, PhD. Computational Science

Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)

Parc Mediterrani de la Tecnologia, UPC

Esteve Terradas 5, Building C3, Office 215,

08860 Castelldefels (Barcelona, Spain)

Tel.: (+34) 9341 34223

<a moz-do-not-send="true" class="gmail-m_7803405081365825348moz-txt-link-abbreviated" href="mailto:e-mail:amartin@cimne.upc.edu" target="_blank">e-mail:amartin@cimne.upc.edu</a>

FEMPAR project co-founder

web: <a moz-do-not-send="true" class="gmail-m_7803405081365825348moz-txt-link-freetext" href="http://www.fempar.org" target="_blank">http://www.fempar.org</a> 

________________

IMPORTANT NOTICE

All personal data contained on this mail will be processed confidentially and registered in a file property of CIMNE in

order to manage corporate communications. You may exercise the rights of access, rectification, erasure and object by

letter sent to Ed. C1 Campus Norte UPC. Gran Capitán s/n Barcelona.

</pre>

              </div>

            </blockquote>

          </div>

          <br clear="all">

          <div><br>

          </div>

          -- <br>

          <div dir="ltr" class="gmail_signature">

            <div dir="ltr">

              <div>

                <div dir="ltr">

                  <div>

                    <div dir="ltr">

                      <div>What most experimenters take for granted

                        before they begin their experiments is

                        infinitely more interesting than any results to

                        which their experiments lead.<br>

                        -- Norbert Wiener</div>

                      <div><br>

                      </div>

                      <div><a moz-do-not-send="true"

                          href="http://www.cse.buffalo.edu/%7Eknepley/"

                          target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>

                      </div>

                    </div>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Alberto F. Martín-Huertas

Senior Researcher, PhD. Computational Science

Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)

Parc Mediterrani de la Tecnologia, UPC

Esteve Terradas 5, Building C3, Office 215,

08860 Castelldefels (Barcelona, Spain)

Tel.: (+34) 9341 34223

<a class="moz-txt-link-abbreviated" href="mailto:e-mail:amartin@cimne.upc.edu">e-mail:amartin@cimne.upc.edu</a>

FEMPAR project co-founder

web: <a class="moz-txt-link-freetext" href="http://www.fempar.org">http://www.fempar.org</a> 

________________

IMPORTANT NOTICE

All personal data contained on this mail will be processed confidentially and registered in a file property of CIMNE in

order to manage corporate communications. You may exercise the rights of access, rectification, erasure and object by

letter sent to Ed. C1 Campus Norte UPC. Gran Capitán s/n Barcelona.

</pre>

  </body>

</html>