On Aug 7, 2020, at 12:26 PM, Nidish <nb25@rice.edu> wrote:

> On 8/7/20 8:52 AM, Barry Smith wrote:
<div class="">On Aug 7, 2020, at 1:25 AM, Nidish <<a href="mailto:nb25@rice.edu" class="" moz-do-not-send="true">nb25@rice.edu</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252" class="">
<div class=""><p class="">Indeed - I was just using the default solver
(GMRES with ILU).</p><p class="">Using just standard LU (direct solve with
"-pc_type lu -ksp_type preonly"), I find elemental to be
extremely slow even for a 1000x1000 matrix. </p>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
What about on one process? <br class="">
</div>
</blockquote>
<font color="#0d29db" class="">On just one process the performance is
comparable.</font><br class="">
<blockquote type="cite" cite="mid:6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev" class="">
<div class=""><br class="">
</div>
<div class="">Elemental generally won't be competitive for such tiny
matrices. <br class="">
<blockquote type="cite" class="">
<div class="">
<div class=""><p class="">For MPIaij it's throwing me an error if I
tried "-pc_type lu".</p>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Yes, there is no PETSc code for sparse parallel direct
solver, this is expected.</div>
<div class=""><br class="">
</div>
<div class=""> What about ?</div>
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">
<div class="">
<blockquote class=""><p class="">mpirun -n 1 ./ksps -N 1000 -mat_type mpidense
-pc_type jacobi</p>
<div class="">mpirun -n 4 ./ksps -N 1000 -mat_type
mpidense -pc_type jacobi</div>
</blockquote>
</div>
</blockquote>
</div>
>
> Same results - the Elemental version is MUCH slower (for 1000x1000).
<blockquote type="cite" cite="mid:6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev" class="">
<div class="">Where will your dense matrices be coming from and how big
will they be in practice? This will help determine if an
iterative solver is appropriate. If they will be 100,000 for
example then testing with 1000 will tell you nothing useful, you
need to test with the problem size you care about.</div>
</blockquote><p class=""><font color="#0d29db" class="">The matrices in my application arise from
substructuring/Component Mode Synthesis conducted on a system
that is linear "almost everywhere", for example jointed systems.
The procedure we follow is: build a mesh & identify the
nodes corresponding to the interfaces, reduce the model using
component mode synthesis to obtain a representation of the
system using just the interface degrees-of-freedom along with
some (~10s) generalized "modal coordinates". We conduct the
non-linear analyses (transient, steady state harmonic, etc.)
using this matrices. <br class="">
</font></p><p class=""><font color="#0d29db" class="">I am interested in conducting non-linear
mesh convergence for a particular system of interest wherein the
interface DoFs are, approx, 4000, 8000, 12000, 16000. I'm fairly
certain the dense matrices will not be larger. The <br class=""></font></p></div></div></blockquote><div><br class=""></div> Ok, so it is not clear how well conditioned these dense matrices will be. </div><div><br class=""></div><div> There are three questions that need to be answered.</div><div><br class=""></div><div>1) for your problem can iterative methods be used and will they require less work than direct solvers.</div><div><br class=""></div><div> For direct LU the work is order N^3 to do the factorization with a relatively small constant. Because of smart organization inside dense LU the flops can be done very efficiently. </div><div><br class=""></div><div> For GMRES with Jacobi preconditioning the work is order N^2 (the time for a dense matrix-vector product) for each iteration. If the number of iterations small than the total work is much less than a direct solver. In the worst case the number of iterations is order N so the total work is order N^3, the same order as a direct method. But the efficiency of a dense matrix-vector product is much lower than the efficiency of a LU factorization so even if the work is the same order it can take longer. One should use mpidense as the matrix format for iterative.</div><div><br class=""></div><div> With iterative methods YOU get to decide how accurate you need your solution, you do this by setting how small you want the residual to be (since you can't directly control the error). By default PETSc uses a relative decrease in the residual of 1e-5. </div><div><br class=""></div><div>2) for your size problems can parallelism help? </div><div><br class=""></div><div> I think it should but elemental since it requires a different data layout has additional overhead cost to get the data into the optimal format for parallelism. </div><div><br class=""></div><div>3) can parallelism help on YOUR machine. Just because a machine has multiple cores it may not be able to utilize them efficiently for solvers if the total machine memory bandwidth is limited. </div><div><br class=""></div><div> So the first thing to do is on the machine you plan to use for your computations run the streams benchmark discussed in <a href="https://www.mcs.anl.gov/petsc/documentation/faq.html#computers" class="">https://www.mcs.anl.gov/petsc/documentation/faq.html#computers</a> this will give us some general idea of how much parallelism you can take advantage of. Is the machine a parallel cluster or just a single node? </div><div><br class=""></div><div> After this I'll give you a few specific cases to run to get a feeling for what approach would be best for your problems,</div><div><br class=""></div><div> Barry</div><div><br class=""></div><div><br class=""></div><div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><p class=""><font color="#0d29db" class="">
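For concreteness, this trade-off can be timed directly from the command line with the ksps.cpp driver attached earlier in the thread. The runs below are only a sketch (the -N flag is that driver's own size option, not a PETSc option); -log_view reports where the time actually goes:

    # dense direct solve on one process: a single O(N^3) LU factorization
    mpirun -n 1 ./ksps -N 4000 -mat_type seqdense -pc_type lu -ksp_type preonly -log_view

    # iterative solve on the PETSc parallel dense format: O(N^2) work per GMRES iteration
    mpirun -n 4 ./ksps -N 4000 -mat_type mpidense -ksp_type gmres -pc_type jacobi -ksp_rtol 1e-5 -ksp_converged_reason -log_view

Here -ksp_converged_reason prints the iteration count, which is what decides whether the O(N^2)-per-iteration estimate beats the O(N^3) factorization, and -ksp_rtol changes the default 1e-5 relative residual tolerance.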
</font></p><p class=""><font color="#0d29db" class="">However for frequency domain simulations,
we use matrices that are about 10 times the size of the original
matrices (whose meshes have been shown to be convergent in
static test cases). <br class="">
</font></p><p class=""><font color="#0d29db" class="">Thank you,<br class="">
Nidish</font><br class="">
</p>
<blockquote type="cite" cite="mid:6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev" class="">
<div class=""><br class="">
</div>
<div class="">Barry</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div class=""><p class=""> I'm attaching the code here, in case you'd
like to have a look at what I've been trying to do. <br class="">
</p><p class="">The two configurations of interest are,</p>
<blockquote class=""><p class="">$> mpirun -n 4 ./ksps -N 1000 -mat_type
mpiaij<br class="">
$> mpirun -n 4 ./ksps -N 1000 -mat_type elemental</p>
</blockquote><p class="">(for the GMRES with ILU) and,</p>
<blockquote class=""><p class="">$> mpirun -n 4 ./ksps -N 1000 -mat_type
mpiaij -pc_type lu -ksp_type preonly<br class="">
$> mpirun -n 4 ./ksps -N 1000 -mat_type elemental
-pc_type lu -ksp_type preonly</p>
</blockquote><p class="">elemental seems to perform poorly in both
cases.</p><p class="">Nidish<br class="">
</p>
<div class="moz-cite-prefix">On 8/7/20 12:50 AM, Barry
Smith wrote:<br class="">
</div>
<blockquote type="cite" cite="mid:85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev" class="">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252" class="">
<div class=""><br class="">
</div>
<div class=""> What is the output of -ksp_view for the
two case?</div>
<div class=""><br class="">
</div>
<div class=""> It is not only the matrix format but
also the matrix solver that matters. For example if
you are using an iterative solver the elemental format
won't be faster, you should use the PETSc MPIDENSE
format. The elemental format is really intended when
you use a direct LU solver for the matrix. For tiny
matrices like this an iterative solver could easily be
faster than the direct solver, it depends on the
conditioning (eigenstructure) of the dense matrix.
Also the default PETSc solver uses block Jacobi with
ILU on each process if using a sparse format, ILU
applied to a dense matrix is actually LU so your
solver is probably different also between the MPIAIJ
and the elemental. </div>
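As a quick check of what solver each run actually used, -ksp_view can be appended to the same commands; this is only a sketch, again assuming the ksps.cpp driver:

    mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -ksp_view
    mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi -ksp_view

With the default options the MPIAIJ run reports block Jacobi with ILU sub-solvers, while the dense run reports whatever preconditioner was given on the command line, which is the difference described above.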
<div class=""><br class="">
</div>
<div class=""> Barry</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""> <br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 7, 2020, at 12:30 AM, Nidish
<<a href="mailto:nb25@rice.edu" class="" moz-do-not-send="true">nb25@rice.edu</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="zoom: 0%;" class="">
<div dir="auto" class="">Thank you for the
response.<br class="">
<br class="">
</div>
<div dir="auto" class="">I've just been
running some tests with matrices up to 2e4
dimensions (dense). When I compared the
solution times for "-mat_type elemental" and
"-mat_type mpiaij" running with 4 cores, I
found the mpidense versions running way
faster than elemental. I have not been able
to make the elemental version finish up for
2e4 so far (my patience runs out faster). <br class="">
<br class="">
</div>
<div dir="auto" class="">What's going on here?
I thought elemental was supposed to be
superior for dense matrices.<br class="">
<br class="">
</div>
<div dir="auto" class="">I can share the code
if that's appropriate for this forum (sorry,
I'm new here). <br class="">
<br class="">
</div>
<div dir="auto" class="">Nidish</div>
<div class="gmail_quote">On Aug 6, 2020, at
23:01, Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank" class="" moz-do-not-send="true">bsmith@petsc.dev</a>>
wrote:
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex;
border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<pre class="blue"><blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #729fcf; padding-left: 1ex;"> On Aug 6, 2020, at 7:32 PM, Nidish <<a href="mailto:nb25@rice.edu" class="" moz-do-not-send="true">nb25@rice.edu</a>> wrote:
I'm relatively new to PETSc, and my applications involve (for the most part) dense matrix solves.
I read in the documentation that this is an area PETSc does not specialize in but instead recommends external libraries such as Elemental. I'm wondering if there are any "best" practices in this regard. Some questions I'd like answered are:
1. Can I just declare my dense matrix as a sparse one and fill the whole matrix up? Do any of the others go this route? What're possible pitfalls/unfavorable outcomes for this? I understand the memory overhead probably shoots up.
</blockquote>
No, this isn't practical, the performance will be terrible.
<blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #729fcf; padding-left: 1ex;"> 2. Are there any specific guidelines on when I can expect elemental to perform better in parallel than in serial?
</blockquote>
Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role.
Barry
<blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #729fcf; padding-left: 1ex;">
Of course, I'm interesting in any other details that may be important in this regard.
Thank you,
Nidish
</blockquote>
</pre>
<div class="moz-signature">-- <br class="">
Nidish</div>
</div>
<span id="cid:FF8DD1F1-CA48-405B-8E23-364936BB6B64" class=""><ksps.cpp></span></div>
<div class="moz-signature">-- <br class="">
Nidish</div>
</div>
</div></blockquote></div><br class=""></body></html>