<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Hong,</p>
    <p>Thanks very much for that example, I appreciate it.</p>
    <p>I'll test that in a few days and come back with questions if
      needed,</p>
    <p><br>
    </p>
    <p>Thibaut<br>
    </p>
    <div class="moz-cite-prefix">On 25/07/2019 21:25,
      <a class="moz-txt-link-abbreviated" href="mailto:hong@aspiritech.org">hong@aspiritech.org</a> wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAGCphBuuDtR34NqMvn3zC8XiK3LQ6+yu=0snWw=vUbq9kFOqmw@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">Thibaut:<br>
        </div>
        <div>I added an example (in the
          branch hzhang/add-ksp-tutorials-ex6/master)</div>
        <div><a
href="https://bitbucket.org/petsc/petsc/commits/cf847786fd804b3606d0281d404c4763f36fe475?at=hzhang/add-ksp-tutorials-ex6/master"
            moz-do-not-send="true">https://bitbucket.org/petsc/petsc/commits/cf847786fd804b3606d0281d404c4763f36fe475?at=hzhang/add-ksp-tutorials-ex6/master</a><br>
        </div>
        <div><br>
        </div>
        <div>You can run it with</div>
        <div>mpiexec -n 2 ./ex6 -num_numfac 2 -pc_type lu
          -pc_factor_mat_solver_type mumps -ksp_monitor -log_view<br>
        </div>
        <div>...</div>
        <div>MatLUFactorSym         1 1.0 3.5911e-03 <br>
          MatLUFactorNum         2 1.0 6.3920e-03<br>
        </div>
        <div><br>
        </div>
        <div>This shows the code does one symbolic factorization and two
          numeric factorizations.</div>
        <div>For your convenience, the code ex6.c is attached below.</div>
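        <div><br>
        </div>
        <div>The gist of that pattern, as a minimal sketch (this is not ex6.c
          itself; the 1-D Laplacian, the right-hand side and the value update
          below are only placeholders):</div>
        <pre>
/* Sketch: one symbolic factorization, two numeric factorizations.
   Run with -pc_type lu -pc_factor_mat_solver_type mumps -ksp_monitor -log_view.
   Error checking omitted for brevity. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Placeholder matrix: 1-D Laplacian, fixed nonzero pattern */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);   /* first solve: symbolic + numeric factorization */

  /* Change only the numerical values; the nonzero pattern is unchanged */
  for (i = Istart; i < Iend; i++) MatSetValue(A, i, i, 4.0, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  KSPSolve(ksp, b, x);   /* second solve: numeric factorization only */

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}
</pre>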
        <div>Let me know if you encounter any problems.</div>
        <div>Hong</div>
        <div><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Thu, Jul 25, 2019 at 1:30
            PM Zhang, Hong via petsc-users <<a
              href="mailto:petsc-users@mcs.anl.gov"
              moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <div dir="ltr">
                <div dir="ltr">Thibaut:<br>
                </div>
                <div>I'm writing a simple example using KSP directly -- I
                  will send it to you soon.</div>
                <div>Hong</div>
                <div class="gmail_quote">
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px 0.8ex;border-left:1px solid
                    rgb(204,204,204);padding-left:1ex">
                    <div bgcolor="#FFFFFF">
                      <p>Hi Hong,</p>
                      <p>That sounds like a more reasonable approach; I
                        had no idea the PETSc/MUMPS interface could
                        provide that level of control over the solve
                        process. So, after assembling the matrix
                        A_0, I should do something along the lines of:</p>
                      <p><font face="Courier New, Courier, monospace">MatGetFactor(A_0,
                          MATSOLVERMUMPS, MAT_FACTOR_LU, F)</font></p>
                      <p><font face="Courier New, Courier, monospace">MatLUFactorSymbolic(F,
                          A_0, NULL, NULL, info)</font></p>
                      <p><font face="Courier New, Courier, monospace">MatLUFactorNumeric(F,
                          A_0, info)</font></p>
                      <p>and then call MatSolve? However, I don't
                        understand: I thought F would remain the same
                        during the whole process, but since it is an input
                        parameter of MatSolve, wouldn't I need one F_m for
                        each A_m? That is not what you mentioned (do
                        one symbolic factorization only).</p>
                      <p><br>
                      </p>
                      <p>On a side note, after preallocating and
                        assembling the first matrix, should I
                        create/assemble all the others with</p>
                      <p><font face="Courier New, Courier, monospace">MatDuplicate(A_0,
                          MAT_DO_NOT_COPY_VALUES, A_m)</font></p>
                      <p><font face="Courier New, Courier, monospace">Calls
                          to MatSetValues( ... )<br>
                        </font></p>
                      <font face="Courier New, Courier, monospace">MatAssemblyBegin(A_m,
                        MAT_FINAL_ASSEMBLY)<br>
                        MatAssemblyEnd(A_m, MAT_FINAL_ASSEMBLY)</font><br>
                      <br>
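                      <p>In full, the loop I have in mind would be something
                        like this (just a sketch; the per-system coefficients
                        are placeholders for my finite-difference stencils):</p>
                      <pre>
#include <petscmat.h>

/* Sketch: A0 is already preallocated and assembled; A[1..M-1] get the same
   layout and nonzero structure, then their own coefficient values.
   Error checking omitted for brevity. */
static PetscErrorCode BuildSystems(Mat A0, PetscInt M, Mat A[])
{
  PetscInt m;

  PetscFunctionBeginUser;
  A[0] = A0;
  for (m = 1; m < M; m++) {
    MatDuplicate(A0, MAT_DO_NOT_COPY_VALUES, &A[m]); /* structure only, no values */
    /* ... calls to MatSetValues(A[m], ...) with the coefficients of system m ... */
    MatAssemblyBegin(A[m], MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A[m], MAT_FINAL_ASSEMBLY);
  }
  PetscFunctionReturn(0);
}
</pre>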
                      <p>Is that the recommended/most scalable way of
                        duplicating a matrix + its non-zero structure?<br>
                      </p>
                      <p><br>
                      </p>
                      <p>Thank you for your support and suggestions,</p>
                      <p>Thibaut</p>
                      <p><br>
                      </p>
                      <div
class="gmail-m_1409220144151092612gmail-m_7299842392715427480moz-cite-prefix">On
                        23/07/2019 18:38, Zhang, Hong wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div dir="ltr">Thibaut:</div>
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0px 0px 0px
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              Thanks for taking the time. I would
                              typically run that on a small <br>
                              cluster node of 16 or 32 physical cores
                              with 2 or 4 sockets. I use 16 or <br>
                              32 MPI ranks and bind them to cores.<br>
                              <br>
                              The matrices would ALL have the same size
                              and the same nonzero structure <br>
                              - it's just a few numerical values that
                              would differ.<br>
                            </blockquote>
                            <div>You may do one symbolic factorization
                              of A_m, use it in the m-i loop:</div>
                            <div>- numeric factorization of A_m</div>
                            <div>- solve A_m x_m,i = b_m,i</div>
                            <div>In MUMPS, numeric factorization and
                              solve are scalable. Repeated numeric
                              factorization of A_m is likely faster
                              than reading data files from disk; see the
                              sketch below.</div>
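                            <div><br>
                            </div>
                            <div>Rough sketch (A, b and x are assumed already
                              created, with A carrying your nonzero structure;
                              the per-system values and right-hand sides are
                              left as comments):</div>
                            <pre>
#include <petscmat.h>

/* One symbolic factorization, reused for every numeric factorization
   and solve.  Error checking omitted for brevity. */
static PetscErrorCode SolveAllSystems(Mat A, Vec b, Vec x, PetscInt M, PetscInt niter)
{
  Mat           F;
  MatFactorInfo info;
  PetscInt      m, i;

  PetscFunctionBeginUser;
  MatFactorInfoInitialize(&info);
  MatGetFactor(A, MATSOLVERMUMPS, MAT_FACTOR_LU, &F);
  MatLUFactorSymbolic(F, A, NULL, NULL, &info);  /* done once */
  for (i = 0; i < niter; i++) {
    for (m = 0; m < M; m++) {
      /* ... put the values of A_m into A (same nonzero pattern),
             then MatAssemblyBegin/End(A, MAT_FINAL_ASSEMBLY) ... */
      MatLUFactorNumeric(F, A, &info);           /* redone for each A_m */
      /* ... form the right-hand side b = b_m,i ... */
      MatSolve(F, b, x);                         /* x = x_m,i */
    }
  }
  MatDestroy(&F);
  PetscFunctionReturn(0);
}
</pre>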
                            <div>Hong</div>
                            <blockquote class="gmail_quote"
                              style="margin:0px 0px 0px
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              <br>
                              This is a good point you've raised as I
                              don't think MUMPS is able to <br>
                              exploit that - I asked the question in
                              their users list just to be sure. <br>
                              There are some options in SuperLU_DIST to
                              reuse permutation arrays, but <br>
                              there's no I/O for that solver. And the
                              native PETSc LU solver is not <br>
                              parallel?<br>
                              <br>
                              I'm using high-order finite differences so
                              I'm suffering from a lot of <br>
                              fill-in, one of the reasons why storing
                              factorizations in RAM is not <br>
                              viable. In comparison, I have almost
                              unlimited disk space.<br>
                              <br>
                              I'm aware my need might seem
                              counter-intuitive, but I'm really willing
                              <br>
                              to sacrifice performance in the I/O part.
                              My code is already heavily <br>
                              based on PETSc (preallocation, assembly
                              for matrices/vectors) coupled <br>
                              with MUMPS, so I'm minimizing the loss of
                              efficiency.<br>
                              <br>
                              Thibaut<br>
                              <br>
                              On 23/07/2019 17:13, Smith, Barry F.
                              wrote:<br>
                              >    On what types of computing systems
                              will you be doing the computations?
                              Roughly how many MPI ranks?<br>
                              ><br>
                              > Are the matrices all the same size?
                              Do they have the same or different nonzero
                              structures? Would it be possible to use
                              the same symbolic representation for all
                              of them and just have different numerical
                              values?<br>
                              ><br>
                              >    Clusters and large-scale computing
                              centers are notoriously terrible at IO;
                              often IO is orders of magnitude slower
                              than compute/memory, making this type of
                              workflow unrealistically slow. From a
                              cost-analysis point of view, often just
                              buying lots of memory might be the most
                              efficacious approach.<br>
                              ><br>
                              >    That said, what you suggest might
                              require only a few lines of code (though
                              determining where to put them is the
                              tricky part), depending on the MUMPS
                              interface for saving a file to disk. What
                              we would do is keep the PETSc wrapper that
                              lives around the MUMPS matrix Mat_MUMPS
                              but, using the MUMPS API, save the
                              information in the DMUMPS_STRUC_C id; and
                              then reload it when needed.<br>
                              ><br>
                              >    The user-level API could be
                              something like<br>
                              ><br>
                              >    MatMumpsSaveToDisk(Mat) and
                              MatMumpsLoadFromDisk(Mat); they would just
                              monkey with the DMUMPS_STRUC_C id; item.<br>
                              ><br>
                              ><br>
                              >    Barry<br>
                              ><br>
                              ><br>
                              >> On Jul 23, 2019, at 9:24 AM,
                              Thibaut Appel via petsc-users <<a
                                href="mailto:petsc-users@mcs.anl.gov"
                                target="_blank" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
                              wrote:<br>
                              >><br>
                              >> Dear PETSc users,<br>
                              >><br>
                              >> I need to solve several linear
                              systems successively, with LU
                              factorization, as part of an iterative
                              process in my Fortran application code.<br>
                              >><br>
                              >> The process would solve M systems
                              (A_m)(x_m,i) = (b_m,i) for m=1,M at each
                              iteration i, while computing the LU
                              factorization of each A_m only once.<br>
                              >> The RHSs (b_m,i+1) are computed
                              from all the different (x_m,i) and all
                              depend upon each other.<br>
                              >><br>
                              >> The way I envisage performing
                              that is to use MUMPS to compute,
                              successively, each of the LU
                              factorizations (m) in parallel and store
                              the factors on disk,
                              creating/assembling/destroying the
                              matrices A_m on the fly.<br>
                              >> Then whenever needed, read the
                              factors in parallel to solve the systems.
                              Since version 5.2, MUMPS has a
                              save/restore feature that allows that, see
                              <a
                                href="http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf"
                                rel="noreferrer" target="_blank"
                                moz-do-not-send="true">
http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf</a> p.20, 24 and 58.<br>
                              >><br>
                              >> In its current state, the
                              PETSc/MUMPS interface does not incorporate
                              that feature. I'm an advanced Fortran
                              programmer, but not in C, so I don't think I
                              would do an amazing job having a go inside
                              src/mat/impls/aij/mpi/mumps/mumps.c.<br>
                              >><br>
                              >> I was picturing something like
                              creating as many KSP objects as linear
                              systems to be solved, with some sort of
                              flag to force the storage of LU factors on
                              disk after the first call to KSPSolve.
                              Then keep calling KSPSolve as many times
                              as needed.<br>
                              >><br>
                              >> Would you support such a feature?<br>
                              >><br>
                              >> Thanks for your support,<br>
                              >><br>
                              >> Thibaut<br>
                            </blockquote>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </blockquote>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>