<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Hong,</p>
    <p>Thanks very much for that example, I appreciate it.</p>
    <p>I'll test that in a few days and come back with questions if
      needed,</p>
    <p><br>
    </p>
    <p>Thibaut<br>
    </p>
    <div class="moz-cite-prefix">On 25/07/2019 21:25,
      <a class="moz-txt-link-abbreviated" href="mailto:hong@aspiritech.org">hong@aspiritech.org</a> wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAGCphBuuDtR34NqMvn3zC8XiK3LQ6+yu=0snWw=vUbq9kFOqmw@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">Thibaut:<br>
        </div>
        <div>I added an example (in the
          branch hzhang/add-ksp-tutorials-ex6/master)</div>
        <div><a
href="https://bitbucket.org/petsc/petsc/commits/cf847786fd804b3606d0281d404c4763f36fe475?at=hzhang/add-ksp-tutorials-ex6/master"
            moz-do-not-send="true">https://bitbucket.org/petsc/petsc/commits/cf847786fd804b3606d0281d404c4763f36fe475?at=hzhang/add-ksp-tutorials-ex6/master</a><br>
        </div>
        <div><br>
        </div>
        <div>You can run it with</div>
        <div>mpiexec -n 2 ./ex6 -num_numfac 2 -pc_type lu
          -pc_factor_mat_solver_type mumps -ksp_monitor -log_view<br>
        </div>
        <div>...</div>
        <div>MatLUFactorSym         1 1.0 3.5911e-03 <br>
          MatLUFactorNum         2 1.0 6.3920e-03<br>
        </div>
        <div><br>
        </div>
        <div>This shows the code does one symbolic factorization and two
          numeric factorizations.</div>
        <div>For your convenience, the code ex6.c is attached below.</div>
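        <div><br>
        </div>
        <div>The gist of that pattern, as a minimal sketch (this is not ex6.c
          itself; the 1-D Laplacian, the right-hand side and the value update
          below are only placeholders):</div>
        <pre>
/* Sketch: one symbolic factorization, two numeric factorizations.
   Run with -pc_type lu -pc_factor_mat_solver_type mumps -ksp_monitor -log_view.
   Error checking omitted for brevity. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Placeholder matrix: 1-D Laplacian, fixed nonzero pattern */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);   /* first solve: symbolic + numeric factorization */

  /* Change only the numerical values; the nonzero pattern is unchanged */
  for (i = Istart; i < Iend; i++) MatSetValue(A, i, i, 4.0, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  KSPSolve(ksp, b, x);   /* second solve: numeric factorization only */

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}
</pre>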
        <div>Let me know if you encounter any problems.</div>
        <div>Hong</div>
        <div><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Thu, Jul 25, 2019 at 1:30
            PM Zhang, Hong via petsc-users <<a
              href="mailto:petsc-users@mcs.anl.gov"
              moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <div dir="ltr">
                <div dir="ltr">Thibaut:<br>
                </div>
                <div>I'm writing a simple example using KSP directly -- I
                  will send it to you soon.</div>
                <div>Hong</div>
                <div class="gmail_quote">
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px 0.8ex;border-left:1px solid
                    rgb(204,204,204);padding-left:1ex">
                    <div bgcolor="#FFFFFF">
                      <p>Hi Hong,</p>
                      <p>That sounds like a more reasonable approach; I
                        had no idea the PETSc/MUMPS interface could
                        provide that level of control over the solve
                        process. So, after assembling the matrix
                        A_0, I should do something along the lines of:</p>
                      <p><font face="Courier New, Courier, monospace">MatGetFactor(A_0,
                          MATSOLVERMUMPS, MAT_FACTOR_LU, F)</font></p>
                      <p><font face="Courier New, Courier, monospace">MatLUFactorSymbolic(F,
                          A_0, NULL, NULL, info)</font></p>
                      <p><font face="Courier New, Courier, monospace">MatLUFactorNumeric(F,
                          A_0, info)</font></p>
                      <p>and then call MatSolve? However, I don't
                        understand: I thought F would remain the same
                        during the whole process, but since it is an input
                        parameter of MatSolve, wouldn't I need one F_m for
                        each A_m? That is not what you mentioned (do
                        one symbolic factorization only).</p>
                      <p><br>
                      </p>
                      <p>On a side note, after preallocating and
                        assembling the first matrix, should I
                        create/assemble all the others with</p>
                      <p><font face="Courier New, Courier, monospace">MatDuplicate(A_0,
                          MAT_DO_NOT_COPY_VALUES, A_m)</font></p>
                      <p><font face="Courier New, Courier, monospace">Calls
                          to MatSetValues( ... )<br>
                        </font></p>
                      <font face="Courier New, Courier, monospace">MatAssemblyBegin(A_m,
                        MAT_FINAL_ASSEMBLY)<br>
                        MatAssemblyEnd(A_m, MAT_FINAL_ASSEMBLY)</font><br>
                      <br>
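                      <p>In full, the loop I have in mind would be something
                        like this (just a sketch; the per-system coefficients
                        are placeholders for my finite-difference stencils):</p>
                      <pre>
#include <petscmat.h>

/* Sketch: A0 is already preallocated and assembled; A[1..M-1] get the same
   layout and nonzero structure, then their own coefficient values.
   Error checking omitted for brevity. */
static PetscErrorCode BuildSystems(Mat A0, PetscInt M, Mat A[])
{
  PetscInt m;

  PetscFunctionBeginUser;
  A[0] = A0;
  for (m = 1; m < M; m++) {
    MatDuplicate(A0, MAT_DO_NOT_COPY_VALUES, &A[m]); /* structure only, no values */
    /* ... calls to MatSetValues(A[m], ...) with the coefficients of system m ... */
    MatAssemblyBegin(A[m], MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A[m], MAT_FINAL_ASSEMBLY);
  }
  PetscFunctionReturn(0);
}
</pre>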
                      <p>Is that the recommended/most scalable way of
                        duplicating a matrix + its non-zero structure?<br>
                      </p>
                      <p><br>
                      </p>
                      <p>Thank you for your support and suggestions,</p>
                      <p>Thibaut</p>
                      <p><br>
                      </p>
                      <div
class="gmail-m_1409220144151092612gmail-m_7299842392715427480moz-cite-prefix">On
                        23/07/2019 18:38, Zhang, Hong wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div dir="ltr">Thibaut:</div>
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0px 0px 0px
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              Thanks for taking the time. I would
                              typically run that on a small <br>
                              cluster node of 16 or 32 physical cores
                              with 2 or 4 sockets. I use 16 or <br>
                              32 MPI ranks and bind them to cores.<br>
                              <br>
                              The matrices would ALL have the same size
                              and the same nonzero structure <br>
                              - it's just a few numerical values that
                              would differ.<br>
                            </blockquote>
                            <div>You may do one symbolic factorization
                              of A_m, use it in the m-i loop:</div>
                            <div>- numeric factorization of A_m</div>
                            <div>- solve A_m x_m,i = b_m,i</div>
                            <div>In MUMPS, numeric factorization and
                              solve are scalable. Repeated numeric
                              factorization of A_m is likely faster
                              than reading data files from disk; see the
                              sketch below.</div>
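                            <div><br>
                            </div>
                            <div>Rough sketch (A, b and x are assumed already
                              created, with A carrying your nonzero structure;
                              the per-system values and right-hand sides are
                              left as comments):</div>
                            <pre>
#include <petscmat.h>

/* One symbolic factorization, reused for every numeric factorization
   and solve.  Error checking omitted for brevity. */
static PetscErrorCode SolveAllSystems(Mat A, Vec b, Vec x, PetscInt M, PetscInt niter)
{
  Mat           F;
  MatFactorInfo info;
  PetscInt      m, i;

  PetscFunctionBeginUser;
  MatFactorInfoInitialize(&info);
  MatGetFactor(A, MATSOLVERMUMPS, MAT_FACTOR_LU, &F);
  MatLUFactorSymbolic(F, A, NULL, NULL, &info);  /* done once */
  for (i = 0; i < niter; i++) {
    for (m = 0; m < M; m++) {
      /* ... put the values of A_m into A (same nonzero pattern),
             then MatAssemblyBegin/End(A, MAT_FINAL_ASSEMBLY) ... */
      MatLUFactorNumeric(F, A, &info);           /* redone for each A_m */
      /* ... form the right-hand side b = b_m,i ... */
      MatSolve(F, b, x);                         /* x = x_m,i */
    }
  }
  MatDestroy(&F);
  PetscFunctionReturn(0);
}
</pre>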
                            <div>Hong</div>
                            <blockquote class="gmail_quote"
                              style="margin:0px 0px 0px
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              <br>
                              This is a good point you've raised as I
                              don't think MUMPS is able to <br>
                              exploit that - I asked the question in
                              their users list just to be sure. <br>
                              There are some options in SuperLU_DIST to
                              reuse permutation arrays, but <br>
                              there's no I/O for that solver. And the
                              native PETSc LU solver is not <br>
                              parallel?<br>
                              <br>
                              I'm using high-order finite differences so
                              I'm suffering from a lot of <br>
                              fill-in, one of the reasons why storing
                              factorizations in RAM is not <br>
                              viable. In comparison, I have almost
                              unlimited disk space.<br>
                              <br>
                              I'm aware my need might seem
                              counter-intuitive, but I'm really willing
                              <br>
                              to sacrifice performance in the I/O part.
                              My code is already heavily <br>
                              based on PETSc (preallocation, assembly
                              for matrices/vectors) coupled <br>
                              with MUMPS, so I'm minimizing the loss of
                              efficiency.<br>
                              <br>
                              Thibaut<br>
                              <br>
                              On 23/07/2019 17:13, Smith, Barry F.
                              wrote:<br>
                              >    On what types of computing systems
                              will you be doing the computations?
                              Roughly how many MPI ranks?<br>
                              ><br>
                              > Are the matrices all the same size?
                              Do they have the same or different nonzero
                              structures? Would it be possible to use
                              the same symbolic representation for all
                              of them and just have different numerical
                              values?<br>
                              ><br>
                              >    Clusters and large-scale computing
                              centers are notoriously terrible at IO;
                              often IO is orders of magnitude slower
                              than compute/memory, making this type of
                              workflow unrealistically slow. From a
                              cost-analysis point of view, often just
                              buying lots of memory might be the most
                              efficacious approach.<br>
                              ><br>
                              >    That said, what you suggest might
                              require only a few lines of code (though
                              determining where to put them is the
                              tricky part), depending on the MUMPS
                              interface for saving a file to disk. What
                              we would do is keep the PETSc wrapper that
                              lives around the MUMPS matrix Mat_MUMPS
                              but, using the MUMPS API, save the
                              information in the DMUMPS_STRUC_C id; and
                              then reload it when needed.<br>
                              ><br>
                              >    The user-level API could be
                              something like<br>
                              ><br>
                              >    MatMumpsSaveToDisk(Mat) and
                              MatMumpsLoadFromDisk(Mat); they would just
                              monkey with the DMUMPS_STRUC_C id; item.<br>
                              ><br>
                              ><br>
                              >    Barry<br>
                              ><br>
                              ><br>
                              >> On Jul 23, 2019, at 9:24 AM,
                              Thibaut Appel via petsc-users <<a
                                href="mailto:petsc-users@mcs.anl.gov"
                                target="_blank" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
                              wrote:<br>
                              >><br>
                              >> Dear PETSc users,<br>
                              >><br>
                              >> I need to solve several linear
                              systems successively, with LU
                              factorization, as part of an iterative
                              process in my Fortran application code.<br>
                              >><br>
                              >> The process would solve M systems
                              (A_m)(x_m,i) = (b_m,i) for m=1,M at each
                              iteration i, while computing the LU
                              factorization of each A_m only once.<br>
                              >> The RHSs (b_m,i+1) are computed
                              from all the different (x_m,i) and all
                              depend upon each other.<br>
                              >><br>
                              >> The way I envisage performing
                              that is to use MUMPS to compute,
                              successively, each of the LU
                              factorizations (m) in parallel and store
                              the factors on disk,
                              creating/assembling/destroying the
                              matrices A_m on the fly.<br>
                              >> Then whenever needed, read the
                              factors in parallel to solve the systems.
                              Since version 5.2, MUMPS has a
                              save/restore feature that allows that, see
                              <a
                                href="http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf"
                                rel="noreferrer" target="_blank"
                                moz-do-not-send="true">
http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf</a> p.20, 24 and 58.<br>
                              >><br>
                              >> In its current state, the
                              PETSc/MUMPS interface does not incorporate
                              that feature. I'm an advanced Fortran
                              programmer, but not in C, so I don't think I
                              would do an amazing job having a go inside
                              src/mat/impls/aij/mpi/mumps/mumps.c.<br>
                              >><br>
                              >> I was picturing something like
                              creating as many KSP objects as linear
                              systems to be solved, with some sort of
                              flag to force the storage of LU factors on
                              disk after the first call to KSPSolve.
                              Then keep calling KSPSolve as many times
                              as needed.<br>
                              >><br>
                              >> Would you support such a feature?<br>
                              >><br>
                              >> Thanks for your support,<br>
                              >><br>
                              >> Thibaut<br>
                            </blockquote>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </blockquote>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>