<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Hong,</p>
    <p>That's exactly what I was looking for; that's perfect.</p>
    <p>This will be of significant help.<br>
    </p>
    <p>Have a nice weekend,<br>
    </p>
    <p>Thibaut<br>
    </p>
    <div class="moz-cite-prefix">On 01/08/2019 19:15, Zhang, Hong wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAGCphBuhfq4zW-8Zh_T=kOx78vvg3tOExEp+jLwLFT=dB8iZ_A@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">Thibaut :<br>
        </div>
        <div>In the branch hzhang/add-ksp-tutorials-ex6/master, I added
          another example</div>
        <div>src/mat/examples/tests/ex28.c<br>
        </div>
        <div>which creates A[k], k=0,...,4 with the same data structure.</div>
        <div>Using a single symbolic factor F, it runs a loop that updates
          the numerical values of A[k], redoes the numeric factorization and
          solves.</div>
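        <div>Schematically, that pattern looks like the following (an
          untested sketch, not the actual ex28.c; creation and assembly of
          A, x and b are omitted and only the PETSc calls discussed in this
          thread are used):</div>
        <pre>
Mat           A, F;      /* A is refilled with the values of each A[k] */
Vec           x, b;
IS            rowperm, colperm;
MatFactorInfo info;
PetscInt      k;

/* one symbolic factorization, shared by every A[k] */
MatGetFactor(A, MATSOLVERMUMPS, MAT_FACTOR_LU, &amp;F);
MatGetOrdering(A, MATORDERINGNATURAL, &amp;rowperm, &amp;colperm);
MatFactorInfoInitialize(&amp;info);
MatLUFactorSymbolic(F, A, rowperm, colperm, &amp;info);

for (k = 0; k &lt; 5; k++) {
  /* refill A with the values of A[k]; the nonzero pattern must not change */
  /* MatSetValues(A, ...); MatAssemblyBegin/End(A, MAT_FINAL_ASSEMBLY);     */
  MatLUFactorNumeric(F, A, &amp;info);  /* reuses the symbolic factor F */
  MatSolve(F, b, x);                    /* solve A[k] x = b             */
}</pre>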
        <div>Hong</div>
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <p>Hi Hong,</p>
              <p>Thanks very much for that example; I appreciate it.</p>
              <p>I'll test that in a few days and come back with
                questions if needed,</p>
              <p><br>
              </p>
              <p>Thibaut<br>
              </p>
              <div class="gmail-m_1776711103632413238moz-cite-prefix">On
                25/07/2019 21:25, <a
                  class="gmail-m_1776711103632413238moz-txt-link-abbreviated"
                  href="mailto:hong@aspiritech.org" target="_blank"
                  moz-do-not-send="true">
                  hong@aspiritech.org</a> wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr">Thibaut:<br>
                  </div>
                  <div>I added an example (in the
                    branch hzhang/add-ksp-tutorials-ex6/master)</div>
                  <div><a
href="https://bitbucket.org/petsc/petsc/commits/cf847786fd804b3606d0281d404c4763f36fe475?at=hzhang/add-ksp-tutorials-ex6/master"
                      target="_blank" moz-do-not-send="true">https://bitbucket.org/petsc/petsc/commits/cf847786fd804b3606d0281d404c4763f36fe475?at=hzhang/add-ksp-tutorials-ex6/master</a><br>
                  </div>
                  <div><br>
                  </div>
                  <div>You can run it with</div>
                  <div>mpiexec -n 2 ./ex6 -num_numfac 2 -pc_type lu
                    -pc_factor_mat_solver_type mumps -ksp_monitor
                    -log_view<br>
                  </div>
                  <div>...</div>
                  <div>MatLUFactorSym         1 1.0 3.5911e-03 <br>
                    MatLUFactorNum         2 1.0 6.3920e-03<br>
                  </div>
                  <div><br>
                  </div>
                  <div>This shows the code does one symbolic
                    factorization and two numeric factorizations.</div>
                  <div>For your convenience, the code ex6.c is attached
                    below.</div>
                  <div>Let me know if you encounter any problems.</div>
                  <div>Hong</div>
                  <div><br>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Thu, Jul 25,
                      2019 at 1:30 PM Zhang, Hong via petsc-users <<a
                        href="mailto:petsc-users@mcs.anl.gov"
                        target="_blank" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div>
                        <div dir="ltr">
                          <div dir="ltr">Thibaut:<br>
                          </div>
                          <div>I'm writing a simple example using KSP
                            directly -- I will send it to you soon.</div>
                          <div>Hong</div>
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0px 0px 0px
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              <div bgcolor="#FFFFFF">
                                <p>Hi Hong,</p>
                                <p>That sounds like a more reasonable
                                  approach; I had no idea the
                                  PETSc/MUMPS interface could provide
                                  such a level of control over the solve
                                  process. Therefore, after assembling
                                  the matrix A_0, I should do something
                                  along the lines of:</p>
                                <p><font face="Courier New, Courier,
                                    monospace">MatGetFactor(A_0,
                                    MATSOLVERMUMPS, MAT_FACTOR_LU, F)</font></p>
                                <p><font face="Courier New, Courier,
                                    monospace">MatLUFactorSymbolic(F,
                                    A_0, NULL, NULL, info)</font></p>
                                <p><font face="Courier New, Courier,
                                    monospace">MatLUFactorNumeric(F,
                                    A_0, info)</font></p>
                                <p>and then call MatSolve? However, there is
                                  something I don't understand: I thought F
                                  would remain the same during the whole
                                  process, but since it is an input parameter
                                  of MatSolve, wouldn't I need one F_m for
                                  each A_m? That is not what you mentioned
                                  (do one symbolic factorization only).</p>
                                <p><br>
                                </p>
                                <p>On a side note, after preallocating
                                  and assembling the first matrix,
                                  should I create/assemble all the
                                  others with</p>
                                <p><font face="Courier New, Courier,
                                    monospace">MatDuplicate(A_0,
                                    MAT_DO_NOT_COPY_VALUES, A_m)</font></p>
                                <p><font face="Courier New, Courier,
                                    monospace">Calls to MatSetValues(
                                    ... )<br>
                                  </font></p>
                                <font face="Courier New, Courier,
                                  monospace">MatAssemblyBegin(A_m,
                                  MAT_FINAL_ASSEMBLY)<br>
                                  MatAssemblyEnd(A_m,
                                  MAT_FINAL_ASSEMBLY)</font><br>
                                <br>
                                <p>Is that the recommended/most scalable
                                  way of duplicating a matrix + its
                                  non-zero structure?<br>
                                </p>
                                <p><br>
                                </p>
                                <p>Thank you for your support and
                                  suggestions,</p>
                                <p>Thibaut</p>
                                <p><br>
                                </p>
                                <div
class="gmail-m_1776711103632413238gmail-m_1409220144151092612gmail-m_7299842392715427480moz-cite-prefix">On
                                  23/07/2019 18:38, Zhang, Hong wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div dir="ltr">Thibaut:</div>
                                    <div class="gmail_quote">
                                      <blockquote class="gmail_quote"
                                        style="margin:0px 0px 0px
                                        0.8ex;border-left:1px solid
                                        rgb(204,204,204);padding-left:1ex">
                                        Thanks for taking the time. I
                                        would typically run that on a
                                        small <br>
                                        cluster node of 16 or 32
                                        physical cores with 2 or 4
                                        sockets. I use 16 or <br>
                                        32 MPI ranks and bind them to
                                        cores.<br>
                                        <br>
                                        The matrices would ALL have the
                                        same size and the same nonzero
                                        structure <br>
                                        - it's just a few numerical
                                        values that would differ.<br>
                                      </blockquote>
                                      <div>You may do one symbolic
                                        factorization of A_m and reuse it in
                                        the (m, i) loop (see the sketch
                                        below):</div>
                                      <div>- numeric factorization of
                                        A_m</div>
                                      <div>- solve A_m x_m,i = b_m,i</div>
                                      <div>In MUMPS, numeric
                                        factorization and solve are
                                        scalable. Repeated numeric
                                        factorizations of A_m are likely
                                        faster than reading data files
                                        from disk.</div>
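                                      <div>Schematically (an untested
                                        sketch; F is set up once beforehand
                                        with MatGetFactor and
                                        MatLUFactorSymbolic as above, and
                                        FillValues/ComputeRHS are
                                        placeholders for your own assembly
                                        routines):</div>
                                      <pre>
for (i = 0; i &lt; niter; i++) {       /* outer iteration                  */
  for (m = 0; m &lt; M; m++) {         /* loop over the M systems          */
    FillValues(A, m, i);              /* same nonzero pattern, new values */
    MatLUFactorNumeric(F, A, &amp;info);  /* numeric factorization of A_m */
    ComputeRHS(b, m, i);              /* b_m,i depends on earlier x_m,i   */
    MatSolve(F, b, x);                /* solve A_m x_m,i = b_m,i          */
  }
}</pre>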
                                      <div>Hong</div>
                                      <blockquote class="gmail_quote"
                                        style="margin:0px 0px 0px
                                        0.8ex;border-left:1px solid
                                        rgb(204,204,204);padding-left:1ex">
                                        <br>
                                        This is a good point you've
                                        raised, as I don't think MUMPS is
                                        able to <br>
                                        exploit that - I asked the
                                        question on their users list
                                        just to be sure. <br>
                                        There are some options in
                                        SuperLU_DIST to reuse
                                        permutation arrays, but <br>
                                        there's no I/O for that solver.
                                        And the native PETSc LU solver
                                        is not <br>
                                        parallel?<br>
                                        <br>
                                        I'm using high-order finite
                                        differences, so I suffer
                                        from a lot of <br>
                                        fill-in, which is one of the reasons
                                        why storing the factorizations in
                                        RAM is not <br>
                                        viable. In comparison, I have
                                        almost unlimited disk space.<br>
                                        <br>
                                        I'm aware my need might seem
                                        counter-intuitive, but I'm
                                        really willing <br>
                                        to sacrifice performance in the
                                        I/O part. My code is already
                                        heavily <br>
                                        based on PETSc (preallocation,
                                        assembly for matrices/vectors)
                                        coupled <br>
                                        with MUMPS, so I'm minimizing the
                                        loss of efficiency.<br>
                                        <br>
                                        Thibaut<br>
                                        <br>
                                        On 23/07/2019 17:13, Smith,
                                        Barry F. wrote:<br>
                                        >    On what types of computing
                                        systems will you be doing the
                                        computations? Roughly how many
                                        MPI ranks?<br>
                                        ><br>
                                        > Are the matrices all the
                                        same size? Do they have the same
                                        or different nonzero structures?
                                        Would it be possible to use the
                                        same symbolic representation for
                                        all of them and just have
                                        different numerical values?<br>
                                        ><br>
                                        >    Clusters and large-scale
                                        computing centers are
                                        notoriously terrible at I/O;
                                        often I/O is orders of magnitude
                                        slower than compute/memory,
                                        making this type of workflow
                                        unrealistically slow. From a
                                        cost-analysis point of view,
                                        often just buying lots of memory
                                        might be the most efficacious
                                        approach.<br>
                                        ><br>
                                        >    That said, what you
                                        suggest might require only a few
                                        lines of code (though
                                        determining where to put them is
                                        the tricky part), depending on
                                        the MUMPS interface for saving a
                                        factorization to disk. What we
                                        would do is keep the PETSc wrapper
                                        that lives around the MUMPS matrix
                                        Mat_MUMPS but, using the MUMPS
                                        API, save the information in the
                                        DMUMPS_STRUC_C id; and then
                                        reload it when needed.<br>
                                        ><br>
                                        >    The user-level API could
                                        be something like<br>
                                        ><br>
                                        >    MatMumpsSaveToDisk(Mat)
                                        and MatMumpsLoadFromDisk(Mat);
                                        they would just monkey with the
                                        DMUMPS_STRUC_C id; item.<br>
                                        ><br>
                                        ><br>
                                        >    Barry<br>
                                        ><br>
                                        ><br>
                                        >> On Jul 23, 2019, at
                                        9:24 AM, Thibaut Appel via
                                        petsc-users <<a
                                          href="mailto:petsc-users@mcs.anl.gov"
                                          target="_blank"
                                          moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
                                        wrote:<br>
                                        >><br>
                                        >> Dear PETSc users,<br>
                                        >><br>
                                        >> I need to solve several
                                        linear systems successively,
                                        with LU factorization, as part
                                        of an iterative process in my
                                        Fortran application code.<br>
                                        >><br>
                                        >> The process would solve
                                        M systems (A_m)(x_m,i) = (b_m,i)
                                        for m=1,...,M at each iteration i,
                                        while computing the LU
                                        factorization of A_m only once.<br>
                                        >> The RHSs (b_m,i+1) are
                                        computed from all the different
                                        (x_m,i) and all depend upon each
                                        other.<br>
                                        >><br>
                                        >> The way I envisage performing
                                        that is to use MUMPS to
                                        compute, successively, each of
                                        the LU factorizations (m) in
                                        parallel and store the factors
                                        on disk,
                                        creating/assembling/destroying
                                        the matrices A_m on the go.<br>
                                        >> Then whenever needed,
                                        read the factors in parallel to
                                        solve the systems. Since version
                                        5.2, MUMPS has a save/restore
                                        feature that allows that, see
                                        <a
                                          href="http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf"
                                          rel="noreferrer"
                                          target="_blank"
                                          moz-do-not-send="true">
http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf</a> p.20, 24 and 58.<br>
                                        >><br>
                                        >> In its current state,
                                        the PETSc/MUMPS interface does
                                        not incorporate that feature.
                                        I'm an advanced Fortran
                                        programmer but not a C one, so I
                                        don't think I would do an
                                        amazing job having a go inside
                                        src/mat/impls/aij/mpi/mumps/mumps.c.<br>
                                        >><br>
                                        >> I was picturing
                                        something like creating as many
                                        KSP objects as linear systems to
                                        be solved, with some sort of
                                        flag to force the storage of LU
                                        factors on disk after the first
                                        call to KSPSolve. Then keep
                                        calling KSPSolve as many times
                                        as needed.<br>
                                        >><br>
                                        >> Would you support such
                                        a feature?<br>
                                        >><br>
                                        >> Thanks for your
                                        support,<br>
                                        >><br>
                                        >> Thibaut<br>
                                      </blockquote>
                                    </div>
                                  </div>
                                </blockquote>
                              </div>
                            </blockquote>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                </div>
              </blockquote>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>