[petsc-users] Solving a sequence of linear systems stored on disk with MUMPS

Thibaut Appel t.appel17 at imperial.ac.uk
Fri Aug 2 08:17:56 CDT 2019


Hi Hong,

That's exactly what I was looking for; that's perfect.

This will be of significant help.

Have a nice weekend,

Thibaut

On 01/08/2019 19:15, Zhang, Hong wrote:
> Thibaut :
> In the branch hzhang/add-ksp-tutorials-ex6/master, I added another example
> src/mat/examples/tests/ex28.c
> which creates A[k], k=0,...,4, with the same data structure.
> Using a single symbolic factor F, it runs a loop that updates the
> numerical values of A[k], refactors numerically, and solves.
> Hong
>
>     Hi Hong,
>
>     Thanks very much for that example; I appreciate it.
>
>     I'll test that in a few days and come back with questions if needed,
>
>
>     Thibaut
>
>     On 25/07/2019 21:25, hong at aspiritech.org wrote:
>>     Thibaut:
>>     I added an example (in the
>>     branch hzhang/add-ksp-tutorials-ex6/master)
>>     https://bitbucket.org/petsc/petsc/commits/cf847786fd804b3606d0281d404c4763f36fe475?at=hzhang/add-ksp-tutorials-ex6/master
>>
>>     You can run it with
>>     mpiexec -n 2 ./ex6 -num_numfac 2 -pc_type lu
>>     -pc_factor_mat_solver_type mumps -ksp_monitor -log_view
>>     ...
>>     MatLUFactorSym         1 1.0 3.5911e-03
>>     MatLUFactorNum         2 1.0 6.3920e-03
>>
>>     This shows the code does one symbolic factorization and two
>>     numeric factorizations.
>>     For your convenience, the code ex6.c is attached below.
>>     Let me know if you encounter any problems.
>>     Hong
>>
>>
>>     On Thu, Jul 25, 2019 at 1:30 PM Zhang, Hong via petsc-users
>>     <petsc-users at mcs.anl.gov> wrote:
>>
>>         Thibaut:
>>         I'm writing a simple example using KSP directly -- will send
>>         it to you soon.
>>         Hong
>>
>>             Hi Hong,
>>
>>             That sounds like a more reasonable approach; I had no
>>             idea the PETSc/MUMPS interface could provide such a level
>>             of control over the solve process. Therefore, after
>>             assembling the matrix A_0, I should do something along
>>             the lines of:
>>
>>             MatGetFactor(A_0, MATSOLVERMUMPS, MAT_FACTOR_LU, F)
>>
>>             MatLUFactorSymbolic(F, A_0, NULL, NULL, info)
>>
>>             MatLUFactorNumeric(F, A_0, info)
>>
>>             and then call MatSolve? However, I don't understand: I
>>             thought F would remain the same during the whole process,
>>             but since it's an input parameter of MatSolve, wouldn't I
>>             need one F_m for each A_m? That is not what you mentioned
>>             (do one symbolic factorization only).
>>
>>
>>             On a side note, after preallocating and assembling the
>>             first matrix, should I create/assemble all the others with
>>
>>             MatDuplicate(A_0, MAT_DO_NOT_COPY_VALUES, A_m)
>>
>>             Calls to MatSetValues( ... )
>>
>>             MatAssemblyBegin(A_m, MAT_FINAL_ASSEMBLY)
>>             MatAssemblyEnd(A_m, MAT_FINAL_ASSEMBLY)
>>
>>             Is that the recommended/most scalable way of duplicating
>>             a matrix + its non-zero structure?
>>
>>
>>             Thank you for your support and suggestions,
>>
>>             Thibaut
>>
>>
>>             On 23/07/2019 18:38, Zhang, Hong wrote:
>>>             Thibaut:
>>>
>>>                 Thanks for taking the time. I would typically run
>>>                 that on a small cluster node of 16 or 32 physical
>>>                 cores with 2 or 4 sockets. I use 16 or 32 MPI ranks
>>>                 and bind them to cores.
>>>
>>>                 The matrices would ALL have the same size and the
>>>                 same nonzero structure; it's just a few numerical
>>>                 values that would differ.
>>>
>>>             You may do one symbolic factorization of A_m and reuse
>>>             it in the m-i loop:
>>>             - numeric factorization of A_m
>>>             - solve A_m x_m,i = b_m,i
>>>             In MUMPS, numeric factorization and solve are scalable.
>>>             Repeated numeric factorizations of A_m are likely faster
>>>             than reading data files from disk.
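>>>
>>>             In code, that pattern might look like the sketch below
>>>             (an illustrative fragment only, not a tested
>>>             implementation: A[m], b[m][i], x[m][i], M and nrhs are
>>>             assumed to exist, error checking is omitted, and the
>>>             NULL orderings assume the solver computes its own):
>>>
>>>             Mat           F;
>>>             MatFactorInfo info;
>>>             PetscInt      m, i;
>>>
>>>             MatFactorInfoInitialize(&info);
>>>             /* one factor object; all A_m share the nonzero pattern */
>>>             MatGetFactor(A[0], MATSOLVERMUMPS, MAT_FACTOR_LU, &F);
>>>             MatLUFactorSymbolic(F, A[0], NULL, NULL, &info); /* once */
>>>             for (m = 0; m < M; m++) {
>>>               MatLUFactorNumeric(F, A[m], &info); /* new values, same pattern */
>>>               for (i = 0; i < nrhs; i++)
>>>                 MatSolve(F, b[m][i], x[m][i]);    /* A_m x_m,i = b_m,i */
>>>             }
>>>             MatDestroy(&F);
>>>
>>>             Note F is created once and reused by every
>>>             MatLUFactorNumeric() and MatSolve() call.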
>>>             Hong
>>>
>>>
>>>                 This is a good point you've raised, as I don't think
>>>                 MUMPS is able to exploit that; I asked the question
>>>                 in their users list just to be sure.
>>>                 There are some options in SuperLU_DIST to reuse
>>>                 permutation arrays, but there's no I/O for that
>>>                 solver. And the native PETSc LU solver is not
>>>                 parallel?
>>>
>>>                 I'm using high-order finite differences, so I'm
>>>                 suffering from a lot of fill-in, which is one of the
>>>                 reasons why storing factorizations in RAM is not
>>>                 viable. In comparison, I have almost unlimited disk
>>>                 space.
>>>
>>>                 I'm aware my need might seem counter-intuitive, but
>>>                 I'm really willing to sacrifice performance in the
>>>                 I/O part. My code is already heavily based on PETSc
>>>                 (preallocation, assembly for matrices/vectors)
>>>                 coupled with MUMPS, so I'm minimizing the loss of
>>>                 efficiency.
>>>
>>>                 Thibaut
>>>
>>>                 On 23/07/2019 17:13, Smith, Barry F. wrote:
>>>                 >    On what types of computing systems will you be
>>>                 doing the computations? Roughly how many MPI ranks?
>>>                 >
>>>                 > Are the matrices all the same size? Do they have
>>>                 the same or different nonzero structures? Would it
>>>                 be possible to use the same symbolic representation
>>>                 for all of them and just have different numerical
>>>                 values?
>>>                 >
>>>                 >    Clusters and large-scale computing centers are
>>>                 notoriously terrible at I/O; often I/O is orders of
>>>                 magnitude slower than compute/memory, making this
>>>                 type of workflow unrealistically slow. From a cost
>>>                 analysis point of view, often just buying lots of
>>>                 memory might be the most efficacious approach.
>>>                 >
>>>                 >    That said, what you suggest might require only
>>>                 a few lines of code (though determining where to put
>>>                 them is the tricky part) depending on the MUMPS
>>>                 interface for saving a file to disk. What we would
>>>                 do is keep the PETSc wrapper that lives around the
>>>                 MUMPS matrix, Mat_MUMPS, but use the MUMPS API to
>>>                 save the information in the DMUMPS_STRUC_C id; and
>>>                 then reload it when needed.
>>>                 >
>>>                 >    The user level API could be something like
>>>                 >
>>>                 >    MatMumpsSaveToDisk(Mat) and
>>>                 MatMumpsLoadFromDisk(Mat); they would just monkey
>>>                 with the DMUMPS_STRUC_C id; item.
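>>>                 >
>>>                 >    For illustration only, here is a very rough,
>>>                 untested sketch of what the MUMPS side of such
>>>                 wrappers might look like. It assumes the MUMPS 5.2
>>>                 save/restore interface (JOB=7 saves, JOB=8 restores,
>>>                 driven by the save_dir/save_prefix fields of
>>>                 DMUMPS_STRUC_C, per the user guide); inside PETSc,
>>>                 the id struct would be fetched from the Mat's
>>>                 Mat_MUMPS context in mumps.c:
>>>                 >
>>>                 > #include <string.h>
>>>                 > #include <dmumps_c.h>
>>>                 >
>>>                 > /* hypothetical helper: write the current factors to disk */
>>>                 > static int MumpsSaveToDisk(DMUMPS_STRUC_C *id,
>>>                 >                            const char *dir, const char *prefix)
>>>                 > {
>>>                 >   strncpy(id->save_dir, dir, sizeof(id->save_dir) - 1);
>>>                 >   strncpy(id->save_prefix, prefix, sizeof(id->save_prefix) - 1);
>>>                 >   id->job = 7;          /* JOB=7: save the instance */
>>>                 >   dmumps_c(id);
>>>                 >   return id->infog[0];  /* INFOG(1) < 0 indicates an error */
>>>                 > }
>>>                 >
>>>                 > /* hypothetical helper: reload previously saved factors */
>>>                 > static int MumpsLoadFromDisk(DMUMPS_STRUC_C *id,
>>>                 >                              const char *dir, const char *prefix)
>>>                 > {
>>>                 >   strncpy(id->save_dir, dir, sizeof(id->save_dir) - 1);
>>>                 >   strncpy(id->save_prefix, prefix, sizeof(id->save_prefix) - 1);
>>>                 >   id->job = 8;          /* JOB=8: restore the instance */
>>>                 >   dmumps_c(id);
>>>                 >   return id->infog[0];
>>>                 > }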
>>>                 >
>>>                 >
>>>                 >    Barry
>>>                 >
>>>                 >
>>>                 >> On Jul 23, 2019, at 9:24 AM, Thibaut Appel via
>>>                 petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>                 >>
>>>                 >> Dear PETSc users,
>>>                 >>
>>>                 >> I need to solve several linear systems
>>>                 successively, with LU factorization, as part of an
>>>                 iterative process in my Fortran application code.
>>>                 >>
>>>                 >> The process would solve M systems (A_m)(x_m,i) =
>>>                 (b_m,i) for m=1,M at each iteration i, but compute
>>>                 the LU factorization of A_m only once.
>>>                 >> The RHSs (b_m,i+1) are computed from all the
>>>                 different (x_m,i) and all depend upon each other.
>>>                 >>
>>>                 >> The way I envisage performing that is to use
>>>                 MUMPS to compute, successively, each of the LU
>>>                 factorizations (m) in parallel and store the factors
>>>                 on disk, creating/assembling/destroying the matrices
>>>                 A_m on the go.
>>>                 >> Then, whenever needed, read the factors in
>>>                 parallel to solve the systems. Since version 5.2,
>>>                 MUMPS has a save/restore feature that allows this; see
>>>                 http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf
>>>                 (p. 20, 24 and 58).
>>>                 >>
>>>                 >> In its current state, the PETSc/MUMPS interface
>>>                 does not incorporate that feature. I'm an advanced
>>>                 Fortran programmer but not in C, so I don't think I
>>>                 would do an amazing job having a go inside
>>>                 src/mat/impls/aij/mpi/mumps/mumps.c.
>>>                 >>
>>>                 >> I was picturing something like creating as many
>>>                 KSP objects as linear systems to be solved, with
>>>                 some sort of flag to force the storage of LU factors
>>>                 on disk after the first call to KSPSolve. Then keep
>>>                 calling KSPSolve as many times as needed.
>>>                 >>
>>>                 >> Would you support such a feature?
>>>                 >>
>>>                 >> Thanks for your support,
>>>                 >>
>>>                 >> Thibaut
>>>