[petsc-users] Solving a sequence of linear systems stored on disk with MUMPS

Thu Jul 25 15:25:42 CDT 2019

Thibaut:
I added an example (in the branch hzhang/add-ksp-tutorials-ex6/master)
https://bitbucket.org/petsc/petsc/commits/cf847786fd804b3606d0281d404c4763f36fe475?at=hzhang/add-ksp-tutorials-ex6/master

You can run it with
mpiexec -n 2 ./ex6 -num_numfac 2 -pc_type lu -pc_factor_mat_solver_type
mumps -ksp_monitor -log_view
...
MatLUFactorSym         1 1.0 3.5911e-03
MatLUFactorNum         2 1.0 6.3920e-03

This shows the code does one symbolic factorization and two numeric
factorizations.
For your convenience, the code ex6.c is attached below.
Let me know if you have encounter any problems.
Hong

On Thu, Jul 25, 2019 at 1:30 PM Zhang, Hong via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Thibaut:
> I'm writing a simple example using KSP directly -- will send to you soon.
> Hong
>
>> Hi Hong,
>>
>> That sounds like a more reasonable approach, I had no idea the
>> PETSc/MUMPS interface could provide such a level of control on the solve
>> process. Therefore, after assembling the matrix A_0, I should do something
>> along the lines of:
>>
>> MatGetFactor(A_0, MATSOLVERMUMPS, MAT_FACTOR_LU, F)
>>
>> MatLUFactorSymbolic(F, A_0, NULL, NULL, info)
>>
>> MatLUFactorNumeric(F, A_0, info)
>>
>> and then call MatSolve? However I don't understand, I thought F would
>> remain the same during the whole process but it's an input parameter of
>> MatSolve so I'd need one F_m for each A_m? Which is not what you mentioned
>> (do one symbolic factorization only)
>>
>>
>> On a side note, after preallocating and assembling the first matrix,
>> should I create/assemble all the others with
>>
>> MatDuplicate(A_0, MAT_DO_NOT_COPY_VALUES, A_m)
>>
>> Calls to MatSetValues( ... )
>> MatAssemblyBegin(A_m, MAT_FINAL_ASSEMBLY)
>> MatAssemblyEnd(A_m, MAT_FINAL_ASSEMBLY)
>>
>> Is that the recommended/most scalable way of duplicating a matrix + its
>> non-zero structure?
>>
>>
>> Thank you for your support and suggestions,
>>
>> Thibaut
>>
>>
>> On 23/07/2019 18:38, Zhang, Hong wrote:
>>
>> Thibaut:
>>
>>> Thanks for taking the time. I would typically run that on a small
>>> cluster node of 16 or 32 physical cores with 2 or 4 sockets. I use 16 or
>>> 32 MPI ranks and bind them to cores.
>>>
>>> The matrices would ALL have the same size and the same nonzero structure
>>> - it's just a few numerical values that would differ.
>>>
>> You may do one symbolic factorization of A_m, use it in the m-i loop:
>> - numeric factorization of A_m
>> - solve A_m x_m,i = b_m,i
>> in mumps, numeric factorization and solve are scalable. Repeated numeric
>> factorization of A_m are likely faster than reading data files from the
>> disc.
>> Hong
>>
>>>
>>> This is a good point you've raised as I don't think MUMPS is able to
>>> exploit that - I asked the question in their users list just to be sure.
>>> There are some options in SuperLU dist to reuse permutation arrays, but
>>> there's no I/O for that solver. And the native PETSc LU solver is not
>>> parallel?
>>>
>>> I'm using high-order finite differences so I'm suffering from a lot of
>>> fill-in, one of the reasons why storing factorizations in RAM is not
>>> viable. In comparison, I have almost unlimited disk space.
>>>
>>> I'm aware my need might seem counter-intuitive, but I'm really willing
>>> to sacrifice performance in the I/O part. My code is already heavily
>>> based on PETSc (preallocation, assembly for matrices/vectors) coupled
>>> with MUMPS I'm minimizing the loss of efficiency.
>>>
>>> Thibaut
>>>
>>> On 23/07/2019 17:13, Smith, Barry F. wrote:
>>> >    What types of computing systems will you be doing the computations?
>>> Roughly how many MPI_ranks?
>>> >
>>> > Are the matrices all the same size? Do they have the same or different
>>> nonzero structures? Would it be possible to use the same symbolic
>>> representation for all of them and just have different numerical values?
>>> >
>>> >    Clusters and large scale computing centers are notoriously terrible
>>> at IO; often IO is orders of magnitude slower than compute/memory making
>>> this type of workflow unrealistically slow. From a cost analysis point of
>>> view often just buying lots of memory might be the most  efficacious
>>> approach.
>>> >
>>> >    That said, what you suggest might require only a few lines of code
>>> (though determining where to put them is the tricky part) depending on the
>>> MUMPS interface for saving a filer to disk. What we would do is keep the
>>> PETSc wrapper that lives around the MUMPS matrix Mat_MUMPS but using the
>>> MUMPS API save the information in the DMUMPS_STRUC_C id; and then reload it
>>> when needed.
>>> >
>>> >    The user level API could be something like
>>> >
>>> >    MatMumpsSaveToDisk(Mat) and MatMumpsLoadFromDisk(Mat) they would
>>> just money with DMUMPS_STRUC_C id; item.
>>> >
>>> >
>>> >    Barry
>>> >
>>> >
>>> >> On Jul 23, 2019, at 9:24 AM, Thibaut Appel via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>> >>
>>> >> Dear PETSc users,
>>> >>
>>> >> I need to solve several linear systems successively, with LU
>>> factorization, as part of an iterative process in my Fortran application
>>> code.
>>> >>
>>> >> The process would solve M systems (A_m)(x_m,i) = (b_m,i) for m=1,M at
>>> each iteration i, but computing the LU factorization of A_m only once.
>>> >> The RHSs (b_m,i+1) are computed from all the different (x_m,i) and
>>> all depend upon each other.
>>> >>
>>> >> The way I envisage to perform that is to use MUMPS to compute,
>>> successively, each of the LU factorizations (m) in parallel and store the
>>> factors on disk, creating/assembling/destroying the matrices A_m on the go.
>>> >> Then whenever needed, read the factors in parallel to solve the
>>> systems. Since version 5.2, MUMPS has a save/restore feature that allows
>>> that, see http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf p.20, 24 and
>>> 58.
>>> >>
>>> >> In its current state, the PETSc/MUMPS interface does not incorporate
>>> that feature. I'm an advanced Fortran programmer but not in C so I don't
>>> think I would do an amazing job having a go inside
>>> src/mat/impls/aij/mpi/mumps/mumps.c.
>>> >>
>>> >> I was picturing something like creating as many KSP objects as linear
>>> systems to be solved, with some sort of flag to force the storage of LU
>>> factors on disk after the first call to KSPSolve. Then keep calling
>>> KSPSolve as many times as needed.
>>> >>
>>> >> Would you support such a feature?
>>> >>
>>> >> Thanks for your support,
>>> >>
>>> >> Thibaut
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190725/9bac6be2/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex6.c
Type: application/octet-stream
Size: 3320 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190725/9bac6be2/attachment-0001.obj>