[petsc-users] Best practices for solving Dense Linear systems
Nidish
nb25 at rice.edu
Sat Aug 8 16:55:46 CDT 2020
On 8/7/20 12:55 PM, Barry Smith wrote:
>
>
>> On Aug 7, 2020, at 12:26 PM, Nidish <nb25 at rice.edu> wrote:
>>
>>
>> On 8/7/20 8:52 AM, Barry Smith wrote:
>>>
>>>
>>>> On Aug 7, 2020, at 1:25 AM, Nidish <nb25 at rice.edu> wrote:
>>>>
>>>> Indeed - I was just using the default solver (GMRES with ILU).
>>>>
>>>> Using just standard LU (direct solve with "-pc_type lu -ksp_type
>>>> preonly"), I find elemental to be extremely slow even for a
>>>> 1000x1000 matrix.
>>>>
>>>
>>> What about on one process?
>> On just one process the performance is comparable.
>>>
>>> Elemental generally won't be competitive for such tiny matrices.
>>>>
>>>> For MPIAIJ it throws an error if I try "-pc_type lu".
>>>>
>>>
>>> Yes, there is no PETSc code for a sparse parallel direct solve, so
>>> this is expected.
>>>
>>> What about the following?
>>>
>>>> mpirun -n 1 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi
>>>>
>>>> mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi
>>>>
>> Same results - the elemental version is MUCH slower (for 1000x1000).
>>> Where will your dense matrices come from, and how big will they
>>> be in practice? This will help determine whether an iterative solver
>>> is appropriate. If they will be 100,000 by 100,000, for example, then
>>> testing with 1000 will tell you nothing useful; you need to test with
>>> the problem size you care about.
>>
>> The matrices in my application arise from substructuring/Component
>> Mode Synthesis conducted on a system that is linear "almost
>> everywhere", for example jointed systems. The procedure we follow is:
>> build a mesh and identify the nodes corresponding to the interfaces,
>> then reduce the model using component mode synthesis to obtain a
>> representation of the system using just the interface
>> degrees-of-freedom along with some (~10s of) generalized "modal
>> coordinates". We conduct the non-linear analyses (transient,
>> steady-state harmonic, etc.) using these matrices.
>>
>> I am interested in conducting a non-linear mesh-convergence study for
>> a particular system of interest wherein the interface DoFs number
>> approximately 4000, 8000, 12000, and 16000. I'm fairly certain the
>> dense matrices will not be larger. The
>>
>
> Ok, so it is not clear how well conditioned these dense matrices
> will be.
>
> There are three questions that need to be answered.
>
> 1) For your problem, can iterative methods be used, and will they
> require less work than direct solvers?
>
> For direct LU the work is order N^3 for the factorization,
> with a relatively small constant. Because of the smart organization
> inside dense LU, the flops can be done very efficiently.
>
> For GMRES with Jacobi preconditioning the work is order N^2
> (the time for a dense matrix-vector product) for each iteration. If
> the number of iterations is small then the total work is much less
> than for a direct solver. In the worst case the number of iterations
> is order N, so the total work is order N^3, the same order as a direct
> method. But the efficiency of a dense matrix-vector product is much
> lower than the efficiency of an LU factorization, so even if the work
> is the same order it can take longer. One should use mpidense as the
> matrix format for iterative methods.
>
> With iterative methods YOU get to decide how accurate you need
> your solution to be; you do this by setting how small you want the
> residual to be (since you can't directly control the error). By
> default PETSc uses a relative decrease in the residual of 1e-5.
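As an aside on controlling that tolerance: the relative residual
tolerance can be tightened from the command line with -ksp_rtol, and
-ksp_monitor prints the residual at each iteration (the same settings
are available in code through KSPSetTolerances()). For example, with
the ksps driver from this thread:

    mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -ksp_type gmres \
        -pc_type jacobi -ksp_rtol 1e-8 -ksp_monitor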
>
> 2) For your size problems, can parallelism help?
>
> I think it should, but Elemental, since it requires a different data
> layout, has additional overhead getting the data into the optimal
> format for parallelism.
>
> 3) Can parallelism help on YOUR machine? Just because a machine has
> multiple cores does not mean it can utilize them efficiently for
> solvers if the total machine memory bandwidth is limited.
>
> So the first thing to do, on the machine you plan to use for your
> computations, is to run the streams benchmark discussed in
> https://www.mcs.anl.gov/petsc/documentation/faq.html#computers. This
> will give us some general idea of how much parallelism you can take
> advantage of. Is the machine a parallel cluster or just a single node?
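For reference, the streams benchmark mentioned above can be run from
the top of a configured PETSc source tree (NPMAX sets the largest
process count tried; exact usage may differ slightly between PETSc
versions):

    cd $PETSC_DIR
    make streams NPMAX=4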
>
> After this I'll give you a few specific cases to run to get a
> feeling for what approach would be best for your problems.
>
> Barry
>
Thank you for the responses. Here's a pointwise response to your queries:
1) I am presently working with random matrices (with a large constant
value on the diagonal to ensure diagonal dominance) before I start
working with the matrices from my system. At the end of the day, the
matrices I expect to be using can be thought of as Schur complements
of a Laplacian operator.
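A minimal sketch of assembling such a test matrix in PETSc is below;
the size N and the magnitude of the diagonal shift are illustrative,
and this is not the actual ksps.cpp attached earlier:

    #include <petscmat.h>
    /* ... after PetscInitialize() ... */
    Mat      A;
    PetscInt N = 1000;                     /* illustrative size          */
    MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                   N, N, NULL, &A);        /* MPIDENSE on >1 process     */
    MatSetRandom(A, NULL);                 /* fill with random entries   */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
    MatShift(A, (PetscScalar)N);           /* add N*I: diagonal dominance */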
2) Since my application is joint dynamics, I have a non-linear function
that has to be evaluated at quadrature locations on a 2D mesh and
integrated to form the residual vector as well as the Jacobian matrices.
I therefore definitely expect some potential for speedup in the function
evaluations.
Since the matrices I will end up with will be dense (at least for static
simulations), I wanted guidance on the best solver options for my
problem.
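For context on where those dense Jacobians would eventually plug in,
here is a rough sketch of the usual SNES wiring. FormResidual and
FormJacobian are hypothetical callbacks standing in for the quadrature
evaluation described above; this is not the code from this thread:

    #include <petscsnes.h>

    /* Hypothetical callbacks: evaluate the residual and the (dense)
       Jacobian from the quadrature-point data. */
    extern PetscErrorCode FormResidual(SNES, Vec, Vec, void *);
    extern PetscErrorCode FormJacobian(SNES, Vec, Mat, Mat, void *);

    /* ... after PetscInitialize(), with N set to the interface size ... */
    SNES snes;
    Mat  J;
    Vec  u, r;
    MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                   N, N, NULL, &J);
    MatCreateVecs(J, &u, &r);
    SNESCreate(PETSC_COMM_WORLD, &snes);
    SNESSetFunction(snes, r, FormResidual, NULL);
    SNESSetJacobian(snes, J, J, FormJacobian, NULL);
    SNESSetFromOptions(snes);   /* inner KSP/PC still chosen at run time,
                                   e.g. -ksp_type gmres -pc_type jacobi */
    SNESSolve(snes, NULL, u);

With this setup the linear-solver experiments discussed earlier apply
unchanged to the inner solves of each Newton iteration.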
3) I am presently on an octa-core (4 physical cores) machine with 16
GB of RAM. I plan to conduct code development and benchmarking on this
machine before I start running larger models on a cluster I have access
to.
I was unable to run the streams benchmark on the cluster (PETSc 3.11.1
is installed there, and the benchmarks in the git directory were giving
issues), but I was able to run it on my local machine - here's the
output:
scaling.log (processes, bandwidth, speedup over 1 process):
1 13697.5004 Rate (MB/s)
2 13021.9505 Rate (MB/s) 0.950681
3 12402.6925 Rate (MB/s) 0.905471
4 12309.1712 Rate (MB/s) 0.898644
Could you point me to the part in the documentation that speaks about
the different options available for dealing with dense matrices? I just
realized that bindings for MUMPS are available in PETSc.
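If it helps: MUMPS is a sparse direct solver, so in PETSc it pairs with
the (MPI)AIJ format rather than the dense formats. A sketch of
selecting it with the ksps driver (option name as of PETSc 3.9 and
later):

    mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -ksp_type preonly \
        -pc_type lu -pc_factor_mat_solver_type mumps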
Thank you very much,
Nidish
>
>> However, for frequency-domain simulations, we use matrices that are
>> about 10 times the size of the original matrices (whose meshes have
>> been shown to be convergent in static test cases).
>>
>> Thank you,
>> Nidish
>>
>>>
>>> Barry
>>>
>>>> I'm attaching the code here, in case you'd like to have a look at
>>>> what I've been trying to do.
>>>>
>>>> The two configurations of interest are,
>>>>
>>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij
>>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental
>>>>
>>>> (for the GMRES with ILU) and,
>>>>
>>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu
>>>> -ksp_type preonly
>>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu
>>>> -ksp_type preonly
>>>>
>>>> elemental seems to perform poorly in both cases.
>>>>
>>>> Nidish
>>>>
>>>> On 8/7/20 12:50 AM, Barry Smith wrote:
>>>>>
>>>>> What is the output of -ksp_view for the two cases?
>>>>>
>>>>> It is not only the matrix format but also the matrix solver that
>>>>> matters. For example, if you are using an iterative solver, the
>>>>> Elemental format won't be faster; you should use the PETSc
>>>>> MPIDENSE format. The Elemental format is really intended for use
>>>>> with a direct LU solver for the matrix. For tiny matrices like
>>>>> this an iterative solver could easily be faster than the direct
>>>>> solver; it depends on the conditioning (eigenstructure) of the
>>>>> dense matrix. Also, the default PETSc solver uses block Jacobi
>>>>> with ILU on each process if you are using a sparse format, and ILU
>>>>> applied to a dense matrix is actually LU, so your solver probably
>>>>> also differs between the MPIAIJ and the Elemental runs.
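For reference, that diagnostic output can be obtained by adding
-ksp_view to the earlier runs (and -log_view additionally reports where
the time is spent):

    mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -ksp_view -log_view
    mpirun -n 4 ./ksps -N 1000 -mat_type elemental -ksp_view -log_view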
>>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 7, 2020, at 12:30 AM, Nidish <nb25 at rice.edu> wrote:
>>>>>>
>>>>>> Thank you for the response.
>>>>>>
>>>>>> I've just been running some tests with matrices up to 2e4
>>>>>> dimensions (dense). When I compared the solution times for
>>>>>> "-mat_type elemental" and "-mat_type mpiaij" running with 4
>>>>>> cores, I found the mpidense versions running way faster than
>>>>>> elemental. I have not been able to make the elemental version
>>>>>> finish up for 2e4 so far (my patience runs out faster).
>>>>>>
>>>>>> What's going on here? I thought elemental was supposed to be
>>>>>> superior for dense matrices.
>>>>>>
>>>>>> I can share the code if that's appropriate for this forum (sorry,
>>>>>> I'm new here).
>>>>>>
>>>>>> Nidish
>>>>>> On Aug 6, 2020, at 23:01, Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>
>>>>>> On Aug 6, 2020, at 7:32 PM, Nidish <nb25 at rice.edu> wrote:
>>>>>>
>>>>>> I'm relatively new to PETSc, and my applications involve (for the
>>>>>> most part) dense matrix solves. I read in the documentation that
>>>>>> this is an area PETSc does not specialize in but instead
>>>>>> recommends external libraries such as Elemental. I'm wondering if
>>>>>> there are any "best" practices in this regard. Some questions I'd
>>>>>> like answered are:
>>>>>>
>>>>>> 1. Can I just declare my dense matrix as a sparse one and fill
>>>>>> the whole matrix up? Do any of the others go this route? What're
>>>>>> possible pitfalls/unfavorable outcomes for this? I understand the
>>>>>> memory overhead probably shoots up.
>>>>>>
>>>>>>
>>>>>> No, this isn't practical; the performance will be terrible.
>>>>>>
>>>>>> 2. Are there any specific guidelines on when I can expect
>>>>>> elemental to perform better in parallel than in serial?
>>>>>>
>>>>>>
>>>>>> Because the computation-to-communication ratio for dense matrices
>>>>>> is higher than for sparse ones, you will see better parallel
>>>>>> performance for dense problems of a given size than for sparse
>>>>>> problems of a similar size. In other words, parallelism can help
>>>>>> for dense matrices at relatively small problem sizes; of course,
>>>>>> the specifics of your machine's hardware and software also play a
>>>>>> role.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> Of course, I'm interested in any other details that may be
>>>>>> important in this regard.
>>>>>>
>>>>>> Thank you,
>>>>>> Nidish
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> Nidish
>>>> <ksps.cpp>
>>>
>> --
>> Nidish
>
--
Nidish