[petsc-users] Setting up MUMPS in PETSc

Hong Zhang hzhang at mcs.anl.gov
Tue Oct 23 14:02:51 CDT 2012


Jinquan:

>
> I have a question on how to use mumps properly in PETSc.  It appears that
> I didn't set up mumps right.  I followed the example in
>
> http://www.mcs.anl.gov/petsc/petsc-dev/src/mat/examples/tests/ex125.c.html
>

This example is for our internal testing and is not intended for production runs.
We suggest using the PETSc high-level KSP solver, which provides more flexibility.
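For reference, a minimal sketch of such a KSP/MUMPS setup in code (not from
this thread; it assumes an already-assembled Mat A, Vecs b and x, a
PetscErrorCode ierr, and the petsc-3.3-era KSPSetOperators calling sequence):

  KSP ksp;
  PC  pc;

  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);   /* direct solve only, no Krylov iterations */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);           /* LU factorization */
  ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);       /* picks up -mat_mumps_icntl_* etc. at runtime */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

With this setup all the MUMPS ICNTL parameters can be changed on the command
line without recompiling.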

>
> to set up my program.  Here is my situation on using the default setting
>
>         PetscInt icntl_7 = 5;
>
>         MatMumpsSetIcntl(F,7,icntl_7);
>
> in the example ex125.c:
>
Using the KSP solver, the MUMPS options can be chosen at runtime, e.g.,
with ~petsc/src/ksp/ksp/examples/tutorials/ex2.c:
mpiexec -n 2 ./ex2 -pc_type lu -pc_factor_mat_solver_package mumps
-mat_mumps_icntl_7 5
Norm of error 1.49777e-15 iterations 1

> 1. The program works fine on all *small* models (sparse matrices at the
> order of m = 894, 1097, 31k with a dense matrix included in the sparse
> matrix). The residuals are at the magnitude of 10^-3.
>
With a direct solver, a residual of 10^-3 indicates your matrix might be very
ill-conditioned or close to singular. What do you get for |R|/|rhs|?
Is this the reason you want to use a direct solver instead of an iterative one?
What do you mean by "31k with a dense matrix included in the sparse matrix"?
How sparse is your matrix, e.g., nnz(A)/(m*m) = ?
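To check the relative residual you can rerun with -ksp_monitor_true_residual,
or compute it directly; a sketch (assumes the same A, b, x, and ierr as in the
earlier snippet, with x the computed solution):

  Vec       r;
  PetscReal rnorm, bnorm;

  ierr = VecDuplicate(b,&r);CHKERRQ(ierr);
  ierr = MatMult(A,x,r);CHKERRQ(ierr);          /* r = A*x      */
  ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);       /* r = b - A*x  */
  ierr = VecNorm(r,NORM_2,&rnorm);CHKERRQ(ierr);
  ierr = VecNorm(b,NORM_2,&bnorm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"|R|/|rhs| = %g\n",(double)(rnorm/bnorm));CHKERRQ(ierr);
  ierr = VecDestroy(&r);CHKERRQ(ierr);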

> 2. The program has some issues on the *medium* size problem (m=460k with a
> dense matrix at the order of n=30k included in the sparse matrix). The full
> sparse matrix is sized at 17GB.
>
>    a. We used another software to generate the sparse matrix by using 144
>       cores:
>
>       i.  When I used the resource from 144 cores (12 nodes with 48GB/node),
>           it could not provide the solution. There was a complaint about a
>           memory violation.
>       ii. When I used the resource from 432 cores (36 nodes with 48GB/node),
>           it provided the solution.
>
Direct solvers are notoriously memory consuming. It seems your matrix is
quite dense, requiring more memory than 144 cores could provide.
What is "another software "?

>
>    b. We used another software to generate the same sparse matrix by using
>       576 cores:
>
>       i.  When I used the resource from 576 cores (48 nodes with 48GB/node),
>           it could not provide the solution. There was a complaint about a
>           memory violation.
>       ii. When I used the resource from 1152 cores (96 nodes with 48GB/node),
>           it provided the solution.
>
Both a and b seem to indicate that you can use a small number of cores to
generate the original matrix A, but need more cores (resources) to solve
A x = b. This is because A = LU; the factored matrices L and U require far
more memory than the original A. Run your code using KSP with your matrix
data and the option -ksp_view, e.g., with
petsc/src/ksp/ksp/examples/tutorials/ex10.c:
mpiexec -n 2 ./ex10 -f <matrix binary data file> -pc_type lu
-pc_factor_mat_solver_package mumps -ksp_view
...
then you'll see the memory info provided by MUMPS.
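If you prefer to query those numbers in code rather than reading the -ksp_view
output, newer PETSc releases expose MatMumpsGetInfog (check whether your
version has it); a sketch, reusing ksp, pc, and ierr from the setup above (see
the MUMPS manual for the exact meaning of each INFOG entry):

  Mat      F;
  PetscInt infog16, infog17;

  ierr = KSPSetUp(ksp);CHKERRQ(ierr);             /* performs the factorization */
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);  /* the MUMPS factor matrix    */
  ierr = MatMumpsGetInfog(F,16,&infog16);CHKERRQ(ierr); /* est. max working memory per process (MB) */
  ierr = MatMumpsGetInfog(F,17,&infog17);CHKERRQ(ierr); /* est. total working memory over all processes (MB) */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"MUMPS INFOG(16)=%D MB, INFOG(17)=%D MB\n",infog16,infog17);CHKERRQ(ierr);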

> 3. The program could not solve the *large* size problem (m=640k with a dense
> matrix at the order of n=178k included in the sparse matrix). The full
> sparse matrix is sized at 511GB.
>
>    a. We used another software to generate the sparse matrix by using 900
>       cores:
>
>       i.  When I used the resource from 900 cores (75 nodes with 48GB/node),
>           it could not provide the solution. There was a complaint about a
>           memory violation.
>       ii. When I used the resource from 2400 cores (200 nodes with
>           48GB/node), it STILL COULD NOT provide the solution.
>
Your computer system and software have limits. Find the answers to your
'medium size' problems first.

>
> My confusion starts from the medium size problem:
>
> - It seems something was not right in the default setting in ex125.c for
>   these problems.
> - I got the info that METIS was used instead of ParMETIS in solving these
>   problems.
>
By default, the petsc-mumps interface uses sequential symbolic factorization
(the analysis phase). Use '-mat_mumps_icntl_28 2' to switch to parallel
analysis. I tested it, but it seems ParMETIS is still not used. Check the
MUMPS manual or contact the MUMPS developers on how to use ParMETIS.
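For completeness, the same switch can be made in code before the factorization
(a sketch independent of the earlier snippets, reusing pc and ierr; ICNTL(29)=2
requests ParMetis for the parallel ordering, per the MUMPS manual):

  Mat F;

  ierr = PCFactorSetUpMatSolverPackage(pc);CHKERRQ(ierr); /* create the MUMPS factor matrix */
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
  ierr = MatMumpsSetIcntl(F,28,2);CHKERRQ(ierr);          /* parallel analysis phase                  */
  ierr = MatMumpsSetIcntl(F,29,2);CHKERRQ(ierr);          /* ordering for parallel analysis: ParMetis */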

>
> - Furthermore, it appears that there was an unreasonable demand on the
>   solver even on the medium size problem.
> - I suspect one rank was trying to collect all data from the other ranks.
>
Yes, the analysis is sequential, and the rhs vector must be on the host :-(
In general, direct solvers cannot be scaled to a very large number of cores.

>
> What other additional settings are needed for mumps such that it could deal
> with medium and large size problems?
>

Run your code with the option '-help | grep mumps' and experiment with the
various options, e.g., matrix orderings, nonzero fills, etc.
You may also try superlu_dist (see the example below). Good luck!
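Assuming PETSc was configured with superlu_dist (e.g., --download-superlu_dist),
the ex2.c run above becomes:
mpiexec -n 2 ./ex2 -pc_type lu -pc_factor_mat_solver_package superlu_dist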

>
> Do you guys have similar experience with that?
>
I personally have never used MUMPS or superlu_dist for such large matrices.
Consult the MUMPS developers.

Hong

>
> Thanks,
>
> Jinquan
>