# [petsc-users] Setting up MUMPS in PETSc

Jinquan Zhong jzhong at scsolutions.com
Tue Oct 23 14:36:14 CDT 2012

```
Thanks, Hong and Jed.

1. The program works fine on all small models (sparse matrices on the order of m = 894, 1097 and 31k, with a dense matrix included in the sparse matrix). The residuals are on the order of 10^-3.
^^^^
With a direct solver, a residual of 10^-3 indicates that your matrix might be very ill-conditioned or close to singular. What do you get for |R|/|rhs|?

>> That is a good point. In these applications we don't usually have a well-conditioned matrix. The condition number is always around 10^10 to 10^12, which is out of our control.

Is this the reason you want to use a direct solver instead of an iterative one?
What do you mean "31k with a dense matrix included in the sparse matrix"?

>> We have a dense matrix embedded inside a sparse matrix. This dense matrix usually accounts for 99% of the total nnz.

How sparse is your matrix, i.e., what is nnz(A)/(m*m)?

>> About 0.4% for the medium-size problem and 0.04% for the large-size problem.

2. The program has some issues on the medium-size problem (m = 460k, with a dense matrix on the order of n = 30k included in the sparse matrix). The full sparse matrix is 17 GB.

a. We used another software to generate the sparse matrix using 144 cores:

i. When I used the resources of 144 cores (12 nodes with 48 GB/node), it could not provide the solution. There was a complaint about a memory violation.

ii. When I used the resources of 432 cores (36 nodes with 48 GB/node), it provided the solution.
Direct solvers are notoriously memory-consuming. It seems your matrix is quite dense, requiring more memory than 144 cores could provide.
What is "another software"?

>> It is proprietary software that I don't have access to.

b. We used another software to generate the same sparse matrix using 576 cores:

i. When I used the resources of 576 cores (48 nodes with 48 GB/node), it could not provide the solution. There was a complaint about a memory violation.

ii. When I used the resources of 1152 cores (96 nodes with 48 GB/node), it provided the solution.
Both a and b seem to indicate that you can use a small number of cores to generate the original matrix A, but need more cores (resources) to solve Ax = b.

>> My confusion is this: since the sparse matrix size is the same, why are the resources of 1152 cores needed for 576 partitions of A, while the resources of only 432 cores are needed for 144 partitions of A? If 432 cores can solve the 144-partition Ax = b, why did it take 1152 cores to solve the 576-partition Ax = b? I expected 576 cores to be able to do for the 576-partition Ax = b what 432 cores did for the 144-partition one.

This is because A = LU: the factors L and U require far more memory than the original A because of fill-in. Run your code using KSP with your matrix data and the option -ksp_view,
e.g., petsc/src/ksp/ksp/examples/tutorials/ex10.c
mpiexec -n 2 ./ex10 -f <matrix binary data file> -pc_type lu -pc_factor_mat_solver_package mumps -ksp_view
...
Then you'll see the memory information provided by MUMPS.

>> Good point. I will link in KSP to test it.

* I suspect one rank was trying to collect all data from the other ranks.
Yes, the analysis is sequential, and the rhs vector must be on the host :-(
In general, direct solvers cannot be scaled to a very large number of cores.

>> I meant that rank 0 is trying to collect all NNZ from all the other ranks.

What additional settings are needed for MUMPS so that it can deal with medium- and large-size problems?

Run your code with the option '-help | grep mumps' and experiment with
various options, e.g., matrix orderings, nonzero fill, etc.
You may also try superlu_dist. Good luck!

>> superlu_dist already failed to provide the solution for small-size problems.

Jinquan

```
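Hong's |R|/|rhs| question asks for the relative residual ||b - Ax|| / ||b||. The sketch below computes it in C for a matrix held in plain compressed sparse row (CSR) arrays. The `CsrMatrix` type and the `relative_residual` function are hypothetical illustrations, not PETSc API. The max (infinity) norm is an assumed choice; any vector norm works for this sanity check.

```c
#include <stddef.h>

/* Hypothetical CSR container (not a PETSc type). row_ptr has m + 1 entries;
   col_idx and val each have row_ptr[m] entries. */
typedef struct {
    size_t m;               /* number of rows (square matrix assumed) */
    const size_t *row_ptr;  /* start of each row in col_idx/val       */
    const size_t *col_idx;  /* column index of each stored entry      */
    const double *val;      /* value of each stored entry             */
} CsrMatrix;

static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical helper: returns ||b - A x||_inf / ||b||_inf, or -1.0 when
   b is identically zero and the ratio is undefined. */
double relative_residual(const CsrMatrix *A, const double *x, const double *b)
{
    double r_max = 0.0, b_max = 0.0;
    for (size_t i = 0; i < A->m; ++i) {
        double ax = 0.0;                       /* (A x)_i */
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            ax += A->val[k] * x[A->col_idx[k]];
        double r_i = abs_val(b[i] - ax);       /* |(b - A x)_i| */
        double b_i = abs_val(b[i]);
        if (r_i > r_max) r_max = r_i;
        if (b_i > b_max) b_max = b_i;
    }
    return b_max > 0.0 ? r_max / b_max : -1.0;
}
```

In a PETSc code you can build the same quantity from `MatMult`, `VecAXPY` and `VecNorm`. A small relative residual bounds the error only up to the condition number:

||x - x̂|| / ||x|| ≤ κ(A) · ||b - Ax̂|| / ||b||

With κ(A) around 10^10 to 10^12, even a residual near machine precision can leave only a few correct digits in x̂.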
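A few lines of arithmetic can cross-check the sizes quoted for the medium problem: m = 460k, a dense block of n = 30k, and the dense block holding about 99% of the nonzeros. This is a minimal sketch under two assumptions:

* entries are real double-precision values (8 bytes each);
* only the dense block's values are counted, not index arrays or solver workspace.

`DenseBlockEstimate` and `estimate_dense_block` are hypothetical names.

```c
/* Hypothetical result type for a rough dense-block storage estimate. */
typedef struct {
    double block_nnz;    /* n*n entries in the dense block            */
    double density;      /* block_nnz / (m*m), fraction of all slots  */
    double value_bytes;  /* bytes for the block's values only         */
} DenseBlockEstimate;

/* Hypothetical helper: storage of an n-by-n dense block inside an m-by-m
   sparse matrix, ASSUMING real doubles (8 bytes) and ignoring the column
   indices, row pointers and any solver workspace. Doubles are used for
   the counts to avoid integer overflow at these sizes. */
DenseBlockEstimate estimate_dense_block(double m, double n)
{
    DenseBlockEstimate e;
    e.block_nnz   = n * n;
    e.density     = e.block_nnz / (m * m);
    e.value_bytes = e.block_nnz * 8.0;  /* sizeof(double) */
    return e;
}
```

For m = 460,000 and n = 30,000 this gives:

* 9.0e8 entries;
* a density of about 0.43%, in line with the ~0.4% reported above;
* 7.2 GB of values for the dense block alone.

The LU factors of a dense block are themselves dense, with L and U together holding about n*n entries. Factoring it therefore needs roughly that much memory again, before counting fill-in from the sparse part and the MUMPS workspace.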
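Jinquan's question about 144 versus 576 partitions can also be read in terms of memory per MPI rank rather than total memory. This is a minimal sketch under two assumptions:

* the ranks were spread evenly over the allocated nodes;
* all 48 GB of each node was available to them.

The `Run` type, its helpers and the `runs` table are hypothetical and only restate the four runs reported in the thread.

```c
/* Hypothetical description of one job from the thread. */
typedef struct {
    int ranks;           /* MPI processes = partitions of A      */
    int nodes;           /* nodes allocated to the job           */
    double gb_per_node;  /* memory per node in GB                */
    int solved;          /* 1 if the run produced a solution     */
} Run;

/* Total memory of the allocation in GB. */
double run_total_gb(const Run *r) { return r->nodes * r->gb_per_node; }

/* Memory per rank in GB, ASSUMING ranks are spread evenly over the nodes
   and all node memory is usable (no OS or MPI overhead subtracted). */
double run_gb_per_rank(const Run *r) { return run_total_gb(r) / r->ranks; }

/* The four runs reported in the thread (48 GB per node). */
static const Run runs[4] = {
    {144, 12, 48.0, 0},  /* 144 partitions, 12 nodes: failed    */
    {144, 36, 48.0, 1},  /* 144 partitions, 36 nodes: solved    */
    {576, 48, 48.0, 0},  /* 576 partitions, 48 nodes: failed    */
    {576, 96, 48.0, 1},  /* 576 partitions, 96 nodes: solved    */
};
```

Under these assumptions both failed runs had about 4 GB per rank. The successful runs had 12 GB per rank (144 ranks) and 8 GB per rank (576 ranks). That pattern would fit a per-process limit rather than a shortage of total memory, for example rank 0 gathering data during the sequential analysis discussed above. The MUMPS memory report printed with -ksp_view would show whether that is the case.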