[petsc-users] Setting up MUMPS in PETSc
Jed Brown
jedbrown at mcs.anl.gov
Tue Oct 23 13:55:27 CDT 2012
On Tue, Oct 23, 2012 at 1:03 PM, Jinquan Zhong <jzhong at scsolutions.com> wrote:
> Dear folks,
>
> I have a question on how to use MUMPS properly in PETSc. It appears that
> I didn't set up MUMPS right. I followed the example in
>
> http://www.mcs.anl.gov/petsc/petsc-dev/src/mat/examples/tests/ex125.c.html
>
> to set up my program. Here is my situation using the default setting in
> the example ex125.c:
>
>     PetscInt icntl_7 = 5;
>     MatMumpsSetIcntl(F,7,icntl_7);
>
> 1. The program works fine on all *small* models (sparse matrices of
> order m = 894, 1097, and 31k, with a dense matrix included in the
> sparse matrix). The residuals are on the order of 10^-3.
>
This suggests that your systems are nearly singular. If you have a
condition number of 1e12, it's time to reconsider the model.
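
For reference, one way to check the residual you are quoting is to compute
it directly from the assembled system. A minimal sketch, assuming A, x, and
b are your matrix, computed solution, and right-hand side (error checking
omitted):

  Vec       r;
  PetscReal rnorm;
  VecDuplicate(b,&r);
  MatMult(A,x,r);            /* r = A*x           */
  VecAYPX(r,-1.0,b);         /* r = b - A*x       */
  VecNorm(r,NORM_2,&rnorm);  /* rnorm = ||b-A*x|| */
  VecDestroy(&r);

A backward-stable direct solve should give rnorm/(||A|| ||x|| + ||b||) near
machine precision; 1e-3 is a red flag for the conditioning of the system.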
>
> 2. The program has some issues with the *medium*-size problem (m = 460k,
> with a dense matrix of order n = 30k included in the sparse matrix).
> The full sparse matrix is 17 GB.
>
> a. We used another program to generate the sparse matrix using 144 cores:
>
> i. When I used the resources of 144 cores (12 nodes with 48 GB/node),
> it could not provide the solution. There was a complaint about a memory
> violation.
>
Always send the entire error message.
>
> ii. When I used the resources of 432 cores (36 nodes with 48 GB/node),
> it provided the solution.
>
> b. We used another program to generate the same sparse matrix using
> 576 cores:
>
> i. When I used the resources of 576 cores (48 nodes with 48 GB/node),
> it could not provide the solution. There was a complaint about a memory
> violation.
>
> ii. When I used the resources of 1152 cores (96 nodes with 48 GB/node),
> it provided the solution.
>
> 3. The program could not solve the *large*-size problem (m = 640k, with
> a dense matrix of order n = 178k included in the sparse matrix). The
> full sparse matrix is 511 GB.
>
> a. We used another program to generate the sparse matrix using 900 cores:
>
> i. When I used the resources of 900 cores (75 nodes with 48 GB/node),
> it could not provide the solution. There was a complaint about a memory
> violation.
>
> ii. When I used the resources of 2400 cores (200 nodes with 48 GB/node),
> it STILL COULD NOT provide the solution.
>
This has a huge dense block and we can't tell from your description how
large the vertex separators are. MUMPS is well-known to have some
non-scalable data structures. They do some analysis on rank 0 and require
right hand sides to be provided entirely on rank 0.
>
> My confusion starts with the medium-size problem:
>
> - It seems something was not right in the default setting in ex125.c
> for these problems.
>
I don't know what you're asking. Do a heap profile if you want to find out
where the memory is leaking. It's *much* better to call the solver directly
from the process that assembles the matrix. Going through a file is
terribly wasteful.
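
If it helps, this is roughly what "calling the solver directly" can look
like. A sketch only, assuming the petsc-dev (3.3-era) interface and that A,
b, x live on PETSC_COMM_WORLD in the same program that assembles them:

  #include <petscksp.h>

  PetscErrorCode SolveWithMUMPS(Mat A,Vec b,Vec x)
  {
    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);                             /* direct solve */
    ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);CHKERRQ(ierr); /* factor with MUMPS */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* picks up -mat_mumps_icntl_* at run time */
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

The function name SolveWithMUMPS is just for this sketch; no intermediate
file is involved because the Mat is passed in memory.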
>
> - I got the info that METIS was used instead of ParMETIS in solving
> these problems.
>
Did you ask for parallel ordering (-mat_mumps_icntl_28) and parmetis
(-mat_mumps_icntl_29)? These options are shown in -help. (It's not our
fault the MUMPS developers have Fortran numbered option insanity baked in.
Read their manual and translate to our numbered options. While you're at
it, ask them to write a decent options and error reporting mechanism.)
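
If you would rather hard-code it next to your existing ICNTL(7) call, the
equivalent is (sketch; F is the factored matrix from MatGetFactor() as in
ex125.c, and the values follow the MUMPS manual: ICNTL(28)=2 requests
parallel analysis, ICNTL(29)=2 selects ParMETIS):

  MatMumpsSetIcntl(F,28,2);  /* ICNTL(28): parallel analysis              */
  MatMumpsSetIcntl(F,29,2);  /* ICNTL(29): ParMETIS for parallel ordering */

or at run time: -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2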
>
> - Furthermore, it appears that there was an unreasonable demand on the
> solver even for the medium-size problem.
>
Good, it's easier to debug. Check -log_summary for that run and use the
heap profilers at your computing facility.
>
> - I suspect one rank was trying to collect all data from other ranks.
>
Naturally.
>
> What other additional settings are needed for MUMPS so that it can deal
> with the medium- and large-size problems?
>
> Do you guys have similar experience with that?
>
> Thanks,
>
> Jinquan