[petsc-users] mumps freezes for bigger problems

Jack Poulson jack.poulson at gmail.com
Fri Dec 23 16:56:38 CST 2011


It looks like it's due to mixing different MPI implementations together
(i.e., including the wrong 'mpif.h'):
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-July/007559.html

If I recall correctly, MUMPS only uses ScaLAPACK to factor the root
separator when it is sufficiently large, and that would explain why it
works for him for smaller problems. I would double check that ScaLAPACK,
PETSc, and MUMPS are all compiled with the same MPI implementation.
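
One hint that points the same way is the index in the error message itself. The short sketch below converts it to hexadecimal. The constant it compares against is my assumption, not something stated in this thread: as far as I know, MPICH2-derived headers define MPI_COMM_WORLD as the integer handle 0x44000000. If so, a library that expects small table indices for communicators would reject that value, which is what the message suggests. The helper name is hypothetical and only there for illustration.

```python
# Assumption (not stated in the thread): MPICH2-family headers define
# MPI_COMM_WORLD as the integer handle 0x44000000.
ASSUMED_MPICH2_COMM_WORLD = 0x44000000


def looks_like_mpich2_comm_world(index: int) -> bool:
    """Return True if index equals the assumed MPICH2 MPI_COMM_WORLD handle."""
    return index == ASSUMED_MPICH2_COMM_WORLD


# Index copied from the "Could not convert index ..." message quoted below.
reported_index = 1140850688

print(hex(reported_index))
print(looks_like_mpich2_comm_world(reported_index))
```

If the comparison holds, the Fortran side was most likely compiled against one implementation's 'mpif.h' and linked against another implementation's library.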

Jack

On Wed, Dec 21, 2011 at 4:55 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:

> Hailong:
> I've never seen this type of error from MUMPS.
> It seems to be a programming bug. Are you sure the smaller problem runs
> correctly? Use valgrind to check it.
>
> Hong
>
> > I got the error from MUMPS.
> >
> > When I run MUMPS (which requires ScaLAPACK) with matrix size (n) = 30620,
> > nonzeros (nz) = 785860,
> > it runs and I get a result.
> > But when I run it with
> > nz = 3112820
> > n = 61240
> >
> >
> > I am getting the following error
> >
> >
> > 17 - <NO ERROR MESSAGE> : Could not convert index 1140850688 into a pointer
> > The index may be an incorrect argument.
> > Possible sources of this problem are a missing "include 'mpif.h'",
> > a misspelled MPI object (e.g., MPI_COM_WORLD instead of MPI_COMM_WORLD)
> > or a misspelled user variable for an MPI object (e.g.,
> > com instead of comm).
> > [17] [] Aborting Program!
> >
> >
> >
> > Do you know what happened?
> > Is it possible that it is running out of memory?
> >
> > On Wed, Dec 21, 2011 at 7:15 AM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
> >>
> >> Direct solvers often require large memory for storing matrix factors.
> >> As Jed suggests, you may try superlu_dist.
> >>
> >> With mumps, I notice you use parallel analysis, which is relatively new in
> >> mumps.
> >> What happens if you use the default sequential analysis with
> >> different matrix orderings?
> >> I usually use matrix ordering '-mat_mumps_icntl_7 2'.
> >>
> >> Also, you can increase the fill ratio,
> >> -mat_mumps_icntl_14 <20>: ICNTL(14): percentage of estimated workspace
> >> increase (None)
> >> i.e., the default ratio is 20; you may try 50? (I notice that you already
> >> use 30).
> >>
> >> It seems you use 16 CPUs for "a mere couple thousands
> >> elements" problems, and mumps "silently freezes". I do not have this
> >> type of experience with mumps. I can usually solve a sparse matrix of
> >> size 10k with 1 CPU using mumps.
> >> When mumps runs out of memory or hits other problems, it terminates
> >> execution and dumps out an error message;
> >> it does not freeze.
> >> Something is wrong here. Use a debugger to figure out where it freezes.
> >>
> >> Hong
> >>
> >> On Wed, Dec 21, 2011 at 7:01 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> >> > -pc_type lu -pc_factor_mat_solver_package superlu_dist
> >> >
> >> > On Dec 21, 2011 6:19 AM, "Dominik Szczerba" <dominik at itis.ethz.ch>
> >> > wrote:
> >> >>
> >> >> I am successfully solving my indefinite systems with MUMPS but only
> >> >> for very small problems. To give a feeling, a mere couple thousands
> >> >> elements. If I only double the problem size, it silently freezes, even
> >> >> with max verbosity via the control parameters. Did anyone succeed here
> >> >> with big problems? Any recommendations for a drop-in replacement for
> >> >> MUMPS?
> >> >>
> >> >> Thanks for any hints,
> >> >> Dominik
> >> >>
> >> >>
> >> >>
> >> >> Options used:
> >> >> -mat_mumps_icntl_4 3 -mat_mumps_icntl_28 2 -mat_mumps_icntl_29
> >> >>
> >> >> Output:
> >> >>
> >> >> ****** FACTORIZATION STEP ********
> >> >>
> >> >>
> >> >>  GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
> >> >>  NUMBER OF WORKING PROCESSES              =          16
> >> >>  OUT-OF-CORE OPTION (ICNTL(22))           =           0
> >> >>  REAL SPACE FOR FACTORS                   =  1438970073
> >> >>  INTEGER SPACE FOR FACTORS                =    11376442
> >> >>  MAXIMUM FRONTAL SIZE (ESTIMATED)         =       16868
> >> >>  NUMBER OF NODES IN THE TREE              =       43676
> >> >>  Convergence error after scaling for ONE-NORM (option 7/8)   = 0.21D+01
> >> >>  Maximum effective relaxed size of S              =   231932340
> >> >>  Average effective relaxed size of S              =   182366303
> >> >>
> >> >>  REDISTRIB: TOTAL DATA LOCAL/SENT         =     1509215    22859750
> >> >>  GLOBAL TIME FOR MATRIX DISTRIBUTION       =      0.8270
> >> >>  ** Memory relaxation parameter ( ICNTL(14)  )            :        35
> >> >>  ** Rank of processor needing largest memory in facto     :         0
> >> >>  ** Space in MBYTES used by this processor for facto      :      2017
> >> >>  ** Avg. Space in MBYTES per working proc during facto    :      1618
> >
> >
> >
> >
> > --
> > Hailong
>