[petsc-users] MUMPS error and superLU error

venkatesh g venkateshgk.j at gmail.com
Mon Jun 22 10:57:39 CDT 2015


Hi,
As you suggested, I have restructured my matrix eigenvalue problem by
recasting the governing equations in a different form, to resolve the
singularity of B.

Now my matrix B is not singular. Both A and B are invertible in Ax=lambda
Bx.

I still get an error in MUMPS because it uses a very large amount of memory
(the error log is attached).

I gave the command: aprun -n 240 -N 24 ./ex7 -f1 A100t -f2 B100t -st_type
sinvert -eps_target 0.01 -st_ksp_type preonly -st_pc_type lu
-st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-5
-mat_mumps_icntl_4 2 -evecs v100t

Matrix A is about 60% zeros.
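The log attached below shows why the run exceeds the available memory. A back-of-the-envelope sketch (the per-process figures are copied from the MUMPS analysis output in the log below; 24 ranks per node comes from the aprun line above, and 4 GB of RAM per core is the figure quoted later in this thread, so it is an assumption here):

```python
# Estimates printed by MUMPS after its analysis step (see the log below).
total_ic_mb = 1_850_075    # "TOTAL space in MBYTES for IC factorization"
n_procs = 240              # "NUMBER OF WORKING PROCESSES"
ranks_per_node = 24        # from "aprun -n 240 -N 24"
ram_per_core_gb = 4        # assumed: RAM per core quoted later in this thread

avg_mb = total_ic_mb // n_procs
print(avg_mb)              # 7708, matching "Estimated avg. MBYTES per work. proc"

# Memory demanded per node versus memory available per node:
need_gb_per_node = avg_mb * ranks_per_node / 1024
have_gb_per_node = ram_per_core_gb * ranks_per_node
print(round(need_gb_per_node), have_gb_per_node)  # about 181 GB needed vs 96 GB
```

Even the average per-rank estimate (about 7.5 GB) exceeds the assumed 4 GB per core, so an OOM kill during factorization is expected.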

Kindly help me.

Venkatesh

On Sun, May 31, 2015 at 8:04 PM, Hong <hzhang at mcs.anl.gov> wrote:

> venkatesh,
>
> As we discussed previously, both MUMPS and SuperLU_DIST failed even on
> smaller problems, and MUMPS gave an "OOM" error in numerical
> factorization.
>
> You acknowledged that B is singular, which may require additional
> reformulation of your eigenvalue problem. The option '-st_type sinvert'
> likely uses B^{-1} (have you read the SLEPc manual?), which could be the
> source of the trouble.
>
> Please investigate your model and understand why B is singular; see whether
> there is a way to remove the null space before submitting a large-scale
> simulation.
>
> Hong
>
>
> On Sun, May 31, 2015 at 8:36 AM, Dave May <dave.mayhem23 at gmail.com> wrote:
>
>> It failed due to a lack of memory: "OOM" stands for "out of memory", so
>> "OOM killer terminated this process" means your job ran out of memory.
>>
>>
>>
>>
>> On Sunday, 31 May 2015, venkatesh g <venkateshgk.j at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I tried to run my generalized eigenproblem on 120 x 24 = 2880 cores.
>>> The matrix A is 20 GB and B is 5 GB.
>>>
>>> It was killed after 7 hours of run time; please see the MUMPS error log.
>>> Why does it fail?
>>> I gave the command:
>>>
>>> aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1
>>> -log_summary -st_ksp_type preonly -st_pc_type lu
>>> -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2
>>>
>>> Kindly let me know.
>>>
>>> cheers,
>>> Venkatesh
>>>
>>> On Fri, May 29, 2015 at 10:46 PM, venkatesh g <venkateshgk.j at gmail.com>
>>> wrote:
>>>
>>>> Hi Matt, users,
>>>>
>>>> Thanks for the info. Do you also use PETSc and SLEPc with MUMPS? I run
>>>> into a segmentation fault when I increase my matrix size.
>>>>
>>>> Since LU may not be suitable for a singular B matrix in Ax = lambda Bx,
>>>> can you suggest other parallel direct-solver software that offers QR? I
>>>> am attaching the MUMPS log from the working run.
>>>>
>>>> My matrix size here is around 47000x47000. If I am not wrong, the
>>>> memory usage per core is 272 MB.
>>>>
>>>> Can you tell me whether that is correct, and whether that is really
>>>> light on memory for a matrix of this size?
>>>>
>>>> Thanks
>>>> cheers,
>>>> Venkatesh
>>>>
>>>> On Fri, May 29, 2015 at 4:00 PM, Matt Landreman <
>>>> matt.landreman at gmail.com> wrote:
>>>>
>>>>> Dear Venkatesh,
>>>>>
>>>>> As you can see in the error log, you are now getting a segmentation
>>>>> fault, which is almost certainly a separate issue from the info(1)=-9
>>>>> memory problem you had previously. Here is one idea which may or may not
>>>>> help. I've used mumps on the NERSC Edison system, and I found that I
>>>>> sometimes get segmentation faults when using the default Intel compiler.
>>>>> When I switched to the cray compiler the problem disappeared. So you could
>>>>> perhaps try a different compiler if one is available on your system.
>>>>>
>>>>> Matt
>>>>> On May 29, 2015 4:04 AM, "venkatesh g" <venkateshgk.j at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Matt,
>>>>>>
>>>>>> I did as you suggested and read the manual section on the CNTL
>>>>>> parameters. The solve now works with CNTL(1)=1e-4.
>>>>>>
>>>>>> But that was a test matrix of size 46000x46000. The actual matrix is
>>>>>> 108900x108900 and will grow in the future.
>>>>>>
>>>>>> For it, memory allocation fails. The binary matrix A is 20 GB and B
>>>>>> is 5 GB.
>>>>>>
>>>>>> I submitted the job on 240 processors with 4 GB RAM each, and also on
>>>>>> 128 processors with 512 GB RAM in total.
>>>>>>
>>>>>> In both cases it fails with the error below, saying memory is not
>>>>>> enough. Yet a 90000x90000 problem ran serially in MATLAB with less
>>>>>> than 256 GB RAM.
>>>>>>
>>>>>> Kindly let me know.
>>>>>>
>>>>>> Venkatesh
>>>>>>
>>>>>> On Tue, May 26, 2015 at 8:02 PM, Matt Landreman <
>>>>>> matt.landreman at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Venkatesh,
>>>>>>>
>>>>>>> I've struggled a bit with mumps memory allocation too.  I think the
>>>>>>> behavior of mumps is roughly the following. First, in the "analysis step",
>>>>>>> mumps computes a minimum memory required based on the structure of nonzeros
>>>>>>> in the matrix.  Then when it actually goes to factorize the matrix, if it
>>>>>>> ever encounters an element smaller than CNTL(1) (default=0.01) in the
>>>>>>> diagonal of a sub-matrix it is trying to factorize, it modifies the
>>>>>>> ordering to avoid the small pivot, which increases the fill-in (hence
>>>>>>> memory needed).  ICNTL(14) sets the margin allowed for this unanticipated
>>>>>>> fill-in.  Setting ICNTL(14)=200000 as in your email is not the solution,
>>>>>>> since this means mumps asks for a huge amount of memory at the start.
>>>>>>> Better would be to lower CNTL(1) or (I think) use static pivoting
>>>>>>> (CNTL(4)).  Read the section in the mumps manual about these CNTL
>>>>>>> parameters. I typically set CNTL(1)=1e-6, which eliminated all the
>>>>>>> INFO(1)=-9 errors for my problem, without having to modify ICNTL(14).
>>>>>>>
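The advice above translates directly into PETSc options. A hedged sketch (the executable name and matrix files are the ones used elsewhere in this thread; the option names follow PETSc's -mat_mumps_cntl_<n> / -mat_mumps_icntl_<n> convention already used in the commands above):

```shell
# Lower the pivoting threshold CNTL(1) instead of inflating ICNTL(14),
# and print MUMPS diagnostics (ICNTL(4)=3) to see the estimated memory.
aprun -n 240 -N 24 ./ex7 -f1 A100t -f2 B100t \
  -st_type sinvert -eps_target 0.01 \
  -st_ksp_type preonly -st_pc_type lu \
  -st_pc_factor_mat_solver_package mumps \
  -mat_mumps_cntl_1 1e-6 \
  -mat_mumps_icntl_4 3
```

In the diagnostic output, check the "TOTAL space in MBYTES for IC factorization" line before committing to a long run.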
>>>>>>> Also, I recommend running with ICNTL(4)=3 to display diagnostics.
>>>>>>> Look for the line in standard output that says "TOTAL     space in MBYTES
>>>>>>> for IC factorization".  This is the amount of memory that mumps is trying
>>>>>>> to allocate, and for the default ICNTL(14), it should be similar to
>>>>>>> MATLAB's need.
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>> -Matt Landreman
>>>>>>> University of Maryland
>>>>>>>
>>>>>>> On Tue, May 26, 2015 at 10:03 AM, venkatesh g <
>>>>>>> venkateshgk.j at gmail.com> wrote:
>>>>>>>
>>>>>>>> I posted on the MUMPS forums a while ago, but no one has replied.
>>>>>>>>
>>>>>>>> I am solving a large generalized eigenvalue problem.
>>>>>>>>
>>>>>>>> I get the error attached below after giving the command:
>>>>>>>>
>>>>>>>> /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64
>>>>>>>> -hosts compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2
>>>>>>>> b72t -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly
>>>>>>>> -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14
>>>>>>>> 200000
>>>>>>>>
>>>>>>>> It is impossible to allocate this much memory per processor; it is
>>>>>>>> asking for around 70 GB per processor.
>>>>>>>>
>>>>>>>> A serial job in MATLAB for the same matrices takes < 60GB.
>>>>>>>>
>>>>>>>> I also tried SuperLU_DIST and have attached its error as well (a
>>>>>>>> segmentation fault).
>>>>>>>>
>>>>>>>> Kindly help me.
>>>>>>>>
>>>>>>>> Venkatesh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>
-------------- next part --------------

Generalized eigenproblem stored in file.

 Reading COMPLEX matrices from binary files...
Entering ZMUMPS driver with JOB, N, NZ =   1       50000              0

 ZMUMPS 4.10.0        
L U Solver for unsymmetric matrices
Type of parallelism: Working host

 ****** ANALYSIS STEP ********

 ** Max-trans not allowed because matrix is distributed
 ... Structural symmetry (in percent)=   86
 Density: NBdense, Average, Median   =    01635120109
 Ordering based on METIS 
 A root of estimated size        21001  has been selected for Scalapack.

Leaving analysis phase with  ...
INFOG(1)                                       =               0
INFOG(2)                                       =               0
 -- (20) Number of entries in factors (estim.) =      1314700738
 --  (3) Storage of factors  (REAL, estimated) =      1416564824
 --  (4) Storage of factors  (INT , estimated) =        10444785
 --  (5) Maximum frontal size      (estimated) =           21500
 --  (6) Number of nodes in the tree           =             240
 -- (32) Type of analysis effectively used     =               1
 --  (7) Ordering option effectively used      =               5
ICNTL(6) Maximum transversal option            =               0
ICNTL(7) Pivot order option                    =               7
Percentage of memory relaxation (effective)    =              35
Number of level 2 nodes                        =             139
Number of split nodes                          =               2
RINFOG(1) Operations during elimination (estim)=   2.341D+13
Distributed matrix entry format (ICNTL(18))    =               3
 ** Rank of proc needing largest memory in IC facto        :        30
 ** Estimated corresponding MBYTES for IC facto            :     21593
 ** Estimated avg. MBYTES per work. proc at facto (IC)     :      7708
 ** TOTAL     space in MBYTES for IC factorization         :   1850075
 ** Rank of proc needing largest memory for OOC facto      :        30
 ** Estimated corresponding MBYTES for OOC facto           :     21681
 ** Estimated avg. MBYTES per work. proc at facto (OOC)    :      7782
 ** TOTAL     space in MBYTES for OOC factorization        :   1867805
Entering ZMUMPS driver with JOB, N, NZ =   2       50000      716459748

 ****** FACTORIZATION STEP ********


 GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
 NUMBER OF WORKING PROCESSES              =         240
 OUT-OF-CORE OPTION (ICNTL(22))           =           0
 REAL SPACE FOR FACTORS                   =  1416564824
 INTEGER SPACE FOR FACTORS                =    10444785
 MAXIMUM FRONTAL SIZE (ESTIMATED)         =       21500
 NUMBER OF NODES IN THE TREE              =         240
 Convergence error after scaling for ONE-NORM (option 7/8)   = 0.59D+01
 Maximum effective relaxed size of S              =  1181811925
 Average effective relaxed size of S              =   228990839

 REDISTRIB: TOTAL DATA LOCAL/SENT         =   328575589  1437471711
 GLOBAL TIME FOR MATRIX DISTRIBUTION       =    206.6792
 ** Memory relaxation parameter ( ICNTL(14)  )            :        35
 ** Rank of processor needing largest memory in facto     :        30
 ** Space in MBYTES used by this processor for facto      :     21593
 ** Avg. Space in MBYTES per working proc during facto    :      7708
[NID 01360] 2015-06-22 20:00:41 Apid 432433: initiated application termination
[NID 01360] 2015-06-22 19:59:18 Apid 432433: OOM killer terminated this process.
Application 432433 exit signals: Killed
Application 432433 resources: utime ~0s, stime ~20s, Rss ~7716, inblocks ~16040, outblocks ~2380

