[petsc-users] MUMPS error and superLU error

venkatesh g venkateshgk.j at gmail.com
Sun May 31 07:37:20 CDT 2015


Hi all,

I tried to run my generalized eigenproblem on 120 x 24 = 2880 cores.
The binary file of matrix A is 20 GB and that of B is 5 GB.

The job was killed after 7 hours of run time. Please see the MUMPS error
log below. Why does it fail ?
I gave the command:

aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1
-log_summary -st_ksp_type preonly -st_pc_type lu
-st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2

Kindly let me know.

cheers,
Venkatesh

On Fri, May 29, 2015 at 10:46 PM, venkatesh g <venkateshgk.j at gmail.com>
wrote:

> Hi Matt, users,
>
> Thanks for the info. Do you also use PETSc and SLEPc with MUMPS ? I run
> into a segmentation fault if I increase my matrix size.
>
> Can you suggest other software for a parallel direct QR solver, since LU
> may not be suitable for a singular B matrix in Ax = lambda Bx ? I am
> attaching the MUMPS log from the working run.
>
> My matrix size here is around 47000 x 47000. If I am not mistaken, the
> memory usage per core is 272 MB.
>
> Can you tell me if I am wrong, or whether that is really a light memory
> footprint for this matrix ?
>
> Thanks
> cheers,
> Venkatesh
>
> On Fri, May 29, 2015 at 4:00 PM, Matt Landreman <matt.landreman at gmail.com>
> wrote:
>
>> Dear Venkatesh,
>>
>> As you can see in the error log, you are now getting a segmentation
>> fault, which is almost certainly a separate issue from the INFO(1)=-9
>> memory problem you had previously. Here is one idea which may or may not
>> help. I've used MUMPS on the NERSC Edison system, and I found that I
>> sometimes got segmentation faults when using the default Intel compiler.
>> When I switched to the Cray compiler, the problem disappeared. So you
>> could perhaps try a different compiler if one is available on your system.
>>
>> Matt
>> On May 29, 2015 4:04 AM, "venkatesh g" <venkateshgk.j at gmail.com> wrote:
>>
>>> Hi Matt,
>>>
>>> I did what you suggested and read the manual section on the CNTL
>>> parameters. I solved it with CNTL(1)=1e-4, and it is working.
>>>
>>> But that was a test matrix of size 46000 x 46000. The actual matrix size
>>> is 108900 x 108900 and will increase in the future.
>>>
>>> Now I get a memory allocation failure. The binary file of matrix A is
>>> 20 GB and that of B is 5 GB.
>>>
>>> I submitted this on 240 processors with 4 GB RAM each, and also on 128
>>> processors with 512 GB RAM in total.
>>>
>>> In both cases, it fails with the error below, indicating that memory is
>>> not sufficient. Yet a 90000 x 90000 case ran serially in MATLAB with less
>>> than 256 GB of RAM.
>>>
>>> Kindly let me know.
>>>
>>> Venkatesh
>>>
>>> On Tue, May 26, 2015 at 8:02 PM, Matt Landreman <
>>> matt.landreman at gmail.com> wrote:
>>>
>>>> Hi Venkatesh,
>>>>
>>>> I've struggled a bit with MUMPS memory allocation too. I think the
>>>> behavior of MUMPS is roughly the following. First, in the "analysis
>>>> step", MUMPS computes the minimum memory required based on the structure
>>>> of nonzeros in the matrix. Then, when it actually goes to factorize the
>>>> matrix, if it ever encounters an element smaller than CNTL(1)
>>>> (default = 0.01) on the diagonal of a sub-matrix it is trying to
>>>> factorize, it modifies the ordering to avoid the small pivot, which
>>>> increases the fill-in (and hence the memory needed). ICNTL(14) sets the
>>>> margin allowed for this unanticipated fill-in. Setting ICNTL(14)=200000
>>>> as in your email is not the solution, since this means MUMPS asks for a
>>>> huge amount of memory at the start. Better would be to lower CNTL(1) or
>>>> (I think) use static pivoting (CNTL(4)). Read the section in the MUMPS
>>>> manual about these CNTL parameters. I typically set CNTL(1)=1e-6, which
>>>> eliminated all the INFO(1)=-9 errors for my problem without having to
>>>> modify ICNTL(14).
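>>>>
>>>> For reference, here is a sketch of how that maps onto the PETSc options
>>>> already used in this thread; the value 1e-6 is just what worked for my
>>>> problem, not a universal choice:
>>>>
>>>>     -st_pc_type lu -st_pc_factor_mat_solver_package mumps \
>>>>     -mat_mumps_cntl_1 1e-6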
>>>>
>>>> Also, I recommend running with ICNTL(4)=3 to display diagnostics. Look
>>>> for the line in standard output that says "TOTAL     space in MBYTES for IC
>>>> factorization". This is the amount of memory that MUMPS is trying to
>>>> allocate, and for the default ICNTL(14) it should be similar to MATLAB's
>>>> need.
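>>>>
>>>> Assuming the same -mat_mumps_icntl_<n> option naming that PETSc uses for
>>>> ICNTL(14) elsewhere in this thread, the diagnostics can be switched on
>>>> from the command line with:
>>>>
>>>>     -mat_mumps_icntl_4 3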
>>>>
>>>> Hope this helps,
>>>> -Matt Landreman
>>>> University of Maryland
>>>>
>>>> On Tue, May 26, 2015 at 10:03 AM, venkatesh g <venkateshgk.j at gmail.com>
>>>> wrote:
>>>>
>>>>> I posted in the MUMPS forums a while ago, but no one seems to reply.
>>>>>
>>>>> I am solving a large generalized eigenvalue problem.
>>>>>
>>>>> I get the error attached below after giving the command:
>>>>>
>>>>> /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64
>>>>> -hosts compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2
>>>>> b72t -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly
>>>>> -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14
>>>>> 200000
>>>>>
>>>>> It is impossible to allocate so much memory per processor; it is
>>>>> asking for around 70 GB per processor.
>>>>>
>>>>> A serial job in MATLAB for the same matrices takes less than 60 GB.
>>>>>
>>>>> I also tried SuperLU_DIST and have attached the error from that run as
>>>>> well (a segmentation fault).
>>>>>
>>>>> Kindly help me.
>>>>>
>>>>> Venkatesh
>>>>>
>>>>>
>>>>>
>>>>
>>>
>
-------------- next part --------------

Generalized eigenproblem stored in file.

 Reading COMPLEX matrices from binary files...
Entering ZMUMPS driver with JOB, N, NZ =   1      108900              0

 ZMUMPS 4.10.0        
L U Solver for unsymmetric matrices
Type of parallelism: Working host

 ****** ANALYSIS STEP ********

 ** Max-trans not allowed because matrix is distributed
 ... Structural symmetry (in percent)=   70
 Density: NBdense, Average, Median   =    0   12320   12102
 Ordering based on METIS 
 A root of estimated size        41878  has been selected for Scalapack.

Leaving analysis phase with  ...
INFOG(1)                                       =               0
INFOG(2)                                       =               0
 -- (20) Number of entries in factors (estim.) =      5475631310
 --  (3) Storage of factors  (REAL, estimated) =     89125045501
 --  (4) Storage of factors  (INT , estimated) =       655485547
 --  (5) Maximum frontal size      (estimated) =           41878
 --  (6) Number of nodes in the tree           =             471
 -- (32) Type of analysis effectively used     =               1
 --  (7) Ordering option effectively used      =               5
ICNTL(6) Maximum transversal option            =               0
ICNTL(7) Pivot order option                    =               7
Percentage of memory relaxation (effective)    =              35
Number of level 2 nodes                        =             439
Number of split nodes                          =             149
RINFOG(1) Operations during elimination (estim)=   1.584D+14
Distributed matrix entry format (ICNTL(18))    =               3
 ** Rank of proc needing largest memory in IC facto        :         0
 ** Estimated corresponding MBYTES for IC facto            :     33885
 ** Estimated avg. MBYTES per work. proc at facto (IC)     :      8679
 ** TOTAL     space in MBYTES for IC factorization         :  24997648
 ** Rank of proc needing largest memory for OOC facto      :         0
 ** Estimated corresponding MBYTES for OOC facto           :     33683
 ** Estimated avg. MBYTES per work. proc at facto (OOC)    :      8398
 ** TOTAL     space in MBYTES for OOC factorization        :  24187332
Entering ZMUMPS driver with JOB, N, NZ =   2      108900     1035808400

 ****** FACTORIZATION STEP ********


 GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
 NUMBER OF WORKING PROCESSES              =        2880
 OUT-OF-CORE OPTION (ICNTL(22))           =           0
 REAL SPACE FOR FACTORS                   = 89125045501
 INTEGER SPACE FOR FACTORS                =   655485547
 MAXIMUM FRONTAL SIZE (ESTIMATED)         =       41878
 NUMBER OF NODES IN THE TREE              =         471
 Convergence error after scaling for ONE-NORM (option 7/8)   = 0.95D+00
 Maximum effective relaxed size of S              =  1630229452
 Average effective relaxed size of S              =   225206115
[NID 01214] 2015-05-31 17:36:18 Apid 409924: initiated application termination
[NID 01214] 2015-05-31 17:34:59 Apid 409924: OOM killer terminated this process.
Application 409924 exit signals: Killed
Application 409924 resources: utime ~0s, stime ~225s, Rss ~7716, inblocks ~192480, outblocks ~28560

