[petsc-users] MUMPS error and superLU error

Dave May dave.mayhem23 at gmail.com
Sun May 31 08:36:14 CDT 2015


It failed due to a lack of memory: "OOM" stands for "out of memory", and
"OOM killer terminated your job" means the kernel killed your process because
it exhausted the available memory.
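One way to confirm an OOM kill on a Linux compute node (a generic sketch; the exact log wording varies by kernel version) is to search the kernel log:

```shell
# Search the kernel ring buffer for OOM-killer activity
# (may require elevated privileges on some systems).
dmesg | grep -i -E 'out of memory|oom-killer|killed process'
```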




On Sunday, 31 May 2015, venkatesh g <venkateshgk.j at gmail.com> wrote:

> Hi all,
>
> I tried to run my generalized eigenproblem on 120 x 24 = 2880 cores.
> The matrix size of A is 20 GB and of B is 5 GB.
>
> It got killed after 7 hours of run time. Please see the MUMPS error log.
> Why does it fail?
> I gave the command:
>
> aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1
> -log_summary -st_ksp_type preonly -st_pc_type lu
> -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2
>
> Kindly let me know.
>
> cheers,
> Venkatesh
>
> On Fri, May 29, 2015 at 10:46 PM, venkatesh g <venkateshgk.j at gmail.com>
> wrote:
>
>> Hi Matt, users,
>>
>> Thanks for the info. Do you also use PETSc and SLEPc with MUMPS? I run
>> into a segmentation error if I increase my matrix size.
>>
>> Can you suggest other parallel direct-solver software for QR, since LU
>> may not be suitable for a singular B matrix in Ax = lambda Bx? I am
>> attaching the log of the working MUMPS run.
>>
>> My matrix size here is around 47000x47000. If I am not wrong, the memory
>> usage per core is 272 MB.
>>
>> Can you tell me if I am wrong, or whether that is really light on memory
>> for this matrix?
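As a rough cross-check of that 272 MB figure, here is a back-of-envelope estimate of the storage for the input matrix alone (a sketch only: it assumes a dense complex double-precision matrix and ignores factor fill-in, which dominates the memory of a direct LU solve; the core count of 240 is taken from the run described later in this thread):

```shell
# Dense storage for a complex double matrix of order n, split over p cores.
n=47000; p=240
awk -v n="$n" -v p="$p" 'BEGIN {
  bytes = n * n * 16                       # complex double = 16 bytes/entry
  printf "dense total:  %.1f GB\n", bytes / 1024^3
  printf "per core:     %.0f MB\n", bytes / p / 1024^2
}'
```

On these numbers the input matrix alone is about 33 GB in dense form, roughly 140 MB per core over 240 cores, so 272 MB/core for the sparse input plus solver overhead is plausible; the factorization itself will need considerably more.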
>>
>> Thanks
>> cheers,
>> Venkatesh
>>
>> On Fri, May 29, 2015 at 4:00 PM, Matt Landreman
>> <matt.landreman at gmail.com> wrote:
>>
>>> Dear Venkatesh,
>>>
>>> As you can see in the error log, you are now getting a segmentation
>>> fault, which is almost certainly a separate issue from the info(1)=-9
>>> memory problem you had previously. Here is one idea which may or may not
>>> help. I've used mumps on the NERSC Edison system, and I found that I
>>> sometimes get segmentation faults when using the default Intel compiler.
>>> When I switched to the cray compiler the problem disappeared. So you could
>>> perhaps try a different compiler if one is available on your system.
>>>
>>> Matt
>>> On May 29, 2015 4:04 AM, "venkatesh g" <venkateshgk.j at gmail.com> wrote:
>>>
>>>> Hi Matt,
>>>>
>>>> I did what you suggested and read the manual section on those CNTL
>>>> parameters. I solved it with CNTL(1)=1e-4, and it is working.
>>>>
>>>> But that was a test matrix of size 46000x46000. The actual matrix size
>>>> is 108900x108900 and will increase in the future.
>>>>
>>>> I get an error that memory allocation failed. The binary matrix size
>>>> of A is 20 GB and of B is 5 GB.
>>>>
>>>> I submitted this on 240 processors with 4 GB RAM each, and also on 128
>>>> processors with 512 GB of total RAM.
>>>>
>>>> In both cases it fails with the error below, as if the memory were not
>>>> enough. Yet a 90000x90000 case had run serially in MATLAB with < 256 GB
>>>> of RAM.
>>>>
>>>> Kindly let me know.
>>>>
>>>> Venkatesh
>>>>
>>>> On Tue, May 26, 2015 at 8:02 PM, Matt Landreman
>>>> <matt.landreman at gmail.com> wrote:
>>>>
>>>>> Hi Venkatesh,
>>>>>
>>>>> I've struggled a bit with mumps memory allocation too.  I think the
>>>>> behavior of mumps is roughly the following. First, in the "analysis step",
>>>>> mumps computes a minimum memory required based on the structure of nonzeros
>>>>> in the matrix.  Then when it actually goes to factorize the matrix, if it
>>>>> ever encounters an element smaller than CNTL(1) (default=0.01) in the
>>>>> diagonal of a sub-matrix it is trying to factorize, it modifies the
>>>>> ordering to avoid the small pivot, which increases the fill-in (hence
>>>>> memory needed).  ICNTL(14) sets the margin allowed for this unanticipated
>>>>> fill-in.  Setting ICNTL(14)=200000 as in your email is not the solution,
>>>>> since this means mumps asks for a huge amount of memory at the start.
>>>>> Better would be to lower CNTL(1) or (I think) use static pivoting
>>>>> (CNTL(4)).  Read the section in the mumps manual about these CNTL
>>>>> parameters. I typically set CNTL(1)=1e-6, which eliminated all the
>>>>> INFO(1)=-9 errors for my problem, without having to modify ICNTL(14).
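In PETSc/SLEPc terms, the CNTL(1) suggestion above maps to a runtime option. A sketch based on the ex7 command lines quoted earlier in this thread (the file names a110t/b110t and the aprun layout are taken from those runs):

```shell
# Lower the MUMPS pivoting threshold CNTL(1) from its 0.01 default;
# -mat_mumps_cntl_1 is the PETSc option corresponding to MUMPS CNTL(1).
aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t \
  -st_type sinvert -eps_nev 1 \
  -st_ksp_type preonly -st_pc_type lu \
  -st_pc_factor_mat_solver_package mumps \
  -mat_mumps_cntl_1 1e-6
```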
>>>>>
>>>>> Also, I recommend running with ICNTL(4)=3 to display diagnostics. Look
>>>>> for the line in standard output that says "TOTAL     space in MBYTES for IC
>>>>> factorization".  This is the amount of memory that mumps is trying to
>>>>> allocate, and for the default ICNTL(14), it should be similar to what
>>>>> MATLAB needs.
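The diagnostic setting can likewise be passed on the command line and the memory estimate filtered out of the output (a sketch reusing the run from earlier in this thread; the log-file name is hypothetical):

```shell
# -mat_mumps_icntl_4 3 is the PETSc option for MUMPS ICNTL(4)=3, which
# enables verbose MUMPS statistics on stdout.
aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t \
  -st_type sinvert -eps_nev 1 \
  -st_ksp_type preonly -st_pc_type lu \
  -st_pc_factor_mat_solver_package mumps \
  -mat_mumps_icntl_4 3 2>&1 | tee mumps_run.log

# Then pull out the MUMPS memory estimate:
grep "TOTAL.*space in MBYTES" mumps_run.log
```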
>>>>>
>>>>> Hope this helps,
>>>>> -Matt Landreman
>>>>> University of Maryland
>>>>>
>>>>> On Tue, May 26, 2015 at 10:03 AM, venkatesh g
>>>>> <venkateshgk.j at gmail.com> wrote:
>>>>>
>>>>>> I posted on the MUMPS forums a while ago but no one seems to reply.
>>>>>>
>>>>>> I am solving a large generalized eigenvalue problem.
>>>>>>
>>>>>> I am getting the following error which is attached, after giving the
>>>>>> command:
>>>>>>
>>>>>> /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64
>>>>>> -hosts compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2
>>>>>> b72t -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly
>>>>>> -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14
>>>>>> 200000
>>>>>>
>>>>>> It is impossible to allocate so much memory per processor; it is
>>>>>> asking for around 70 GB per processor.
>>>>>>
>>>>>> A serial job in MATLAB for the same matrices takes < 60 GB.
>>>>>>
>>>>>> After trying out superLU_dist, I have attached the error there also
>>>>>> (segmentation error).
>>>>>>
>>>>>> Kindly help me.
>>>>>>
>>>>>> Venkatesh
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>