[petsc-users] MUMPS error and superLU error
Hong
hzhang at mcs.anl.gov
Sun May 31 09:34:08 CDT 2015
venkatesh,
As we discussed previously, both MUMPS and superlu_dist failed even on
smaller problems, although MUMPS gave an "OOM" error in
numerical factorization.
You acknowledged that B is singular, which may require reformulating
your eigenvalue problem. The option '-st_type sinvert'
likely uses B^{-1} (have you read the SLEPc manual?), which could be the source
of trouble.
Please investigate your model and understand why B is singular, and check
whether there is a way to eliminate the null space before submitting a large simulation.
Hong
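
For reference, a sketch of the shift-and-invert transformation as the SLEPc manual describes it for the generalized problem. The symbols sigma (the target set with -eps_target) and theta (the transformed eigenvalue) are chosen here for illustration:

```latex
A x = \lambda B x
\quad\Longrightarrow\quad
(A - \sigma B)\, x = (\lambda - \sigma)\, B x
\quad\Longrightarrow\quad
(A - \sigma B)^{-1} B \, x = \theta \, x,
\qquad
\theta = \frac{1}{\lambda - \sigma}
```

Under this form the matrix handed to the LU solver would be A - sigma B rather than B itself. When no -eps_target is given, sigma presumably defaults to 0, in which case the factorization is of A. A singular B still matters, since it introduces infinite eigenvalues (theta = 0) that may need special treatment.
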
On Sun, May 31, 2015 at 8:36 AM, Dave May <dave.mayhem23 at gmail.com> wrote:
> It failed due to a lack of memory. "OOM" stands for "out of memory". OOM
> killer terminated your job means you ran out of memory.
>
>
>
>
> On Sunday, 31 May 2015, venkatesh g <venkateshgk.j at gmail.com> wrote:
>
>> Hi all,
>>
>> I tried to run my generalized eigenproblem on 120 x 24 = 2880 cores.
>> The matrix sizes are A = 20GB and B = 5GB.
>>
>> It was killed after 7 hours of run time. Please see the MUMPS error log.
>> Why did it fail?
>> I gave the command:
>>
>> aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1
>> -log_summary -st_ksp_type preonly -st_pc_type lu
>> -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2
>>
>> Kindly let me know.
>>
>> cheers,
>> Venkatesh
>>
>> On Fri, May 29, 2015 at 10:46 PM, venkatesh g <venkateshgk.j at gmail.com>
>> wrote:
>>
>>> Hi Matt, users,
>>>
>>> Thanks for the info. Do you also use PETSc and SLEPc with MUMPS? I get
>>> a segmentation fault if I increase my matrix size.
>>>
>>> Can you suggest other software for a parallel direct QR solver,
>>> since LU may not be good for a singular B matrix in Ax = lambda Bx? I am
>>> attaching the MUMPS log from the working run.
>>>
>>> My matrix size here is around 47000x47000. If I am not wrong, the memory
>>> usage per core is 272MB.
>>>
>>> Can you tell me if I am wrong, or whether that is really light on memory for
>>> this matrix?
>>>
>>> Thanks
>>> cheers,
>>> Venkatesh
>>>
>>> On Fri, May 29, 2015 at 4:00 PM, Matt Landreman <
>>> matt.landreman at gmail.com> wrote:
>>>
>>>> Dear Venkatesh,
>>>>
>>>> As you can see in the error log, you are now getting a segmentation
>>>> fault, which is almost certainly a separate issue from the info(1)=-9
>>>> memory problem you had previously. Here is one idea which may or may not
>>>> help. I've used mumps on the NERSC Edison system, and I found that I
>>>> sometimes get segmentation faults when using the default Intel compiler.
>>>> When I switched to the cray compiler the problem disappeared. So you could
>>>> perhaps try a different compiler if one is available on your system.
>>>>
>>>> Matt
>>>> On May 29, 2015 4:04 AM, "venkatesh g" <venkateshgk.j at gmail.com> wrote:
>>>>
>>>>> Hi Matt,
>>>>>
>>>>> I did what you suggested and read the manual section on the CNTL
>>>>> parameters. I solved with CNTL(1)=1e-4, and it works.
>>>>>
>>>>> But that was a test matrix of size 46000x46000. The actual matrix size is
>>>>> 108900x108900 and will increase in the future.
>>>>>
>>>>> For the actual matrix I get a memory allocation failure. The binary
>>>>> matrix files are 20 GB for A and 5 GB for B.
>>>>>
>>>>> I submitted this on 240 processors with 4 GB RAM each, and also on 128
>>>>> processors with 512 GB RAM in total.
>>>>>
>>>>> In both cases it fails with the following error, which indicates that
>>>>> memory is insufficient. But a 90000x90000 problem ran serially in MATLAB
>>>>> with <256 GB RAM.
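
The figures quoted here allow a rough back-of-the-envelope check of how sparse A actually is. The sketch below is only an estimate under assumptions not stated in the thread: that the file is a PETSc binary sparse (AIJ) matrix with 32-bit indices and real double-precision values, and that "20GB" means 20e9 bytes. The helper name estimate_nnz is hypothetical.

```python
# Rough size estimate from a PETSc binary matrix file.
# Assumptions (not stated in the thread): sparse AIJ layout, 32-bit indices,
# real double-precision values, and "20GB" meaning 20e9 bytes.

def estimate_nnz(file_bytes, n_rows, index_bytes=4, scalar_bytes=8):
    """Estimate the nonzero count from the file size.

    Layout assumed: 4 header integers, one row-length integer per row,
    then one column index and one value per nonzero.
    """
    overhead = (4 + n_rows) * index_bytes
    return (file_bytes - overhead) / (index_bytes + scalar_bytes)

n = 108900                  # matrix dimension quoted in the message
file_bytes = 20e9           # size of A quoted in the message
nnz = estimate_nnz(file_bytes, n)
density = nnz / (n * n)
dense_gb = n * n * 8 / 1e9  # storage for a fully dense real matrix of this size

print(f"estimated nonzeros: {nnz:.3g}")
print(f"estimated density:  {density:.1%}")
print(f"dense storage:      {dense_gb:.1f} GB")
```

Under these assumptions A would be roughly 14% dense (lower if the scalars are complex, e.g. with scalar_bytes=16), which is far from a typical sparse matrix and suggests the LU factors may end up close to dense.
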
>>>>>
>>>>> Kindly let me know.
>>>>>
>>>>> Venkatesh
>>>>>
>>>>> On Tue, May 26, 2015 at 8:02 PM, Matt Landreman <
>>>>> matt.landreman at gmail.com> wrote:
>>>>>
>>>>>> Hi Venkatesh,
>>>>>>
>>>>>> I've struggled a bit with mumps memory allocation too. I think the
>>>>>> behavior of mumps is roughly the following. First, in the "analysis step",
>>>>>> mumps computes a minimum memory required based on the structure of nonzeros
>>>>>> in the matrix. Then when it actually goes to factorize the matrix, if it
>>>>>> ever encounters an element smaller than CNTL(1) (default=0.01) in the
>>>>>> diagonal of a sub-matrix it is trying to factorize, it modifies the
>>>>>> ordering to avoid the small pivot, which increases the fill-in (hence
>>>>>> memory needed). ICNTL(14) sets the margin allowed for this unanticipated
>>>>>> fill-in. Setting ICNTL(14)=200000 as in your email is not the solution,
>>>>>> since this means mumps asks for a huge amount of memory at the start.
>>>>>> Better would be to lower CNTL(1) or (I think) use static pivoting
>>>>>> (CNTL(4)). Read the section in the mumps manual about these CNTL
>>>>>> parameters. I typically set CNTL(1)=1e-6, which eliminated all the
>>>>>> INFO(1)=-9 errors for my problem, without having to modify ICNTL(14).
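
To make the ICNTL(14) point concrete, here is a minimal arithmetic sketch. It assumes, following the MUMPS documentation as I understand it, that ICNTL(14) is a percentage increase applied to the workspace estimated during analysis. The function name and the 100 MB estimate are hypothetical and used only for illustration.

```python
# Sketch of how ICNTL(14) scales the workspace MUMPS requests.
# Assumption: ICNTL(14) is a percentage added on top of the analysis estimate.

def relaxed_workspace_mb(estimate_mb, icntl14_percent):
    """Return the workspace requested after applying the ICNTL(14) margin."""
    return estimate_mb * (1.0 + icntl14_percent / 100.0)

estimate_mb = 100.0  # hypothetical analysis-phase estimate, for illustration only

for icntl14 in (20, 200, 200000):
    requested = relaxed_workspace_mb(estimate_mb, icntl14)
    print(f"ICNTL(14)={icntl14:>6}: {requested / estimate_mb:.1f}x the estimate")
```

With ICNTL(14)=200000 the request is about 2001 times the analysis estimate, which would be consistent with MUMPS asking for tens of GB per process even when the factorization itself needs far less.
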
>>>>>>
>>>>>> Also, I recommend running with ICNTL(4)=3 to display diagnostics.
>>>>>> Look for the line in standard output that says "TOTAL space in MBYTES
>>>>>> for IC factorization". This is the amount of memory that mumps is trying
>>>>>> to allocate, and for the default ICNTL(14), it should be similar to
>>>>>> matlab's need.
>>>>>>
>>>>>> Hope this helps,
>>>>>> -Matt Landreman
>>>>>> University of Maryland
>>>>>>
>>>>>> On Tue, May 26, 2015 at 10:03 AM, venkatesh g <
>>>>>> venkateshgk.j at gmail.com> wrote:
>>>>>>
>>>>>>> I posted a while ago in the MUMPS forums but no one has replied.
>>>>>>>
>>>>>>> I am solving a large generalized eigenvalue problem.
>>>>>>>
>>>>>>> I am getting the attached error after giving the
>>>>>>> command:
>>>>>>>
>>>>>>> /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64
>>>>>>> -hosts compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2
>>>>>>> b72t -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly
>>>>>>> -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14
>>>>>>> 200000
>>>>>>>
>>>>>>> It is impossible to allocate so much memory per processor; it is
>>>>>>> asking for around 70 GB per processor.
>>>>>>>
>>>>>>> A serial job in MATLAB for the same matrices takes < 60GB.
>>>>>>>
>>>>>>> I also tried superLU_dist and have attached the error from that run
>>>>>>> as well (segmentation fault).
>>>>>>>
>>>>>>> Kindly help me.
>>>>>>>
>>>>>>> Venkatesh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>