[petsc-users] Can't expand MemType 1: jcol 16104

hong at aspiritech.org
Wed Jul 8 12:56:03 CDT 2015


The runtime option for using parallel symbolic factorization with
petsc/superlu_dist is '-mat_superlu_dist_parsymbfact', e.g.,

petsc/src/ksp/ksp/examples/tutorials (master)
$ mpiexec -n 2 ./ex2 -pc_type lu -pc_factor_mat_solver_package superlu_dist
-mat_superlu_dist_parsymbfact
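
To see all of the options that the petsc/superlu_dist interface accepts
(Sherry asks below about an inquiry function), one way is to run with -help
and filter the output; just a sketch, assuming the same ex2 tutorial (the
exact list printed depends on your PETSc version):

$ mpiexec -n 2 ./ex2 -pc_type lu -pc_factor_mat_solver_package superlu_dist -help | grep mat_superlu_dist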

Hong

On Wed, Jul 8, 2015 at 8:46 AM, Xiaoye S. Li <xsli at lbl.gov> wrote:

> Did you find out how to change the option to use parallel symbolic
> factorization?  Perhaps the PETSc team can help.
>
> Sherry
>
>
> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>
>> Is there an inquiry function that tells you all the available options?
>>
>> Sherry
>>
>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <aph at email.arizona.edu>
>> wrote:
>>
>>> Hi Sherry,
>>>
>>> Thanks for your message. I have been using the superlu_dist default
>>> options. I did not realize that I was doing serial symbolic factorization;
>>> that is probably the cause of my problem.
>>> Each node on Garnet has 60GB of usable memory and I can run with 1, 2, 4,
>>> 8, 16 or 32 cores per node.
>>>
>>> So I should use:
>>>
>>> -mat_superlu_dist_r 20
>>> -mat_superlu_dist_c 32
>>>
>>> How do you specify the parallel symbolic factorization option? Is it
>>> -mat_superlu_dist_matinput 1?
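>>>
>>> Concretely, I am planning to run with something like the following (just a
>>> sketch; the executable name and the 640-process count, i.e. 20 x 32, are
>>> placeholders for my own job, and the last option stands in for whatever
>>> the parallel symbolic factorization flag turns out to be):
>>>
>>> mpiexec -n 640 ./my_code -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_r 20 -mat_superlu_dist_c 32 <parallel-symbolic-factorization-option>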
>>>
>>> Thanks,
>>>
>>> Anthony
>>>
>>>
>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>
>>>> The superlu_dist failure occurs during symbolic factorization. Since you
>>>> are using serial symbolic factorization, the entire graph of A must fit
>>>> in the memory of one MPI task. How much memory do you have for each MPI
>>>> task?
>>>>
>>>> Using more processes won't help.  You should try the parallel symbolic
>>>> factorization option.
>>>>
>>>> Another point: you set up the process grid as
>>>>        Process grid nprow 32 x npcol 20
>>>> For better performance, you should swap the grid dimensions. That is, it
>>>> is better to use 20 x 32; never make nprow larger than npcol.
>>>>
>>>>
>>>> Sherry
>>>>
>>>>
>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>>
>>>>>    I would suggest running a sequence of problems, 101 by 101, 111 by
>>>>> 111, etc., and getting the memory usage in each case (once you run out of
>>>>> memory you can get NO useful information about memory needs). You can then
>>>>> plot memory usage as a function of problem size to get a handle on how
>>>>> much memory it is using.  You can also run on more and more processes
>>>>> (which have more total memory) to see how large a problem you may be able
>>>>> to reach.
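>>>>>
>>>>>    A minimal sketch of how you could log the per-process usage from the
>>>>> code at a few points (the routine name ReportMemory is made up; the
>>>>> PETSc calls themselves are standard):
>>>>>
>>>>> #include <petscsys.h>
>>>>>
>>>>> /* Print each process's resident set size and PetscMalloc'd memory. */
>>>>> PetscErrorCode ReportMemory(MPI_Comm comm,const char *stage)
>>>>> {
>>>>>   PetscLogDouble rss,mal;
>>>>>   PetscMPIInt    rank;
>>>>>   PetscErrorCode ierr;
>>>>>
>>>>>   PetscFunctionBegin;
>>>>>   ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
>>>>>   ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);   /* resident set size seen by the OS */
>>>>>   ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr);   /* memory obtained through PetscMalloc */
>>>>>   ierr = PetscSynchronizedPrintf(comm,"[%d] %s: rss %g bytes, PetscMalloc %g bytes\n",rank,stage,rss,mal);CHKERRQ(ierr);
>>>>>   ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
>>>>>   PetscFunctionReturn(0);
>>>>> }
>>>>>
>>>>>    Calling this before and after the factorization (or EPSSolve) at each
>>>>> problem size gives you the numbers to plot.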
>>>>>
>>>>>    MUMPS also has an "out of core" version (which we have never used)
>>>>> that could, in theory, let you get to large problems if you have lots of
>>>>> disk space, but you are on your own figuring out how to use it.
>>>>>
>>>>>   Barry
>>>>>
>>>>> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <aph at email.arizona.edu>
>>>>> wrote:
>>>>> >
>>>>> > Hi Jose,
>>>>> >
>>>>> > In my code, I first use PETSc to solve a linear system to get the
>>>>> baseflow (without using SLEPc) and then I use SLEPc to do the stability
>>>>> analysis of that baseflow. This is why there are some SLEPc options that
>>>>> are not used in test.out-superlu_dist-151x151 (when I am solving for the
>>>>> baseflow with PETSc only). I have attached a 101x101 case for which I get
>>>>> the eigenvalues; that case works fine. However, if I increase to 151x151, I
>>>>> get the error that you can see in test.out-superlu_dist-151x151 (similar
>>>>> error with mumps: see test.out-mumps-151x151, line 2918). If you look at the
>>>>> very end of the files test.out-superlu_dist-151x151 and
>>>>> test.out-mumps-151x151, you will see that the last info message printed is:
>>>>> >
>>>>> > On Processor (after EPSSetFromOptions)  0    memory:
>>>>> 0.65073152000E+08          =====>  (see line 807 of module_petsc.F90)
>>>>> >
>>>>> > This means that the memory error probably occurs in the call to
>>>>> EPSSolve (see module_petsc.F90 line 810). I would like to evaluate how much
>>>>> memory is required by the most memory-intensive operation within EPSSolve.
>>>>> Since I am solving a generalized EVP, I would imagine that it is the LU
>>>>> decomposition. But is there an accurate way of doing that?
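>>>>> >
>>>>> > (I see that the packages can print their own statistics, e.g.
>>>>> > -mat_superlu_dist_statprint for superlu_dist or a higher MUMPS print
>>>>> > level via -mat_mumps_icntl_4; would those report the factorization
>>>>> > memory? I am not sure of the exact option names for my PETSc version.)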
>>>>> >
>>>>> > Before moving to iterative solvers, I would like to exploit direct
>>>>> solvers as much as I can. I tried GMRES with the default preconditioner at
>>>>> some point but I had convergence problems. What solver/preconditioner would
>>>>> you recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?
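>>>>> >
>>>>> > For reference, the direct-solver route I have in mind is a
>>>>> > shift-and-invert setup along these lines (only a sketch; the target
>>>>> > value and number of eigenvalues are placeholders):
>>>>> >
>>>>> > -eps_gen_non_hermitian -eps_nev 10 -eps_target 0.0
>>>>> > -st_type sinvert -st_ksp_type preonly -st_pc_type lu
>>>>> > -st_pc_factor_mat_solver_package superlu_dist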
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Anthony
>>>>> >
>>>>> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <jroman at dsic.upv.es>
>>>>> wrote:
>>>>> >
>>>>> > On 07/07/2015, at 02:33, Anthony Haas wrote:
>>>>> >
>>>>> > > Hi,
>>>>> > >
>>>>> > > I am computing eigenvalues using PETSc/SLEPc and superlu_dist for
>>>>> the LU decomposition (my problem is a generalized eigenvalue problem). The
>>>>> code runs fine on a 101x101 grid, but when I increase to 151x151, I get
>>>>> the following error:
>>>>> > >
>>>>> > > Can't expand MemType 1: jcol 16104   (and then [NID 00037]
>>>>> 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)
>>>>> > >
>>>>> > > It seems to be a memory problem. I monitor the memory usage as far
>>>>> as I can, and it seems that memory usage is pretty low. The most
>>>>> memory-intensive part of the program is probably the LU decomposition in
>>>>> the context of the generalized EVP. Is there a way to evaluate how much
>>>>> memory will be required for that step? I am currently running the debug
>>>>> version of the code, which I assume uses more memory.
>>>>> > >
>>>>> > > I have attached the output of the job. Note that the program uses
>>>>> PETSc twice: 1) to solve a linear system, for which no problem occurs, and
>>>>> 2) to solve the generalized EVP with SLEPc, where I get the error.
>>>>> > >
>>>>> > > Thanks
>>>>> > >
>>>>> > > Anthony
>>>>> > > <test.out-superlu_dist-151x151>
>>>>> >
>>>>> > In the output you attached there are no SLEPc objects in the report,
>>>>> and the SLEPc options are not used. Are the SLEPc calls being
>>>>> skipped?
>>>>> >
>>>>> > Do you get the same error with MUMPS? Have you tried to solve linear
>>>>> systems with a preconditioned iterative solver?
>>>>> >
>>>>> > Jose
>>>>> >
>>>>> >
>>>>> >
>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>>>>>
>>>>>
>>>>
>>>
>>
>

