[petsc-users] Can't expand MemType 1: jcol 16104

Hong hzhang at mcs.anl.gov
Fri Jul 24 21:56:22 CDT 2015


Anthony:
I tested your Amat_binary.m
using petsc/src/ksp/ksp/examples/tutorials/ex10.c.
Your matrix has many zero rows:
./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more
Mat Object: 1 MPI processes
  type: seqaij
row 0: (0, 1)
row 1: (1, 0)
row 2: (2, 1)
row 3: (3, 0)
row 4: (4, 1)
row 5: (5, 0)
row 6: (6, 1)
row 7: (7, 0)
row 8: (8, 1)
row 9: (9, 0)
...
row 36: (1, 1)  (35, 0)  (36, 1)  (37, 0)  (38, 1)  (39, 0)  (40, 1)  (41, 0)  (42, 1)  (43, 0)  (44, 1)  (45, 0)  (46, 1)  (47, 0)  (48, 1)  (49, 0)  (50, 1)  (51, 0)  (52, 1)  (53, 0)  (54, 1)  (55, 0)  (56, 1)  (57, 0)  (58, 1)  (59, 0)  (60, 1)  ...

Did you send us the correct matrix?
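
If it helps to double-check the file before resending, the same ex10 driver
can print just the matrix dimensions and nonzero counts instead of every
entry (a sketch only, assuming the same binary file; ::ascii_info is the only
added option):

./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view ::ascii_info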

>
> I ran my code through valgrind and gdb as suggested by Barry. I am now
> coming back to a problem I have had while running with parallel symbolic
> factorization. I am attaching a test matrix (PETSc binary format) that I LU
> decompose and then use to solve a linear system (see code below). I can run
> on 2 processors with parsymbfact or on 4 processors without parsymbfact.
> However, if I run on 4 procs with parsymbfact, the code just hangs.
> Below is the simplified test case that I have used to test. The matrix A
> and B are built somewhere else in my program. The matrix I am attaching is
> A-sigma*B (see below).
>
> One thing I don't know is, for sparse matrices, what the optimum number of
> processors to use for an LU decomposition is. Does it depend on the total
> number of nonzeros? Is there an easy way to compute it?
>

You have to experiment with your matrix on the target machine to find out.
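
One way to experiment (a sketch only, assuming the ex10 driver above, an MPI
launcher named mpiexec, and superlu_dist selected on the command line) is to
repeat the same factorization with increasing process counts and compare the
times and memory reported in the log summary:

mpiexec -n 2 ./ex10 -f0 Amat_binary.m -rhs 0 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -log_summary
mpiexec -n 4 ./ex10 -f0 Amat_binary.m -rhs 0 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -log_summary

The process count beyond which the factorization time (e.g., the
MatLUFactorNum event) stops dropping is a reasonable choice for that matrix
on that machine.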

Hong

>
>
>
>      Subroutine HowBigLUCanBe(rank)
>
>       IMPLICIT NONE
>
>       integer(i4b),intent(in) :: rank
>       integer(i4b)            :: i,ct
>       real(dp)                :: begin,endd
>       complex(dpc)            :: sigma
>
>       PetscErrorCode ierr
>
>
>       if (rank==0) call cpu_time(begin)
>
>       if (rank==0) then
>          write(*,*)
>          write(*,*)'Testing How Big LU Can Be...'
>          write(*,*)'============================'
>          write(*,*)
>       endif
>
>       sigma = (1.0d0,0.0d0)
>       call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
>
> !.....Write Matrix to ASCII and Binary Format
>       !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
>       !call MatView(DXX,viewer,ierr)
>       !call PetscViewerDestroy(viewer,ierr)
>
>       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
>       call MatView(A,viewer,ierr)
>       call PetscViewerDestroy(viewer,ierr)
>
> !.....Create Linear Solver Context
>       call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>
> !.....Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
>       !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>       call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
>
> !.....Set Relative and Absolute Tolerances and Uses Default for Divergence Tol
>       tol = 1.e-10
>       call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>
> !.....Set the Direct (LU) Solver
>       call KSPSetType(ksp,KSPPREONLY,ierr)
>       call KSPGetPC(ksp,pc,ierr)
>       call PCSetType(pc,PCLU,ierr)
>       call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
>
> !.....Create Right-Hand-Side Vector
>       call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>       call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
>
>       allocate(xwork1(IendA-IstartA))
>       allocate(loc(IendA-IstartA))
>
>       ct=0
>       do i=IstartA,IendA-1
>          ct=ct+1
>          loc(ct)=i
>          xwork1(ct)=(1.0d0,0.0d0)
>       enddo
>
>       call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>       call VecZeroEntries(sol,ierr)
>
>       deallocate(xwork1,loc)
>
> !.....Assemble Vectors
>       call VecAssemblyBegin(frhs,ierr)
>       call VecAssemblyEnd(frhs,ierr)
>
> !.....Solve the Linear System
>       call KSPSolve(ksp,frhs,sol,ierr)
>
>       !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>
>       if (rank==0) then
>          call cpu_time(endd)
>          write(*,*)
>          print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin
>       endif
>
>       call SlepcFinalize(ierr)
>
>       STOP
>
>
>     end Subroutine HowBigLUCanBe
>
> On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:
>
>  Indeed, the parallel symbolic factorization routine needs a power-of-2
> number of processes; however, you can use however many processes you need.
> Internally, we redistribute the matrix to the nearest power of 2 processes,
> do the symbolic factorization, then redistribute back to all the processes
> to do the factorization, triangular solves, etc.  So there is no
> restriction from the user's viewpoint.
>
>  It's difficult to tell what the problem is.  Do you think you can print
> your matrix? Then I can do some debugging by running superlu_dist
> standalone.
>
>  Sherry
>
>
> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <aph at email.arizona.edu>
> wrote:
>
>>   Hi,
>>
>>  I have used the switch -mat_superlu_dist_parsymbfact in my pbs script.
>> However, although my program worked fine with sequential symbolic
>> factorization, I get one of the following 2 behaviors when I run with
>> parallel symbolic factorization (depending on the number of processors that
>> I use):
>>
>>  1) the program just hangs (it seems stuck in some subroutine ==> see
>> test.out-hangs)
>>  2) I get a floating point exception ==> see
>> test.out-floating-point-exception
>>
>>  Note that, as suggested in the SuperLU manual, I use a power-of-2 number
>> of procs. Are there any tunable parameters for the parallel symbolic
>> factorization? Note that when I build my sparse matrix, most elements I add
>> are of course nonzero, but to simplify the programming I also add a few
>> zero elements to the sparse matrix. I was thinking that maybe, if the
>> parallel symbolic factorization proceeds by blocks, there could be some
>> blocks where the pivot would be zero, hence creating the FPE?
>>
>>  Thanks,
>>
>>  Anthony
>>
>>
>>
>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>
>>>  Did you find out how to change option to use parallel symbolic
>>> factorization?  Perhaps PETSc team can help.
>>>
>>>  Sherry
>>>
>>>
>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>
>>>>  Is there an inquiry function that tells you all the available options?
>>>>
>>>>  Sherry
>>>>
>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <
>>>> aph at email.arizona.edu> wrote:
>>>>
>>>>>    Hi Sherry,
>>>>>
>>>>>  Thanks for your message. I have used the superlu_dist default options.
>>>>> I did not realize that I was doing serial symbolic factorization. That
>>>>> is probably the cause of my problem.
>>>>>  Each node on Garnet has 60GB of usable memory and I can run with
>>>>> 1, 2, 4, 8, 16 or 32 cores per node.
>>>>>
>>>>>  So I should use:
>>>>>
>>>>> -mat_superlu_dist_r 20
>>>>> -mat_superlu_dist_c 32
>>>>>
>>>>>  How do you specify the parallel symbolic factorization option? Is it
>>>>> -mat_superlu_dist_matinput 1?
>>>>>
>>>>>  Thanks,
>>>>>
>>>>>  Anthony
>>>>>
>>>>>
>>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>>>
>>>>>>  The superlu_dist failure occurs during symbolic
>>>>>> factorization.  Since you are using serial symbolic factorization, it
>>>>>> requires the entire graph of A to be available in the memory of one MPI
>>>>>> task. How much memory do you have for each MPI task?
>>>>>>
>>>>>>  It won't help even if you use more processes.  You should try to
>>>>>> use the parallel symbolic factorization option.
>>>>>>
>>>>>>  Another point.  You set up the process grid as:
>>>>>>        Process grid nprow 32 x npcol 20
>>>>>>  For better performance, you should swap the grid dimensions. That is,
>>>>>> it's better to use 20 x 32; never make nprow larger than npcol.
>>>>>>
>>>>>>
>>>>>>  Sherry
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>    I would suggest running a sequence of problems, 101 by 101, 111 by
>>>>>>> 111, etc., and getting the memory usage in each case (once you run out of
>>>>>>> memory you can get NO useful information about memory needs). You can then
>>>>>>> plot memory usage as a function of problem size to get a handle on how much
>>>>>>> memory it is using.  You can also run on more and more processes (which
>>>>>>> have more total memory) to see how large a problem you may be able to
>>>>>>> reach.
>>>>>>>
>>>>>>>    MUMPS also has an "out of core" version (which we have never
>>>>>>> used) that could in theory let you get to larger problems if you have
>>>>>>> lots of disk space, but you are on your own in figuring out how to use it.
>>>>>>>
>>>>>>>   Barry
>>>>>>>
>>>>>>> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <
>>>>>>> aph at email.arizona.edu> wrote:
>>>>>>> >
>>>>>>> > Hi Jose,
>>>>>>> >
>>>>>>> > In my code, I first use PETSc to solve a linear system to get the
>>>>>>> baseflow (without using SLEPc) and then I use SLEPc to do the stability
>>>>>>> analysis of that baseflow. This is why there are some SLEPc options that
>>>>>>> are not used in test.out-superlu_dist-151x151 (when I am solving for the
>>>>>>> baseflow with PETSc only). I have attached a 101x101 case for which I get
>>>>>>> the eigenvalues. That case works fine. However, if I increase to 151x151, I
>>>>>>> get the error that you can see in test.out-superlu_dist-151x151 (similar
>>>>>>> error with MUMPS: see test.out-mumps-151x151, line 2918). If you look at the
>>>>>>> very end of the files test.out-superlu_dist-151x151 and
>>>>>>> test.out-mumps-151x151, you will see that the last info message printed is:
>>>>>>> >
>>>>>>> > On Processor (after EPSSetFromOptions)  0    memory: 0.65073152000E+08          =====>  (see line 807 of module_petsc.F90)
>>>>>>> >
>>>>>>> > This means that the memory error probably occurs in the call to
>>>>>>> EPSSolve (see module_petsc.F90, line 810). I would like to evaluate how much
>>>>>>> memory is required by the most memory-intensive operation within EPSSolve.
>>>>>>> Since I am solving a generalized EVP, I would imagine that it would be the
>>>>>>> LU decomposition. But is there an accurate way of estimating it?
>>>>>>> >
>>>>>>> > Before moving to iterative solvers, I would like to exploit direct
>>>>>>> solvers as much as I can. I tried GMRES with the default preconditioner at
>>>>>>> some point but had convergence problems. What solver/preconditioner would
>>>>>>> you recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> >
>>>>>>> > Anthony
>>>>>>> >
>>>>>>> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <jroman at dsic.upv.es>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > El 07/07/2015, a las 02:33, Anthony Haas escribió:
>>>>>>> >
>>>>>>> > > Hi,
>>>>>>> > >
>>>>>>> > > I am computing eigenvalues using PETSc/SLEPc and superlu_dist
>>>>>>> for the LU decomposition (my problem is a generalized eigenvalue problem).
>>>>>>> The code runs fine for a 101x101 grid, but when I increase to 151x151,
>>>>>>> I get the following error:
>>>>>>> > >
>>>>>>> > > Can't expand MemType 1: jcol 16104   (and then [NID 00037]
>>>>>>> 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)
>>>>>>> > >
>>>>>>> > > It seems to be a memory problem. I monitor the memory usage as
>>>>>>> far as I can, and it seems that memory usage is pretty low. The most
>>>>>>> memory-intensive part of the program is probably the LU decomposition in the
>>>>>>> context of the generalized EVP. Is there a way to evaluate how much memory
>>>>>>> will be required for that step? I am currently running the debug version of
>>>>>>> the code, which I would assume uses more memory?
>>>>>>> > >
>>>>>>> > > I have attached the output of the job. Note that the program
>>>>>>> uses PETSc twice: 1) to solve a linear system, for which no problem occurs,
>>>>>>> and 2) to solve the generalized EVP with SLEPc, where I get the error.
>>>>>>> > >
>>>>>>> > > Thanks
>>>>>>> > >
>>>>>>> > > Anthony
>>>>>>> > > <test.out-superlu_dist-151x151>
>>>>>>> >
>>>>>>> > In the output you attached there are no SLEPc objects in the
>>>>>>> report and the SLEPc options are not used. It seems that the SLEPc calls
>>>>>>> are skipped?
>>>>>>> >
>>>>>>> > Do you get the same error with MUMPS? Have you tried to solve
>>>>>>> linear systems with a preconditioned iterative solver?
>>>>>>> >
>>>>>>> > Jose
>>>>>>> >
>>>>>>> >
>>>>>>>  >
>>>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>