[petsc-users] Can't expand MemType 1: jcol 16104

Anthony Haas aph at email.arizona.edu
Fri Jul 24 18:18:07 CDT 2015


Hi Sherry,

I ran my code through valgrind and gdb as suggested by Barry. I am now
coming back to a problem I have had while running with parallel
symbolic factorization. I am attaching a test matrix (PETSc binary
format) that I LU-decompose and then use to solve a linear system (see
code below). I can run on 2 processors with parsymbfact, or on 4
processors without parsymbfact. However, if I run on 4 procs with
parsymbfact, the code just hangs. Below is the simplified test case
that I have used. The matrices A and B are built elsewhere in my
program; the matrix I am attaching is A-sigma*B (see below).

One other thing: for sparse matrices, I don't know what the optimal
number of processors to use for an LU decomposition is. Does it depend
on the total number of nonzeros? Do you have an easy way to estimate it?
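
My guess is that it is mostly driven by the memory available per MPI
process. The only rough check I do at the moment is to print the
resident memory of each process around the factorization. Below is a
minimal sketch of what I mean; it assumes PetscMemoryGetCurrentUsage is
callable from Fortran, and the helper name, the i4b kind and the output
format are placeholders from my own modules:

      Subroutine ReportMemory(rank,tag)

       IMPLICIT NONE

       integer(i4b),intent(in)     :: rank
       character(len=*),intent(in) :: tag
       PetscLogDouble              :: mem

       PetscErrorCode ierr

!.....Resident set size of this MPI process, in bytes
       call PetscMemoryGetCurrentUsage(mem,ierr)
       write(*,'(a,i6,1x,a,a,es12.4,a)') ' rank',rank,trim(tag),' memory:',mem,' bytes'

      end Subroutine ReportMemory

I would call it on every rank just before and just after the KSPSolve
(or EPSSolve) below to see the jump due to the LU factorization.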

Thanks,

Anthony



      Subroutine HowBigLUCanBe(rank)

       IMPLICIT NONE

       integer(i4b),intent(in) :: rank
       integer(i4b)            :: i,ct
       real(dp)                :: begin,endd
       complex(dpc)            :: sigma

       PetscErrorCode ierr


       if (rank==0) call cpu_time(begin)

       if (rank==0) then
          write(*,*)
          write(*,*)'Testing How Big LU Can Be...'
          write(*,*)'============================'
          write(*,*)
       endif

       sigma = (1.0d0,0.0d0)
       call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B

!.....Write Matrix to ASCII and Binary Format
       !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
       !call MatView(DXX,viewer,ierr)
       !call PetscViewerDestroy(viewer,ierr)

       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
       call MatView(A,viewer,ierr)
       call PetscViewerDestroy(viewer,ierr)

!.....Create Linear Solver Context
       call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)

!.....Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
       !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
       call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B

!.....Set Relative and Absolute Tolerances and Use Default for Divergence Tol
       tol = 1.e-10
       call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)

!.....Set the Direct (LU) Solver
       call KSPSetType(ksp,KSPPREONLY,ierr)
       call KSPGetPC(ksp,pc,ierr)
       call PCSetType(pc,PCLU,ierr)
       call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS

!.....Create Right-Hand-Side Vector
       call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
       call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)

       allocate(xwork1(IendA-IstartA))
       allocate(loc(IendA-IstartA))

       ct=0
       do i=IstartA,IendA-1
          ct=ct+1
          loc(ct)=i
          xwork1(ct)=(1.0d0,0.0d0)
       enddo

       call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
       call VecZeroEntries(sol,ierr)

       deallocate(xwork1,loc)

!.....Assemble Vectors
       call VecAssemblyBegin(frhs,ierr)
       call VecAssemblyEnd(frhs,ierr)

!.....Solve the Linear System
       call KSPSolve(ksp,frhs,sol,ierr)

       !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)

       if (rank==0) then
          call cpu_time(endd)
          write(*,*)
          print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin
       endif

       call SlepcFinalize(ierr)

       STOP


     end Subroutine HowBigLUCanBe
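
In case it is useful for debugging with the attached matrix, here is a
rough sketch of a standalone driver that reads Amat_binary.m back in and
does the same LU solve through SuperLU_DIST. It is only a sketch: the
program name is a placeholder, the include line may need adjusting to
your PETSc version, error checking is omitted, and it assumes a complex
build as in my code above.

      program TestSuperLUDist

       IMPLICIT NONE

!.....PETSc Fortran include (the path/name may differ with the PETSc version)
#include <finclude/petsc.h>

       Mat            A
       Vec            frhs,sol
       KSP            ksp
       PC             pc
       PetscViewer    viewer
       PetscScalar    one
       PetscErrorCode ierr

       call PetscInitialize(PETSC_NULL_CHARACTER,ierr)

!.....Read the matrix written above with MatView
       call MatCreate(PETSC_COMM_WORLD,A,ierr)
       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
       call MatLoad(A,viewer,ierr)
       call PetscViewerDestroy(viewer,ierr)

!.....Right-hand-side vector of ones and zero initial solution
       call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
       call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
       one = (1.0d0,0.0d0)
       call VecSet(frhs,one,ierr)
       call VecZeroEntries(sol,ierr)

!.....LU with SuperLU_DIST; options such as -mat_superlu_dist_parsymbfact are read from the command line
       call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
       call KSPSetOperators(ksp,A,A,ierr)
       call KSPSetType(ksp,KSPPREONLY,ierr)
       call KSPGetPC(ksp,pc,ierr)
       call PCSetType(pc,PCLU,ierr)
       call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr)
       call KSPSetFromOptions(ksp,ierr)

       call KSPSolve(ksp,frhs,sol,ierr)

!.....Clean up
       call KSPDestroy(ksp,ierr)
       call VecDestroy(frhs,ierr)
       call VecDestroy(sol,ierr)
       call MatDestroy(A,ierr)
       call PetscFinalize(ierr)

      end program TestSuperLUDist

With something like this, running on 2 vs. 4 processes, with and without
-mat_superlu_dist_parsymbfact, should hopefully reproduce the hang I am
seeing.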


On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:
> Indeed, the parallel symbolic factorization routine needs a power-of-2
> number of processes; however, you can use however many processes you need.
> Internally, we redistribute the matrix to the nearest power-of-2 number of
> processes, do the symbolic factorization, then redistribute back to all the
> processes to do the factorization, triangular solve, etc.  So, there is no
> restriction from the user's viewpoint.
>
> It's difficult to tell what the problem is.  Could you print your matrix?
> Then I can do some debugging by running superlu_dist standalone.
>
> Sherry
>
>
> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas 
> <aph at email.arizona.edu> wrote:
>
>     Hi,
>
>     I have used the switch -mat_superlu_dist_parsymbfact in my PBS
>     script. However, although my program worked fine with sequential
>     symbolic factorization, I get one of the following 2 behaviors
>     when I run with parallel symbolic factorization (depending on the
>     number of processors that I use):
>
>     1) the program just hangs (it seems stuck in some subroutine ==>
>     see test.out-hangs)
>     2) I get a floating point exception ==> see
>     test.out-floating-point-exception
>
>     Note that, as suggested in the SuperLU manual, I use a power-of-2
>     number of procs. Are there any tunable parameters for the parallel
>     symbolic factorization? Note that when I build my sparse matrix,
>     most elements I add are of course nonzero, but to simplify the
>     programming I also add a few zero elements to the sparse matrix.
>     I was thinking that maybe, if the parallel symbolic factorization
>     proceeds by blocks, there could be some blocks where the pivot
>     would be zero, hence creating the FPE?
>
>     Thanks,
>
>     Anthony
>
>
>
>     On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>
>         Did you find out how to change option to use parallel symbolic
>         factorization?  Perhaps PETSc team can help.
>
>         Sherry
>
>
>         On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>
>             Is there an inquiry function that tells you all the
>             available options?
>
>             Sherry
>
>             On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas
>             <aph at email.arizona.edu> wrote:
>
>                 Hi Sherry,
>
>                 Thanks for your message. I have used superlu_dist
>                 default options. I did not realize that I was doing
>                 serial symbolic factorization. That is probably the
>                 cause of my problem.
>                 Each node on Garnet has 60GB usable memory and I can
>                 run with 1, 2, 4, 8, 16, or 32 cores per node.
>
>                 So I should use:
>
>                 -mat_superlu_dist_r 20
>                 -mat_superlu_dist_c 32
>
>                 How do you specify the parallel symbolic factorization
>                 option? Is it -mat_superlu_dist_matinput 1?
>
>                 Thanks,
>
>                 Anthony
>
>
>                 On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li
>                 <xsli at lbl.gov> wrote:
>
>                     The superlu_dist failure occurs during symbolic
>                     factorization. Since you are using serial
>                     symbolic factorization, it requires the entire
>                     graph of A to be available in the memory of one
>                     MPI task. How much memory do you have for each MPI
>                     task?
>
>                     It won't help even if you use more processes.  You
>                     should try the parallel symbolic factorization
>                     option.
>
>                     Another point.  You set up the process grid as:
>                            Process grid nprow 32 x npcol 20
>                     For better performance, you should swap the grid
>                     dimensions. That is, it's better to use 20 x 32;
>                     never make nprow larger than npcol.
>
>
>                     Sherry
>
>
>                     On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith
>                     <bsmith at mcs.anl.gov> wrote:
>
>
>                            I would suggest running a sequence of
>                         problems, 101 by 101, 111 by 111, etc., and getting
>                         the memory usage in each case (when you run
>                         out of memory you get NO useful
>                         information about memory needs). You can
>                         then plot memory usage as a function of
>                         problem size to get a handle on how much
>                         memory it is using.  You can also run on more
>                         and more processes (which have a total of more
>                         memory) to see how large a problem you may be
>                         able to reach.
>
>                            MUMPS also has an "out of core" version
>                         (which we have never used) that could in
>                         theory let you get to large problems
>                         if you have lots of disk space, but you are on
>                         your own figuring out how to use it.
>
>                           Barry
>
>                         > On Jul 7, 2015, at 2:37 PM, Anthony Paul
>                         Haas <aph at email.arizona.edu> wrote:
>                         >
>                         > Hi Jose,
>                         >
>                         > In my code, I use once PETSc to solve a
>                         linear system to get the baseflow (without
>                         using SLEPc) and then I use SLEPc to do the
>                         stability analysis of that baseflow. This is
>                         why there are some SLEPc options that are not
>                         used in test.out-superlu_dist-151x151 (when I
>                         am solving for the baseflow with PETSc only).
>                         I have attached a 101x101 case for which I get
>                         the eigenvalues. That case works fine. However,
>                         if I increase to 151x151, I get the error that
>                         you can see in test.out-superlu_dist-151x151
>                         (similar error with mumps: see
>                         test.out-mumps-151x151 line 2918). If you
>                         look at the very end of the files
>                         test.out-superlu_dist-151x151 and
>                         test.out-mumps-151x151, you will see that the
>                         last info message printed is:
>                         >
>                         > On Processor (after EPSSetFromOptions) 0   
>                         memory: 0.65073152000E+08 =====> (see line 807
>                         of module_petsc.F90)
>                         >
>                         > This means that the memory error probably
>                         occurs in the call to EPSSolve (see
>                         module_petsc.F90 line 810). I would like to
>                         evaluate how much memory is required by the
>                         most memory intensive operation within
>                         EPSSolve. Since I am solving a generalized
>                         EVP, I would imagine that it would be the LU
>                         decomposition. But is there an accurate way of
>                         doing it?
>                         >
>                         > Before starting with iterative solvers, I
>                         would like to exploit direct solvers as much
>                         as I can. I tried GMRES with the default
>                         preconditioner at some point but I had
>                         convergence problems. What
>                         solver/preconditioner would you recommend for
>                         a generalized non-Hermitian (EPS_GNHEP) EVP?
>                         >
>                         > Thanks,
>                         >
>                         > Anthony
>                         >
>                         > On Tue, Jul 7, 2015 at 12:17 AM, Jose E.
>                         Roman <jroman at dsic.upv.es> wrote:
>                         >
>                         > On 07/07/2015, at 02:33, Anthony Haas
>                         wrote:
>                         >
>                         > > Hi,
>                         > >
>                         > > I am computing eigenvalues using
>                         PETSc/SLEPc and superlu_dist for the LU
>                         decomposition (my problem is a generalized
>                         eigenvalue problem). The code runs fine for a
>                         grid with 101x101 but when I increase to
>                         151x151, I get the following error:
>                         > >
>                         > > Can't expand MemType 1: jcol 16104  (and
>                         then [NID 00037] 2015-07-06 19:19:17 Apid
>                         31025976: OOM killer terminated this process.)
>                         > >
>                         > > It seems to be a memory problem. I monitor
>                         the memory usage as far as I can and it seems
>                         that memory usage is pretty low. The most
>                         memory intensive part of the program is
>                         probably the LU decomposition in the context
>                         of the generalized EVP. Is there a way to
>                         evaluate how much memory will be required for
>                         that step? I am currently running the debug
>                         version of the code, which I would assume
>                         uses more memory.
>                         > >
>                         > > I have attached the output of the job.
>                         Note that the program uses PETSc twice: 1) to
>                         solve a linear system, for which no problem
>                         occurs, and 2) to solve the generalized EVP
>                         with SLEPc, where I get the error.
>                         > >
>                         > > Thanks
>                         > >
>                         > > Anthony
>                         > > <test.out-superlu_dist-151x151>
>                         >
>                         > In the output you are attaching there are no
>                         SLEPc objects in the report and SLEPc options
>                         are not used. It seems that SLEPc calls are
>                         skipped?
>                         >
>                         > Do you get the same error with MUMPS? Have
>                         you tried to solve linear systems with a
>                         preconditioned iterative solver?
>                         >
>                         > Jose
>                         >
>                         >
>                         >
>                         <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>
>
>
>
>
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Amat_binary.m
Type: text/x-objcsrc
Size: 7906356 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20150724/2b59b9d5/attachment-0001.bin>
-------------- next part --------------
-matload_block_size 1

