[petsc-users] Can't expand MemType 1: jcol 16104
Anthony Haas
aph at email.arizona.edu
Fri Jul 24 18:18:07 CDT 2015
Hi Sherry,
I ran my code through valgrind and gdb as suggested by Barry. I am now
coming back to a problem I have had while running with parallel
symbolic factorization. I am attaching a test matrix (PETSc binary
format) that I LU-decompose and then use to solve a linear system (see
code below). I can run on 2 processors with parsymbfact, or on 4
processors without parsymbfact. However, if I run on 4 procs with
parsymbfact, the code just hangs. Below is the simplified test case
that I have used for testing. The matrices A and B are built elsewhere
in my program. The matrix I am attaching is A-sigma*B (see below).

One other thing: for sparse matrices, I don't know what the optimum
number of processors is for an LU decomposition. Does it depend on the
total number of nonzeros? Is there an easy way to estimate it?
Thanks,
Anthony
Subroutine HowBigLUCanBe(rank)
IMPLICIT NONE
integer(i4b),intent(in) :: rank
integer(i4b) :: i,ct
real(dp) :: begin,endd
complex(dpc) :: sigma
PetscErrorCode ierr
if (rank==0) call cpu_time(begin)
if (rank==0) then
write(*,*)
write(*,*)'Testing How Big LU Can Be...'
write(*,*)'============================'
write(*,*)
endif
sigma = (1.0d0,0.0d0)
call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
!.....Write Matrix to ASCII and Binary Format
!call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
!call MatView(DXX,viewer,ierr)
!call PetscViewerDestroy(viewer,ierr)
call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
call MatView(A,viewer,ierr)
call PetscViewerDestroy(viewer,ierr)
!.....Create Linear Solver Context
call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
!.....Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
!call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)
!aha commented and replaced by next line
call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
!.....Set Relative and Absolute Tolerances and Use Default for Divergence Tol
tol = 1.e-10
call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
!.....Set the Direct (LU) Solver
call KSPSetType(ksp,KSPPREONLY,ierr)
call KSPGetPC(ksp,pc,ierr)
call PCSetType(pc,PCLU,ierr)
call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! alternatives: MATSOLVERSUPERLU_DIST, MATSOLVERMUMPS
!.....Create Right-Hand-Side Vector
call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
allocate(xwork1(IendA-IstartA))
allocate(loc(IendA-IstartA))
ct=0
do i=IstartA,IendA-1
ct=ct+1
loc(ct)=i
xwork1(ct)=(1.0d0,0.0d0)
enddo
call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
call VecZeroEntries(sol,ierr)
deallocate(xwork1,loc)
!.....Assemble Vectors
call VecAssemblyBegin(frhs,ierr)
call VecAssemblyEnd(frhs,ierr)
!.....Solve the Linear System
call KSPSolve(ksp,frhs,sol,ierr)
!call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
if (rank==0) then
call cpu_time(endd)
write(*,*)
print '("Total time for HowBigLUCanBe = ",f21.3,"
seconds.")',endd-begin
endif
call SlepcFinalize(ierr)
STOP
end Subroutine HowBigLUCanBe
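
For reference, this is roughly how I select the LU solver and the
parallel symbolic factorization at run time (a sketch only: the
launcher and executable name are placeholders for the actual aprun
line in my PBS script, and the -pc_type / -pc_factor_mat_solver_package
options simply mirror what is already hard-coded above with PCSetType
and PCFactorSetMatSolverPackage):

    mpiexec -n 4 ./my_solver \
        -pc_type lu \
        -pc_factor_mat_solver_package superlu_dist \
        -mat_superlu_dist_parsymbfact
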
On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:
> Indeed, the parallel symbolic factorization routine needs a power-of-2
> number of processes; however, you can use however many processes you need.
> Internally, we redistribute the matrix to the nearest power-of-2 number of
> processes, do the symbolic factorization, then redistribute back to all the
> processes to do the factorization, triangular solves, etc. So there is no
> restriction from the user's viewpoint.
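>
> (Illustration only, not the actual SuperLU_DIST code -- a sketch of what
> "nearest power of 2" could mean, assuming it is the largest power of two
> not exceeding the total process count:)
>
>     integer function symb_nprocs(nprocs)
>       ! Largest power of two <= nprocs; this many processes would do the
>       ! parallel symbolic factorization before redistributing back.
>       implicit none
>       integer, intent(in) :: nprocs
>       symb_nprocs = 1
>       do while (2*symb_nprocs <= nprocs)
>          symb_nprocs = 2*symb_nprocs
>       end do
>     end function symb_nprocs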
>
> It's difficult to tell what the problem is. Do you think you can
> print your matrix? Then I can do some debugging by running
> superlu_dist standalone.
>
> Sherry
>
>
> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas
> <aph at email.arizona.edu> wrote:
>
> Hi,
>
> I have used the switch -mat_superlu_dist_parsymbfact in my PBS
> script. However, although my program worked fine with sequential
> symbolic factorization, I get one of the following two behaviors
> when I run with parallel symbolic factorization (depending on the
> number of processors that I use):
>
> 1) the program just hangs (it seems stuck in some subroutine ==>
> see test.out-hangs)
> 2) I get a floating point exception ==> see
> test.out-floating-point-exception
>
> Note that, as suggested in the SuperLU manual, I use a power-of-2
> number of procs. Are there any tunable parameters for the parallel
> symbolic factorization? Note that when I build my sparse matrix,
> most elements I add are of course nonzero, but to simplify the
> programming I also add a few zero elements to the sparse matrix.
> I was thinking that maybe, if the parallel symbolic factorization
> proceeds by blocks, there could be some blocks where the pivot would
> be zero, hence creating the FPE?
>
> Thanks,
>
> Anthony
>
>
>
> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>
> Did you find out how to change the option to use parallel symbolic
> factorization? Perhaps the PETSc team can help.
>
> Sherry
>
>
> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>
> Is there an inquiry function that tells you all the
> available options?
>
> Sherry
>
> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas
> <aph at email.arizona.edu> wrote:
>
> Hi Sherry,
>
> Thanks for your message. I have used the superlu_dist
> default options. I did not realize that I was doing
> serial symbolic factorization. That is probably the
> cause of my problem.
> Each node on Garnet has 60 GB of usable memory, and I can
> run with 1, 2, 4, 8, 16, or 32 cores per node.
>
> So I should use:
>
> -mat_superlu_dist_r 20
> -mat_superlu_dist_c 32
>
> How do you specify the parallel symbolic factorization
> option? Is it -mat_superlu_dist_matinput 1?
>
> Thanks,
>
> Anthony
>
>
> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li
> <xsli at lbl.gov> wrote:
>
> The superlu_dist failure occurs during
> symbolic factorization. Since you are using serial
> symbolic factorization, it requires the entire
> graph of A to be available in the memory of one
> MPI task. How much memory do you have for each MPI
> task?
>
> It won't help even if you use more processes. You
> should try to use the parallel symbolic factorization
> option.
>
> Another point: you set up the process grid as
> nprow 32 x npcol 20.
> For better performance, you should swap the grid
> dimensions. That is, it's better to use 20 x 32;
> never make nprow larger than npcol.
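>
> (For example, for the 640-process run implied by the 32 x 20 grid
> above, the swapped grid could be requested through the PETSc options
> mentioned earlier in this thread -- assuming -mat_superlu_dist_r sets
> nprow and -mat_superlu_dist_c sets npcol:
>
>     -mat_superlu_dist_r 20
>     -mat_superlu_dist_c 32  )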
>
>
> Sherry
>
>
> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith
> <bsmith at mcs.anl.gov> wrote:
>
>
> I would suggest running a sequence of
> problems, 101 by 101, 111 by 111, etc., and getting
> the memory usage in each case (when you run
> out of memory you can get NO useful
> information out about memory needs). You can
> then plot memory usage as a function of
> problem size to get a handle on how much
> memory it is using. You can also run on more
> and more processes (which have a total of more
> memory) to see how large a problem you may be
> able to reach.
>
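> (A sketch of the kind of instrumentation this implies, using
> PetscMemoryGetCurrentUsage; it assumes the usual PETSc Fortran
> includes and declarations from module_petsc.F90 are in scope, as in
> the subroutine above, and the label argument is arbitrary:)
>
>       subroutine report_memory(rank,label)
>         implicit none
>         integer, intent(in)          :: rank
>         character(len=*), intent(in) :: label
>         PetscLogDouble mem
>         PetscErrorCode ierr
>         ! Current memory use of this MPI process, in bytes.
>         call PetscMemoryGetCurrentUsage(mem,ierr)
>         if (rank==0) write(*,*) trim(label)//' memory (bytes):', mem
>       end subroutine report_memory
>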
> MUMPS also has an "out of core" version
> (which we have never used) that could in
> theory let you get to large problems
> if you have lots of disk space, but you are on
> your own figuring out how to use it.
>
> Barry
>
> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas
> <aph at email.arizona.edu> wrote:
> >
> > Hi Jose,
> >
> > In my code, I first use PETSc to solve a
> linear system to get the baseflow (without
> using SLEPc), and then I use SLEPc to do the
> stability analysis of that baseflow. This is
> why there are some SLEPc options that are not
> used in test.out-superlu_dist-151x151 (when I
> am solving for the baseflow with PETSc only).
> I have attached a 101x101 case for which I get
> the eigenvalues. That case works fine. However,
> if I increase to 151x151, I get the error that
> you can see in test.out-superlu_dist-151x151
> (similar error with mumps: see
> test.out-mumps-151x151, line 2918). If you
> look at the very end of the files
> test.out-superlu_dist-151x151 and
> test.out-mumps-151x151, you will see that the
> last info message printed is:
> >
> > On Processor (after EPSSetFromOptions) 0
> memory: 0.65073152000E+08 =====> (see line 807
> of module_petsc.F90)
> >
> > This means that the memory error probably
> occurs in the call to EPSSolve (see
> module_petsc.F90 line 810). I would like to
> evaluate how much memory is required by the
> most memory-intensive operation within
> EPSSolve. Since I am solving a generalized
> EVP, I would imagine that it would be the LU
> decomposition. But is there an accurate way of
> doing it?
> >
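> > (One thing I could try, if I understand the PETSc/SuperLU_DIST and
> > MUMPS interfaces correctly, is to ask the packages themselves to
> > print their factorization statistics, which I believe include memory
> > information, via options such as:
> >
> >     -mat_superlu_dist_statprint
> >     -mat_mumps_icntl_4 2          )
> >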
> > Before starting with iterative solvers, I
> would like to exploit direct solvers as much
> as I can. I tried GMRES with the default
> preconditioner at some point, but I had
> convergence problems. What
> solver/preconditioner would you recommend for
> a generalized non-Hermitian (EPS_GNHEP) EVP?
> >
> > Thanks,
> >
> > Anthony
> >
> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman
> <jroman at dsic.upv.es> wrote:
> >
> > On 07/07/2015, at 02:33, Anthony Haas
> wrote:
> >
> > > Hi,
> > >
> > > I am computing eigenvalues using
> PETSc/SLEPc and superlu_dist for the LU
> decomposition (my problem is a generalized
> eigenvalue problem). The code runs fine for a
> 101x101 grid, but when I increase to
> 151x151, I get the following error:
> > >
> > > Can't expand MemType 1: jcol 16104 (and
> then [NID 00037] 2015-07-06 19:19:17 Apid
> 31025976: OOM killer terminated this process.)
> > >
> > > It seems to be a memory problem. I monitor
> the memory usage as far as I can, and it seems
> that memory usage is pretty low. The most
> memory-intensive part of the program is
> probably the LU decomposition in the context
> of the generalized EVP. Is there a way to
> evaluate how much memory will be required for
> that step? I am currently running the debug
> version of the code, which I would assume
> uses more memory.
> > >
> > > I have attached the output of the job.
> Note that the program uses PETSc twice: 1) to
> solve a linear system, for which no problem
> occurs, and 2) to solve the generalized EVP
> with SLEPc, where I get the error.
> > >
> > > Thanks
> > >
> > > Anthony
> > > <test.out-superlu_dist-151x151>
> >
> > In the output you are attaching, there are no
> SLEPc objects in the report, and the SLEPc options
> are not used. It seems that the SLEPc calls
> are being skipped?
> >
> > Do you get the same error with MUMPS? Have
> you tried to solve linear systems with a
> preconditioned iterative solver?
> >
> > Jose
> >
> >
> >
> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Amat_binary.m
Type: text/x-objcsrc
Size: 7906356 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20150724/2b59b9d5/attachment-0001.bin>
-------------- next part --------------
-matload_block_size 1