[petsc-users] Can't expand MemType 1: jcol 16104
Anthony Paul Haas
aph at email.arizona.edu
Mon Jul 27 13:25:30 CDT 2015
Hi Hong,
No, that is not the correct matrix. I forgot to mention that it is
a complex matrix. I tried loading the matrix I sent you this morning with:
!...Load a Matrix in Binary Format
call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
call MatSetType(DLOAD,MATAIJ,ierr)
call MatLoad(DLOAD,viewer,ierr)
call PetscViewerDestroy(viewer,ierr)
call MatView(DLOAD,PETSC_VIEWER_STDOUT_WORLD,ierr)
The first 37 rows should look like this:
Mat Object: 2 MPI processes
type: mpiaij
row 0: (0, 1)
row 1: (1, 1)
row 2: (2, 1)
row 3: (3, 1)
row 4: (4, 1)
row 5: (5, 1)
row 6: (6, 1)
row 7: (7, 1)
row 8: (8, 1)
row 9: (9, 1)
row 10: (10, 1)
row 11: (11, 1)
row 12: (12, 1)
row 13: (13, 1)
row 14: (14, 1)
row 15: (15, 1)
row 16: (16, 1)
row 17: (17, 1)
row 18: (18, 1)
row 19: (19, 1)
row 20: (20, 1)
row 21: (21, 1)
row 22: (22, 1)
row 23: (23, 1)
row 24: (24, 1)
row 25: (25, 1)
row 26: (26, 1)
row 27: (27, 1)
row 28: (28, 1)
row 29: (29, 1)
row 30: (30, 1)
row 31: (31, 1)
row 32: (32, 1)
row 33: (33, 1)
row 34: (34, 1)
row 35: (35, 1)
row 36: (1, -41.2444) (35, -41.2444) (36, 118.049 - 0.999271 i) (37,
-21.447) (38, 5.18873) (39, -2.34856) (40, 1.3607) (41, -0.898206)
(42, 0.642715) (43, -0.48593) (44, 0.382471) (45, -0.310476) (46,
0.258302) (47, -0.219268) (48, 0.189304) (49, -0.165815) (50,
0.147076) (51, -0.131907) (52, 0.119478) (53, -0.109189) (54, 0.1006)
(55, -0.0933795) (56, 0.0872779) (57, -0.0821019) (58, 0.0777011) (59,
-0.0739575) (60, 0.0707775) (61, -0.0680868) (62, 0.0658258) (63,
-0.0639473) (64, 0.0624137) (65, -0.0611954) (66, 0.0602698) (67,
-0.0596202) (68, 0.0592349) (69, -0.0295536) (71, -21.447) (106,
5.18873) (141, -2.34856) (176, 1.3607) (211, -0.898206) (246,
0.642715) (281, -0.48593) (316, 0.382471) (351, -0.310476) (386,
0.258302) (421, -0.219268) (456, 0.189304) (491, -0.165815) (526,
0.147076) (561, -0.131907) (596, 0.119478) (631, -0.109189) (666,
0.1006) (701, -0.0933795) (736, 0.0872779) (771, -0.0821019) (806,
0.0777011) (841, -0.0739575) (876, 0.0707775) (911, -0.0680868) (946,
0.0658258) (981, -0.0639473) (1016, 0.0624137) (1051, -0.0611954)
(1086, 0.0602698) (1121, -0.0596202) (1156, 0.0592349) (1191,
-0.0295536) (1261, 0) (3676, 117.211) (3711, -58.4801) (3746,
-78.3633) (3781, 29.4911) (3816, -15.8073) (3851, 9.94324) (3886,
-6.87205) (3921, 5.05774) (3956, -3.89521) (3991, 3.10522) (4026,
-2.54388) (4061, 2.13082) (4096, -1.8182) (4131, 1.57606) (4166,
-1.38491) (4201, 1.23155) (4236, -1.10685) (4271, 1.00428) (4306,
-0.919116) (4341, 0.847829) (4376, -0.787776) (4411, 0.736933) (4446,
-0.693735) (4481, 0.656958) (4516, -0.625638) (4551, 0.599007) (4586,
-0.576454) (4621, 0.557491) (4656, -0.541726) (4691, 0.528849) (4726,
-0.518617) (4761, 0.51084) (4796, -0.50538) (4831, 0.502142) (4866,
-0.250534)
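
A possible explanation for the alternating 1, 0, 1, 0 values in Hong's
output quoted below, assuming ex10 was run with a PETSc build configured
for real scalars, is that each complex entry was read back as two
consecutive real numbers. The following Python sketch illustrates that
effect only. The byte layout (real part, then imaginary part, as
big-endian doubles) is an assumption made for the illustration, and the
helper names are hypothetical, not PETSc routines.

```python
import struct

# Three complex entries equal to 1+0i, like the diagonal entries of rows 0..35.
complex_values = [complex(1.0, 0.0)] * 3

# Assumption for this sketch: each complex scalar is stored as two
# consecutive big-endian 8-byte doubles (real part, then imaginary part).
raw = b"".join(struct.pack(">dd", z.real, z.imag) for z in complex_values)


def read_as_real(buf):
    """Interpret the buffer as a flat sequence of big-endian real doubles."""
    count = len(buf) // 8
    return list(struct.unpack(f">{count}d", buf))


def read_as_complex(buf):
    """Interpret the buffer as (real, imaginary) pairs of big-endian doubles."""
    flat = read_as_real(buf)
    return [complex(re, im) for re, im in zip(flat[0::2], flat[1::2])]


# A reader expecting real scalars sees twice as many values, alternating
# between the real parts and the (zero) imaginary parts.
print(read_as_real(raw))
# A reader expecting complex scalars recovers the original entries.
print(read_as_complex(raw))
```

If this is indeed the cause, loading the file with a PETSc build
configured with --with-scalar-type=complex should reproduce the rows
listed above.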
Thanks,
Anthony
On Fri, Jul 24, 2015 at 7:56 PM, Hong <hzhang at mcs.anl.gov> wrote:
> Anthony:
> I tested your Amat_binary.m
> using petsc/src/ksp/ksp/examples/tutorials/ex10.c.
> Your matrix has many zero rows:
> ./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more
> Mat Object: 1 MPI processes
> type: seqaij
> row 0: (0, 1)
> row 1: (1, 0)
> row 2: (2, 1)
> row 3: (3, 0)
> row 4: (4, 1)
> row 5: (5, 0)
> row 6: (6, 1)
> row 7: (7, 0)
> row 8: (8, 1)
> row 9: (9, 0)
> ...
> row 36: (1, 1) (35, 0) (36, 1) (37, 0) (38, 1) (39, 0) (40, 1) (41,
> 0) (42, 1) (43, 0) (44, 1) (45,
> 0) (46, 1) (47, 0) (48, 1) (49, 0) (50, 1) (51, 0) (52, 1) (53, 0)
> (54, 1) (55, 0) (56, 1) (57, 0)
> (58, 1) (59, 0) (60, 1) ...
>
> Did you send us the correct matrix?
>
>>
>> I ran my code through valgrind and gdb as suggested by Barry. I am now
>> coming back to a problem I have had while running with parallel symbolic
>> factorization. I am attaching a test matrix (PETSc binary format) that I LU
>> decompose and then use to solve a linear system (see code below). I can run
>> on 2 processors with parsymbfact or on 4 processors without parsymbfact.
>> However, if I run on 4 processors with parsymbfact, the code just hangs.
>> Below is the simplified test case that I have used. The matrices A
>> and B are built somewhere else in my program. The matrix I am attaching is
>> A-sigma*B (see below).
>>
>> Also, I don't know what the optimum number of processors is for an LU
>> decomposition of a sparse matrix. Does it depend on the total number of
>> nonzeros? Is there an easy way to compute it?
>>
>
> You have to experiment with your matrix on the target machine to find out.
>
> Hong
>
>>
>>
>>
>> Subroutine HowBigLUCanBe(rank)
>>
>> IMPLICIT NONE
>>
>> integer(i4b),intent(in) :: rank
>> integer(i4b) :: i,ct
>> real(dp) :: begin,endd
>> complex(dpc) :: sigma
>>
>> PetscErrorCode ierr
>>
>>
>> if (rank==0) call cpu_time(begin)
>>
>> if (rank==0) then
>> write(*,*)
>> write(*,*)'Testing How Big LU Can Be...'
>> write(*,*)'============================'
>> write(*,*)
>> endif
>>
>> sigma = (1.0d0,0.0d0)
>> call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
>>
>> !.....Write Matrix to ASCII and Binary Format
>> !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
>> !call MatView(DXX,viewer,ierr)
>> !call PetscViewerDestroy(viewer,ierr)
>>
>> call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
>> call MatView(A,viewer,ierr)
>> call PetscViewerDestroy(viewer,ierr)
>>
>> !.....Create Linear Solver Context
>> call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>
>> !.....Set operators. Here the matrix that defines the linear system also
>> !     serves as the preconditioning matrix.
>> !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>> call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
>>
>> !.....Set Relative and Absolute Tolerances and Uses Default for Divergence Tol
>> tol = 1.e-10
>> call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>>
>> !.....Set the Direct (LU) Solver
>> call KSPSetType(ksp,KSPPREONLY,ierr)
>> call KSPGetPC(ksp,pc,ierr)
>> call PCSetType(pc,PCLU,ierr)
>> call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
>>
>> !.....Create Right-Hand-Side Vector
>> call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>> call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
>>
>> allocate(xwork1(IendA-IstartA))
>> allocate(loc(IendA-IstartA))
>>
>> ct=0
>> do i=IstartA,IendA-1
>> ct=ct+1
>> loc(ct)=i
>> xwork1(ct)=(1.0d0,0.0d0)
>> enddo
>>
>> call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>> call VecZeroEntries(sol,ierr)
>>
>> deallocate(xwork1,loc)
>>
>> !.....Assemble Vectors
>> call VecAssemblyBegin(frhs,ierr)
>> call VecAssemblyEnd(frhs,ierr)
>>
>> !.....Solve the Linear System
>> call KSPSolve(ksp,frhs,sol,ierr)
>>
>> !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>> if (rank==0) then
>> call cpu_time(endd)
>> write(*,*)
>> print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin
>> endif
>>
>> call SlepcFinalize(ierr)
>>
>> STOP
>>
>>
>> end Subroutine HowBigLUCanBe
>>
>> On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:
>>
>> Indeed, the parallel symbolic factorization routine needs a power-of-2
>> number of processes. However, you can use as many processes as you need:
>> internally, we redistribute the matrix to the nearest power-of-2 number of
>> processes, do the symbolic factorization, then redistribute back to all the
>> processes to do the factorization, triangular solve, etc. So there is no
>> restriction from the user's viewpoint.
>>
>> It's difficult to tell what the problem is. Could you print your matrix so
>> that I can do some debugging by running superlu_dist standalone?
>>
>> Sherry
>>
>>
>> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <aph at email.arizona.edu> wrote:
>>
>>> Hi,
>>>
>>> I have used the switch -mat_superlu_dist_parsymbfact in my pbs script.
>>> However, although my program worked fine with sequential symbolic
>>> factorization, I get one of the following 2 behaviors when I run with
>>> parallel symbolic factorization (depending on the number of processors that
>>> I use):
>>>
>>> 1) the program just hangs (it seems stuck in some subroutine ==> see
>>> test.out-hangs)
>>> 2) I get a floating point exception ==> see
>>> test.out-floating-point-exception
>>>
>>> Note that, as suggested in the SuperLU manual, I use a power-of-2
>>> number of procs. Are there any tunable parameters for the parallel symbolic
>>> factorization? Also, when I build my sparse matrix, most elements I add
>>> are nonzero, of course, but to simplify the programming I also add a few
>>> zero elements to the sparse matrix. I was thinking that if the
>>> parallel symbolic factorization proceeds by blocks, there could be some
>>> blocks where the pivot would be zero, hence creating the FPE?
>>>
>>> Thanks,
>>>
>>> Anthony
>>>
>>>
>>>
>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>
>>>> Did you find out how to change the option to use parallel symbolic
>>>> factorization? Perhaps the PETSc team can help.
>>>>
>>>> Sherry
>>>>
>>>>
>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>>
>>>>> Is there an inquiry function that tells you all the available
>>>>> options?
>>>>>
>>>>> Sherry
>>>>>
>>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <
>>>>> aph at email.arizona.edu> wrote:
>>>>>
>>>>>> Hi Sherry,
>>>>>>
>>>>>> Thanks for your message. I have used superlu_dist default options.
>>>>>> I did not realize that I was doing serial symbolic factorization. That is
>>>>>> probably the cause of my problem.
>>>>>> Each node on Garnet has 60GB of usable memory and I can run with
>>>>>> 1, 2, 4, 8, 16 or 32 cores per node.
>>>>>>
>>>>>> So I should use:
>>>>>>
>>>>>> -mat_superlu_dist_r 20
>>>>>> -mat_superlu_dist_c 32
>>>>>>
>>>>>> How do you specify the parallel symbolic factorization option? Is it:
>>>>>> -mat_superlu_dist_matinput 1
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Anthony
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>>>>
>>>>>>> For superlu_dist failure, this occurs during symbolic
>>>>>>> factorization. Since you are using serial symbolic factorization, it
>>>>>>> requires the entire graph of A to be available in the memory of one MPI
>>>>>>> task. How much memory do you have for each MPI task?
>>>>>>>
>>>>>>> It won't help even if you use more processes. You should try to
>>>>>>> use the parallel symbolic factorization option.
>>>>>>>
>>>>>>> Another point. You set up the process grid as:
>>>>>>> Process grid nprow 32 x npcol 20
>>>>>>> For better performance, you should swap the grid dimensions. That is,
>>>>>>> it's better to use 20 x 32; never make nprow larger than npcol.
>>>>>>>
>>>>>>>
>>>>>>> Sherry
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I would suggest running a sequence of problems (101 by 101, 111
>>>>>>>> by 111, etc.) and getting the memory usage in each case (when you run out of
>>>>>>>> memory you can get NO useful information about memory needs). You can
>>>>>>>> then plot memory usage as a function of problem size to get a handle on how
>>>>>>>> much memory it is using. You can also run on more and more processes
>>>>>>>> (which have more total memory) to see how large a problem you may be
>>>>>>>> able to reach.
>>>>>>>>
>>>>>>>> MUMPS also has an "out of core" version (which we have never
>>>>>>>> used) that could, in theory anyway, let you get to large problems if you
>>>>>>>> have lots of disk space, but you are on your own figuring out how to use it.
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <
>>>>>>>> aph at email.arizona.edu> wrote:
>>>>>>>> >
>>>>>>>> > Hi Jose,
>>>>>>>> >
>>>>>>>> > In my code, I use PETSc once to solve a linear system to get the
>>>>>>>> baseflow (without using SLEPc) and then I use SLEPc to do the stability
>>>>>>>> analysis of that baseflow. This is why there are some SLEPc options that
>>>>>>>> are not used in test.out-superlu_dist-151x151 (when I am solving for the
>>>>>>>> baseflow with PETSc only). I have attached a 101x101 case for which I get
>>>>>>>> the eigenvalues. That case works fine. However, if I increase to 151x151, I
>>>>>>>> get the error that you can see in test.out-superlu_dist-151x151 (similar
>>>>>>>> error with MUMPS: see test.out-mumps-151x151 line 2918). If you look at the
>>>>>>>> very end of the files test.out-superlu_dist-151x151 and
>>>>>>>> test.out-mumps-151x151, you will see that the last info message printed is:
>>>>>>>> >
>>>>>>>> > On Processor (after EPSSetFromOptions) 0 memory:
>>>>>>>> 0.65073152000E+08 =====> (see line 807 of module_petsc.F90)
>>>>>>>> >
>>>>>>>> > This means that the memory error probably occurs in the call to
>>>>>>>> EPSSolve (see module_petsc.F90 line 810). I would like to evaluate how much
>>>>>>>> memory is required by the most memory intensive operation within EPSSolve.
>>>>>>>> Since I am solving a generalized EVP, I would imagine that it would be the
>>>>>>>> LU decomposition. But is there an accurate way of doing it?
>>>>>>>> >
>>>>>>>> > Before starting with iterative solvers, I would like to exploit
>>>>>>>> direct solvers as much as I can. I tried GMRES with the default preconditioner
>>>>>>>> at some point but I had convergence problems. What solver/preconditioner
>>>>>>>> would you recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> >
>>>>>>>> > Anthony
>>>>>>>> >
>>>>>>>> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <
>>>>>>>> jroman at dsic.upv.es> wrote:
>>>>>>>> >
>>>>>>>> > El 07/07/2015, a las 02:33, Anthony Haas escribió:
>>>>>>>> >
>>>>>>>> > > Hi,
>>>>>>>> > >
>>>>>>>> > > I am computing eigenvalues using PETSc/SLEPc and superlu_dist
>>>>>>>> for the LU decomposition (my problem is a generalized eigenvalue problem).
>>>>>>>> The code runs fine for a grid with 101x101 but when I increase to 151x151,
>>>>>>>> I get the following error:
>>>>>>>> > >
>>>>>>>> > > Can't expand MemType 1: jcol 16104 (and then [NID 00037]
>>>>>>>> 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)
>>>>>>>> > >
>>>>>>>> > > It seems to be a memory problem. I monitor the memory usage as
>>>>>>>> far as I can and it seems that memory usage is pretty low. The most memory
>>>>>>>> intensive part of the program is probably the LU decomposition in the
>>>>>>>> context of the generalized EVP. Is there a way to evaluate how much memory
>>>>>>>> will be required for that step? I am currently running the debug version of
>>>>>>>> the code which I would assume would use more memory?
>>>>>>>> > >
>>>>>>>> > > I have attached the output of the job. Note that the program
>>>>>>>> uses PETSc twice: 1) to solve a linear system, for which no problem occurs,
>>>>>>>> and 2) to solve the generalized EVP with SLEPc, where I get the error.
>>>>>>>> > >
>>>>>>>> > > Thanks
>>>>>>>> > >
>>>>>>>> > > Anthony
>>>>>>>> > > <test.out-superlu_dist-151x151>
>>>>>>>> >
>>>>>>>> > In the output you are attaching there are no SLEPc objects in the
>>>>>>>> report and SLEPc options are not used. It seems that SLEPc calls are
>>>>>>>> skipped?
>>>>>>>> >
>>>>>>>> > Do you get the same error with MUMPS? Have you tried to solve
>>>>>>>> linear systems with a preconditioned iterative solver?
>>>>>>>> >
>>>>>>>> > Jose
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
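
Sherry's two points in the quoted thread (choose a process grid with
nprow no larger than npcol, and note that parallel symbolic
factorization runs on a power-of-2 number of processes) can be sketched
in a few lines of Python. This is only an illustration: the helper names
are hypothetical and are not SuperLU_DIST or PETSc routines, and reading
"nearest power of 2" as the largest power of 2 not exceeding the process
count is an assumption.

```python
import math


def process_grid(nprocs):
    """Return (nprow, npcol) with nprow * npcol == nprocs and nprow <= npcol.

    Picks the factor pair closest to square. Hypothetical helper, not a
    SuperLU_DIST or PETSc routine.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be positive")
    nprow = math.isqrt(nprocs)
    # Walk down from floor(sqrt(nprocs)) to the nearest divisor.
    while nprocs % nprow != 0:
        nprow -= 1
    return nprow, nprocs // nprow


def power_of_two_subset(nprocs):
    """Largest power of 2 not exceeding nprocs (assumed meaning of 'nearest')."""
    if nprocs < 1:
        raise ValueError("nprocs must be positive")
    return 1 << (nprocs.bit_length() - 1)


# 640 processes is the 32 x 20 case discussed in the thread.
nprow, npcol = process_grid(640)
print(f"-mat_superlu_dist_r {nprow} -mat_superlu_dist_c {npcol}")
print(power_of_two_subset(640))
```

For 640 processes the first helper gives the 20 x 32 grid that Sherry
recommends instead of 32 x 20.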