[petsc-users] Can't expand MemType 1: jcol 16104
Anthony Haas
aph at email.arizona.edu
Thu Jul 30 19:23:03 CDT 2015
Hi Hong, Satish,
I have been using petsc-3.6.1 with superlu_dist 4.1
(--download-superlu_dist=superlu_dist_4.1.tar.gz) for a few days now. It
seems to be working fine. However, reading Satish's email below, I wonder
whether there is some kind of patch I need to apply to PETSc?
Thanks
Anthony
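
[For reference, a complete configure line along these lines is a sketch of what the option above implies, not Anthony's actual invocation. The compiler wrappers and the --download-metis/--download-parmetis flags (which superlu_dist's parallel symbolic factorization relies on) are assumptions; --with-scalar-type=complex matches the complex matrices discussed later in this thread:

    ./configure --with-cc=mpicc --with-fc=mpif90 \
      --with-scalar-type=complex \
      --download-metis --download-parmetis \
      --download-superlu_dist=superlu_dist_4.1.tar.gz
]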
On 07/30/2015 01:51 PM, Satish Balay wrote:
> I've updated petsc to use v4.1. The changes are in branch
> 'balay/update-superlu_dist-4.1' - and merged to 'next' for now.
>
> Satish
>
> On Wed, 29 Jul 2015, Xiaoye S. Li wrote:
>
>> Thanks for the quick update. In the new tarball, I have already removed the
>> junk files, as pointed out by Satish.
>>
>> Sherry
>>
>> On Wed, Jul 29, 2015 at 8:36 AM, Hong <hzhang at mcs.anl.gov> wrote:
>>
>>> Sherry,
>>> With your bugfix, superlu_dist-4.1 works now:
>>>
>>> petsc/src/ksp/ksp/examples/tutorials (master)
>>> $ mpiexec -n 4 ./ex10 -f0 Amat_binary.m -rhs 0 -pc_type lu
>>> -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
>>> Number of iterations = 1
>>> Residual norm 2.11605e-11
>>>
>>> Once you address Satish's request, we'll update the petsc interface to this
>>> version of superlu_dist.
>>>
>>> Anthony:
>>> Please download the latest superlu_dist v4.1, then configure petsc with
>>> '--download-superlu_dist=superlu_dist_4.1.tar.gz'.
>>>
>>> Hong
>>>
>>> On Tue, Jul 28, 2015 at 11:11 AM, Satish Balay <balay at mcs.anl.gov> wrote:
>>>
>>>> Sherry,
>>>>
>>>> One minor issue with the tarball. I see the following new files in the
>>>> v4.1 tarball [when comparing it with v4.0]. Some of these files are
>>>> perhaps junk files and could be removed from the tarball?
>>>>
>>>> EXAMPLE/dscatter.c.bak
>>>> EXAMPLE/g10.cua
>>>> EXAMPLE/g4.cua
>>>> EXAMPLE/g4.postorder.eps
>>>> EXAMPLE/g4.rua
>>>> EXAMPLE/g4_postorder.jpg
>>>> EXAMPLE/hostname
>>>> EXAMPLE/pdgssvx.c
>>>> EXAMPLE/pdgstrf2.c
>>>> EXAMPLE/pwd
>>>> EXAMPLE/pzgstrf2.c
>>>> EXAMPLE/pzgstrf_v3.3.c
>>>> EXAMPLE/pzutil.c
>>>> EXAMPLE/test.bat
>>>> EXAMPLE/test.cpu.bat
>>>> EXAMPLE/test.err
>>>> EXAMPLE/test.err.1
>>>> EXAMPLE/zlook_ahead_update.c
>>>> FORTRAN/make.out
>>>> FORTRAN/zcreate_dist_matrix.c
>>>> MAKE_INC/make.xc30
>>>> SRC/int_t
>>>> SRC/lnbrow
>>>> SRC/make.out
>>>> SRC/rnbrow
>>>> SRC/temp
>>>> SRC/temp1
>>>>
>>>>
>>>> Thanks,
>>>> Satish
>>>>
>>>>
>>>> On Tue, 28 Jul 2015, Xiaoye S. Li wrote:
>>>>
>>>>> I am checking v4.1 now. I'll let you know when I've fixed the problem.
>>>>>
>>>>> Sherry
>>>>>
>>>>> On Tue, Jul 28, 2015 at 8:27 AM, Hong <hzhang at mcs.anl.gov> wrote:
>>>>>
>>>>>> Sherry,
>>>>>> I tested with superlu_dist v4.1. The extra printings are gone, but the
>>>>>> hang remains.
>>>>>> It hangs at
>>>>>>
>>>>>> #5 0x00007fde5af1c818 in PMPI_Wait (request=0xb6e4e0,
>>>>>> status=0x7fff9cd83d60)
>>>>>> at src/mpi/pt2pt/wait.c:168
>>>>>> #6 0x00007fde602dd635 in pzgstrf (options=0x9202f0, m=4900, n=4900,
>>>>>> anorm=13.738475134194639, LUstruct=0x9203c8, grid=0x9202c8,
>>>>>> stat=0x7fff9cd84880, info=0x7fff9cd848bc) at pzgstrf.c:1308
>>>>>>
>>>>>> if (recv_req[0] != MPI_REQUEST_NULL) {
>>>>>> --> MPI_Wait (&recv_req[0], &status);
>>>>>>
>>>>>> We will update the petsc interface to superlu_dist v4.1.
>>>>>>
>>>>>> Hong
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 27, 2015 at 11:33 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>>>>
>>>>>>> Hong,
>>>>>>> Thanks for trying out.
>>>>>>> The extra printings are not properly guarded by the print level. I will
>>>>>>> fix that. I will look into the hang problem soon.
>>>>>>>
>>>>>>> Sherry
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 27, 2015 at 7:50 PM, Hong <hzhang at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>> Sherry,
>>>>>>>>
>>>>>>>> I can reproduce the hang using petsc/src/ksp/ksp/examples/tutorials/ex10.c:
>>>>>>>> mpiexec -n 4 ./ex10 -f0 /homes/hzhang/tmp/Amat_binary.m -rhs 0 -pc_type lu
>>>>>>>> -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
>>>>>>>> ...
>>>>>>>> .. Starting with 1 OpenMP threads
>>>>>>>> [0] .. BIG U size 1342464
>>>>>>>> [0] .. BIG V size 131072
>>>>>>>> Max row size is 1311
>>>>>>>> Using buffer_size of 5000000
>>>>>>>> Threads per process 1
>>>>>>>> ...
>>>>>>>>
>>>>>>>> Using a debugger (with petsc option '-start_in_debugger'), I find that
>>>>>>>> the hang occurs at
>>>>>>>> #0  0x00007f117d870998 in __GI___poll (fds=0x20da750, nfds=4,
>>>>>>>>     timeout=<optimized out>, timeout@entry=-1)
>>>>>>>>     at ../sysdeps/unix/sysv/linux/poll.c:83
>>>>>>>> #1  0x00007f117de9f7de in MPIDU_Sock_wait (sock_set=0x20da550,
>>>>>>>>     millisecond_timeout=millisecond_timeout@entry=-1,
>>>>>>>>     eventp=eventp@entry=0x7fff654930b0)
>>>>>>>>     at src/mpid/common/sock/poll/sock_wait.i:123
>>>>>>>> #2  0x00007f117de898b8 in MPIDI_CH3i_Progress_wait (
>>>>>>>>     progress_state=0x7fff65493120)
>>>>>>>>     at src/mpid/ch3/channels/sock/src/ch3_progress.c:218
>>>>>>>> #3  MPIDI_CH3I_Progress (blocking=blocking@entry=1,
>>>>>>>>     state=state@entry=0x7fff65493120)
>>>>>>>>     at src/mpid/ch3/channels/sock/src/ch3_progress.c:921
>>>>>>>> #4  0x00007f117de1a559 in MPIR_Wait_impl (request=request@entry=0x262df90,
>>>>>>>>     status=status@entry=0x7fff65493390) at src/mpi/pt2pt/wait.c:67
>>>>>>>> #5  0x00007f117de1a818 in PMPI_Wait (request=0x262df90,
>>>>>>>>     status=0x7fff65493390) at src/mpi/pt2pt/wait.c:168
>>>>>>>> #6  0x00007f11831da557 in pzgstrf (options=0x23dfda0, m=4900, n=4900,
>>>>>>>>     anorm=13.738475134194639, LUstruct=0x23dfe78, grid=0x23dfd78,
>>>>>>>>     stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgstrf.c:1308
>>>>>>>> #7  0x00007f11831bf3bd in pzgssvx (options=0x23dfda0, A=0x23dfe30,
>>>>>>>>     ScalePermstruct=0x23dfe50, B=0x0, ldb=1225, nrhs=0, grid=0x23dfd78,
>>>>>>>>     LUstruct=0x23dfe78, SOLVEstruct=0x23dfe98, berr=0x0,
>>>>>>>>     stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgssvx.c:1063
>>>>>>>> #8  0x00007f11825c2340 in MatLUFactorNumeric_SuperLU_DIST (F=0x23a0110,
>>>>>>>>     A=0x21bb7e0, info=0x2355068)
>>>>>>>>     at /sandbox/hzhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:411
>>>>>>>> #9  0x00007f1181c6c567 in MatLUFactorNumeric (fact=0x23a0110, mat=0x21bb7e0,
>>>>>>>>     info=0x2355068) at /sandbox/hzhang/petsc/src/mat/interface/matrix.c:2946
>>>>>>>> #10 0x00007f1182a56489 in PCSetUp_LU (pc=0x2353a10)
>>>>>>>>     at /sandbox/hzhang/petsc/src/ksp/pc/impls/factor/lu/lu.c:152
>>>>>>>> #11 0x00007f1182b16f24 in PCSetUp (pc=0x2353a10)
>>>>>>>>     at /sandbox/hzhang/petsc/src/ksp/pc/interface/precon.c:983
>>>>>>>> #12 0x00007f1182be61b5 in KSPSetUp (ksp=0x232c2a0)
>>>>>>>>     at /sandbox/hzhang/petsc/src/ksp/ksp/interface/itfunc.c:332
>>>>>>>> #13 0x0000000000405a31 in main (argc=11, args=0x7fff65499578)
>>>>>>>>     at /sandbox/hzhang/petsc/src/ksp/ksp/examples/tutorials/ex10.c:312
>>>>>>>>
>>>>>>>> You may take a look at it. Sequential symbolic factorization works fine.
>>>>>>>> Why does superlu_dist (v4.0) in complex precision display
>>>>>>>>
>>>>>>>> .. Starting with 1 OpenMP threads
>>>>>>>> [0] .. BIG U size 1342464
>>>>>>>> [0] .. BIG V size 131072
>>>>>>>> Max row size is 1311
>>>>>>>> Using buffer_size of 5000000
>>>>>>>> Threads per process 1
>>>>>>>> ...
>>>>>>>>
>>>>>>>> I realize that I use superlu_dist v4.0. Would v4.1 work? I'll give it a
>>>>>>>> try tomorrow.
>>>>>>>>
>>>>>>>> Hong
>>>>>>>>
>>>>>>>> On Mon, Jul 27, 2015 at 1:25 PM, Anthony Paul Haas <
>>>>>>>> aph at email.arizona.edu> wrote:
>>>>>>>>
>>>>>>>>> Hi Hong,
>>>>>>>>>
>>>>>>>>> No, that is not the correct matrix. Note that I forgot to mention that
>>>>>>>>> it is a complex matrix. I tried loading the matrix I sent you this
>>>>>>>>> morning with:
>>>>>>>>>
>>>>>>>>> !...Load a Matrix in Binary Format
>>>>>>>>> call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m", &
>>>>>>>>>      FILE_MODE_READ,viewer,ierr)
>>>>>>>>> call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
>>>>>>>>> call MatSetType(DLOAD,MATAIJ,ierr)
>>>>>>>>> call MatLoad(DLOAD,viewer,ierr)
>>>>>>>>> call PetscViewerDestroy(viewer,ierr)
>>>>>>>>>
>>>>>>>>> call MatView(DLOAD,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>>>>
>>>>>>>>> The first 37 rows should look like this:
>>>>>>>>>
>>>>>>>>> Mat Object: 2 MPI processes
>>>>>>>>> type: mpiaij
>>>>>>>>> row 0: (0, 1)
>>>>>>>>> row 1: (1, 1)
>>>>>>>>> row 2: (2, 1)
>>>>>>>>> row 3: (3, 1)
>>>>>>>>> row 4: (4, 1)
>>>>>>>>> row 5: (5, 1)
>>>>>>>>> row 6: (6, 1)
>>>>>>>>> row 7: (7, 1)
>>>>>>>>> row 8: (8, 1)
>>>>>>>>> row 9: (9, 1)
>>>>>>>>> row 10: (10, 1)
>>>>>>>>> row 11: (11, 1)
>>>>>>>>> row 12: (12, 1)
>>>>>>>>> row 13: (13, 1)
>>>>>>>>> row 14: (14, 1)
>>>>>>>>> row 15: (15, 1)
>>>>>>>>> row 16: (16, 1)
>>>>>>>>> row 17: (17, 1)
>>>>>>>>> row 18: (18, 1)
>>>>>>>>> row 19: (19, 1)
>>>>>>>>> row 20: (20, 1)
>>>>>>>>> row 21: (21, 1)
>>>>>>>>> row 22: (22, 1)
>>>>>>>>> row 23: (23, 1)
>>>>>>>>> row 24: (24, 1)
>>>>>>>>> row 25: (25, 1)
>>>>>>>>> row 26: (26, 1)
>>>>>>>>> row 27: (27, 1)
>>>>>>>>> row 28: (28, 1)
>>>>>>>>> row 29: (29, 1)
>>>>>>>>> row 30: (30, 1)
>>>>>>>>> row 31: (31, 1)
>>>>>>>>> row 32: (32, 1)
>>>>>>>>> row 33: (33, 1)
>>>>>>>>> row 34: (34, 1)
>>>>>>>>> row 35: (35, 1)
>>>>>>>>> row 36: (1, -41.2444) (35, -41.2444) (36, 118.049 - 0.999271 i)
>>>>>>>>> (37, -21.447) (38, 5.18873) (39, -2.34856) (40, 1.3607) (41, -0.898206)
>>>>>>>>> (42, 0.642715) (43, -0.48593) (44, 0.382471) (45, -0.310476)
>>>>>>>>> (46, 0.258302) (47, -0.219268) (48, 0.189304) (49, -0.165815)
>>>>>>>>> (50, 0.147076) (51, -0.131907) (52, 0.119478) (53, -0.109189)
>>>>>>>>> (54, 0.1006) (55, -0.0933795) (56, 0.0872779) (57, -0.0821019)
>>>>>>>>> (58, 0.0777011) (59, -0.0739575) (60, 0.0707775) (61, -0.0680868)
>>>>>>>>> (62, 0.0658258) (63, -0.0639473) (64, 0.0624137) (65, -0.0611954)
>>>>>>>>> (66, 0.0602698) (67, -0.0596202) (68, 0.0592349) (69, -0.0295536)
>>>>>>>>> (71, -21.447) (106, 5.18873) (141, -2.34856) (176, 1.3607)
>>>>>>>>> (211, -0.898206) (246, 0.642715) (281, -0.48593) (316, 0.382471)
>>>>>>>>> (351, -0.310476) (386, 0.258302) (421, -0.219268) (456, 0.189304)
>>>>>>>>> (491, -0.165815) (526, 0.147076) (561, -0.131907) (596, 0.119478)
>>>>>>>>> (631, -0.109189) (666, 0.1006) (701, -0.0933795) (736, 0.0872779)
>>>>>>>>> (771, -0.0821019) (806, 0.0777011) (841, -0.0739575) (876, 0.0707775)
>>>>>>>>> (911, -0.0680868) (946, 0.0658258) (981, -0.0639473) (1016, 0.0624137)
>>>>>>>>> (1051, -0.0611954) (1086, 0.0602698) (1121, -0.0596202)
>>>>>>>>> (1156, 0.0592349) (1191, -0.0295536) (1261, 0) (3676, 117.211)
>>>>>>>>> (3711, -58.4801) (3746, -78.3633) (3781, 29.4911) (3816, -15.8073)
>>>>>>>>> (3851, 9.94324) (3886, -6.87205) (3921, 5.05774) (3956, -3.89521)
>>>>>>>>> (3991, 3.10522) (4026, -2.54388) (4061, 2.13082) (4096, -1.8182)
>>>>>>>>> (4131, 1.57606) (4166, -1.38491) (4201, 1.23155) (4236, -1.10685)
>>>>>>>>> (4271, 1.00428) (4306, -0.919116) (4341, 0.847829) (4376, -0.787776)
>>>>>>>>> (4411, 0.736933) (4446, -0.693735) (4481, 0.656958) (4516, -0.625638)
>>>>>>>>> (4551, 0.599007) (4586, -0.576454) (4621, 0.557491) (4656, -0.541726)
>>>>>>>>> (4691, 0.528849) (4726, -0.518617) (4761, 0.51084) (4796, -0.50538)
>>>>>>>>> (4831, 0.502142) (4866, -0.250534)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Anthony
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 24, 2015 at 7:56 PM, Hong <hzhang at mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>> Anthony:
>>>>>>>>>> I tested your Amat_binary.m using petsc/src/ksp/ksp/examples/tutorials/ex10.c.
>>>>>>>>>> Your matrix has many zero rows:
>>>>>>>>>> ./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more
>>>>>>>>>> Mat Object: 1 MPI processes
>>>>>>>>>> type: seqaij
>>>>>>>>>> row 0: (0, 1)
>>>>>>>>>> row 1: (1, 0)
>>>>>>>>>> row 2: (2, 1)
>>>>>>>>>> row 3: (3, 0)
>>>>>>>>>> row 4: (4, 1)
>>>>>>>>>> row 5: (5, 0)
>>>>>>>>>> row 6: (6, 1)
>>>>>>>>>> row 7: (7, 0)
>>>>>>>>>> row 8: (8, 1)
>>>>>>>>>> row 9: (9, 0)
>>>>>>>>>> ...
>>>>>>>>>> row 36: (1, 1) (35, 0) (36, 1) (37, 0) (38, 1) (39, 0) (40, 1)
>>>>>>>>>> (41, 0) (42, 1) (43, 0) (44, 1) (45, 0) (46, 1) (47, 0) (48, 1)
>>>>>>>>>> (49, 0) (50, 1) (51, 0) (52, 1) (53, 0) (54, 1) (55, 0) (56, 1)
>>>>>>>>>> (57, 0) (58, 1) (59, 0) (60, 1) ...
>>>>>>>>>>
>>>>>>>>>> Did you send us the correct matrix?
>>>>>>>>>>
>>>>>>>>>>> I ran my code through valgrind and gdb as suggested by Barry. I am
>>>>>>>>>>> now coming back to a problem I have had while running with parallel
>>>>>>>>>>> symbolic factorization. I am attaching a test matrix (petsc binary
>>>>>>>>>>> format) that I LU-decompose and then use to solve a linear system
>>>>>>>>>>> (see code below). I can run on 2 processors with parsymbfact or with
>>>>>>>>>>> 4 processors without parsymbfact. However, if I run on 4 procs with
>>>>>>>>>>> parsymbfact, the code just hangs. Below is the simplified test case
>>>>>>>>>>> that I have used. The matrices A and B are built somewhere else in
>>>>>>>>>>> my program. The matrix I am attaching is A-sigma*B (see below).
>>>>>>>>>>>
>>>>>>>>>>> One thing is that I don't know, for sparse matrices, what the
>>>>>>>>>>> optimal number of processors to use for an LU decomposition is. Does
>>>>>>>>>>> it depend on the total number of nonzeros? Do you have an easy way
>>>>>>>>>>> to compute it?
>>>>>>>>>> You have to experiment with your matrix on a target machine to find out.
>>>>>>>>>> Hong
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Subroutine HowBigLUCanBe(rank)
>>>>>>>>>>>
>>>>>>>>>>>   IMPLICIT NONE
>>>>>>>>>>>
>>>>>>>>>>>   integer(i4b),intent(in) :: rank
>>>>>>>>>>>   integer(i4b)            :: i,ct
>>>>>>>>>>>   real(dp)                :: begin,endd
>>>>>>>>>>>   complex(dpc)            :: sigma
>>>>>>>>>>>
>>>>>>>>>>>   PetscErrorCode ierr
>>>>>>>>>>>
>>>>>>>>>>>   if (rank==0) call cpu_time(begin)
>>>>>>>>>>>
>>>>>>>>>>>   if (rank==0) then
>>>>>>>>>>>      write(*,*)
>>>>>>>>>>>      write(*,*)'Testing How Big LU Can Be...'
>>>>>>>>>>>      write(*,*)'============================'
>>>>>>>>>>>      write(*,*)
>>>>>>>>>>>   endif
>>>>>>>>>>>
>>>>>>>>>>>   sigma = (1.0d0,0.0d0)
>>>>>>>>>>>   call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
>>>>>>>>>>>
>>>>>>>>>>>   !.....Write Matrix to ASCII and Binary Format
>>>>>>>>>>>   !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
>>>>>>>>>>>   !call MatView(DXX,viewer,ierr)
>>>>>>>>>>>   !call PetscViewerDestroy(viewer,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m", &
>>>>>>>>>>>        FILE_MODE_WRITE,viewer,ierr)
>>>>>>>>>>>   call MatView(A,viewer,ierr)
>>>>>>>>>>>   call PetscViewerDestroy(viewer,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   !.....Create Linear Solver Context
>>>>>>>>>>>   call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   !.....Set operators. Here the matrix that defines the linear system
>>>>>>>>>>>   !     also serves as the preconditioning matrix.
>>>>>>>>>>>   !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) ! aha: commented and replaced by next line
>>>>>>>>>>>   call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
>>>>>>>>>>>
>>>>>>>>>>>   !.....Set Relative and Absolute Tolerances and Use Defaults for Divergence Tol
>>>>>>>>>>>   tol = 1.e-10
>>>>>>>>>>>   call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   !.....Set the Direct (LU) Solver
>>>>>>>>>>>   call KSPSetType(ksp,KSPPREONLY,ierr)
>>>>>>>>>>>   call KSPGetPC(ksp,pc,ierr)
>>>>>>>>>>>   call PCSetType(pc,PCLU,ierr)
>>>>>>>>>>>   call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
>>>>>>>>>>>
>>>>>>>>>>>   !.....Create Right-Hand-Side Vector
>>>>>>>>>>>   call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>>>>>>>>>>>   call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   allocate(xwork1(IendA-IstartA))
>>>>>>>>>>>   allocate(loc(IendA-IstartA))
>>>>>>>>>>>
>>>>>>>>>>>   ct=0
>>>>>>>>>>>   do i=IstartA,IendA-1
>>>>>>>>>>>      ct=ct+1
>>>>>>>>>>>      loc(ct)=i
>>>>>>>>>>>      xwork1(ct)=(1.0d0,0.0d0)
>>>>>>>>>>>   enddo
>>>>>>>>>>>
>>>>>>>>>>>   call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>>>>>>>>>>>   call VecZeroEntries(sol,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   deallocate(xwork1,loc)
>>>>>>>>>>>
>>>>>>>>>>>   !.....Assemble Vectors
>>>>>>>>>>>   call VecAssemblyBegin(frhs,ierr)
>>>>>>>>>>>   call VecAssemblyEnd(frhs,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   !.....Solve the Linear System
>>>>>>>>>>>   call KSPSolve(ksp,frhs,sol,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>>>>>>
>>>>>>>>>>>   if (rank==0) then
>>>>>>>>>>>      call cpu_time(endd)
>>>>>>>>>>>      write(*,*)
>>>>>>>>>>>      print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin
>>>>>>>>>>>   endif
>>>>>>>>>>>
>>>>>>>>>>>   call SlepcFinalize(ierr)
>>>>>>>>>>>
>>>>>>>>>>>   STOP
>>>>>>>>>>>
>>>>>>>>>>> end Subroutine HowBigLUCanBe
>>>>>>>>>>>
>>>>>>>>>>> On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:
>>>>>>>>>>>
>>>>>>>>>>> Indeed, the parallel symbolic factorization routine needs a
>>>>>>>>>>> power-of-2 number of processes; however, you can use however many
>>>>>>>>>>> processes you need. Internally, we redistribute the matrix to the
>>>>>>>>>>> nearest power of 2 processes, do the symbolic factorization, then
>>>>>>>>>>> redistribute back to all the processes to do the factorization,
>>>>>>>>>>> triangular solve, etc. So, there is no restriction from the user's
>>>>>>>>>>> viewpoint.
>>>>>>>>>>>
>>>>>>>>>>> It's difficult to tell what the problem is. Do you think you can
>>>>>>>>>>> print your matrix? Then I can do some debugging by running
>>>>>>>>>>> superlu_dist standalone.
>>>>>>>>>>>
>>>>>>>>>>> Sherry
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <
>>>>>>>>>>> aph at email.arizona.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have used the switch -mat_superlu_dist_parsymbfact in my pbs
>>>>>>>>>>>> script. However, although my program worked fine with sequential
>>>>>>>>>>>> symbolic factorization, I get one of the following two behaviors
>>>>>>>>>>>> when I run with parallel symbolic factorization (depending on the
>>>>>>>>>>>> number of processors that I use):
>>>>>>>>>>>>
>>>>>>>>>>>> 1) the program just hangs (it seems stuck in some subroutine ==>
>>>>>>>>>>>> see test.out-hangs)
>>>>>>>>>>>> 2) I get a floating point exception ==> see
>>>>>>>>>>>> test.out-floating-point-exception
>>>>>>>>>>>>
>>>>>>>>>>>> Note that, as suggested in the SuperLU manual, I use a power-of-2
>>>>>>>>>>>> number of procs. Are there any tunable parameters for the parallel
>>>>>>>>>>>> symbolic factorization? Note that when I build my sparse matrix,
>>>>>>>>>>>> most elements I add are nonzero, of course, but to simplify the
>>>>>>>>>>>> programming I also add a few zero elements in the sparse matrix. I
>>>>>>>>>>>> was thinking that maybe, if the parallel symbolic factorization
>>>>>>>>>>>> proceeds by block, there could be some blocks where the pivot would
>>>>>>>>>>>> be zero, hence creating the FPE?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Anthony
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>>>>>>>>>>> Did you find out how to change the option to use parallel symbolic
>>>>>>>>>>>>> factorization? Perhaps the PETSc team can help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sherry
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>>>>>>>>>>>> Is there an inquiry function that tells you all the available
>>>>>>>>>>>>>> options?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sherry
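
[For reference, PETSc itself can enumerate the registered runtime options: running with -help prints them once the solver objects are configured. A sketch, with the executable name as a placeholder:

    mpiexec -n 1 ./my_solver -pc_type lu \
      -pc_factor_mat_solver_package superlu_dist -help | grep superlu_dist

This should list the -mat_superlu_dist_* switches together with their current values.]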
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <
>>>>>>>>>>>>>> aph at email.arizona.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Sherry,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your message. I have used the superlu_dist default
>>>>>>>>>>>>>>> options. I did not realize that I was doing serial symbolic
>>>>>>>>>>>>>>> factorization. That is probably the cause of my problem.
>>>>>>>>>>>>>>> Each node on Garnet has 60GB of usable memory, and I can run with
>>>>>>>>>>>>>>> 1, 2, 4, 8, 16, or 32 cores per node.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I should use:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -mat_superlu_dist_r 20
>>>>>>>>>>>>>>> -mat_superlu_dist_c 32
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How do you specify the parallel symbolic factorization option?
>>>>>>>>>>>>>>> Is it -mat_superlu_dist_matinput 1?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Anthony
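
[For reference, the runtime options discussed in this thread combine on the command line roughly as in the sketch below. The executable name and process count are placeholders; -mat_superlu_dist_parsymbfact is the parallel-symbolic-factorization switch used elsewhere in the thread, and the 20 x 32 grid follows Sherry's advice quoted further down (nprow x npcol must match the number of MPI processes, here 20 x 32 = 640):

    mpiexec -n 640 ./my_solver \
      -pc_type lu -pc_factor_mat_solver_package superlu_dist \
      -mat_superlu_dist_r 20 -mat_superlu_dist_c 32 \
      -mat_superlu_dist_parsymbfact
]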
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <xsli at lbl.gov>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The superlu_dist failure occurs during symbolic factorization.
>>>>>>>>>>>>>>>> Since you are using serial symbolic factorization, it requires
>>>>>>>>>>>>>>>> the entire graph of A to be available in the memory of one MPI
>>>>>>>>>>>>>>>> task. How much memory do you have for each MPI task?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It won't help even if you use more processes. You should try to
>>>>>>>>>>>>>>>> use the parallel symbolic factorization option.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Another point: you set up the process grid as
>>>>>>>>>>>>>>>> Process grid nprow 32 x npcol 20
>>>>>>>>>>>>>>>> For better performance, you should swap the grid dimensions.
>>>>>>>>>>>>>>>> That is, it's better to use 20 x 32; never make nprow larger
>>>>>>>>>>>>>>>> than npcol.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sherry
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would suggest running a sequence of problems, 101 by 101,
>>>>>>>>>>>>>>>>> 111 by 111, etc., and getting the memory usage in each case
>>>>>>>>>>>>>>>>> (when you run out of memory you can get NO useful information
>>>>>>>>>>>>>>>>> about memory needs). You can then plot memory usage as a
>>>>>>>>>>>>>>>>> function of problem size to get a handle on how much memory it
>>>>>>>>>>>>>>>>> is using. You can also run on more and more processes (which
>>>>>>>>>>>>>>>>> have more total memory) to see how large a problem you may be
>>>>>>>>>>>>>>>>> able to reach.
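
[For reference, one way to collect those per-process numbers from Fortran is PETSc's memory-query routine; this is a sketch assuming PETSc is already initialized and a write statement is acceptable, as in Anthony's module_petsc.F90:

    !.....Report current per-process memory use after a solve stage
    PetscLogDouble mem
    PetscErrorCode ierr
    call PetscMemoryGetCurrentUsage(mem,ierr)
    write(*,*)'Current memory usage (bytes):',mem

Recording this value for each problem size gives the data points for the memory-versus-size plot suggested above.]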
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> MUMPS also has an "out of core" version (which we have never
>>>>>>>>>>>>>>>>> used) that could, in theory anyway, let you get to large
>>>>>>>>>>>>>>>>> problems if you have lots of disk space, but you are on your
>>>>>>>>>>>>>>>>> own figuring out how to use it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <aph at email.arizona.edu> wrote:
>>>>>>>>>>>>>>>>>> Hi Jose,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In my code, I use PETSc once to solve a linear system to get
>>>>>>>>>>>>>>>>>> the baseflow (without using SLEPc), and then I use SLEPc to do
>>>>>>>>>>>>>>>>>> the stability analysis of that baseflow. This is why there are
>>>>>>>>>>>>>>>>>> some SLEPc options that are not used in
>>>>>>>>>>>>>>>>>> test.out-superlu_dist-151x151 (when I am solving for the
>>>>>>>>>>>>>>>>>> baseflow with PETSc only). I have attached a 101x101 case for
>>>>>>>>>>>>>>>>>> which I get the eigenvalues. That case works fine. However, if
>>>>>>>>>>>>>>>>>> I increase to 151x151, I get the error that you can see in
>>>>>>>>>>>>>>>>>> test.out-superlu_dist-151x151 (similar error with mumps: see
>>>>>>>>>>>>>>>>>> test.out-mumps-151x151 line 2918). If you look at the very end
>>>>>>>>>>>>>>>>>> of the files test.out-superlu_dist-151x151 and
>>>>>>>>>>>>>>>>>> test.out-mumps-151x151, you will see that the last info
>>>>>>>>>>>>>>>>>> message printed is:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Processor (after EPSSetFromOptions) 0 memory:
>>>>>>>>>>>>>>>>>> 0.65073152000E+08 =====> (see line 807 of module_petsc.F90)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This means that the memory error probably occurs in the call
>>>>>>>>>>>>>>>>>> to EPSSolve (see module_petsc.F90 line 810). I would like to
>>>>>>>>>>>>>>>>>> evaluate how much memory is required by the most
>>>>>>>>>>>>>>>>>> memory-intensive operation within EPSSolve. Since I am solving
>>>>>>>>>>>>>>>>>> a generalized EVP, I would imagine that it would be the LU
>>>>>>>>>>>>>>>>>> decomposition. But is there an accurate way of determining it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Before starting with iterative solvers, I would like to
>>>>>>>>>>>>>>>>>> exploit direct solvers as much as I can. I tried GMRES with
>>>>>>>>>>>>>>>>>> the default preconditioner at some point, but I had
>>>>>>>>>>>>>>>>>> convergence problems. What solver/preconditioner would you
>>>>>>>>>>>>>>>>>> recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Anthony
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
>>>>>>>>>>>>>>>>>> On 07/07/2015, at 02:33, Anthony Haas wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am computing eigenvalues using PETSc/SLEPc and superlu_dist
>>>>>>>>>>>>>>>>>>> for the LU decomposition (my problem is a generalized
>>>>>>>>>>>>>>>>>>> eigenvalue problem). The code runs fine for a 101x101 grid,
>>>>>>>>>>>>>>>>>>> but when I increase to 151x151, I get the following error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Can't expand MemType 1: jcol 16104 (and then [NID 00037]
>>>>>>>>>>>>>>>>>>> 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this
>>>>>>>>>>>>>>>>>>> process.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems to be a memory problem. I monitor the memory usage
>>>>>>>>>>>>>>>>>>> as far as I can, and it seems that memory usage is pretty
>>>>>>>>>>>>>>>>>>> low. The most memory-intensive part of the program is
>>>>>>>>>>>>>>>>>>> probably the LU decomposition in the context of the
>>>>>>>>>>>>>>>>>>> generalized EVP. Is there a way to evaluate how much memory
>>>>>>>>>>>>>>>>>>> will be required for that step? I am currently running the
>>>>>>>>>>>>>>>>>>> debug version of the code, which I would assume uses more
>>>>>>>>>>>>>>>>>>> memory?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have attached the output of the job. Note that the program
>>>>>>>>>>>>>>>>>>> uses PETSc twice: 1) to solve a linear system, for which no
>>>>>>>>>>>>>>>>>>> problem occurs, and 2) to solve the generalized EVP with
>>>>>>>>>>>>>>>>>>> SLEPc, where I get the error.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Anthony
>>>>>>>>>>>>>>>>>>> <test.out-superlu_dist-151x151>
>>>>>>>>>>>>>>>>>> In the output you are attaching there are no SLEPc objects in
>>>>>>>>>>>>>>>>>> the report, and SLEPc options are not used. It seems that
>>>>>>>>>>>>>>>>>> SLEPc calls are skipped?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Do you get the same error with MUMPS? Have you tried to solve
>>>>>>>>>>>>>>>>>> linear systems with a preconditioned iterative solver?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jose
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>>>>>>>>>>>
>>>