From knepley at gmail.com  Wed Nov  1 03:34:38 2017
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 1 Nov 2017 04:34:38 -0400
Subject: [petsc-users] petsc4py sparse matrix construction time
In-Reply-To:
References:
Message-ID:

On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firat wrote:

> Hi,
>
> Thanks a lot. Based on both of your suggestions I modified the code using Mat.createAIJ() and the csr option. The computation time decreased significantly after using this method. Still, if there is a better option, please let me know after seeing the modified code below.
>
> At the first trial with a 1000x1000 matrix with 96019 non-zeros, the computation time did not scale with the number of cores: single core python @ 0.0035s, single core petsc @ 0.0024s, 2 cores petsc @ 0.0036s, 4 cores petsc @ 0.0032s, 8 cores petsc @ 0.0030s.
>
> Then I tried a larger matrix, 181797x181797, with more non-zeros and I got the following results: single core python @ 0.021s, single core petsc @ 0.031s, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014s, 8 cores petsc @ 0.009s, 16 cores petsc @ 0.0087s.
>
> I think the optimum number of processes is highly dependent on the matrix size and the number of non-zeros. In the real code the matrix size (and so the number of non-zero elements) will grow at every iteration, starting with very small matrices and growing to very big ones. Is it possible to set the number of processes from the code dynamically?

I am not sure how you are interpreting these measurements. Normally, I would say

1) Everything timed below is "parallel overhead". This is not intended to scale with P; instead it will look like a constant, as you observe.

2) The time to compute the matrix entries should far outstrip the time below to figure out the nonzero structure. Is this true?

3) Solve time is often larger than matrix calculation. Is it?

Thus, when deciding on parallelism, you need to look at the largest costs and how they scale with P.

  Thanks,

     Matt

> Another question is about the data types; mpi4py only lets me transfer float-type data, and petsc4py only lets me use int32-type indices. Besides converting the data back and forth, is there any solution for this?
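On the data-type question: mpi4py's buffer-based Scatterv works with any NumPy dtype (the MPI datatype is taken from the array buffers), and petsc4py exposes PETSc.IntType and PETSc.ScalarType so index and value arrays can be allocated to match the PETSc build from the start. A minimal sketch, not taken from the thread; the array length and the count computation are illustrative assumptions:

    # Sketch: scattering integer index data directly with mpi4py, using
    # petsc4py's native index/scalar types so no float round-trip is needed.
    # The length 100 and the counts are illustrative.
    import numpy as np
    from mpi4py import MPI
    from petsc4py import PETSc

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    if rank == 0:
        row = np.arange(100, dtype=PETSc.IntType)                  # e.g. row indices of nonzeros
        val = np.linspace(0.0, 1.0, 100).astype(PETSc.ScalarType)  # matching values
        counts = [100 // size + (1 if i < 100 % size else 0) for i in range(size)]
    else:
        row, val, counts = None, None, None

    counts = comm.bcast(counts, root=0)

    # The receive buffers carry the dtype, and mpi4py maps NumPy dtypes to MPI
    # datatypes automatically, so integer index arrays can be scattered as-is.
    row_lcl = np.empty(counts[rank], dtype=PETSc.IntType)
    val_lcl = np.empty(counts[rank], dtype=PETSc.ScalarType)
    if rank == 0:
        comm.Scatterv([row, counts], row_lcl, root=0)
        comm.Scatterv([val, counts], val_lcl, root=0)
    else:
        comm.Scatterv(None, row_lcl, root=0)
        comm.Scatterv(None, val_lcl, root=0)

Because PETSc.IntType follows the PETSc configuration, the same code works unchanged with a 64-bit-index build.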
> > The modified code for matrix creation part: > > comm = MPI.COMM_WORLD > rank = comm.Get_rank() > size = comm.Get_size() > > if rank==0: > row = np.loadtxt('row1000.out').astype(dtype='int32') > col = np.loadtxt('col1000.out').astype(dtype='int32') > val = np.loadtxt('val1000.out').astype(dtype='int32') > m = 1000 # 1000 x 1000 matrix > if size>1: > rbc = np.bincount(row)*1.0 > ieq = int(np.floor(m/size)) > a = [ieq]*size > ix = int(np.mod(m,size)) > if ix>0: > for i in range(0,ix): > a[i]= a[i]+1 > a = np.array([0]+a).cumsum() > b = np.zeros(a.shape[0]-1) > for i in range(0,a.shape[0]-1): > b[i]=rbc[a[i]:a[i+1]].sum() # b is the send counts for > Scatterv > row = row.astype(dtype=float) > col = col.astype(dtype=float) > val = val.astype(dtype=float) > else: > row=None > col=None > val=None > indpt=None > b=None > m=None > > if size>1: > ml = comm.bcast(m,root=0) > bl = comm.bcast(b,root=0) > row_lcl = np.zeros(bl[rank]) > col_lcl = row_lcl.copy() > val_lcl = row_lcl.copy() > > comm.Scatterv([row,b],row_lcl) > comm.Scatterv([col,b],col_lcl) > comm.Scatterv([val,b],val_lcl) > comm.Barrier() > > row_lcl = row_lcl.astype(dtype='int32') > col_lcl = col_lcl.astype(dtype='int32') > val_lcl = val_lcl.astype(dtype='int32') > > indptr = np.bincount(row_lcl) > indptr = indptr[indptr>0] > indptr = np.insert(indptr,0,0).cumsum() > indptr = indptr.astype(dtype='int32') > comm.Barrier() > > pA = PETSc.Mat().createAIJ(size=(ml,ml),csr=(indptr, col_lcl, > val_lcl)) # Matrix generation > > else: > indptr = np.bincount(row) > indptr = np.insert(indptr,0,0).cumsum() > indptr = indptr.astype(dtype='int32') > st=time.time() > pA = PETSc.Mat().createAIJ(size=(m,m),csr=(indptr, col, val)) > print('dt:',time.time()-st) > > > Regards, > > Firat > > > -----Original Message----- > From: Smith, Barry F. > Sent: Tuesday, October 31, 2017 10:18 AM > To: Matthew Knepley > Cc: Cetinbas, Cankur Firat; petsc-users at mcs.anl.gov; Ahluwalia, Rajesh K. > Subject: Re: [petsc-users] petsc4py sparse matrix construction time > > > You also need to make sure that most matrix entries are generated on > the process that they will belong on. > > Barry > > > On Oct 30, 2017, at 8:01 PM, Matthew Knepley wrote: > > > > On Mon, Oct 30, 2017 at 8:06 PM, Cetinbas, Cankur Firat < > ccetinbas at anl.gov> wrote: > > Hello, > > > > > > > > I am a beginner both in PETSc and mpi4py. I have been working on > parallelizing our water transport code (where we solve linear system of > equations) and I started with the toy code below. > > > > > > > > The toy code reads right hand size (rh), row, column, value vectors to > construct sparse coefficient matrix and then scatters them to construct > parallel PETSc coefficient matrix and right hand side vector. > > > > > > > > The sparse matrix generation time is extremely high in comparison to > sps.csr_matrix((val, (row, col)), shape=(n,n)) in python. For instance > python generates 181197x181197 sparse matrix in 0.06 seconds and this code > with 32 cores:1.19s, 16 cores:6.98s and 8 cores:29.5 s. I was wondering if > I am making a mistake in generating sparse matrix? Is there a more > efficient way? > > > > > > It looks like you do not preallocate the matrix. There is a chapter on > this in the manual. > > > > Matt > > > > Thanks for your help in advance. 
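To make the preallocation remark above concrete: when an AIJ matrix is filled entry-by-entry with setValue, as in the toy code listed below, it should be preallocated before insertion starts. A sketch in petsc4py, not the poster's code; the tridiagonal pattern and the nonzero estimates nnz=(3, 2) are illustrative assumptions:

    # Sketch: preallocate an AIJ matrix and insert only locally owned rows,
    # per the advice above. Pattern and nnz estimates are illustrative.
    from petsc4py import PETSc

    m = 1000
    A = PETSc.Mat().createAIJ([m, m], nnz=(3, 2), comm=PETSc.COMM_WORLD)

    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):          # generate entries on the owning process
        A.setValue(i, i, 2.0)
        if i > 0:
            A.setValue(i, i - 1, -1.0)
        if i + 1 < m:
            A.setValue(i, i + 1, -1.0)
    A.assemblyBegin()
    A.assemblyEnd()

Exact per-row counts can be passed through the same nnz argument as a pair of arrays (d_nnz, o_nnz) when they are known, which is what the original toy code already computes for the CSR path.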
> > > > > > > > Regards, > > > > > > > > Firat > > > > > > > > from petsc4py import PETSc > > > > from mpi4py import MPI > > > > import numpy as np > > > > import time > > > > > > > > comm = MPI.COMM_WORLD > > > > rank = comm.Get_rank() > > > > size = comm.Get_size() > > > > > > > > if rank==0: > > > > # proc 0 loads tomo image and does fast calculations to append row, > col, val, rh lists > > > > # in the real code this vectors will be available on proc 0 no txt > files are read > > > > row = np.loadtxt('row.out') # indices of non-zero rows > > > > col = np.loadtxt('col.out') # indices of non-zero columns > > > > val = np.loadtxt('vs.out') # values in the sparse matrix > > > > rh = np.loadtxt('RHS.out') # right hand side vector > > > > n = row.shape[0] #1045699 > > > > m = rh.shape[0] #181197 square sparse matrix size > > > > else: > > > > n = None > > > > m = None > > > > row = None > > > > col = None > > > > val = None > > > > rh = None > > > > rh_ind = None > > > > > > > > m_lcl = comm.bcast(m,root=0) > > > > n_lcl = comm.bcast(n,root=0) > > > > neq = n_lcl//size > > > > meq = m_lcl//size > > > > nx = np.mod(n_lcl,size) > > > > mx = np.mod(m_lcl,size) > > > > row_lcl = np.zeros(neq) > > > > col_lcl = np.zeros(neq) > > > > val_lcl = np.zeros(neq) > > > > rh_lcl = np.zeros(meq) > > > > a = [neq]*size #send counts for Scatterv > > > > am = [meq]*size #send counts for Scatterv > > > > > > > > if nx>0: > > > > for i in range(0,nx): > > > > if rank==i: > > > > row_lcl = np.zeros(neq+1) > > > > col_lcl = np.zeros(neq+1) > > > > val_lcl = np.zeros(neq+1) > > > > a[i] = a[i]+1 > > > > if mx>0: > > > > for ii in range(0,mx): > > > > if rank==ii: > > > > rh_lcl = np.zeros(meq+1) > > > > am[ii] = am[ii]+1 > > > > > > > > comm.Scatterv([row,a],row_lcl) > > > > comm.Scatterv([col,a],col_lcl) > > > > comm.Scatterv([val,a],val_lcl) > > > > comm.Scatterv([rh,am],rh_lcl) > > > > comm.Barrier() > > > > > > > > A = PETSc.Mat() > > > > A.create() > > > > A.setSizes([m_lcl,m_lcl]) > > > > A.setType('aij') > > > > A.setUp() > > > > lr = row_lcl.shape[0] > > > > for i in range(0,lr): > > > > A[row_lcl[i],col_lcl[i]] = val_lcl[i] > > > > A.assemblyBegin() > > > > A.assemblyEnd() > > > > > > > > if size>1: # to get the range for scattered vectors > > > > ami = [0] > > > > ami = np.array([0]+am).cumsum() > > > > for kk in range(0,size): > > > > if rank==kk: > > > > Is = ami[kk] > > > > Ie = ami[kk+1] > > > > else: > > > > Is=0; Ie=m_lcl > > > > > > > > b= PETSc.Vec() > > > > b.create() > > > > b.setSizes(m_lcl) > > > > b.setFromOptions() > > > > b.setUp() > > > > b.setValues(list(range(Is,Ie)),rh_lcl) > > > > b.assemblyBegin() > > > > b.assemblyEnd() > > > > > > > > # solution vector > > > > x = b.duplicate() > > > > x.assemblyBegin() > > > > x.assemblyEnd() > > > > > > > > # create linear solver > > > > ksp = PETSc.KSP() > > > > ksp.create() > > > > ksp.setOperators(A) > > > > ksp.setType('cg') > > > > #ksp.getPC().setType('icc') # only sequential > > > > ksp.getPC().setType('jacobi') > > > > print('solving with:', ksp.getType()) > > > > > > > > #solve > > > > st=time.time() > > > > ksp.solve(b,x) > > > > et=time.time() > > > > print(et-st) > > > > > > > > if size>1: > > > > #gather > > > > if rank==0: > > > > xGthr = np.zeros(m) > > > > else: > > > > xGthr = None > > > > comm.Gatherv(x,[xGthr,am]) > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Nov 1 08:34:16 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Nov 2017 08:34:16 -0500 Subject: [petsc-users] petsc4py sparse matrix construction time In-Reply-To: References: Message-ID: Should add: To make sure your matrix assembly is efficient - you should run the code with the option -info, and make sure there are no mallocs during assembly. I'm not sure how petsc4py processes options. One way is to set such options with PETSC_OPTIONS env variable - and then run the code. Satish On Wed, 1 Nov 2017, Matthew Knepley wrote: > On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firat > wrote: > > > Hi, > > > > Thanks a lot. Based on both of your suggestions I modified the code using > > Mat.createAIJ() and csr option. The computation time decreased > > significantly after using this method. Still if there is a better option > > please let me know after seeing the modified code below. > > > > At first trial with 1000x1000 matrix with 96019 non-zeros in the matrix, > > the computation time did not scale with the number of cores : Single core > > python @ 0.0035s, single core petsc @ 0.0024s, 2 cores petsc @ 0.0036s, 4 > > cores petsc @ 0.0032, 8 cores petsc @ 0.0030s. > > > > Then I tried with larger matrix 181797x181797 with more non-zeros and I > > got the following results: Single core python @ 0.021, single core petsc @ > > 0.031, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014, 8 cores petsc @ > > 0.009s, 16 cores petsc @ 0.0087s. > > > > I think the optimum number of nodes is highly dependent on matrix size and > > the number of non-zeros. In the real code matrix size (and so the number of > > non-zero elements) will grow at every iteration starting with very small > > matrices growing to very big ones. Is it possible to set the number process > > from the code dynamically? > > > > I am not sure how you are interpreting these measurements. Normally, I > would say > > 1) Everything timed below is "parallel overhead". This is not intended to > scale with P, instead it will look like a constant, as you observe > > 2) The time to compute the matrix entires should far outstrip the time > below to figure the nonzero structure. Is this true? > > 3) Solve time is often larger than matrix calculation. Is it? > > Thus, we deciding on parallelism, you need to look at the largest costs, > and how they scale with P. > > Thanks, > > Matt > > Another question is about the data types; mpi4py only let me transfer float > > type data, and petsc4py only lets me use int32 type indices. Besides keep > > converting the data, is there any solution for this? 
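On Satish's point above about getting options such as -info through to petsc4py: besides the PETSC_OPTIONS environment variable, petsc4py can be initialized with the script's command line before PETSc is imported, so options typed after the script name reach the options database. A standard sketch, not from the thread:

    # Sketch: forwarding command-line options (e.g. -info) to petsc4py.
    # petsc4py.init() must run before "from petsc4py import PETSc".
    import sys
    import petsc4py
    petsc4py.init(sys.argv)        # run as:  mpiexec -n 4 python script.py -info
    from petsc4py import PETSc

    # Alternative, per the note above:  export PETSC_OPTIONS="-info"  before running.

With -info active, the MatAssemblyEnd output reports the number of mallocs used during MatSetValues calls; a nonzero count there is the usual sign of missing or insufficient preallocation.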
> > > > The modified code for matrix creation part: > > > > comm = MPI.COMM_WORLD > > rank = comm.Get_rank() > > size = comm.Get_size() > > > > if rank==0: > > row = np.loadtxt('row1000.out').astype(dtype='int32') > > col = np.loadtxt('col1000.out').astype(dtype='int32') > > val = np.loadtxt('val1000.out').astype(dtype='int32') > > m = 1000 # 1000 x 1000 matrix > > if size>1: > > rbc = np.bincount(row)*1.0 > > ieq = int(np.floor(m/size)) > > a = [ieq]*size > > ix = int(np.mod(m,size)) > > if ix>0: > > for i in range(0,ix): > > a[i]= a[i]+1 > > a = np.array([0]+a).cumsum() > > b = np.zeros(a.shape[0]-1) > > for i in range(0,a.shape[0]-1): > > b[i]=rbc[a[i]:a[i+1]].sum() # b is the send counts for > > Scatterv > > row = row.astype(dtype=float) > > col = col.astype(dtype=float) > > val = val.astype(dtype=float) > > else: > > row=None > > col=None > > val=None > > indpt=None > > b=None > > m=None > > > > if size>1: > > ml = comm.bcast(m,root=0) > > bl = comm.bcast(b,root=0) > > row_lcl = np.zeros(bl[rank]) > > col_lcl = row_lcl.copy() > > val_lcl = row_lcl.copy() > > > > comm.Scatterv([row,b],row_lcl) > > comm.Scatterv([col,b],col_lcl) > > comm.Scatterv([val,b],val_lcl) > > comm.Barrier() > > > > row_lcl = row_lcl.astype(dtype='int32') > > col_lcl = col_lcl.astype(dtype='int32') > > val_lcl = val_lcl.astype(dtype='int32') > > > > indptr = np.bincount(row_lcl) > > indptr = indptr[indptr>0] > > indptr = np.insert(indptr,0,0).cumsum() > > indptr = indptr.astype(dtype='int32') > > comm.Barrier() > > > > pA = PETSc.Mat().createAIJ(size=(ml,ml),csr=(indptr, col_lcl, > > val_lcl)) # Matrix generation > > > > else: > > indptr = np.bincount(row) > > indptr = np.insert(indptr,0,0).cumsum() > > indptr = indptr.astype(dtype='int32') > > st=time.time() > > pA = PETSc.Mat().createAIJ(size=(m,m),csr=(indptr, col, val)) > > print('dt:',time.time()-st) > > > > > > Regards, > > > > Firat > > > > > > -----Original Message----- > > From: Smith, Barry F. > > Sent: Tuesday, October 31, 2017 10:18 AM > > To: Matthew Knepley > > Cc: Cetinbas, Cankur Firat; petsc-users at mcs.anl.gov; Ahluwalia, Rajesh K. > > Subject: Re: [petsc-users] petsc4py sparse matrix construction time > > > > > > You also need to make sure that most matrix entries are generated on > > the process that they will belong on. > > > > Barry > > > > > On Oct 30, 2017, at 8:01 PM, Matthew Knepley wrote: > > > > > > On Mon, Oct 30, 2017 at 8:06 PM, Cetinbas, Cankur Firat < > > ccetinbas at anl.gov> wrote: > > > Hello, > > > > > > > > > > > > I am a beginner both in PETSc and mpi4py. I have been working on > > parallelizing our water transport code (where we solve linear system of > > equations) and I started with the toy code below. > > > > > > > > > > > > The toy code reads right hand size (rh), row, column, value vectors to > > construct sparse coefficient matrix and then scatters them to construct > > parallel PETSc coefficient matrix and right hand side vector. > > > > > > > > > > > > The sparse matrix generation time is extremely high in comparison to > > sps.csr_matrix((val, (row, col)), shape=(n,n)) in python. For instance > > python generates 181197x181197 sparse matrix in 0.06 seconds and this code > > with 32 cores:1.19s, 16 cores:6.98s and 8 cores:29.5 s. I was wondering if > > I am making a mistake in generating sparse matrix? Is there a more > > efficient way? > > > > > > > > > It looks like you do not preallocate the matrix. There is a chapter on > > this in the manual. > > > > > > Matt > > > > > > Thanks for your help in advance. 
> > > > > > > > > > > > Regards, > > > > > > > > > > > > Firat > > > > > > > > > > > > from petsc4py import PETSc > > > > > > from mpi4py import MPI > > > > > > import numpy as np > > > > > > import time > > > > > > > > > > > > comm = MPI.COMM_WORLD > > > > > > rank = comm.Get_rank() > > > > > > size = comm.Get_size() > > > > > > > > > > > > if rank==0: > > > > > > # proc 0 loads tomo image and does fast calculations to append row, > > col, val, rh lists > > > > > > # in the real code this vectors will be available on proc 0 no txt > > files are read > > > > > > row = np.loadtxt('row.out') # indices of non-zero rows > > > > > > col = np.loadtxt('col.out') # indices of non-zero columns > > > > > > val = np.loadtxt('vs.out') # values in the sparse matrix > > > > > > rh = np.loadtxt('RHS.out') # right hand side vector > > > > > > n = row.shape[0] #1045699 > > > > > > m = rh.shape[0] #181197 square sparse matrix size > > > > > > else: > > > > > > n = None > > > > > > m = None > > > > > > row = None > > > > > > col = None > > > > > > val = None > > > > > > rh = None > > > > > > rh_ind = None > > > > > > > > > > > > m_lcl = comm.bcast(m,root=0) > > > > > > n_lcl = comm.bcast(n,root=0) > > > > > > neq = n_lcl//size > > > > > > meq = m_lcl//size > > > > > > nx = np.mod(n_lcl,size) > > > > > > mx = np.mod(m_lcl,size) > > > > > > row_lcl = np.zeros(neq) > > > > > > col_lcl = np.zeros(neq) > > > > > > val_lcl = np.zeros(neq) > > > > > > rh_lcl = np.zeros(meq) > > > > > > a = [neq]*size #send counts for Scatterv > > > > > > am = [meq]*size #send counts for Scatterv > > > > > > > > > > > > if nx>0: > > > > > > for i in range(0,nx): > > > > > > if rank==i: > > > > > > row_lcl = np.zeros(neq+1) > > > > > > col_lcl = np.zeros(neq+1) > > > > > > val_lcl = np.zeros(neq+1) > > > > > > a[i] = a[i]+1 > > > > > > if mx>0: > > > > > > for ii in range(0,mx): > > > > > > if rank==ii: > > > > > > rh_lcl = np.zeros(meq+1) > > > > > > am[ii] = am[ii]+1 > > > > > > > > > > > > comm.Scatterv([row,a],row_lcl) > > > > > > comm.Scatterv([col,a],col_lcl) > > > > > > comm.Scatterv([val,a],val_lcl) > > > > > > comm.Scatterv([rh,am],rh_lcl) > > > > > > comm.Barrier() > > > > > > > > > > > > A = PETSc.Mat() > > > > > > A.create() > > > > > > A.setSizes([m_lcl,m_lcl]) > > > > > > A.setType('aij') > > > > > > A.setUp() > > > > > > lr = row_lcl.shape[0] > > > > > > for i in range(0,lr): > > > > > > A[row_lcl[i],col_lcl[i]] = val_lcl[i] > > > > > > A.assemblyBegin() > > > > > > A.assemblyEnd() > > > > > > > > > > > > if size>1: # to get the range for scattered vectors > > > > > > ami = [0] > > > > > > ami = np.array([0]+am).cumsum() > > > > > > for kk in range(0,size): > > > > > > if rank==kk: > > > > > > Is = ami[kk] > > > > > > Ie = ami[kk+1] > > > > > > else: > > > > > > Is=0; Ie=m_lcl > > > > > > > > > > > > b= PETSc.Vec() > > > > > > b.create() > > > > > > b.setSizes(m_lcl) > > > > > > b.setFromOptions() > > > > > > b.setUp() > > > > > > b.setValues(list(range(Is,Ie)),rh_lcl) > > > > > > b.assemblyBegin() > > > > > > b.assemblyEnd() > > > > > > > > > > > > # solution vector > > > > > > x = b.duplicate() > > > > > > x.assemblyBegin() > > > > > > x.assemblyEnd() > > > > > > > > > > > > # create linear solver > > > > > > ksp = PETSc.KSP() > > > > > > ksp.create() > > > > > > ksp.setOperators(A) > > > > > > ksp.setType('cg') > > > > > > #ksp.getPC().setType('icc') # only sequential > > > > > > ksp.getPC().setType('jacobi') > > > > > > print('solving with:', ksp.getType()) > > > > > > > > > > > > #solve > > > > > > st=time.time() 
> > > > > > ksp.solve(b,x)
> > > > > > et=time.time()
> > > > > > print(et-st)
> > > > > >
> > > > > > if size>1:
> > > > > >     #gather
> > > > > >     if rank==0:
> > > > > >         xGthr = np.zeros(m)
> > > > > >     else:
> > > > > >         xGthr = None
> > > > > >     comm.Gatherv(x,[xGthr,am])
> > >
> > > --
> > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > > -- Norbert Wiener
> > >
> > > https://www.cse.buffalo.edu/~knepley/
> >

From t.appel17 at imperial.ac.uk  Wed Nov 1 10:23:53 2017
From: t.appel17 at imperial.ac.uk (Thibaut Appel)
Date: Wed, 1 Nov 2017 15:23:53 +0000
Subject: [petsc-users] Problem with encapsulation of PETSc/SLEPc in Fortran
Message-ID: <6a6cc504-a264-01a8-0903-2f34309d0c44@imperial.ac.uk>

Dear PETSc/SLEPc users,

I am encountering a problem when I try to isolate my calls to PETSc/SLEPc routines in a module. When I have a single file everything works fine, but when I have, say:

- modA.f90 (Independent modules)
- modB.F90 (Contains all the calls to PETSc and SLEPc)
- main.f90 or main.F90

When calling EPSSolve, I keep getting the error "[1]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero" plus "User provided function() line 0 in unknown file."

With gdb: "Thread 1 "main" received signal SIGFPE, Arithmetic exception. 0x00007ffff1662cf6 in dlamch_ () from /home/linuxbrew/.linuxbrew/lib/libopenblas.so.0"

The PETSc/SLEPc module looks like

    USE SlepcEps
    IMPLICIT NONE
#include

which is the only preprocessed directive used. I tried to add more (petscsys, petscmat, petscvec, slepseps), and tried to change main.f90 to main.F90 and incorporate the preprocessed directives there as well, without any effect. My code calls CHKERRQ(ierr) systematically.

My makefile looks like

    include ${SLEPC_DIR}/lib/slepc/conf/slepc_common
    INCL = -I$(PETSC_DIR)/include/ -I$(SLEPC_DIR)/include/

    %.o: %.f90
        $(FC) $(FLAGS) $(INCL) -c $< -o $@ $(SLEPC_EPS_LIB)
    %.o: %.F90
        $(FC) $(FLAGS) $(INCL) -c $< -o $@ $(SLEPC_EPS_LIB)
    $(EXEC): $(OBJS)
        $(FC) $(FLAGS) $(INCL) $(OBJS) -o $@ $(SLEPC_EPS_LIB)

Could you spot anything I am doing wrong or dangerous?

Furthermore, do you know how to avoid the warnings "Same actual argument associated with INTENT(IN) argument 'errorcode' and INTENT(OUT) argument 'ierror' at (1)" when calling CHKERRQ(ierr) when ierr is declared as INTEGER?

Thanks in advance for your continued support,

Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dnolte at dim.uchile.cl  Wed Nov 1 16:45:47 2017
From: dnolte at dim.uchile.cl (David Nolte)
Date: Wed, 1 Nov 2017 18:45:47 -0300
Subject: [petsc-users] GAMG advice
In-Reply-To: <6169118C-34FE-491C-BCB4-A86BECCFBAA9@mcs.anl.gov>
References: <47a47b6b-ce8c-10f6-0ded-bf87e9af1bbd@dim.uchile.cl> <991cd7c4-bb92-ed2c-193d-7232c1ff6199@dim.uchile.cl> <6169118C-34FE-491C-BCB4-A86BECCFBAA9@mcs.anl.gov>
Message-ID: <0779aa51-17c8-0ef3-fd01-1413ee1225ea@dim.uchile.cl>

Thanks Barry. By simply replacing chebyshev by richardson I get similar performance with GAMG and ML (GAMG even slightly faster):

-pc_type gamg
-pc_gamg_type agg??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? -pc_gamg_threshold 0.03????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? -pc_gamg_square_graph 10 -pc_gamg_sym_graph -mg_levels_ksp_type richardson?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? -mg_levels_pc_type sor Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix is asymmetric? For serial runs it doesn't seem to matter, but in parallel the PC setup hangs (after calls of PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set. David On 10/21/2017 12:10 AM, Barry Smith wrote: > David, > > GAMG picks the number of levels based on how the coarsening process etc proceeds. You cannot hardwire it to a particular value. You can run with -info to get more info potentially on the decisions GAMG is making. > > Barry > >> On Oct 20, 2017, at 2:06 PM, David Nolte wrote: >> >> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option >> was not taken into account: >> type: gamg >> MG: type is MULTIPLICATIVE, levels=1 cycles=v >> >> >> >> On 10/20/2017 03:32 PM, David Nolte wrote: >>> Dear all, >>> >>> I have some problems using GAMG as a preconditioner for (F)GMRES. >>> Background: I am solving the incompressible, unsteady Navier-Stokes >>> equations with a coupled mixed FEM approach, using P1/P1 elements for >>> velocity and pressure on an unstructured tetrahedron mesh with about >>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG, >>> hence, no zeros on the diagonal of the pressure block. Time >>> discretization with semi-implicit backward Euler. The flow is a >>> convection dominated flow through a nozzle. >>> >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML >>> solver for the full system (rather bruteforce, I admit, but much faster >>> than any block/Schur preconditioners I tried): >>> >>> -ksp_converged_reason >>> -ksp_monitor_true_residual >>> -ksp_type fgmres >>> -ksp_rtol 1.0e-6 >>> -ksp_initial_guess_nonzero >>> >>> -pc_type ml >>> -pc_ml_Threshold 0.03 >>> -pc_ml_maxNlevels 3 >>> >>> This setup converges in ~100 iterations (see below the ksp_view output) >>> to rtol: >>> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 >>> >>> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to set >>> it up to get similar performance. 
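The settings that eventually gave ML-like performance are the ones listed at the top of this message: aggregation-type GAMG with a 0.03 threshold, graph squaring, the symmetric-graph flag, and Richardson/SOR smoothing on the levels. As a sketch, the same options written through petsc4py's options database; the Python form is only an illustration, and the identical flags can go on the command line or in an options file exactly as listed above:

    # The GAMG settings reported above (Richardson + SOR smoothing), written as
    # petsc4py options-database entries; illustrative only.
    from petsc4py import PETSc

    opts = PETSc.Options()
    opts['pc_type'] = 'gamg'
    opts['pc_gamg_type'] = 'agg'
    opts['pc_gamg_threshold'] = 0.03
    opts['pc_gamg_square_graph'] = 10
    opts['pc_gamg_sym_graph'] = 'true'
    opts['mg_levels_ksp_type'] = 'richardson'
    opts['mg_levels_pc_type'] = 'sor'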
>>> The obvious/naive >>> >>> -pc_type gamg >>> -pc_gamg_type agg >>> >>> # with and without >>> -pc_gamg_threshold 0.03 >>> -pc_mg_levels 3 >>> >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per >>> proc), for instance: >>> np = 1: >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 >>> >>> np = 8: >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >>> >>> A very high threshold seems to improve the GAMG PC, for instance with >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations. >>> What else should I try? >>> >>> I would very much appreciate any advice on configuring GAMG and >>> differences w.r.t ML to be taken into account (not a multigrid expert >>> though). >>> >>> Thanks, best wishes >>> David >>> >>> >>> ------ >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3 >>> >>> KSP Object: 1 MPI processes >>> type: fgmres >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> GMRES: happy breakdown tolerance 1e-30 >>> maximum iterations=10000 >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. >>> right preconditioning >>> using nonzero initial guess >>> using UNPRECONDITIONED norm type for convergence test >>> PC Object: 1 MPI processes >>> type: gamg >>> MG: type is MULTIPLICATIVE, levels=1 cycles=v >>> Cycles per PCApply=1 >>> Using Galerkin computed coarse grid matrices >>> GAMG specific options >>> Threshold for dropping small values from graph 0.75 >>> AGG specific options >>> Symmetric graph false >>> Coarse grid solver -- level ------------------------------- >>> KSP Object: (mg_levels_0_) 1 MPI processes >>> type: preonly >>> maximum iterations=2, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (mg_levels_0_) 1 MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local iterations = >>> 1, omega = 1. 
>>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=1745224, cols=1745224 >>> total: nonzeros=99452608, allocated nonzeros=99452608 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node routines: found 1037847 nodes, limit used is 5 >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=1745224, cols=1745224 >>> total: nonzeros=99452608, allocated nonzeros=99452608 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node routines: found 1037847 nodes, limit used is 5 >>> >>> >>> ------ >>> ksp_view for -pc_type ml: >>> >>> KSP Object: 8 MPI processes >>> type: fgmres >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> GMRES: happy breakdown tolerance 1e-30 >>> maximum iterations=10000 >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. >>> right preconditioning >>> using nonzero initial guess >>> using UNPRECONDITIONED norm type for convergence test >>> PC Object: 8 MPI processes >>> type: ml >>> MG: type is MULTIPLICATIVE, levels=3 cycles=v >>> Cycles per PCApply=1 >>> Using Galerkin computed coarse grid matrices >>> Coarse grid solver -- level ------------------------------- >>> KSP Object: (mg_coarse_) 8 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (mg_coarse_) 8 MPI processes >>> type: redundant >>> Redundant preconditioner: First (color=0) of 8 PCs follows >>> KSP Object: (mg_coarse_redundant_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (mg_coarse_redundant_) 1 MPI processes >>> type: lu >>> LU: out-of-place factorization >>> tolerance for zero pivot 2.22045e-14 >>> using diagonal shift on blocks to prevent zero pivot [INBLOCKS] >>> matrix ordering: nd >>> factor fill ratio given 5., needed 10.4795 >>> Factored matrix follows: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=6822, cols=6822 >>> package used to perform factorization: petsc >>> total: nonzeros=9575688, allocated nonzeros=9575688 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=6822, cols=6822 >>> total: nonzeros=913758, allocated nonzeros=913758 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 8 MPI processes >>> type: mpiaij >>> rows=6822, cols=6822 >>> total: nonzeros=913758, allocated nonzeros=913758 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node (on process 0) routines >>> Down solver (pre-smoother) on level 1 ------------------------------- >>> KSP Object: (mg_levels_1_) 8 MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: (mg_levels_1_) 8 MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local iterations = >>> 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 8 MPI processes >>> type: mpiaij >>> rows=67087, cols=67087 >>> total: nonzeros=9722749, allocated nonzeros=9722749 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node (on process 0) routines >>> Up solver (post-smoother) same as down solver (pre-smoother) >>> Down solver (pre-smoother) on level 2 ------------------------------- >>> KSP Object: (mg_levels_2_) 8 MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: (mg_levels_2_) 8 MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local iterations = >>> 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 8 MPI processes >>> type: mpiaij >>> rows=1745224, cols=1745224 >>> total: nonzeros=99452608, allocated nonzeros=99452608 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node (on process 0) routines: found 126690 nodes, >>> limit used is 5 >>> Up solver (post-smoother) same as down solver (pre-smoother) >>> linear system matrix = precond matrix: >>> Mat Object: 8 MPI processes >>> type: mpiaij >>> rows=1745224, cols=1745224 >>> total: nonzeros=99452608, allocated nonzeros=99452608 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node (on process 0) routines: found 126690 nodes, limit >>> used is 5 >>> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From rchurchi at pppl.gov Wed Nov 1 16:36:17 2017 From: rchurchi at pppl.gov (Randy Michael Churchill) Date: Wed, 1 Nov 2017 16:36:17 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: OK, this might not be completely satisfactory, because it doesn't show the partitioning or how the matrix is created, but this reproduces the problem. I wrote out my matrix, Amat, from the larger simulation, and load it in this script. This must be run with MPI rank greater than 1. This may be some combination of my petsc.rc, because when I use the PetscInitialize with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) it runs fine. On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: > Randy: > It could be a bug or a missing feature in our new > MatCreateSubMatrix_MPIAIJ_SameRowDist(). > It would be helpful if you can provide us a simple example that produces > this example. > Hong > > I'm running a Fortran code that was just changed over to using petsc 3.8 >> (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. The >> error is "unsorted iscol_local is not implemented yet" (see full error >> below). I tried to trace down the difference in the source files, but where >> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't seem >> to have existed in v3.7.6, so I'm unsure how to compare. 
It seems the error >> is that the order of the columns locally are unsorted, though I don't think >> I specify a column order in the creation of the matrix: >> call MatCreate(this%comm,AA,ierr) >> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >> call MatSetType(AA,MATAIJ,ierr) >> call MatSetup(AA,ierr) >> call MatGetOwnershipRange(AA,low,high,ierr) >> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >> call getNNZ(grid,npetscloc,low,high,d_nnz,o_nnz,this%xgc_petsc,nr >> eal,ierr) >> call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr) >> call MatMPIAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,PETSC_ >> NULL_INTEGER,o_nnz,ierr) >> deallocate(d_nnz,o_nnz) >> call MatSetOption(AA,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE,ierr) >> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >> call MatSetup(AA,ierr) >> >> >> [62]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [62]PETSC ERROR: No support for this operation for this object type >> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC ERROR: #1 >> MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >> [62]PETSC ERROR: #6 PCSetUp() line 924 in /global/u1/r/rchurchi/petsc/3. >> 8.0/src/ksp/pc/interface/precon.c >> [62]PETSC ERROR: #7 KSPSetUp() line 378 in /global/u1/r/rchurchi/petsc/3. >> 8.0/src/ksp/ksp/interface/itfunc.c >> >> -- >> R. Michael Churchill >> > > -- R. Michael Churchill -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Amat.dat Type: application/octet-stream Size: 7795408 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Amat.dat.info Type: application/octet-stream Size: 21 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc.rc Type: application/octet-stream Size: 10457 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_petsc38.F90 Type: application/octet-stream Size: 1407 bytes Desc: not available URL: From hzhang at mcs.anl.gov Wed Nov 1 20:23:23 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 1 Nov 2017 20:23:23 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Randy: Thanks, I'll check it tomorrow. Hong > OK, this might not be completely satisfactory, because it doesn't show the > partitioning or how the matrix is created, but this reproduces the problem. > I wrote out my matrix, Amat, from the larger simulation, and load it in > this script. This must be run with MPI rank greater than 1. 
This may be > some combination of my petsc.rc, because when I use the PetscInitialize > with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) > it runs fine. > > > On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: > >> Randy: >> It could be a bug or a missing feature in our new >> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >> It would be helpful if you can provide us a simple example that produces >> this example. >> Hong >> >> I'm running a Fortran code that was just changed over to using petsc 3.8 >>> (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. The >>> error is "unsorted iscol_local is not implemented yet" (see full error >>> below). I tried to trace down the difference in the source files, but where >>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't seem >>> to have existed in v3.7.6, so I'm unsure how to compare. It seems the error >>> is that the order of the columns locally are unsorted, though I don't think >>> I specify a column order in the creation of the matrix: >>> call MatCreate(this%comm,AA,ierr) >>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>> call MatSetType(AA,MATAIJ,ierr) >>> call MatSetup(AA,ierr) >>> call MatGetOwnershipRange(AA,low,high,ierr) >>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>> call getNNZ(grid,npetscloc,low,high,d_nnz,o_nnz,this%xgc_petsc,nr >>> eal,ierr) >>> call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr) >>> call MatMPIAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,PETSC_ >>> NULL_INTEGER,o_nnz,ierr) >>> deallocate(d_nnz,o_nnz) >>> call MatSetOption(AA,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE,ierr) >>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>> call MatSetup(AA,ierr) >>> >>> >>> [62]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [62]PETSC ERROR: No support for this operation for this object type >>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. >>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC ERROR: #1 >>> MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>> [62]PETSC ERROR: #6 PCSetUp() line 924 in /global/u1/r/rchurchi/petsc/3. >>> 8.0/src/ksp/pc/interface/precon.c >>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in /global/u1/r/rchurchi/petsc/3. >>> 8.0/src/ksp/ksp/interface/itfunc.c >>> >>> -- >>> R. Michael Churchill >>> >> >> > > > -- > R. Michael Churchill > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rchurchi at pppl.gov Wed Nov 1 20:36:42 2017 From: rchurchi at pppl.gov (Randy Michael Churchill) Date: Wed, 1 Nov 2017 20:36:42 -0500 Subject: [petsc-users] unsorted local columns in 3.8? 
In-Reply-To: References: Message-ID: Doing some additional testing, the issue goes away when removing the gamg preconditioner line from the petsc.rc: -pc_type gamg On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: > Randy: > Thanks, I'll check it tomorrow. > Hong > > OK, this might not be completely satisfactory, because it doesn't show the >> partitioning or how the matrix is created, but this reproduces the problem. >> I wrote out my matrix, Amat, from the larger simulation, and load it in >> this script. This must be run with MPI rank greater than 1. This may be >> some combination of my petsc.rc, because when I use the PetscInitialize >> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >> it runs fine. >> >> >> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >> >>> Randy: >>> It could be a bug or a missing feature in our new >>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>> It would be helpful if you can provide us a simple example that produces >>> this example. >>> Hong >>> >>> I'm running a Fortran code that was just changed over to using petsc 3.8 >>>> (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. The >>>> error is "unsorted iscol_local is not implemented yet" (see full error >>>> below). I tried to trace down the difference in the source files, but where >>>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't >>>> seem to have existed in v3.7.6, so I'm unsure how to compare. It seems the >>>> error is that the order of the columns locally are unsorted, though I don't >>>> think I specify a column order in the creation of the matrix: >>>> call MatCreate(this%comm,AA,ierr) >>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>> call MatSetType(AA,MATAIJ,ierr) >>>> call MatSetup(AA,ierr) >>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>> call getNNZ(grid,npetscloc,low,high,d_nnz,o_nnz,this%xgc_petsc,nr >>>> eal,ierr) >>>> call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr) >>>> call MatMPIAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,PETSC_ >>>> NULL_INTEGER,o_nnz,ierr) >>>> deallocate(d_nnz,o_nnz) >>>> call MatSetOption(AA,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE,ierr) >>>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>>> call MatSetup(AA,ierr) >>>> >>>> >>>> [62]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [62]PETSC ERROR: No support for this operation for this object type >>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>> ocumentation/faq.html for trouble shooting. >>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC ERROR: >>>> #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in /global/u1/r/rchurchi/petsc/3. 
>>>> 8.0/src/ksp/pc/interface/precon.c >>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>> >>>> -- >>>> R. Michael Churchill >>>> >>> >>> >> >> >> -- >> R. Michael Churchill >> > > -- R. Michael Churchill -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Wed Nov 1 22:32:07 2017 From: zakaryah at gmail.com (zakaryah .) Date: Wed, 1 Nov 2017 23:32:07 -0400 Subject: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods In-Reply-To: References: <87zi979alu.fsf@jedbrown.org> <2E811513-A851-4F84-A93F-BE83D56584BB@mcs.anl.gov> <6FA17D2F-1EB3-4EC0-B13F-B19922011797@glasgow.ac.uk> <877evnyyty.fsf@jedbrown.org> <29550785-0F7A-488B-A159-DD42DC29A228@mcs.anl.gov> <87inf4vsld.fsf@jedbrown.org> <87k1zisl6a.fsf@jedbrown.org> <1553F760-492C-4394-BDBF-19B2A69A8517@mcs.anl.gov> <40201D61-0CA9-4BE0-9075-2651CF66CDC3@mcs.anl.gov> <7897599A-B802-4CEB-B188-DC28FB79482C@mcs.anl.gov> Message-ID: I worked on the assumptions in my previous email and I at least partially implemented the function to assign the couplings. For row 0, which is the redundant field, I set dnz[0] to end-start, and onz[0] to the size of the matrix minus dnz[0]. For all other rows, I just increment the existing values of dnz[i] and onz[i], since the coupling to the redundant field adds one extra element beyond what's allocated for the DMDA stencil. I see in the source that the FormCoupleLocations function is called once if the DM has PREALLOC_ONLY set to true, but twice otherwise. I assume that the second call is for setting the nonzero structure. Do I need to do this? In any case, something is still not right. Even with the extra elements preallocated, the first assembly of the matrix is very slow. I ran a test problem on a single process with -info, and got this: 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. 
[0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 unneeded,629704 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 30818112 [0] DMGetDMSNES(): Creating new DMSNES [0] DMGetDMKSP(): Creating new DMKSP [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 0 SNES Function norm 2.302381528359e+00 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 126132 unneeded,647722 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 unneeded,647722 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. The 9610 mallocs during MatSetValues seem suspicious and are probably what's taking so long with larger problems. 601 of them are apparently in the call to set Jbh, and 9009 are in the call to set Jhb (b is the redundant field, h is the DMDA field). If I run with more than one process, I get a segfault when a process which has rank greater than 0 sets dnz or onz in the FormCoupleLocations call. On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . wrote: > Thanks Barry, that looks like exactly what I need. I'm looking at pack.c > and packm.c and I want to check my understanding of what my coupling > function should do. The relevant line in *DMCreateMatrix_Composite_AIJ *seems > to be: > > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__ > nrows,__start,__end); > > and I infer that dnz and onz are the number of nonzero elements in the > diagonal and off-diagonal submatrices, for each row of the DMComposite > matrix. I suppose I can just set each of these in a for loop, but can I > use the arguments to FormCoupleLocations as the range for the loop? Which > ones - __rstart to __rstart+__nrows? 
How can I determine the number of > rows on each processor from within the function that I pass? From the > preallocation macros it looks like __start to __end describe the range of > the columns of the diagonal submatrix - is that right? It looks like the > ranges will be specific to each processor. Do I just set the values in dnz > and onz, or do I need to reduce them? > > Thanks for all the help! Maybe if I get things working I can carve out > the core of the code to make an example program for DMRedundant/Composite. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Nov 2 08:51:25 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 2 Nov 2017 09:51:25 -0400 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill wrote: > Doing some additional testing, the issue goes away when removing the gamg > preconditioner line from the petsc.rc: > -pc_type gamg > Yea, this is GAMG setup. This is the code. findices is create with ISCreateStride, so it is sorted ... Michael is repartitioning the coarse grids. Maybe we don't have a regression test with this... I will try to reproduce this. Michael: you can use hypre for now, or turn repartitioning off (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not sure this will fix this. You don't have hypre parameters for all of your all of your solvers. I think 'boomeramg' is the default pc_hypre_type. That should be good enough for you. { IS findices; PetscInt Istart,Iend; Mat Pnew; ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); #if defined PETSC_GAMG_USE_LOG ierr = PetscLogEventBegin(petsc_gamg_setup_events[SET15],0,0,0,0);CHKERRQ(ierr); #endif ierr = ISCreateStride(comm,Iend-Istart,Istart,1,&findices);CHKERRQ(ierr); ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); ierr = ISDestroy(&findices);CHKERRQ(ierr); #if defined PETSC_GAMG_USE_LOG ierr = PetscLogEventEnd(petsc_gamg_setup_events[SET15],0,0,0,0);CHKERRQ(ierr); #endif ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); /* output - repartitioned */ *a_P_inout = Pnew; } > > On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: > >> Randy: >> Thanks, I'll check it tomorrow. >> Hong >> >> OK, this might not be completely satisfactory, because it doesn't show >>> the partitioning or how the matrix is created, but this reproduces the >>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>> it in this script. This must be run with MPI rank greater than 1. This may >>> be some combination of my petsc.rc, because when I use the PetscInitialize >>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>> it runs fine. >>> >>> >>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>> >>>> Randy: >>>> It could be a bug or a missing feature in our new >>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>> It would be helpful if you can provide us a simple example that >>>> produces this example. >>>> Hong >>>> >>>> I'm running a Fortran code that was just changed over to using petsc >>>>> 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. >>>>> The error is "unsorted iscol_local is not implemented yet" (see full error >>>>> below). 
I tried to trace down the difference in the source files, but where >>>>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't >>>>> seem to have existed in v3.7.6, so I'm unsure how to compare. It seems the >>>>> error is that the order of the columns locally are unsorted, though I don't >>>>> think I specify a column order in the creation of the matrix: >>>>> call MatCreate(this%comm,AA,ierr) >>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>> call MatSetType(AA,MATAIJ,ierr) >>>>> call MatSetup(AA,ierr) >>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>> call getNNZ(grid,npetscloc,low,high,d_nnz,o_nnz,this%xgc_petsc,nr >>>>> eal,ierr) >>>>> call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr) >>>>> call MatMPIAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,PETSC_ >>>>> NULL_INTEGER,o_nnz,ierr) >>>>> deallocate(d_nnz,o_nnz) >>>>> call MatSetOption(AA,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE,ierr) >>>>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>>>> call MatSetup(AA,ierr) >>>>> >>>>> >>>>> [62]PETSC ERROR: --------------------- Error Message >>>>> -------------------------------------------------------------- >>>>> [62]PETSC ERROR: No support for this operation for this object type >>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>> ocumentation/faq.html for trouble shooting. >>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC ERROR: >>>>> #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>> >>>>> -- >>>>> R. Michael Churchill >>>>> >>>> >>>> >>> >>> >>> -- >>> R. Michael Churchill >>> >> >> > > > -- > R. Michael Churchill > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rchurchi at pppl.gov Thu Nov 2 09:27:31 2017 From: rchurchi at pppl.gov (Randy Michael Churchill) Date: Thu, 2 Nov 2017 09:27:31 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Thanks Mark, both your suggestions of using pc hypre or turning off repartitioning does indeed make the error go away. On Thu, Nov 2, 2017 at 8:51 AM, Mark Adams wrote: > > > On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill > wrote: > >> Doing some additional testing, the issue goes away when removing the gamg >> preconditioner line from the petsc.rc: >> -pc_type gamg >> > > Yea, this is GAMG setup. > > This is the code. findices is create with ISCreateStride, so it is sorted > ... > > Michael is repartitioning the coarse grids. Maybe we don't have a > regression test with this... > > I will try to reproduce this. 
> > Michael: you can use hypre for now, or turn repartitioning off (eg, > -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not sure > this will fix this. > > You don't have hypre parameters for all of your all of your solvers. I > think 'boomeramg' is the default pc_hypre_type. That should be good enough > for you. > > > { > IS findices; > PetscInt Istart,Iend; > Mat Pnew; > > ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); > #if defined PETSC_GAMG_USE_LOG > ierr = PetscLogEventBegin(petsc_gamg_setup_events[SET15],0,0,0,0); > CHKERRQ(ierr); > #endif > ierr = ISCreateStride(comm,Iend-Istart,Istart,1,&findices); > CHKERRQ(ierr); > ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); > ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, > MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); > ierr = ISDestroy(&findices);CHKERRQ(ierr); > > #if defined PETSC_GAMG_USE_LOG > ierr = PetscLogEventEnd(petsc_gamg_setup_events[SET15],0,0,0,0); > CHKERRQ(ierr); > #endif > ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); > > /* output - repartitioned */ > *a_P_inout = Pnew; > } > > >> >> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >> >>> Randy: >>> Thanks, I'll check it tomorrow. >>> Hong >>> >>> OK, this might not be completely satisfactory, because it doesn't show >>>> the partitioning or how the matrix is created, but this reproduces the >>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>> it in this script. This must be run with MPI rank greater than 1. This may >>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>> it runs fine. >>>> >>>> >>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>> >>>>> Randy: >>>>> It could be a bug or a missing feature in our new >>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>> It would be helpful if you can provide us a simple example that >>>>> produces this example. >>>>> Hong >>>>> >>>>> I'm running a Fortran code that was just changed over to using petsc >>>>>> 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. >>>>>> The error is "unsorted iscol_local is not implemented yet" (see full error >>>>>> below). I tried to trace down the difference in the source files, but where >>>>>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't >>>>>> seem to have existed in v3.7.6, so I'm unsure how to compare. 
It seems the >>>>>> error is that the order of the columns locally are unsorted, though I don't >>>>>> think I specify a column order in the creation of the matrix: >>>>>> call MatCreate(this%comm,AA,ierr) >>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>> call MatSetup(AA,ierr) >>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>> call getNNZ(grid,npetscloc,low,high >>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>> call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr) >>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>> deallocate(d_nnz,o_nnz) >>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>>>>> call MatSetup(AA,ierr) >>>>>> >>>>>> >>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>> -------------------------------------------------------------- >>>>>> [62]PETSC ERROR: No support for this operation for this object type >>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>> ocumentation/faq.html for trouble shooting. >>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC ERROR: >>>>>> #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>>> >>>>>> -- >>>>>> R. Michael Churchill >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> R. Michael Churchill >>>> >>> >>> >> >> >> -- >> R. Michael Churchill >> > > -- R. Michael Churchill -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Nov 2 09:28:47 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 2 Nov 2017 10:28:47 -0400 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: I am able to reproduce this with snes ex56 with 2 processors and adding -pc_gamg_repartition true I'm not sure how to fix it. 
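The first thing I would check is whether new_eq_indices is actually sorted on each process. A quick check dropped in right before the MatCreateSubMatrix() call in the code above would tell us (just a sketch, not tested):

  PetscBool   sorted;
  PetscMPIInt rank;

  ierr = ISSorted(new_eq_indices,&sorted);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
  if (!sorted) {
    ierr = PetscPrintf(PETSC_COMM_SELF,"[%d] new_eq_indices is not sorted\n",rank);CHKERRQ(ierr);
  }

Here is the failing run:
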
10:26 1 knepley/feature-plex-boxmesh-create *= ~/Codes/petsc/src/snes/examples/tutorials$ make PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g runex /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason -snes_monitor_short -ksp_monitor_short -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -petscpartitioner_type simple -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true [0] 27 global equations, 9 vertices [0] 27 equations in vector, 9 vertices 0 SNES Function norm 122.396 0 KSP Residual norm 122.396 1 KSP Residual norm 20.4696 2 KSP Residual norm 3.95009 3 KSP Residual norm 0.176181 4 KSP Residual norm 0.0208781 5 KSP Residual norm 0.00278873 6 KSP Residual norm 0.000482741 7 KSP Residual norm 4.68085e-05 8 KSP Residual norm 5.42381e-06 9 KSP Residual norm 5.12785e-07 10 KSP Residual norm 2.60389e-08 11 KSP Residual norm 4.96201e-09 12 KSP Residual norm 1.989e-10 Linear solve converged due to CONVERGED_RTOL iterations 12 1 SNES Function norm 1.990e-10 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 DM Object: Mesh (ex56_) 2 MPI processes type: plex Mesh in 3 dimensions: 0-cells: 12 12 1-cells: 20 20 2-cells: 11 11 3-cells: 2 2 Labels: boundary: 1 strata with value/size (1 (39)) Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 (1)) marker: 1 strata with value/size (1 (27)) depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) [0] 441 global equations, 147 vertices [0] 441 equations in vector, 147 vertices 0 SNES Function norm 49.7106 0 KSP Residual norm 49.7106 1 KSP Residual norm 12.9252 2 KSP Residual norm 2.38019 3 KSP Residual norm 0.426307 4 KSP Residual norm 0.0692155 5 KSP Residual norm 0.0123092 6 KSP Residual norm 0.00184874 7 KSP Residual norm 0.000320761 8 KSP Residual norm 5.48957e-05 9 KSP Residual norm 9.90089e-06 10 KSP Residual norm 1.5127e-06 11 KSP Residual norm 2.82192e-07 12 KSP Residual norm 4.62364e-08 13 KSP Residual norm 7.99573e-09 14 KSP Residual norm 1.3028e-09 15 KSP Residual norm 2.174e-10 Linear solve converged due to CONVERGED_RTOL iterations 15 1 SNES Function norm 2.174e-10 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 DM Object: Mesh (ex56_) 2 MPI processes type: plex Mesh in 3 dimensions: 0-cells: 45 45 1-cells: 96 96 2-cells: 68 68 3-cells: 16 16 Labels: marker: 1 strata with value/size (1 (129)) Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 (9), 6 (9)) boundary: 1 strata with value/size (1 (141)) depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) [0] 4725 global equations, 1575 vertices [0] 4725 equations in vector, 1575 vertices 0 SNES Function norm 17.9091 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: unsorted iscol_local is not implemented yet [1]PETSC ERROR: 
--------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: No support for this operation for this object type [1]PETSC ERROR: unsorted iscol_local is not implemented yet On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: > > > On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill > wrote: > >> Doing some additional testing, the issue goes away when removing the gamg >> preconditioner line from the petsc.rc: >> -pc_type gamg >> > > Yea, this is GAMG setup. > > This is the code. findices is create with ISCreateStride, so it is sorted > ... > > Michael is repartitioning the coarse grids. Maybe we don't have a > regression test with this... > > I will try to reproduce this. > > Michael: you can use hypre for now, or turn repartitioning off (eg, > -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not sure > this will fix this. > > You don't have hypre parameters for all of your all of your solvers. I > think 'boomeramg' is the default pc_hypre_type. That should be good enough > for you. > > > { > IS findices; > PetscInt Istart,Iend; > Mat Pnew; > > ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); > #if defined PETSC_GAMG_USE_LOG > ierr = PetscLogEventBegin(petsc_gamg_setup_events[SET15],0,0,0,0); > CHKERRQ(ierr); > #endif > ierr = ISCreateStride(comm,Iend-Istart,Istart,1,&findices); > CHKERRQ(ierr); > ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); > ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, > MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); > ierr = ISDestroy(&findices);CHKERRQ(ierr); > > #if defined PETSC_GAMG_USE_LOG > ierr = PetscLogEventEnd(petsc_gamg_setup_events[SET15],0,0,0,0); > CHKERRQ(ierr); > #endif > ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); > > /* output - repartitioned */ > *a_P_inout = Pnew; > } > > >> >> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >> >>> Randy: >>> Thanks, I'll check it tomorrow. >>> Hong >>> >>> OK, this might not be completely satisfactory, because it doesn't show >>>> the partitioning or how the matrix is created, but this reproduces the >>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>> it in this script. This must be run with MPI rank greater than 1. This may >>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>> it runs fine. >>>> >>>> >>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>> >>>>> Randy: >>>>> It could be a bug or a missing feature in our new >>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>> It would be helpful if you can provide us a simple example that >>>>> produces this example. >>>>> Hong >>>>> >>>>> I'm running a Fortran code that was just changed over to using petsc >>>>>> 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. >>>>>> The error is "unsorted iscol_local is not implemented yet" (see full error >>>>>> below). I tried to trace down the difference in the source files, but where >>>>>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't >>>>>> seem to have existed in v3.7.6, so I'm unsure how to compare. 
It seems the >>>>>> error is that the order of the columns locally are unsorted, though I don't >>>>>> think I specify a column order in the creation of the matrix: >>>>>> call MatCreate(this%comm,AA,ierr) >>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>> call MatSetup(AA,ierr) >>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>> call getNNZ(grid,npetscloc,low,high >>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>> call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr) >>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>> deallocate(d_nnz,o_nnz) >>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>>>>> call MatSetup(AA,ierr) >>>>>> >>>>>> >>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>> -------------------------------------------------------------- >>>>>> [62]PETSC ERROR: No support for this operation for this object type >>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>> ocumentation/faq.html for trouble shooting. >>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC ERROR: >>>>>> #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>>> >>>>>> -- >>>>>> R. Michael Churchill >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> R. Michael Churchill >>>> >>> >>> >> >> >> -- >> R. Michael Churchill >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 2 10:02:40 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 2 Nov 2017 15:02:40 +0000 Subject: [petsc-users] Problem with encapsulation of PETSc/SLEPc in Fortran In-Reply-To: <6a6cc504-a264-01a8-0903-2f34309d0c44@imperial.ac.uk> References: <6a6cc504-a264-01a8-0903-2f34309d0c44@imperial.ac.uk> Message-ID: > On Nov 1, 2017, at 10:23 AM, Thibaut Appel wrote: > > Dear PETSc/SLEPc users, > > I am encountering a problem when I try to isolate my calls to PETSc/SLEPc routines in a module. When I have a single file everything works fine, but when I have, say: > - modA.f90 (Independant modules) > - modB.F90 (Contains all the calls to PETSc and SLEPc) > - main.f90 or main.F90 > When calling EPSSolve, I keep having the error "[1]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero" plus "User provided function() line 0 in unknown file." > With gdb: "Thread 1 "main" received signal SIGFPE, Arithmetic exception. 
0x00007ffff1662cf6 in dlamch_ () from /home/linuxbrew/.linuxbrew/lib/libopenblas.so.0" > > The PETSc/SLEPc module looks like > USE SlepcEps > IMPLICIT NONE > #include > > which is the only preprocessed directive used. I tried to add more (petscsys, petscmat, petscvec, slepseps), tried to change main.f90 to main.F90 and incorporate the preprocessed directives there as well without any effect. My code calls CHKERRQ(ierr) systematically. My makefile looks like > > include ${SLEPC_DIR}/lib/slepc/conf/slepc_common > INCL = -I$(PETSC_DIR)/include/ -I$(SLEPC_DIR)/include/ > %.o: %.f90 > $(FC) $(FLAGS) $(INCL) -c $< -o $@ $(SLEPC_EPS_LIB) > %.o: %.F90 > $(FC) $(FLAGS) $(INCL) -c $< -o $@ $(SLEPC_EPS_LIB) > $(EXEC): $(OBJS) > $(FC) $(FLAGS) $(INCL) $(OBJS) -o $@ $(SLEPC_EPS_LIB) > > Could you spot anything I am doing wrong or dangerous? No idea. Run in the debugger to find out where the problem occurs. > Furthermore, do you know how to avoid the warnings "Same actual argument associated with INTENT(IN) argument 'errorcode' and INTENT(OUT) argument 'ierror' at (1)" when calling CHKERRQ(ierr) when ierr is declared as INTEGER? Upgrade PETSc to remove this error. > > Thanks in advance for your continued support, > > Thibaut From hzhang at mcs.anl.gov Thu Nov 2 10:07:28 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 2 Nov 2017 10:07:28 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark, I can reproduce this in an old branch, but not in current maint and master. Which branch are you using to produce this error? Hong On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: > I am able to reproduce this with snes ex56 with 2 processors and adding > -pc_gamg_repartition true > > I'm not sure how to fix it. > > 10:26 1 knepley/feature-plex-boxmesh-create *= ~/Codes/petsc/src/snes/examples/tutorials$ > make PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g > runex > /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 ./ex56 > -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it > 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned > -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true > -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -ksp_converged_reason -snes_monitor_short -ksp_monitor_short > -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig > 0,0.05,0,1.05 -mg_levels_pc_type jacobi -petscpartitioner_type simple > -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 > -pc_gamg_repartition true > [0] 27 global equations, 9 vertices > [0] 27 equations in vector, 9 vertices > 0 SNES Function norm 122.396 > 0 KSP Residual norm 122.396 > 1 KSP Residual norm 20.4696 > 2 KSP Residual norm 3.95009 > 3 KSP Residual norm 0.176181 > 4 KSP Residual norm 0.0208781 > 5 KSP Residual norm 0.00278873 > 6 KSP Residual norm 0.000482741 > 7 KSP Residual norm 4.68085e-05 > 8 KSP Residual norm 5.42381e-06 > 9 KSP Residual norm 5.12785e-07 > 10 KSP Residual norm 2.60389e-08 > 11 KSP Residual norm 4.96201e-09 > 12 KSP Residual norm 1.989e-10 > Linear solve converged due to CONVERGED_RTOL iterations 12 > 1 SNES Function norm 1.990e-10 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > DM Object: Mesh (ex56_) 2 
MPI processes > type: plex > Mesh in 3 dimensions: > 0-cells: 12 12 > 1-cells: 20 20 > 2-cells: 11 11 > 3-cells: 2 2 > Labels: > boundary: 1 strata with value/size (1 (39)) > Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 (1)) > marker: 1 strata with value/size (1 (27)) > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) > [0] 441 global equations, 147 vertices > [0] 441 equations in vector, 147 vertices > 0 SNES Function norm 49.7106 > 0 KSP Residual norm 49.7106 > 1 KSP Residual norm 12.9252 > 2 KSP Residual norm 2.38019 > 3 KSP Residual norm 0.426307 > 4 KSP Residual norm 0.0692155 > 5 KSP Residual norm 0.0123092 > 6 KSP Residual norm 0.00184874 > 7 KSP Residual norm 0.000320761 > 8 KSP Residual norm 5.48957e-05 > 9 KSP Residual norm 9.90089e-06 > 10 KSP Residual norm 1.5127e-06 > 11 KSP Residual norm 2.82192e-07 > 12 KSP Residual norm 4.62364e-08 > 13 KSP Residual norm 7.99573e-09 > 14 KSP Residual norm 1.3028e-09 > 15 KSP Residual norm 2.174e-10 > Linear solve converged due to CONVERGED_RTOL iterations 15 > 1 SNES Function norm 2.174e-10 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > DM Object: Mesh (ex56_) 2 MPI processes > type: plex > Mesh in 3 dimensions: > 0-cells: 45 45 > 1-cells: 96 96 > 2-cells: 68 68 > 3-cells: 16 16 > Labels: > marker: 1 strata with value/size (1 (129)) > Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 (9), 6 > (9)) > boundary: 1 strata with value/size (1 (141)) > depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) > [0] 4725 global equations, 1575 vertices > [0] 4725 equations in vector, 1575 vertices > 0 SNES Function norm 17.9091 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: unsorted iscol_local is not implemented yet > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: No support for this operation for this object type > [1]PETSC ERROR: unsorted iscol_local is not implemented yet > > > On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: > >> >> >> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >> rchurchi at pppl.gov> wrote: >> >>> Doing some additional testing, the issue goes away when removing the >>> gamg preconditioner line from the petsc.rc: >>> -pc_type gamg >>> >> >> Yea, this is GAMG setup. >> >> This is the code. findices is create with ISCreateStride, so it is >> sorted ... >> >> Michael is repartitioning the coarse grids. Maybe we don't have a >> regression test with this... >> >> I will try to reproduce this. >> >> Michael: you can use hypre for now, or turn repartitioning off (eg, >> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not >> sure this will fix this. >> >> You don't have hypre parameters for all of your all of your solvers. I >> think 'boomeramg' is the default pc_hypre_type. That should be good enough >> for you. 
>> >> >> { >> IS findices; >> PetscInt Istart,Iend; >> Mat Pnew; >> >> ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); >> #if defined PETSC_GAMG_USE_LOG >> ierr = PetscLogEventBegin(petsc_gamg_setup_events[SET15],0,0,0,0);C >> HKERRQ(ierr); >> #endif >> ierr = ISCreateStride(comm,Iend-Istart,Istart,1,&findices);CHKERRQ( >> ierr); >> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >> ierr = ISDestroy(&findices);CHKERRQ(ierr); >> >> #if defined PETSC_GAMG_USE_LOG >> ierr = PetscLogEventEnd(petsc_gamg_setup_events[SET15],0,0,0,0);CHK >> ERRQ(ierr); >> #endif >> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >> >> /* output - repartitioned */ >> *a_P_inout = Pnew; >> } >> >> >>> >>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>> >>>> Randy: >>>> Thanks, I'll check it tomorrow. >>>> Hong >>>> >>>> OK, this might not be completely satisfactory, because it doesn't show >>>>> the partitioning or how the matrix is created, but this reproduces the >>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>> it in this script. This must be run with MPI rank greater than 1. This may >>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>> it runs fine. >>>>> >>>>> >>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>>> >>>>>> Randy: >>>>>> It could be a bug or a missing feature in our new >>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>> It would be helpful if you can provide us a simple example that >>>>>> produces this example. >>>>>> Hong >>>>>> >>>>>> I'm running a Fortran code that was just changed over to using petsc >>>>>>> 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. >>>>>>> The error is "unsorted iscol_local is not implemented yet" (see full error >>>>>>> below). I tried to trace down the difference in the source files, but where >>>>>>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't >>>>>>> seem to have existed in v3.7.6, so I'm unsure how to compare. It seems the >>>>>>> error is that the order of the columns locally are unsorted, though I don't >>>>>>> think I specify a column order in the creation of the matrix: >>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>> call MatSetup(AA,ierr) >>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>> deallocate(d_nnz,o_nnz) >>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>>>>>> call MatSetup(AA,ierr) >>>>>>> >>>>>>> >>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>> -------------------------------------------------------------- >>>>>>> [62]PETSC ERROR: No support for this operation for this object type >>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>> ocumentation/faq.html for trouble shooting. 
>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>>>> >>>>>>> -- >>>>>>> R. Michael Churchill >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> R. Michael Churchill >>>>> >>>> >>>> >>> >>> >>> -- >>> R. Michael Churchill >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rchurchi at pppl.gov Thu Nov 2 11:07:27 2017 From: rchurchi at pppl.gov (Randy Michael Churchill) Date: Thu, 2 Nov 2017 11:07:27 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: I used: git checkout v3.8 On Thu, Nov 2, 2017 at 10:07 AM, Hong wrote: > Mark, > I can reproduce this in an old branch, but not in current maint and master. > Which branch are you using to produce this error? > Hong > > > On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: > >> I am able to reproduce this with snes ex56 with 2 processors and adding >> -pc_gamg_repartition true >> >> I'm not sure how to fix it. 
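(For completeness, this is how I checked what is actually in my $PETSC_DIR checkout -- plain git, nothing PETSc-specific:

cd $PETSC_DIR
git branch
git describe --tags

Having done "git checkout v3.8" above, git branch should just show a detached HEAD at the v3.8 tag.)
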
>> >> 10:26 1 knepley/feature-plex-boxmesh-create *= >> ~/Codes/petsc/src/snes/examples/tutorials$ make >> PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g >> runex >> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 ./ex56 >> -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it >> 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned >> -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true >> -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >> -ksp_converged_reason -snes_monitor_short -ksp_monitor_short >> -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >> -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig >> 0,0.05,0,1.05 -mg_levels_pc_type jacobi -petscpartitioner_type simple >> -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 >> -pc_gamg_repartition true >> [0] 27 global equations, 9 vertices >> [0] 27 equations in vector, 9 vertices >> 0 SNES Function norm 122.396 >> 0 KSP Residual norm 122.396 >> 1 KSP Residual norm 20.4696 >> 2 KSP Residual norm 3.95009 >> 3 KSP Residual norm 0.176181 >> 4 KSP Residual norm 0.0208781 >> 5 KSP Residual norm 0.00278873 >> 6 KSP Residual norm 0.000482741 >> 7 KSP Residual norm 4.68085e-05 >> 8 KSP Residual norm 5.42381e-06 >> 9 KSP Residual norm 5.12785e-07 >> 10 KSP Residual norm 2.60389e-08 >> 11 KSP Residual norm 4.96201e-09 >> 12 KSP Residual norm 1.989e-10 >> Linear solve converged due to CONVERGED_RTOL iterations 12 >> 1 SNES Function norm 1.990e-10 >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> DM Object: Mesh (ex56_) 2 MPI processes >> type: plex >> Mesh in 3 dimensions: >> 0-cells: 12 12 >> 1-cells: 20 20 >> 2-cells: 11 11 >> 3-cells: 2 2 >> Labels: >> boundary: 1 strata with value/size (1 (39)) >> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 (1)) >> marker: 1 strata with value/size (1 (27)) >> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >> [0] 441 global equations, 147 vertices >> [0] 441 equations in vector, 147 vertices >> 0 SNES Function norm 49.7106 >> 0 KSP Residual norm 49.7106 >> 1 KSP Residual norm 12.9252 >> 2 KSP Residual norm 2.38019 >> 3 KSP Residual norm 0.426307 >> 4 KSP Residual norm 0.0692155 >> 5 KSP Residual norm 0.0123092 >> 6 KSP Residual norm 0.00184874 >> 7 KSP Residual norm 0.000320761 >> 8 KSP Residual norm 5.48957e-05 >> 9 KSP Residual norm 9.90089e-06 >> 10 KSP Residual norm 1.5127e-06 >> 11 KSP Residual norm 2.82192e-07 >> 12 KSP Residual norm 4.62364e-08 >> 13 KSP Residual norm 7.99573e-09 >> 14 KSP Residual norm 1.3028e-09 >> 15 KSP Residual norm 2.174e-10 >> Linear solve converged due to CONVERGED_RTOL iterations 15 >> 1 SNES Function norm 2.174e-10 >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> DM Object: Mesh (ex56_) 2 MPI processes >> type: plex >> Mesh in 3 dimensions: >> 0-cells: 45 45 >> 1-cells: 96 96 >> 2-cells: 68 68 >> 3-cells: 16 16 >> Labels: >> marker: 1 strata with value/size (1 (129)) >> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 (9), 6 >> (9)) >> boundary: 1 strata with value/size (1 (141)) >> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >> [0] 4725 global equations, 1575 vertices >> [0] 4725 equations in vector, 1575 vertices >> 0 SNES Function norm 
17.9091 >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: No support for this operation for this object type >> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >> [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [1]PETSC ERROR: No support for this operation for this object type >> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >> >> >> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >> >>> >>> >>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>> rchurchi at pppl.gov> wrote: >>> >>>> Doing some additional testing, the issue goes away when removing the >>>> gamg preconditioner line from the petsc.rc: >>>> -pc_type gamg >>>> >>> >>> Yea, this is GAMG setup. >>> >>> This is the code. findices is create with ISCreateStride, so it is >>> sorted ... >>> >>> Michael is repartitioning the coarse grids. Maybe we don't have a >>> regression test with this... >>> >>> I will try to reproduce this. >>> >>> Michael: you can use hypre for now, or turn repartitioning off (eg, >>> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not >>> sure this will fix this. >>> >>> You don't have hypre parameters for all of your all of your solvers. I >>> think 'boomeramg' is the default pc_hypre_type. That should be good enough >>> for you. >>> >>> >>> { >>> IS findices; >>> PetscInt Istart,Iend; >>> Mat Pnew; >>> >>> ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); >>> #if defined PETSC_GAMG_USE_LOG >>> ierr = PetscLogEventBegin(petsc_gamg_ >>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>> #endif >>> ierr = ISCreateStride(comm,Iend-Istar >>> t,Istart,1,&findices);CHKERRQ(ierr); >>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>> >>> #if defined PETSC_GAMG_USE_LOG >>> ierr = PetscLogEventEnd(petsc_gamg_se >>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>> #endif >>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>> >>> /* output - repartitioned */ >>> *a_P_inout = Pnew; >>> } >>> >>> >>>> >>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>> >>>>> Randy: >>>>> Thanks, I'll check it tomorrow. >>>>> Hong >>>>> >>>>> OK, this might not be completely satisfactory, because it doesn't show >>>>>> the partitioning or how the matrix is created, but this reproduces the >>>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>>> it in this script. This must be run with MPI rank greater than 1. This may >>>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>>> it runs fine. >>>>>> >>>>>> >>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>>>> >>>>>>> Randy: >>>>>>> It could be a bug or a missing feature in our new >>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>> It would be helpful if you can provide us a simple example that >>>>>>> produces this example. >>>>>>> Hong >>>>>>> >>>>>>> I'm running a Fortran code that was just changed over to using petsc >>>>>>>> 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. >>>>>>>> The error is "unsorted iscol_local is not implemented yet" (see full error >>>>>>>> below). 
I tried to trace down the difference in the source files, but where >>>>>>>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't >>>>>>>> seem to have existed in v3.7.6, so I'm unsure how to compare. It seems the >>>>>>>> error is that the order of the columns locally are unsorted, though I don't >>>>>>>> think I specify a column order in the creation of the matrix: >>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>> call MatSetup(AA,ierr) >>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>>>>>>> call MatSetup(AA,ierr) >>>>>>>> >>>>>>>> >>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>> -------------------------------------------------------------- >>>>>>>> [62]PETSC ERROR: No support for this operation for this object type >>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>> >>>>>>>> -- >>>>>>>> R. Michael Churchill >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> R. Michael Churchill >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> R. Michael Churchill >>>> >>> >>> >> > -- R. Michael Churchill -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 2 11:31:07 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Thu, 2 Nov 2017 16:31:07 +0000 Subject: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods In-Reply-To: References: <87zi979alu.fsf@jedbrown.org> <2E811513-A851-4F84-A93F-BE83D56584BB@mcs.anl.gov> <6FA17D2F-1EB3-4EC0-B13F-B19922011797@glasgow.ac.uk> <877evnyyty.fsf@jedbrown.org> <29550785-0F7A-488B-A159-DD42DC29A228@mcs.anl.gov> <87inf4vsld.fsf@jedbrown.org> <87k1zisl6a.fsf@jedbrown.org> <1553F760-492C-4394-BDBF-19B2A69A8517@mcs.anl.gov> <40201D61-0CA9-4BE0-9075-2651CF66CDC3@mcs.anl.gov> <7897599A-B802-4CEB-B188-DC28FB79482C@mcs.anl.gov> Message-ID: <653286EB-C0DC-4522-8D0B-902148799A1F@mcs.anl.gov> > On Nov 1, 2017, at 10:32 PM, zakaryah . wrote: > > I worked on the assumptions in my previous email and I at least partially implemented the function to assign the couplings. For row 0, which is the redundant field, I set dnz[0] to end-start, and onz[0] to the size of the matrix minus dnz[0]. For all other rows, I just increment the existing values of dnz[i] and onz[i], since the coupling to the redundant field adds one extra element beyond what's allocated for the DMDA stencil. Sounds reasonable. > > I see in the source that the FormCoupleLocations function is called once if the DM has PREALLOC_ONLY set to true, but twice otherwise. I assume that the second call is for setting the nonzero structure. Yes > Do I need to do this? You probably should. I would do a MatView() small DMDA on the matrix you obtain and then again after you add the numerical values. This will show you what values are not being properly allocated/put in when the matrix is created by the DM. > > In any case, something is still not right. Even with the extra elements preallocated, the first assembly of the matrix is very slow. I ran a test problem on a single process with -info, and got this: > > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. 
> > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 unneeded,629704 used Yes, really bad. > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 30818112 > > [0] DMGetDMSNES(): Creating new DMSNES > > [0] DMGetDMKSP(): Creating new DMKSP > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > 0 SNES Function norm 2.302381528359e+00 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 126132 unneeded,647722 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 unneeded,647722 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > The 9610 mallocs during MatSetValues seem suspicious and are probably what's taking so long with larger problems. 601 of them are apparently in the call to set Jbh, and 9009 are in the call to set Jhb (b is the redundant field, h is the DMDA field). If I run with more than one process, I get a segfault when a process which has rank greater than 0 sets dnz or onz in the FormCoupleLocations call. > > > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . wrote: > Thanks Barry, that looks like exactly what I need. I'm looking at pack.c and packm.c and I want to check my understanding of what my coupling function should do. The relevant line in DMCreateMatrix_Composite_AIJ seems to be: > > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__nrows,__start,__end); > > and I infer that dnz and onz are the number of nonzero elements in the diagonal and off-diagonal submatrices, for each row of the DMComposite matrix. 
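A rough sketch of what such a callback could look like -- untested, assuming dnz and onz are indexed by local row 0..nrows-1, that the second argument is the (possibly NULL) Mat, and that the redundant field is global row/column 0 as you describe:

static PetscErrorCode MyCoupleLocations(DM dm,Mat A,PetscInt *dnz,PetscInt *onz,PetscInt rstart,PetscInt nrows,PetscInt start,PetscInt end)
{
  PetscErrorCode ierr;
  PetscInt       i,N;
  Vec            v;

  PetscFunctionBegin;
  /* global size of the composite system; needed for the fully coupled redundant row */
  ierr = DMGetGlobalVector(dm,&v);CHKERRQ(ierr);
  ierr = VecGetSize(v,&N);CHKERRQ(ierr);
  ierr = DMRestoreGlobalVector(dm,&v);CHKERRQ(ierr);
  for (i=0; i<nrows; i++) {
    if (rstart+i == 0) {           /* redundant row: couples to every column */
      dnz[i] = end - start;        /* columns of the diagonal block owned here */
      onz[i] = N - (end - start);  /* everything else is off-diagonal */
    } else {                       /* DMDA row: one extra entry, the coupling to column 0 */
      if (start == 0) dnz[i]++;
      else            onz[i]++;
    }
  }
  PetscFunctionReturn(0);
}

Getting N from a work vector is only one way to do it; computing it from the DMDA sizes plus one would do just as well.
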
I suppose I can just set each of these in a for loop, but can I use the arguments to FormCoupleLocations as the range for the loop? Which ones - __rstart to __rstart+__nrows? How can I determine the number of rows on each processor from within the function that I pass? From the preallocation macros it looks like __start to __end describe the range of the columns of the diagonal submatrix - is that right? It looks like the ranges will be specific to each processor. Do I just set the values in dnz and onz, or do I need to reduce them? > > Thanks for all the help! Maybe if I get things working I can carve out the core of the code to make an example program for DMRedundant/Composite. > From hzhang at mcs.anl.gov Thu Nov 2 12:23:29 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 2 Nov 2017 12:23:29 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Randy: I can reproduce it in maint branch (=v3.8) with option '-pc_gamg_mat_partitioning_type current'. My petsc is built with parmetis, thus it uses '-pc_gamg_mat_partitioning_type parmetis' by default and works well. Replacing it with 'current', I am able to see the error -- will fix it. Meanwhile, you can use '-pc_gamg_mat_partitioning_type average' to get your work done. I'll let you know the fix. Hong I used: > git checkout v3.8 > > On Thu, Nov 2, 2017 at 10:07 AM, Hong wrote: > >> Mark, >> I can reproduce this in an old branch, but not in current maint and >> master. >> Which branch are you using to produce this error? >> Hong >> >> >> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: >> >>> I am able to reproduce this with snes ex56 with 2 processors and adding >>> -pc_gamg_repartition true >>> >>> I'm not sure how to fix it. >>> >>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>> PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g >>> runex >>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 ./ex56 >>> -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it >>> 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned >>> -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >>> -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true >>> -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >>> -ksp_converged_reason -snes_monitor_short -ksp_monitor_short >>> -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >>> -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg >>> -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig >>> 0,0.05,0,1.05 -mg_levels_pc_type jacobi -petscpartitioner_type simple >>> -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 >>> -pc_gamg_repartition true >>> [0] 27 global equations, 9 vertices >>> [0] 27 equations in vector, 9 vertices >>> 0 SNES Function norm 122.396 >>> 0 KSP Residual norm 122.396 >>> 1 KSP Residual norm 20.4696 >>> 2 KSP Residual norm 3.95009 >>> 3 KSP Residual norm 0.176181 >>> 4 KSP Residual norm 0.0208781 >>> 5 KSP Residual norm 0.00278873 >>> 6 KSP Residual norm 0.000482741 >>> 7 KSP Residual norm 4.68085e-05 >>> 8 KSP Residual norm 5.42381e-06 >>> 9 KSP Residual norm 5.12785e-07 >>> 10 KSP Residual norm 2.60389e-08 >>> 11 KSP Residual norm 4.96201e-09 >>> 12 KSP Residual norm 1.989e-10 >>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>> 1 SNES Function norm 1.990e-10 >>> Nonlinear solve converged due to 
CONVERGED_FNORM_RELATIVE iterations 1 >>> DM Object: Mesh (ex56_) 2 MPI processes >>> type: plex >>> Mesh in 3 dimensions: >>> 0-cells: 12 12 >>> 1-cells: 20 20 >>> 2-cells: 11 11 >>> 3-cells: 2 2 >>> Labels: >>> boundary: 1 strata with value/size (1 (39)) >>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 (1)) >>> marker: 1 strata with value/size (1 (27)) >>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>> [0] 441 global equations, 147 vertices >>> [0] 441 equations in vector, 147 vertices >>> 0 SNES Function norm 49.7106 >>> 0 KSP Residual norm 49.7106 >>> 1 KSP Residual norm 12.9252 >>> 2 KSP Residual norm 2.38019 >>> 3 KSP Residual norm 0.426307 >>> 4 KSP Residual norm 0.0692155 >>> 5 KSP Residual norm 0.0123092 >>> 6 KSP Residual norm 0.00184874 >>> 7 KSP Residual norm 0.000320761 >>> 8 KSP Residual norm 5.48957e-05 >>> 9 KSP Residual norm 9.90089e-06 >>> 10 KSP Residual norm 1.5127e-06 >>> 11 KSP Residual norm 2.82192e-07 >>> 12 KSP Residual norm 4.62364e-08 >>> 13 KSP Residual norm 7.99573e-09 >>> 14 KSP Residual norm 1.3028e-09 >>> 15 KSP Residual norm 2.174e-10 >>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>> 1 SNES Function norm 2.174e-10 >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> DM Object: Mesh (ex56_) 2 MPI processes >>> type: plex >>> Mesh in 3 dimensions: >>> 0-cells: 45 45 >>> 1-cells: 96 96 >>> 2-cells: 68 68 >>> 3-cells: 16 16 >>> Labels: >>> marker: 1 strata with value/size (1 (129)) >>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 (9), 6 >>> (9)) >>> boundary: 1 strata with value/size (1 (141)) >>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>> [0] 4725 global equations, 1575 vertices >>> [0] 4725 equations in vector, 1575 vertices >>> 0 SNES Function norm 17.9091 >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: No support for this operation for this object type >>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>> [1]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [1]PETSC ERROR: No support for this operation for this object type >>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>> >>> >>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >>> >>>> >>>> >>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>> rchurchi at pppl.gov> wrote: >>>> >>>>> Doing some additional testing, the issue goes away when removing the >>>>> gamg preconditioner line from the petsc.rc: >>>>> -pc_type gamg >>>>> >>>> >>>> Yea, this is GAMG setup. >>>> >>>> This is the code. findices is create with ISCreateStride, so it is >>>> sorted ... >>>> >>>> Michael is repartitioning the coarse grids. Maybe we don't have a >>>> regression test with this... >>>> >>>> I will try to reproduce this. >>>> >>>> Michael: you can use hypre for now, or turn repartitioning off (eg, >>>> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not >>>> sure this will fix this. >>>> >>>> You don't have hypre parameters for all of your all of your solvers. I >>>> think 'boomeramg' is the default pc_hypre_type. That should be good enough >>>> for you. 
>>>> >>>> >>>> { >>>> IS findices; >>>> PetscInt Istart,Iend; >>>> Mat Pnew; >>>> >>>> ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); >>>> #if defined PETSC_GAMG_USE_LOG >>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>> #endif >>>> ierr = ISCreateStride(comm,Iend-Istar >>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>> >>>> #if defined PETSC_GAMG_USE_LOG >>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>> #endif >>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>> >>>> /* output - repartitioned */ >>>> *a_P_inout = Pnew; >>>> } >>>> >>>> >>>>> >>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>>> >>>>>> Randy: >>>>>> Thanks, I'll check it tomorrow. >>>>>> Hong >>>>>> >>>>>> OK, this might not be completely satisfactory, because it doesn't >>>>>>> show the partitioning or how the matrix is created, but this reproduces the >>>>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>>>> it in this script. This must be run with MPI rank greater than 1. This may >>>>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>>>> it runs fine. >>>>>>> >>>>>>> >>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>>>>> >>>>>>>> Randy: >>>>>>>> It could be a bug or a missing feature in our new >>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>> It would be helpful if you can provide us a simple example that >>>>>>>> produces this example. >>>>>>>> Hong >>>>>>>> >>>>>>>> I'm running a Fortran code that was just changed over to using >>>>>>>>> petsc 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() >>>>>>>>> call. The error is "unsorted iscol_local is not implemented yet" (see full >>>>>>>>> error below). I tried to trace down the difference in the source files, but >>>>>>>>> where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. 
It >>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>> >>>>>>>>> >>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> [62]PETSC ERROR: No support for this operation for this object type >>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>> >>>>>>>>> -- >>>>>>>>> R. Michael Churchill >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> R. Michael Churchill >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> R. Michael Churchill >>>>> >>>> >>>> >>> >> > > > -- > R. Michael Churchill > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Nov 2 12:25:25 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 2 Nov 2017 13:25:25 -0400 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: > Mark, > I can reproduce this in an old branch, but not in current maint and master. > Which branch are you using to produce this error? > I am using a branch from Matt. Let me try to merge it with master. > Hong > > > On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: > >> I am able to reproduce this with snes ex56 with 2 processors and adding >> -pc_gamg_repartition true >> >> I'm not sure how to fix it. 
>> >> 10:26 1 knepley/feature-plex-boxmesh-create *= >> ~/Codes/petsc/src/snes/examples/tutorials$ make >> PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g >> runex >> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 ./ex56 >> -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it >> 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned >> -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true >> -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >> -ksp_converged_reason -snes_monitor_short -ksp_monitor_short >> -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >> -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig >> 0,0.05,0,1.05 -mg_levels_pc_type jacobi -petscpartitioner_type simple >> -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 >> -pc_gamg_repartition true >> [0] 27 global equations, 9 vertices >> [0] 27 equations in vector, 9 vertices >> 0 SNES Function norm 122.396 >> 0 KSP Residual norm 122.396 >> 1 KSP Residual norm 20.4696 >> 2 KSP Residual norm 3.95009 >> 3 KSP Residual norm 0.176181 >> 4 KSP Residual norm 0.0208781 >> 5 KSP Residual norm 0.00278873 >> 6 KSP Residual norm 0.000482741 >> 7 KSP Residual norm 4.68085e-05 >> 8 KSP Residual norm 5.42381e-06 >> 9 KSP Residual norm 5.12785e-07 >> 10 KSP Residual norm 2.60389e-08 >> 11 KSP Residual norm 4.96201e-09 >> 12 KSP Residual norm 1.989e-10 >> Linear solve converged due to CONVERGED_RTOL iterations 12 >> 1 SNES Function norm 1.990e-10 >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> DM Object: Mesh (ex56_) 2 MPI processes >> type: plex >> Mesh in 3 dimensions: >> 0-cells: 12 12 >> 1-cells: 20 20 >> 2-cells: 11 11 >> 3-cells: 2 2 >> Labels: >> boundary: 1 strata with value/size (1 (39)) >> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 (1)) >> marker: 1 strata with value/size (1 (27)) >> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >> [0] 441 global equations, 147 vertices >> [0] 441 equations in vector, 147 vertices >> 0 SNES Function norm 49.7106 >> 0 KSP Residual norm 49.7106 >> 1 KSP Residual norm 12.9252 >> 2 KSP Residual norm 2.38019 >> 3 KSP Residual norm 0.426307 >> 4 KSP Residual norm 0.0692155 >> 5 KSP Residual norm 0.0123092 >> 6 KSP Residual norm 0.00184874 >> 7 KSP Residual norm 0.000320761 >> 8 KSP Residual norm 5.48957e-05 >> 9 KSP Residual norm 9.90089e-06 >> 10 KSP Residual norm 1.5127e-06 >> 11 KSP Residual norm 2.82192e-07 >> 12 KSP Residual norm 4.62364e-08 >> 13 KSP Residual norm 7.99573e-09 >> 14 KSP Residual norm 1.3028e-09 >> 15 KSP Residual norm 2.174e-10 >> Linear solve converged due to CONVERGED_RTOL iterations 15 >> 1 SNES Function norm 2.174e-10 >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> DM Object: Mesh (ex56_) 2 MPI processes >> type: plex >> Mesh in 3 dimensions: >> 0-cells: 45 45 >> 1-cells: 96 96 >> 2-cells: 68 68 >> 3-cells: 16 16 >> Labels: >> marker: 1 strata with value/size (1 (129)) >> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 (9), 6 >> (9)) >> boundary: 1 strata with value/size (1 (141)) >> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >> [0] 4725 global equations, 1575 vertices >> [0] 4725 equations in vector, 1575 vertices >> 0 SNES Function norm 
17.9091 >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: No support for this operation for this object type >> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >> [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [1]PETSC ERROR: No support for this operation for this object type >> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >> >> >> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >> >>> >>> >>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>> rchurchi at pppl.gov> wrote: >>> >>>> Doing some additional testing, the issue goes away when removing the >>>> gamg preconditioner line from the petsc.rc: >>>> -pc_type gamg >>>> >>> >>> Yea, this is GAMG setup. >>> >>> This is the code. findices is create with ISCreateStride, so it is >>> sorted ... >>> >>> Michael is repartitioning the coarse grids. Maybe we don't have a >>> regression test with this... >>> >>> I will try to reproduce this. >>> >>> Michael: you can use hypre for now, or turn repartitioning off (eg, >>> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not >>> sure this will fix this. >>> >>> You don't have hypre parameters for all of your all of your solvers. I >>> think 'boomeramg' is the default pc_hypre_type. That should be good enough >>> for you. >>> >>> >>> { >>> IS findices; >>> PetscInt Istart,Iend; >>> Mat Pnew; >>> >>> ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); >>> #if defined PETSC_GAMG_USE_LOG >>> ierr = PetscLogEventBegin(petsc_gamg_ >>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>> #endif >>> ierr = ISCreateStride(comm,Iend-Istar >>> t,Istart,1,&findices);CHKERRQ(ierr); >>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>> >>> #if defined PETSC_GAMG_USE_LOG >>> ierr = PetscLogEventEnd(petsc_gamg_se >>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>> #endif >>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>> >>> /* output - repartitioned */ >>> *a_P_inout = Pnew; >>> } >>> >>> >>>> >>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>> >>>>> Randy: >>>>> Thanks, I'll check it tomorrow. >>>>> Hong >>>>> >>>>> OK, this might not be completely satisfactory, because it doesn't show >>>>>> the partitioning or how the matrix is created, but this reproduces the >>>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>>> it in this script. This must be run with MPI rank greater than 1. This may >>>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>>> it runs fine. >>>>>> >>>>>> >>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>>>> >>>>>>> Randy: >>>>>>> It could be a bug or a missing feature in our new >>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>> It would be helpful if you can provide us a simple example that >>>>>>> produces this example. >>>>>>> Hong >>>>>>> >>>>>>> I'm running a Fortran code that was just changed over to using petsc >>>>>>>> 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. >>>>>>>> The error is "unsorted iscol_local is not implemented yet" (see full error >>>>>>>> below). 
I tried to trace down the difference in the source files, but where >>>>>>>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't >>>>>>>> seem to have existed in v3.7.6, so I'm unsure how to compare. It seems the >>>>>>>> error is that the order of the columns locally are unsorted, though I don't >>>>>>>> think I specify a column order in the creation of the matrix: >>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>> call MatSetup(AA,ierr) >>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>>>>>>> call MatSetup(AA,ierr) >>>>>>>> >>>>>>>> >>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>> -------------------------------------------------------------- >>>>>>>> [62]PETSC ERROR: No support for this operation for this object type >>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>> >>>>>>>> -- >>>>>>>> R. Michael Churchill >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> R. Michael Churchill >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> R. Michael Churchill >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Nov 2 12:35:28 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 2 Nov 2017 12:35:28 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark : I realize that using maint or master branch, I cannot reproduce the same error. 
For this example, you must use a parallel partitioner, e.g.,'current' gives me following error: [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently only supports one domain per processor use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch for more than one subdomain per processor Please rebase your branch with maint or master, then see if you still have problem. Hong > > On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: > >> Mark, >> I can reproduce this in an old branch, but not in current maint and >> master. >> Which branch are you using to produce this error? >> > > I am using a branch from Matt. Let me try to merge it with master. > > >> Hong >> >> >> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: >> >>> I am able to reproduce this with snes ex56 with 2 processors and adding >>> -pc_gamg_repartition true >>> >>> I'm not sure how to fix it. >>> >>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>> PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g >>> runex >>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 ./ex56 >>> -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it >>> 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned >>> -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >>> -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true >>> -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >>> -ksp_converged_reason -snes_monitor_short -ksp_monitor_short >>> -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >>> -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg >>> -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig >>> 0,0.05,0,1.05 -mg_levels_pc_type jacobi -petscpartitioner_type simple >>> -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 >>> -pc_gamg_repartition true >>> [0] 27 global equations, 9 vertices >>> [0] 27 equations in vector, 9 vertices >>> 0 SNES Function norm 122.396 >>> 0 KSP Residual norm 122.396 >>> 1 KSP Residual norm 20.4696 >>> 2 KSP Residual norm 3.95009 >>> 3 KSP Residual norm 0.176181 >>> 4 KSP Residual norm 0.0208781 >>> 5 KSP Residual norm 0.00278873 >>> 6 KSP Residual norm 0.000482741 >>> 7 KSP Residual norm 4.68085e-05 >>> 8 KSP Residual norm 5.42381e-06 >>> 9 KSP Residual norm 5.12785e-07 >>> 10 KSP Residual norm 2.60389e-08 >>> 11 KSP Residual norm 4.96201e-09 >>> 12 KSP Residual norm 1.989e-10 >>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>> 1 SNES Function norm 1.990e-10 >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> DM Object: Mesh (ex56_) 2 MPI processes >>> type: plex >>> Mesh in 3 dimensions: >>> 0-cells: 12 12 >>> 1-cells: 20 20 >>> 2-cells: 11 11 >>> 3-cells: 2 2 >>> Labels: >>> boundary: 1 strata with value/size (1 (39)) >>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 (1)) >>> marker: 1 strata with value/size (1 (27)) >>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>> [0] 441 global equations, 147 vertices >>> [0] 441 equations in vector, 147 vertices >>> 0 SNES Function norm 49.7106 >>> 0 KSP Residual norm 49.7106 >>> 1 KSP Residual norm 12.9252 >>> 2 KSP Residual norm 2.38019 >>> 3 KSP Residual norm 0.426307 >>> 4 KSP Residual norm 0.0692155 >>> 5 KSP Residual norm 0.0123092 >>> 6 KSP Residual norm 0.00184874 >>> 7 KSP Residual norm 0.000320761 >>> 8 KSP Residual norm 5.48957e-05 
>>> 9 KSP Residual norm 9.90089e-06 >>> 10 KSP Residual norm 1.5127e-06 >>> 11 KSP Residual norm 2.82192e-07 >>> 12 KSP Residual norm 4.62364e-08 >>> 13 KSP Residual norm 7.99573e-09 >>> 14 KSP Residual norm 1.3028e-09 >>> 15 KSP Residual norm 2.174e-10 >>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>> 1 SNES Function norm 2.174e-10 >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> DM Object: Mesh (ex56_) 2 MPI processes >>> type: plex >>> Mesh in 3 dimensions: >>> 0-cells: 45 45 >>> 1-cells: 96 96 >>> 2-cells: 68 68 >>> 3-cells: 16 16 >>> Labels: >>> marker: 1 strata with value/size (1 (129)) >>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 (9), 6 >>> (9)) >>> boundary: 1 strata with value/size (1 (141)) >>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>> [0] 4725 global equations, 1575 vertices >>> [0] 4725 equations in vector, 1575 vertices >>> 0 SNES Function norm 17.9091 >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: No support for this operation for this object type >>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>> [1]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [1]PETSC ERROR: No support for this operation for this object type >>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>> >>> >>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >>> >>>> >>>> >>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>> rchurchi at pppl.gov> wrote: >>>> >>>>> Doing some additional testing, the issue goes away when removing the >>>>> gamg preconditioner line from the petsc.rc: >>>>> -pc_type gamg >>>>> >>>> >>>> Yea, this is GAMG setup. >>>> >>>> This is the code. findices is create with ISCreateStride, so it is >>>> sorted ... >>>> >>>> Michael is repartitioning the coarse grids. Maybe we don't have a >>>> regression test with this... >>>> >>>> I will try to reproduce this. >>>> >>>> Michael: you can use hypre for now, or turn repartitioning off (eg, >>>> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not >>>> sure this will fix this. >>>> >>>> You don't have hypre parameters for all of your all of your solvers. I >>>> think 'boomeramg' is the default pc_hypre_type. That should be good enough >>>> for you. >>>> >>>> >>>> { >>>> IS findices; >>>> PetscInt Istart,Iend; >>>> Mat Pnew; >>>> >>>> ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); >>>> #if defined PETSC_GAMG_USE_LOG >>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>> #endif >>>> ierr = ISCreateStride(comm,Iend-Istar >>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>> >>>> #if defined PETSC_GAMG_USE_LOG >>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>> #endif >>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>> >>>> /* output - repartitioned */ >>>> *a_P_inout = Pnew; >>>> } >>>> >>>> >>>>> >>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>>> >>>>>> Randy: >>>>>> Thanks, I'll check it tomorrow. 
>>>>>> Hong >>>>>> >>>>>> OK, this might not be completely satisfactory, because it doesn't >>>>>>> show the partitioning or how the matrix is created, but this reproduces the >>>>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>>>> it in this script. This must be run with MPI rank greater than 1. This may >>>>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>>>> it runs fine. >>>>>>> >>>>>>> >>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>>>>> >>>>>>>> Randy: >>>>>>>> It could be a bug or a missing feature in our new >>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>> It would be helpful if you can provide us a simple example that >>>>>>>> produces this example. >>>>>>>> Hong >>>>>>>> >>>>>>>> I'm running a Fortran code that was just changed over to using >>>>>>>>> petsc 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() >>>>>>>>> call. The error is "unsorted iscol_local is not implemented yet" (see full >>>>>>>>> error below). I tried to trace down the difference in the source files, but >>>>>>>>> where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>> >>>>>>>>> >>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> [62]PETSC ERROR: No support for this operation for this object type >>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>> ocumentation/faq.html for trouble shooting. 
>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>> >>>>>>>>> -- >>>>>>>>> R. Michael Churchill >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> R. Michael Churchill >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> R. Michael Churchill >>>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Thu Nov 2 16:03:37 2017 From: zakaryah at gmail.com (zakaryah .) Date: Thu, 2 Nov 2017 17:03:37 -0400 Subject: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods In-Reply-To: <653286EB-C0DC-4522-8D0B-902148799A1F@mcs.anl.gov> References: <87zi979alu.fsf@jedbrown.org> <2E811513-A851-4F84-A93F-BE83D56584BB@mcs.anl.gov> <6FA17D2F-1EB3-4EC0-B13F-B19922011797@glasgow.ac.uk> <877evnyyty.fsf@jedbrown.org> <29550785-0F7A-488B-A159-DD42DC29A228@mcs.anl.gov> <87inf4vsld.fsf@jedbrown.org> <87k1zisl6a.fsf@jedbrown.org> <1553F760-492C-4394-BDBF-19B2A69A8517@mcs.anl.gov> <40201D61-0CA9-4BE0-9075-2651CF66CDC3@mcs.anl.gov> <7897599A-B802-4CEB-B188-DC28FB79482C@mcs.anl.gov> <653286EB-C0DC-4522-8D0B-902148799A1F@mcs.anl.gov> Message-ID: I ran MatView with the smallest possible grid size. The elements which are nonzero after the first assembly but not after the matrix is created are exactly the couplings, i.e. row and column 0 off the diagonal, which I'm trying to preallocate, and eventually place in the second call to FormCoupleLocations, although right now it's only being called once - not sure why prealloc_only is set - maybe because I have MAT_NEW_NONZERO_ALLOCATION_ERR set to false? I'm not sure why the Jbh terms result in so few additional mallocs - are columns allocated in chunks on the fly? For example, when my DMDA size is 375, adding values to Jhb results in 375 additional mallocs, but adding values to Jbh only adds 25 mallocs. I guess my FormCoupleLocations is pretty incomplete, because incrementing dnz and onz doesn't seem to have any effect. I also don't know how this function should behave - as I said before, writing to dnz[i] and onz[i] where i goes from rstart to rstart+nrows-1 causes a segfault on processors >0, and if I inspect those values of dnz[i] and onz[i] by writing them to a file, they are nonsense when processor >0. Are dnz and onz local to the processor, i.e. should my iterations go from 0 to nrows-1? What is the general idea behind setting the structure from within FormCoupleLocations? I'm currently not doing that at all. Thanks for all the help! On Thu, Nov 2, 2017 at 12:31 PM, Smith, Barry F. 
wrote: > > > > On Nov 1, 2017, at 10:32 PM, zakaryah . wrote: > > > > I worked on the assumptions in my previous email and I at least > partially implemented the function to assign the couplings. For row 0, > which is the redundant field, I set dnz[0] to end-start, and onz[0] to the > size of the matrix minus dnz[0]. For all other rows, I just increment the > existing values of dnz[i] and onz[i], since the coupling to the redundant > field adds one extra element beyond what's allocated for the DMDA stencil. > > Sounds reasonable. > > > > I see in the source that the FormCoupleLocations function is called once > if the DM has PREALLOC_ONLY set to true, but twice otherwise. I assume > that the second call is for setting the nonzero structure. > > Yes > > Do I need to do this? > > You probably should. > > I would do a MatView() small DMDA on the matrix you obtain and then > again after you add the numerical values. This will show you what values > are not being properly allocated/put in when the matrix is created by the > DM. > > > > In any case, something is still not right. Even with the extra elements > preallocated, the first assembly of the matrix is very slow. I ran a test > problem on a single process with -info, and got this: > > > > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 > unneeded,1 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode > routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 > unneeded,629703 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. > Using Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 > unneeded,1 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode > routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 > unneeded,629703 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. > Using Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: > 18018 unneeded,629704 used > > Yes, really bad. > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9010) < 0.6. 
Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. > Using Inode routines > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446774208 30818112 > > > > [0] DMGetDMSNES(): Creating new DMSNES > > > > [0] DMGetDMKSP(): Creating new DMKSP > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > 0 SNES Function norm 2.302381528359e+00 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: > 126132 unneeded,647722 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 9610 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. > Using Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 > unneeded,647722 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > > > > > The 9610 mallocs during MatSetValues seem suspicious and are probably > what's taking so long with larger problems. 601 of them are apparently in > the call to set Jbh, and 9009 are in the call to set Jhb (b is the > redundant field, h is the DMDA field). If I run with more than one > process, I get a segfault when a process which has rank greater than 0 sets > dnz or onz in the FormCoupleLocations call. > > > > > > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . wrote: > > Thanks Barry, that looks like exactly what I need. I'm looking at > pack.c and packm.c and I want to check my understanding of what my coupling > function should do. The relevant line in DMCreateMatrix_Composite_AIJ > seems to be: > > > > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__ > nrows,__start,__end); > > > > and I infer that dnz and onz are the number of nonzero elements in the > diagonal and off-diagonal submatrices, for each row of the DMComposite > matrix. I suppose I can just set each of these in a for loop, but can I > use the arguments to FormCoupleLocations as the range for the loop? Which > ones - __rstart to __rstart+__nrows? How can I determine the number of > rows on each processor from within the function that I pass? 
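[Editorial aside: to make the guesses in that paragraph concrete, here is a minimal sketch of what such a coupling callback could look like for a composite of a one-dof redundant field plus a DMDA field, with the redundant dof taken to be global row/column 0. The indexing convention assumed below -- dnz/onz are local arrays of length nrows indexed from 0, column indices are global, and [start,end) are the columns of this rank's diagonal block -- is a reading of the preallocation macros, not something verified against pack.c, and the function name is hypothetical.]

  /* Sketch of a DMComposite coupling callback: redundant dof at global
     row/column 0 couples to every equation of the DMDA field.
     First call: A == NULL, fill only the counts.
     Second call: A != NULL, insert explicit zeros to fix the structure. */
  static PetscErrorCode MyCoupleLocations(DM dm, Mat A, PetscInt *dnz, PetscInt *onz,
                                          PetscInt rstart, PetscInt nrows,
                                          PetscInt start, PetscInt end)
  {
    PetscInt       i, grow, N, bcol = 0;   /* bcol: global index of the redundant dof */
    PetscScalar    zero = 0.0;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    /* global matrix size, recovered from the local row counts */
    ierr = MPI_Allreduce(&nrows, &N, 1, MPIU_INT, MPI_SUM, PetscObjectComm((PetscObject)dm));CHKERRQ(ierr);
    if (!A) {                               /* preallocation pass: counts only */
      for (i = 0; i < nrows; i++) {
        grow = rstart + i;
        if (grow == bcol) {                 /* redundant row couples to everything */
          dnz[i] = end - start;
          onz[i] = N - (end - start);
        } else {                            /* each field row gets one extra entry in column bcol */
          if (bcol >= start && bcol < end) dnz[i]++;
          else                             onz[i]++;
        }
      }
    } else {                                /* structure pass: zeros in the coupled locations */
      for (i = 0; i < nrows; i++) {
        grow = rstart + i;
        if (grow == bcol) continue;
        ierr = MatSetValues(A, 1, &grow, 1, &bcol, &zero, INSERT_VALUES);CHKERRQ(ierr);
        ierr = MatSetValues(A, 1, &bcol, 1, &grow, &zero, INSERT_VALUES);CHKERRQ(ierr);
      }
    }
    PetscFunctionReturn(0);
  }

[The callback would be registered with DMCompositeSetCoupling(pack, MyCoupleLocations) before DMCreateMatrix(); on the first call only the counts matter, and on the second call -- presumably made with the freshly created matrix -- the explicit zeros pin down the nonzero structure so that later MatSetValues() on the coupling entries should not trigger mallocs.]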
From the > preallocation macros it looks like __start to __end describe the range of > the columns of the diagonal submatrix - is that right? It looks like the > ranges will be specific to each processor. Do I just set the values in dnz > and onz, or do I need to reduce them? > > > > Thanks for all the help! Maybe if I get things working I can carve out > the core of the code to make an example program for DMRedundant/Composite. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 2 16:36:43 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 2 Nov 2017 21:36:43 +0000 Subject: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods In-Reply-To: Message-ID: <8254EC3C-8500-43BB-BDBB-92BF3DCBA8B5@mcs.anl.gov> Please send your "simple" code that attempts to do the appropriate preallocation and we'll look at why it is failing. Barry > On Nov 2, 2017, at 4:03 PM, zakaryah . wrote: > > I ran MatView with the smallest possible grid size. The elements which are nonzero after the first assembly but not after the matrix is created are exactly the couplings, i.e. row and column 0 off the diagonal, which I'm trying to preallocate, and eventually place in the second call to FormCoupleLocations, although right now it's only being called once - not sure why prealloc_only is set - maybe because I have MAT_NEW_NONZERO_ALLOCATION_ERR set to false? I'm not sure why the Jbh terms result in so few additional mallocs - are columns allocated in chunks on the fly? For example, when my DMDA size is 375, adding values to Jhb results in 375 additional mallocs, but adding values to Jbh only adds 25 mallocs. > > I guess my FormCoupleLocations is pretty incomplete, because incrementing dnz and onz doesn't seem to have any effect. I also don't know how this function should behave - as I said before, writing to dnz[i] and onz[i] where i goes from rstart to rstart+nrows-1 causes a segfault on processors >0, and if I inspect those values of dnz[i] and onz[i] by writing them to a file, they are nonsense when processor >0. Are dnz and onz local to the processor, i.e. should my iterations go from 0 to nrows-1? What is the general idea behind setting the structure from within FormCoupleLocations? I'm currently not doing that at all. > > Thanks for all the help! > > On Thu, Nov 2, 2017 at 12:31 PM, Smith, Barry F. wrote: > > > > On Nov 1, 2017, at 10:32 PM, zakaryah . wrote: > > > > I worked on the assumptions in my previous email and I at least partially implemented the function to assign the couplings. For row 0, which is the redundant field, I set dnz[0] to end-start, and onz[0] to the size of the matrix minus dnz[0]. For all other rows, I just increment the existing values of dnz[i] and onz[i], since the coupling to the redundant field adds one extra element beyond what's allocated for the DMDA stencil. > > Sounds reasonable. > > > > I see in the source that the FormCoupleLocations function is called once if the DM has PREALLOC_ONLY set to true, but twice otherwise. I assume that the second call is for setting the nonzero structure. > > Yes > > Do I need to do this? > > You probably should. > > I would do a MatView() small DMDA on the matrix you obtain and then again after you add the numerical values. This will show you what values are not being properly allocated/put in when the matrix is created by the DM. > > > > In any case, something is still not right. 
Even with the extra elements preallocated, the first assembly of the matrix is very slow. I ran a test problem on a single process with -info, and got this: > > > > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 unneeded,629704 used > > Yes, really bad. > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. 
Using Inode routines > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 30818112 > > > > [0] DMGetDMSNES(): Creating new DMSNES > > > > [0] DMGetDMKSP(): Creating new DMKSP > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > 0 SNES Function norm 2.302381528359e+00 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 126132 unneeded,647722 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 unneeded,647722 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > > > > > The 9610 mallocs during MatSetValues seem suspicious and are probably what's taking so long with larger problems. 601 of them are apparently in the call to set Jbh, and 9009 are in the call to set Jhb (b is the redundant field, h is the DMDA field). If I run with more than one process, I get a segfault when a process which has rank greater than 0 sets dnz or onz in the FormCoupleLocations call. > > > > > > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . wrote: > > Thanks Barry, that looks like exactly what I need. I'm looking at pack.c and packm.c and I want to check my understanding of what my coupling function should do. The relevant line in DMCreateMatrix_Composite_AIJ seems to be: > > > > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__nrows,__start,__end); > > > > and I infer that dnz and onz are the number of nonzero elements in the diagonal and off-diagonal submatrices, for each row of the DMComposite matrix. I suppose I can just set each of these in a for loop, but can I use the arguments to FormCoupleLocations as the range for the loop? Which ones - __rstart to __rstart+__nrows? How can I determine the number of rows on each processor from within the function that I pass? From the preallocation macros it looks like __start to __end describe the range of the columns of the diagonal submatrix - is that right? It looks like the ranges will be specific to each processor. 
Do I just set the values in dnz and onz, or do I need to reduce them? > > > > Thanks for all the help! Maybe if I get things working I can carve out the core of the code to make an example program for DMRedundant/Composite. > > > > From mfadams at lbl.gov Thu Nov 2 16:42:55 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 2 Nov 2017 17:42:55 -0400 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Hong, I've tested with master and I get the same error. Maybe the partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to me. Can you run this (snes ex56) w/o the error? 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ make runex /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason -snes_monitor_short -ksp_monitor_short -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true [0] 27 global equations, 9 vertices [0] 27 equations in vector, 9 vertices 0 SNES Function norm 122.396 0 KSP Residual norm 122.396 depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) [0] 4725 global equations, 1575 vertices [0] 4725 equations in vector, 1575 vertices 0 SNES Function norm 17.9091 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: unsorted iscol_local is not implemented yet [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: > Mark : > I realize that using maint or master branch, I cannot reproduce the same > error. > For this example, you must use a parallel partitioner, e.g.,'current' > gives me following error: > [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently only > supports one domain per processor > use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch for more > than one subdomain per processor > > Please rebase your branch with maint or master, then see if you still have > problem. > > Hong > > >> >> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >> >>> Mark, >>> I can reproduce this in an old branch, but not in current maint and >>> master. >>> Which branch are you using to produce this error? >>> >> >> I am using a branch from Matt. Let me try to merge it with master. >> >> >>> Hong >>> >>> >>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: >>> >>>> I am able to reproduce this with snes ex56 with 2 processors and adding >>>> -pc_gamg_repartition true >>>> >>>> I'm not sure how to fix it. 
>>>> >>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>> PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g >>>> runex >>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 >>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>> jacobi -petscpartitioner_type simple -mat_block_size 3 -matrap 0 >>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>> [0] 27 global equations, 9 vertices >>>> [0] 27 equations in vector, 9 vertices >>>> 0 SNES Function norm 122.396 >>>> 0 KSP Residual norm 122.396 >>>> 1 KSP Residual norm 20.4696 >>>> 2 KSP Residual norm 3.95009 >>>> 3 KSP Residual norm 0.176181 >>>> 4 KSP Residual norm 0.0208781 >>>> 5 KSP Residual norm 0.00278873 >>>> 6 KSP Residual norm 0.000482741 >>>> 7 KSP Residual norm 4.68085e-05 >>>> 8 KSP Residual norm 5.42381e-06 >>>> 9 KSP Residual norm 5.12785e-07 >>>> 10 KSP Residual norm 2.60389e-08 >>>> 11 KSP Residual norm 4.96201e-09 >>>> 12 KSP Residual norm 1.989e-10 >>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>> 1 SNES Function norm 1.990e-10 >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>> DM Object: Mesh (ex56_) 2 MPI processes >>>> type: plex >>>> Mesh in 3 dimensions: >>>> 0-cells: 12 12 >>>> 1-cells: 20 20 >>>> 2-cells: 11 11 >>>> 3-cells: 2 2 >>>> Labels: >>>> boundary: 1 strata with value/size (1 (39)) >>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 >>>> (1)) >>>> marker: 1 strata with value/size (1 (27)) >>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>> [0] 441 global equations, 147 vertices >>>> [0] 441 equations in vector, 147 vertices >>>> 0 SNES Function norm 49.7106 >>>> 0 KSP Residual norm 49.7106 >>>> 1 KSP Residual norm 12.9252 >>>> 2 KSP Residual norm 2.38019 >>>> 3 KSP Residual norm 0.426307 >>>> 4 KSP Residual norm 0.0692155 >>>> 5 KSP Residual norm 0.0123092 >>>> 6 KSP Residual norm 0.00184874 >>>> 7 KSP Residual norm 0.000320761 >>>> 8 KSP Residual norm 5.48957e-05 >>>> 9 KSP Residual norm 9.90089e-06 >>>> 10 KSP Residual norm 1.5127e-06 >>>> 11 KSP Residual norm 2.82192e-07 >>>> 12 KSP Residual norm 4.62364e-08 >>>> 13 KSP Residual norm 7.99573e-09 >>>> 14 KSP Residual norm 1.3028e-09 >>>> 15 KSP Residual norm 2.174e-10 >>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>> 1 SNES Function norm 2.174e-10 >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>> DM Object: Mesh (ex56_) 2 MPI processes >>>> type: plex >>>> Mesh in 3 dimensions: >>>> 0-cells: 45 45 >>>> 1-cells: 96 96 >>>> 2-cells: 68 68 >>>> 3-cells: 16 16 >>>> Labels: >>>> marker: 1 strata with value/size (1 (129)) >>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 (9), 6 >>>> (9)) >>>> boundary: 1 strata with value/size (1 (141)) >>>> depth: 4 
strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>>> [0] 4725 global equations, 1575 vertices >>>> [0] 4725 equations in vector, 1575 vertices >>>> 0 SNES Function norm 17.9091 >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: No support for this operation for this object type >>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>> [1]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [1]PETSC ERROR: No support for this operation for this object type >>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>> >>>> >>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >>>> >>>>> >>>>> >>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>> rchurchi at pppl.gov> wrote: >>>>> >>>>>> Doing some additional testing, the issue goes away when removing the >>>>>> gamg preconditioner line from the petsc.rc: >>>>>> -pc_type gamg >>>>>> >>>>> >>>>> Yea, this is GAMG setup. >>>>> >>>>> This is the code. findices is create with ISCreateStride, so it is >>>>> sorted ... >>>>> >>>>> Michael is repartitioning the coarse grids. Maybe we don't have a >>>>> regression test with this... >>>>> >>>>> I will try to reproduce this. >>>>> >>>>> Michael: you can use hypre for now, or turn repartitioning off (eg, >>>>> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not >>>>> sure this will fix this. >>>>> >>>>> You don't have hypre parameters for all of your all of your solvers. I >>>>> think 'boomeramg' is the default pc_hypre_type. That should be good enough >>>>> for you. >>>>> >>>>> >>>>> { >>>>> IS findices; >>>>> PetscInt Istart,Iend; >>>>> Mat Pnew; >>>>> >>>>> ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); >>>>> #if defined PETSC_GAMG_USE_LOG >>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>> #endif >>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>> >>>>> #if defined PETSC_GAMG_USE_LOG >>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>> #endif >>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>> >>>>> /* output - repartitioned */ >>>>> *a_P_inout = Pnew; >>>>> } >>>>> >>>>> >>>>>> >>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>>>> >>>>>>> Randy: >>>>>>> Thanks, I'll check it tomorrow. >>>>>>> Hong >>>>>>> >>>>>>> OK, this might not be completely satisfactory, because it doesn't >>>>>>>> show the partitioning or how the matrix is created, but this reproduces the >>>>>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>>>>> it in this script. This must be run with MPI rank greater than 1. This may >>>>>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>>>>> it runs fine. >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>>>>>> >>>>>>>>> Randy: >>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). 
>>>>>>>>> It would be helpful if you can provide us a simple example that >>>>>>>>> produces this example. >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> I'm running a Fortran code that was just changed over to using >>>>>>>>>> petsc 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() >>>>>>>>>> call. The error is "unsorted iscol_local is not implemented yet" (see full >>>>>>>>>> error below). I tried to trace down the difference in the source files, but >>>>>>>>>> where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>> [62]PETSC ERROR: No support for this operation for this object >>>>>>>>>> type >>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> R. Michael Churchill >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> R. Michael Churchill >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> R. Michael Churchill >>>>>> >>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hzhang at mcs.anl.gov Thu Nov 2 17:35:34 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 2 Nov 2017 17:35:34 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark: I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( Now testing src/snes/examples/tutorials/ex56.c with your options, I can reproduce the error. I'll fix it. Hong Hong, > > I've tested with master and I get the same error. Maybe the partitioning > parameters are wrong. -pc_gamg_mat_partitioning_type is new to me. > > Can you run this (snes ex56) w/o the error? > > > 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ make > runex > /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 ./ex56 > -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it > 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned > -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true > -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -ksp_converged_reason -snes_monitor_short -ksp_monitor_short > -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig > 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type > parmetis -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view > -run_type 1 -pc_gamg_repartition true > [0] 27 global equations, 9 vertices > [0] 27 equations in vector, 9 vertices > 0 SNES Function norm 122.396 > 0 KSP Residual norm 122.396 > > depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) > [0] 4725 global equations, 1575 vertices > [0] 4725 equations in vector, 1575 vertices > 0 SNES Function norm 17.9091 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: unsorted iscol_local is not implemented yet > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > > On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: > >> Mark : >> I realize that using maint or master branch, I cannot reproduce the same >> error. >> For this example, you must use a parallel partitioner, e.g.,'current' >> gives me following error: >> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently only >> supports one domain per processor >> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch for more >> than one subdomain per processor >> >> Please rebase your branch with maint or master, then see if you still >> have problem. >> >> Hong >> >> >>> >>> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >>> >>>> Mark, >>>> I can reproduce this in an old branch, but not in current maint and >>>> master. >>>> Which branch are you using to produce this error? >>>> >>> >>> I am using a branch from Matt. Let me try to merge it with master. >>> >>> >>>> Hong >>>> >>>> >>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: >>>> >>>>> I am able to reproduce this with snes ex56 with 2 processors and >>>>> adding -pc_gamg_repartition true >>>>> >>>>> I'm not sure how to fix it. 
>>>>> >>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>> PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g >>>>> runex >>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 >>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>> jacobi -petscpartitioner_type simple -mat_block_size 3 -matrap 0 >>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>> [0] 27 global equations, 9 vertices >>>>> [0] 27 equations in vector, 9 vertices >>>>> 0 SNES Function norm 122.396 >>>>> 0 KSP Residual norm 122.396 >>>>> 1 KSP Residual norm 20.4696 >>>>> 2 KSP Residual norm 3.95009 >>>>> 3 KSP Residual norm 0.176181 >>>>> 4 KSP Residual norm 0.0208781 >>>>> 5 KSP Residual norm 0.00278873 >>>>> 6 KSP Residual norm 0.000482741 >>>>> 7 KSP Residual norm 4.68085e-05 >>>>> 8 KSP Residual norm 5.42381e-06 >>>>> 9 KSP Residual norm 5.12785e-07 >>>>> 10 KSP Residual norm 2.60389e-08 >>>>> 11 KSP Residual norm 4.96201e-09 >>>>> 12 KSP Residual norm 1.989e-10 >>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>> 1 SNES Function norm 1.990e-10 >>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>> type: plex >>>>> Mesh in 3 dimensions: >>>>> 0-cells: 12 12 >>>>> 1-cells: 20 20 >>>>> 2-cells: 11 11 >>>>> 3-cells: 2 2 >>>>> Labels: >>>>> boundary: 1 strata with value/size (1 (39)) >>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 >>>>> (1)) >>>>> marker: 1 strata with value/size (1 (27)) >>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>> [0] 441 global equations, 147 vertices >>>>> [0] 441 equations in vector, 147 vertices >>>>> 0 SNES Function norm 49.7106 >>>>> 0 KSP Residual norm 49.7106 >>>>> 1 KSP Residual norm 12.9252 >>>>> 2 KSP Residual norm 2.38019 >>>>> 3 KSP Residual norm 0.426307 >>>>> 4 KSP Residual norm 0.0692155 >>>>> 5 KSP Residual norm 0.0123092 >>>>> 6 KSP Residual norm 0.00184874 >>>>> 7 KSP Residual norm 0.000320761 >>>>> 8 KSP Residual norm 5.48957e-05 >>>>> 9 KSP Residual norm 9.90089e-06 >>>>> 10 KSP Residual norm 1.5127e-06 >>>>> 11 KSP Residual norm 2.82192e-07 >>>>> 12 KSP Residual norm 4.62364e-08 >>>>> 13 KSP Residual norm 7.99573e-09 >>>>> 14 KSP Residual norm 1.3028e-09 >>>>> 15 KSP Residual norm 2.174e-10 >>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>> 1 SNES Function norm 2.174e-10 >>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>> type: plex >>>>> Mesh in 3 dimensions: >>>>> 0-cells: 45 45 >>>>> 1-cells: 96 96 >>>>> 2-cells: 68 68 >>>>> 3-cells: 16 16 >>>>> Labels: >>>>> marker: 1 strata with value/size (1 (129)) >>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 
(9), >>>>> 6 (9)) >>>>> boundary: 1 strata with value/size (1 (141)) >>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>>>> [0] 4725 global equations, 1575 vertices >>>>> [0] 4725 equations in vector, 1575 vertices >>>>> 0 SNES Function norm 17.9091 >>>>> [0]PETSC ERROR: --------------------- Error Message >>>>> -------------------------------------------------------------- >>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>> [1]PETSC ERROR: --------------------- Error Message >>>>> -------------------------------------------------------------- >>>>> [1]PETSC ERROR: No support for this operation for this object type >>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>> >>>>> >>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>> rchurchi at pppl.gov> wrote: >>>>>> >>>>>>> Doing some additional testing, the issue goes away when removing the >>>>>>> gamg preconditioner line from the petsc.rc: >>>>>>> -pc_type gamg >>>>>>> >>>>>> >>>>>> Yea, this is GAMG setup. >>>>>> >>>>>> This is the code. findices is create with ISCreateStride, so it is >>>>>> sorted ... >>>>>> >>>>>> Michael is repartitioning the coarse grids. Maybe we don't have a >>>>>> regression test with this... >>>>>> >>>>>> I will try to reproduce this. >>>>>> >>>>>> Michael: you can use hypre for now, or turn repartitioning off (eg, >>>>>> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm not >>>>>> sure this will fix this. >>>>>> >>>>>> You don't have hypre parameters for all of your all of your solvers. >>>>>> I think 'boomeramg' is the default pc_hypre_type. That should be good >>>>>> enough for you. >>>>>> >>>>>> >>>>>> { >>>>>> IS findices; >>>>>> PetscInt Istart,Iend; >>>>>> Mat Pnew; >>>>>> >>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, &Iend);CHKERRQ(ierr); >>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>> #endif >>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>> >>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>> #endif >>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>> >>>>>> /* output - repartitioned */ >>>>>> *a_P_inout = Pnew; >>>>>> } >>>>>> >>>>>> >>>>>>> >>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>>>>> >>>>>>>> Randy: >>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>> Hong >>>>>>>> >>>>>>>> OK, this might not be completely satisfactory, because it doesn't >>>>>>>>> show the partitioning or how the matrix is created, but this reproduces the >>>>>>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>>>>>> it in this script. This must be run with MPI rank greater than 1. This may >>>>>>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>>>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>>>>>> it runs fine. 
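For anyone trying to localize the failure being discussed: the "unsorted iscol_local" message refers to the ordering of the column index set handed to MatCreateSubMatrix(). Below is a minimal diagnostic sketch, not a fix; the name new_eq_indices is taken from the GAMG fragment quoted above, and nothing here comes from an actual patch.

    #include <petscis.h>

    /* Diagnostic sketch only: report whether an index set (for example the
       new_eq_indices passed to MatCreateSubMatrix() in the GAMG fragment
       quoted above) is sorted, which is the condition the
       "unsorted iscol_local is not implemented yet" message is about. */
    static PetscErrorCode CheckISSorted(IS is,const char *name)
    {
      PetscErrorCode ierr;
      PetscBool      sorted;
      PetscInt       n;

      PetscFunctionBeginUser;
      ierr = ISSorted(is,&sorted);CHKERRQ(ierr);
      ierr = ISGetLocalSize(is,&n);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_SELF,"%s: %D local entries, sorted: %s\n",name,n,sorted ? "yes" : "no");CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Calling CheckISSorted(new_eq_indices,"new_eq_indices") on each rank just before the MatCreateSubMatrix() call would confirm which index set triggers the unsupported branch. Sorting the IS with ISSort() would silence the error, but it would also change the column permutation GAMG intended, so it should be treated only as a diagnostic.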
>>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>>>>>>> >>>>>>>>>> Randy: >>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>> It would be helpful if you can provide us a simple example that >>>>>>>>>> produces this example. >>>>>>>>>> Hong >>>>>>>>>> >>>>>>>>>> I'm running a Fortran code that was just changed over to using >>>>>>>>>>> petsc 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() >>>>>>>>>>> call. The error is "unsorted iscol_local is not implemented yet" (see full >>>>>>>>>>> error below). I tried to trace down the difference in the source files, but >>>>>>>>>>> where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this object >>>>>>>>>>> type >>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>> >>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>> >>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfu >>>>>>>>>>> nc.c >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> R. 
Michael Churchill >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> R. Michael Churchill >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> R. Michael Churchill >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Nov 2 17:44:59 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 2 Nov 2017 18:44:59 -0400 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Great, thanks, And could you please add these parameters to a regression test? As I recall we have with-parmetis regression test. On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: > Mark: > I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( > Now testing src/snes/examples/tutorials/ex56.c with your options, I can > reproduce the error. > I'll fix it. > > Hong > > Hong, >> >> I've tested with master and I get the same error. Maybe the partitioning >> parameters are wrong. -pc_gamg_mat_partitioning_type is new to me. >> >> Can you run this (snes ex56) w/o the error? >> >> >> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ make >> runex >> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 ./ex56 >> -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it >> 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned >> -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true >> -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >> -ksp_converged_reason -snes_monitor_short -ksp_monitor_short >> -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >> -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig >> 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type >> parmetis -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view >> -run_type 1 -pc_gamg_repartition true >> [0] 27 global equations, 9 vertices >> [0] 27 equations in vector, 9 vertices >> 0 SNES Function norm 122.396 >> 0 KSP Residual norm 122.396 >> >> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >> [0] 4725 global equations, 1575 vertices >> [0] 4725 equations in vector, 1575 vertices >> 0 SNES Function norm 17.9091 >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: No support for this operation for this object type >> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> >> >> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >> >>> Mark : >>> I realize that using maint or master branch, I cannot reproduce the same >>> error. >>> For this example, you must use a parallel partitioner, e.g.,'current' >>> gives me following error: >>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently only >>> supports one domain per processor >>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch for >>> more than one subdomain per processor >>> >>> Please rebase your branch with maint or master, then see if you still >>> have problem. >>> >>> Hong >>> >>> >>>> >>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >>>> >>>>> Mark, >>>>> I can reproduce this in an old branch, but not in current maint and >>>>> master. 
>>>>> Which branch are you using to produce this error? >>>>> >>>> >>>> I am using a branch from Matt. Let me try to merge it with master. >>>> >>>> >>>>> Hong >>>>> >>>>> >>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: >>>>> >>>>>> I am able to reproduce this with snes ex56 with 2 processors and >>>>>> adding -pc_gamg_repartition true >>>>>> >>>>>> I'm not sure how to fix it. >>>>>> >>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>> PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g >>>>>> runex >>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 >>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>> jacobi -petscpartitioner_type simple -mat_block_size 3 -matrap 0 >>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>> [0] 27 global equations, 9 vertices >>>>>> [0] 27 equations in vector, 9 vertices >>>>>> 0 SNES Function norm 122.396 >>>>>> 0 KSP Residual norm 122.396 >>>>>> 1 KSP Residual norm 20.4696 >>>>>> 2 KSP Residual norm 3.95009 >>>>>> 3 KSP Residual norm 0.176181 >>>>>> 4 KSP Residual norm 0.0208781 >>>>>> 5 KSP Residual norm 0.00278873 >>>>>> 6 KSP Residual norm 0.000482741 >>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>> 12 KSP Residual norm 1.989e-10 >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>> 1 SNES Function norm 1.990e-10 >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>> type: plex >>>>>> Mesh in 3 dimensions: >>>>>> 0-cells: 12 12 >>>>>> 1-cells: 20 20 >>>>>> 2-cells: 11 11 >>>>>> 3-cells: 2 2 >>>>>> Labels: >>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 >>>>>> (1)) >>>>>> marker: 1 strata with value/size (1 (27)) >>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>> [0] 441 global equations, 147 vertices >>>>>> [0] 441 equations in vector, 147 vertices >>>>>> 0 SNES Function norm 49.7106 >>>>>> 0 KSP Residual norm 49.7106 >>>>>> 1 KSP Residual norm 12.9252 >>>>>> 2 KSP Residual norm 2.38019 >>>>>> 3 KSP Residual norm 0.426307 >>>>>> 4 KSP Residual norm 0.0692155 >>>>>> 5 KSP Residual norm 0.0123092 >>>>>> 6 KSP Residual norm 0.00184874 >>>>>> 7 KSP Residual norm 0.000320761 >>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>> 15 KSP Residual norm 2.174e-10 >>>>>> Linear solve converged 
due to CONVERGED_RTOL iterations 15 >>>>>> 1 SNES Function norm 2.174e-10 >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>> type: plex >>>>>> Mesh in 3 dimensions: >>>>>> 0-cells: 45 45 >>>>>> 1-cells: 96 96 >>>>>> 2-cells: 68 68 >>>>>> 3-cells: 16 16 >>>>>> Labels: >>>>>> marker: 1 strata with value/size (1 (129)) >>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 (9), >>>>>> 6 (9)) >>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>>>>> [0] 4725 global equations, 1575 vertices >>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>> 0 SNES Function norm 17.9091 >>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>> -------------------------------------------------------------- >>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>> -------------------------------------------------------------- >>>>>> [1]PETSC ERROR: No support for this operation for this object type >>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>> >>>>>> >>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>> rchurchi at pppl.gov> wrote: >>>>>>> >>>>>>>> Doing some additional testing, the issue goes away when removing >>>>>>>> the gamg preconditioner line from the petsc.rc: >>>>>>>> -pc_type gamg >>>>>>>> >>>>>>> >>>>>>> Yea, this is GAMG setup. >>>>>>> >>>>>>> This is the code. findices is create with ISCreateStride, so it is >>>>>>> sorted ... >>>>>>> >>>>>>> Michael is repartitioning the coarse grids. Maybe we don't have a >>>>>>> regression test with this... >>>>>>> >>>>>>> I will try to reproduce this. >>>>>>> >>>>>>> Michael: you can use hypre for now, or turn repartitioning off (eg, >>>>>>> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm >>>>>>> not sure this will fix this. >>>>>>> >>>>>>> You don't have hypre parameters for all of your all of your solvers. >>>>>>> I think 'boomeramg' is the default pc_hypre_type. That should be good >>>>>>> enough for you. >>>>>>> >>>>>>> >>>>>>> { >>>>>>> IS findices; >>>>>>> PetscInt Istart,Iend; >>>>>>> Mat Pnew; >>>>>>> >>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>> &Iend);CHKERRQ(ierr); >>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>> #endif >>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>>>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>> >>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>> #endif >>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>> >>>>>>> /* output - repartitioned */ >>>>>>> *a_P_inout = Pnew; >>>>>>> } >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>>>>>> >>>>>>>>> Randy: >>>>>>>>> Thanks, I'll check it tomorrow. 
>>>>>>>>> Hong >>>>>>>>> >>>>>>>>> OK, this might not be completely satisfactory, because it doesn't >>>>>>>>>> show the partitioning or how the matrix is created, but this reproduces the >>>>>>>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>>>>>>> it in this script. This must be run with MPI rank greater than 1. This may >>>>>>>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>>>>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>>>>>>> it runs fine. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >>>>>>>>>> >>>>>>>>>>> Randy: >>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>> It would be helpful if you can provide us a simple example that >>>>>>>>>>> produces this example. >>>>>>>>>>> Hong >>>>>>>>>>> >>>>>>>>>>> I'm running a Fortran code that was just changed over to using >>>>>>>>>>>> petsc 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() >>>>>>>>>>>> call. The error is "unsorted iscol_local is not implemented yet" (see full >>>>>>>>>>>> error below). I tried to trace down the difference in the source files, but >>>>>>>>>>>> where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>> type >>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>> ocumentation/faq.html for trouble shooting. 
>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>> >>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>> >>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>> >>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfu >>>>>>>>>>>> nc.c >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> R. Michael Churchill >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> R. Michael Churchill >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Nov 2 17:54:39 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 2 Nov 2017 18:54:39 -0400 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Also, I have been using -petscpartition_type but now I see -pc_gamg_mat_partitioning_type. Is -petscpartition_type depreciated for GAMG? Is this some sort of auto generated portmanteau? I can not find pc_gamg_mat_partitioning_type in the source. On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: > Great, thanks, > > And could you please add these parameters to a regression test? As I > recall we have with-parmetis regression test. > > On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: > >> Mark: >> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >> Now testing src/snes/examples/tutorials/ex56.c with your options, I can >> reproduce the error. >> I'll fix it. >> >> Hong >> >> Hong, >>> >>> I've tested with master and I get the same error. Maybe the partitioning >>> parameters are wrong. -pc_gamg_mat_partitioning_type is new to me. >>> >>> Can you run this (snes ex56) w/o the error? 
>>> >>> >>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ make >>> runex >>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 ./ex56 >>> -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it >>> 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned >>> -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >>> -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true >>> -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >>> -ksp_converged_reason -snes_monitor_short -ksp_monitor_short >>> -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >>> -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg >>> -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig >>> 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type >>> parmetis -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view >>> -run_type 1 -pc_gamg_repartition true >>> [0] 27 global equations, 9 vertices >>> [0] 27 equations in vector, 9 vertices >>> 0 SNES Function norm 122.396 >>> 0 KSP Residual norm 122.396 >>> >>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>> [0] 4725 global equations, 1575 vertices >>> [0] 4725 equations in vector, 1575 vertices >>> 0 SNES Function norm 17.9091 >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: No support for this operation for this object type >>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. >>> >>> >>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>> >>>> Mark : >>>> I realize that using maint or master branch, I cannot reproduce the >>>> same error. >>>> For this example, you must use a parallel partitioner, e.g.,'current' >>>> gives me following error: >>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently >>>> only supports one domain per processor >>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch for >>>> more than one subdomain per processor >>>> >>>> Please rebase your branch with maint or master, then see if you still >>>> have problem. >>>> >>>> Hong >>>> >>>> >>>>> >>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >>>>> >>>>>> Mark, >>>>>> I can reproduce this in an old branch, but not in current maint and >>>>>> master. >>>>>> Which branch are you using to produce this error? >>>>>> >>>>> >>>>> I am using a branch from Matt. Let me try to merge it with master. >>>>> >>>>> >>>>>> Hong >>>>>> >>>>>> >>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: >>>>>> >>>>>>> I am able to reproduce this with snes ex56 with 2 processors and >>>>>>> adding -pc_gamg_repartition true >>>>>>> >>>>>>> I'm not sure how to fix it. 
>>>>>>> >>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-g >>>>>>> runex >>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 >>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>> jacobi -petscpartitioner_type simple -mat_block_size 3 -matrap 0 >>>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>>> [0] 27 global equations, 9 vertices >>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>> 0 SNES Function norm 122.396 >>>>>>> 0 KSP Residual norm 122.396 >>>>>>> 1 KSP Residual norm 20.4696 >>>>>>> 2 KSP Residual norm 3.95009 >>>>>>> 3 KSP Residual norm 0.176181 >>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations >>>>>>> 1 >>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>> type: plex >>>>>>> Mesh in 3 dimensions: >>>>>>> 0-cells: 12 12 >>>>>>> 1-cells: 20 20 >>>>>>> 2-cells: 11 11 >>>>>>> 3-cells: 2 2 >>>>>>> Labels: >>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), 6 >>>>>>> (1)) >>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>>> [0] 441 global equations, 147 vertices >>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>> 0 SNES Function norm 49.7106 >>>>>>> 0 KSP Residual norm 49.7106 >>>>>>> 1 KSP Residual norm 12.9252 >>>>>>> 2 KSP Residual norm 2.38019 >>>>>>> 3 KSP Residual norm 0.426307 >>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations >>>>>>> 1 >>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>> type: plex >>>>>>> Mesh in 3 dimensions: >>>>>>> 0-cells: 45 45 >>>>>>> 1-cells: 96 96 >>>>>>> 
2-cells: 68 68 >>>>>>> 3-cells: 16 16 >>>>>>> Labels: >>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 >>>>>>> (9), 6 (9)) >>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>> 0 SNES Function norm 17.9091 >>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>> -------------------------------------------------------------- >>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>> -------------------------------------------------------------- >>>>>>> [1]PETSC ERROR: No support for this operation for this object type >>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>> >>>>>>> >>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>> >>>>>>>>> Doing some additional testing, the issue goes away when removing >>>>>>>>> the gamg preconditioner line from the petsc.rc: >>>>>>>>> -pc_type gamg >>>>>>>>> >>>>>>>> >>>>>>>> Yea, this is GAMG setup. >>>>>>>> >>>>>>>> This is the code. findices is create with ISCreateStride, so it is >>>>>>>> sorted ... >>>>>>>> >>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't have a >>>>>>>> regression test with this... >>>>>>>> >>>>>>>> I will try to reproduce this. >>>>>>>> >>>>>>>> Michael: you can use hypre for now, or turn repartitioning off (eg, >>>>>>>> -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but I'm >>>>>>>> not sure this will fix this. >>>>>>>> >>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>> solvers. I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>> good enough for you. >>>>>>>> >>>>>>>> >>>>>>>> { >>>>>>>> IS findices; >>>>>>>> PetscInt Istart,Iend; >>>>>>>> Mat Pnew; >>>>>>>> >>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>> #endif >>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>>>>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>> >>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>> #endif >>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>> >>>>>>>> /* output - repartitioned */ >>>>>>>> *a_P_inout = Pnew; >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>>>>>>> >>>>>>>>>> Randy: >>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>> Hong >>>>>>>>>> >>>>>>>>>> OK, this might not be completely satisfactory, because it doesn't >>>>>>>>>>> show the partitioning or how the matrix is created, but this reproduces the >>>>>>>>>>> problem. I wrote out my matrix, Amat, from the larger simulation, and load >>>>>>>>>>> it in this script. 
This must be run with MPI rank greater than 1. This may >>>>>>>>>>> be some combination of my petsc.rc, because when I use the PetscInitialize >>>>>>>>>>> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >>>>>>>>>>> it runs fine. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Randy: >>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>> It would be helpful if you can provide us a simple example that >>>>>>>>>>>> produces this example. >>>>>>>>>>>> Hong >>>>>>>>>>>> >>>>>>>>>>>> I'm running a Fortran code that was just changed over to using >>>>>>>>>>>>> petsc 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() >>>>>>>>>>>>> call. The error is "unsorted iscol_local is not implemented yet" (see full >>>>>>>>>>>>> error below). I tried to trace down the difference in the source files, but >>>>>>>>>>>>> where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>>> type >>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC >>>>>>>>>>>>> ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 >>>>>>>>>>>>> in /global/u1/r/rchurchi/petsc/3. 
>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>> >>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>> >>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>> >>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>> >>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfu >>>>>>>>>>>>> nc.c >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> R. Michael Churchill >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> R. Michael Churchill >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Thu Nov 2 21:44:57 2017 From: zakaryah at gmail.com (zakaryah .) Date: Thu, 2 Nov 2017 22:44:57 -0400 Subject: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods In-Reply-To: <8254EC3C-8500-43BB-BDBB-92BF3DCBA8B5@mcs.anl.gov> References: <8254EC3C-8500-43BB-BDBB-92BF3DCBA8B5@mcs.anl.gov> Message-ID: OK thanks Barry! I've attached source for a minimal example which has the same structure as my problem. On Thu, Nov 2, 2017 at 5:36 PM, Smith, Barry F. wrote: > > Please send your "simple" code that attempts to do the appropriate > preallocation and we'll look at why it is failing. > > Barry > > > > On Nov 2, 2017, at 4:03 PM, zakaryah . wrote: > > > > I ran MatView with the smallest possible grid size. The elements which > are nonzero after the first assembly but not after the matrix is created > are exactly the couplings, i.e. row and column 0 off the diagonal, which > I'm trying to preallocate, and eventually place in the second call to > FormCoupleLocations, although right now it's only being called once - not > sure why prealloc_only is set - maybe because I have > MAT_NEW_NONZERO_ALLOCATION_ERR set to false? I'm not sure why the Jbh > terms result in so few additional mallocs - are columns allocated in chunks > on the fly? For example, when my DMDA size is 375, adding values to Jhb > results in 375 additional mallocs, but adding values to Jbh only adds 25 > mallocs. > > > > I guess my FormCoupleLocations is pretty incomplete, because > incrementing dnz and onz doesn't seem to have any effect. I also don't > know how this function should behave - as I said before, writing to dnz[i] > and onz[i] where i goes from rstart to rstart+nrows-1 causes a segfault on > processors >0, and if I inspect those values of dnz[i] and onz[i] by > writing them to a file, they are nonsense when processor >0. Are dnz and > onz local to the processor, i.e. should my iterations go from 0 to > nrows-1? What is the general idea behind setting the structure from within > FormCoupleLocations? I'm currently not doing that at all. > > > > Thanks for all the help! 
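To make the dnz/onz bookkeeping concrete, here is a minimal sketch of a coupling callback for the layout being discussed (a DMRedundant scalar packed with a DMDA field in a DMComposite). It is written against the call quoted later in this thread from pack.c, (*FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__nrows,__start,__end), and it assumes, as the preallocation macros suggest and as the segfault on ranks > 0 hints, that dnz and onz are indexed by local row (global row minus rstart). The redundant variable is taken to sit at global row 0; the dense redundant row itself is only described in a comment because its counts need the global problem size. This is an illustration, not the code from the attached main.c.

    #include <petscdmcomposite.h>

    /* Sketch of a DMCompositeSetCoupling() callback.  On the first
       (preallocation) pass A is NULL and the dnz/onz counters are bumped;
       on the second pass A is the freshly created matrix and explicit
       zeros are inserted so the coupling entries exist in the nonzero
       structure.  REDUNDANT_ROW = 0 is an assumption for illustration. */
    #define REDUNDANT_ROW 0

    static PetscErrorCode FormCoupleLocations(DM dm,Mat A,PetscInt *dnz,PetscInt *onz,
                                              PetscInt rstart,PetscInt nrows,
                                              PetscInt start,PetscInt end)
    {
      PetscErrorCode ierr;
      PetscInt       row;
      const PetscInt rcol = REDUNDANT_ROW;
      PetscScalar    zero = 0.0;

      PetscFunctionBeginUser;
      for (row = rstart; row < rstart+nrows; row++) {
        if (row == rcol) continue;          /* dense redundant row handled separately */
        if (!A) {                           /* preallocation pass: one extra entry per DMDA row */
          if (rcol >= start && rcol < end) dnz[row-rstart]++;
          else                             onz[row-rstart]++;
        } else {                            /* structure pass: insert zeros at the couplings */
          ierr = MatSetValues(A,1,&row,1,&rcol,&zero,INSERT_VALUES);CHKERRQ(ierr);
          ierr = MatSetValues(A,1,&rcol,1,&row,&zero,INSERT_VALUES);CHKERRQ(ierr);
        }
      }
      /* On the rank that owns the redundant row: on the preallocation pass set
         dnz[rcol-rstart] = end-start and onz[rcol-rstart] = N-(end-start), with
         N the global matrix size (available from the DM); on the structure
         pass insert that whole row.  Elided in this sketch. */
      PetscFunctionReturn(0);
    }

The callback would be registered once the DMComposite is assembled, e.g. ierr = DMCompositeSetCoupling(pack,FormCoupleLocations);CHKERRQ(ierr);. The counting follows the description above: every DMDA row gains one extra entry in the redundant column, and the redundant row itself is dense.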
> > > > On Thu, Nov 2, 2017 at 12:31 PM, Smith, Barry F. > wrote: > > > > > > > On Nov 1, 2017, at 10:32 PM, zakaryah . wrote: > > > > > > I worked on the assumptions in my previous email and I at least > partially implemented the function to assign the couplings. For row 0, > which is the redundant field, I set dnz[0] to end-start, and onz[0] to the > size of the matrix minus dnz[0]. For all other rows, I just increment the > existing values of dnz[i] and onz[i], since the coupling to the redundant > field adds one extra element beyond what's allocated for the DMDA stencil. > > > > Sounds reasonable. > > > > > > I see in the source that the FormCoupleLocations function is called > once if the DM has PREALLOC_ONLY set to true, but twice otherwise. I > assume that the second call is for setting the nonzero structure. > > > > Yes > > > Do I need to do this? > > > > You probably should. > > > > I would do a MatView() small DMDA on the matrix you obtain and then > again after you add the numerical values. This will show you what values > are not being properly allocated/put in when the matrix is created by the > DM. > > > > > > In any case, something is still not right. Even with the extra > elements preallocated, the first assembly of the matrix is very slow. I > ran a test problem on a single process with -info, and got this: > > > > > > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 > unneeded,1 used > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > > > > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using > Inode routines > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: > 0 unneeded,629703 used > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > > > > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. > Using Inode routines > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 > unneeded,1 used > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > > > > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using > Inode routines > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: > 0 unneeded,629703 used > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > > > > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. > Using Inode routines > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: > 18018 unneeded,629704 used > > > > Yes, really bad. 
> > > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. > Using Inode routines > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446774208 30818112 > > > > > > [0] DMGetDMSNES(): Creating new DMSNES > > > > > > [0] DMGetDMKSP(): Creating new DMKSP > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > 0 SNES Function norm 2.302381528359e+00 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 139995446773056 30194064 > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: > 126132 unneeded,647722 used > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 9610 > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. > Using Inode routines > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: > 0 unneeded,647722 used > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > > > > > > > > > The 9610 mallocs during MatSetValues seem suspicious and are probably > what's taking so long with larger problems. 601 of them are apparently in > the call to set Jbh, and 9009 are in the call to set Jhb (b is the > redundant field, h is the DMDA field). If I run with more than one > process, I get a segfault when a process which has rank greater than 0 sets > dnz or onz in the FormCoupleLocations call. > > > > > > > > > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . > wrote: > > > Thanks Barry, that looks like exactly what I need. I'm looking at > pack.c and packm.c and I want to check my understanding of what my coupling > function should do. 
The relevant line in DMCreateMatrix_Composite_AIJ > seems to be: > > > > > > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__ > nrows,__start,__end); > > > > > > and I infer that dnz and onz are the number of nonzero elements in the > diagonal and off-diagonal submatrices, for each row of the DMComposite > matrix. I suppose I can just set each of these in a for loop, but can I > use the arguments to FormCoupleLocations as the range for the loop? Which > ones - __rstart to __rstart+__nrows? How can I determine the number of > rows on each processor from within the function that I pass? From the > preallocation macros it looks like __start to __end describe the range of > the columns of the diagonal submatrix - is that right? It looks like the > ranges will be specific to each processor. Do I just set the values in dnz > and onz, or do I need to reduce them? > > > > > > Thanks for all the help! Maybe if I get things working I can carve > out the core of the code to make an example program for > DMRedundant/Composite. > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: main.c Type: text/x-csrc Size: 29021 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: main.h Type: text/x-chdr Size: 716 bytes Desc: not available URL: From zakaryah at gmail.com Fri Nov 3 10:59:41 2017 From: zakaryah at gmail.com (zakaryah .) Date: Fri, 3 Nov 2017 11:59:41 -0400 Subject: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods In-Reply-To: References: <8254EC3C-8500-43BB-BDBB-92BF3DCBA8B5@mcs.anl.gov> Message-ID: Hi - I think I managed to get this to work. The second call to FormCoupleLocations is actually pretty simple - it just has to set the matrix values! One thing that was hard to figure out was why line 168 calls MatGetOwnershipRange (*J,&__rstart,NULL) instead of also getting the last row owned. Anyway, I did that from within FormCoupleLocations and it seems to work great! I can try to clean things up to contribute an example if it would be useful. I'd probably either replace the finite strain with infinitesimal, or go to 1D, because in my code the Jacobian function is absurdly long and that could be distracting from the main point to exemplify, which is setting up and preallocating the fully coupled redundant field. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 3 11:52:15 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 3 Nov 2017 16:52:15 +0000 Subject: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods In-Reply-To: References: <8254EC3C-8500-43BB-BDBB-92BF3DCBA8B5@mcs.anl.gov> Message-ID: > On Nov 3, 2017, at 10:59 AM, zakaryah . wrote: > > Hi - I think I managed to get this to work. The second call to FormCoupleLocations is actually pretty simple - it just has to set the matrix values! One thing that was hard to figure out was why line 168 calls MatGetOwnershipRange(*J,&__rstart,NULL) instead of also getting the last row owned. Anyway, I did that from within FormCoupleLocations and it seems to work great! I can try to clean things up to contribute an example if it would be useful. That would be great! We'd really appreciate it. 
As is clear from the examples we don't do a good job explaining and helping people with this concept and your example would really help everyone who needs to do this kind of thing. Barry > I'd probably either replace the finite strain with infinitesimal, or go to 1D, because in my code the Jacobian function is absurdly long and that could be distracting from the main point to exemplify, which is setting up and preallocating the fully coupled redundant field. From bikash at umich.edu Fri Nov 3 16:14:36 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Fri, 3 Nov 2017 17:14:36 -0400 Subject: [petsc-users] SNES update solution vector Message-ID: Hi, I'm trying to solve a nonlinear problem using BFGS Quasi-Newton solver. I would like to tamper the solution vector x on-the-fly, based on some criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, Vec * x) allow me to do so for each SNES iteration? Thanks, Bikash -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Nov 3 16:19:06 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Nov 2017 17:19:06 -0400 Subject: [petsc-users] SNES update solution vector In-Reply-To: References: Message-ID: What do you want to do to it? Matt On Fri, Nov 3, 2017 at 5:14 PM, Bikash Kanungo wrote: > Hi, > > I'm trying to solve a nonlinear problem using BFGS Quasi-Newton solver. I > would like to tamper the solution vector x on-the-fly, based on some > criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, Vec * > x) allow me to do so for each SNES iteration? > > Thanks, > Bikash > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bikash at umich.edu Fri Nov 3 16:39:24 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Fri, 3 Nov 2017 17:39:24 -0400 Subject: [petsc-users] SNES update solution vector In-Reply-To: References: Message-ID: Hi Matt, I want to update the Dirichlet boundary condition on the solution vector on-the-fly. One way to do it is to destroy the current snes solver and create a new one with the new Dirichlet boundary condition (which means setting a new solution vector with a different size, size = # of non-Dirichlet rows). But is it possible to work with the current snes and instead enforce the new Dirichlet boundary condition on the current solution vector? Thanks, Bikash On Fri, Nov 3, 2017 at 5:19 PM, Matthew Knepley wrote: > What do you want to do to it? > > Matt > > On Fri, Nov 3, 2017 at 5:14 PM, Bikash Kanungo wrote: > >> Hi, >> >> I'm trying to solve a nonlinear problem using BFGS Quasi-Newton solver. I >> would like to tamper the solution vector x on-the-fly, based on some >> criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, Vec * >> x) allow me to do so for each SNES iteration? >> >> Thanks, >> Bikash >> >> -- >> Bikash S. 
Kanungo >> PhD Student >> Computational Materials Physics Group >> Mechanical Engineering >> University of Michigan >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 3 17:20:41 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 3 Nov 2017 22:20:41 +0000 Subject: [petsc-users] SNES update solution vector In-Reply-To: References: Message-ID: <48A58E9A-AFB7-448A-B9DD-73813FF5566C@mcs.anl.gov> You should not need to "tamper" with the solution process to achieve this. I would just change how my FormFunction and FormJacobian behave to implement the different boundary conditions. Why would that not work? Barry > On Nov 3, 2017, at 4:39 PM, Bikash Kanungo wrote: > > Hi Matt, > > I want to update the Dirichlet boundary condition on the solution vector on-the-fly. One way to do it is to destroy the current snes solver and create a new one with the new Dirichlet boundary condition (which means setting a new solution vector with a different size, size = # of non-Dirichlet rows). But is it possible to work with the current snes and instead enforce the new Dirichlet boundary condition on the current solution vector? > > Thanks, > Bikash > > On Fri, Nov 3, 2017 at 5:19 PM, Matthew Knepley wrote: > What do you want to do to it? > > Matt > > On Fri, Nov 3, 2017 at 5:14 PM, Bikash Kanungo wrote: > Hi, > > I'm trying to solve a nonlinear problem using BFGS Quasi-Newton solver. I would like to tamper the solution vector x on-the-fly, based on some criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, Vec * x) allow me to do so for each SNES iteration? > > Thanks, > Bikash > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > From dazevedoef at ornl.gov Fri Nov 3 17:33:56 2017 From: dazevedoef at ornl.gov (Ed D'Azevedo) Date: Fri, 3 Nov 2017 18:33:56 -0400 Subject: [petsc-users] petsc with fortran modules Message-ID: <82cb8ff6-217b-fd4d-012b-e634699fef34@ornl.gov> Dear PETSc expert, I have a question on the correct way to use Fortran modules in petsc. In this url on "UsingFortran" http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran it mentions including both the "petsc/finclude/petscXXX.h" file and the Fortran "use" statement. The example shows the following: #include "petsc/finclude/petscvec.h" use petscvec Vec b type(tVec) x My understanding of Fortran syntax is that there cannot be "parameter" statements before the module "use" statement, or in other words a "use" statement cannot follow a "parameter" statement. Do I understand correctly then that all ".h" include header files under "petsc/finclude/" should not have "parameter" statements but just pure
cpp macro statements such as "#define" or "#ifdef"? One may imagine substituting "parameter" statements with '#define' such as instead of parameter (NOT_SET_VALUES=0) #define NOT_SET_VALUES 0 #define not_set_values 0 In the petsc version 3.6.2? under include/petsc/finclude,? there seems to be some files such as petscvec.h and? petscmat.h that contain "parameter" statements. If there should be "parameter" statements in the petsc/finclude header files, perhaps the order of the code should be to list? all F90 module "use" statements first, then include 'petsc/finclude' header files? ! ------------------------------------------------------------------------------ ! parameter statements after the module use statement ! ------------------------------------------------------------------------------ ?? use petscvec #include "petsc/finclude/petscvec.h" ?? Vec b ?? type(tVec) x -------------- next part -------------- An HTML attachment was scrubbed... URL: From bikash at umich.edu Fri Nov 3 17:34:46 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Fri, 3 Nov 2017 18:34:46 -0400 Subject: [petsc-users] SNES update solution vector In-Reply-To: <48A58E9A-AFB7-448A-B9DD-73813FF5566C@mcs.anl.gov> References: <48A58E9A-AFB7-448A-B9DD-73813FF5566C@mcs.anl.gov> Message-ID: Hi Barry, So for Newton solvers that would work by explicitly setting the boundary conditions in my gradient(function) and Jacobian vectors. But in quasi-Newton solvers where the Jacobian is built from a history of previous Jacobians and current gradient vector, I can't enforce a new boundary condition. I can change the current gradient vector appropriately but I don't see a way handle the the Jacobian. Thanks, Bikash On Fri, Nov 3, 2017 at 6:20 PM, Smith, Barry F. wrote: > > > You should not need to "tamper" with the solution process to achieve > this. > > I would just change how my FormFunction and FormJacobian behave to > implement the different boundary conditions. Why would that not work? > > Barry > > > On Nov 3, 2017, at 4:39 PM, Bikash Kanungo wrote: > > > > Hi Matt, > > > > I want to update the Dirichlet boundary condition on the solution vector > on-the-fly. One way to do it is to destroy the current snes solver and > create a new one with the new Dirichlet boundary condition (which means > setting a new solution vector with a different size, size = # of > non-Dirichlet rows). But is it possible to work with the current snes and > instead enforce the new Dirichlet boundary condition on the current > solution vector? > > > > Thanks, > > Bikash > > > > On Fri, Nov 3, 2017 at 5:19 PM, Matthew Knepley > wrote: > > What do you want to do to it? > > > > Matt > > > > On Fri, Nov 3, 2017 at 5:14 PM, Bikash Kanungo wrote: > > Hi, > > > > I'm trying to solve a nonlinear problem using BFGS Quasi-Newton solver. > I would like to tamper the solution vector x on-the-fly, based on some > criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, Vec * > x) allow me to do so for each SNES iteration? > > > > Thanks, > > Bikash > > > > -- > > Bikash S. Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > Bikash S. 
Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Fri Nov 3 18:42:26 2017 From: zakaryah at gmail.com (zakaryah .) Date: Fri, 3 Nov 2017 19:42:26 -0400 Subject: [petsc-users] SNES update solution vector In-Reply-To: References: <48A58E9A-AFB7-448A-B9DD-73813FF5566C@mcs.anl.gov> Message-ID: Hi Bikash, I agree with Barry. If your function changes, quasi Newton methods build up the approximation to the Jacobian from sequential evaluations of the function. If your Jacobian doesn't change too much when the boundary conditions change, just keep the last approximation and update it with the new function. Otherwise I don't see what you want to do. I don't think you need to call SNESGetSolution, you should be able to just access the vector that you pass to SNESSolve. On Nov 3, 2017 6:34 PM, "Bikash Kanungo" wrote: Hi Barry, So for Newton solvers that would work by explicitly setting the boundary conditions in my gradient(function) and Jacobian vectors. But in quasi-Newton solvers where the Jacobian is built from a history of previous Jacobians and current gradient vector, I can't enforce a new boundary condition. I can change the current gradient vector appropriately but I don't see a way handle the the Jacobian. Thanks, Bikash On Fri, Nov 3, 2017 at 6:20 PM, Smith, Barry F. wrote: > > > You should not need to "tamper" with the solution process to achieve > this. > > I would just change how my FormFunction and FormJacobian behave to > implement the different boundary conditions. Why would that not work? > > Barry > > > On Nov 3, 2017, at 4:39 PM, Bikash Kanungo wrote: > > > > Hi Matt, > > > > I want to update the Dirichlet boundary condition on the solution vector > on-the-fly. One way to do it is to destroy the current snes solver and > create a new one with the new Dirichlet boundary condition (which means > setting a new solution vector with a different size, size = # of > non-Dirichlet rows). But is it possible to work with the current snes and > instead enforce the new Dirichlet boundary condition on the current > solution vector? > > > > Thanks, > > Bikash > > > > On Fri, Nov 3, 2017 at 5:19 PM, Matthew Knepley > wrote: > > What do you want to do to it? > > > > Matt > > > > On Fri, Nov 3, 2017 at 5:14 PM, Bikash Kanungo wrote: > > Hi, > > > > I'm trying to solve a nonlinear problem using BFGS Quasi-Newton solver. > I would like to tamper the solution vector x on-the-fly, based on some > criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, Vec * > x) allow me to do so for each SNES iteration? > > > > Thanks, > > Bikash > > > > -- > > Bikash S. Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > Bikash S. Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > -- Bikash S. 
Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Nov 3 19:04:39 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Nov 2017 20:04:39 -0400 Subject: [petsc-users] SNES update solution vector In-Reply-To: References: <48A58E9A-AFB7-448A-B9DD-73813FF5566C@mcs.anl.gov> Message-ID: On Fri, Nov 3, 2017 at 7:42 PM, zakaryah . wrote: > Hi Bikash, > > I agree with Barry. If your function changes, quasi Newton methods build > up the approximation to the Jacobian from sequential evaluations of the > function. If your Jacobian doesn't change too much when the boundary > conditions change, just keep the last approximation and update it with the > new function. Otherwise I don't see what you want to do. > > I don't think you need to call SNESGetSolution, you should be able to just > access the vector that you pass to SNESSolve. > This is one situation where my method of enforcing Dirichlet conditions, namely to eliminate them completely from the system being solved, is clearly superior. I would eliminate those constrained variables. Matt > On Nov 3, 2017 6:34 PM, "Bikash Kanungo" wrote: > > Hi Barry, > > So for Newton solvers that would work by explicitly setting the boundary > conditions in my gradient(function) and Jacobian vectors. But in > quasi-Newton solvers where the Jacobian is built from a history of previous > Jacobians and current gradient vector, I can't enforce a new boundary > condition. I can change the current gradient vector appropriately but I > don't see a way handle the the Jacobian. > > Thanks, > Bikash > > > > On Fri, Nov 3, 2017 at 6:20 PM, Smith, Barry F. > wrote: > >> >> >> You should not need to "tamper" with the solution process to achieve >> this. >> >> I would just change how my FormFunction and FormJacobian behave to >> implement the different boundary conditions. Why would that not work? >> >> Barry >> >> > On Nov 3, 2017, at 4:39 PM, Bikash Kanungo wrote: >> > >> > Hi Matt, >> > >> > I want to update the Dirichlet boundary condition on the solution >> vector on-the-fly. One way to do it is to destroy the current snes solver >> and create a new one with the new Dirichlet boundary condition (which means >> setting a new solution vector with a different size, size = # of >> non-Dirichlet rows). But is it possible to work with the current snes and >> instead enforce the new Dirichlet boundary condition on the current >> solution vector? >> > >> > Thanks, >> > Bikash >> > >> > On Fri, Nov 3, 2017 at 5:19 PM, Matthew Knepley >> wrote: >> > What do you want to do to it? >> > >> > Matt >> > >> > On Fri, Nov 3, 2017 at 5:14 PM, Bikash Kanungo >> wrote: >> > Hi, >> > >> > I'm trying to solve a nonlinear problem using BFGS Quasi-Newton solver. >> I would like to tamper the solution vector x on-the-fly, based on some >> criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, Vec * >> x) allow me to do so for each SNES iteration? >> > >> > Thanks, >> > Bikash >> > >> > -- >> > Bikash S. Kanungo >> > PhD Student >> > Computational Materials Physics Group >> > Mechanical Engineering >> > University of Michigan >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> > >> > >> > >> > -- >> > Bikash S. Kanungo >> > PhD Student >> > Computational Materials Physics Group >> > Mechanical Engineering >> > University of Michigan >> > >> >> > > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Nov 3 19:25:46 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Nov 2017 20:25:46 -0400 Subject: [petsc-users] petsc with fortran modules In-Reply-To: <82cb8ff6-217b-fd4d-012b-e634699fef34@ornl.gov> References: <82cb8ff6-217b-fd4d-012b-e634699fef34@ornl.gov> Message-ID: On Fri, Nov 3, 2017 at 6:33 PM, Ed D'Azevedo wrote: > Dear PETSc expert, > > I have a question on the correct way to use Fortran module in petsc. > > In this url on "UsingFortran" > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/ > manualpages/Sys/UsingFortran.html#UsingFortran > > > it mentions including both the "petsc/finclude/petscXXX.h" file and the > Fortran "use" statement. > > The example show the following: > > > #include "petsc/finclude/petscvec.h" > use petscvec > > Vec b > type(tVec) x > > > My understanding of Fortran syntax is there cannot be "parameter" > statements before the module "use" statement or in other words "use" > statement cannot follow "parameter" statement. > > Do I understand correctly then all ".h" include header files under > "petsc/finclude/" should not have "parameter" statements but just pure cpp > macro statements such as "#define" or "#ifdef"? > > All the parameter statements are in the module. Matt > One may imagine substituting "parameter" statements with '#define' such as > instead of > > parameter (NOT_SET_VALUES=0) > > #define NOT_SET_VALUES 0 > > #define not_set_values 0 > > In the petsc version 3.6.2 under include/petsc/finclude, there seems to > be some files such as petscvec.h and petscmat.h that contain "parameter" > statements. > > > > If there should be "parameter" statements in the petsc/finclude header > files, perhaps the order of the code should be to list all F90 module > "use" statements first, then include 'petsc/finclude' header files? > > > ! ------------------------------------------------------------ > ------------------ > > ! parameter statements after the module use statement > > ! ------------------------------------------------------------ > ------------------ > > use petscvec > > #include "petsc/finclude/petscvec.h" > > > Vec b > type(tVec) x > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
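A minimal sketch of the approach Barry and Matt describe in the SNES thread above, handling the Dirichlet rows inside the residual rather than resizing the system, might look like the following. The helper ComputeInteriorResidual(), the index set user->bcIS of locally owned Dirichlet rows, and the boundary values user->g[] are hypothetical placeholders, not taken from the thread or from the PETSc API:

PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
{
  AppCtx         *user = (AppCtx *) ctx;   /* assumed to carry bcIS and g[] */
  const PetscInt *rows;
  PetscInt        n, i;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = ComputeInteriorResidual(snes, x, f, user);CHKERRQ(ierr); /* hypothetical helper */
  /* Overwrite each locally owned Dirichlet row with u_i - g_i so the solver
     drives those entries to the prescribed boundary value g_i. */
  ierr = ISGetLocalSize(user->bcIS, &n);CHKERRQ(ierr);
  ierr = ISGetIndices(user->bcIS, &rows);CHKERRQ(ierr);
  for (i = 0; i < n; ++i) {
    PetscScalar ui;
    ierr = VecGetValues(x, 1, &rows[i], &ui);CHKERRQ(ierr);
    ui  -= user->g[i];
    ierr = VecSetValues(f, 1, &rows[i], &ui, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = ISRestoreIndices(user->bcIS, &rows);CHKERRQ(ierr);
  ierr = VecAssemblyBegin(f);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(f);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A matching FormJacobian would put 1 on those diagonal entries (for example with MatZeroRowsColumnsIS). As the rest of the thread notes, this covers the plain Newton case; the quasi-Newton history is a separate issue.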
URL: From balay at mcs.anl.gov Fri Nov 3 19:41:56 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 3 Nov 2017 19:41:56 -0500 Subject: [petsc-users] petsc with fortran modules In-Reply-To: <82cb8ff6-217b-fd4d-012b-e634699fef34@ornl.gov> References: <82cb8ff6-217b-fd4d-012b-e634699fef34@ornl.gov> Message-ID: On Fri, 3 Nov 2017, Ed D'Azevedo wrote: > Dear PETSc expert, > > I have a question on the correct way to use? Fortran module in petsc. > > In this url on "UsingFortran" > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran > > > it mentions including both the "petsc/finclude/petscXXX.h" file and the > Fortran "use" statement. > > The example show the following: > > > #include "petsc/finclude/petscvec.h" > ??????? use petscvec > > ?? Vec b > ?? type(tVec) x Just to note: its either of the above 2 statements. > > > My understanding of Fortran syntax is there cannot be "parameter" statements > before the module "use" statement or in other words?? "use" statement cannot > follow "parameter" statement. > > Do I understand correctly then all ".h" include header files under > "petsc/finclude/" should not have "parameter" statements but just pure? cpp > macro statements such as "#define" or "#ifdef"? yes. balay at asterix /home/balay/petsc ((v3.8)) $ grep parameter include/petsc/finclude/* include/petsc/finclude/petscsys.h:! F90 uses real(), conjg() when KIND parameter is used. balay at asterix /home/balay/petsc ((v3.8)) $ > > One may imagine substituting "parameter" statements with '#define' such as > instead of > > parameter (NOT_SET_VALUES=0) > > #define NOT_SET_VALUES 0 > > #define not_set_values 0 > > In the petsc version 3.6.2? under include/petsc/finclude,? there seems to be > some files such as petscvec.h and? petscmat.h that contain "parameter" > statements. The above doc corresponds to petsc-3.8 (with major changes for fortran module usage) - which is an upgrade we recommend for fortran usage. > > > > If there should be "parameter" statements in the petsc/finclude header files, > perhaps the order of the code should be to list? all F90 module "use" > statements first, then include 'petsc/finclude' header files? For older releases - we split paramaters and #defines into different set of includes. So one would use to get only the #defines #define "petscdef.h" Satish > > > ! > ------------------------------------------------------------------------------ > > ! parameter statements after the module use statement > > ! > ------------------------------------------------------------------------------ > > ?? use petscvec > > #include "petsc/finclude/petscvec.h" > > ?? Vec b > ?? type(tVec) x > > > From yann.jobic at univ-amu.fr Sun Nov 5 12:45:27 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Sun, 5 Nov 2017 19:45:27 +0100 Subject: [petsc-users] DMPlex with AMR Message-ID: <7709c6b8-113b-87df-c201-e6f4e55ad96c@univ-amu.fr> Dear PETSc expert, I first correctly solve the advection/diffusion equation, with the advection/diffusion of a gaussian, in a rotate field. I'm using finit element, with PetscFE and the velocity field is in an auxillary one, in order to correctly set the residual and the jacobian. Then, i would like to use p4est, with AMR. To do that, I re-use the example : "petsc/examples/src/dm/impls/forest/examples/tests/ex2.c" I create a "base" DM, which has everything : PetscFE, PetscDS, the boundaries, ... 
Then, i correctly adapt the mesh, by using : ierr = DMForestTemplate(base,comm,&postForest);CHKERRQ(ierr); ierr = DMForestSetAdaptivityLabel(postForest,adaptLabel);CHKERRQ(ierr); My problem is that nothing happens when i'm solving the system, i.e. my initial solution does not move, or diffuse. That would mean that i didn't correctly transfer the problem definition. I tried to attach everything to the adapted mesh, without succes. What is the correct way to transfer the problem definition (PetscFE and PetscDS) from one DM to another ? Many thanks in advance, Regards, Yann -------------- next part -------------- static char help[] = "Advection/diffusion of a gaussian using finite elements.\n\ We solve the problem in a rectangular\n\ domain, using a parallel unstructured mesh (DMPLEX) to discretize it.\n\n\n\n"; #include #include #include #include #include "petscpc.h" /* pour le precondditionnement LU */ #include typedef struct { PetscLogEvent createMeshEvent; /* Domain and mesh definition */ PetscInt dim; /* The topological mesh dimension */ PetscInt cells[2]; /* The initial domain division */ char filename[2048]; /* The optional mesh file */ PetscBool interpolate; /* Generate intermediate mesh elements */ PetscInt monitorStepOffset; /* Pour que le TS monitor ajoute bien step */ /* Gaussian parameters */ PetscReal pos_init[2]; /* Position initiale de la gaussienne */ PetscReal sigma; /* ecart type de la gaussienne */ PetscReal dx; PetscScalar Diffu; /* Diffusivite */ /* Exact solution */ PetscErrorCode (**exactConc) (PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nc, PetscScalar *u, void *ctx); PetscErrorCode (**bord_d) (PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nc, PetscScalar *u, void *ctx); } AppCtx; /* refine_level : 1: coarse 2: normal 3: finner */ static PetscErrorCode AddIdentityLabel(DM dm) { PetscInt cStart, cEnd, c, medium; PetscErrorCode ierr; PetscFunctionBegin; medium = 2; ierr = DMCreateLabel(dm, "refine_level");CHKERRQ(ierr); ierr = DMForestGetCellChart(dm,&cStart,&cEnd);CHKERRQ(ierr); for (c = cStart; c < cEnd; c++) {ierr = DMSetLabelValue(dm, "refine_level", c, medium);CHKERRQ(ierr);} /* 2 par defaut */ PetscFunctionReturn(0); } static PetscErrorCode CreateAdaptivityLabel(DM forest, DMLabel *adaptLabel, Vec u, PetscReal ValMax) { const PetscInt debug = 0; DM plex; DMLabel RefineLevelLabel; PetscInt cStart, cEnd, c, Nb, niv1, niv2, niv3; /* niv1 : coarse niv3 : finner */ PetscErrorCode ierr; Vec localX; PetscSection section; PetscReal ValRefRefine,ValRefCoarse; PetscFunctionBegin; ValRefRefine = ValMax/10.; niv1 = 1; niv2 = 2; niv3 = 3; ValRefCoarse = ValMax/100.; ierr = DMLabelCreate("adapt",adaptLabel);CHKERRQ(ierr); ierr = DMLabelSetDefaultValue(*adaptLabel,DM_ADAPT_DETERMINE);CHKERRQ(ierr); ierr = DMGetLabel(forest,"refine_level",&RefineLevelLabel);CHKERRQ(ierr); ierr = DMConvert(forest, DMPLEX, &plex);CHKERRQ(ierr); DMGetDefaultSection(plex, §ion); DMGetLocalVector(plex, &localX); DMGlobalToLocalBegin(plex, u, INSERT_VALUES, localX); DMGlobalToLocalEnd(plex, u, INSERT_VALUES, localX); DMPlexGetHeightStratum(plex, 0, &cStart, &cEnd); ierr = PetscPrintf(PETSC_COMM_WORLD, "COUCOU : \n\n");CHKERRQ(ierr); for (c = cStart; c < cEnd; ++c) { PetscFE fe; PetscScalar *x = NULL; PetscReal valEle = 0.0; DMGetField(forest, 0, (PetscObject *) &fe); PetscFEGetDimension(fe, &Nb); DMPlexVecGetClosure(plex, 0, localX, c, NULL, &x); if (debug) { char title[1024]; PetscSNPrintf(title, 1023, "Solution for Field %d", 0); DMPrintCellVector(c, title, Nb, &x[0]); } 
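   /* Per-cell adaptivity indicator: accumulate a value from the cell's closure
      dofs into valEle, then flag the cell for refinement when valEle exceeds
      ValRefRefine and for coarsening when it drops below ValRefCoarse. */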
{ PetscInt i; valEle = 0; for(i=0; i ValRefRefine) { ierr = DMLabelSetValue(*adaptLabel,c,DM_ADAPT_REFINE);CHKERRQ(ierr); ierr = DMSetLabelValue(plex, "refine_level", c, niv3);CHKERRQ(ierr); } if (valEle < ValRefCoarse) { ierr = DMLabelSetValue(*adaptLabel,c,DM_ADAPT_COARSEN);CHKERRQ(ierr); ierr = DMSetLabelValue(plex, "refine_level", c, niv1);CHKERRQ(ierr); } if (debug) {PetscPrintf(PETSC_COMM_SELF, " elem %d val %g\n", c, valEle);} } DMPlexVecRestoreClosure(plex, NULL, localX, c, NULL, &x); } DMRestoreLocalVector(plex, &localX); ierr = DMDestroy(&plex);CHKERRQ(ierr); return(0); } static PetscErrorCode dirichlet (PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *u, void *ctx) { u[0] = 0.; return 0; } static PetscErrorCode rotate_field(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nc, PetscScalar *v, void *ctx) { v[0] = -4.*(x[1]-0.5); v[1] = 4.*(x[0]-0.5); return 0; } static PetscErrorCode exactSol(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nc, PetscScalar *u, void *ctx) { PetscReal posx,posy,xx,yy,x0,y0,sigma2,C0,Diffu; AppCtx *user = (AppCtx *) ctx; x0 = user->pos_init[0]; y0 = user->pos_init[1]; sigma2 = PetscPowReal(user->sigma,2); Diffu = user->Diffu; C0 = 2*sigma2/(2*sigma2 + 4*Diffu*time); posx = x[0]-0.5; posy = x[1]-0.5; xx = posx*PetscCosReal(4.*time) + posy*PetscSinReal(4.*time) - x0; yy = -posx*PetscSinReal(4.*time) + posy*PetscCosReal(4.*time) - y0; u[0] = C0*exp(-(PetscPowReal(xx,2) + PetscPowReal(yy,2))/(2*sigma2 + 4*Diffu*time)); return 0; } static void f0_u(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar f0[]) { PetscInt d; f0[0] = u_t[0]; for (d = 0; d < dim; ++d) f0[0] += a[d] * u_x[d]; } /* gradU[comp*dim+d] = {u_x, u_y} or {u_x, u_y, u_z} */ static void f1_u(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar f1[]) { PetscInt d; for (d = 0; d < dim; ++d) f1[d] = constants[0]*u_x[d]; } /* */ static void g0_ut(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, PetscReal u_tShift, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar g0[]) { g0[0] = u_tShift*1.0; } /* */ static void g1_u(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, PetscReal u_tShift, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar g1[]) { PetscInt d; for (d = 0; d < dim; ++d) g1[d] = a[d]; } /* < \nabla v, \nabla u + {\nabla 
u}^T > This just gives \nabla u, give the perdiagonal for the transpose */ static void g3_uu(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, PetscReal u_tShift, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar g3[]) { PetscInt d; for (d = 0; d < dim; ++d) g3[d*dim+d] = constants[0]*1.0; } static PetscErrorCode ProcessOptions(MPI_Comm comm, AppCtx *options) { PetscInt n; PetscBool flg; PetscErrorCode ierr; PetscFunctionBeginUser; options->dim = 2; options->cells[0] = 25; options->cells[1] = 25; options->filename[0] = '\0'; options->interpolate = PETSC_FALSE; ierr = PetscOptionsBegin(comm, "", "Poisson Problem Options", "DMPLEX");CHKERRQ(ierr); ierr = PetscOptionsInt("-dim", "The topological mesh dimension", "ex12.c", options->dim, &options->dim, NULL);CHKERRQ(ierr); n = 2; ierr = PetscOptionsIntArray("-cells", "The initial mesh division", "ex12.c", options->cells, &n, NULL);CHKERRQ(ierr); ierr = PetscOptionsString("-f", "Mesh filename to read", "ex12.c", options->filename, options->filename, sizeof(options->filename), &flg);CHKERRQ(ierr); ierr = PetscOptionsBool("-interpolate", "Generate intermediate mesh elements", "ex12.c", options->interpolate, &options->interpolate, NULL);CHKERRQ(ierr); ierr = PetscOptionsEnd(); ierr = PetscLogEventRegister("CreateMesh", DM_CLASSID, &options->createMeshEvent);CHKERRQ(ierr); PetscFunctionReturn(0); } static PetscErrorCode CreateBCLabel(DM dm, const char name[]) { DMLabel label; PetscErrorCode ierr; PetscFunctionBeginUser; ierr = DMCreateLabel(dm, name);CHKERRQ(ierr); ierr = DMGetLabel(dm, name, &label);CHKERRQ(ierr); ierr = DMPlexMarkBoundaryFaces(dm, label);CHKERRQ(ierr); ierr = DMPlexLabelComplete(dm, label);CHKERRQ(ierr); PetscFunctionReturn(0); } static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm) { PetscInt dim = user->dim; const char *filename = user->filename; PetscBool interpolate = user->interpolate; PetscBool hasLabel; size_t len; PetscErrorCode ierr; PetscFunctionBeginUser; ierr = PetscLogEventBegin(user->createMeshEvent,0,0,0,0);CHKERRQ(ierr); ierr = PetscStrlen(filename, &len);CHKERRQ(ierr); if (!len) { ierr = DMPlexCreateHexBoxMesh(comm, dim, user->cells, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, dm);CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) *dm, "Mesh");CHKERRQ(ierr); } else { ierr = DMPlexCreateFromFile(comm, filename, interpolate, dm);CHKERRQ(ierr); ierr = DMPlexSetRefinementUniform(*dm, PETSC_FALSE);CHKERRQ(ierr); } /* If no boundary marker exists, mark the whole boundary */ ierr = DMHasLabel(*dm, "marker", &hasLabel);CHKERRQ(ierr); if (!hasLabel) {ierr = CreateBCLabel(*dm, "marker");CHKERRQ(ierr);} { PetscPartitioner part; DM distributedMesh = NULL; /* Distribute mesh over processes */ ierr = DMPlexGetPartitioner(*dm,&part);CHKERRQ(ierr); ierr = PetscPartitionerSetFromOptions(part);CHKERRQ(ierr); ierr = DMPlexDistribute(*dm, 0, NULL, &distributedMesh);CHKERRQ(ierr); if (distributedMesh) { ierr = DMDestroy(dm);CHKERRQ(ierr); *dm = distributedMesh; } } { DM dmConv; ierr = DMConvert(*dm,DMP4EST,&dmConv);CHKERRQ(ierr); if (dmConv) { ierr = DMDestroy(dm);CHKERRQ(ierr); *dm = dmConv; } } ierr = DMLocalizeCoordinates(*dm);CHKERRQ(ierr); /* needed for periodic */ ierr = DMSetFromOptions(*dm);CHKERRQ(ierr); ierr = 
DMViewFromOptions(*dm, NULL, "-dm_view");CHKERRQ(ierr); ierr = PetscLogEventEnd(user->createMeshEvent,0,0,0,0);CHKERRQ(ierr); PetscFunctionReturn(0); } static PetscErrorCode SetupProblem(PetscDS prob, AppCtx *user) { PetscErrorCode ierr; const PetscInt id = 1; PetscScalar myconst[1]; PetscFunctionBeginUser; ierr = PetscDSSetResidual(prob, 0, f0_u, f1_u);CHKERRQ(ierr); ierr = PetscDSSetJacobian(prob, 0, 0, g0_ut, g1_u, NULL, g3_uu);CHKERRQ(ierr); user->exactConc[0] = exactSol; user->bord_d[0] = exactSol; /*user->bord_d[0] = dirichlet;*/ myconst[0] = user->Diffu; ierr = PetscDSSetConstants(prob,1,myconst); ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wall", "marker", 0, 0, NULL, (void (*)(void)) user->bord_d[0], 1, &id, user);CHKERRQ(ierr); PetscFunctionReturn(0); } static PetscErrorCode SetupVelocity(DM dm, DM dmAux, AppCtx *user) { PetscErrorCode (*funcs[1])(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar v[], void *ctx) = {rotate_field}; Vec v; PetscErrorCode ierr; PetscFunctionBeginUser; ierr = DMCreateLocalVector(dmAux, &v);CHKERRQ(ierr); ierr = DMProjectFunctionLocal(dmAux, 0.0, funcs, NULL, INSERT_ALL_VALUES, v);CHKERRQ(ierr); ierr = PetscObjectCompose((PetscObject) dm, "A", (PetscObject) v);CHKERRQ(ierr); ierr = VecDestroy(&v);CHKERRQ(ierr); PetscFunctionReturn(0); } static PetscErrorCode SetupDiscretization(DM dm, AppCtx *user) { const PetscInt debug = 0; DM cdm = dm; const PetscInt dim = user->dim; PetscInt order,Ncomp,npoints; PetscFE fe, feAux; PetscDS prob, probAux; PetscQuadrature q; PetscErrorCode ierr; PetscFunctionBeginUser; /* Create finite element */ ierr = PetscFECreateDefault(dm, dim, 1, 0, "conc_", PETSC_DEFAULT, &fe);CHKERRQ(ierr); /*dim 1 : c'est un champ scalaire */ ierr = PetscObjectSetName((PetscObject) fe, "conc");CHKERRQ(ierr); /* Create velocity */ ierr = PetscFECreateDefault(dm, dim, dim, 0, "vel_", -1, &feAux);CHKERRQ(ierr); ierr = PetscFEGetQuadrature(fe, &q);CHKERRQ(ierr); ierr = PetscFESetQuadrature(feAux, q);CHKERRQ(ierr); ierr = PetscDSCreate(PetscObjectComm((PetscObject) dm), &probAux);CHKERRQ(ierr); ierr = PetscDSSetDiscretization(probAux, 0, (PetscObject) feAux);CHKERRQ(ierr); /* Set discretization and boundary conditions for each mesh */ ierr = DMGetDS(dm, &prob);CHKERRQ(ierr); ierr = PetscDSSetDiscretization(prob, 0, (PetscObject) fe);CHKERRQ(ierr); ierr = SetupProblem(prob, user);CHKERRQ(ierr); while (cdm) { DM dmAux, coordDM; PetscBool hasLabel; ierr = DMSetDS(cdm, prob);CHKERRQ(ierr); ierr = DMGetCoordinateDM(cdm, &coordDM);CHKERRQ(ierr); ierr = DMClone(cdm, &dmAux);CHKERRQ(ierr); ierr = DMSetCoordinateDM(dmAux, coordDM);CHKERRQ(ierr); ierr = DMSetDS(dmAux, probAux);CHKERRQ(ierr); ierr = PetscObjectCompose((PetscObject) cdm, "dmAux", (PetscObject) dmAux);CHKERRQ(ierr); ierr = SetupVelocity(cdm, dmAux, user);CHKERRQ(ierr); ierr = DMDestroy(&dmAux);CHKERRQ(ierr); ierr = DMHasLabel(cdm, "marker", &hasLabel);CHKERRQ(ierr); if (!hasLabel) {ierr = CreateBCLabel(cdm, "marker");CHKERRQ(ierr);} ierr = DMGetCoarseDM(cdm, &cdm);CHKERRQ(ierr); } /* On test l ordre de quadrature */ if (debug) { ierr = PetscQuadratureGetOrder(q,&order);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Ordre de quadrature : %D\n",order);CHKERRQ(ierr); ierr = PetscQuadratureGetData(q,NULL,&Ncomp,&npoints,NULL,NULL);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Nombre de points d'integration : %D\n\n",npoints);CHKERRQ(ierr); } /*pas de condition limite pour le moment */ ierr = PetscFEDestroy(&fe);CHKERRQ(ierr); ierr = 
PetscFEDestroy(&feAux);CHKERRQ(ierr); ierr = PetscDSDestroy(&probAux);CHKERRQ(ierr); PetscFunctionReturn(0); } static PetscErrorCode GetDiscretizationInfo(DM dm) { DM cdm = dm; PetscFE fe; PetscQuadrature quad; PetscInt field, order, Ncomp, npoints; PetscErrorCode ierr; field = 0; ierr = DMGetField(cdm, field, (PetscObject *) &fe);CHKERRQ(ierr); ierr = PetscFEGetQuadrature(fe, &quad);CHKERRQ(ierr); ierr = PetscQuadratureGetOrder(quad,&order);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Ordre de quadrature : %D\n",order);CHKERRQ(ierr); ierr = PetscQuadratureGetData(quad,NULL,&Ncomp,&npoints,NULL,NULL);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Nombre de points d'integration : %D\n\n",npoints);CHKERRQ(ierr); PetscFunctionReturn(0); } static PetscErrorCode TSMonitorError(TS ts, PetscInt step, PetscReal crtime, Vec u, void *ctx) { AppCtx *user = (AppCtx *) ctx; DM dm; PetscReal error; PetscErrorCode ierr; void *myctx[1]; /* pour envoyer la structure dans DMProjectFunction */ PetscFunctionBeginUser; if (step >= 0) { step += user->monitorStepOffset; } myctx[0] = (void *) user; if (step%10 == 0) { ierr = TSGetDM(ts, &dm);CHKERRQ(ierr); ierr = DMComputeL2Diff(dm, crtime, user->exactConc, myctx, u, &error);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Timestep: %04d time = %-8.4g \t L_2 Error: %2.5g\n", (int) step, (double) crtime, (double) error);CHKERRQ(ierr); } PetscFunctionReturn(0); } static PetscErrorCode initializeTS(DM dm, AppCtx *ctx, TS *ts, PetscReal tmax, PetscReal dt) { PetscErrorCode ierr; KSP ksp; SNES ts_snes; /* nonlinear solver */ PC pc; /* preconditioner context */ PetscFunctionBegin; /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Create timestepping solver context - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = TSCreate(PetscObjectComm((PetscObject)dm), ts);CHKERRQ(ierr); ierr = TSSetDM(*ts, dm);CHKERRQ(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Customize nonlinear solver - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = TSMonitorSet(*ts, TSMonitorError, ctx, NULL);CHKERRQ(ierr); ierr = DMTSSetBoundaryLocal(dm, DMPlexTSComputeBoundary, ctx);CHKERRQ(ierr); ierr = DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM, ctx);CHKERRQ(ierr); ierr = DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM, ctx);CHKERRQ(ierr); ierr = TSSetExactFinalTime(*ts,TS_EXACTFINALTIME_MATCHSTEP);CHKERRQ(ierr); ierr = TSSetMaxTime(*ts,tmax);CHKERRQ(ierr); ierr = TSSetTimeStep(*ts,dt);CHKERRQ(ierr); ierr = TSSetType(*ts,TSBEULER);CHKERRQ(ierr); /* TSGetType(TS,&time_scheme); PCGetType(PC TSThetaSetTheta(ts,1.0); TSSetType(ts,TSCN); */ ierr = TSGetSNES(*ts,&ts_snes);CHKERRQ(ierr); ierr = SNESGetKSP(ts_snes, &ksp);CHKERRQ(ierr); ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); ierr = PCSetType(pc,"lu");CHKERRQ(ierr); PetscFunctionReturn(0); } int main(int argc, char **argv) { MPI_Comm comm; DM base, postForest; /* Problem specification */ TS ts; /* Pour l'iteration en temps */ PetscReal ftime; PetscReal tmax,dt; /* temps max, delta t */ PetscInt steps; /* iterations for convergence */ PetscInt adaptInterval;/* TS step entre les remaillage */ Vec u,uNew,u_exact,r; /* solution, exact solution, residual vectors */ AppCtx user; /* user-defined work context */ PetscReal error = 0.0; /* L_2 error in the solution */ PetscReal Peclet,cfl,dx,Lx,Um; PetscErrorCode ierr; void *ctx[1]; /* pour envoyer la structure dans DMProjectFunction */ ctx[0] = (void *) &user; TSType 
time_scheme; DMLabel adaptLabel; PetscBool hasLabel; PetscDS prob; ierr = PetscInitialize(&argc, &argv, NULL,help);if (ierr) return ierr; ierr = ProcessOptions(PETSC_COMM_WORLD, &user);CHKERRQ(ierr); comm = PETSC_COMM_WORLD; Lx = 1; Um = 4*Lx/2.; dx = 1.*Lx/user.cells[0]; user.dx = dx; tmax = 0.25*PETSC_PI/2.; /* rotation complete en PETSC_PI/2 */ /* dt = 0.000625; */ dt = 0.0025; /* On init des parametres de la gaussienne */ user.pos_init[0] = 0.; user.pos_init[1] = -0.125; user.sigma = 3*dx; user.Diffu = 0.02; /* Um*Lx/Diffu */ Peclet = Um*dx/user.Diffu; cfl = Um*dt/dx; ierr = PetscPrintf(comm, "Probleme : advection/diffusion\n cas test rotation de la gaussienne\n");CHKERRQ(ierr); ierr = PetscPrintf(comm, " cas test de la gaussienne en rotation \n");CHKERRQ(ierr); ierr = PetscPrintf(comm, "Parametres simu : \n");CHKERRQ(ierr); ierr = PetscPrintf(comm, " CellsX : %D Lx : %g dx : %g dt : %g\n",user.cells[0],(double)Lx,(double)dx,(double)dt);CHKERRQ(ierr); ierr = PetscPrintf(comm, " Tmax : %g avec dt : %g soit %f iterations\n",(double)tmax,(double)dt,(double)tmax/dt);CHKERRQ(ierr); ierr = PetscPrintf(comm, "Parametres gaussienne : \n");CHKERRQ(ierr); ierr = PetscPrintf(comm, " Sigma : %g x0 : %g y0 : %g\n",(double)user.sigma,(double)user.pos_init[0],(double)user.pos_init[1]);CHKERRQ(ierr); ierr = PetscPrintf(comm, "Parametres physiques : \n");CHKERRQ(ierr); ierr = PetscPrintf(comm, " Um : %g Diffu : %g\n",(double)Um,(double)user.Diffu);CHKERRQ(ierr); ierr = PetscPrintf(comm, " Peclet de maille : %g\n",(double)Peclet);CHKERRQ(ierr); ierr = PetscPrintf(comm, " CFL : %g\n",(double)cfl);CHKERRQ(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - sur le maillage - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = CreateMesh(comm, &user, &base);CHKERRQ(ierr); ierr = DMSetApplicationContext(base, &user);CHKERRQ(ierr); ierr = PetscMalloc2(1, &user.exactConc,1,&user.bord_d);CHKERRQ(ierr); ierr = SetupDiscretization(base, &user);CHKERRQ(ierr); ierr = AddIdentityLabel(base);CHKERRQ(ierr); ierr = DMViewFromOptions(base,NULL,"-dm_pre_adapt");CHKERRQ(ierr); ierr = DMCreateGlobalVector(base, &u);CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) u, "conc");CHKERRQ(ierr); ierr = DMProjectFunction(base, 0.0, user.exactConc, ctx, INSERT_ALL_VALUES, u);CHKERRQ(ierr); ierr = VecViewFromOptions(u,NULL,"-vec_pre_adapt");CHKERRQ(ierr); /* adapt */ ierr = CreateAdaptivityLabel(base, &adaptLabel, u, 1.);CHKERRQ(ierr); ierr = DMForestTemplate(base,comm,&postForest);CHKERRQ(ierr); ierr = DMForestSetMinimumRefinement(postForest,0);CHKERRQ(ierr); ierr = DMForestSetInitialRefinement(postForest,0);CHKERRQ(ierr); ierr = DMForestSetAdaptivityLabel(postForest,adaptLabel);CHKERRQ(ierr); ierr = DMLabelDestroy(&adaptLabel);CHKERRQ(ierr); ierr = DMSetUp(postForest);CHKERRQ(ierr); ierr = DMViewFromOptions(postForest,NULL,"-dm_post_view");CHKERRQ(ierr); /* transfer */ ierr = DMCreateGlobalVector(postForest,&uNew);CHKERRQ(ierr); ierr = DMForestTransferVec(base,u,postForest,uNew,PETSC_TRUE,0.0);CHKERRQ(ierr); ierr = VecViewFromOptions(uNew,NULL,"-vec_post_transfer_view");CHKERRQ(ierr); GetDiscretizationInfo(postForest); ierr = DMGetDS(base, &prob);CHKERRQ(ierr); ierr = DMSetDS(postForest, prob);CHKERRQ(ierr); initializeTS(postForest,&user,&ts,tmax,dt); ierr = TSSetFromOptions(ts);CHKERRQ(ierr); ierr = TSSetSolution(ts,uNew);CHKERRQ(ierr); ierr = VecDuplicate(uNew, &r);CHKERRQ(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Solve linear system - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ TSSolve(ts,uNew); ierr = TSGetTime(ts, &ftime);CHKERRQ(ierr); ierr = TSGetStepNumber(ts, &steps);CHKERRQ(ierr); ierr = DMComputeL2Diff(postForest, ftime, user.exactConc, ctx, uNew, &error);CHKERRQ(ierr); if (error < 1.0e-12) {ierr = PetscPrintf(PETSC_COMM_WORLD, "L_2 Error: < 1.0e-12\n");CHKERRQ(ierr);} else {ierr = PetscPrintf(PETSC_COMM_WORLD, "L_2 Error: %g at time %f\n",(double) error,(double) ftime);CHKERRQ(ierr);} ierr = VecViewFromOptions(uNew, NULL, "-vec_sol_u");CHKERRQ(ierr); ierr = VecDuplicate(uNew, &u_exact);CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) u_exact, "conc_ex");CHKERRQ(ierr); ierr = DMProjectFunction(postForest, ftime, user.exactConc, ctx, INSERT_ALL_VALUES, u_exact);CHKERRQ(ierr); ierr = VecViewFromOptions(u_exact, NULL, "-vec_sol_exact");CHKERRQ(ierr); ierr = VecDestroy(&u);CHKERRQ(ierr); ierr = VecDestroy(&uNew);CHKERRQ(ierr); ierr = VecDestroy(&u_exact);CHKERRQ(ierr); ierr = VecDestroy(&r);CHKERRQ(ierr); ierr = TSDestroy(&ts);CHKERRQ(ierr); ierr = DMDestroy(&base);CHKERRQ(ierr); ierr = DMDestroy(&postForest);CHKERRQ(ierr); ierr = PetscFree2(user.exactConc,user.bord_d);CHKERRQ(ierr); ierr = PetscFinalize(); return ierr; } From knepley at gmail.com Sun Nov 5 13:04:16 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 Nov 2017 14:04:16 -0500 Subject: [petsc-users] DMPlex with AMR In-Reply-To: <7709c6b8-113b-87df-c201-e6f4e55ad96c@univ-amu.fr> References: <7709c6b8-113b-87df-c201-e6f4e55ad96c@univ-amu.fr> Message-ID: On Sun, Nov 5, 2017 at 1:45 PM, Yann Jobic wrote: > Dear PETSc expert, > > I first correctly solve the advection/diffusion equation, with the > advection/diffusion of a gaussian, in a rotate field. > I'm using finit element, with PetscFE and the velocity field is in an > auxillary one, in order to correctly set the residual and the jacobian. > > Then, i would like to use p4est, with AMR. To do that, I re-use the > example : "petsc/examples/src/dm/impls/forest/examples/tests/ex2.c" > I create a "base" DM, which has everything : PetscFE, PetscDS, the > boundaries, ... Then, i correctly adapt the mesh, by using : > ierr = DMForestTemplate(base,comm,&postForest);CHKERRQ(ierr); > ierr = DMForestSetAdaptivityLabel(postForest,adaptLabel);CHKERRQ(ierr); > > My problem is that nothing happens when i'm solving the system, i.e. my > initial solution does not move, or diffuse. > > That would mean that i didn't correctly transfer the problem definition. I > tried to attach everything to the adapted mesh, without succes. > > What is the correct way to transfer the problem definition (PetscFE and > PetscDS) from one DM to another ? > One way to do it is just DMGetDS(dm, &prob); DMSetDS(newdm, prob); I think I have this working automatically now in knepley/feature-adaptor-plex Check out the test in SNES ex12 make -f ./gmakefile test globsearch="snes_tutorials-runex12_quad_q1_adapt_0" EXTRA_OPTIONS="-dm_adapt_view hdf5:$PWD/adapt.h5" ./bin/petsc_gen_xdmf.py adapt.h5 and then look at adapt.xmf in Paraview. This also works if you try it with simplicial meshes and Pragmatic, see tri_p1_adapt_1. Thanks, Matt > Many thanks in advance, > > Regards, > > Yann > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
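To gather the pieces of this exchange in one place, the adapt-and-transfer sequence could be condensed as below. This is only a sketch assembled from the calls already shown in the thread (error checking trimmed, and adaptLabel assumed to be built from the current solution beforehand):

  DM      postForest;
  PetscDS prob;
  Vec     uNew;

  ierr = DMForestTemplate(base, comm, &postForest);CHKERRQ(ierr);
  ierr = DMForestSetAdaptivityLabel(postForest, adaptLabel);CHKERRQ(ierr);
  ierr = DMSetUp(postForest);CHKERRQ(ierr);

  /* Carry the problem definition (discretization plus residual/Jacobian
     pointwise functions) from the base DM over to the adapted forest. */
  ierr = DMGetDS(base, &prob);CHKERRQ(ierr);
  ierr = DMSetDS(postForest, prob);CHKERRQ(ierr);

  /* Transfer the current solution onto the adapted mesh. */
  ierr = DMCreateGlobalVector(postForest, &uNew);CHKERRQ(ierr);
  ierr = DMForestTransferVec(base, u, postForest, uNew, PETSC_TRUE, 0.0);CHKERRQ(ierr);

Note that the auxiliary velocity DM which the attached code composes onto the base DM ("dmAux" and "A") is not transferred here; whether it also has to be attached to the adapted forest is not settled in this thread.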
URL: From knepley at gmail.com Sun Nov 5 13:06:55 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 Nov 2017 14:06:55 -0500 Subject: [petsc-users] DMPlex with AMR In-Reply-To: References: <7709c6b8-113b-87df-c201-e6f4e55ad96c@univ-amu.fr> Message-ID: On Sun, Nov 5, 2017 at 2:04 PM, Matthew Knepley wrote: > On Sun, Nov 5, 2017 at 1:45 PM, Yann Jobic wrote: > >> Dear PETSc expert, >> >> I first correctly solve the advection/diffusion equation, with the >> advection/diffusion of a gaussian, in a rotate field. >> I'm using finit element, with PetscFE and the velocity field is in an >> auxillary one, in order to correctly set the residual and the jacobian. >> >> Then, i would like to use p4est, with AMR. To do that, I re-use the >> example : "petsc/examples/src/dm/impls/forest/examples/tests/ex2.c" >> I create a "base" DM, which has everything : PetscFE, PetscDS, the >> boundaries, ... Then, i correctly adapt the mesh, by using : >> ierr = DMForestTemplate(base,comm,&postForest);CHKERRQ(ierr); >> ierr = DMForestSetAdaptivityLabel(postForest,adaptLabel);CHKERRQ(ierr); >> >> My problem is that nothing happens when i'm solving the system, i.e. my >> initial solution does not move, or diffuse. >> >> That would mean that i didn't correctly transfer the problem definition. >> I tried to attach everything to the adapted mesh, without succes. >> >> What is the correct way to transfer the problem definition (PetscFE and >> PetscDS) from one DM to another ? >> > > One way to do it is just > > DMGetDS(dm, &prob); > DMSetDS(newdm, prob); > > I think I have this working automatically now in > > knepley/feature-adaptor-plex > > Check out the test in SNES ex12 > > make -f ./gmakefile test globsearch="snes_tutorials-runex12_quad_q1_adapt_0" > EXTRA_OPTIONS="-dm_adapt_view hdf5:$PWD/adapt.h5" > ./bin/petsc_gen_xdmf.py adapt.h5 > > and then look at adapt.xmf in Paraview. This also works if you try it with > simplicial meshes and Pragmatic, see tri_p1_adapt_1. > I just looked at your code. That is not going to adapt I think because there is nothing in TS to do it automatically yet. You can see it done by hand in TS ex11. I am slowly marching through getting all this stuff hooked up. I will get to TS hopefully by the end of the year. Thanks, Matt > Thanks, > > Matt > > >> Many thanks in advance, >> >> Regards, >> >> Yann >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Nov 5 13:30:14 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sun, 5 Nov 2017 19:30:14 +0000 Subject: [petsc-users] SNES update solution vector In-Reply-To: References: <48A58E9A-AFB7-448A-B9DD-73813FF5566C@mcs.anl.gov> Message-ID: <1412A35C-17D5-4945-8506-A542C295A4AB@mcs.anl.gov> > On Nov 3, 2017, at 3:34 PM, Bikash Kanungo wrote: > > Hi Barry, > > So for Newton solvers that would work by explicitly setting the boundary conditions in my gradient(function) and Jacobian vectors. 
But in quasi-Newton solvers where the Jacobian is built from a history of previous Jacobians and current gradient vector, I can't enforce a new boundary condition. I can change the current gradient vector appropriately but I don't see a way handle the the Jacobian. There is no way to handle the "Jacobian" because it comes from the history which presumably includes different boundary conditions. So if you are changing the boundary conditions in your form function within the same nonlinear solve the only thing I can see that you could do is each time you change the boundary conditions you tell the quasi-Newton method to remove any current approximation to the Jacobian and start building it again. Of course this assumes you only change the boundary conditions after a bunch of function evaluations and that the Quasi-Newton implementation has support for removing the current approximation (you may need to add this option to the implementation and make a pull request). Barry > > Thanks, > Bikash > > > > On Fri, Nov 3, 2017 at 6:20 PM, Smith, Barry F. wrote: > > > You should not need to "tamper" with the solution process to achieve this. > > I would just change how my FormFunction and FormJacobian behave to implement the different boundary conditions. Why would that not work? > > Barry > > > On Nov 3, 2017, at 4:39 PM, Bikash Kanungo wrote: > > > > Hi Matt, > > > > I want to update the Dirichlet boundary condition on the solution vector on-the-fly. One way to do it is to destroy the current snes solver and create a new one with the new Dirichlet boundary condition (which means setting a new solution vector with a different size, size = # of non-Dirichlet rows). But is it possible to work with the current snes and instead enforce the new Dirichlet boundary condition on the current solution vector? > > > > Thanks, > > Bikash > > > > On Fri, Nov 3, 2017 at 5:19 PM, Matthew Knepley wrote: > > What do you want to do to it? > > > > Matt > > > > On Fri, Nov 3, 2017 at 5:14 PM, Bikash Kanungo wrote: > > Hi, > > > > I'm trying to solve a nonlinear problem using BFGS Quasi-Newton solver. I would like to tamper the solution vector x on-the-fly, based on some criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, Vec * x) allow me to do so for each SNES iteration? > > > > Thanks, > > Bikash > > > > -- > > Bikash S. Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > Bikash S. Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > > > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > From bsmith at mcs.anl.gov Sun Nov 5 13:36:08 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sun, 5 Nov 2017 19:36:08 +0000 Subject: [petsc-users] petsc with fortran modules In-Reply-To: References: <82cb8ff6-217b-fd4d-012b-e634699fef34@ornl.gov> Message-ID: <7745C730-2D27-4999-AB07-C83843FE8E6A@mcs.anl.gov> > On Nov 3, 2017, at 5:41 PM, Satish Balay wrote: > > On Fri, 3 Nov 2017, Ed D'Azevedo wrote: > >> Dear PETSc expert, >> >> I have a question on the correct way to use Fortran module in petsc. 
>> >> In this url on "UsingFortran" >> >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran >> >> >> it mentions including both the "petsc/finclude/petscXXX.h" file and the >> Fortran "use" statement. >> >> The example show the following: >> >> >> #include "petsc/finclude/petscvec.h" >> use petscvec >> >> Vec b >> type(tVec) x > > Just to note: its either of the above 2 statements. I do not understand Satish's comment. You are free to declare vectors in either of the two ways above. You can mix them if you want though that would be confusing and we definitely don't recommend it. We recommend using one or the other consistently as Vec b,x or type(tVec) x,b Ed, The change for handling Fortran in PETSc is dramatic, and vastly improves Fortran usage. We absolutely recommend that you just switch to the new format and the new release. It is in no way reasonable to expect to use both the current version of PETSc and previous versions with your same code by sticking some of your own #ifdef in your Fortran code based on the version. You need to just upgrade and stay upgraded. Barry > >> >> >> My understanding of Fortran syntax is there cannot be "parameter" statements >> before the module "use" statement or in other words "use" statement cannot >> follow "parameter" statement. >> >> Do I understand correctly then all ".h" include header files under >> "petsc/finclude/" should not have "parameter" statements but just pure cpp >> macro statements such as "#define" or "#ifdef"? > > yes. > > balay at asterix /home/balay/petsc ((v3.8)) > $ grep parameter include/petsc/finclude/* > include/petsc/finclude/petscsys.h:! F90 uses real(), conjg() when KIND parameter is used. > balay at asterix /home/balay/petsc ((v3.8)) > $ > >> >> One may imagine substituting "parameter" statements with '#define' such as >> instead of >> >> parameter (NOT_SET_VALUES=0) >> >> #define NOT_SET_VALUES 0 >> >> #define not_set_values 0 >> >> In the petsc version 3.6.2 under include/petsc/finclude, there seems to be >> some files such as petscvec.h and petscmat.h that contain "parameter" >> statements. > > The above doc corresponds to petsc-3.8 (with major changes for fortran > module usage) - which is an upgrade we recommend for fortran usage. > >> >> >> >> If there should be "parameter" statements in the petsc/finclude header files, >> perhaps the order of the code should be to list all F90 module "use" >> statements first, then include 'petsc/finclude' header files? > > For older releases - we split paramaters and #defines into different > set of includes. So one would use to get only the #defines > > #define "petscdef.h" > > > Satish > >> >> >> ! >> ------------------------------------------------------------------------------ >> >> ! parameter statements after the module use statement >> >> ! >> ------------------------------------------------------------------------------ >> >> use petscvec >> >> #include "petsc/finclude/petscvec.h" >> >> Vec b >> type(tVec) x >> >> >> From yann.jobic at univ-amu.fr Sun Nov 5 13:37:29 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Sun, 5 Nov 2017 20:37:29 +0100 Subject: [petsc-users] DMPlex with AMR In-Reply-To: References: <7709c6b8-113b-87df-c201-e6f4e55ad96c@univ-amu.fr> Message-ID: Thanks for the fast answer ! Le 05/11/2017 ? 
20:06, Matthew Knepley a ?crit?: > On Sun, Nov 5, 2017 at 2:04 PM, Matthew Knepley > wrote: > > On Sun, Nov 5, 2017 at 1:45 PM, Yann Jobic > wrote: > > Dear PETSc expert, > > I first correctly solve the advection/diffusion equation, with > the advection/diffusion of a gaussian, in a rotate field. > I'm using finit element, with PetscFE and the velocity field > is in an auxillary one, in order to correctly set the residual > and the jacobian. > > Then, i would like to use p4est, with AMR. To do that, I > re-use the example : > "petsc/examples/src/dm/impls/forest/examples/tests/ex2.c" > I create a "base" DM, which has everything : PetscFE, PetscDS, > the boundaries, ... Then, i correctly adapt the mesh, by using : > ierr = DMForestTemplate(base,comm,&postForest);CHKERRQ(ierr); > ierr = > DMForestSetAdaptivityLabel(postForest,adaptLabel);CHKERRQ(ierr); > > My problem is that nothing happens when i'm solving the > system, i.e. my initial solution does not move, or diffuse. > > That would mean that i didn't correctly transfer the problem > definition. I tried to attach everything to the adapted mesh, > without succes. > > What is the correct way to transfer the problem definition > (PetscFE and PetscDS) from one DM to another ? > > > One way to do it is just > > ?DMGetDS(dm, &prob); > ?DMSetDS(newdm, prob); > I'm doing it in the attached code, without succes. > > > I think I have this working automatically now in > > ? knepley/feature-adaptor-plex > > Check out the test in SNES ex12 > > ? make -f ./gmakefile test > globsearch="snes_tutorials-runex12_quad_q1_adapt_0" > EXTRA_OPTIONS="-dm_adapt_view hdf5:$PWD/adapt.h5" > ? ./bin/petsc_gen_xdmf.py adapt.h5 > > and then look at adapt.xmf in Paraview. This also works if you try > it with simplicial meshes and Pragmatic, see?tri_p1_adapt_1. > > > I just looked at your code. That is not going to adapt I think because > there is nothing in TS to do it automatically yet. You can see it done > by hand > in TS ex11. I am slowly marching through getting all this stuff hooked > up. I will get to TS hopefully by the end of the year. Yes i also looked at it . But i have the same problem using Vectagger and refinebox. I just wanted a smaller example to reproduce it. The thing here is that i just want to use the adapted mesh from the initial condition, and then fixe it for the rest of the simulation, even if the gaussian is moving (as a first step). It should be easier, but i still didn't succed. Do you think i have another problem ? Thanks, Yann > > ? Thanks, > > ? ? ?Matt > > ? Thanks, > > ? ? ? Matt > > Many thanks in advance, > > Regards, > > Yann > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Sun Nov 5 21:03:06 2017 From: hzhang at mcs.anl.gov (Hong) Date: Sun, 5 Nov 2017 21:03:06 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark: Bug is fixed in branch hzhang/fix-submat_samerowdist https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_samerowdist I also add the test runex56. Please test it and let me know if there is a problem. Hong Also, I have been using -petscpartition_type but now I see > -pc_gamg_mat_partitioning_type. Is -petscpartition_type depreciated for > GAMG? > > Is this some sort of auto generated portmanteau? I can not find > pc_gamg_mat_partitioning_type in the source. > > On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: > >> Great, thanks, >> >> And could you please add these parameters to a regression test? 
As I >> recall we have with-parmetis regression test. >> >> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >> >>> Mark: >>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>> Now testing src/snes/examples/tutorials/ex56.c with your options, I can >>> reproduce the error. >>> I'll fix it. >>> >>> Hong >>> >>> Hong, >>>> >>>> I've tested with master and I get the same error. Maybe the >>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>> me. >>>> >>>> Can you run this (snes ex56) w/o the error? >>>> >>>> >>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ make >>>> runex >>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 >>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -matrap 0 >>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>> [0] 27 global equations, 9 vertices >>>> [0] 27 equations in vector, 9 vertices >>>> 0 SNES Function norm 122.396 >>>> 0 KSP Residual norm 122.396 >>>> >>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>>> [0] 4725 global equations, 1575 vertices >>>> [0] 4725 equations in vector, 1575 vertices >>>> 0 SNES Function norm 17.9091 >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: No support for this operation for this object type >>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. >>>> >>>> >>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>>> >>>>> Mark : >>>>> I realize that using maint or master branch, I cannot reproduce the >>>>> same error. >>>>> For this example, you must use a parallel partitioner, e.g.,'current' >>>>> gives me following error: >>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently >>>>> only supports one domain per processor >>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch for >>>>> more than one subdomain per processor >>>>> >>>>> Please rebase your branch with maint or master, then see if you still >>>>> have problem. >>>>> >>>>> Hong >>>>> >>>>> >>>>>> >>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >>>>>> >>>>>>> Mark, >>>>>>> I can reproduce this in an old branch, but not in current maint and >>>>>>> master. >>>>>>> Which branch are you using to produce this error? >>>>>>> >>>>>> >>>>>> I am using a branch from Matt. Let me try to merge it with master. >>>>>> >>>>>> >>>>>>> Hong >>>>>>> >>>>>>> >>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: >>>>>>> >>>>>>>> I am able to reproduce this with snes ex56 with 2 processors and >>>>>>>> adding -pc_gamg_repartition true >>>>>>>> >>>>>>>> I'm not sure how to fix it. 
>>>>>>>> >>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 >>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>> jacobi -petscpartitioner_type simple -mat_block_size 3 -matrap 0 >>>>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>> 0 SNES Function norm 122.396 >>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>> iterations 1 >>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>> type: plex >>>>>>>> Mesh in 3 dimensions: >>>>>>>> 0-cells: 12 12 >>>>>>>> 1-cells: 20 20 >>>>>>>> 2-cells: 11 11 >>>>>>>> 3-cells: 2 2 >>>>>>>> Labels: >>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), >>>>>>>> 6 (1)) >>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>> iterations 1 >>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>> type: plex 
>>>>>>>> Mesh in 3 dimensions: >>>>>>>> 0-cells: 45 45 >>>>>>>> 1-cells: 96 96 >>>>>>>> 2-cells: 68 68 >>>>>>>> 3-cells: 16 16 >>>>>>>> Labels: >>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 >>>>>>>> (9), 6 (9)) >>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>> -------------------------------------------------------------- >>>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>> -------------------------------------------------------------- >>>>>>>> [1]PETSC ERROR: No support for this operation for this object type >>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>> >>>>>>>>>> Doing some additional testing, the issue goes away when removing >>>>>>>>>> the gamg preconditioner line from the petsc.rc: >>>>>>>>>> -pc_type gamg >>>>>>>>>> >>>>>>>>> >>>>>>>>> Yea, this is GAMG setup. >>>>>>>>> >>>>>>>>> This is the code. findices is create with ISCreateStride, so it >>>>>>>>> is sorted ... >>>>>>>>> >>>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't have a >>>>>>>>> regression test with this... >>>>>>>>> >>>>>>>>> I will try to reproduce this. >>>>>>>>> >>>>>>>>> Michael: you can use hypre for now, or turn repartitioning off >>>>>>>>> (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), but >>>>>>>>> I'm not sure this will fix this. >>>>>>>>> >>>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>>> solvers. I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>>> good enough for you. >>>>>>>>> >>>>>>>>> >>>>>>>>> { >>>>>>>>> IS findices; >>>>>>>>> PetscInt Istart,Iend; >>>>>>>>> Mat Pnew; >>>>>>>>> >>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>> #endif >>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>>>>>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>> #endif >>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> /* output - repartitioned */ >>>>>>>>> *a_P_inout = Pnew; >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>>>>>>>> >>>>>>>>>>> Randy: >>>>>>>>>>> Thanks, I'll check it tomorrow. 
>>>>>>>>>>> Hong >>>>>>>>>>> >>>>>>>>>>> OK, this might not be completely satisfactory, because it >>>>>>>>>>>> doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Randy: >>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>> It would be helpful if you can provide us a simple example >>>>>>>>>>>>> that produces this example. >>>>>>>>>>>>> Hong >>>>>>>>>>>>> >>>>>>>>>>>>> I'm running a Fortran code that was just changed over to using >>>>>>>>>>>>>> petsc 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() >>>>>>>>>>>>>> call. The error is "unsorted iscol_local is not implemented yet" (see full >>>>>>>>>>>>>> error below). I tried to trace down the difference in the source files, but >>>>>>>>>>>>>> where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>> object type >>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. 
>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>> >>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>> >>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>> >>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>> >>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfu >>>>>>>>>>>>>> nc.c >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> R. Michael Churchill >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qhjun at live.com Mon Nov 6 08:25:31 2017 From: qhjun at live.com (Chi huijun) Date: Mon, 6 Nov 2017 14:25:31 +0000 Subject: [petsc-users] DMCreateGlobalVector in Fortran with petsc 3.8 Message-ID: Hi. I have one questions. I use DMCreateGlobalVector to creat a global vec by this ... DM::datotv Vec:ptotveq ... call DMDACreate1d(MPI_COMM_WORLD,DM_BOUNDARY_NONE,globalnode*cdofn,1,1,PETSC_NULL_INTEGER,datotv,ierr) call DMCreateGlobalVector(datotv,ptotveq,ierr) it gives the wrong code [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Null argument, when expecting valid pointer [0]PETSC ERROR: Null Object: Parameter # 2 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.8.1, unknown [0]PETSC ERROR: /mnt/NFSwork/GeHoMadrid_Petsc/EXEC/GeHoMadrid on a arch-linux2-c-debug named linux-2euo by hstarlinux Mon Nov 6 22:21:50 2017 [0]PETSC ERROR: Configure options --download-fblaslapack --download-metis --download-mpich --download-mumps --download-parmetis --download-scalapack PETSC_ARCH=arch-linux2-c-debug [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 78 in /home/hstarlinux/work/petsc/src/vec/vec/interface/vector.c [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 41 in /home/hstarlinux/work/petsc/src/dm/impls/da/dadist.c [0]PETSC ERROR: #3 DMCreateGlobalVector() line 844 in /home/hstarlinux/work/petsc/src/dm/interface/dm.c The code run well with PETSc 3.6 I haven't find the Fortran example code of DMCreateGlobalVector. So how to use it correctly in fortran? Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 6 10:00:07 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 6 Nov 2017 11:00:07 -0500 Subject: [petsc-users] DMCreateGlobalVector in Fortran with petsc 3.8 In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 9:25 AM, Chi huijun wrote: > Hi. I have one questions. 
> > > > I use DMCreateGlobalVector to creat a global vec by this > > ? > > DM::datotv > > Vec:ptotveq > > ? > > call DMDACreate1d(MPI_COMM_WORLD,DM_BOUNDARY_NONE,globalnode*cdofn,1,1 > ,PETSC_NULL_INTEGER,datotv,ierr) > > call DMCreateGlobalVector(datotv,ptotveq,ierr) > > > > it gives the wrong code > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Null argument, when expecting valid pointer > [0]PETSC ERROR: Null Object: Parameter # 2 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.8.1, unknown > [0]PETSC ERROR: /mnt/NFSwork/GeHoMadrid_Petsc/EXEC/GeHoMadrid on a > arch-linux2-c-debug named linux-2euo by hstarlinux Mon Nov 6 22:21:50 2017 > [0]PETSC ERROR: Configure options --download-fblaslapack --download-metis > --download-mpich --download-mumps --download-parmetis --download-scalapack > PETSC_ARCH=arch-linux2-c-debug > [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 78 in > /home/hstarlinux/work/petsc/src/vec/vec/interface/vector.c > [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 41 in > /home/hstarlinux/work/petsc/src/dm/impls/da/dadist.c > [0]PETSC ERROR: #3 DMCreateGlobalVector() line 844 in > /home/hstarlinux/work/petsc/src/dm/interface/dm.c > > > > The code run well with PETSc 3.6 > You are missing DMSetUp(). > I haven't find the Fortran example code of DMCreateGlobalVector. So how to > use it correctly in fortran? > Here is one https://bitbucket.org/petsc/petsc/src/dc424cfae46ffd055acbb99fbc61fc85fc92f9ad/src/snes/examples/tutorials/ex5f90.F90?at=master&fileviewer=file-view-default Thanks, Matt > > > Thank you. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccetinbas at anl.gov Mon Nov 6 10:38:45 2017 From: ccetinbas at anl.gov (Cetinbas, Cankur Firat) Date: Mon, 6 Nov 2017 16:38:45 +0000 Subject: [petsc-users] petsc4py sparse matrix construction time In-Reply-To: References: Message-ID: It took me a while to respond because I couldn't get converged results with the iterative solvers (they didn't give any errors but the results didn't match with Matlab mldivide and Python spsolve). So we spent some time trying to reconfigure petsc and petsc4py with MUMPS. I don't know if locally called mat.createAIJ or setValues come with any overhead but I timed each section of the code locally.

                    Single core   2 cores   4 cores   8 cores   16 cores
bcast                     -        0.064     0.064     0.071     0.069
local vector init         -        0.004     0.0028    0.0015    0.00087
scatterv                  -        0.0076    0.0056    0.0079    0.0099
convert to int 32         -        0.0024    0.0015    0.00094   0.00072
calculate indptr          -        0.0028    0.0018    0.0011    0.0007
create AIJ              0.044      0.0037    0.022     0.017     0.013
RHS vector                -        0.0016    0.0017    0.0019    0.0023
solve (MUMPS)          10.94       7.23      5.67      5.78      5.13
total time (s)         10.984      7.3161    5.7694    5.88134   5.22649

MUMPS solver is the most time consuming part so I need to find a way to make iterative solvers converge. The single core MUMPS solver takes almost 9 times longer with respect to mldivide in Matlab. I might be doing something wrong setting up the solver but the results of MUMPS match with Matlab.
ksp = PETSc.KSP() ksp.create() ksp.setOperators(pA) ksp.setType('preonly') pc = ksp.getPC() pc.setType('lu') pc.setFactorSolverPackage('mumps') ksp.solve(rhv,x) In addition I don?t understand why broadcast takes so much time w.r.t. scatter. I only pass a small array at the size of number of processes. Regards, Firat From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Wednesday, November 01, 2017 3:35 AM To: Cetinbas, Cankur Firat Cc: Smith, Barry F.; petsc-users at mcs.anl.gov; Ahluwalia, Rajesh K. Subject: Re: [petsc-users] petsc4py sparse matrix construction time On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firat > wrote: Hi, Thanks a lot. Based on both of your suggestions I modified the code using Mat.createAIJ() and csr option. The computation time decreased significantly after using this method. Still if there is a better option please let me know after seeing the modified code below. At first trial with 1000x1000 matrix with 96019 non-zeros in the matrix, the computation time did not scale with the number of cores : Single core python @ 0.0035s, single core petsc @ 0.0024s, 2 cores petsc @ 0.0036s, 4 cores petsc @ 0.0032, 8 cores petsc @ 0.0030s. Then I tried with larger matrix 181797x181797 with more non-zeros and I got the following results: Single core python @ 0.021, single core petsc @ 0.031, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014, 8 cores petsc @ 0.009s, 16 cores petsc @ 0.0087s. I think the optimum number of nodes is highly dependent on matrix size and the number of non-zeros. In the real code matrix size (and so the number of non-zero elements) will grow at every iteration starting with very small matrices growing to very big ones. Is it possible to set the number process from the code dynamically? I am not sure how you are interpreting these measurements. Normally, I would say 1) Everything timed below is "parallel overhead". This is not intended to scale with P, instead it will look like a constant, as you observe 2) The time to compute the matrix entires should far outstrip the time below to figure the nonzero structure. Is this true? 3) Solve time is often larger than matrix calculation. Is it? Thus, we deciding on parallelism, you need to look at the largest costs, and how they scale with P. Thanks, Matt Another question is about the data types; mpi4py only let me transfer float type data, and petsc4py only lets me use int32 type indices. Besides keep converting the data, is there any solution for this? 
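A note on the data-type question above: mpi4py's buffer-based Scatterv takes its MPI datatype from the NumPy arrays themselves, so int32 index arrays can be scattered directly as long as the receive buffer is allocated with a matching dtype; recent petsc4py builds also expose the index and scalar dtypes they were compiled with as PETSc.IntType and PETSc.ScalarType. A minimal sketch, with made-up array names and counts rather than the ones from the code in this thread:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Hypothetical send counts: 3 index entries per rank.
counts = np.full(size, 3, dtype='int32')

# int32 row indices live only on the root; no conversion to float anywhere.
if rank == 0:
    row = np.arange(counts.sum(), dtype='int32')
    sendbuf = [row, counts]
else:
    sendbuf = None

# Receive buffer with the same dtype as the send buffer, so both sides of
# Scatterv end up using the matching MPI integer type.
row_lcl = np.empty(counts[rank], dtype='int32')
comm.Scatterv(sendbuf, row_lcl, root=0)
print(rank, row_lcl)

Scatterv here is the capital-S, buffer-based variant; the lowercase bcast/scatter methods pickle general Python objects and are the ones suited to small scalars such as the matrix size m.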
The modified code for matrix creation part: comm = MPI.COMM_WORLD rank = comm.Get_rank() size = comm.Get_size() if rank==0: row = np.loadtxt('row1000.out').astype(dtype='int32') col = np.loadtxt('col1000.out').astype(dtype='int32') val = np.loadtxt('val1000.out').astype(dtype='int32') m = 1000 # 1000 x 1000 matrix if size>1: rbc = np.bincount(row)*1.0 ieq = int(np.floor(m/size)) a = [ieq]*size ix = int(np.mod(m,size)) if ix>0: for i in range(0,ix): a[i]= a[i]+1 a = np.array([0]+a).cumsum() b = np.zeros(a.shape[0]-1) for i in range(0,a.shape[0]-1): b[i]=rbc[a[i]:a[i+1]].sum() # b is the send counts for Scatterv row = row.astype(dtype=float) col = col.astype(dtype=float) val = val.astype(dtype=float) else: row=None col=None val=None indpt=None b=None m=None if size>1: ml = comm.bcast(m,root=0) bl = comm.bcast(b,root=0) row_lcl = np.zeros(bl[rank]) col_lcl = row_lcl.copy() val_lcl = row_lcl.copy() comm.Scatterv([row,b],row_lcl) comm.Scatterv([col,b],col_lcl) comm.Scatterv([val,b],val_lcl) comm.Barrier() row_lcl = row_lcl.astype(dtype='int32') col_lcl = col_lcl.astype(dtype='int32') val_lcl = val_lcl.astype(dtype='int32') indptr = np.bincount(row_lcl) indptr = indptr[indptr>0] indptr = np.insert(indptr,0,0).cumsum() indptr = indptr.astype(dtype='int32') comm.Barrier() pA = PETSc.Mat().createAIJ(size=(ml,ml),csr=(indptr, col_lcl, val_lcl)) # Matrix generation else: indptr = np.bincount(row) indptr = np.insert(indptr,0,0).cumsum() indptr = indptr.astype(dtype='int32') st=time.time() pA = PETSc.Mat().createAIJ(size=(m,m),csr=(indptr, col, val)) print('dt:',time.time()-st) Regards, Firat -----Original Message----- From: Smith, Barry F. Sent: Tuesday, October 31, 2017 10:18 AM To: Matthew Knepley Cc: Cetinbas, Cankur Firat; petsc-users at mcs.anl.gov; Ahluwalia, Rajesh K. Subject: Re: [petsc-users] petsc4py sparse matrix construction time You also need to make sure that most matrix entries are generated on the process that they will belong on. Barry > On Oct 30, 2017, at 8:01 PM, Matthew Knepley > wrote: > > On Mon, Oct 30, 2017 at 8:06 PM, Cetinbas, Cankur Firat > wrote: > Hello, > > > > I am a beginner both in PETSc and mpi4py. I have been working on parallelizing our water transport code (where we solve linear system of equations) and I started with the toy code below. > > > > The toy code reads right hand size (rh), row, column, value vectors to construct sparse coefficient matrix and then scatters them to construct parallel PETSc coefficient matrix and right hand side vector. > > > > The sparse matrix generation time is extremely high in comparison to sps.csr_matrix((val, (row, col)), shape=(n,n)) in python. For instance python generates 181197x181197 sparse matrix in 0.06 seconds and this code with 32 cores:1.19s, 16 cores:6.98s and 8 cores:29.5 s. I was wondering if I am making a mistake in generating sparse matrix? Is there a more efficient way? > > > It looks like you do not preallocate the matrix. There is a chapter on this in the manual. > > Matt > > Thanks for your help in advance. 
> > > > Regards, > > > > Firat > > > > from petsc4py import PETSc > > from mpi4py import MPI > > import numpy as np > > import time > > > > comm = MPI.COMM_WORLD > > rank = comm.Get_rank() > > size = comm.Get_size() > > > > if rank==0: > > # proc 0 loads tomo image and does fast calculations to append row, col, val, rh lists > > # in the real code this vectors will be available on proc 0 no txt files are read > > row = np.loadtxt('row.out') # indices of non-zero rows > > col = np.loadtxt('col.out') # indices of non-zero columns > > val = np.loadtxt('vs.out') # values in the sparse matrix > > rh = np.loadtxt('RHS.out') # right hand side vector > > n = row.shape[0] #1045699 > > m = rh.shape[0] #181197 square sparse matrix size > > else: > > n = None > > m = None > > row = None > > col = None > > val = None > > rh = None > > rh_ind = None > > > > m_lcl = comm.bcast(m,root=0) > > n_lcl = comm.bcast(n,root=0) > > neq = n_lcl//size > > meq = m_lcl//size > > nx = np.mod(n_lcl,size) > > mx = np.mod(m_lcl,size) > > row_lcl = np.zeros(neq) > > col_lcl = np.zeros(neq) > > val_lcl = np.zeros(neq) > > rh_lcl = np.zeros(meq) > > a = [neq]*size #send counts for Scatterv > > am = [meq]*size #send counts for Scatterv > > > > if nx>0: > > for i in range(0,nx): > > if rank==i: > > row_lcl = np.zeros(neq+1) > > col_lcl = np.zeros(neq+1) > > val_lcl = np.zeros(neq+1) > > a[i] = a[i]+1 > > if mx>0: > > for ii in range(0,mx): > > if rank==ii: > > rh_lcl = np.zeros(meq+1) > > am[ii] = am[ii]+1 > > > > comm.Scatterv([row,a],row_lcl) > > comm.Scatterv([col,a],col_lcl) > > comm.Scatterv([val,a],val_lcl) > > comm.Scatterv([rh,am],rh_lcl) > > comm.Barrier() > > > > A = PETSc.Mat() > > A.create() > > A.setSizes([m_lcl,m_lcl]) > > A.setType('aij') > > A.setUp() > > lr = row_lcl.shape[0] > > for i in range(0,lr): > > A[row_lcl[i],col_lcl[i]] = val_lcl[i] > > A.assemblyBegin() > > A.assemblyEnd() > > > > if size>1: # to get the range for scattered vectors > > ami = [0] > > ami = np.array([0]+am).cumsum() > > for kk in range(0,size): > > if rank==kk: > > Is = ami[kk] > > Ie = ami[kk+1] > > else: > > Is=0; Ie=m_lcl > > > > b= PETSc.Vec() > > b.create() > > b.setSizes(m_lcl) > > b.setFromOptions() > > b.setUp() > > b.setValues(list(range(Is,Ie)),rh_lcl) > > b.assemblyBegin() > > b.assemblyEnd() > > > > # solution vector > > x = b.duplicate() > > x.assemblyBegin() > > x.assemblyEnd() > > > > # create linear solver > > ksp = PETSc.KSP() > > ksp.create() > > ksp.setOperators(A) > > ksp.setType('cg') > > #ksp.getPC().setType('icc') # only sequential > > ksp.getPC().setType('jacobi') > > print('solving with:', ksp.getType()) > > > > #solve > > st=time.time() > > ksp.solve(b,x) > > et=time.time() > > print(et-st) > > > > if size>1: > > #gather > > if rank==0: > > xGthr = np.zeros(m) > > else: > > xGthr = None > > comm.Gatherv(x,[xGthr,am]) > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
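A side note on the report earlier in this thread that the iterative solvers gave no errors but did not match the direct solutions: by default KSPSolve does not treat stopping at the iteration limit, or a breakdown, as an error, so a run can finish quietly without having converged. A small petsc4py sketch of the checks that make this visible; the serial tridiagonal system is only a stand-in for the real matrix, which above comes from createAIJ(csr=...):

import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

# Stand-in SPD system on COMM_SELF so the sketch stays simple under MPI.
n = 100
A = PETSc.Mat().createAIJ([n, n], nnz=3, comm=PETSc.COMM_SELF)
for i in range(n):
    A[i, i] = 2.0
    if i > 0:
        A[i, i - 1] = -1.0
    if i < n - 1:
        A[i, i + 1] = -1.0
A.assemble()
b, x = A.createVecs()
b.set(1.0)

ksp = PETSc.KSP().create(comm=PETSc.COMM_SELF)
ksp.setOperators(A)
ksp.setType('cg')
ksp.getPC().setType('jacobi')
ksp.setTolerances(rtol=1e-10, max_it=1000)
ksp.setFromOptions()      # honours -ksp_monitor, -ksp_converged_reason, ...
ksp.solve(b, x)

reason = ksp.getConvergedReason()   # > 0 converged, <= 0 stopped without converging
r = b.duplicate()
A.mult(x, r)
r.aypx(-1.0, b)                     # r = b - A*x
print('reason:', reason, 'iterations:', ksp.getIterationNumber(),
      'true residual:', r.norm())

If the petsc4py build provides it, ksp.setErrorIfNotConverged(True) turns such silent non-convergence into a hard error.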
URL: From bikash at umich.edu Mon Nov 6 13:01:52 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Mon, 6 Nov 2017 14:01:52 -0500 Subject: [petsc-users] SNES update solution vector In-Reply-To: <1412A35C-17D5-4945-8506-A542C295A4AB@mcs.anl.gov> References: <48A58E9A-AFB7-448A-B9DD-73813FF5566C@mcs.anl.gov> <1412A35C-17D5-4945-8506-A542C295A4AB@mcs.anl.gov> Message-ID: Thank you Barry. I will see if I can make the changes to the incorporate re-building the Jacobian(s). On Sun, Nov 5, 2017 at 2:30 PM, Smith, Barry F. wrote: > > > > On Nov 3, 2017, at 3:34 PM, Bikash Kanungo wrote: > > > > Hi Barry, > > > > So for Newton solvers that would work by explicitly setting the boundary > conditions in my gradient(function) and Jacobian vectors. But in > quasi-Newton solvers where the Jacobian is built from a history of previous > Jacobians and current gradient vector, I can't enforce a new boundary > condition. I can change the current gradient vector appropriately but I > don't see a way handle the the Jacobian. > > There is no way to handle the "Jacobian" because it comes from the > history which presumably includes different boundary conditions. So if you > are changing the boundary conditions in your form function within the same > nonlinear solve the only thing I can see that you could do is each time you > change the boundary conditions you tell the quasi-Newton method to remove > any current approximation to the Jacobian and start building it again. Of > course this assumes you only change the boundary conditions after a bunch > of function evaluations and that the Quasi-Newton implementation has > support for removing the current approximation (you may need to add this > option to the implementation and make a pull request). > > Barry > > > > > Thanks, > > Bikash > > > > > > > > On Fri, Nov 3, 2017 at 6:20 PM, Smith, Barry F. > wrote: > > > > > > You should not need to "tamper" with the solution process to achieve > this. > > > > I would just change how my FormFunction and FormJacobian behave to > implement the different boundary conditions. Why would that not work? > > > > Barry > > > > > On Nov 3, 2017, at 4:39 PM, Bikash Kanungo wrote: > > > > > > Hi Matt, > > > > > > I want to update the Dirichlet boundary condition on the solution > vector on-the-fly. One way to do it is to destroy the current snes solver > and create a new one with the new Dirichlet boundary condition (which means > setting a new solution vector with a different size, size = # of > non-Dirichlet rows). But is it possible to work with the current snes and > instead enforce the new Dirichlet boundary condition on the current > solution vector? > > > > > > Thanks, > > > Bikash > > > > > > On Fri, Nov 3, 2017 at 5:19 PM, Matthew Knepley > wrote: > > > What do you want to do to it? > > > > > > Matt > > > > > > On Fri, Nov 3, 2017 at 5:14 PM, Bikash Kanungo > wrote: > > > Hi, > > > > > > I'm trying to solve a nonlinear problem using BFGS Quasi-Newton > solver. I would like to tamper the solution vector x on-the-fly, based on > some criterion. Is there a way to do so? Will SNESGetSolution(SNES snes, > Vec * x) allow me to do so for each SNES iteration? > > > > > > Thanks, > > > Bikash > > > > > > -- > > > Bikash S. 
Kanungo > > > PhD Student > > > Computational Materials Physics Group > > > Mechanical Engineering > > > University of Michigan > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > -- > > > Bikash S. Kanungo > > > PhD Student > > > Computational Materials Physics Group > > > Mechanical Engineering > > > University of Michigan > > > > > > > > > > > > > -- > > Bikash S. Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bikash at umich.edu Mon Nov 6 13:11:21 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Mon, 6 Nov 2017 14:11:21 -0500 Subject: [petsc-users] SNESGetJacobian Quasi-Newton Message-ID: Hi, I would like to access the approximate Jacobian in quasi-Newton solvers (e.g., BFGS) to analyze its condition number. I guess for BFGS, where the approximate Jacobian is built from outer products of several vectors coming from current and previous gradient vectors, the approximate Jacobian might not be stored explicitly as a matrix and instead the Jacobian times a vector is performed in a matrix-free manner. My reason for believing so is the fact that when I try to retrieve the Jacobian matrix using SNESGetJacobian and feed it to SLEPc, I incur segmentation fault at EPSSetOperators. If that's the case, is there a way to still get access to the approximate Jacobian and feed it to SLEPc for eigen analysis, like with the use of MatShell operations? Thanks, Bikash -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 6 13:42:36 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 6 Nov 2017 14:42:36 -0500 Subject: [petsc-users] SNESGetJacobian Quasi-Newton In-Reply-To: References: Message-ID: On Mon, Nov 6, 2017 at 2:11 PM, Bikash Kanungo wrote: > Hi, > > I would like to access the approximate Jacobian in quasi-Newton solvers > (e.g., BFGS) to analyze its condition number. I guess for BFGS, where the > approximate Jacobian is built from outer products of several vectors coming > from current and previous gradient vectors, the approximate Jacobian might > not be stored explicitly as a matrix and instead the Jacobian times a > vector is performed in a matrix-free manner. My reason for believing so is > the fact that when I try to retrieve the Jacobian matrix using > SNESGetJacobian and feed it to SLEPc, I incur segmentation fault at > EPSSetOperators. > > If that's the case, is there a way to still get access to the approximate > Jacobian and feed it to SLEPc for eigen analysis, like with the use of > MatShell operations? > There is no code in SNESQN to apply the Jacobian, just its inverse. You could wrap up that code in a MatShell and hand it to SLEPc. I am not sure what you are looking for, but its doable. I would think there are analytic expressions or at least bounds for the condition number. Thanks, Matt > Thanks, > Bikash > > -- > Bikash S. 
Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bikash at umich.edu Mon Nov 6 17:29:00 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Mon, 6 Nov 2017 18:29:00 -0500 Subject: [petsc-users] SNESGetJacobian Quasi-Newton In-Reply-To: References: Message-ID: Thank you so much Matthew. I'll try to wrap the Jacobian inverse as a MatShell and pass it to SLEPc. On Mon, Nov 6, 2017 at 2:42 PM, Matthew Knepley wrote: > On Mon, Nov 6, 2017 at 2:11 PM, Bikash Kanungo wrote: > >> Hi, >> >> I would like to access the approximate Jacobian in quasi-Newton solvers >> (e.g., BFGS) to analyze its condition number. I guess for BFGS, where the >> approximate Jacobian is built from outer products of several vectors coming >> from current and previous gradient vectors, the approximate Jacobian might >> not be stored explicitly as a matrix and instead the Jacobian times a >> vector is performed in a matrix-free manner. My reason for believing so is >> the fact that when I try to retrieve the Jacobian matrix using >> SNESGetJacobian and feed it to SLEPc, I incur segmentation fault at >> EPSSetOperators. >> >> If that's the case, is there a way to still get access to the approximate >> Jacobian and feed it to SLEPc for eigen analysis, like with the use of >> MatShell operations? >> > > There is no code in SNESQN to apply the Jacobian, just its inverse. You > could wrap up that code in a MatShell and hand it to SLEPc. I am not > sure what you are looking for, but its doable. I would think there are > analytic expressions or at least bounds for the condition number. > > Thanks, > > Matt > > >> Thanks, >> Bikash >> >> -- >> Bikash S. Kanungo >> PhD Student >> Computational Materials Physics Group >> Mechanical Engineering >> University of Michigan >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From hbuesing at eonerc.rwth-aachen.de Tue Nov 7 03:19:11 2017 From: hbuesing at eonerc.rwth-aachen.de (Buesing, Henrik) Date: Tue, 7 Nov 2017 09:19:11 +0000 Subject: [petsc-users] Newton methods that converge all the time Message-ID: Dear all, I am solving a system of nonlinear, transient PDEs. I am using Newton's method in every time step to solve the nonlinear algebraic equations. Of course, Newton's method only converges if the initial guess is sufficiently close to the solution. This is often not the case and Newton's method diverges. Then, I reduce the time step and try again. This can become prohibitively costly, if the time steps get very small. I am thus looking for variants of Newton's method, which have a bigger convergence radius or ideally converge all the time. I tried out the pseudo-timestepping described in http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex1f.F.html. However, this does converge even worse. 
I am seeing breakdown when I have phase changes (e.g. liquid to two-phase). I was under the impression that pseudo-timestepping should converge better. Thus, my question: Am I doing something wrong or is it possible that Newton's method converges and pseudo-timestepping does not? Thank you for any insight on this. Henrik -- Dipl.-Math. Henrik B?sing Institute for Applied Geophysics and Geothermal Energy E.ON Energy Research Center RWTH Aachen University ------------------------------------------------------ Mathieustr. 10 | Tel +49 (0)241 80 49907 52074 Aachen, Germany | Fax +49 (0)241 80 49889 ------------------------------------------------------ http://www.eonerc.rwth-aachen.de/GGE hbuesing at eonerc.rwth-aachen.de ------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 7 05:53:44 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 7 Nov 2017 06:53:44 -0500 Subject: [petsc-users] Newton methods that converge all the time In-Reply-To: References: Message-ID: On Tue, Nov 7, 2017 at 4:19 AM, Buesing, Henrik < hbuesing at eonerc.rwth-aachen.de> wrote: > Dear all, > > > > I am solving a system of nonlinear, transient PDEs. I am using Newton?s > method in every time step to solve the nonlinear algebraic equations. Of > course, Newton?s method only converges if the initial guess is sufficiently > close to the solution. > > This is often not the case and Newton?s method diverges. Then, I reduce > the time step and try again. This can become prohibitively costly, if the > time steps get very small. I am thus looking for variants of Newton?s > method, which have a bigger convergence radius or ideally converge all the > time. > > > > I tried out the pseudo-timestepping described in > http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/ > tutorials/ex1f.F.html. > > > > However, this does converge even worse. I am seeing breakdown when I have > phase changes (e.g. liquid to two-phase). > > > > I was under the impression that pseudo-timestepping should converge > better. Thus, my question: > > Am I doing something wrong or is it possible that Newton?s method > converges and pseudo-timestepping does not? > > Thank you for any insight on this. > Hi Hendrik, I would try using NGMRES as a nonlinear preconditioner. I have an example in my tutorial slides for using it with SNES ex19. I hope this will work because I suspect that around the phase boundary Newton directions are noisy, since sometimes you step into the other phase. NGMRES takes a few directions (you set the m) and then picks the best one. Hopefully this helps, Matt > Henrik > > > > > > > -- > > Dipl.-Math. Henrik B?sing > > Institute for Applied Geophysics and Geothermal Energy > > E.ON Energy Research Center > > RWTH Aachen University > > ------------------------------------------------------ > > Mathieustr. 10 > > | Tel +49 (0)241 80 49907 <+49%20241%208049907> > > 52074 Aachen, Germany | Fax +49 (0)241 80 49889 > <+49%20241%208049889> > > ------------------------------------------------------ > > http://www.eonerc.rwth-aachen.de/GGE > > hbuesing at eonerc.rwth-aachen.de > > ------------------------------------------------------ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gspr at nonempty.org Tue Nov 7 06:17:23 2017 From: gspr at nonempty.org (Gard Spreemann) Date: Tue, 07 Nov 2017 13:17:23 +0100 Subject: [petsc-users] Help with PETSc signal handling Message-ID: <1536133.zgCtiWFWTP@moose> Hello list, I seem to be misunderstanding how to install a signal handler with PETSc. If I send the USR1 signal to a program with the below code, it exits with a non-zero exit code as if no signal handler were intalled. I'd appreciate if someone could shed some light on the matter. *** #include #include #include #include #include #include PetscErrorCode handler(int signum, void * ctx) { if (signum == SIGUSR1) { *((sig_atomic_t *) ctx) = 1; } return 0; } int main(int argc, char ** argv) { printf("I am %lu, hear me roar.\n", getpid()); PetscInitialize(&argc, &argv, NULL, NULL); sig_atomic_t flag = 0; PetscErrorCode err; err = PetscPopSignalHandler(); CHKERRQ(err); err = PetscPushSignalHandler(handler, (void *)&flag); CHKERRQ(err); for (int i = 0; i < 20; i++) { printf("I'm awake. Did anything happen?\n"); if (flag) { printf("YES!\n"); return 0; } else printf("No...\n"); err = PetscSleep(5); CHKERRQ(err); } PetscFinalize(); return 0; } *** Best regards, Gard Spreemann From mfadams at lbl.gov Tue Nov 7 06:35:36 2017 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 7 Nov 2017 07:35:36 -0500 Subject: [petsc-users] Help with PETSc signal handling In-Reply-To: <1536133.zgCtiWFWTP@moose> References: <1536133.zgCtiWFWTP@moose> Message-ID: PETSc's signal handler is for segvs, etc. I don't know the details but I don't think we care about external signals. On Tue, Nov 7, 2017 at 7:17 AM, Gard Spreemann wrote: > Hello list, > > I seem to be misunderstanding how to install a signal handler with > PETSc. If I send the USR1 signal to a program with the below code, it > exits with a non-zero exit code as if no signal handler were > intalled. I'd appreciate if someone could shed some light on the > matter. > > *** > > #include > #include > #include > #include > #include > #include > > PetscErrorCode handler(int signum, void * ctx) > { > if (signum == SIGUSR1) > { > *((sig_atomic_t *) ctx) = 1; > } > return 0; > } > > > int main(int argc, char ** argv) > { > printf("I am %lu, hear me roar.\n", getpid()); > > PetscInitialize(&argc, &argv, NULL, NULL); > > sig_atomic_t flag = 0; > PetscErrorCode err; > err = PetscPopSignalHandler(); CHKERRQ(err); > err = PetscPushSignalHandler(handler, (void *)&flag); CHKERRQ(err); > > for (int i = 0; i < 20; i++) > { > printf("I'm awake. Did anything happen?\n"); > if (flag) > { > printf("YES!\n"); > return 0; > } > else > printf("No...\n"); > > err = PetscSleep(5); CHKERRQ(err); > } > > PetscFinalize(); > return 0; > } > > > *** > > > Best regards, > Gard Spreemann > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gspr at nonempty.org Tue Nov 7 06:47:54 2017 From: gspr at nonempty.org (Gard Spreemann) Date: Tue, 07 Nov 2017 13:47:54 +0100 Subject: [petsc-users] Help with PETSc signal handling In-Reply-To: References: <1536133.zgCtiWFWTP@moose> Message-ID: <3357209.g0sAtxkE7J@moose> On Tuesday 7 November 2017 07:35:36 CET Mark Adams wrote: > PETSc's signal handler is for segvs, etc. I don't know the details but I > don't think we care about external signals. I see. I'll sketch what I'm trying to achieve in case someone can think of another approach. 
I have some long-running SLEPc eigenvalue computations, and I'd like to have SLURM signal my program that its time limit is drawing near. In that case, my problem would set a flag and before the next iteration of the SLEPc eigenvalue solver it would give up and save the eigenvalues it has so far managed to obtain. The only workaround I can think of would be to let my program keep track of its time limit on its own and check it at each iteration. There is no intention for PETSc to have handling of user-defined signals? Thanks. -- Gard From knepley at gmail.com Tue Nov 7 07:27:39 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 7 Nov 2017 08:27:39 -0500 Subject: [petsc-users] Help with PETSc signal handling In-Reply-To: <3357209.g0sAtxkE7J@moose> References: <1536133.zgCtiWFWTP@moose> <3357209.g0sAtxkE7J@moose> Message-ID: On Tue, Nov 7, 2017 at 7:47 AM, Gard Spreemann wrote: > On Tuesday 7 November 2017 07:35:36 CET Mark Adams wrote: > > PETSc's signal handler is for segvs, etc. I don't know the details but I > > don't think we care about external signals. > Its a little more nuanced than that. We specifically ignore USR1 and USR2 https://bitbucket.org/petsc/petsc/src/17bd883d72f40a596f2d89b5afda5a233b621464/src/sys/error/signal.c?at=master&fileviewer=file-view-default#signal.c-239 > I see. I'll sketch what I'm trying to achieve in case someone can > think of another approach. > > I have some long-running SLEPc eigenvalue computations, and I'd like > to have SLURM signal my program that its time limit is drawing > near. In that case, my problem would set a flag and before the next > iteration of the SLEPc eigenvalue solver it would give up and save the > eigenvalues it has so far managed to obtain. > > The only workaround I can think of would be to let my program keep > track of its time limit on its own and check it at each iteration. > > There is no intention for PETSc to have handling of user-defined > signals? > I would use SIGHUP. Matt > > Thanks. > > -- Gard > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bernardomartinsrocha at gmail.com Tue Nov 7 07:43:22 2017 From: bernardomartinsrocha at gmail.com (Bernardo Rocha) Date: Tue, 7 Nov 2017 11:43:22 -0200 Subject: [petsc-users] Performance of Fieldsplit PC Message-ID: Hello everyone, I have a general question about the performance of the PCFieldSplit that I'm not sure if I understood properly. Consider a simple Poisson problem discretized by FEM into a system Ax=b which is then solved by CG and Jacobi. Then, I create a "vectorial Poisson" problem by simply adding another block of this problem to create a block-like version of it. Something like [ [A, 0] [0, A]] then I create a PCFieldSplit with CG and Jacobi for each block. Either with additive or multiplicative fieldsplit, the PC is much better and solves it with fewer iterations than the scalar case. However, the execution time taken by the PCFieldSplit is much bigger than the simple Jacobi for the scalar case. (From -log_view I see all the time difference in PCApply) Why is this happening? Best regards, Bernardo -------------- next part -------------- An HTML attachment was scrubbed... 
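To make the configuration described above concrete, here is a rough petsc4py sketch of a two-block system of this shape, with the splits defined by index sets and a CG/Jacobi solve inside each split. Everything in it (the serial stand-in Laplacian blocks, the split names, the sizes) is illustrative rather than taken from the code behind the question; the nested solvers it sets up are what the replies below discuss.

import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

n = 50                                   # rows per block (toy size, serial)
A = PETSc.Mat().createAIJ([2 * n, 2 * n], nnz=3, comm=PETSc.COMM_SELF)
for off in (0, n):                       # two identical 1D Laplacian blocks
    for i in range(n):
        A[off + i, off + i] = 2.0
        if i > 0:
            A[off + i, off + i - 1] = -1.0
        if i < n - 1:
            A[off + i, off + i + 1] = -1.0
A.assemble()

is0 = PETSc.IS().createStride(n, first=0, step=1, comm=PETSc.COMM_SELF)
is1 = PETSc.IS().createStride(n, first=n, step=1, comm=PETSc.COMM_SELF)

ksp = PETSc.KSP().create(comm=PETSc.COMM_SELF)
ksp.setOperators(A)
pc = ksp.getPC()
pc.setType('fieldsplit')
pc.setFieldSplitIS(('u', is0), ('v', is1))

opts = PETSc.Options()
opts['pc_fieldsplit_type'] = 'additive'
for name in ('u', 'v'):                  # an inner CG/Jacobi solve per block
    opts['fieldsplit_%s_ksp_type' % name] = 'cg'
    opts['fieldsplit_%s_pc_type' % name] = 'jacobi'
ksp.setFromOptions()

b, x = A.createVecs()
b.set(1.0)
ksp.solve(b, x)
ksp.view()    # shows the outer Krylov solver and the nested KSP per split

The ksp.view() output (or -ksp_view on the command line) makes the nesting explicit: each outer preconditioner application runs one inner Krylov solve per split, which is the structure the replies below point at.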
URL: From patrick.sanan at gmail.com Tue Nov 7 07:54:26 2017 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Tue, 7 Nov 2017 14:54:26 +0100 Subject: [petsc-users] Performance of Fieldsplit PC In-Reply-To: References: Message-ID: >From what you're describing, it sounds like the solver you're using is GMRES (if you are using the default), preconditioned with fieldsplit with nested CG-Jacobi solves. That is, your preconditioner involves inner CG solves on each field, so is a much "heavier". This seems consistent with your observation of fewer (outer) Krylov iterations but much more work being done per iteration. This should all be visible with -ksp_view. Do you see what you expect if, instead of CG/Jacobi on each block, you use Preonly/Jacobi on each block? On Tue, Nov 7, 2017 at 2:43 PM, Bernardo Rocha < bernardomartinsrocha at gmail.com> wrote: > Hello everyone, > > I have a general question about the performance of the PCFieldSplit > that I'm not sure if I understood properly. > > Consider a simple Poisson problem discretized by FEM into a system Ax=b > which is then solved by CG and Jacobi. > > Then, I create a "vectorial Poisson" problem by simply adding another block > of this problem to create a block-like version of it. > Something like > [ [A, 0] > [0, A]] > then I create a PCFieldSplit with CG and Jacobi for each block. > > Either with additive or multiplicative fieldsplit, the PC is much better > and solves it > with fewer iterations than the scalar case. However, the execution time > taken by > the PCFieldSplit is much bigger than the simple Jacobi for the scalar case. > > (From -log_view I see all the time difference in PCApply) > > Why is this happening? > > Best regards, > Bernardo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 7 08:03:39 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 7 Nov 2017 09:03:39 -0500 Subject: [petsc-users] Performance of Fieldsplit PC In-Reply-To: References: Message-ID: On Tue, Nov 7, 2017 at 8:43 AM, Bernardo Rocha < bernardomartinsrocha at gmail.com> wrote: > Hello everyone, > > I have a general question about the performance of the PCFieldSplit > that I'm not sure if I understood properly. > > Consider a simple Poisson problem discretized by FEM into a system Ax=b > which is then solved by CG and Jacobi. > > Then, I create a "vectorial Poisson" problem by simply adding another block > of this problem to create a block-like version of it. > Something like > [ [A, 0] > [0, A]] > then I create a PCFieldSplit with CG and Jacobi for each block. > > Either with additive or multiplicative fieldsplit, the PC is much better > and solves it > with fewer iterations than the scalar case. However, the execution time > taken by > the PCFieldSplit is much bigger than the simple Jacobi for the scalar case. > > (From -log_view I see all the time difference in PCApply) > > Why is this happening? > 1) This is block-Jacobi, why not use PCBJACOBI? Is it because you want to select rows? 2) We cannot tell anything without knowing how many iterates were used -ksp_monitor_true_residual -ksp_converged_reason -pc_fieldsplit_[0,1]_ksp_monitor_true_residual 3) We cannot say anything about performance without seeing the log for both runs -log_view Thanks, Matt > Best regards, > Bernardo > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gspr at nonempty.org Tue Nov 7 08:11:41 2017 From: gspr at nonempty.org (Gard Spreemann) Date: Tue, 07 Nov 2017 15:11:41 +0100 Subject: [petsc-users] Help with PETSc signal handling In-Reply-To: References: <1536133.zgCtiWFWTP@moose> <3357209.g0sAtxkE7J@moose> Message-ID: <1717504.sRf1D73FZy@moose> On Tuesday 7 November 2017 08:27:39 CET Matthew Knepley wrote: > On Tue, Nov 7, 2017 at 7:47 AM, Gard Spreemann wrote: > > > On Tuesday 7 November 2017 07:35:36 CET Mark Adams wrote: > > > PETSc's signal handler is for segvs, etc. I don't know the details but I > > > don't think we care about external signals. > > > > Its a little more nuanced than that. We specifically ignore USR1 and USR2 > > > https://bitbucket.org/petsc/petsc/src/17bd883d72f40a596f2d89b5afda5a233b621464/src/sys/error/signal.c?at=master&fileviewer=file-view-default#signal.c-239 Ah, thanks a lot! -- Gard From Javier.SIERRA-AUSIN at isae-supaero.fr Tue Nov 7 09:18:21 2017 From: Javier.SIERRA-AUSIN at isae-supaero.fr (SIERRA-AUSIN Javier) Date: Tue, 07 Nov 2017 16:18:21 +0100 Subject: [petsc-users] Coloring of a finite volume unstructured mesh Message-ID: <2d31-5a01ce80-f-394b2b00@130530600> Hi, I would like to ask you concerning the computation of the Jacobian matrix via finite difference and coloring of the connectivity graph. I wonder whether it is possible or not to color the Jacobian matrix of a given solver that evaluates the RHS with its associated connectivity in the global indices of my solver (not PETSc). As well, if it is possible to do this from an already partitioned domain in parallel. All of this is better explained in this post: https://scicomp.stackexchange.com/questions/28209/linking-petsc-with-an-already-parallel-in-house-finite-volume-solver Thanks in advance, Javier. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bernardomartinsrocha at gmail.com Tue Nov 7 09:55:19 2017 From: bernardomartinsrocha at gmail.com (Bernardo Rocha) Date: Tue, 7 Nov 2017 13:55:19 -0200 Subject: [petsc-users] Performance of Fieldsplit PC In-Reply-To: References: Message-ID: Thanks for the reply. 1) This is block-Jacobi, why not use PCBJACOBI? Is it because you want to > select rows? > I'm only using it to understand the performance behavior of PCFieldSplit since I'm also having the same issue in a large and more complex problem. > 2) We cannot tell anything without knowing how many iterates were used > -ksp_monitor_true_residual -ksp_converged_reason > -pc_fieldsplit_[0,1]_ksp_monitor_true_residual > > 3) We cannot say anything about performance without seeing the log for > both runs > -log_view > I'm sending to you the log files with the recommended command line arguments for the three cases:
1 - scalar case
2 - PCFieldSplit (as we were initially running)
3 - PCFieldSplit with Preonly/Jacobi in each block, as suggested by Patrick.
As Patrick pointed out, with Preonly/Jacobi the behavior is closer to what I expected. Please note that the log was taken for 100 calls to KSPSolve, I just simplified it. What would be the proper way of creating this block preconditioner? As you can see, the timing with PCFieldSplit is bigger for case 3. For case 2 it is nearly 2x, as I expected (I don't know if this idea makes sense). So for the case 2, the reason for the large timing is due to the inner/outer solver?
?Does the "machinery" behind the PCFieldSplit for a block preconditioner results in some performance overhead? (neglecting the efficiency of the PC itself) Best regards, Bernardo? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- 0 KSP preconditioned resid norm 9.909609586673e+01 true resid norm 6.621816260761e+10 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.273902381670e+01 true resid norm 1.094099204442e+09 ||r(i)||/||b|| 1.652264516800e-02 2 KSP preconditioned resid norm 6.460292016408e+00 true resid norm 2.813270803973e+08 ||r(i)||/||b|| 4.248488168788e-03 3 KSP preconditioned resid norm 4.267523086682e+00 true resid norm 1.227432629606e+08 ||r(i)||/||b|| 1.853619280982e-03 4 KSP preconditioned resid norm 2.956794930503e+00 true resid norm 5.892889289089e+07 ||r(i)||/||b|| 8.899203869502e-04 5 KSP preconditioned resid norm 1.553867540080e+00 true resid norm 1.629422769996e+07 ||r(i)||/||b|| 2.460688587286e-04 6 KSP preconditioned resid norm 5.863411243068e-01 true resid norm 2.339364215736e+06 ||r(i)||/||b|| 3.532813541805e-05 7 KSP preconditioned resid norm 2.949598244316e-01 true resid norm 6.042774335918e+05 ||r(i)||/||b|| 9.125554225548e-06 8 KSP preconditioned resid norm 1.810861505194e-01 true resid norm 2.303685149906e+05 ||r(i)||/||b|| 3.478932454766e-06 9 KSP preconditioned resid norm 1.063228930690e-01 true resid norm 7.738363702513e+04 ||r(i)||/||b|| 1.168616493993e-06 10 KSP preconditioned resid norm 5.539338985670e-02 true resid norm 1.753917253023e+04 ||r(i)||/||b|| 2.648695137339e-07 11 KSP preconditioned resid norm 2.897182710946e-02 true resid norm 2.729986729636e+03 ||r(i)||/||b|| 4.122715916799e-08 12 KSP preconditioned resid norm 1.695869301131e-02 true resid norm 4.766670897136e+02 ||r(i)||/||b|| 7.198434250408e-09 13 KSP preconditioned resid norm 9.226255542270e-03 true resid norm 2.547340821913e+02 ||r(i)||/||b|| 3.846891429181e-09 14 KSP preconditioned resid norm 4.999664022085e-03 true resid norm 2.173100659726e+02 ||r(i)||/||b|| 3.281729021392e-09 15 KSP preconditioned resid norm 2.822889124856e-03 true resid norm 1.227719812726e+02 ||r(i)||/||b|| 1.854052973353e-09 16 KSP preconditioned resid norm 1.612223553945e-03 true resid norm 7.400379602761e+01 ||r(i)||/||b|| 1.117575497619e-09 17 KSP preconditioned resid norm 8.796911180671e-04 true resid norm 7.103508796600e+01 ||r(i)||/||b|| 1.072743265121e-09 18 KSP preconditioned resid norm 4.819064182097e-04 true resid norm 7.368434157044e+01 ||r(i)||/||b|| 1.112751225175e-09 19 KSP preconditioned resid norm 2.758550257952e-04 true resid norm 8.624440250822e+01 ||r(i)||/||b|| 1.302428202656e-09 20 KSP preconditioned resid norm 1.527969474414e-04 true resid norm 9.706581744817e+01 ||r(i)||/||b|| 1.465848849104e-09 21 KSP preconditioned resid norm 8.423993997128e-05 true resid norm 1.055048065519e+02 ||r(i)||/||b|| 1.593291060900e-09 22 KSP preconditioned resid norm 4.630697601290e-05 true resid norm 1.164992856073e+02 ||r(i)||/||b|| 1.759325251860e-09 23 KSP preconditioned resid norm 2.582157010506e-05 true resid norm 1.251285547265e+02 ||r(i)||/||b|| 1.889640995748e-09 24 KSP preconditioned resid norm 1.439945755021e-05 true resid norm 1.468650092478e+02 ||r(i)||/||b|| 2.217896170241e-09 25 KSP preconditioned resid norm 7.649560254453e-06 true resid norm 1.686932293751e+02 ||r(i)||/||b|| 2.547537151926e-09 26 KSP preconditioned resid norm 4.116629854365e-06 true resid norm 1.862186959283e+02 ||r(i)||/||b|| 
2.812199683519e-09 27 KSP preconditioned resid norm 2.293702349842e-06 true resid norm 2.098309185181e+02 ||r(i)||/||b|| 3.168781951283e-09 28 KSP preconditioned resid norm 1.302840266880e-06 true resid norm 2.200686132505e+02 ||r(i)||/||b|| 3.323387490447e-09 29 KSP preconditioned resid norm 7.703427859997e-07 true resid norm 1.626698090949e+02 ||r(i)||/||b|| 2.456573886818e-09 Linear solve converged due to CONVERGED_RTOL iterations 29 KSP Object: 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=9953, cols=9953 total: nonzeros=132617, allocated nonzeros=298590 total number of mallocs used during MatSetValues calls =0 not using I-node routines Number of iterations: 29 Residual norm: 7.70343e-07 Total time: 2.67914 Writing data file Done ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov 7 13:00:48 2017 Using Petsc Release Version 3.5.4, May, 23, 2015 Max Max/Min Avg Total Time (sec): 3.104e+00 1.00000 3.104e+00 Objects: 6.042e+03 1.00000 6.042e+03 Flops: 4.430e+09 1.00000 4.430e+09 4.430e+09 Flops/sec: 1.427e+09 1.00000 1.427e+09 1.427e+09 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.1044e+00 100.0% 4.4303e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 5900 1.0 1.1548e+00 1.0 1.51e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 37 34 0 0 0 1304 MatAssemblyBegin 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 9.4891e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 100 1.0 4.6666e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 2900 1.0 3.2593e-01 1.0 8.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 20 0 0 0 10 20 0 0 0 2657 VecNorm 6001 1.0 5.5606e-02 1.0 1.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2148 VecScale 3000 1.0 1.6580e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1801 VecCopy 3100 1.0 3.1643e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecSet 9141 1.0 5.4358e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAXPY 3000 1.0 2.2286e-02 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2680 VecAYPX 3000 1.0 4.8099e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 621 VecMAXPY 5900 1.0 6.5193e-01 1.0 1.79e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 40 0 0 0 21 40 0 0 0 2745 VecPointwiseMult 3000 1.0 5.8018e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 515 VecNormalize 3000 1.0 4.7506e-02 1.0 8.96e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1886 KSPGMRESOrthog 2900 1.0 6.4069e-01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 39 0 0 0 21 39 0 0 0 2703 KSPSetUp 100 1.0 2.6917e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 100 1.0 2.6619e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 86100 0 0 0 86100 0 0 0 1664 PCSetUp 1 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3000 1.0 6.0910e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 490 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 1 0 0 0 Vector 6038 6036 489688608 0 Krylov Solver 1 1 18616 0 Preconditioner 1 1 856 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 4.76837e-08 #PETSc Option Table entries: -ksp_converged_reason -ksp_monitor_true_residual -ksp_rtol 1e-8 -ksp_type gmres -ksp_view -log_view -m /home/joventino/Downloads/russa.xml -pc_type jacobi #End of PETSc Option Table entries -------------- next part -------------- 0 KSP preconditioned resid norm 9.515173597913e+01 true resid norm 9.364662363510e+10 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.204643328070e-06 true resid norm 2.672963301097e+00 ||r(i)||/||b|| 2.854308246619e-11 2 KSP preconditioned resid norm 2.804824521054e-13 true resid norm 2.770364057767e-04 ||r(i)||/||b|| 2.958317075650e-15 Linear solve converged due to CONVERGED_RTOL iterations 2 KSP Object: 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with ADDITIVE composition: total splits = 2 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_X_) 1 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_X_) 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: (fieldsplit_X_) 1 MPI processes type: seqaij rows=9953, cols=9953 total: nonzeros=132617, allocated nonzeros=132617 total number of mallocs used during MatSetValues calls =0 not using I-node routines Split number 1 Defined by IS KSP Object: (fieldsplit_Y_) 1 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_Y_) 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: (fieldsplit_Y_) 1 MPI processes type: seqaij rows=9953, cols=9953 total: nonzeros=132617, allocated nonzeros=132617 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=19906, cols=19906 total: nonzeros=265234, allocated nonzeros=1.19436e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines Number of iterations: 2 Residual norm: 2.80482e-13 Total time: 5.1269 Writing data file Done ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov 7 13:01:56 2017 Using Petsc Release Version 3.5.4, May, 23, 2015 Max Max/Min Avg Total Time (sec): 5.609e+00 1.00000 5.609e+00 Objects: 6.370e+02 1.00000 6.370e+02 Flops: 7.589e+09 1.00000 7.589e+09 7.589e+09 Flops/sec: 1.353e+09 1.00000 1.353e+09 1.353e+09 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.6092e+00 100.0% 7.5885e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 19300 1.0 3.5085e+00 1.0 5.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 63 67 0 0 0 63 67 0 0 0 1441 MatAssemblyBegin 3 1.0 1.4305e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 3 1.0 3.1419e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 2 1.0 2.2733e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 300 1.0 1.8467e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 200 1.0 7.8726e-03 1.0 1.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1517 VecTDot 37600 1.0 3.4746e-01 1.0 7.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 6 10 0 0 0 6 10 0 0 0 2154 VecNorm 20100 1.0 1.8720e-01 1.0 4.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 5 0 0 0 3 5 0 0 0 2212 VecScale 300 1.0 2.8970e-03 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2061 VecCopy 1600 1.0 2.0661e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1925 1.0 2.7724e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 37900 1.0 3.0721e-01 1.0 7.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 10 0 0 0 5 10 0 0 0 2475 VecAYPX 18500 1.0 2.6606e-01 1.0 3.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 5 0 0 0 5 5 0 0 0 1384 VecMAXPY 500 1.0 1.6370e-02 1.0 3.18e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1946 VecPointwiseMult 19400 1.0 2.5479e-01 1.0 1.93e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 3 0 0 0 5 3 0 0 0 758 VecScatterBegin 1200 1.0 2.5261e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 300 1.0 9.1097e-03 1.0 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1967 KSPGMRESOrthog 200 1.0 1.4125e-02 1.0 2.39e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1691 KSPSetUp 102 1.0 3.4118e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 100 1.0 5.0754e+00 1.0 7.59e+09 1.0 0.0e+00 0.0e+00 0.0e+00 90100 0 0 0 90100 0 0 0 1495 PCSetUp 3 1.0 2.5232e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 300 1.0 4.7249e+00 1.0 7.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 84 95 0 0 0 84 95 0 0 0 1532 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 2 3506688 0 Vector 621 619 98550000 0 Vector Scatter 2 2 1288 0 Krylov Solver 3 3 21064 0 Preconditioner 3 3 2720 0 Viewer 1 0 0 0 Index Set 4 2 1568 0 ======================================================================================================================== Average time to get PetscTime(): 4.76837e-08 #PETSc Option Table entries: -fieldsplit_X_ksp_rtol 1e-8 -fieldsplit_X_ksp_type cg -fieldsplit_X_pc_type jacobi -fieldsplit_Y_ksp_rtol 1e-8 -fieldsplit_Y_ksp_type cg -fieldsplit_Y_pc_type jacobi -ksp_converged_reason -ksp_monitor_true_residual -ksp_view -log_view -m /home/joventino/Downloads/russa.xml -pc_fieldsplit_[0,1]_ksp_monitor_true_residual -pc_fieldsplit_type additive -pc_type fieldsplit #End of PETSc Option Table entries -------------- next part -------------- 0 KSP preconditioned resid norm 1.401430427530e+02 true resid norm 9.364662363510e+10 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.801570025298e+01 true resid norm 1.547289933503e+09 ||r(i)||/||b|| 1.652264516799e-02 2 KSP preconditioned resid norm 9.136232586494e+00 true resid norm 3.978565725596e+08 ||r(i)||/||b|| 4.248488168777e-03 3 KSP preconditioned resid norm 6.035189026926e+00 true resid norm 1.735851871679e+08 ||r(i)||/||b|| 1.853619280971e-03 4 KSP preconditioned resid norm 4.181539491874e+00 true resid norm 8.333803954090e+07 ||r(i)||/||b|| 8.899203869392e-04 5 KSP preconditioned resid norm 2.197500549312e+00 true resid norm 2.304351780071e+07 ||r(i)||/||b|| 2.460688587183e-04 6 KSP preconditioned resid norm 8.292115701718e-01 true resid norm 3.308360600270e+06 ||r(i)||/||b|| 3.532813540786e-05 7 KSP preconditioned resid norm 4.171361840663e-01 true resid norm 8.545773410120e+05 ||r(i)||/||b|| 9.125554214767e-06 8 KSP preconditioned resid norm 2.560944900224e-01 true resid norm 3.257902772942e+05 ||r(i)||/||b|| 3.478932444630e-06 9 KSP preconditioned resid norm 1.503632773689e-01 true resid norm 1.094369879940e+05 ||r(i)||/||b|| 1.168616483392e-06 10 KSP preconditioned resid norm 7.833808320116e-02 true resid norm 2.480413464487e+04 ||r(i)||/||b|| 2.648695028400e-07 11 KSP preconditioned resid norm 4.097235082492e-02 true resid norm 3.860783217718e+03 ||r(i)||/||b|| 4.122714805780e-08 12 KSP preconditioned resid norm 2.398321365671e-02 true resid norm 6.741080801947e+02 ||r(i)||/||b|| 7.198423755473e-09 13 KSP preconditioned resid norm 1.304789571780e-02 true resid norm 3.602473867603e+02 ||r(i)||/||b|| 3.846880675208e-09 14 KSP preconditioned resid norm 7.070592667344e-03 true resid norm 3.073218548774e+02 ||r(i)||/||b|| 3.281718474708e-09 15 KSP preconditioned resid norm 3.992168085451e-03 true resid norm 1.736249159923e+02 ||r(i)||/||b|| 1.854043522902e-09 16 KSP preconditioned resid norm 2.280028415568e-03 true resid norm 1.046566169144e+02 ||r(i)||/||b|| 1.117569570071e-09 17 KSP preconditioned resid norm 1.244071109869e-03 true resid norm 1.004589356606e+02 ||r(i)||/||b|| 1.072744876014e-09 18 KSP preconditioned resid norm 6.815185924237e-04 true resid norm 1.042054174147e+02 ||r(i)||/||b|| 1.112751462570e-09 19 KSP preconditioned resid norm 3.901179187272e-04 true resid norm 1.219681423290e+02 ||r(i)||/||b|| 1.302429682935e-09 20 KSP preconditioned resid norm 2.160875153605e-04 true resid norm 1.372716915795e+02 ||r(i)||/||b|| 1.465847739630e-09 21 KSP preconditioned resid norm 1.191332656017e-04 true resid norm 1.492063885795e+02 ||r(i)||/||b|| 1.593291704364e-09 22 KSP preconditioned resid norm 6.548795351426e-05 true resid norm 
1.647547652308e+02 ||r(i)||/||b|| 1.759324136157e-09 23 KSP preconditioned resid norm 3.651721464750e-05 true resid norm 1.769585591874e+02 ||r(i)||/||b|| 1.889641637021e-09 24 KSP preconditioned resid norm 2.036390815968e-05 true resid norm 2.076983894694e+02 ||r(i)||/||b|| 2.217895118981e-09 25 KSP preconditioned resid norm 1.081811185654e-05 true resid norm 2.385683116238e+02 ||r(i)||/||b|| 2.547537779401e-09 26 KSP preconditioned resid norm 5.821793768980e-06 true resid norm 2.633529192793e+02 ||r(i)||/||b|| 2.812198764426e-09 27 KSP preconditioned resid norm 3.243784970505e-06 true resid norm 2.967457868685e+02 ||r(i)||/||b|| 3.168782550290e-09 28 KSP preconditioned resid norm 1.842494373206e-06 true resid norm 3.112239423017e+02 ||r(i)||/||b|| 3.323386687324e-09 29 KSP preconditioned resid norm 1.089429216198e-06 true resid norm 2.300499177590e+02 ||r(i)||/||b|| 2.456574608129e-09 Linear solve converged due to CONVERGED_RTOL iterations 29 KSP Object: 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with ADDITIVE composition: total splits = 2 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_X_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_X_) 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: (fieldsplit_X_) 1 MPI processes type: seqaij rows=9953, cols=9953 total: nonzeros=132617, allocated nonzeros=132617 total number of mallocs used during MatSetValues calls =0 not using I-node routines Split number 1 Defined by IS KSP Object: (fieldsplit_Y_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_Y_) 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: (fieldsplit_Y_) 1 MPI processes type: seqaij rows=9953, cols=9953 total: nonzeros=132617, allocated nonzeros=132617 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=19906, cols=19906 total: nonzeros=265234, allocated nonzeros=1.19436e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines Number of iterations: 29 Residual norm: 1.08943e-06 Total time: 6.58937 0 0 Writing data file Done ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov 7 13:04:06 2017 Using Petsc Release Version 3.5.4, May, 23, 2015 Max Max/Min Avg Total Time (sec): 7.098e+00 1.00000 7.098e+00 Objects: 6.060e+03 1.00000 6.060e+03 Flops: 8.865e+09 1.00000 8.865e+09 8.865e+09 Flops/sec: 1.249e+09 1.00000 1.249e+09 1.249e+09 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 7.0984e+00 100.0% 8.8646e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 5900 1.0 2.8474e+00 1.0 3.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 34 0 0 0 40 34 0 0 0 1058 MatAssemblyBegin 3 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 3 1.0 2.9294e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 2 1.0 2.2478e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 300 1.0 1.9589e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 2900 1.0 7.7983e-01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 20 0 0 0 11 20 0 0 0 2221 VecNorm 6100 1.0 1.1267e-01 1.0 2.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2155 VecScale 3000 1.0 3.1897e-02 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1872 VecCopy 3100 1.0 7.6415e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecSet 18148 1.0 2.6946e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 VecAXPY 3000 1.0 5.7204e-02 1.0 1.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2088 VecAYPX 3000 1.0 1.0830e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 551 VecMAXPY 5900 1.0 1.5695e+00 1.0 3.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 22 40 0 0 0 22 40 0 0 0 2280 VecPointwiseMult 6000 1.0 1.0145e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 589 VecScatterBegin 12000 1.0 1.8266e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 VecNormalize 3000 1.0 9.1894e-02 1.0 1.79e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1950 KSPGMRESOrthog 2900 1.0 1.5169e+00 1.0 3.46e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 39 0 0 0 21 39 0 0 0 2283 KSPSetUp 102 1.0 2.6441e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 100 1.0 6.5351e+00 1.0 8.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 92100 0 0 0 92100 0 0 0 1356 PCSetUp 3 1.0 2.5163e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3000 1.0 4.7949e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 1 0 0 0 7 1 0 0 0 125 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 3 2 3506688 0 Vector 6044 6042 970785840 0 Vector Scatter 2 2 1288 0 Krylov Solver 3 3 20936 0 Preconditioner 3 3 2720 0 Viewer 1 0 0 0 Index Set 4 2 1568 0 ======================================================================================================================== Average time to get PetscTime(): 4.76837e-08 #PETSc Option Table entries: -fieldsplit_X_ksp_rtol 1e-8 -fieldsplit_X_ksp_type preonly -fieldsplit_X_pc_type jacobi -fieldsplit_Y_ksp_rtol 1e-8 -fieldsplit_Y_ksp_type preonly -fieldsplit_Y_pc_type jacobi -ksp_converged_reason -ksp_monitor_true_residual -ksp_view -log_view -m /home/joventino/Downloads/russa.xml -pc_fieldsplit_[0,1]_ksp_monitor_true_residual -pc_fieldsplit_type additive -pc_type fieldsplit #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-suitesparse --download-hypre --download-mpich --with-debugging=0 ----------------------------------------- Libraries compiled on Fri Aug 4 16:15:03 2017 on localhost.localdomain Machine characteristics: Linux-4.11.9-200.fc25.x86_64-x86_64-with-fedora-25-Twenty_Five Using PETSc directory: /home/joventino/source/petsc-3.5.4 Using PETSc arch: arch-linux2-c-debug ----------------------------------------- Using C compiler: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/include -I/home/joventino/source/petsc-3.5.4/include -I/home/joventino/source/petsc-3.5.4/include -I/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/include ----------------------------------------- Using C linker: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpicc Using Fortran linker: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpif90 Using libraries: -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -lmpichcxx -lstdc++ -lflapack -lfblas -lpthread -lm -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -ldl -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl ----------------------------------------- From knepley at gmail.com Tue Nov 7 10:13:06 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 7 Nov 2017 11:13:06 -0500 Subject: [petsc-users] Coloring of a finite volume unstructured mesh In-Reply-To: 
<2d31-5a01ce80-f-394b2b00@130530600> References: <2d31-5a01ce80-f-394b2b00@130530600> Message-ID: On Tue, Nov 7, 2017 at 10:18 AM, SIERRA-AUSIN Javier < Javier.SIERRA-AUSIN at isae-supaero.fr> wrote: > Hi, > > I would like to ask you concerning the computation of the Jacobian matrix > via finite difference and coloring of the connectivity graph. > I wonder whether it is possible or not to color the Jacobian matrix of a > given solver that evaluates the RHS with its associated connectivity in the > global indices of my solver (not PETSc). > As well, if it is possible to do this from an already partitioned domain in > parallel. > All of this is better explained in this post : > https://scicomp.stackexchange.com/questions/28209/linking-petsc-with-an-already-parallel-in-house-finite-volume-solver > The simplest thing you can do is to use the finite-difference Jacobian action (MatMFFD). This is set up automatically by SNES if you give a FormFunction pointer, but no FormJacobian routine. Just tell the PETSc Vecs to use your ParMetis layout (by setting the local sizes), and it should run fine in SNES. However, usually you need some kind of preconditioning. Thus you either have to form the Jacobian or some approximation. If you cannot form an approximation, then you can use coloring. One option is to create a DMPlex with your mesh information. This can be done in parallel after you have already partitioned with ParMetis (as long as you know the "overlap" of vertices, or adjacency of cells). Then the coloring can be done automatically using that DM information. Otherwise, you will have to supply a coloring to the SNES. Thanks, Matt > Thanks in advance, > > Javier. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL:

From hzhang at mcs.anl.gov Tue Nov 7 10:45:30 2017 From: hzhang at mcs.anl.gov (Hong) Date: Tue, 7 Nov 2017 10:45:30 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark, The fix is merged to the next branch for tests, which show a diff as ******* Testing: testexamples_PARMETIS ******* 5c5 < 1 SNES Function norm 1.983e-10 --- > 1 SNES Function norm 1.990e-10 10,13c10,13 < 0-cells: 8 8 8 8 < 1-cells: 12 12 12 12 < 2-cells: 6 6 6 6 < 3-cells: 1 1 1 1 --- > 0-cells: 12 12 0 0 > 1-cells: 20 20 0 0 > 2-cells: 11 11 0 0 > 3-cells: 2 2 0 0 15,18c15,18 < boundary: 1 strata with value/size (1 (23)) < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) < marker: 1 strata with value/size (1 (15)) < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) --- > boundary: 1 strata with value/size (1 (39)) > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) > marker: 1 strata with value/size (1 (27)) > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log I guess parmetis produces a random partition on different machines (I made the output file for ex56_1 on my imac). Please take a look at the differences.
If the outputs are correct, I will remove option '-ex56_dm_view' Hong On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: > Mark: > Bug is fixed in branch hzhang/fix-submat_samerowdist > https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_samerowdist > > I also add the test runex56. Please test it and let me know if there is a > problem. > Hong > > Also, I have been using -petscpartition_type but now I see >> -pc_gamg_mat_partitioning_type. Is -petscpartition_type depreciated for >> GAMG? >> >> Is this some sort of auto generated portmanteau? I can not find >> pc_gamg_mat_partitioning_type in the source. >> >> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: >> >>> Great, thanks, >>> >>> And could you please add these parameters to a regression test? As I >>> recall we have with-parmetis regression test. >>> >>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>> >>>> Mark: >>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>> Now testing src/snes/examples/tutorials/ex56.c with your options, I >>>> can reproduce the error. >>>> I'll fix it. >>>> >>>> Hong >>>> >>>> Hong, >>>>> >>>>> I've tested with master and I get the same error. Maybe the >>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>> me. >>>>> >>>>> Can you run this (snes ex56) w/o the error? >>>>> >>>>> >>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ make >>>>> runex >>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 >>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -matrap 0 >>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>> [0] 27 global equations, 9 vertices >>>>> [0] 27 equations in vector, 9 vertices >>>>> 0 SNES Function norm 122.396 >>>>> 0 KSP Residual norm 122.396 >>>>> >>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>>>> [0] 4725 global equations, 1575 vertices >>>>> [0] 4725 equations in vector, 1575 vertices >>>>> 0 SNES Function norm 17.9091 >>>>> [0]PETSC ERROR: --------------------- Error Message >>>>> -------------------------------------------------------------- >>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>> ocumentation/faq.html for trouble shooting. >>>>> >>>>> >>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>>>> >>>>>> Mark : >>>>>> I realize that using maint or master branch, I cannot reproduce the >>>>>> same error. 
>>>>>> For this example, you must use a parallel partitioner, e.g.,'current' >>>>>> gives me following error: >>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently >>>>>> only supports one domain per processor >>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch for >>>>>> more than one subdomain per processor >>>>>> >>>>>> Please rebase your branch with maint or master, then see if you still >>>>>> have problem. >>>>>> >>>>>> Hong >>>>>> >>>>>> >>>>>>> >>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >>>>>>> >>>>>>>> Mark, >>>>>>>> I can reproduce this in an old branch, but not in current maint and >>>>>>>> master. >>>>>>>> Which branch are you using to produce this error? >>>>>>>> >>>>>>> >>>>>>> I am using a branch from Matt. Let me try to merge it with master. >>>>>>> >>>>>>> >>>>>>>> Hong >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams wrote: >>>>>>>> >>>>>>>>> I am able to reproduce this with snes ex56 with 2 processors and >>>>>>>>> adding -pc_gamg_repartition true >>>>>>>>> >>>>>>>>> I'm not sure how to fix it. >>>>>>>>> >>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 >>>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>>> jacobi -petscpartitioner_type simple -mat_block_size 3 -matrap 0 >>>>>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>> iterations 1 >>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> Mesh in 3 dimensions: >>>>>>>>> 0-cells: 12 12 >>>>>>>>> 1-cells: 20 20 >>>>>>>>> 2-cells: 11 11 >>>>>>>>> 3-cells: 2 2 >>>>>>>>> Labels: >>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 (1), >>>>>>>>> 6 (1)) >>>>>>>>> marker: 1 
strata with value/size (1 (27)) >>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>> iterations 1 >>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> Mesh in 3 dimensions: >>>>>>>>> 0-cells: 45 45 >>>>>>>>> 1-cells: 96 96 >>>>>>>>> 2-cells: 68 68 >>>>>>>>> 3-cells: 16 16 >>>>>>>>> Labels: >>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 >>>>>>>>> (9), 6 (9)) >>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> [1]PETSC ERROR: No support for this operation for this object type >>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>> >>>>>>>>>>> Doing some additional testing, the issue goes away when removing >>>>>>>>>>> the gamg preconditioner line from the petsc.rc: >>>>>>>>>>> -pc_type gamg >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>> >>>>>>>>>> This is the code. findices is create with ISCreateStride, so it >>>>>>>>>> is sorted ... >>>>>>>>>> >>>>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't have a >>>>>>>>>> regression test with this... >>>>>>>>>> >>>>>>>>>> I will try to reproduce this. >>>>>>>>>> >>>>>>>>>> Michael: you can use hypre for now, or turn repartitioning off >>>>>>>>>> (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), >>>>>>>>>> but I'm not sure this will fix this. >>>>>>>>>> >>>>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>>>> solvers. I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>>>> good enough for you. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> { >>>>>>>>>> IS findices; >>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>> Mat Pnew; >>>>>>>>>> >>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>> #endif >>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>>>>>>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>> >>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>> #endif >>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>> >>>>>>>>>> /* output - repartitioned */ >>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong wrote: >>>>>>>>>>> >>>>>>>>>>>> Randy: >>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>> Hong >>>>>>>>>>>> >>>>>>>>>>>> OK, this might not be completely satisfactory, because it >>>>>>>>>>>>> doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>> It would be helpful if you can provide us a simple example >>>>>>>>>>>>>> that produces this example. >>>>>>>>>>>>>> Hong >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm running a Fortran code that was just changed over to >>>>>>>>>>>>>>> using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>> (see full error below). I tried to trace down the difference in the source >>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. 
It >>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfu >>>>>>>>>>>>>>> nc.c >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> R. Michael Churchill >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Nov 7 13:10:10 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 7 Nov 2017 14:10:10 -0500 Subject: [petsc-users] Performance of Fieldsplit PC In-Reply-To: References: Message-ID: On Tue, Nov 7, 2017 at 10:55 AM, Bernardo Rocha < bernardomartinsrocha at gmail.com> wrote: > Thanks for the reply. > > 1) This is block-Jacobi, why not use PCBJACOBI? Is it because you want to >> select rows? >> > > I'm only using it to understand the performance behavior of PCFieldSplit > since I'm also > having the same issue in a large and more complex problem. > ? > >> 2) We cannot tell anything without knowing how many iterates were used >> > -ksp_monitor_true_residual -ksp_converged_reason >> -pc_fieldsplit_[0,1]_ksp_monitor_true_residual >> >> 3) We cannot say anything about performance without seeing the log for >> both runs >> -log_view >> > > I'm sending to you the log files with the recommended command line > arguments for the three cases. > You did not print out the iterates for the field split solves: -pc_fieldsplit_[0,1]_ksp_monitor_true_residual > 1-scalar case > 2-PCFieldSplit (as we were initially running) > 3-PCFieldSplit with Preonly/Jacobi in each block, as suggested by Patrick. > > As Patrick pointed out, with Preonly/Jacobi the behavior is closer to what > I expected. > > Please note that the log was taken for 100 calls to KSPSolve, I just > simplified it. > > What would be the proper way of creating this block preconditioner > > As you can see, the timing with PCFieldSplit is bigger for case 3. > For case 2 it is nearly 2x, as I expected (I don't know if this idea makes > sense). > > So for the case 2, the reason for the large timing is due to the > inner/outer solver? > > ?Does the "machinery" behind the PCFieldSplit for a block preconditioner > results > in some performance overhead? (neglecting the efficiency of the PC itself) > No. What you sent makes little sense to me. How do you have 29 iterates for the solve, but 5900 MatMults in the log? The number of MatMults in 2) is 4x, not 2x. I suspect that is because convergence of the Krylov solver on each block takes the same number of iterates that your global Krylov solver takes. Thus you have 2x, but you do 2 outer iterates, which is 4x. Thus you get your 2x time. You could try to play games with the inner tolerances (say run the blocks only to 10^-4). However, the fact remains that this is not even a credible solver for the problem. Thanks, Matt Best regards, > Bernardo? > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Tue Nov 7 14:00:39 2017 From: mlohry at gmail.com (Mark Lohry) Date: Tue, 7 Nov 2017 15:00:39 -0500 Subject: [petsc-users] troubleshooting AMG, coupled navier stokes, large eigenvalues on coarsest level Message-ID: I've now got gamg running on matrix-free newton-krylov with the jacobian provided by coloring finite differences (thanks again for the help). 3D Poisson with 4th order DG or higher (35^2 blocks), gamg with default settings is giving textbook convergence, which is great. Of course coupled compressible navier-stokes is harder, and convergence is bad-to-nonexistent. 1) Doc says "Equations must be ordered in ?vertex-major? 
ordering"; in my discretization, each "node" has 5 coupled degrees of freedom (density, 3 x momentum, energy). I'm ordering my unknowns: rho_i, rhou_i, rhov_i, rhow_i, Et_i, rho_i+1, rhou_i+1, ... e.g. row-major matrix order if you wrote the unknowns [{rho}, {rhou}, ... ]. and then setting block size to 5. Is that correct? I've also tried using the actual sparsity of the matrix which has larger dense blocks (e.g. [35x5]^2), but neither seemed to help. 2) With default settings, and with -pc_gamg_square_graph, pc_gamg_sym_graph, agg_nsmooths 0 mentioned in the manual, the eigenvalue estimates explode on the coarsest level, which I don't see with poisson: Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 32 MPI processes type: chebyshev eigenvalue estimates used: min = 0.18994, max = 2.08935 eigenvalues estimate via gmres min 0.00933256, max 1.8994 Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 32 MPI processes type: chebyshev eigenvalue estimates used: min = 0.165969, max = 1.82566 eigenvalues estimate via gmres min 0.0290728, max 1.65969 Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 32 MPI processes type: chebyshev eigenvalue estimates used: min = 0.146479, max = 1.61126 eigenvalues estimate via gmres min 0.204673, max 1.46479 Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 32 MPI processes type: chebyshev eigenvalue estimates used: min = 6.81977e+09, max = 7.50175e+10 eigenvalues estimate via gmres min -2.76436e+12, max 6.81977e+10 What's happening here? (Full -ksp_view below) 3) I'm not very familiar with chebyshev smoothers, but they're only for SPD systems (?). Is this an inappropriate smoother for this problem? 4) With gmres, the preconditioned residual is ~10 orders larger than the true residual; and the preconditioned residual drops while the true residual rises. I'm assuming this means something very wrong? 5) -pc_type hyper -pc_hypre_type boomeramg also works perfectly for the poisson case, but hits NaN on the first cycle for NS. KSP Object: 32 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=50, initial guess is zero tolerances: relative=1e-10, absolute=1e-07, divergence=10. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 32 MPI processes type: gamg type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using externally compute Galerkin coarse grid matrices GAMG specific options Threshold for dropping small values in graph on each level = 0.1 0.1 0.1 Threshold scaling factor for each level not specified = 1. AGG specific options Symmetric graph true Number of levels to square graph 10 Number smoothing steps 0 Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 32 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 32 MPI processes type: bjacobi number of blocks = 32 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 0. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=0, cols=0, bs=5 package used to perform factorization: petsc total: nonzeros=1, allocated nonzeros=1 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=0, cols=0, bs=5 total: nonzeros=0, allocated nonzeros=0 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 32 MPI processes type: mpiaij rows=5, cols=5, bs=5 total: nonzeros=25, allocated nonzeros=25 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 32 MPI processes type: chebyshev eigenvalue estimates used: min = 0.18994, max = 2.08935 eigenvalues estimate via gmres min 0.00933256, max 1.8994 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 32 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_1_) 32 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 32 MPI processes type: mpiaij rows=70, cols=70, bs=5 total: nonzeros=1550, allocated nonzeros=1550 total number of mallocs used during MatSetValues calls =0 using nonscalable MatPtAP() implementation not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 32 MPI processes type: chebyshev eigenvalue estimates used: min = 0.165969, max = 1.82566 eigenvalues estimate via gmres min 0.0290728, max 1.65969 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_2_esteig_) 32 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_2_) 32 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 32 MPI processes type: mpiaij rows=270, cols=270, bs=5 total: nonzeros=7550, allocated nonzeros=7550 total number of mallocs used during MatSetValues calls =0 using nonscalable MatPtAP() implementation not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 32 MPI processes type: chebyshev eigenvalue estimates used: min = 0.146479, max = 1.61126 eigenvalues estimate via gmres min 0.204673, max 1.46479 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_3_esteig_) 32 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_3_) 32 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 32 MPI processes type: mpiaij rows=1610, cols=1610, bs=5 total: nonzeros=55550, allocated nonzeros=55550 total number of mallocs used during MatSetValues calls =0 using nonscalable MatPtAP() implementation using I-node (on process 0) routines: found 6 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 32 MPI processes type: chebyshev eigenvalue estimates used: min = 6.81977e+09, max = 7.50175e+10 eigenvalues estimate via gmres min -2.76436e+12, max 6.81977e+10 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_4_esteig_) 32 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_4_) 32 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix followed by preconditioner matrix: Mat Object: 32 MPI processes type: mffd rows=153600, cols=153600 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: 32 MPI processes type: mpiaij rows=153600, cols=153600, bs=5 total: nonzeros=65280000, allocated nonzeros=65280000 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 960 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix followed by preconditioner matrix: Mat Object: 32 MPI processes type: mffd rows=153600, cols=153600 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: 32 MPI processes type: mpiaij rows=153600, cols=153600, bs=5 total: nonzeros=65280000, allocated nonzeros=65280000 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 960 nodes, limit used is 5 1 SNES Function norm 5.917486103148e+05 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Nov 7 19:37:01 2017 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 7 Nov 2017 20:37:01 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Humm, this looks a little odd, but it may be OK. Is this this diffing with the old non-repartition data? (more below) On Tue, Nov 7, 2017 at 11:45 AM, Hong wrote: > Mark, > The fix is merged to next branch for tests which show diff as > > ******* Testing: testexamples_PARMETIS ******* > 5c5 > < 1 SNES Function norm 1.983e-10 > --- > > 1 SNES Function norm 1.990e-10 > 10,13c10,13 > < 0-cells: 8 8 8 8 > < 1-cells: 12 12 12 12 > < 2-cells: 6 6 6 6 > < 3-cells: 1 1 1 1 > > I assume this is the old. > --- > > 0-cells: 12 12 0 0 > > 1-cells: 20 20 0 0 > > 2-cells: 11 11 0 0 > > 3-cells: 2 2 0 0 > 15,18c15,18 > > and this is the new. This is funny because the processors are not fully populated. This can happen on coarse grids and indeed it should happen in a test with good coverage. I assume these diffs are views from coarse grids? That is, in the raw output files do you see fully populated fine grids, with no diffs, and then the diffs come on coarse grids. Repartitioning the coarse grids can change the coarsening, It is possible that repartitioning causes faster coarsening (it does a little) and this faster coarsening is tripping the aggregation switch, which gives us empty processors. Am I understanding this correctly ... Thanks, Mark > < boundary: 1 strata with value/size (1 (23)) > < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) > < marker: 1 strata with value/size (1 (15)) > < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) > --- > > boundary: 1 strata with value/size (1 (39)) > > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) > > marker: 1 strata with value/size (1 (27)) > > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) > > see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log > > I guess parmetis produces random partition on different machines (I made output file for ex56_1 on my imac). Please take a look at the differences. 
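(One way to take the machine-to-machine randomness out of that view, if the mesh distribution really is what is diffing, might be to pin the partitioner the way the two-process run further down the thread does, e.g. adding
   -petscpartitioner_type simple
to the test options -- but that is only a guess at the cause.)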
If the outputs are correct, I will remove option '-ex56_dm_view' > > Hong > > > On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: > >> Mark: >> Bug is fixed in branch hzhang/fix-submat_samerowdist >> https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_samerowdist >> >> I also add the test runex56. Please test it and let me know if there is a >> problem. >> Hong >> >> Also, I have been using -petscpartition_type but now I see >>> -pc_gamg_mat_partitioning_type. Is -petscpartition_type depreciated for >>> GAMG? >>> >>> Is this some sort of auto generated portmanteau? I can not find >>> pc_gamg_mat_partitioning_type in the source. >>> >>> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: >>> >>>> Great, thanks, >>>> >>>> And could you please add these parameters to a regression test? As I >>>> recall we have with-parmetis regression test. >>>> >>>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>>> >>>>> Mark: >>>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>>> Now testing src/snes/examples/tutorials/ex56.c with your options, I >>>>> can reproduce the error. >>>>> I'll fix it. >>>>> >>>>> Hong >>>>> >>>>> Hong, >>>>>> >>>>>> I've tested with master and I get the same error. Maybe the >>>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>>> me. >>>>>> >>>>>> Can you run this (snes ex56) w/o the error? >>>>>> >>>>>> >>>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ make >>>>>> runex >>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 >>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -matrap 0 >>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>> [0] 27 global equations, 9 vertices >>>>>> [0] 27 equations in vector, 9 vertices >>>>>> 0 SNES Function norm 122.396 >>>>>> 0 KSP Residual norm 122.396 >>>>>> >>>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>>>>> [0] 4725 global equations, 1575 vertices >>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>> 0 SNES Function norm 17.9091 >>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>> -------------------------------------------------------------- >>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>> ocumentation/faq.html for trouble shooting. >>>>>> >>>>>> >>>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>>>>> >>>>>>> Mark : >>>>>>> I realize that using maint or master branch, I cannot reproduce the >>>>>>> same error. 
>>>>>>> For this example, you must use a parallel partitioner, >>>>>>> e.g.,'current' gives me following error: >>>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently >>>>>>> only supports one domain per processor >>>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch for >>>>>>> more than one subdomain per processor >>>>>>> >>>>>>> Please rebase your branch with maint or master, then see if you >>>>>>> still have problem. >>>>>>> >>>>>>> Hong >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >>>>>>>> >>>>>>>>> Mark, >>>>>>>>> I can reproduce this in an old branch, but not in current maint >>>>>>>>> and master. >>>>>>>>> Which branch are you using to produce this error? >>>>>>>>> >>>>>>>> >>>>>>>> I am using a branch from Matt. Let me try to merge it with master. >>>>>>>> >>>>>>>> >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I am able to reproduce this with snes ex56 with 2 processors and >>>>>>>>>> adding -pc_gamg_repartition true >>>>>>>>>> >>>>>>>>>> I'm not sure how to fix it. >>>>>>>>>> >>>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 >>>>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>>>> jacobi -petscpartitioner_type simple -mat_block_size 3 -matrap 0 >>>>>>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>> iterations 1 >>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>> type: plex >>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>> 0-cells: 12 12 >>>>>>>>>> 1-cells: 20 20 >>>>>>>>>> 2-cells: 11 11 >>>>>>>>>> 3-cells: 2 2 >>>>>>>>>> Labels: >>>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>>> Face Sets: 5 
strata with value/size (1 (2), 2 (2), 3 (2), 5 >>>>>>>>>> (1), 6 (1)) >>>>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>> iterations 1 >>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>> type: plex >>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>> 0-cells: 45 45 >>>>>>>>>> 1-cells: 96 96 >>>>>>>>>> 2-cells: 68 68 >>>>>>>>>> 3-cells: 16 16 >>>>>>>>>> Labels: >>>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 >>>>>>>>>> (9), 6 (9)) >>>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 (16)) >>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>> [1]PETSC ERROR: No support for this operation for this object type >>>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>>> Doing some additional testing, the issue goes away when >>>>>>>>>>>> removing the gamg preconditioner line from the petsc.rc: >>>>>>>>>>>> -pc_type gamg >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>>> >>>>>>>>>>> This is the code. findices is create with ISCreateStride, so it >>>>>>>>>>> is sorted ... >>>>>>>>>>> >>>>>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't have >>>>>>>>>>> a regression test with this... >>>>>>>>>>> >>>>>>>>>>> I will try to reproduce this. >>>>>>>>>>> >>>>>>>>>>> Michael: you can use hypre for now, or turn repartitioning off >>>>>>>>>>> (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), >>>>>>>>>>> but I'm not sure this will fix this. >>>>>>>>>>> >>>>>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>>>>> solvers. 
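Spelled out for the split mentioned above, that would look something like the following -- just a sketch, reusing the -fsa_fieldsplit_lambda_upper_ prefix from the repartition option; use whatever prefixes -ksp_view reports for your other solves:
   -fsa_fieldsplit_lambda_upper_pc_type hypre
   -fsa_fieldsplit_lambda_upper_pc_hypre_type boomeramg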
I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>>>>> good enough for you. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> { >>>>>>>>>>> IS findices; >>>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>>> Mat Pnew; >>>>>>>>>>> >>>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>> #endif >>>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>>>>>>>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>>> >>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>> #endif >>>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>>> >>>>>>>>>>> /* output - repartitioned */ >>>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Randy: >>>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>>> Hong >>>>>>>>>>>>> >>>>>>>>>>>>> OK, this might not be completely satisfactory, because it >>>>>>>>>>>>>> doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>>> It would be helpful if you can provide us a simple example >>>>>>>>>>>>>>> that produces this example. >>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm running a Fortran code that was just changed over to >>>>>>>>>>>>>>>> using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>>> (see full error below). I tried to trace down the difference in the source >>>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. 
It >>>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 >>>>>>>>>>>>>>>> in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>> 8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>> 8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hzhang at mcs.anl.gov Tue Nov 7 21:13:03 2017 From: hzhang at mcs.anl.gov (Hong) Date: Tue, 7 Nov 2017 21:13:03 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark: I removed option '-ex56_dm_view'. Hong Humm, this looks a little odd, but it may be OK. Is this this diffing with > the old non-repartition data? (more below) > > On Tue, Nov 7, 2017 at 11:45 AM, Hong wrote: > >> Mark, >> The fix is merged to next branch for tests which show diff as >> >> ******* Testing: testexamples_PARMETIS ******* >> 5c5 >> < 1 SNES Function norm 1.983e-10 >> --- >> > 1 SNES Function norm 1.990e-10 >> 10,13c10,13 >> < 0-cells: 8 8 8 8 >> < 1-cells: 12 12 12 12 >> < 2-cells: 6 6 6 6 >> < 3-cells: 1 1 1 1 >> >> > I assume this is the old. > > >> --- >> > 0-cells: 12 12 0 0 >> > 1-cells: 20 20 0 0 >> > 2-cells: 11 11 0 0 >> > 3-cells: 2 2 0 0 >> 15,18c15,18 >> >> > and this is the new. > > This is funny because the processors are not fully populated. This can > happen on coarse grids and indeed it should happen in a test with good > coverage. > > I assume these diffs are views from coarse grids? That is, in the raw > output files do you see fully populated fine grids, with no diffs, and then > the diffs come on coarse grids. > > Repartitioning the coarse grids can change the coarsening, It is possible > that repartitioning causes faster coarsening (it does a little) and this > faster coarsening is tripping the aggregation switch, which gives us empty > processors. > > Am I understanding this correctly ... > > Thanks, > Mark > > >> < boundary: 1 strata with value/size (1 (23)) >> < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) >> < marker: 1 strata with value/size (1 (15)) >> < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) >> --- >> > boundary: 1 strata with value/size (1 (39)) >> > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) >> > marker: 1 strata with value/size (1 (27)) >> > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >> >> see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log >> >> I guess parmetis produces random partition on different machines (I made output file for ex56_1 on my imac). Please take a look at the differences. If the outputs are correct, I will remove option '-ex56_dm_view' >> >> Hong >> >> >> On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: >> >>> Mark: >>> Bug is fixed in branch hzhang/fix-submat_samerowdist >>> https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_samerowdist >>> >>> I also add the test runex56. Please test it and let me know if there is >>> a problem. >>> Hong >>> >>> Also, I have been using -petscpartition_type but now I see >>>> -pc_gamg_mat_partitioning_type. Is -petscpartition_type depreciated >>>> for GAMG? >>>> >>>> Is this some sort of auto generated portmanteau? I can not find >>>> pc_gamg_mat_partitioning_type in the source. >>>> >>>> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: >>>> >>>>> Great, thanks, >>>>> >>>>> And could you please add these parameters to a regression test? As I >>>>> recall we have with-parmetis regression test. >>>>> >>>>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>>>> >>>>>> Mark: >>>>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>>>> Now testing src/snes/examples/tutorials/ex56.c with your options, I >>>>>> can reproduce the error. >>>>>> I'll fix it. 
>>>>>> >>>>>> Hong >>>>>> >>>>>> Hong, >>>>>>> >>>>>>> I've tested with master and I get the same error. Maybe the >>>>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>>>> me. >>>>>>> >>>>>>> Can you run this (snes ex56) w/o the error? >>>>>>> >>>>>>> >>>>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ make >>>>>>> runex >>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 >>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -matrap 0 >>>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>>> [0] 27 global equations, 9 vertices >>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>> 0 SNES Function norm 122.396 >>>>>>> 0 KSP Residual norm 122.396 >>>>>>> >>>>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>> 0 SNES Function norm 17.9091 >>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>> -------------------------------------------------------------- >>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>> >>>>>>> >>>>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>>>>>> >>>>>>>> Mark : >>>>>>>> I realize that using maint or master branch, I cannot reproduce the >>>>>>>> same error. >>>>>>>> For this example, you must use a parallel partitioner, >>>>>>>> e.g.,'current' gives me following error: >>>>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it currently >>>>>>>> only supports one domain per processor >>>>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch >>>>>>>> for more than one subdomain per processor >>>>>>>> >>>>>>>> Please rebase your branch with maint or master, then see if you >>>>>>>> still have problem. >>>>>>>> >>>>>>>> Hong >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >>>>>>>>> >>>>>>>>>> Mark, >>>>>>>>>> I can reproduce this in an old branch, but not in current maint >>>>>>>>>> and master. >>>>>>>>>> Which branch are you using to produce this error? >>>>>>>>>> >>>>>>>>> >>>>>>>>> I am using a branch from Matt. Let me try to merge it with master. >>>>>>>>> >>>>>>>>> >>>>>>>>>> Hong >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I am able to reproduce this with snes ex56 with 2 processors and >>>>>>>>>>> adding -pc_gamg_repartition true >>>>>>>>>>> >>>>>>>>>>> I'm not sure how to fix it. 
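For reference, stripped of GAMG, the failing call pattern comes down to something like the sketch below. This is untested, and it assumes MatCreateSubMatrix_MPIAIJ() takes the SameRowDist path whenever the row IS is just the ownership range, which is what the ISCreateStride() call in PCGAMGCreateLevel_GAMG() produces; the reversed column IS simply stands in for an unsorted new_eq_indices:

#include <petscmat.h>

int main(int argc,char **argv)
{
  Mat            A,B;
  IS             isrow,iscol;
  PetscInt       i,n,Istart,Iend,N = 64,*cols;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* a trivial distributed AIJ matrix, diagonal only */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,N,N,1,NULL,0,NULL,&A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
  for (i=Istart; i<Iend; i++) {ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);}
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  n    = Iend - Istart;
  /* rows: exactly the owned rows, like the ISCreateStride() in PCGAMGCreateLevel_GAMG() */
  ierr = ISCreateStride(PETSC_COMM_WORLD,n,Istart,1,&isrow);CHKERRQ(ierr);
  /* columns: same local size but deliberately unsorted (reversed) */
  ierr = PetscMalloc1(n,&cols);CHKERRQ(ierr);
  for (i=0; i<n; i++) cols[i] = Iend - 1 - i;
  ierr = ISCreateGeneral(PETSC_COMM_WORLD,n,cols,PETSC_OWN_POINTER,&iscol);CHKERRQ(ierr);
  /* with 3.8.0 and more than one rank this is where the error should show up */
  ierr = MatCreateSubMatrix(A,isrow,iscol,MAT_INITIAL_MATRIX,&B);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = ISDestroy(&isrow);CHKERRQ(ierr);
  ierr = ISDestroy(&iscol);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

On two or more ranks with 3.8.0 that last MatCreateSubMatrix() should hit the same "unsorted iscol_local is not implemented yet" check, if the assumption about the dispatch holds.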
>>>>>>>>>>> >>>>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 2 >>>>>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>> -mg_levels_pc_type jacobi -petscpartitioner_type simple -mat_block_size 3 >>>>>>>>>>> -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition >>>>>>>>>>> true >>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>> iterations 1 >>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>> type: plex >>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>> 0-cells: 12 12 >>>>>>>>>>> 1-cells: 20 20 >>>>>>>>>>> 2-cells: 11 11 >>>>>>>>>>> 3-cells: 2 2 >>>>>>>>>>> Labels: >>>>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 >>>>>>>>>>> (1), 6 (1)) >>>>>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>>>> Linear solve converged due to 
CONVERGED_RTOL iterations 15 >>>>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>> iterations 1 >>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>> type: plex >>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>> 0-cells: 45 45 >>>>>>>>>>> 1-cells: 96 96 >>>>>>>>>>> 2-cells: 68 68 >>>>>>>>>>> 3-cells: 16 16 >>>>>>>>>>> Labels: >>>>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), 5 >>>>>>>>>>> (9), 6 (9)) >>>>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 >>>>>>>>>>> (16)) >>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object >>>>>>>>>>> type >>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>> [1]PETSC ERROR: No support for this operation for this object >>>>>>>>>>> type >>>>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Doing some additional testing, the issue goes away when >>>>>>>>>>>>> removing the gamg preconditioner line from the petsc.rc: >>>>>>>>>>>>> -pc_type gamg >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>>>> >>>>>>>>>>>> This is the code. findices is create with ISCreateStride, so >>>>>>>>>>>> it is sorted ... >>>>>>>>>>>> >>>>>>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't have >>>>>>>>>>>> a regression test with this... >>>>>>>>>>>> >>>>>>>>>>>> I will try to reproduce this. >>>>>>>>>>>> >>>>>>>>>>>> Michael: you can use hypre for now, or turn repartitioning off >>>>>>>>>>>> (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), >>>>>>>>>>>> but I'm not sure this will fix this. >>>>>>>>>>>> >>>>>>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>>>>>> solvers. I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>>>>>> good enough for you. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> { >>>>>>>>>>>> IS findices; >>>>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>>>> Mat Pnew; >>>>>>>>>>>> >>>>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>> #endif >>>>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, new_eq_indices, >>>>>>>>>>>> MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>>>> >>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>> #endif >>>>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>>>> >>>>>>>>>>>> /* output - repartitioned */ >>>>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>>>> Hong >>>>>>>>>>>>>> >>>>>>>>>>>>>> OK, this might not be completely satisfactory, because it >>>>>>>>>>>>>>> doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>>>> It would be helpful if you can provide us a simple example >>>>>>>>>>>>>>>> that produces this example. >>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm running a Fortran code that was just changed over to >>>>>>>>>>>>>>>>> using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>>>> (see full error below). I tried to trace down the difference in the source >>>>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. 
It >>>>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented >>>>>>>>>>>>>>>>> yet >>>>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 >>>>>>>>>>>>>>>>> in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>> 8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bernardomartinsrocha at gmail.com Wed Nov 8 04:51:01 2017 From: bernardomartinsrocha at gmail.com (Bernardo Rocha) Date: Wed, 8 Nov 2017 08:51:01 -0200 Subject: [petsc-users] Performance of Fieldsplit PC In-Reply-To: References: Message-ID: OK I see what you mean now. Thanks a lot for the help, playing with the inner tolerances was the key to improve the performance of the PCFieldSplit preconditioner. The only question which still remains is when should one use preonly for the inner solves. You could try to play games with the inner tolerances (say run the blocks > only to 10^-4). However, > the fact remains that this is not even a credible solver for the problem. > > Thanks, > > Matt > > Best regards, >> Bernardo? >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Nov 8 06:53:12 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 8 Nov 2017 07:53:12 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Hong, is > 0-cells: 12 12 0 0 > 1-cells: 20 20 0 0 > 2-cells: 11 11 0 0 > 3-cells: 2 2 0 0 from the old version? I am seeing 0-cells: 8 8 8 8 1-cells: 12 12 12 12 2-cells: 6 6 6 6 3-cells: 1 1 1 1 Thanks, On Tue, Nov 7, 2017 at 10:13 PM, Hong wrote: > Mark: > I removed option '-ex56_dm_view'. > Hong > > Humm, this looks a little odd, but it may be OK. Is this this diffing >> with the old non-repartition data? (more below) >> >> On Tue, Nov 7, 2017 at 11:45 AM, Hong wrote: >> >>> Mark, >>> The fix is merged to next branch for tests which show diff as >>> >>> ******* Testing: testexamples_PARMETIS ******* >>> 5c5 >>> < 1 SNES Function norm 1.983e-10 >>> --- >>> > 1 SNES Function norm 1.990e-10 >>> 10,13c10,13 >>> < 0-cells: 8 8 8 8 >>> < 1-cells: 12 12 12 12 >>> < 2-cells: 6 6 6 6 >>> < 3-cells: 1 1 1 1 >>> >>> >> I assume this is the old. >> >> >>> --- >>> > 0-cells: 12 12 0 0 >>> > 1-cells: 20 20 0 0 >>> > 2-cells: 11 11 0 0 >>> > 3-cells: 2 2 0 0 >>> 15,18c15,18 >>> >>> >> and this is the new. >> >> This is funny because the processors are not fully populated. This can >> happen on coarse grids and indeed it should happen in a test with good >> coverage. >> >> I assume these diffs are views from coarse grids? That is, in the raw >> output files do you see fully populated fine grids, with no diffs, and then >> the diffs come on coarse grids. >> >> Repartitioning the coarse grids can change the coarsening, It is possible >> that repartitioning causes faster coarsening (it does a little) and this >> faster coarsening is tripping the aggregation switch, which gives us empty >> processors. >> >> Am I understanding this correctly ... 
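(If I have the knob right, that switch is the coarse-grid process reduction,
   -pc_gamg_process_eq_limit   (50 by default, I think)
-- once a level has fewer equations per rank than that, GAMG folds it onto a subset of the ranks and the rest go empty.)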
>> >> Thanks, >> Mark >> >> >>> < boundary: 1 strata with value/size (1 (23)) >>> < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) >>> < marker: 1 strata with value/size (1 (15)) >>> < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) >>> --- >>> > boundary: 1 strata with value/size (1 (39)) >>> > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) >>> > marker: 1 strata with value/size (1 (27)) >>> > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>> >>> see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log >>> >>> I guess parmetis produces random partition on different machines (I made output file for ex56_1 on my imac). Please take a look at the differences. If the outputs are correct, I will remove option '-ex56_dm_view' >>> >>> Hong >>> >>> >>> On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: >>> >>>> Mark: >>>> Bug is fixed in branch hzhang/fix-submat_samerowdist >>>> https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_samerowdist >>>> >>>> I also add the test runex56. Please test it and let me know if there is >>>> a problem. >>>> Hong >>>> >>>> Also, I have been using -petscpartition_type but now I see >>>>> -pc_gamg_mat_partitioning_type. Is -petscpartition_type depreciated >>>>> for GAMG? >>>>> >>>>> Is this some sort of auto generated portmanteau? I can not find >>>>> pc_gamg_mat_partitioning_type in the source. >>>>> >>>>> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: >>>>> >>>>>> Great, thanks, >>>>>> >>>>>> And could you please add these parameters to a regression test? As I >>>>>> recall we have with-parmetis regression test. >>>>>> >>>>>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>>>>> >>>>>>> Mark: >>>>>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>>>>> Now testing src/snes/examples/tutorials/ex56.c with your options, I >>>>>>> can reproduce the error. >>>>>>> I'll fix it. >>>>>>> >>>>>>> Hong >>>>>>> >>>>>>> Hong, >>>>>>>> >>>>>>>> I've tested with master and I get the same error. Maybe the >>>>>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>>>>> me. >>>>>>>> >>>>>>>> Can you run this (snes ex56) w/o the error? 
>>>>>>>> >>>>>>>> >>>>>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ >>>>>>>> make runex >>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 >>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -matrap 0 >>>>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>> 0 SNES Function norm 122.396 >>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>> >>>>>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>> -------------------------------------------------------------- >>>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>>>>>>> >>>>>>>>> Mark : >>>>>>>>> I realize that using maint or master branch, I cannot reproduce >>>>>>>>> the same error. >>>>>>>>> For this example, you must use a parallel partitioner, >>>>>>>>> e.g.,'current' gives me following error: >>>>>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it >>>>>>>>> currently only supports one domain per processor >>>>>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch >>>>>>>>> for more than one subdomain per processor >>>>>>>>> >>>>>>>>> Please rebase your branch with maint or master, then see if you >>>>>>>>> still have problem. >>>>>>>>> >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong wrote: >>>>>>>>>> >>>>>>>>>>> Mark, >>>>>>>>>>> I can reproduce this in an old branch, but not in current maint >>>>>>>>>>> and master. >>>>>>>>>>> Which branch are you using to produce this error? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I am using a branch from Matt. Let me try to merge it with master. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Hong >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> I am able to reproduce this with snes ex56 with 2 processors >>>>>>>>>>>> and adding -pc_gamg_repartition true >>>>>>>>>>>> >>>>>>>>>>>> I'm not sure how to fix it. 
>>>>>>>>>>>> >>>>>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n >>>>>>>>>>>> 2 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>>> -mg_levels_pc_type jacobi -petscpartitioner_type simple -mat_block_size 3 >>>>>>>>>>>> -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition >>>>>>>>>>>> true >>>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>> iterations 1 >>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>> type: plex >>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>> 0-cells: 12 12 >>>>>>>>>>>> 1-cells: 20 20 >>>>>>>>>>>> 2-cells: 11 11 >>>>>>>>>>>> 3-cells: 2 2 >>>>>>>>>>>> Labels: >>>>>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 >>>>>>>>>>>> (1), 6 (1)) >>>>>>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 >>>>>>>>>>>> (2)) >>>>>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>>>>>> 14 KSP Residual norm 1.3028e-09 
>>>>>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>> iterations 1 >>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>> type: plex >>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>> 0-cells: 45 45 >>>>>>>>>>>> 1-cells: 96 96 >>>>>>>>>>>> 2-cells: 68 68 >>>>>>>>>>>> 3-cells: 16 16 >>>>>>>>>>>> Labels: >>>>>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), >>>>>>>>>>>> 5 (9), 6 (9)) >>>>>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 >>>>>>>>>>>> (16)) >>>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>> type >>>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>> [1]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>> type >>>>>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Doing some additional testing, the issue goes away when >>>>>>>>>>>>>> removing the gamg preconditioner line from the petsc.rc: >>>>>>>>>>>>>> -pc_type gamg >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>>>>> >>>>>>>>>>>>> This is the code. findices is create with ISCreateStride, so >>>>>>>>>>>>> it is sorted ... >>>>>>>>>>>>> >>>>>>>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't >>>>>>>>>>>>> have a regression test with this... >>>>>>>>>>>>> >>>>>>>>>>>>> I will try to reproduce this. >>>>>>>>>>>>> >>>>>>>>>>>>> Michael: you can use hypre for now, or turn repartitioning off >>>>>>>>>>>>> (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition false), >>>>>>>>>>>>> but I'm not sure this will fix this. >>>>>>>>>>>>> >>>>>>>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>>>>>>> solvers. I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>>>>>>> good enough for you. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> { >>>>>>>>>>>>> IS findices; >>>>>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>>>>> Mat Pnew; >>>>>>>>>>>>> >>>>>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>> #endif >>>>>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, >>>>>>>>>>>>> new_eq_indices, MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>>>>> >>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>> #endif >>>>>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>>>>> >>>>>>>>>>>>> /* output - repartitioned */ >>>>>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> OK, this might not be completely satisfactory, because it >>>>>>>>>>>>>>>> doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>>>>> It would be helpful if you can provide us a simple example >>>>>>>>>>>>>>>>> that produces this example. >>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm running a Fortran code that was just changed over to >>>>>>>>>>>>>>>>>> using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>>>>> (see full error below). I tried to trace down the difference in the source >>>>>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. 
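Since findices in the fragment above comes from ISCreateStride() with stride 1 and is therefore sorted, the unsorted index set the error complains about is presumably new_eq_indices produced by the repartitioning. A small diagnostic sketch follows (not a fix, and the helper name is hypothetical); it could be called just before the MatCreateSubMatrix() line shown above to confirm which index set is unsorted.

#include <petscis.h>

/* Hypothetical helper: report whether the row/column index sets are
   sorted before they are handed to MatCreateSubMatrix(). Diagnostic only. */
static PetscErrorCode CheckISSorted(IS rowis, IS colis)
{
  PetscBool      rsorted, csorted;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = ISSorted(rowis, &rsorted);CHKERRQ(ierr);
  ierr = ISSorted(colis, &csorted);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF, "row IS sorted? %d   col IS sorted? %d\n",
                     (int)rsorted, (int)csorted);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* usage, just before the MatCreateSubMatrix() call shown above:
   ierr = CheckISSorted(findices, new_eq_indices);CHKERRQ(ierr); */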
It >>>>>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented >>>>>>>>>>>>>>>>>> yet >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 >>>>>>>>>>>>>>>>>> in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>> 8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> R. 
Michael Churchill >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lukas.drinkt.thee at gmail.com Wed Nov 8 08:36:28 2017 From: lukas.drinkt.thee at gmail.com (Lukas van de Wiel) Date: Wed, 8 Nov 2017 15:36:28 +0100 Subject: [petsc-users] linking PETSC with SSL Message-ID: Good day, during an upgrade of PETSc (from 3.4.2 to 3.7.7) and some changes needed to use MUMPs I ran into a problem with SSL that I could not really figure out. ******************************************************** I have configured PETSc 3.7.7 using: (explicitly note the option --with-ssl=0 ) ./configure \ COPTFLAGS='-O3 -march=native -mtune=native' \ CXXOPTFLAGS='-O3 -march=native -mtune=native' \ FOPTFLAGS='-O3 -march=native -mtune=native' \ --with-debugging=0 \ --with-x=0 \ --with-ssl=0 \ --with-shared-libraries=0 \ --download-metis \ --download-parmetis \ --download-fblaslapack \ --download-scalapack \ --download-openmpi \ --download-mumps \ --download-hypre \ --download-ptscotch ******************************************************** Next I have a tiny Fortran program to illustrate the problem: program sslhuh implicit none #include "petsc/finclude/petscsys.h" write(*,*) "Hello World" end program ******************************************************** I compile it using makefile: petscDir = /net/home/gtecton/sw_dev/petsc-3.7.7 all: compile link compile: sslhuh.F $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ -c \ -o sslhuh.o \ -std=f2008 \ -ffree-form \ -I$(petscDir)/include \ -I$(petscDir)/include/petsc/finclude \ sslhuh.F link: sslhuh.o $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ -o sslhuh \ sslhuh.o \ $(petscDir)/linux-gnu-x86_64/lib/libpetsc.a \ $(petscDir)/linux-gnu-x86_64/lib/libHYPRE.a \ $(petscDir)/linux-gnu-x86_64/lib/libzmumps.a \ $(petscDir)/linux-gnu-x86_64/lib/libsmumps.a \ $(petscDir)/linux-gnu-x86_64/lib/libdmumps.a \ $(petscDir)/linux-gnu-x86_64/lib/libcmumps.a \ $(petscDir)/linux-gnu-x86_64/lib/libmumps_common.a \ $(petscDir)/linux-gnu-x86_64/lib/libesmumps.a \ $(petscDir)/linux-gnu-x86_64/lib/libparmetis.a \ $(petscDir)/linux-gnu-x86_64/lib/libmetis.a \ $(petscDir)/linux-gnu-x86_64/lib/libpord.a \ $(petscDir)/linux-gnu-x86_64/lib/libscalapack.a \ $(petscDir)/linux-gnu-x86_64/lib/libflapack.a \ $(petscDir)/linux-gnu-x86_64/lib/libfblas.a \ $(petscDir)/linux-gnu-x86_64/lib/libptscotch.a \ $(petscDir)/linux-gnu-x86_64/lib/libptscotcherr.a \ $(petscDir)/linux-gnu-x86_64/lib/libptscotcherrexit.a \ $(petscDir)/linux-gnu-x86_64/lib/libptscotchparmetis.a \ $(petscDir)/linux-gnu-x86_64/lib/libscotch.a \ $(petscDir)/linux-gnu-x86_64/lib/libscotcherr.a \ $(petscDir)/linux-gnu-x86_64/lib/libscotcherrexit.a \ $(petscDir)/linux-gnu-x86_64/lib/libscotchmetis.a \ -ldl ******************************************************** Compiling works fine. 
Linking gives me the error: /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): In function `PetscSSLInitializeContext': client.c:(.text+0x3b): undefined reference to `SSLv23_method' client.c:(.text+0x43): undefined reference to `SSL_CTX_new' client.c:(.text+0x5a): undefined reference to `SSL_CTX_ctrl' client.c:(.text+0x79): undefined reference to `SSL_library_init' client.c:(.text+0x7e): undefined reference to `SSL_load_error_strings' client.c:(.text+0x8c): undefined reference to `BIO_new_fp' /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): In function `PetscSSLDestroyContext': client.c:(.text+0xa5): undefined reference to `SSL_CTX_free' /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): In function `PetscHTTPSRequest': client.c:(.text+0xa9e): undefined reference to `SSL_write' client.c:(.text+0xaab): undefined reference to `SSL_get_error' client.c:(.text+0xb40): undefined reference to `SSL_read' client.c:(.text+0xb50): undefined reference to `SSL_get_error' client.c:(.text+0xbde): undefined reference to `SSL_free' client.c:(.text+0xc74): undefined reference to `SSL_shutdown' /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): In function `PetscHTTPSConnect': client.c:(.text+0x105c): undefined reference to `SSL_new' client.c:(.text+0x106a): undefined reference to `BIO_new_socket' client.c:(.text+0x1079): undefined reference to `SSL_set_bio' client.c:(.text+0x1082): undefined reference to `SSL_connect' collect2: ld returned 1 exit status make: *** [link] Error 1 ****************************************************** >From what I see, the SSL functionality is in client.o, which is part of libpetsc.a, which is included in the linking. There is no difference in whether I turn --with-ssl on or off... Does anybody have an idea what I am doing wrong? Best wishes and thank you for your time! Lukas From balay at mcs.anl.gov Wed Nov 8 08:54:08 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 8 Nov 2017 08:54:08 -0600 Subject: [petsc-users] linking PETSC with SSL In-Reply-To: References: Message-ID: Hm --with-ssl=0 should work. Can you do a fresh/clean build and see if the problem persists? If so send configure.log make.log and test.log for this build. BTW: 3.8 has better support for fortran usage - so you might consider upgrading all the way to it. https://www.mcs.anl.gov/petsc/documentation/changes/38.html http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran Satish On Wed, 8 Nov 2017, Lukas van de Wiel wrote: > Good day, > > during an upgrade of PETSc (from 3.4.2 to 3.7.7) and some changes > needed to use MUMPs I ran into a problem with SSL that I could not > really figure out. 
> > ******************************************************** > I have configured PETSc 3.7.7 using: (explicitly note the option --with-ssl=0 ) > > > ./configure \ > COPTFLAGS='-O3 -march=native -mtune=native' \ > CXXOPTFLAGS='-O3 -march=native -mtune=native' \ > FOPTFLAGS='-O3 -march=native -mtune=native' \ > --with-debugging=0 \ > --with-x=0 \ > --with-ssl=0 \ > --with-shared-libraries=0 \ > --download-metis \ > --download-parmetis \ > --download-fblaslapack \ > --download-scalapack \ > --download-openmpi \ > --download-mumps \ > --download-hypre \ > --download-ptscotch > > > ******************************************************** > Next I have a tiny Fortran program to illustrate the problem: > > > program sslhuh > implicit none > #include "petsc/finclude/petscsys.h" > write(*,*) "Hello World" > end program > > > > ******************************************************** > I compile it using makefile: > > > petscDir = /net/home/gtecton/sw_dev/petsc-3.7.7 > > all: compile link > > compile: sslhuh.F > $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ > -c \ > -o sslhuh.o \ > -std=f2008 \ > -ffree-form \ > -I$(petscDir)/include \ > -I$(petscDir)/include/petsc/finclude \ > sslhuh.F > > link: sslhuh.o > $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ > -o sslhuh \ > sslhuh.o \ > $(petscDir)/linux-gnu-x86_64/lib/libpetsc.a \ > $(petscDir)/linux-gnu-x86_64/lib/libHYPRE.a \ > $(petscDir)/linux-gnu-x86_64/lib/libzmumps.a \ > $(petscDir)/linux-gnu-x86_64/lib/libsmumps.a \ > $(petscDir)/linux-gnu-x86_64/lib/libdmumps.a \ > $(petscDir)/linux-gnu-x86_64/lib/libcmumps.a \ > $(petscDir)/linux-gnu-x86_64/lib/libmumps_common.a \ > $(petscDir)/linux-gnu-x86_64/lib/libesmumps.a \ > $(petscDir)/linux-gnu-x86_64/lib/libparmetis.a \ > $(petscDir)/linux-gnu-x86_64/lib/libmetis.a \ > $(petscDir)/linux-gnu-x86_64/lib/libpord.a \ > $(petscDir)/linux-gnu-x86_64/lib/libscalapack.a \ > $(petscDir)/linux-gnu-x86_64/lib/libflapack.a \ > $(petscDir)/linux-gnu-x86_64/lib/libfblas.a \ > $(petscDir)/linux-gnu-x86_64/lib/libptscotch.a \ > $(petscDir)/linux-gnu-x86_64/lib/libptscotcherr.a \ > $(petscDir)/linux-gnu-x86_64/lib/libptscotcherrexit.a \ > $(petscDir)/linux-gnu-x86_64/lib/libptscotchparmetis.a \ > $(petscDir)/linux-gnu-x86_64/lib/libscotch.a \ > $(petscDir)/linux-gnu-x86_64/lib/libscotcherr.a \ > $(petscDir)/linux-gnu-x86_64/lib/libscotcherrexit.a \ > $(petscDir)/linux-gnu-x86_64/lib/libscotchmetis.a \ > -ldl > > > ******************************************************** > Compiling works fine. 
> Linking gives me the error: > > > /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > In function `PetscSSLInitializeContext': > client.c:(.text+0x3b): undefined reference to `SSLv23_method' > client.c:(.text+0x43): undefined reference to `SSL_CTX_new' > client.c:(.text+0x5a): undefined reference to `SSL_CTX_ctrl' > client.c:(.text+0x79): undefined reference to `SSL_library_init' > client.c:(.text+0x7e): undefined reference to `SSL_load_error_strings' > client.c:(.text+0x8c): undefined reference to `BIO_new_fp' > /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > In function `PetscSSLDestroyContext': > client.c:(.text+0xa5): undefined reference to `SSL_CTX_free' > /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > In function `PetscHTTPSRequest': > client.c:(.text+0xa9e): undefined reference to `SSL_write' > client.c:(.text+0xaab): undefined reference to `SSL_get_error' > client.c:(.text+0xb40): undefined reference to `SSL_read' > client.c:(.text+0xb50): undefined reference to `SSL_get_error' > client.c:(.text+0xbde): undefined reference to `SSL_free' > client.c:(.text+0xc74): undefined reference to `SSL_shutdown' > /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > In function `PetscHTTPSConnect': > client.c:(.text+0x105c): undefined reference to `SSL_new' > client.c:(.text+0x106a): undefined reference to `BIO_new_socket' > client.c:(.text+0x1079): undefined reference to `SSL_set_bio' > client.c:(.text+0x1082): undefined reference to `SSL_connect' > collect2: ld returned 1 exit status > make: *** [link] Error 1 > > > ****************************************************** > > From what I see, the SSL functionality is in client.o, which is part > of libpetsc.a, which is included in the linking. > > There is no difference in whether I turn --with-ssl on or off... > > Does anybody have an idea what I am doing wrong? > > Best wishes and thank you for your time! > > Lukas > From lukas.drinkt.thee at gmail.com Wed Nov 8 09:35:49 2017 From: lukas.drinkt.thee at gmail.com (Lukas van de Wiel) Date: Wed, 8 Nov 2017 16:35:49 +0100 Subject: [petsc-users] linking PETSC with SSL In-Reply-To: References: Message-ID: Hi Satish, thank you for your quick reply! I have installed 3.8.1, but the problem persists. I have attached configure.log and test.log. Does that reveal anything? Cheers and thanks again Lukas On 11/8/17, Satish Balay wrote: > Hm --with-ssl=0 should work. Can you do a fresh/clean build and see if the > problem persists? > > If so send configure.log make.log and test.log for this build. > > BTW: 3.8 has better support for fortran usage - so you might consider > upgrading all the way to it. > > https://www.mcs.anl.gov/petsc/documentation/changes/38.html > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran > > Satish > > On Wed, 8 Nov 2017, Lukas van de Wiel wrote: > >> Good day, >> >> during an upgrade of PETSc (from 3.4.2 to 3.7.7) and some changes >> needed to use MUMPs I ran into a problem with SSL that I could not >> really figure out. 
>> >> ******************************************************** >> I have configured PETSc 3.7.7 using: (explicitly note the option >> --with-ssl=0 ) >> >> >> ./configure \ >> COPTFLAGS='-O3 -march=native -mtune=native' \ >> CXXOPTFLAGS='-O3 -march=native -mtune=native' \ >> FOPTFLAGS='-O3 -march=native -mtune=native' \ >> --with-debugging=0 \ >> --with-x=0 \ >> --with-ssl=0 \ >> --with-shared-libraries=0 \ >> --download-metis \ >> --download-parmetis \ >> --download-fblaslapack \ >> --download-scalapack \ >> --download-openmpi \ >> --download-mumps \ >> --download-hypre \ >> --download-ptscotch >> >> >> ******************************************************** >> Next I have a tiny Fortran program to illustrate the problem: >> >> >> program sslhuh >> implicit none >> #include "petsc/finclude/petscsys.h" >> write(*,*) "Hello World" >> end program >> >> >> >> ******************************************************** >> I compile it using makefile: >> >> >> petscDir = /net/home/gtecton/sw_dev/petsc-3.7.7 >> >> all: compile link >> >> compile: sslhuh.F >> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ >> -c \ >> -o sslhuh.o \ >> -std=f2008 \ >> -ffree-form \ >> -I$(petscDir)/include \ >> -I$(petscDir)/include/petsc/finclude \ >> sslhuh.F >> >> link: sslhuh.o >> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ >> -o sslhuh \ >> sslhuh.o \ >> $(petscDir)/linux-gnu-x86_64/lib/libpetsc.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libHYPRE.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libzmumps.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libsmumps.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libdmumps.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libcmumps.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libmumps_common.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libesmumps.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libparmetis.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libmetis.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libpord.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libscalapack.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libflapack.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libfblas.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libptscotch.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherr.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherrexit.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libptscotchparmetis.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libscotch.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libscotcherr.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libscotcherrexit.a \ >> $(petscDir)/linux-gnu-x86_64/lib/libscotchmetis.a \ >> -ldl >> >> >> ******************************************************** >> Compiling works fine. 
>> Linking gives me the error: >> >> >> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >> In function `PetscSSLInitializeContext': >> client.c:(.text+0x3b): undefined reference to `SSLv23_method' >> client.c:(.text+0x43): undefined reference to `SSL_CTX_new' >> client.c:(.text+0x5a): undefined reference to `SSL_CTX_ctrl' >> client.c:(.text+0x79): undefined reference to `SSL_library_init' >> client.c:(.text+0x7e): undefined reference to `SSL_load_error_strings' >> client.c:(.text+0x8c): undefined reference to `BIO_new_fp' >> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >> In function `PetscSSLDestroyContext': >> client.c:(.text+0xa5): undefined reference to `SSL_CTX_free' >> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >> In function `PetscHTTPSRequest': >> client.c:(.text+0xa9e): undefined reference to `SSL_write' >> client.c:(.text+0xaab): undefined reference to `SSL_get_error' >> client.c:(.text+0xb40): undefined reference to `SSL_read' >> client.c:(.text+0xb50): undefined reference to `SSL_get_error' >> client.c:(.text+0xbde): undefined reference to `SSL_free' >> client.c:(.text+0xc74): undefined reference to `SSL_shutdown' >> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >> In function `PetscHTTPSConnect': >> client.c:(.text+0x105c): undefined reference to `SSL_new' >> client.c:(.text+0x106a): undefined reference to `BIO_new_socket' >> client.c:(.text+0x1079): undefined reference to `SSL_set_bio' >> client.c:(.text+0x1082): undefined reference to `SSL_connect' >> collect2: ld returned 1 exit status >> make: *** [link] Error 1 >> >> >> ****************************************************** >> >> From what I see, the SSL functionality is in client.o, which is part >> of libpetsc.a, which is included in the linking. >> >> There is no difference in whether I turn --with-ssl on or off... >> >> Does anybody have an idea what I am doing wrong? >> >> Best wishes and thank you for your time! >> >> Lukas >> > > -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 5604054 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.log Type: application/octet-stream Size: 414 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Nov 8 09:43:55 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 8 Nov 2017 09:43:55 -0600 Subject: [petsc-users] linking PETSC with SSL In-Reply-To: References: Message-ID: Looks fine to me - I do not see any reference to ssl in these logs. To confirm - can you send make.log for this build. Also: cd src/ksp/ksp/examples/tutorials make ex2 Satish On Wed, 8 Nov 2017, Lukas van de Wiel wrote: > Hi Satish, > > thank you for your quick reply! > > I have installed 3.8.1, but the problem persists. > I have attached configure.log and test.log. > > Does that reveal anything? > > Cheers and thanks again > > Lukas > > On 11/8/17, Satish Balay wrote: > > Hm --with-ssl=0 should work. Can you do a fresh/clean build and see if the > > problem persists? > > > > If so send configure.log make.log and test.log for this build. > > > > BTW: 3.8 has better support for fortran usage - so you might consider > > upgrading all the way to it. 
> > > > https://www.mcs.anl.gov/petsc/documentation/changes/38.html > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran > > > > Satish > > > > On Wed, 8 Nov 2017, Lukas van de Wiel wrote: > > > >> Good day, > >> > >> during an upgrade of PETSc (from 3.4.2 to 3.7.7) and some changes > >> needed to use MUMPs I ran into a problem with SSL that I could not > >> really figure out. > >> > >> ******************************************************** > >> I have configured PETSc 3.7.7 using: (explicitly note the option > >> --with-ssl=0 ) > >> > >> > >> ./configure \ > >> COPTFLAGS='-O3 -march=native -mtune=native' \ > >> CXXOPTFLAGS='-O3 -march=native -mtune=native' \ > >> FOPTFLAGS='-O3 -march=native -mtune=native' \ > >> --with-debugging=0 \ > >> --with-x=0 \ > >> --with-ssl=0 \ > >> --with-shared-libraries=0 \ > >> --download-metis \ > >> --download-parmetis \ > >> --download-fblaslapack \ > >> --download-scalapack \ > >> --download-openmpi \ > >> --download-mumps \ > >> --download-hypre \ > >> --download-ptscotch > >> > >> > >> ******************************************************** > >> Next I have a tiny Fortran program to illustrate the problem: > >> > >> > >> program sslhuh > >> implicit none > >> #include "petsc/finclude/petscsys.h" > >> write(*,*) "Hello World" > >> end program > >> > >> > >> > >> ******************************************************** > >> I compile it using makefile: > >> > >> > >> petscDir = /net/home/gtecton/sw_dev/petsc-3.7.7 > >> > >> all: compile link > >> > >> compile: sslhuh.F > >> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ > >> -c \ > >> -o sslhuh.o \ > >> -std=f2008 \ > >> -ffree-form \ > >> -I$(petscDir)/include \ > >> -I$(petscDir)/include/petsc/finclude \ > >> sslhuh.F > >> > >> link: sslhuh.o > >> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ > >> -o sslhuh \ > >> sslhuh.o \ > >> $(petscDir)/linux-gnu-x86_64/lib/libpetsc.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libHYPRE.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libzmumps.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libsmumps.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libdmumps.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libcmumps.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libmumps_common.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libesmumps.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libparmetis.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libmetis.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libpord.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libscalapack.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libflapack.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libfblas.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libptscotch.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherr.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherrexit.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libptscotchparmetis.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libscotch.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libscotcherr.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libscotcherrexit.a \ > >> $(petscDir)/linux-gnu-x86_64/lib/libscotchmetis.a \ > >> -ldl > >> > >> > >> ******************************************************** > >> Compiling works fine. 
> >> Linking gives me the error: > >> > >> > >> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > >> In function `PetscSSLInitializeContext': > >> client.c:(.text+0x3b): undefined reference to `SSLv23_method' > >> client.c:(.text+0x43): undefined reference to `SSL_CTX_new' > >> client.c:(.text+0x5a): undefined reference to `SSL_CTX_ctrl' > >> client.c:(.text+0x79): undefined reference to `SSL_library_init' > >> client.c:(.text+0x7e): undefined reference to `SSL_load_error_strings' > >> client.c:(.text+0x8c): undefined reference to `BIO_new_fp' > >> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > >> In function `PetscSSLDestroyContext': > >> client.c:(.text+0xa5): undefined reference to `SSL_CTX_free' > >> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > >> In function `PetscHTTPSRequest': > >> client.c:(.text+0xa9e): undefined reference to `SSL_write' > >> client.c:(.text+0xaab): undefined reference to `SSL_get_error' > >> client.c:(.text+0xb40): undefined reference to `SSL_read' > >> client.c:(.text+0xb50): undefined reference to `SSL_get_error' > >> client.c:(.text+0xbde): undefined reference to `SSL_free' > >> client.c:(.text+0xc74): undefined reference to `SSL_shutdown' > >> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > >> In function `PetscHTTPSConnect': > >> client.c:(.text+0x105c): undefined reference to `SSL_new' > >> client.c:(.text+0x106a): undefined reference to `BIO_new_socket' > >> client.c:(.text+0x1079): undefined reference to `SSL_set_bio' > >> client.c:(.text+0x1082): undefined reference to `SSL_connect' > >> collect2: ld returned 1 exit status > >> make: *** [link] Error 1 > >> > >> > >> ****************************************************** > >> > >> From what I see, the SSL functionality is in client.o, which is part > >> of libpetsc.a, which is included in the linking. > >> > >> There is no difference in whether I turn --with-ssl on or off... > >> > >> Does anybody have an idea what I am doing wrong? > >> > >> Best wishes and thank you for your time! > >> > >> Lukas > >> > > > > > From lukas.drinkt.thee at gmail.com Wed Nov 8 09:44:43 2017 From: lukas.drinkt.thee at gmail.com (Lukas van de Wiel) Date: Wed, 8 Nov 2017 16:44:43 +0100 Subject: [petsc-users] linking PETSC with SSL In-Reply-To: References: Message-ID: Ah, never mind, It works in 3.8.1! Had to change my makefile also, of course... :-\ Thanks and have a great day! Lukas On 11/8/17, Lukas van de Wiel wrote: > Hi Satish, > > thank you for your quick reply! > > I have installed 3.8.1, but the problem persists. > I have attached configure.log and test.log. > > Does that reveal anything? > > Cheers and thanks again > > Lukas > > On 11/8/17, Satish Balay wrote: >> Hm --with-ssl=0 should work. Can you do a fresh/clean build and see if >> the >> problem persists? >> >> If so send configure.log make.log and test.log for this build. >> >> BTW: 3.8 has better support for fortran usage - so you might consider >> upgrading all the way to it. >> >> https://www.mcs.anl.gov/petsc/documentation/changes/38.html >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran >> >> Satish >> >> On Wed, 8 Nov 2017, Lukas van de Wiel wrote: >> >>> Good day, >>> >>> during an upgrade of PETSc (from 3.4.2 to 3.7.7) and some changes >>> needed to use MUMPs I ran into a problem with SSL that I could not >>> really figure out. 
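The undefined SSL_* and BIO_* references in the link error above are OpenSSL symbols, so a hand-written link line that lists the PETSc libraries explicitly also has to append the corresponding libraries (typically -lssl -lcrypto) whenever libpetsc was built with SSL support. The makefile change mentioned above is not shown in the thread; a common way to avoid maintaining such a library list by hand is to let PETSc's own makefile fragments supply the full link line. A minimal sketch, with the paths, arch name, and target assumed rather than taken from the actual makefile:

# Sketch of a user makefile that reuses PETSc's generated variables/rules
# (the PETSC_DIR / PETSC_ARCH values here are assumptions)
PETSC_DIR  = /net/home/gtecton/sw_dev/petsc-3.8.1
PETSC_ARCH = linux-gnu-x86_64

include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

# sslhuh.o is built from sslhuh.F by the suffix rules pulled in above;
# PETSC_LIB expands to the complete link line, including external packages
# (MUMPS, hypre, ...) and -lssl -lcrypto when SSL support is enabled
sslhuh: sslhuh.o
	${FLINKER} -o sslhuh sslhuh.o ${PETSC_LIB}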
>>> >>> ******************************************************** >>> I have configured PETSc 3.7.7 using: (explicitly note the option >>> --with-ssl=0 ) >>> >>> >>> ./configure \ >>> COPTFLAGS='-O3 -march=native -mtune=native' \ >>> CXXOPTFLAGS='-O3 -march=native -mtune=native' \ >>> FOPTFLAGS='-O3 -march=native -mtune=native' \ >>> --with-debugging=0 \ >>> --with-x=0 \ >>> --with-ssl=0 \ >>> --with-shared-libraries=0 \ >>> --download-metis \ >>> --download-parmetis \ >>> --download-fblaslapack \ >>> --download-scalapack \ >>> --download-openmpi \ >>> --download-mumps \ >>> --download-hypre \ >>> --download-ptscotch >>> >>> >>> ******************************************************** >>> Next I have a tiny Fortran program to illustrate the problem: >>> >>> >>> program sslhuh >>> implicit none >>> #include "petsc/finclude/petscsys.h" >>> write(*,*) "Hello World" >>> end program >>> >>> >>> >>> ******************************************************** >>> I compile it using makefile: >>> >>> >>> petscDir = /net/home/gtecton/sw_dev/petsc-3.7.7 >>> >>> all: compile link >>> >>> compile: sslhuh.F >>> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ >>> -c \ >>> -o sslhuh.o \ >>> -std=f2008 \ >>> -ffree-form \ >>> -I$(petscDir)/include \ >>> -I$(petscDir)/include/petsc/finclude \ >>> sslhuh.F >>> >>> link: sslhuh.o >>> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ >>> -o sslhuh \ >>> sslhuh.o \ >>> $(petscDir)/linux-gnu-x86_64/lib/libpetsc.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libHYPRE.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libzmumps.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libsmumps.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libdmumps.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libcmumps.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libmumps_common.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libesmumps.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libparmetis.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libmetis.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libpord.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libscalapack.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libflapack.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libfblas.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotch.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherr.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherrexit.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotchparmetis.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libscotch.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libscotcherr.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libscotcherrexit.a \ >>> $(petscDir)/linux-gnu-x86_64/lib/libscotchmetis.a \ >>> -ldl >>> >>> >>> ******************************************************** >>> Compiling works fine. 
>>> Linking gives me the error: >>> >>> >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >>> In function `PetscSSLInitializeContext': >>> client.c:(.text+0x3b): undefined reference to `SSLv23_method' >>> client.c:(.text+0x43): undefined reference to `SSL_CTX_new' >>> client.c:(.text+0x5a): undefined reference to `SSL_CTX_ctrl' >>> client.c:(.text+0x79): undefined reference to `SSL_library_init' >>> client.c:(.text+0x7e): undefined reference to `SSL_load_error_strings' >>> client.c:(.text+0x8c): undefined reference to `BIO_new_fp' >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >>> In function `PetscSSLDestroyContext': >>> client.c:(.text+0xa5): undefined reference to `SSL_CTX_free' >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >>> In function `PetscHTTPSRequest': >>> client.c:(.text+0xa9e): undefined reference to `SSL_write' >>> client.c:(.text+0xaab): undefined reference to `SSL_get_error' >>> client.c:(.text+0xb40): undefined reference to `SSL_read' >>> client.c:(.text+0xb50): undefined reference to `SSL_get_error' >>> client.c:(.text+0xbde): undefined reference to `SSL_free' >>> client.c:(.text+0xc74): undefined reference to `SSL_shutdown' >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >>> In function `PetscHTTPSConnect': >>> client.c:(.text+0x105c): undefined reference to `SSL_new' >>> client.c:(.text+0x106a): undefined reference to `BIO_new_socket' >>> client.c:(.text+0x1079): undefined reference to `SSL_set_bio' >>> client.c:(.text+0x1082): undefined reference to `SSL_connect' >>> collect2: ld returned 1 exit status >>> make: *** [link] Error 1 >>> >>> >>> ****************************************************** >>> >>> From what I see, the SSL functionality is in client.o, which is part >>> of libpetsc.a, which is included in the linking. >>> >>> There is no difference in whether I turn --with-ssl on or off... >>> >>> Does anybody have an idea what I am doing wrong? >>> >>> Best wishes and thank you for your time! >>> >>> Lukas >>> >> >> > From balay at mcs.anl.gov Wed Nov 8 09:51:05 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 8 Nov 2017 09:51:05 -0600 Subject: [petsc-users] linking PETSC with SSL In-Reply-To: References: Message-ID: Glad it works! Thanks for the update. Satish On Wed, 8 Nov 2017, Lukas van de Wiel wrote: > Ah, never mind, It works in 3.8.1! > Had to change my makefile also, of course... :-\ > > Thanks and have a great day! > > Lukas > > > On 11/8/17, Lukas van de Wiel wrote: > > Hi Satish, > > > > thank you for your quick reply! > > > > I have installed 3.8.1, but the problem persists. > > I have attached configure.log and test.log. > > > > Does that reveal anything? > > > > Cheers and thanks again > > > > Lukas > > > > On 11/8/17, Satish Balay wrote: > >> Hm --with-ssl=0 should work. Can you do a fresh/clean build and see if > >> the > >> problem persists? > >> > >> If so send configure.log make.log and test.log for this build. > >> > >> BTW: 3.8 has better support for fortran usage - so you might consider > >> upgrading all the way to it. 
> >> > >> https://www.mcs.anl.gov/petsc/documentation/changes/38.html > >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran > >> > >> Satish > >> > >> On Wed, 8 Nov 2017, Lukas van de Wiel wrote: > >> > >>> Good day, > >>> > >>> during an upgrade of PETSc (from 3.4.2 to 3.7.7) and some changes > >>> needed to use MUMPs I ran into a problem with SSL that I could not > >>> really figure out. > >>> > >>> ******************************************************** > >>> I have configured PETSc 3.7.7 using: (explicitly note the option > >>> --with-ssl=0 ) > >>> > >>> > >>> ./configure \ > >>> COPTFLAGS='-O3 -march=native -mtune=native' \ > >>> CXXOPTFLAGS='-O3 -march=native -mtune=native' \ > >>> FOPTFLAGS='-O3 -march=native -mtune=native' \ > >>> --with-debugging=0 \ > >>> --with-x=0 \ > >>> --with-ssl=0 \ > >>> --with-shared-libraries=0 \ > >>> --download-metis \ > >>> --download-parmetis \ > >>> --download-fblaslapack \ > >>> --download-scalapack \ > >>> --download-openmpi \ > >>> --download-mumps \ > >>> --download-hypre \ > >>> --download-ptscotch > >>> > >>> > >>> ******************************************************** > >>> Next I have a tiny Fortran program to illustrate the problem: > >>> > >>> > >>> program sslhuh > >>> implicit none > >>> #include "petsc/finclude/petscsys.h" > >>> write(*,*) "Hello World" > >>> end program > >>> > >>> > >>> > >>> ******************************************************** > >>> I compile it using makefile: > >>> > >>> > >>> petscDir = /net/home/gtecton/sw_dev/petsc-3.7.7 > >>> > >>> all: compile link > >>> > >>> compile: sslhuh.F > >>> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ > >>> -c \ > >>> -o sslhuh.o \ > >>> -std=f2008 \ > >>> -ffree-form \ > >>> -I$(petscDir)/include \ > >>> -I$(petscDir)/include/petsc/finclude \ > >>> sslhuh.F > >>> > >>> link: sslhuh.o > >>> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ > >>> -o sslhuh \ > >>> sslhuh.o \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libpetsc.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libHYPRE.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libzmumps.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libsmumps.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libdmumps.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libcmumps.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libmumps_common.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libesmumps.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libparmetis.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libmetis.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libpord.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libscalapack.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libflapack.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libfblas.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotch.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherr.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherrexit.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotchparmetis.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libscotch.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libscotcherr.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libscotcherrexit.a \ > >>> $(petscDir)/linux-gnu-x86_64/lib/libscotchmetis.a \ > >>> -ldl > >>> > >>> > >>> ******************************************************** > >>> Compiling works fine. 
> >>> Linking gives me the error: > >>> > >>> > >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > >>> In function `PetscSSLInitializeContext': > >>> client.c:(.text+0x3b): undefined reference to `SSLv23_method' > >>> client.c:(.text+0x43): undefined reference to `SSL_CTX_new' > >>> client.c:(.text+0x5a): undefined reference to `SSL_CTX_ctrl' > >>> client.c:(.text+0x79): undefined reference to `SSL_library_init' > >>> client.c:(.text+0x7e): undefined reference to `SSL_load_error_strings' > >>> client.c:(.text+0x8c): undefined reference to `BIO_new_fp' > >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > >>> In function `PetscSSLDestroyContext': > >>> client.c:(.text+0xa5): undefined reference to `SSL_CTX_free' > >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > >>> In function `PetscHTTPSRequest': > >>> client.c:(.text+0xa9e): undefined reference to `SSL_write' > >>> client.c:(.text+0xaab): undefined reference to `SSL_get_error' > >>> client.c:(.text+0xb40): undefined reference to `SSL_read' > >>> client.c:(.text+0xb50): undefined reference to `SSL_get_error' > >>> client.c:(.text+0xbde): undefined reference to `SSL_free' > >>> client.c:(.text+0xc74): undefined reference to `SSL_shutdown' > >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): > >>> In function `PetscHTTPSConnect': > >>> client.c:(.text+0x105c): undefined reference to `SSL_new' > >>> client.c:(.text+0x106a): undefined reference to `BIO_new_socket' > >>> client.c:(.text+0x1079): undefined reference to `SSL_set_bio' > >>> client.c:(.text+0x1082): undefined reference to `SSL_connect' > >>> collect2: ld returned 1 exit status > >>> make: *** [link] Error 1 > >>> > >>> > >>> ****************************************************** > >>> > >>> From what I see, the SSL functionality is in client.o, which is part > >>> of libpetsc.a, which is included in the linking. > >>> > >>> There is no difference in whether I turn --with-ssl on or off... > >>> > >>> Does anybody have an idea what I am doing wrong? > >>> > >>> Best wishes and thank you for your time! > >>> > >>> Lukas > >>> > >> > >> > > > From hzhang at mcs.anl.gov Wed Nov 8 10:09:20 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 8 Nov 2017 10:09:20 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark: > Hong, is > > 0-cells: 12 12 0 0 > > 1-cells: 20 20 0 0 > > 2-cells: 11 11 0 0 > > 3-cells: 2 2 0 0 > > from the old version? > In O-build on my macPro, I get the above. In g-build, I get > > 0-cells: 8 8 8 8 > 1-cells: 12 12 12 12 > 2-cells: 6 6 6 6 > 3-cells: 1 1 1 1 > I get this on linux machine. Do you know why? Hong > > On Tue, Nov 7, 2017 at 10:13 PM, Hong wrote: > >> Mark: >> I removed option '-ex56_dm_view'. >> Hong >> >> Humm, this looks a little odd, but it may be OK. Is this this diffing >>> with the old non-repartition data? (more below) >>> >>> On Tue, Nov 7, 2017 at 11:45 AM, Hong wrote: >>> >>>> Mark, >>>> The fix is merged to next branch for tests which show diff as >>>> >>>> ******* Testing: testexamples_PARMETIS ******* >>>> 5c5 >>>> < 1 SNES Function norm 1.983e-10 >>>> --- >>>> > 1 SNES Function norm 1.990e-10 >>>> 10,13c10,13 >>>> < 0-cells: 8 8 8 8 >>>> < 1-cells: 12 12 12 12 >>>> < 2-cells: 6 6 6 6 >>>> < 3-cells: 1 1 1 1 >>>> >>>> >>> I assume this is the old. 
>>> >>> >>>> --- >>>> > 0-cells: 12 12 0 0 >>>> > 1-cells: 20 20 0 0 >>>> > 2-cells: 11 11 0 0 >>>> > 3-cells: 2 2 0 0 >>>> 15,18c15,18 >>>> >>>> >>> and this is the new. >>> >>> This is funny because the processors are not fully populated. This can >>> happen on coarse grids and indeed it should happen in a test with good >>> coverage. >>> >>> I assume these diffs are views from coarse grids? That is, in the raw >>> output files do you see fully populated fine grids, with no diffs, and then >>> the diffs come on coarse grids. >>> >>> Repartitioning the coarse grids can change the coarsening, It is >>> possible that repartitioning causes faster coarsening (it does a little) >>> and this faster coarsening is tripping the aggregation switch, which gives >>> us empty processors. >>> >>> Am I understanding this correctly ... >>> >>> Thanks, >>> Mark >>> >>> >>>> < boundary: 1 strata with value/size (1 (23)) >>>> < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) >>>> < marker: 1 strata with value/size (1 (15)) >>>> < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) >>>> --- >>>> > boundary: 1 strata with value/size (1 (39)) >>>> > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) >>>> > marker: 1 strata with value/size (1 (27)) >>>> > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>> >>>> see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log >>>> >>>> I guess parmetis produces random partition on different machines (I made output file for ex56_1 on my imac). Please take a look at the differences. If the outputs are correct, I will remove option '-ex56_dm_view' >>>> >>>> Hong >>>> >>>> >>>> On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: >>>> >>>>> Mark: >>>>> Bug is fixed in branch hzhang/fix-submat_samerowdist >>>>> https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_samerowdist >>>>> >>>>> I also add the test runex56. Please test it and let me know if there >>>>> is a problem. >>>>> Hong >>>>> >>>>> Also, I have been using -petscpartition_type but now I see >>>>>> -pc_gamg_mat_partitioning_type. Is -petscpartition_type depreciated >>>>>> for GAMG? >>>>>> >>>>>> Is this some sort of auto generated portmanteau? I can not find >>>>>> pc_gamg_mat_partitioning_type in the source. >>>>>> >>>>>> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: >>>>>> >>>>>>> Great, thanks, >>>>>>> >>>>>>> And could you please add these parameters to a regression test? As I >>>>>>> recall we have with-parmetis regression test. >>>>>>> >>>>>>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>>>>>> >>>>>>>> Mark: >>>>>>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>>>>>> Now testing src/snes/examples/tutorials/ex56.c with your options, >>>>>>>> I can reproduce the error. >>>>>>>> I'll fix it. >>>>>>>> >>>>>>>> Hong >>>>>>>> >>>>>>>> Hong, >>>>>>>>> >>>>>>>>> I've tested with master and I get the same error. Maybe the >>>>>>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>>>>>> me. >>>>>>>>> >>>>>>>>> Can you run this (snes ex56) w/o the error? 
>>>>>>>>> >>>>>>>>> >>>>>>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ >>>>>>>>> make runex >>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 >>>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -matrap 0 >>>>>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>> >>>>>>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>>>>>>>> >>>>>>>>>> Mark : >>>>>>>>>> I realize that using maint or master branch, I cannot reproduce >>>>>>>>>> the same error. >>>>>>>>>> For this example, you must use a parallel partitioner, >>>>>>>>>> e.g.,'current' gives me following error: >>>>>>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it >>>>>>>>>> currently only supports one domain per processor >>>>>>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch >>>>>>>>>> for more than one subdomain per processor >>>>>>>>>> >>>>>>>>>> Please rebase your branch with maint or master, then see if you >>>>>>>>>> still have problem. >>>>>>>>>> >>>>>>>>>> Hong >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Mark, >>>>>>>>>>>> I can reproduce this in an old branch, but not in current maint >>>>>>>>>>>> and master. >>>>>>>>>>>> Which branch are you using to produce this error? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I am using a branch from Matt. Let me try to merge it with >>>>>>>>>>> master. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Hong >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I am able to reproduce this with snes ex56 with 2 processors >>>>>>>>>>>>> and adding -pc_gamg_repartition true >>>>>>>>>>>>> >>>>>>>>>>>>> I'm not sure how to fix it. 
>>>>>>>>>>>>> >>>>>>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n >>>>>>>>>>>>> 2 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>>>> -mg_levels_pc_type jacobi -petscpartitioner_type simple -mat_block_size 3 >>>>>>>>>>>>> -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition >>>>>>>>>>>>> true >>>>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>> type: plex >>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>> 0-cells: 12 12 >>>>>>>>>>>>> 1-cells: 20 20 >>>>>>>>>>>>> 2-cells: 11 11 >>>>>>>>>>>>> 3-cells: 2 2 >>>>>>>>>>>>> Labels: >>>>>>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 >>>>>>>>>>>>> (1), 6 (1)) >>>>>>>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 >>>>>>>>>>>>> (2)) >>>>>>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>>>>>> 13 KSP Residual 
norm 7.99573e-09 >>>>>>>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>> type: plex >>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>> 0-cells: 45 45 >>>>>>>>>>>>> 1-cells: 96 96 >>>>>>>>>>>>> 2-cells: 68 68 >>>>>>>>>>>>> 3-cells: 16 16 >>>>>>>>>>>>> Labels: >>>>>>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 (18), >>>>>>>>>>>>> 5 (9), 6 (9)) >>>>>>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 >>>>>>>>>>>>> (16)) >>>>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>>> type >>>>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>>> [1]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>>> type >>>>>>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Doing some additional testing, the issue goes away when >>>>>>>>>>>>>>> removing the gamg preconditioner line from the petsc.rc: >>>>>>>>>>>>>>> -pc_type gamg >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is the code. findices is create with ISCreateStride, so >>>>>>>>>>>>>> it is sorted ... >>>>>>>>>>>>>> >>>>>>>>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't >>>>>>>>>>>>>> have a regression test with this... >>>>>>>>>>>>>> >>>>>>>>>>>>>> I will try to reproduce this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Michael: you can use hypre for now, or turn repartitioning >>>>>>>>>>>>>> off (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition >>>>>>>>>>>>>> false), but I'm not sure this will fix this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>>>>>>>> solvers. I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>>>>>>>> good enough for you. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> { >>>>>>>>>>>>>> IS findices; >>>>>>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>>>>>> Mat Pnew; >>>>>>>>>>>>>> >>>>>>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>> #endif >>>>>>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, >>>>>>>>>>>>>> new_eq_indices, MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>>>>>> >>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>> #endif >>>>>>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>>>>>> >>>>>>>>>>>>>> /* output - repartitioned */ >>>>>>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> OK, this might not be completely satisfactory, because it >>>>>>>>>>>>>>>>> doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>>>>>> It would be helpful if you can provide us a simple >>>>>>>>>>>>>>>>>> example that produces this example. >>>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm running a Fortran code that was just changed over to >>>>>>>>>>>>>>>>>>> using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>>>>>> (see full error below). I tried to trace down the difference in the source >>>>>>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. 
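The reproducer described above (write Amat out from the larger run, then load it in a small driver and let KSPSetUp() trip the error when run on more than one rank with the gamg/repartition options) is not included in the archive. Below is a minimal C sketch of that kind of driver, with the file name Amat.bin and the executable name assumed, run for example as: mpiexec -n 2 ./loadamat -pc_type gamg -pc_gamg_repartition true

#include <petscksp.h>

/* Sketch: load a previously saved matrix (PETSc binary format assumed)
   and run KSPSetUp()/KSPSolve() so the preconditioner setup chosen on
   the command line is exercised on more than one rank. */
int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Amat.bin", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatLoad(A, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);   /* the reported error surfaces here via PCSetUp_GAMG */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}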
It >>>>>>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not implemented >>>>>>>>>>>>>>>>>>> yet >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line >>>>>>>>>>>>>>>>>>> 3247 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>> 8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> R. 
Michael Churchill >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Nov 8 11:01:38 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 8 Nov 2017 12:01:38 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: On Wed, Nov 8, 2017 at 11:09 AM, Hong wrote: > Mark: > >> Hong, is >> > 0-cells: 12 12 0 0 >> > 1-cells: 20 20 0 0 >> > 2-cells: 11 11 0 0 >> > 3-cells: 2 2 0 0 >> >> from the old version? >> > In O-build on my macPro, I get the above. In g-build, I get > >> >> 0-cells: 8 8 8 8 >> 1-cells: 12 12 12 12 >> 2-cells: 6 6 6 6 >> 3-cells: 1 1 1 1 >> > I get this on linux machine. > Do you know why? > I can not reproduce your O output. I will look at it later. Valgrind is failing on me right now. I will look into it but can you valgrind it? > > Hong > >> >> On Tue, Nov 7, 2017 at 10:13 PM, Hong wrote: >> >>> Mark: >>> I removed option '-ex56_dm_view'. >>> Hong >>> >>> Humm, this looks a little odd, but it may be OK. Is this this diffing >>>> with the old non-repartition data? (more below) >>>> >>>> On Tue, Nov 7, 2017 at 11:45 AM, Hong wrote: >>>> >>>>> Mark, >>>>> The fix is merged to next branch for tests which show diff as >>>>> >>>>> ******* Testing: testexamples_PARMETIS ******* >>>>> 5c5 >>>>> < 1 SNES Function norm 1.983e-10 >>>>> --- >>>>> > 1 SNES Function norm 1.990e-10 >>>>> 10,13c10,13 >>>>> < 0-cells: 8 8 8 8 >>>>> < 1-cells: 12 12 12 12 >>>>> < 2-cells: 6 6 6 6 >>>>> < 3-cells: 1 1 1 1 >>>>> >>>>> >>>> I assume this is the old. >>>> >>>> >>>>> --- >>>>> > 0-cells: 12 12 0 0 >>>>> > 1-cells: 20 20 0 0 >>>>> > 2-cells: 11 11 0 0 >>>>> > 3-cells: 2 2 0 0 >>>>> 15,18c15,18 >>>>> >>>>> >>>> and this is the new. >>>> >>>> This is funny because the processors are not fully populated. This can >>>> happen on coarse grids and indeed it should happen in a test with good >>>> coverage. >>>> >>>> I assume these diffs are views from coarse grids? That is, in the raw >>>> output files do you see fully populated fine grids, with no diffs, and then >>>> the diffs come on coarse grids. >>>> >>>> Repartitioning the coarse grids can change the coarsening, It is >>>> possible that repartitioning causes faster coarsening (it does a little) >>>> and this faster coarsening is tripping the aggregation switch, which gives >>>> us empty processors. >>>> >>>> Am I understanding this correctly ... >>>> >>>> Thanks, >>>> Mark >>>> >>>> >>>>> < boundary: 1 strata with value/size (1 (23)) >>>>> < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) >>>>> < marker: 1 strata with value/size (1 (15)) >>>>> < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) >>>>> --- >>>>> > boundary: 1 strata with value/size (1 (39)) >>>>> > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) >>>>> > marker: 1 strata with value/size (1 (27)) >>>>> > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>> >>>>> see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log >>>>> >>>>> I guess parmetis produces random partition on different machines (I made output file for ex56_1 on my imac). Please take a look at the differences. 
If the outputs are correct, I will remove option '-ex56_dm_view' >>>>> >>>>> Hong >>>>> >>>>> >>>>> On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: >>>>> >>>>>> Mark: >>>>>> Bug is fixed in branch hzhang/fix-submat_samerowdist >>>>>> https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_s >>>>>> amerowdist >>>>>> >>>>>> I also add the test runex56. Please test it and let me know if there >>>>>> is a problem. >>>>>> Hong >>>>>> >>>>>> Also, I have been using -petscpartition_type but now I see >>>>>>> -pc_gamg_mat_partitioning_type. Is -petscpartition_type depreciated >>>>>>> for GAMG? >>>>>>> >>>>>>> Is this some sort of auto generated portmanteau? I can not find >>>>>>> pc_gamg_mat_partitioning_type in the source. >>>>>>> >>>>>>> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: >>>>>>> >>>>>>>> Great, thanks, >>>>>>>> >>>>>>>> And could you please add these parameters to a regression test? As >>>>>>>> I recall we have with-parmetis regression test. >>>>>>>> >>>>>>>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>>>>>>> >>>>>>>>> Mark: >>>>>>>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>>>>>>> Now testing src/snes/examples/tutorials/ex56.c with your options, >>>>>>>>> I can reproduce the error. >>>>>>>>> I'll fix it. >>>>>>>>> >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> Hong, >>>>>>>>>> >>>>>>>>>> I've tested with master and I get the same error. Maybe the >>>>>>>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>>>>>>> me. >>>>>>>>>> >>>>>>>>>> Can you run this (snes ex56) w/o the error? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ >>>>>>>>>> make runex >>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 >>>>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -matrap 0 >>>>>>>>>> -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition true >>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>> >>>>>>>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object type >>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>> ocumentation/faq.html for trouble shooting. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>>>>>>>>> >>>>>>>>>>> Mark : >>>>>>>>>>> I realize that using maint or master branch, I cannot reproduce >>>>>>>>>>> the same error. >>>>>>>>>>> For this example, you must use a parallel partitioner, >>>>>>>>>>> e.g.,'current' gives me following error: >>>>>>>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it >>>>>>>>>>> currently only supports one domain per processor >>>>>>>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or ptscotch >>>>>>>>>>> for more than one subdomain per processor >>>>>>>>>>> >>>>>>>>>>> Please rebase your branch with maint or master, then see if you >>>>>>>>>>> still have problem. >>>>>>>>>>> >>>>>>>>>>> Hong >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Mark, >>>>>>>>>>>>> I can reproduce this in an old branch, but not in current >>>>>>>>>>>>> maint and master. >>>>>>>>>>>>> Which branch are you using to produce this error? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I am using a branch from Matt. Let me try to merge it with >>>>>>>>>>>> master. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Hong >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I am able to reproduce this with snes ex56 with 2 processors >>>>>>>>>>>>>> and adding -pc_gamg_repartition true >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm not sure how to fix it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec >>>>>>>>>>>>>> -n 2 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it >>>>>>>>>>>>>> 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>>>>> -mg_levels_pc_type jacobi -petscpartitioner_type simple -mat_block_size 3 >>>>>>>>>>>>>> -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition >>>>>>>>>>>>>> true >>>>>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>>>>>>> 12 KSP 
Residual norm 1.989e-10 >>>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>>> type: plex >>>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>>> 0-cells: 12 12 >>>>>>>>>>>>>> 1-cells: 20 20 >>>>>>>>>>>>>> 2-cells: 11 11 >>>>>>>>>>>>>> 3-cells: 2 2 >>>>>>>>>>>>>> Labels: >>>>>>>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 5 >>>>>>>>>>>>>> (1), 6 (1)) >>>>>>>>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 >>>>>>>>>>>>>> (2)) >>>>>>>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>>> type: plex >>>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>>> 0-cells: 45 45 >>>>>>>>>>>>>> 1-cells: 96 96 >>>>>>>>>>>>>> 2-cells: 68 68 >>>>>>>>>>>>>> 3-cells: 16 16 >>>>>>>>>>>>>> Labels: >>>>>>>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 >>>>>>>>>>>>>> (18), 5 (9), 6 (9)) >>>>>>>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 >>>>>>>>>>>>>> (16)) >>>>>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>>>> type >>>>>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> [1]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>>>> type >>>>>>>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael 
Churchill < >>>>>>>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Doing some additional testing, the issue goes away when >>>>>>>>>>>>>>>> removing the gamg preconditioner line from the petsc.rc: >>>>>>>>>>>>>>>> -pc_type gamg >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is the code. findices is create with ISCreateStride, >>>>>>>>>>>>>>> so it is sorted ... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't >>>>>>>>>>>>>>> have a regression test with this... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I will try to reproduce this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Michael: you can use hypre for now, or turn repartitioning >>>>>>>>>>>>>>> off (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition >>>>>>>>>>>>>>> false), but I'm not sure this will fix this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>>>>>>>>> solvers. I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>>>>>>>>> good enough for you. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> IS findices; >>>>>>>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>>>>>>> Mat Pnew; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, >>>>>>>>>>>>>>> new_eq_indices, MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /* output - repartitioned */ >>>>>>>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> OK, this might not be completely satisfactory, because it >>>>>>>>>>>>>>>>>> doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong >>>>>>>>>>>>>>>>> > wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>>>>>>> It would be helpful if you can provide us a simple >>>>>>>>>>>>>>>>>>> example that produces this example. 
>>>>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm running a Fortran code that was just changed over to >>>>>>>>>>>>>>>>>>>> using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>>>>>>> (see full error below). I tried to trace down the difference in the source >>>>>>>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not >>>>>>>>>>>>>>>>>>>> implemented yet >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line >>>>>>>>>>>>>>>>>>>> 3247 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 >>>>>>>>>>>>>>>>>>>> in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. 
>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Nov 8 11:24:02 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 8 Nov 2017 11:24:02 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 ==30976== Invalid read of size 16 ==30976== at 0x8550946: dswap_k_NEHALEM (in /usr/lib/openblas-base/libblas.so.3) ==30976== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) ==30976== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) ==30976== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) ==30976== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) ==30976== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) ==30976== by 0x41056E: main (ex56.c:395) ==30976== Address 0xdc650d0 is 52,480 bytes inside a block of size 52,488 alloc'd ==30976== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==30976== by 0x51590F6: PetscMallocAlign (mal.c:39) ==30976== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) ==30976== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) ==30976== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) ==30976== by 0x41056E: main (ex56.c:395) You can fix it on branch hzhang/fix-submat_samerowdist. Hong On Wed, Nov 8, 2017 at 11:01 AM, Mark Adams wrote: > > > On Wed, Nov 8, 2017 at 11:09 AM, Hong wrote: > >> Mark: >> >>> Hong, is >>> > 0-cells: 12 12 0 0 >>> > 1-cells: 20 20 0 0 >>> > 2-cells: 11 11 0 0 >>> > 3-cells: 2 2 0 0 >>> >>> from the old version? >>> >> In O-build on my macPro, I get the above. In g-build, I get >> >>> >>> 0-cells: 8 8 8 8 >>> 1-cells: 12 12 12 12 >>> 2-cells: 6 6 6 6 >>> 3-cells: 1 1 1 1 >>> >> I get this on linux machine. >> Do you know why? >> > > I can not reproduce your O output. I will look at it later. > > Valgrind is failing on me right now. I will look into it but can you > valgrind it? > > >> >> Hong >> >>> >>> On Tue, Nov 7, 2017 at 10:13 PM, Hong wrote: >>> >>>> Mark: >>>> I removed option '-ex56_dm_view'. 
>>>> Hong >>>> >>>> Humm, this looks a little odd, but it may be OK. Is this this diffing >>>>> with the old non-repartition data? (more below) >>>>> >>>>> On Tue, Nov 7, 2017 at 11:45 AM, Hong wrote: >>>>> >>>>>> Mark, >>>>>> The fix is merged to next branch for tests which show diff as >>>>>> >>>>>> ******* Testing: testexamples_PARMETIS ******* >>>>>> 5c5 >>>>>> < 1 SNES Function norm 1.983e-10 >>>>>> --- >>>>>> > 1 SNES Function norm 1.990e-10 >>>>>> 10,13c10,13 >>>>>> < 0-cells: 8 8 8 8 >>>>>> < 1-cells: 12 12 12 12 >>>>>> < 2-cells: 6 6 6 6 >>>>>> < 3-cells: 1 1 1 1 >>>>>> >>>>>> >>>>> I assume this is the old. >>>>> >>>>> >>>>>> --- >>>>>> > 0-cells: 12 12 0 0 >>>>>> > 1-cells: 20 20 0 0 >>>>>> > 2-cells: 11 11 0 0 >>>>>> > 3-cells: 2 2 0 0 >>>>>> 15,18c15,18 >>>>>> >>>>>> >>>>> and this is the new. >>>>> >>>>> This is funny because the processors are not fully populated. This can >>>>> happen on coarse grids and indeed it should happen in a test with good >>>>> coverage. >>>>> >>>>> I assume these diffs are views from coarse grids? That is, in the raw >>>>> output files do you see fully populated fine grids, with no diffs, and then >>>>> the diffs come on coarse grids. >>>>> >>>>> Repartitioning the coarse grids can change the coarsening, It is >>>>> possible that repartitioning causes faster coarsening (it does a little) >>>>> and this faster coarsening is tripping the aggregation switch, which gives >>>>> us empty processors. >>>>> >>>>> Am I understanding this correctly ... >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>>> < boundary: 1 strata with value/size (1 (23)) >>>>>> < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) >>>>>> < marker: 1 strata with value/size (1 (15)) >>>>>> < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) >>>>>> --- >>>>>> > boundary: 1 strata with value/size (1 (39)) >>>>>> > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) >>>>>> > marker: 1 strata with value/size (1 (27)) >>>>>> > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>> >>>>>> see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log >>>>>> >>>>>> I guess parmetis produces random partition on different machines (I made output file for ex56_1 on my imac). Please take a look at the differences. If the outputs are correct, I will remove option '-ex56_dm_view' >>>>>> >>>>>> Hong >>>>>> >>>>>> >>>>>> On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: >>>>>> >>>>>>> Mark: >>>>>>> Bug is fixed in branch hzhang/fix-submat_samerowdist >>>>>>> https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_s >>>>>>> amerowdist >>>>>>> >>>>>>> I also add the test runex56. Please test it and let me know if there >>>>>>> is a problem. >>>>>>> Hong >>>>>>> >>>>>>> Also, I have been using -petscpartition_type but now I see >>>>>>>> -pc_gamg_mat_partitioning_type. Is -petscpartition_type >>>>>>>> depreciated for GAMG? >>>>>>>> >>>>>>>> Is this some sort of auto generated portmanteau? I can not find >>>>>>>> pc_gamg_mat_partitioning_type in the source. >>>>>>>> >>>>>>>> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams wrote: >>>>>>>> >>>>>>>>> Great, thanks, >>>>>>>>> >>>>>>>>> And could you please add these parameters to a regression test? As >>>>>>>>> I recall we have with-parmetis regression test. 
>>>>>>>>> >>>>>>>>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>>>>>>>> >>>>>>>>>> Mark: >>>>>>>>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>>>>>>>> Now testing src/snes/examples/tutorials/ex56.c with your >>>>>>>>>> options, I can reproduce the error. >>>>>>>>>> I'll fix it. >>>>>>>>>> >>>>>>>>>> Hong >>>>>>>>>> >>>>>>>>>> Hong, >>>>>>>>>>> >>>>>>>>>>> I've tested with master and I get the same error. Maybe the >>>>>>>>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>>>>>>>> me. >>>>>>>>>>> >>>>>>>>>>> Can you run this (snes ex56) w/o the error? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ >>>>>>>>>>> make runex >>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n 4 >>>>>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>> -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis >>>>>>>>>>> -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 >>>>>>>>>>> -pc_gamg_repartition true >>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>> >>>>>>>>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 (8)) >>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object >>>>>>>>>>> type >>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong wrote: >>>>>>>>>>> >>>>>>>>>>>> Mark : >>>>>>>>>>>> I realize that using maint or master branch, I cannot reproduce >>>>>>>>>>>> the same error. >>>>>>>>>>>> For this example, you must use a parallel partitioner, >>>>>>>>>>>> e.g.,'current' gives me following error: >>>>>>>>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it >>>>>>>>>>>> currently only supports one domain per processor >>>>>>>>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or >>>>>>>>>>>> ptscotch for more than one subdomain per processor >>>>>>>>>>>> >>>>>>>>>>>> Please rebase your branch with maint or master, then see if you >>>>>>>>>>>> still have problem. 
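As an aside, rebasing a feature branch onto current PETSc master typically looks like the following; my-branch is only a placeholder for the actual branch name:

    git fetch origin
    git checkout my-branch
    git rebase origin/master
    # reconfigure/rebuild PETSc, then rerun the failing case
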
>>>>>>>>>>>> >>>>>>>>>>>> Hong >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Mark, >>>>>>>>>>>>>> I can reproduce this in an old branch, but not in current >>>>>>>>>>>>>> maint and master. >>>>>>>>>>>>>> Which branch are you using to produce this error? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I am using a branch from Matt. Let me try to merge it with >>>>>>>>>>>>> master. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Hong >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am able to reproduce this with snes ex56 with 2 processors >>>>>>>>>>>>>>> and adding -pc_gamg_repartition true >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm not sure how to fix it. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec >>>>>>>>>>>>>>> -n 2 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it >>>>>>>>>>>>>>> 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>>>>>> -mg_levels_pc_type jacobi -petscpartitioner_type simple -mat_block_size 3 >>>>>>>>>>>>>>> -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition >>>>>>>>>>>>>>> true >>>>>>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>>>> type: plex >>>>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>>>> 0-cells: 12 12 >>>>>>>>>>>>>>> 1-cells: 20 20 >>>>>>>>>>>>>>> 2-cells: 11 11 >>>>>>>>>>>>>>> 3-cells: 2 2 >>>>>>>>>>>>>>> Labels: >>>>>>>>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), 
>>>>>>>>>>>>>>> 5 (1), 6 (1)) >>>>>>>>>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 >>>>>>>>>>>>>>> (2)) >>>>>>>>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>>>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>>>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>>>> type: plex >>>>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>>>> 0-cells: 45 45 >>>>>>>>>>>>>>> 1-cells: 96 96 >>>>>>>>>>>>>>> 2-cells: 68 68 >>>>>>>>>>>>>>> 3-cells: 16 16 >>>>>>>>>>>>>>> Labels: >>>>>>>>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 >>>>>>>>>>>>>>> (18), 5 (9), 6 (9)) >>>>>>>>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), 3 >>>>>>>>>>>>>>> (16)) >>>>>>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> [1]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Doing some additional testing, the issue goes away when >>>>>>>>>>>>>>>>> removing the gamg preconditioner line from the petsc.rc: >>>>>>>>>>>>>>>>> -pc_type gamg >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This is the code. findices is create with ISCreateStride, >>>>>>>>>>>>>>>> so it is sorted ... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Michael is repartitioning the coarse grids. 
Maybe we don't >>>>>>>>>>>>>>>> have a regression test with this... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I will try to reproduce this. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Michael: you can use hypre for now, or turn repartitioning >>>>>>>>>>>>>>>> off (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition >>>>>>>>>>>>>>>> false), but I'm not sure this will fix this. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You don't have hypre parameters for all of your all of your >>>>>>>>>>>>>>>> solvers. I think 'boomeramg' is the default pc_hypre_type. That should be >>>>>>>>>>>>>>>> good enough for you. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> IS findices; >>>>>>>>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>>>>>>>> Mat Pnew; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, >>>>>>>>>>>>>>>> new_eq_indices, MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> /* output - repartitioned */ >>>>>>>>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> OK, this might not be completely satisfactory, because it >>>>>>>>>>>>>>>>>>> doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong < >>>>>>>>>>>>>>>>>>> hzhang at mcs.anl.gov> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>>>>>>>> It would be helpful if you can provide us a simple >>>>>>>>>>>>>>>>>>>> example that produces this example. >>>>>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I'm running a Fortran code that was just changed over >>>>>>>>>>>>>>>>>>>>> to using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>>>>>>>> (see full error below). 
I tried to trace down the difference in the source >>>>>>>>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for >>>>>>>>>>>>>>>>>>>>> this object type >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not >>>>>>>>>>>>>>>>>>>>> implemented yet >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line >>>>>>>>>>>>>>>>>>>>> 3247 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 >>>>>>>>>>>>>>>>>>>>> in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>> R. 
Michael Churchill >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Nov 8 18:04:36 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 8 Nov 2017 19:04:36 -0500 Subject: [petsc-users] troubleshooting AMG, coupled navier stokes, large eigenvalues on coarsest level In-Reply-To: References: Message-ID: On Tue, Nov 7, 2017 at 3:00 PM, Mark Lohry wrote: > I've now got gamg running on matrix-free newton-krylov with the jacobian > provided by coloring finite differences (thanks again for the help). 3D > Poisson with 4th order DG or higher (35^2 blocks), gamg with default > settings is giving textbook convergence, which is great. Of course coupled > compressible navier-stokes is harder, and convergence is bad-to-nonexistent. > > > 1) Doc says "Equations must be ordered in ?vertex-major? ordering"; in my > discretization, each "node" has 5 coupled degrees of freedom (density, 3 x > momentum, energy). I'm ordering my unknowns: > > rho_i, rhou_i, rhov_i, rhow_i, Et_i, rho_i+1, rhou_i+1, ... e.g. > row-major matrix order if you wrote the unknowns [{rho}, {rhou}, ... ]. > > and then setting block size to 5. Is that correct? > yes > I've also tried using the actual sparsity of the matrix > Sorry, but using for what? > which has larger dense blocks (e.g. [35x5]^2), but neither seemed to help. > Do you mean the element Jacobian matrix has 35 "vertices"? You have 5x5 dense blocks (or at least you don't care about resolving any sparsity or you want to use block preconditioners. So your element Jacobians are [35x5]^2 dense(ish) matrices. This is not particularly useful for the solvers. > > > 2) With default settings, and with -pc_gamg_square_graph, > pc_gamg_sym_graph, agg_nsmooths 0 mentioned in the manual, the eigenvalue > estimates explode on the coarsest level, which I don't see with poisson: > > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 32 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0.18994, max = 2.08935 > eigenvalues estimate via gmres min 0.00933256, max 1.8994 > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 32 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0.165969, max = 1.82566 > eigenvalues estimate via gmres min 0.0290728, max 1.65969 > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 32 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0.146479, max = 1.61126 > eigenvalues estimate via gmres min 0.204673, max 1.46479 > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 32 MPI processes > type: chebyshev > eigenvalue estimates used: min = 6.81977e+09, max = 7.50175e+10 > eigenvalues estimate via gmres min -2.76436e+12, max 6.81977e+10 > > What's happening here? (Full -ksp_view below) > This is on the full NS problem I assume. You need special solvers for this. 
Scalable solvers for NS is not trivial. > > 3) I'm not very familiar with chebyshev smoothers, but they're only for > SPD systems (?). > Yes, you should use Gauss-Seidel for non-symmetric systems. Or indefinite systems > Is this an inappropriate smoother for this problem? > > 4) With gmres, the preconditioned residual is ~10 orders larger than the > true residual; and the preconditioned residual drops while the true > residual rises. I'm assuming this means something very wrong? > Yes, your preconditioner is no good. > > 5) -pc_type hyper -pc_hypre_type boomeramg also works perfectly for the > poisson case, but hits NaN on the first cycle for NS. > > > You want to use the Schur complement PCs for NS. We have some support for PC where you give us a mass matrix. These are motivated by assuming there is a "commutation" (that is not true). These were developed by, among others, Andy Wathan, if you want to do a literature search. You will probably have to read the literature to understand the options and issues for your problem. Look at the PETSc manual and you should find a description of PETSc support for these PCs and some discussion of them and references. This should get you started. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Nov 8 19:02:45 2017 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 8 Nov 2017 20:02:45 -0500 Subject: [petsc-users] troubleshooting AMG, coupled navier stokes, large eigenvalues on coarsest level In-Reply-To: References: Message-ID: > > Do you mean the element Jacobian matrix has 35 "vertices"? You have 5x5 > dense blocks (or at least you don't care about resolving any sparsity or > you want to use block preconditioners. So your element Jacobians are > [35x5]^2 dense(ish) matrices. This is not particularly useful for the > solvers. Yes, the local element jacobians are (35x5)^2 ( (N+1)(N+2)(N+3)/6 vertices per Nth order element for 3D). They'd be 100% dense. So the real blocks of the full system jacobian are of that size, and not 5x5. Block ILU is pretty popular for these discretizations for obvious reasons. This is on the full NS problem I assume. You need special solvers for this. > Scalable solvers for NS is not trivial. Yes, compressible 3D NS. I'm aware scaling isn't trivial, but this isn't a scaling problem, I'm obviously computing a wrong preconditioner since it's actively hurting the linear solve. Yes, you should use Gauss-Seidel for non-symmetric systems. Or indefinite > systems Thanks! You want to use the Schur complement PCs for NS. We have some support for > PC where you give us a mass matrix. I'll look into it, but I'm not aware of Schur complement being used for compressible/coupled NS, only for incompressible/segregated. This is DG where the mass matrix is a block diagonal constant you typically just invert once at the beginning; is there a PC where this can be exploited? On Wed, Nov 8, 2017 at 7:04 PM, Mark Adams wrote: > > > On Tue, Nov 7, 2017 at 3:00 PM, Mark Lohry wrote: > >> I've now got gamg running on matrix-free newton-krylov with the jacobian >> provided by coloring finite differences (thanks again for the help). 3D >> Poisson with 4th order DG or higher (35^2 blocks), gamg with default >> settings is giving textbook convergence, which is great. Of course coupled >> compressible navier-stokes is harder, and convergence is bad-to-nonexistent. >> >> >> 1) Doc says "Equations must be ordered in ?vertex-major? 
ordering"; in my >> discretization, each "node" has 5 coupled degrees of freedom (density, 3 x >> momentum, energy). I'm ordering my unknowns: >> >> rho_i, rhou_i, rhov_i, rhow_i, Et_i, rho_i+1, rhou_i+1, ... e.g. >> row-major matrix order if you wrote the unknowns [{rho}, {rhou}, ... ]. >> >> and then setting block size to 5. Is that correct? >> > > yes > > >> I've also tried using the actual sparsity of the matrix >> > > Sorry, but using for what? > > >> which has larger dense blocks (e.g. [35x5]^2), but neither seemed to help. >> > > Do you mean the element Jacobian matrix has 35 "vertices"? You have 5x5 > dense blocks (or at least you don't care about resolving any sparsity or > you want to use block preconditioners. So your element Jacobians are > [35x5]^2 dense(ish) matrices. This is not particularly useful for the > solvers. > > >> >> >> 2) With default settings, and with -pc_gamg_square_graph, >> pc_gamg_sym_graph, agg_nsmooths 0 mentioned in the manual, the eigenvalue >> estimates explode on the coarsest level, which I don't see with poisson: >> >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 32 MPI processes >> type: chebyshev >> eigenvalue estimates used: min = 0.18994, max = 2.08935 >> eigenvalues estimate via gmres min 0.00933256, max 1.8994 >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 32 MPI processes >> type: chebyshev >> eigenvalue estimates used: min = 0.165969, max = 1.82566 >> eigenvalues estimate via gmres min 0.0290728, max 1.65969 >> Down solver (pre-smoother) on level 3 ------------------------------- >> KSP Object: (mg_levels_3_) 32 MPI processes >> type: chebyshev >> eigenvalue estimates used: min = 0.146479, max = 1.61126 >> eigenvalues estimate via gmres min 0.204673, max 1.46479 >> Down solver (pre-smoother) on level 4 ------------------------------- >> KSP Object: (mg_levels_4_) 32 MPI processes >> type: chebyshev >> eigenvalue estimates used: min = 6.81977e+09, max = 7.50175e+10 >> eigenvalues estimate via gmres min -2.76436e+12, max 6.81977e+10 >> >> What's happening here? (Full -ksp_view below) >> > > This is on the full NS problem I assume. You need special solvers for > this. Scalable solvers for NS is not trivial. > > >> >> 3) I'm not very familiar with chebyshev smoothers, but they're only for >> SPD systems (?). >> > > Yes, you should use Gauss-Seidel for non-symmetric systems. Or indefinite > systems > > >> Is this an inappropriate smoother for this problem? >> >> 4) With gmres, the preconditioned residual is ~10 orders larger than the >> true residual; and the preconditioned residual drops while the true >> residual rises. I'm assuming this means something very wrong? >> > > Yes, your preconditioner is no good. > > >> >> 5) -pc_type hyper -pc_hypre_type boomeramg also works perfectly for the >> poisson case, but hits NaN on the first cycle for NS. >> >> >> > You want to use the Schur complement PCs for NS. We have some support for > PC where you give us a mass matrix. These are motivated by assuming there > is a "commutation" (that is not true). These were developed by, among > others, Andy Wathan, if you want to do a literature search. You will > probably have to read the literature to understand the options and issues > for your problem. Look at the PETSc manual and you should find a > description of PETSc support for these PCs and some discussion of them and > references. This should get you started. 
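For orientation, the Schur-complement fieldsplit machinery referred to above is usually driven by options along these lines; how the fields are defined, and which Schur preconditioner is appropriate for a coupled compressible discretization, is problem dependent (see the PETSc manual discussion of PCFIELDSPLIT), so treat this only as a sketch:

    -pc_type fieldsplit
    -pc_fieldsplit_type schur
    -pc_fieldsplit_schur_fact_type full
    -pc_fieldsplit_schur_precondition selfp
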
> > Mark > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Nov 8 19:05:41 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 8 Nov 2017 20:05:41 -0500 Subject: [petsc-users] GAMG advice In-Reply-To: <47a47b6b-ce8c-10f6-0ded-bf87e9af1bbd@dim.uchile.cl> References: <47a47b6b-ce8c-10f6-0ded-bf87e9af1bbd@dim.uchile.cl> Message-ID: > > > Now I'd like to try GAMG instead of ML. However, I don't know how to set > it up to get similar performance. > The obvious/naive > > -pc_type gamg > -pc_gamg_type agg > > # with and without > -pc_gamg_threshold 0.03 > -pc_mg_levels 3 > > This looks fine. I would not set the number of levels but if it helps then go for it. > converges very slowly on 1 proc and much worse on 8 (~200k dofs per > proc), for instance: > np = 1: > 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm > 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 > 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm > 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 > 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm > 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 > > np = 8: > 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm > 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 > 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > > A very high threshold seems to improve the GAMG PC, for instance with > 0.75 I get convergence to rtol=1e-6 after 744 iterations. > What else should I try? > Not sure. ML use the same algorithm as GAMG (so the threshold means the same thing pretty much). ML is a good solver and the leader, Ray Tuminaro, has had a lot of NS experience. But I'm not sure what the differences are that are resulting in this performance. * It looks like you are using sor for the coarse grid solver in gamg: Coarse grid solver -- level ------------------------------- KSP Object: (mg_levels_0_) 1 MPI processes type: preonly maximum iterations=2, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_0_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = You should/must use lu, like in ML. This will kill you. * smoothed aggregation vs unsmoothed: GAMG's view data does not say if it is smoothing. Damn, I need to fix that. For NS, you probably want unsmoothed (-pc_gamg_agg_nsmooths 0). I'm not sure what the ML parameter is for this nor do I know the default. It should make a noticable difference (good or bad). * Threshold for dropping small values from graph 0.75 -- this is crazy :) This is all that I can think of now. Mark > > I would very much appreciate any advice on configuring GAMG and > differences w.r.t ML to be taken into account (not a multigrid expert > though). > > Thanks, best wishes > David > > > ------ > ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3 > > KSP Object: 1 MPI processes > type: fgmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10000 > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. 
> right preconditioning > using nonzero initial guess > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: gamg > MG: type is MULTIPLICATIVE, levels=1 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > GAMG specific options > Threshold for dropping small values from graph 0.75 > AGG specific options > Symmetric graph false > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_levels_0_) 1 MPI processes > type: preonly > maximum iterations=2, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_levels_0_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = > 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=1745224, cols=1745224 > total: nonzeros=99452608, allocated nonzeros=99452608 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 1037847 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=1745224, cols=1745224 > total: nonzeros=99452608, allocated nonzeros=99452608 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 1037847 nodes, limit used is 5 > > > ------ > ksp_view for -pc_type ml: > > KSP Object: 8 MPI processes > type: fgmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10000 > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > right preconditioning > using nonzero initial guess > using UNPRECONDITIONED norm type for convergence test > PC Object: 8 MPI processes > type: ml > MG: type is MULTIPLICATIVE, levels=3 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 8 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 8 MPI processes > type: redundant > Redundant preconditioner: First (color=0) of 8 PCs follows > KSP Object: (mg_coarse_redundant_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_redundant_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 10.4795 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=6822, cols=6822 > package used to perform factorization: petsc > total: nonzeros=9575688, allocated nonzeros=9575688 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=6822, cols=6822 > total: nonzeros=913758, allocated nonzeros=913758 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 8 MPI processes > type: mpiaij > rows=6822, cols=6822 > total: nonzeros=913758, allocated nonzeros=913758 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 8 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 8 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = > 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 8 MPI processes > type: mpiaij > rows=67087, cols=67087 > total: nonzeros=9722749, allocated nonzeros=9722749 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 8 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 8 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = > 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 8 MPI processes > type: mpiaij > rows=1745224, cols=1745224 > total: nonzeros=99452608, allocated nonzeros=99452608 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 126690 nodes, > limit used is 5 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 8 MPI processes > type: mpiaij > rows=1745224, cols=1745224 > total: nonzeros=99452608, allocated nonzeros=99452608 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 126690 nodes, limit > used is 5 > > -------------- next part -------------- An HTML attachment was scrubbed... 
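Pulling Mark's suggestions together -- a direct solve on the coarse grid as ML uses, unsmoothed aggregation, and a modest drop threshold -- a starting option set for this problem might look like the sketch below. It is untested here; note also that in the -ksp_view output above the coarse-level KSP carries the mg_levels_0_ prefix, so depending on the PETSc version the coarse-solver options may need that prefix rather than mg_coarse_.

    -pc_type gamg
    -pc_gamg_type agg
    -pc_gamg_agg_nsmooths 0     # try unsmoothed aggregation for this convection-dominated flow
    -pc_gamg_threshold 0.03     # keep the drop tolerance modest; 0.75 discards nearly the whole graph
    -mg_coarse_ksp_type preonly
    -mg_coarse_pc_type lu       # direct coarse solve, as ML's redundant LU does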
URL: From mfadams at lbl.gov Wed Nov 8 19:07:41 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 8 Nov 2017 20:07:41 -0500 Subject: [petsc-users] GAMG advice In-Reply-To: <6169118C-34FE-491C-BCB4-A86BECCFBAA9@mcs.anl.gov> References: <47a47b6b-ce8c-10f6-0ded-bf87e9af1bbd@dim.uchile.cl> <991cd7c4-bb92-ed2c-193d-7232c1ff6199@dim.uchile.cl> <6169118C-34FE-491C-BCB4-A86BECCFBAA9@mcs.anl.gov> Message-ID: On Fri, Oct 20, 2017 at 11:10 PM, Barry Smith wrote: > > David, > > GAMG picks the number of levels based on how the coarsening process etc > proceeds. You cannot hardwire it to a particular value. Yes you can. GAMG will respect -pc_mg_levels N, but we don't recommend using it. > You can run with -info to get more info potentially on the decisions GAMG > is making. > this is noisy but grep on GAMG and you will see the levels and sizes, etc. > > Barry > > > On Oct 20, 2017, at 2:06 PM, David Nolte wrote: > > > > PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option > > was not taken into account: > > type: gamg > > MG: type is MULTIPLICATIVE, levels=1 cycles=v > > > > > > > > On 10/20/2017 03:32 PM, David Nolte wrote: > >> Dear all, > >> > >> I have some problems using GAMG as a preconditioner for (F)GMRES. > >> Background: I am solving the incompressible, unsteady Navier-Stokes > >> equations with a coupled mixed FEM approach, using P1/P1 elements for > >> velocity and pressure on an unstructured tetrahedron mesh with about > >> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG, > >> hence, no zeros on the diagonal of the pressure block. Time > >> discretization with semi-implicit backward Euler. The flow is a > >> convection dominated flow through a nozzle. > >> > >> So far, for this setup, I have been quite happy with a simple FGMRES/ML > >> solver for the full system (rather bruteforce, I admit, but much faster > >> than any block/Schur preconditioners I tried): > >> > >> -ksp_converged_reason > >> -ksp_monitor_true_residual > >> -ksp_type fgmres > >> -ksp_rtol 1.0e-6 > >> -ksp_initial_guess_nonzero > >> > >> -pc_type ml > >> -pc_ml_Threshold 0.03 > >> -pc_ml_maxNlevels 3 > >> > >> This setup converges in ~100 iterations (see below the ksp_view output) > >> to rtol: > >> > >> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm > >> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 > >> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm > >> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 > >> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm > >> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 > >> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm > >> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 > >> > >> > >> Now I'd like to try GAMG instead of ML. However, I don't know how to set > >> it up to get similar performance. 
> >> The obvious/naive > >> > >> -pc_type gamg > >> -pc_gamg_type agg > >> > >> # with and without > >> -pc_gamg_threshold 0.03 > >> -pc_mg_levels 3 > >> > >> converges very slowly on 1 proc and much worse on 8 (~200k dofs per > >> proc), for instance: > >> np = 1: > >> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm > >> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 > >> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm > >> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 > >> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm > >> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 > >> > >> np = 8: > >> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm > >> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 > >> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >> > >> A very high threshold seems to improve the GAMG PC, for instance with > >> 0.75 I get convergence to rtol=1e-6 after 744 iterations. > >> What else should I try? > >> > >> I would very much appreciate any advice on configuring GAMG and > >> differences w.r.t ML to be taken into account (not a multigrid expert > >> though). > >> > >> Thanks, best wishes > >> David > >> > >> > >> ------ > >> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3 > >> > >> KSP Object: 1 MPI processes > >> type: fgmres > >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >> Orthogonalization with no iterative refinement > >> GMRES: happy breakdown tolerance 1e-30 > >> maximum iterations=10000 > >> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > >> right preconditioning > >> using nonzero initial guess > >> using UNPRECONDITIONED norm type for convergence test > >> PC Object: 1 MPI processes > >> type: gamg > >> MG: type is MULTIPLICATIVE, levels=1 cycles=v > >> Cycles per PCApply=1 > >> Using Galerkin computed coarse grid matrices > >> GAMG specific options > >> Threshold for dropping small values from graph 0.75 > >> AGG specific options > >> Symmetric graph false > >> Coarse grid solver -- level ------------------------------- > >> KSP Object: (mg_levels_0_) 1 MPI processes > >> type: preonly > >> maximum iterations=2, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_levels_0_) 1 MPI processes > >> type: sor > >> SOR: type = local_symmetric, iterations = 1, local iterations = > >> 1, omega = 1. 
> >> linear system matrix = precond matrix: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=1745224, cols=1745224 > >> total: nonzeros=99452608, allocated nonzeros=99452608 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node routines: found 1037847 nodes, limit used is 5 > >> linear system matrix = precond matrix: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=1745224, cols=1745224 > >> total: nonzeros=99452608, allocated nonzeros=99452608 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node routines: found 1037847 nodes, limit used is 5 > >> > >> > >> ------ > >> ksp_view for -pc_type ml: > >> > >> KSP Object: 8 MPI processes > >> type: fgmres > >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >> Orthogonalization with no iterative refinement > >> GMRES: happy breakdown tolerance 1e-30 > >> maximum iterations=10000 > >> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > >> right preconditioning > >> using nonzero initial guess > >> using UNPRECONDITIONED norm type for convergence test > >> PC Object: 8 MPI processes > >> type: ml > >> MG: type is MULTIPLICATIVE, levels=3 cycles=v > >> Cycles per PCApply=1 > >> Using Galerkin computed coarse grid matrices > >> Coarse grid solver -- level ------------------------------- > >> KSP Object: (mg_coarse_) 8 MPI processes > >> type: preonly > >> maximum iterations=10000, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_coarse_) 8 MPI processes > >> type: redundant > >> Redundant preconditioner: First (color=0) of 8 PCs follows > >> KSP Object: (mg_coarse_redundant_) 1 MPI > processes > >> type: preonly > >> maximum iterations=10000, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_coarse_redundant_) 1 MPI processes > >> type: lu > >> LU: out-of-place factorization > >> tolerance for zero pivot 2.22045e-14 > >> using diagonal shift on blocks to prevent zero pivot > [INBLOCKS] > >> matrix ordering: nd > >> factor fill ratio given 5., needed 10.4795 > >> Factored matrix follows: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=6822, cols=6822 > >> package used to perform factorization: petsc > >> total: nonzeros=9575688, allocated nonzeros=9575688 > >> total number of mallocs used during MatSetValues > calls =0 > >> not using I-node routines > >> linear system matrix = precond matrix: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=6822, cols=6822 > >> total: nonzeros=913758, allocated nonzeros=913758 > >> total number of mallocs used during MatSetValues calls =0 > >> not using I-node routines > >> linear system matrix = precond matrix: > >> Mat Object: 8 MPI processes > >> type: mpiaij > >> rows=6822, cols=6822 > >> total: nonzeros=913758, allocated nonzeros=913758 > >> total number of mallocs used during MatSetValues calls =0 > >> not using I-node (on process 0) routines > >> Down solver (pre-smoother) on level 1 ------------------------------- > >> KSP Object: (mg_levels_1_) 8 MPI processes > >> type: richardson > >> Richardson: damping factor=1. > >> maximum iterations=2 > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> >> left preconditioning > >> using nonzero initial guess > >> using NONE norm type for convergence test > >> PC Object: (mg_levels_1_) 8 MPI processes > >> type: sor > >> SOR: type = local_symmetric, iterations = 1, local iterations = > >> 1, omega = 1. > >> linear system matrix = precond matrix: > >> Mat Object: 8 MPI processes > >> type: mpiaij > >> rows=67087, cols=67087 > >> total: nonzeros=9722749, allocated nonzeros=9722749 > >> total number of mallocs used during MatSetValues calls =0 > >> not using I-node (on process 0) routines > >> Up solver (post-smoother) same as down solver (pre-smoother) > >> Down solver (pre-smoother) on level 2 ------------------------------- > >> KSP Object: (mg_levels_2_) 8 MPI processes > >> type: richardson > >> Richardson: damping factor=1. > >> maximum iterations=2 > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using nonzero initial guess > >> using NONE norm type for convergence test > >> PC Object: (mg_levels_2_) 8 MPI processes > >> type: sor > >> SOR: type = local_symmetric, iterations = 1, local iterations = > >> 1, omega = 1. > >> linear system matrix = precond matrix: > >> Mat Object: 8 MPI processes > >> type: mpiaij > >> rows=1745224, cols=1745224 > >> total: nonzeros=99452608, allocated nonzeros=99452608 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node (on process 0) routines: found 126690 nodes, > >> limit used is 5 > >> Up solver (post-smoother) same as down solver (pre-smoother) > >> linear system matrix = precond matrix: > >> Mat Object: 8 MPI processes > >> type: mpiaij > >> rows=1745224, cols=1745224 > >> total: nonzeros=99452608, allocated nonzeros=99452608 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node (on process 0) routines: found 126690 nodes, limit > >> used is 5 > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Nov 8 19:11:25 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 8 Nov 2017 20:11:25 -0500 Subject: [petsc-users] GAMG advice In-Reply-To: <0779aa51-17c8-0ef3-fd01-1413ee1225ea@dim.uchile.cl> References: <47a47b6b-ce8c-10f6-0ded-bf87e9af1bbd@dim.uchile.cl> <991cd7c4-bb92-ed2c-193d-7232c1ff6199@dim.uchile.cl> <6169118C-34FE-491C-BCB4-A86BECCFBAA9@mcs.anl.gov> <0779aa51-17c8-0ef3-fd01-1413ee1225ea@dim.uchile.cl> Message-ID: On Wed, Nov 1, 2017 at 5:45 PM, David Nolte wrote: > Thanks Barry. > By simply replacing chebychev by richardson I get similar performance > with GAMG and ML That too (I assumed you were using the same, I could not see cheby in your view data). I guess SOR works for the coarse grid solver because the coarse grid is small. It should help using lu. > (GAMG even slightly faster): > This is "random" fluctuations. > > -pc_type > gamg > > > > -pc_gamg_type > agg > > > > -pc_gamg_threshold > 0.03 > > > > -pc_gamg_square_graph 10 > -pc_gamg_sym_graph > -mg_levels_ksp_type > richardson > > > > -mg_levels_pc_type sor > > Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix > is asymmetric? yes, > For serial runs it doesn't seem to matter, yes, > but in > parallel the PC setup hangs (after calls of > PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set. > yep, > > David > > > On 10/21/2017 12:10 AM, Barry Smith wrote: > > David, > > > > GAMG picks the number of levels based on how the coarsening process > etc proceeds. You cannot hardwire it to a particular value. 
You can run > with -info to get more info potentially on the decisions GAMG is making. > > > > Barry > > > >> On Oct 20, 2017, at 2:06 PM, David Nolte wrote: > >> > >> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option > >> was not taken into account: > >> type: gamg > >> MG: type is MULTIPLICATIVE, levels=1 cycles=v > >> > >> > >> > >> On 10/20/2017 03:32 PM, David Nolte wrote: > >>> Dear all, > >>> > >>> I have some problems using GAMG as a preconditioner for (F)GMRES. > >>> Background: I am solving the incompressible, unsteady Navier-Stokes > >>> equations with a coupled mixed FEM approach, using P1/P1 elements for > >>> velocity and pressure on an unstructured tetrahedron mesh with about > >>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG, > >>> hence, no zeros on the diagonal of the pressure block. Time > >>> discretization with semi-implicit backward Euler. The flow is a > >>> convection dominated flow through a nozzle. > >>> > >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML > >>> solver for the full system (rather bruteforce, I admit, but much faster > >>> than any block/Schur preconditioners I tried): > >>> > >>> -ksp_converged_reason > >>> -ksp_monitor_true_residual > >>> -ksp_type fgmres > >>> -ksp_rtol 1.0e-6 > >>> -ksp_initial_guess_nonzero > >>> > >>> -pc_type ml > >>> -pc_ml_Threshold 0.03 > >>> -pc_ml_maxNlevels 3 > >>> > >>> This setup converges in ~100 iterations (see below the ksp_view output) > >>> to rtol: > >>> > >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm > >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 > >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm > >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 > >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm > >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 > >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm > >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 > >>> > >>> > >>> Now I'd like to try GAMG instead of ML. However, I don't know how to > set > >>> it up to get similar performance. > >>> The obvious/naive > >>> > >>> -pc_type gamg > >>> -pc_gamg_type agg > >>> > >>> # with and without > >>> -pc_gamg_threshold 0.03 > >>> -pc_mg_levels 3 > >>> > >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per > >>> proc), for instance: > >>> np = 1: > >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm > >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 > >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm > >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 > >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm > >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 > >>> > >>> np = 8: > >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm > >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 > >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >>> > >>> A very high threshold seems to improve the GAMG PC, for instance with > >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations. > >>> What else should I try? 
> >>> > >>> I would very much appreciate any advice on configuring GAMG and > >>> differences w.r.t ML to be taken into account (not a multigrid expert > >>> though). > >>> > >>> Thanks, best wishes > >>> David > >>> > >>> > >>> ------ > >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3 > >>> > >>> KSP Object: 1 MPI processes > >>> type: fgmres > >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >>> Orthogonalization with no iterative refinement > >>> GMRES: happy breakdown tolerance 1e-30 > >>> maximum iterations=10000 > >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > >>> right preconditioning > >>> using nonzero initial guess > >>> using UNPRECONDITIONED norm type for convergence test > >>> PC Object: 1 MPI processes > >>> type: gamg > >>> MG: type is MULTIPLICATIVE, levels=1 cycles=v > >>> Cycles per PCApply=1 > >>> Using Galerkin computed coarse grid matrices > >>> GAMG specific options > >>> Threshold for dropping small values from graph 0.75 > >>> AGG specific options > >>> Symmetric graph false > >>> Coarse grid solver -- level ------------------------------- > >>> KSP Object: (mg_levels_0_) 1 MPI processes > >>> type: preonly > >>> maximum iterations=2, initial guess is zero > >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >>> left preconditioning > >>> using NONE norm type for convergence test > >>> PC Object: (mg_levels_0_) 1 MPI processes > >>> type: sor > >>> SOR: type = local_symmetric, iterations = 1, local iterations = > >>> 1, omega = 1. > >>> linear system matrix = precond matrix: > >>> Mat Object: 1 MPI processes > >>> type: seqaij > >>> rows=1745224, cols=1745224 > >>> total: nonzeros=99452608, allocated nonzeros=99452608 > >>> total number of mallocs used during MatSetValues calls =0 > >>> using I-node routines: found 1037847 nodes, limit used is 5 > >>> linear system matrix = precond matrix: > >>> Mat Object: 1 MPI processes > >>> type: seqaij > >>> rows=1745224, cols=1745224 > >>> total: nonzeros=99452608, allocated nonzeros=99452608 > >>> total number of mallocs used during MatSetValues calls =0 > >>> using I-node routines: found 1037847 nodes, limit used is 5 > >>> > >>> > >>> ------ > >>> ksp_view for -pc_type ml: > >>> > >>> KSP Object: 8 MPI processes > >>> type: fgmres > >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >>> Orthogonalization with no iterative refinement > >>> GMRES: happy breakdown tolerance 1e-30 > >>> maximum iterations=10000 > >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > >>> right preconditioning > >>> using nonzero initial guess > >>> using UNPRECONDITIONED norm type for convergence test > >>> PC Object: 8 MPI processes > >>> type: ml > >>> MG: type is MULTIPLICATIVE, levels=3 cycles=v > >>> Cycles per PCApply=1 > >>> Using Galerkin computed coarse grid matrices > >>> Coarse grid solver -- level ------------------------------- > >>> KSP Object: (mg_coarse_) 8 MPI processes > >>> type: preonly > >>> maximum iterations=10000, initial guess is zero > >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >>> left preconditioning > >>> using NONE norm type for convergence test > >>> PC Object: (mg_coarse_) 8 MPI processes > >>> type: redundant > >>> Redundant preconditioner: First (color=0) of 8 PCs follows > >>> KSP Object: (mg_coarse_redundant_) 1 MPI > processes > >>> type: preonly > >>> maximum iterations=10000, initial guess is zero > >>> tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. 
> >>> left preconditioning > >>> using NONE norm type for convergence test > >>> PC Object: (mg_coarse_redundant_) 1 MPI > processes > >>> type: lu > >>> LU: out-of-place factorization > >>> tolerance for zero pivot 2.22045e-14 > >>> using diagonal shift on blocks to prevent zero pivot > [INBLOCKS] > >>> matrix ordering: nd > >>> factor fill ratio given 5., needed 10.4795 > >>> Factored matrix follows: > >>> Mat Object: 1 MPI processes > >>> type: seqaij > >>> rows=6822, cols=6822 > >>> package used to perform factorization: petsc > >>> total: nonzeros=9575688, allocated nonzeros=9575688 > >>> total number of mallocs used during MatSetValues > calls =0 > >>> not using I-node routines > >>> linear system matrix = precond matrix: > >>> Mat Object: 1 MPI processes > >>> type: seqaij > >>> rows=6822, cols=6822 > >>> total: nonzeros=913758, allocated nonzeros=913758 > >>> total number of mallocs used during MatSetValues calls =0 > >>> not using I-node routines > >>> linear system matrix = precond matrix: > >>> Mat Object: 8 MPI processes > >>> type: mpiaij > >>> rows=6822, cols=6822 > >>> total: nonzeros=913758, allocated nonzeros=913758 > >>> total number of mallocs used during MatSetValues calls =0 > >>> not using I-node (on process 0) routines > >>> Down solver (pre-smoother) on level 1 ------------------------------ > - > >>> KSP Object: (mg_levels_1_) 8 MPI processes > >>> type: richardson > >>> Richardson: damping factor=1. > >>> maximum iterations=2 > >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >>> left preconditioning > >>> using nonzero initial guess > >>> using NONE norm type for convergence test > >>> PC Object: (mg_levels_1_) 8 MPI processes > >>> type: sor > >>> SOR: type = local_symmetric, iterations = 1, local iterations = > >>> 1, omega = 1. > >>> linear system matrix = precond matrix: > >>> Mat Object: 8 MPI processes > >>> type: mpiaij > >>> rows=67087, cols=67087 > >>> total: nonzeros=9722749, allocated nonzeros=9722749 > >>> total number of mallocs used during MatSetValues calls =0 > >>> not using I-node (on process 0) routines > >>> Up solver (post-smoother) same as down solver (pre-smoother) > >>> Down solver (pre-smoother) on level 2 ------------------------------ > - > >>> KSP Object: (mg_levels_2_) 8 MPI processes > >>> type: richardson > >>> Richardson: damping factor=1. > >>> maximum iterations=2 > >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >>> left preconditioning > >>> using nonzero initial guess > >>> using NONE norm type for convergence test > >>> PC Object: (mg_levels_2_) 8 MPI processes > >>> type: sor > >>> SOR: type = local_symmetric, iterations = 1, local iterations = > >>> 1, omega = 1. > >>> linear system matrix = precond matrix: > >>> Mat Object: 8 MPI processes > >>> type: mpiaij > >>> rows=1745224, cols=1745224 > >>> total: nonzeros=99452608, allocated nonzeros=99452608 > >>> total number of mallocs used during MatSetValues calls =0 > >>> using I-node (on process 0) routines: found 126690 nodes, > >>> limit used is 5 > >>> Up solver (post-smoother) same as down solver (pre-smoother) > >>> linear system matrix = precond matrix: > >>> Mat Object: 8 MPI processes > >>> type: mpiaij > >>> rows=1745224, cols=1745224 > >>> total: nonzeros=99452608, allocated nonzeros=99452608 > >>> total number of mallocs used during MatSetValues calls =0 > >>> using I-node (on process 0) routines: found 126690 nodes, limit > >>> used is 5 > >>> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
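To check what GAMG actually built (number of levels, coarse-grid sizes, smoother and coarse-solver types), the -info and -ksp_view routes mentioned above can be used directly; the executable name and process count below are placeholders:

    mpiexec -n 8 ./mysolver -pc_type gamg -pc_gamg_threshold 0.03 -info | grep GAMG

The "MG: type is MULTIPLICATIVE, levels=N" line in -ksp_view is also the quickest way to confirm that options such as -pc_mg_levels were actually picked up, which is how the unexpected levels=1 was spotted earlier in this thread.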
URL: From mfadams at lbl.gov Wed Nov 8 19:40:42 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 8 Nov 2017 20:40:42 -0500 Subject: [petsc-users] troubleshooting AMG, coupled navier stokes, large eigenvalues on coarsest level In-Reply-To: References: Message-ID: On Wed, Nov 8, 2017 at 8:02 PM, Mark Lohry wrote: > Do you mean the element Jacobian matrix has 35 "vertices"? You have 5x5 >> dense blocks (or at least you don't care about resolving any sparsity or >> you want to use block preconditioners. So your element Jacobians are >> [35x5]^2 dense(ish) matrices. This is not particularly useful for the >> solvers. > > > Yes, the local element jacobians are (35x5)^2 ( (N+1)(N+2)(N+3)/6 vertices > per Nth order element for 3D). They'd be 100% dense. So the real blocks of > the full system jacobian are of that size, and not 5x5. Block ILU is pretty > popular for these discretizations for obvious reasons. > OK, but don't set that blocks size. > > This is on the full NS problem I assume. You need special solvers for >> this. Scalable solvers for NS is not trivial. > > > Yes, compressible 3D NS. I'm aware scaling isn't trivial, but this isn't a > scaling problem, I'm obviously computing a wrong preconditioner since it's > actively hurting the linear solve. > By scaling I mean not a direct solver. > > Yes, you should use Gauss-Seidel for non-symmetric systems. Or indefinite >> systems > > > Thanks! > sor > > You want to use the Schur complement PCs for NS. We have some support for >> PC where you give us a mass matrix. > > > I'll look into it, but I'm not aware of Schur complement being used for > compressible/coupled NS, only for incompressible/segregated. > Really? OK, you've looked into it more than me. This is DG where the mass matrix is a block diagonal constant you typically > just invert once at the beginning; is there a PC where this can be > exploited? > Don't know but being able to explicitly invert this might let you compute an exact Schur complement matrix and then use algebraic solvers (like LU and AMG). > > > On Wed, Nov 8, 2017 at 7:04 PM, Mark Adams wrote: > >> >> >> On Tue, Nov 7, 2017 at 3:00 PM, Mark Lohry wrote: >> >>> I've now got gamg running on matrix-free newton-krylov with the jacobian >>> provided by coloring finite differences (thanks again for the help). 3D >>> Poisson with 4th order DG or higher (35^2 blocks), gamg with default >>> settings is giving textbook convergence, which is great. Of course coupled >>> compressible navier-stokes is harder, and convergence is bad-to-nonexistent. >>> >>> >>> 1) Doc says "Equations must be ordered in ?vertex-major? ordering"; in >>> my discretization, each "node" has 5 coupled degrees of freedom (density, 3 >>> x momentum, energy). I'm ordering my unknowns: >>> >>> rho_i, rhou_i, rhov_i, rhow_i, Et_i, rho_i+1, rhou_i+1, ... e.g. >>> row-major matrix order if you wrote the unknowns [{rho}, {rhou}, ... ]. >>> >>> and then setting block size to 5. Is that correct? >>> >> >> yes >> >> >>> I've also tried using the actual sparsity of the matrix >>> >> >> Sorry, but using for what? >> >> >>> which has larger dense blocks (e.g. [35x5]^2), but neither seemed to >>> help. >>> >> >> Do you mean the element Jacobian matrix has 35 "vertices"? You have 5x5 >> dense blocks (or at least you don't care about resolving any sparsity or >> you want to use block preconditioners. So your element Jacobians are >> [35x5]^2 dense(ish) matrices. This is not particularly useful for the >> solvers. 
>> >> >>> >>> >>> 2) With default settings, and with -pc_gamg_square_graph, >>> pc_gamg_sym_graph, agg_nsmooths 0 mentioned in the manual, the eigenvalue >>> estimates explode on the coarsest level, which I don't see with poisson: >>> >>> Down solver (pre-smoother) on level 1 ------------------------------- >>> KSP Object: (mg_levels_1_) 32 MPI processes >>> type: chebyshev >>> eigenvalue estimates used: min = 0.18994, max = 2.08935 >>> eigenvalues estimate via gmres min 0.00933256, max 1.8994 >>> Down solver (pre-smoother) on level 2 ------------------------------- >>> KSP Object: (mg_levels_2_) 32 MPI processes >>> type: chebyshev >>> eigenvalue estimates used: min = 0.165969, max = 1.82566 >>> eigenvalues estimate via gmres min 0.0290728, max 1.65969 >>> Down solver (pre-smoother) on level 3 ------------------------------- >>> KSP Object: (mg_levels_3_) 32 MPI processes >>> type: chebyshev >>> eigenvalue estimates used: min = 0.146479, max = 1.61126 >>> eigenvalues estimate via gmres min 0.204673, max 1.46479 >>> Down solver (pre-smoother) on level 4 ------------------------------- >>> KSP Object: (mg_levels_4_) 32 MPI processes >>> type: chebyshev >>> eigenvalue estimates used: min = 6.81977e+09, max = 7.50175e+10 >>> eigenvalues estimate via gmres min -2.76436e+12, max 6.81977e+10 >>> >>> What's happening here? (Full -ksp_view below) >>> >> >> This is on the full NS problem I assume. You need special solvers for >> this. Scalable solvers for NS is not trivial. >> >> >>> >>> 3) I'm not very familiar with chebyshev smoothers, but they're only for >>> SPD systems (?). >>> >> >> Yes, you should use Gauss-Seidel for non-symmetric systems. Or indefinite >> systems >> >> >>> Is this an inappropriate smoother for this problem? >>> >>> 4) With gmres, the preconditioned residual is ~10 orders larger than the >>> true residual; and the preconditioned residual drops while the true >>> residual rises. I'm assuming this means something very wrong? >>> >> >> Yes, your preconditioner is no good. >> >> >>> >>> 5) -pc_type hyper -pc_hypre_type boomeramg also works perfectly for the >>> poisson case, but hits NaN on the first cycle for NS. >>> >>> >>> >> You want to use the Schur complement PCs for NS. We have some support for >> PC where you give us a mass matrix. These are motivated by assuming there >> is a "commutation" (that is not true). These were developed by, among >> others, Andy Wathan, if you want to do a literature search. You will >> probably have to read the literature to understand the options and issues >> for your problem. Look at the PETSc manual and you should find a >> description of PETSc support for these PCs and some discussion of them and >> references. This should get you started. >> >> Mark >> >> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Nov 8 20:42:34 2017 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 8 Nov 2017 21:42:34 -0500 Subject: [petsc-users] troubleshooting AMG, coupled navier stokes, large eigenvalues on coarsest level In-Reply-To: References: Message-ID: > > >>> You want to use the Schur complement PCs for NS. We have some support >>> for PC where you give us a mass matrix. >> >> >> I'll look into it, but I'm not aware of Schur complement being used for >> compressible/coupled NS, only for incompressible/segregated. >> > > > Really? OK, you've looked into it more than me. > Correct me if I'm wrong, I haven't looked into it much at all. 
I just haven't encountered Schur complement in CFD outside of segregated SIMPLE-type algorithms (like in the ex70/PCFIELDSPLIT example.) Those segregated pressure-correction methods tend to get progressively worse as you become less elliptic / non-zero Mach. sor Makes sense, explicit RK methods are pretty decent smoothers for the nonlinear geometric multigrid approach. Thanks for the help, hopefully at least using proper smoothers in GAMG gets me a non-wrong preconditioner. On Wed, Nov 8, 2017 at 8:40 PM, Mark Adams wrote: > > > On Wed, Nov 8, 2017 at 8:02 PM, Mark Lohry wrote: > >> Do you mean the element Jacobian matrix has 35 "vertices"? You have 5x5 >>> dense blocks (or at least you don't care about resolving any sparsity or >>> you want to use block preconditioners. So your element Jacobians are >>> [35x5]^2 dense(ish) matrices. This is not particularly useful for the >>> solvers. >> >> >> Yes, the local element jacobians are (35x5)^2 ( (N+1)(N+2)(N+3)/6 >> vertices per Nth order element for 3D). They'd be 100% dense. So the real >> blocks of the full system jacobian are of that size, and not 5x5. Block ILU >> is pretty popular for these discretizations for obvious reasons. >> > > OK, but don't set that blocks size. > > >> >> This is on the full NS problem I assume. You need special solvers for >>> this. Scalable solvers for NS is not trivial. >> >> >> Yes, compressible 3D NS. I'm aware scaling isn't trivial, but this isn't >> a scaling problem, I'm obviously computing a wrong preconditioner since >> it's actively hurting the linear solve. >> > > By scaling I mean not a direct solver. > > >> >> Yes, you should use Gauss-Seidel for non-symmetric systems. Or indefinite >>> systems >> >> >> Thanks! >> > > sor > > >> >> You want to use the Schur complement PCs for NS. We have some support >>> for PC where you give us a mass matrix. >> >> >> I'll look into it, but I'm not aware of Schur complement being used for >> compressible/coupled NS, only for incompressible/segregated. >> > > > Really? OK, you've looked into it more than me. > > This is DG where the mass matrix is a block diagonal constant you >> typically just invert once at the beginning; is there a PC where this can >> be exploited? >> > > Don't know but being able to explicitly invert this might let you compute > an exact Schur complement matrix and then use algebraic solvers (like LU > and AMG). > > >> >> >> On Wed, Nov 8, 2017 at 7:04 PM, Mark Adams wrote: >> >>> >>> >>> On Tue, Nov 7, 2017 at 3:00 PM, Mark Lohry wrote: >>> >>>> I've now got gamg running on matrix-free newton-krylov with the >>>> jacobian provided by coloring finite differences (thanks again for the >>>> help). 3D Poisson with 4th order DG or higher (35^2 blocks), gamg with >>>> default settings is giving textbook convergence, which is great. Of course >>>> coupled compressible navier-stokes is harder, and convergence is >>>> bad-to-nonexistent. >>>> >>>> >>>> 1) Doc says "Equations must be ordered in ?vertex-major? ordering"; in >>>> my discretization, each "node" has 5 coupled degrees of freedom (density, 3 >>>> x momentum, energy). I'm ordering my unknowns: >>>> >>>> rho_i, rhou_i, rhov_i, rhow_i, Et_i, rho_i+1, rhou_i+1, ... e.g. >>>> row-major matrix order if you wrote the unknowns [{rho}, {rhou}, ... ]. >>>> >>>> and then setting block size to 5. Is that correct? >>>> >>> >>> yes >>> >>> >>>> I've also tried using the actual sparsity of the matrix >>>> >>> >>> Sorry, but using for what? >>> >>> >>>> which has larger dense blocks (e.g. 
[35x5]^2), but neither seemed to >>>> help. >>>> >>> >>> Do you mean the element Jacobian matrix has 35 "vertices"? You have 5x5 >>> dense blocks (or at least you don't care about resolving any sparsity or >>> you want to use block preconditioners. So your element Jacobians are >>> [35x5]^2 dense(ish) matrices. This is not particularly useful for the >>> solvers. >>> >>> >>>> >>>> >>>> 2) With default settings, and with -pc_gamg_square_graph, >>>> pc_gamg_sym_graph, agg_nsmooths 0 mentioned in the manual, the eigenvalue >>>> estimates explode on the coarsest level, which I don't see with poisson: >>>> >>>> Down solver (pre-smoother) on level 1 ------------------------------- >>>> KSP Object: (mg_levels_1_) 32 MPI processes >>>> type: chebyshev >>>> eigenvalue estimates used: min = 0.18994, max = 2.08935 >>>> eigenvalues estimate via gmres min 0.00933256, max 1.8994 >>>> Down solver (pre-smoother) on level 2 ------------------------------- >>>> KSP Object: (mg_levels_2_) 32 MPI processes >>>> type: chebyshev >>>> eigenvalue estimates used: min = 0.165969, max = 1.82566 >>>> eigenvalues estimate via gmres min 0.0290728, max 1.65969 >>>> Down solver (pre-smoother) on level 3 ------------------------------- >>>> KSP Object: (mg_levels_3_) 32 MPI processes >>>> type: chebyshev >>>> eigenvalue estimates used: min = 0.146479, max = 1.61126 >>>> eigenvalues estimate via gmres min 0.204673, max 1.46479 >>>> Down solver (pre-smoother) on level 4 ------------------------------- >>>> KSP Object: (mg_levels_4_) 32 MPI processes >>>> type: chebyshev >>>> eigenvalue estimates used: min = 6.81977e+09, max = 7.50175e+10 >>>> eigenvalues estimate via gmres min -2.76436e+12, max 6.81977e+10 >>>> >>>> What's happening here? (Full -ksp_view below) >>>> >>> >>> This is on the full NS problem I assume. You need special solvers for >>> this. Scalable solvers for NS is not trivial. >>> >>> >>>> >>>> 3) I'm not very familiar with chebyshev smoothers, but they're only for >>>> SPD systems (?). >>>> >>> >>> Yes, you should use Gauss-Seidel for non-symmetric systems. Or >>> indefinite systems >>> >>> >>>> Is this an inappropriate smoother for this problem? >>>> >>>> 4) With gmres, the preconditioned residual is ~10 orders larger than >>>> the true residual; and the preconditioned residual drops while the true >>>> residual rises. I'm assuming this means something very wrong? >>>> >>> >>> Yes, your preconditioner is no good. >>> >>> >>>> >>>> 5) -pc_type hyper -pc_hypre_type boomeramg also works perfectly for the >>>> poisson case, but hits NaN on the first cycle for NS. >>>> >>>> >>>> >>> You want to use the Schur complement PCs for NS. We have some support >>> for PC where you give us a mass matrix. These are motivated by assuming >>> there is a "commutation" (that is not true). These were developed by, among >>> others, Andy Wathan, if you want to do a literature search. You will >>> probably have to read the literature to understand the options and issues >>> for your problem. Look at the PETSc manual and you should find a >>> description of PETSc support for these PCs and some discussion of them and >>> references. This should get you started. >>> >>> Mark >>> >>> >>> >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
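For reference, the concrete takeaways of this exchange for the coupled compressible DG system -- keep the block size at 5 (the per-node unknowns, not the 35x5 element block), symmetrize the graph, use unsmoothed aggregation, and replace the Chebyshev smoother with SOR -- can be written as an option sketch like the following; it is only a starting point and untested here:

    -pc_type gamg
    -pc_gamg_agg_nsmooths 0
    -pc_gamg_sym_graph              # needed since the NS Jacobian is non-symmetric
    -mg_levels_ksp_type richardson
    -mg_levels_pc_type sor          # Gauss-Seidel-type smoothing instead of Chebyshev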
URL: From jed at jedbrown.org Wed Nov 8 22:12:23 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 08 Nov 2017 21:12:23 -0700 Subject: [petsc-users] troubleshooting AMG, coupled navier stokes, large eigenvalues on coarsest level In-Reply-To: References: Message-ID: <87tvy4jdjs.fsf@jedbrown.org> Mark Lohry writes: >> >> >>>> You want to use the Schur complement PCs for NS. We have some support >>>> for PC where you give us a mass matrix. >>> >>> >>> I'll look into it, but I'm not aware of Schur complement being used for >>> compressible/coupled NS, only for incompressible/segregated. >>> >> >> >> Really? OK, you've looked into it more than me. >> > > Correct me if I'm wrong, I haven't looked into it much at all. I just > haven't encountered Schur complement in CFD outside of segregated > SIMPLE-type algorithms (like in the ex70/PCFIELDSPLIT example.) Those > segregated pressure-correction methods tend to get progressively worse as > you become less elliptic / non-zero Mach. There are more efficient split methods than SIMPLE, particularly for steady state or large time steps, but they are still low Mach solvers. Are you doing steady state or transient solves? RANS? There aren't good AMG solvers available for this kind of system; not for lack of trying but it remains a worthy area of research. From mlohry at gmail.com Wed Nov 8 23:04:06 2017 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 9 Nov 2017 00:04:06 -0500 Subject: [petsc-users] troubleshooting AMG, coupled navier stokes, large eigenvalues on coarsest level In-Reply-To: <87tvy4jdjs.fsf@jedbrown.org> References: <87tvy4jdjs.fsf@jedbrown.org> Message-ID: > > There are more efficient split methods than SIMPLE, particularly for > steady state or large time steps, but they are still low Mach solvers. I only mention SIMPLE as a canonical segregated/pressure-based method; segregated methods aren't really my area. Are you doing steady state or transient solves? RANS? Varies -- I wrote a high order unstructured parallel DG code for coupled compressible NS that does fantastic with explicit time DNS, and I'm trying to get any kind of good implicit time or steady state solutions to attack the majority of problems where explicit is useless. Basically anything in the AIAA high order workshops is of interest to me. RANS, LES, and steady state are all of interest; I think steady state solutions in this area are probably doomed to use a pseudo-transient method to get there anyway so it's the same problem. I don't know if anyone in high order cfd has actually gotten a purely steady state formulation to converge on something nontrivial. There aren't good AMG solvers available for this kind of system; not for > lack of > trying but it remains a worthy area of research. Hence me using petsc as a solver library for research =), it's the best option for trying various kinds of algebraic solvers (seriously you devs have done fine work). Relatively generic AMG solvers have had massive success in the same problems for FV discretizations, so it seems hopeful they could extend to DG. I recall you working in this area, so any suggestions welcome. On Wed, Nov 8, 2017 at 11:12 PM, Jed Brown wrote: > Mark Lohry writes: > > >> > >> > >>>> You want to use the Schur complement PCs for NS. We have some support > >>>> for PC where you give us a mass matrix. > >>> > >>> > >>> I'll look into it, but I'm not aware of Schur complement being used for > >>> compressible/coupled NS, only for incompressible/segregated. > >>> > >> > >> > >> Really? 
OK, you've looked into it more than me. > >> > > > > Correct me if I'm wrong, I haven't looked into it much at all. I just > > haven't encountered Schur complement in CFD outside of segregated > > SIMPLE-type algorithms (like in the ex70/PCFIELDSPLIT example.) Those > > segregated pressure-correction methods tend to get progressively worse as > > you become less elliptic / non-zero Mach. > > There are more efficient split methods than SIMPLE, particularly for > steady state or large time steps, but they are still low Mach solvers. > Are you doing steady state or transient solves? RANS? There aren't > good AMG solvers available for this kind of system; not for lack of > trying but it remains a worthy area of research. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Wed Nov 8 23:14:04 2017 From: zakaryah at gmail.com (zakaryah .) Date: Thu, 9 Nov 2017 00:14:04 -0500 Subject: [petsc-users] Newton LS - better results on single processor Message-ID: Well the saga of my problem continues. As I described previously in an epic thread, I'm using the SNES to solve problems involving an elastic material on a rectangular grid, subjected to external forces. In any case, I'm occasionally getting poor convergence using Newton's method with line search. In troubleshooting by visualizing the residual, I saw that in data sets which had good convergence, the residual was nevertheless significantly larger along the boundary between different processors. Likewise, in data sets with poor convergence, the residual became very large on the boundary between different processors. The residual is not significantly larger on the physical boundary, i.e. the global boundary. When I run on a single process, convergence seems to be good on all data sets. Any clues to fix this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at unito.it Thu Nov 9 02:08:19 2017 From: matteo.semplice at unito.it (Matteo Semplice) Date: Thu, 9 Nov 2017 09:08:19 +0100 Subject: [petsc-users] DMPlex distribution and loops over cells/faces for finite volumes Message-ID: Hi. I am using a DMPLex to store a grid (at present 2d symplicial but will need to work more in general), but after distributing it, I am finding it hard to organize my loops over the cells/faces. First things first: the mesh distribution. I do ? DMPlexSetAdjacencyUseCone(dm, PETSC_TRUE); ? DMPlexSetAdjacencyUseClosure(dm, PETSC_FALSE); ? DMPlexDistribute(dm, 1, NULL, &dmDist); The 1 in DMPlexDistribute is correct to give me 1 layer of ghost cells, right? Using 2 instead of 1 should add more internal ghosts, right? Actually, this errors out with petsc3.7.7 from debian/stable... If you deem it worth, I will test again with petsc3.8. Secondly, looping over cells. I do ? DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd); ? DMPlexGetHybridBounds(dm, &cPhysical, NULL, NULL, NULL); ? for (int c = cStart; c < cEnd; ++c) { ????? if ((cPhysical>=0)&&(c>=cPhysical)) ??????? {code for ghost cells outside physical domain boundary} ????? else if ( ??? ) ??????? {code for ghost cells} ????? else ??????? {code for cells owned by this process} ? } ? What should I use as a test for ghost cells outside processor boundary? I have seen that (on a 2d symplicial mesh) reading the ghost label with DMGetLabelValue(dm, "ghost", c, &ghost) gives the value -1 for inner cells and domain boundary ghosts but 2 for processor boundary ghosts. Is this the correct test? What is the meaning of ghost=2? 
In general, should I test for any value >=0? Lastly, since I will do finite volumes, I'd like to loop over the faces to compute fluxes. DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd) gives me the start/endpoints for faces, but they include also faces between ghost cells that are therefore external to the cells owned by the processor. I see that for faces on processor boundary, ghost=-1 on one process and ghost=2 on the other process, which I like. However ghost=-1 also on faces between ghosts cells and therefore to compute fluxes I should test for the ghost value of the face and for the ghost value of the cells on both sides... Is there a better way to loop over faces? Thanks in advance for any suggestion. ??? Matteo From lukas.drinkt.thee at gmail.com Thu Nov 9 05:31:12 2017 From: lukas.drinkt.thee at gmail.com (Lukas van de Wiel) Date: Thu, 9 Nov 2017 12:31:12 +0100 Subject: [petsc-users] linking PETSC with SSL In-Reply-To: References: Message-ID: Hi Satish, I am working my way through linking our real code to PETSc 3.8.1. The improvement for integration with Fortran is great. use petscksp makes a world of difference. With compliments and thanks! Lukas On 11/8/17, Satish Balay wrote: > Glad it works! Thanks for the update. > > Satish > > On Wed, 8 Nov 2017, Lukas van de Wiel wrote: > >> Ah, never mind, It works in 3.8.1! >> Had to change my makefile also, of course... :-\ >> >> Thanks and have a great day! >> >> Lukas >> >> >> On 11/8/17, Lukas van de Wiel wrote: >> > Hi Satish, >> > >> > thank you for your quick reply! >> > >> > I have installed 3.8.1, but the problem persists. >> > I have attached configure.log and test.log. >> > >> > Does that reveal anything? >> > >> > Cheers and thanks again >> > >> > Lukas >> > >> > On 11/8/17, Satish Balay wrote: >> >> Hm --with-ssl=0 should work. Can you do a fresh/clean build and see if >> >> the >> >> problem persists? >> >> >> >> If so send configure.log make.log and test.log for this build. >> >> >> >> BTW: 3.8 has better support for fortran usage - so you might consider >> >> upgrading all the way to it. >> >> >> >> https://www.mcs.anl.gov/petsc/documentation/changes/38.html >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html#UsingFortran >> >> >> >> Satish >> >> >> >> On Wed, 8 Nov 2017, Lukas van de Wiel wrote: >> >> >> >>> Good day, >> >>> >> >>> during an upgrade of PETSc (from 3.4.2 to 3.7.7) and some changes >> >>> needed to use MUMPs I ran into a problem with SSL that I could not >> >>> really figure out. 
>> >>> >> >>> ******************************************************** >> >>> I have configured PETSc 3.7.7 using: (explicitly note the option >> >>> --with-ssl=0 ) >> >>> >> >>> >> >>> ./configure \ >> >>> COPTFLAGS='-O3 -march=native -mtune=native' \ >> >>> CXXOPTFLAGS='-O3 -march=native -mtune=native' \ >> >>> FOPTFLAGS='-O3 -march=native -mtune=native' \ >> >>> --with-debugging=0 \ >> >>> --with-x=0 \ >> >>> --with-ssl=0 \ >> >>> --with-shared-libraries=0 \ >> >>> --download-metis \ >> >>> --download-parmetis \ >> >>> --download-fblaslapack \ >> >>> --download-scalapack \ >> >>> --download-openmpi \ >> >>> --download-mumps \ >> >>> --download-hypre \ >> >>> --download-ptscotch >> >>> >> >>> >> >>> ******************************************************** >> >>> Next I have a tiny Fortran program to illustrate the problem: >> >>> >> >>> >> >>> program sslhuh >> >>> implicit none >> >>> #include "petsc/finclude/petscsys.h" >> >>> write(*,*) "Hello World" >> >>> end program >> >>> >> >>> >> >>> >> >>> ******************************************************** >> >>> I compile it using makefile: >> >>> >> >>> >> >>> petscDir = /net/home/gtecton/sw_dev/petsc-3.7.7 >> >>> >> >>> all: compile link >> >>> >> >>> compile: sslhuh.F >> >>> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ >> >>> -c \ >> >>> -o sslhuh.o \ >> >>> -std=f2008 \ >> >>> -ffree-form \ >> >>> -I$(petscDir)/include \ >> >>> -I$(petscDir)/include/petsc/finclude \ >> >>> sslhuh.F >> >>> >> >>> link: sslhuh.o >> >>> $(petscDir)/linux-gnu-x86_64/bin/mpif90 \ >> >>> -o sslhuh \ >> >>> sslhuh.o \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libpetsc.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libHYPRE.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libzmumps.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libsmumps.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libdmumps.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libcmumps.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libmumps_common.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libesmumps.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libparmetis.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libmetis.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libpord.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libscalapack.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libflapack.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libfblas.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotch.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherr.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotcherrexit.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libptscotchparmetis.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libscotch.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libscotcherr.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libscotcherrexit.a \ >> >>> $(petscDir)/linux-gnu-x86_64/lib/libscotchmetis.a \ >> >>> -ldl >> >>> >> >>> >> >>> ******************************************************** >> >>> Compiling works fine. 
>> >>> Linking gives me the error: >> >>> >> >>> >> >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >> >>> In function `PetscSSLInitializeContext': >> >>> client.c:(.text+0x3b): undefined reference to `SSLv23_method' >> >>> client.c:(.text+0x43): undefined reference to `SSL_CTX_new' >> >>> client.c:(.text+0x5a): undefined reference to `SSL_CTX_ctrl' >> >>> client.c:(.text+0x79): undefined reference to `SSL_library_init' >> >>> client.c:(.text+0x7e): undefined reference to >> >>> `SSL_load_error_strings' >> >>> client.c:(.text+0x8c): undefined reference to `BIO_new_fp' >> >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >> >>> In function `PetscSSLDestroyContext': >> >>> client.c:(.text+0xa5): undefined reference to `SSL_CTX_free' >> >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >> >>> In function `PetscHTTPSRequest': >> >>> client.c:(.text+0xa9e): undefined reference to `SSL_write' >> >>> client.c:(.text+0xaab): undefined reference to `SSL_get_error' >> >>> client.c:(.text+0xb40): undefined reference to `SSL_read' >> >>> client.c:(.text+0xb50): undefined reference to `SSL_get_error' >> >>> client.c:(.text+0xbde): undefined reference to `SSL_free' >> >>> client.c:(.text+0xc74): undefined reference to `SSL_shutdown' >> >>> /net/home/gtecton/sw_dev/petsc-3.7.7/linux-gnu-x86_64/lib/libpetsc.a(client.o): >> >>> In function `PetscHTTPSConnect': >> >>> client.c:(.text+0x105c): undefined reference to `SSL_new' >> >>> client.c:(.text+0x106a): undefined reference to `BIO_new_socket' >> >>> client.c:(.text+0x1079): undefined reference to `SSL_set_bio' >> >>> client.c:(.text+0x1082): undefined reference to `SSL_connect' >> >>> collect2: ld returned 1 exit status >> >>> make: *** [link] Error 1 >> >>> >> >>> >> >>> ****************************************************** >> >>> >> >>> From what I see, the SSL functionality is in client.o, which is part >> >>> of libpetsc.a, which is included in the linking. >> >>> >> >>> There is no difference in whether I turn --with-ssl on or off... >> >>> >> >>> Does anybody have an idea what I am doing wrong? >> >>> >> >>> Best wishes and thank you for your time! >> >>> >> >>> Lukas >> >>> >> >> >> >> >> > >> > > From knepley at gmail.com Thu Nov 9 07:44:44 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 9 Nov 2017 08:44:44 -0500 Subject: [petsc-users] Newton LS - better results on single processor In-Reply-To: References: Message-ID: On Thu, Nov 9, 2017 at 12:14 AM, zakaryah . wrote: > Well the saga of my problem continues. As I described previously in an > epic thread, I'm using the SNES to solve problems involving an elastic > material on a rectangular grid, subjected to external forces. In any case, > I'm occasionally getting poor convergence using Newton's method with line > search. In troubleshooting by visualizing the residual, I saw that in data > sets which had good convergence, the residual was nevertheless > significantly larger along the boundary between different processors. > Likewise, in data sets with poor convergence, the residual became very > large on the boundary between different processors. The residual is not > significantly larger on the physical boundary, i.e. the global boundary. > When I run on a single process, convergence seems to be good on all data > sets. > > Any clues to fix this? 
> It sounds like something is wrong with communication across domains: - If this is FEM, it sounds like you are not adding contributions from the other domain to shared vertices/edges/faces - If this is FDM/FVM, maybe the ghosts are not updated What DM are you using? Are you using the Local assembly functions (FormFunctionLocal), or just FormFunction()? Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From florian.kauer at koalo.de Thu Nov 9 08:13:51 2017 From: florian.kauer at koalo.de (Florian Kauer) Date: Thu, 9 Nov 2017 15:13:51 +0100 Subject: [petsc-users] SNES without SNESSetJacobian, snes_fd or snes_mf Message-ID: <73fe939b-e1e1-67a0-5e1f-224902e975e7@koalo.de> Hi, what is the SNES solver actually doing when you do not provide a jacobian and not explicitly select either finite differencing approximation or matrix-free Newton-Krylov method? I just noticed that I mistakenly did this, but a good solution is found anyway (and fast). So what is actually happening? Simple fixed-point iteration? Greetings, Florian From mfadams at lbl.gov Thu Nov 9 08:46:25 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 9 Nov 2017 09:46:25 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: OK, well, just go with the Linux machine for the regression test. I will keep trying to reproduce this on my Mac with an O build. On Wed, Nov 8, 2017 at 12:24 PM, Hong wrote: > mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 > -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol > 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg > -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi > -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > > ==30976== Invalid read of size 16 > ==30976== at 0x8550946: dswap_k_NEHALEM (in /usr/lib/openblas-base/ > libblas.so.3) > ==30976== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) > ==30976== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) > ==30976== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > ==30976== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > ==30976== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > ==30976== by 0x41056E: main (ex56.c:395) > ==30976== Address 0xdc650d0 is 52,480 bytes inside a block of size 52,488 > alloc'd > ==30976== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_ > memcheck-amd64-linux.so) > ==30976== by 0x51590F6: PetscMallocAlign (mal.c:39) > ==30976== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > ==30976== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > ==30976== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > ==30976== by 0x41056E: main (ex56.c:395) > > You can fix it on branch hzhang/fix-submat_samerowdist. 
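(For application code that calls MatCreateSubMatrix() directly and runs into the same "unsorted iscol_local is not implemented yet" message, one possible stopgap, sketched here with hypothetical names A, isrow, iscol and B, is to sort the column IS first. Note that ISSort() permutes the column ordering of the resulting submatrix, so this only helps when the caller does not rely on the original order.)

  PetscBool sorted;
  ierr = ISSorted(iscol,&sorted);CHKERRQ(ierr);        /* is the column IS already sorted? */
  if (!sorted) {ierr = ISSort(iscol);CHKERRQ(ierr);}   /* in-place sort; changes the column order */
  ierr = MatCreateSubMatrix(A,isrow,iscol,MAT_INITIAL_MATRIX,&B);CHKERRQ(ierr);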
> > Hong > > > On Wed, Nov 8, 2017 at 11:01 AM, Mark Adams wrote: > >> >> >> On Wed, Nov 8, 2017 at 11:09 AM, Hong wrote: >> >>> Mark: >>> >>>> Hong, is >>>> > 0-cells: 12 12 0 0 >>>> > 1-cells: 20 20 0 0 >>>> > 2-cells: 11 11 0 0 >>>> > 3-cells: 2 2 0 0 >>>> >>>> from the old version? >>>> >>> In O-build on my macPro, I get the above. In g-build, I get >>> >>>> >>>> 0-cells: 8 8 8 8 >>>> 1-cells: 12 12 12 12 >>>> 2-cells: 6 6 6 6 >>>> 3-cells: 1 1 1 1 >>>> >>> I get this on linux machine. >>> Do you know why? >>> >> >> I can not reproduce your O output. I will look at it later. >> >> Valgrind is failing on me right now. I will look into it but can you >> valgrind it? >> >> >>> >>> Hong >>> >>>> >>>> On Tue, Nov 7, 2017 at 10:13 PM, Hong wrote: >>>> >>>>> Mark: >>>>> I removed option '-ex56_dm_view'. >>>>> Hong >>>>> >>>>> Humm, this looks a little odd, but it may be OK. Is this this diffing >>>>>> with the old non-repartition data? (more below) >>>>>> >>>>>> On Tue, Nov 7, 2017 at 11:45 AM, Hong wrote: >>>>>> >>>>>>> Mark, >>>>>>> The fix is merged to next branch for tests which show diff as >>>>>>> >>>>>>> ******* Testing: testexamples_PARMETIS ******* >>>>>>> 5c5 >>>>>>> < 1 SNES Function norm 1.983e-10 >>>>>>> --- >>>>>>> > 1 SNES Function norm 1.990e-10 >>>>>>> 10,13c10,13 >>>>>>> < 0-cells: 8 8 8 8 >>>>>>> < 1-cells: 12 12 12 12 >>>>>>> < 2-cells: 6 6 6 6 >>>>>>> < 3-cells: 1 1 1 1 >>>>>>> >>>>>>> >>>>>> I assume this is the old. >>>>>> >>>>>> >>>>>>> --- >>>>>>> > 0-cells: 12 12 0 0 >>>>>>> > 1-cells: 20 20 0 0 >>>>>>> > 2-cells: 11 11 0 0 >>>>>>> > 3-cells: 2 2 0 0 >>>>>>> 15,18c15,18 >>>>>>> >>>>>>> >>>>>> and this is the new. >>>>>> >>>>>> This is funny because the processors are not fully populated. This >>>>>> can happen on coarse grids and indeed it should happen in a test with good >>>>>> coverage. >>>>>> >>>>>> I assume these diffs are views from coarse grids? That is, in the raw >>>>>> output files do you see fully populated fine grids, with no diffs, and then >>>>>> the diffs come on coarse grids. >>>>>> >>>>>> Repartitioning the coarse grids can change the coarsening, It is >>>>>> possible that repartitioning causes faster coarsening (it does a little) >>>>>> and this faster coarsening is tripping the aggregation switch, which gives >>>>>> us empty processors. >>>>>> >>>>>> Am I understanding this correctly ... >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> >>>>>>> < boundary: 1 strata with value/size (1 (23)) >>>>>>> < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) >>>>>>> < marker: 1 strata with value/size (1 (15)) >>>>>>> < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) >>>>>>> --- >>>>>>> > boundary: 1 strata with value/size (1 (39)) >>>>>>> > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) >>>>>>> > marker: 1 strata with value/size (1 (27)) >>>>>>> > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>>> >>>>>>> see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log >>>>>>> >>>>>>> I guess parmetis produces random partition on different machines (I made output file for ex56_1 on my imac). Please take a look at the differences. 
If the outputs are correct, I will remove option '-ex56_dm_view' >>>>>>> >>>>>>> Hong >>>>>>> >>>>>>> >>>>>>> On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: >>>>>>> >>>>>>>> Mark: >>>>>>>> Bug is fixed in branch hzhang/fix-submat_samerowdist >>>>>>>> https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_s >>>>>>>> amerowdist >>>>>>>> >>>>>>>> I also add the test runex56. Please test it and let me know if >>>>>>>> there is a problem. >>>>>>>> Hong >>>>>>>> >>>>>>>> Also, I have been using -petscpartition_type but now I see >>>>>>>>> -pc_gamg_mat_partitioning_type. Is -petscpartition_type >>>>>>>>> depreciated for GAMG? >>>>>>>>> >>>>>>>>> Is this some sort of auto generated portmanteau? I can not find >>>>>>>>> pc_gamg_mat_partitioning_type in the source. >>>>>>>>> >>>>>>>>> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Great, thanks, >>>>>>>>>> >>>>>>>>>> And could you please add these parameters to a regression test? >>>>>>>>>> As I recall we have with-parmetis regression test. >>>>>>>>>> >>>>>>>>>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>>>>>>>>> >>>>>>>>>>> Mark: >>>>>>>>>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>>>>>>>>> Now testing src/snes/examples/tutorials/ex56.c with your >>>>>>>>>>> options, I can reproduce the error. >>>>>>>>>>> I'll fix it. >>>>>>>>>>> >>>>>>>>>>> Hong >>>>>>>>>>> >>>>>>>>>>> Hong, >>>>>>>>>>>> >>>>>>>>>>>> I've tested with master and I get the same error. Maybe the >>>>>>>>>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>>>>>>>>> me. >>>>>>>>>>>> >>>>>>>>>>>> Can you run this (snes ex56) w/o the error? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ >>>>>>>>>>>> make runex >>>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n >>>>>>>>>>>> 4 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>>> -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis >>>>>>>>>>>> -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 >>>>>>>>>>>> -pc_gamg_repartition true >>>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>>> >>>>>>>>>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 >>>>>>>>>>>> (8)) >>>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>> type >>>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is 
not implemented yet >>>>>>>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Mark : >>>>>>>>>>>>> I realize that using maint or master branch, I cannot >>>>>>>>>>>>> reproduce the same error. >>>>>>>>>>>>> For this example, you must use a parallel partitioner, >>>>>>>>>>>>> e.g.,'current' gives me following error: >>>>>>>>>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it >>>>>>>>>>>>> currently only supports one domain per processor >>>>>>>>>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or >>>>>>>>>>>>> ptscotch for more than one subdomain per processor >>>>>>>>>>>>> >>>>>>>>>>>>> Please rebase your branch with maint or master, then see if >>>>>>>>>>>>> you still have problem. >>>>>>>>>>>>> >>>>>>>>>>>>> Hong >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Mark, >>>>>>>>>>>>>>> I can reproduce this in an old branch, but not in current >>>>>>>>>>>>>>> maint and master. >>>>>>>>>>>>>>> Which branch are you using to produce this error? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am using a branch from Matt. Let me try to merge it with >>>>>>>>>>>>>> master. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I am able to reproduce this with snes ex56 with 2 >>>>>>>>>>>>>>>> processors and adding -pc_gamg_repartition true >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm not sure how to fix it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec >>>>>>>>>>>>>>>> -n 2 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it >>>>>>>>>>>>>>>> 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>>>>>>> -mg_levels_pc_type jacobi -petscpartitioner_type simple -mat_block_size 3 >>>>>>>>>>>>>>>> -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition >>>>>>>>>>>>>>>> true >>>>>>>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>>>>>>>>> 5 KSP Residual norm 0.00278873 
>>>>>>>>>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>>>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 12 >>>>>>>>>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>>>>> type: plex >>>>>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>>>>> 0-cells: 12 12 >>>>>>>>>>>>>>>> 1-cells: 20 20 >>>>>>>>>>>>>>>> 2-cells: 11 11 >>>>>>>>>>>>>>>> 3-cells: 2 2 >>>>>>>>>>>>>>>> Labels: >>>>>>>>>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 (2), >>>>>>>>>>>>>>>> 5 (1), 6 (1)) >>>>>>>>>>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>>>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), >>>>>>>>>>>>>>>> 3 (2)) >>>>>>>>>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>>>>>>>>> 0 SNES Function norm 49.7106 >>>>>>>>>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>>>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>>>>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>>>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 15 >>>>>>>>>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>>>>> type: plex >>>>>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>>>>> 0-cells: 45 45 >>>>>>>>>>>>>>>> 1-cells: 96 96 >>>>>>>>>>>>>>>> 2-cells: 68 68 >>>>>>>>>>>>>>>> 3-cells: 16 16 >>>>>>>>>>>>>>>> Labels: >>>>>>>>>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 >>>>>>>>>>>>>>>> (18), 5 (9), 6 (9)) >>>>>>>>>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), >>>>>>>>>>>>>>>> 3 (16)) >>>>>>>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>> 
------------------------------ >>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>> [1]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams >>>>>>>>>>>>>>> > wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Doing some additional testing, the issue goes away when >>>>>>>>>>>>>>>>>> removing the gamg preconditioner line from the petsc.rc: >>>>>>>>>>>>>>>>>> -pc_type gamg >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This is the code. findices is create with ISCreateStride, >>>>>>>>>>>>>>>>> so it is sorted ... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Michael is repartitioning the coarse grids. Maybe we don't >>>>>>>>>>>>>>>>> have a regression test with this... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I will try to reproduce this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Michael: you can use hypre for now, or turn repartitioning >>>>>>>>>>>>>>>>> off (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition >>>>>>>>>>>>>>>>> false), but I'm not sure this will fix this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You don't have hypre parameters for all of your all of >>>>>>>>>>>>>>>>> your solvers. I think 'boomeramg' is the default pc_hypre_type. That should >>>>>>>>>>>>>>>>> be good enough for you. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> IS findices; >>>>>>>>>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>>>>>>>>> Mat Pnew; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, >>>>>>>>>>>>>>>>> new_eq_indices, MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /* output - repartitioned */ >>>>>>>>>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> OK, this might not be completely satisfactory, because >>>>>>>>>>>>>>>>>>>> it doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>>>>>>>>> greater than 1. 
This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong < >>>>>>>>>>>>>>>>>>>> hzhang at mcs.anl.gov> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>>>>>>>>> It would be helpful if you can provide us a simple >>>>>>>>>>>>>>>>>>>>> example that produces this example. >>>>>>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'm running a Fortran code that was just changed over >>>>>>>>>>>>>>>>>>>>>> to using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>>>>>>>>> (see full error below). I tried to trace down the difference in the source >>>>>>>>>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_ >>>>>>>>>>>>>>>>>>>>>> NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for >>>>>>>>>>>>>>>>>>>>>> this object type >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not >>>>>>>>>>>>>>>>>>>>>> implemented yet >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. 
>>>>>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line >>>>>>>>>>>>>>>>>>>>>> 3247 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 >>>>>>>>>>>>>>>>>>>>>> in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yann.jobic at univ-amu.fr Thu Nov 9 09:35:15 2017 From: yann.jobic at univ-amu.fr (Yann JOBIC) Date: Thu, 9 Nov 2017 16:35:15 +0100 Subject: [petsc-users] multi level octree AMR Message-ID: <0c4b331e-f56e-f376-053c-32cce996cd86@univ-amu.fr> Hi, I succeed in having a basic AMR running on my FE advection/diffusion problem ( https://mycore.core-cloud.net/index.php/s/DmiiTwKUpV9z5qL) I now want to have multiple levels in my octree AMR from p4est. I tried a lot with IS tagged arrays, but i didn't succeed so far. When i get from VecTaggerAbsoluteSetBox an IS array of the cells, it seems that the DM cell numbering is changed after multiple time iterations (from TSSolve), even if the DM is not changed at all. I tried to identify the cells to refine/coarse with ISDifference and ISExpand, from two different time solutions of the same DM. That made me wondering if i'm using the correct tool (IS array). Am i in the right direction ? Thanks, Regards, Yann From jed at jedbrown.org Thu Nov 9 09:38:15 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 09 Nov 2017 08:38:15 -0700 Subject: [petsc-users] SNES without SNESSetJacobian, snes_fd or snes_mf In-Reply-To: <73fe939b-e1e1-67a0-5e1f-224902e975e7@koalo.de> References: <73fe939b-e1e1-67a0-5e1f-224902e975e7@koalo.de> Message-ID: <87fu9njwd4.fsf@jedbrown.org> Florian Kauer writes: > Hi, > what is the SNES solver actually doing when you do not provide a > jacobian and not explicitly select either finite differencing > approximation or matrix-free Newton-Krylov method? > > I just noticed that I mistakenly did this, but a good solution is found > anyway (and fast). So what is actually happening? 
Simple fixed-point > iteration? Presumably you are using a DM that can provide a coloring, in which case the sparse Jacobian is assembled using finite differencing with coloring. From knepley at gmail.com Thu Nov 9 10:03:37 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 9 Nov 2017 11:03:37 -0500 Subject: [petsc-users] DMPlex distribution and loops over cells/faces for finite volumes In-Reply-To: References: Message-ID: On Thu, Nov 9, 2017 at 3:08 AM, Matteo Semplice wrote: > Hi. > I am using a DMPLex to store a grid (at present 2d symplicial but will > need to work more in general), but after distributing it, I am finding it > hard to organize my loops over the cells/faces. > > First things first: the mesh distribution. I do > > DMPlexSetAdjacencyUseCone(dm, PETSC_TRUE); > DMPlexSetAdjacencyUseClosure(dm, PETSC_FALSE); > This gives you adjacent cells, but not their boundary. Usually what you want for FV. > DMPlexDistribute(dm, 1, NULL, &dmDist); > > The 1 in DMPlexDistribute is correct to give me 1 layer of ghost cells, > right? > Yes. > Using 2 instead of 1 should add more internal ghosts, right? Actually, > this errors out with petsc3.7.7 from debian/stable... If you deem it worth, > I will test again with petsc3.8. > It should not error. If possible, send me the mesh please. I have some tests where 2 is successful. > Secondly, looping over cells. I do > > DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd); > DMPlexGetHybridBounds(dm, &cPhysical, NULL, NULL, NULL); > for (int c = cStart; c < cEnd; ++c) { > if ((cPhysical>=0)&&(c>=cPhysical)) > {code for ghost cells outside physical domain boundary} > else if ( ??? ) > {code for ghost cells} > I am not sure why you want to draw this distinction, but you can check to see if the cell is present in the pointSF. If it is, then its not owned by the process. Here is the kind of FV loop I do (for example in TS ex11): https://bitbucket.org/petsc/petsc/src/f9fed41fd6c28d171f9c644f97b15591be9df7d6/src/snes/utils/dmplexsnes.c?at=master&fileviewer=file-view-default#dmplexsnes.c-1634 > else > {code for cells owned by this process} > } > > What should I use as a test for ghost cells outside processor boundary? > I have seen that (on a 2d symplicial mesh) reading the ghost label with > DMGetLabelValue(dm, "ghost", c, &ghost) gives the value -1 for inner cells > and domain boundary ghosts but 2 for processor boundary ghosts. Is this the > correct test? What is the meaning of ghost=2? In general, should I test for > any value >=0? > Yes, see the link I gave. > Lastly, since I will do finite volumes, I'd like to loop over the faces to > compute fluxes. > DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd) > gives me the start/endpoints for faces, but they include also faces > between ghost cells that are therefore external to the cells owned by the > processor. I see that for faces on processor boundary, ghost=-1 on one > process and ghost=2 on the other process, which I like. However ghost=-1 > also on faces between ghosts cells and therefore to compute fluxes I should > test for the ghost value of the face and for the ghost value of the cells > on both sides... Is there a better way to loop over faces? > See the link. Thanks, Matt > Thanks in advance for any suggestion. > > Matteo > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Thu Nov 9 10:16:22 2017 From: zakaryah at gmail.com (zakaryah .) Date: Thu, 9 Nov 2017 11:16:22 -0500 Subject: [petsc-users] Newton LS - better results on single processor In-Reply-To: References: Message-ID: Thanks Stefano, I will try what you suggest. ?Matt - my DM is a composite between the redundant field (loading coefficient, which is included in the Newton solve in Riks' method) and the displacements, which are represented by a 3D DA with 3 dof. I am using finite difference. Probably my problem comes from confusion over how the composite DM is organized. I am using FormFunction()?, and within that I call DMCompositeGetLocalVectors(), DMCompositeScatter(), DMDAVecGetArray(), and for the Jacobian, DMCompositeGetLocalISs() and MatGetLocalSubmatrix() to split J into Jbb, Jbh, Jhb, and Jhh, where b is the loading coefficient, and h is the displacements). The values of each submatrix are set using MatSetValuesLocal(). ?I'm most suspicious of the part of the Jacobian routine where I calculate the rows of Jhb, the columns of Jbh, and the corresponding values. I take the DA coordinates and ix,iy,iz, then calculate the row of Jhb as ((((iz-info->gzs)*info->gym + (iy-info->gys))*info->gxm + (ix-info->gxs))*info->dof+c), where info is the DA local info and c is the degree of freedom. The same calculation is performed for the column of Jbh. I suspect that the indexing of the DA vector is not so simple, but I don't know for a fact that I'm doing this incorrectly nor how to do this properly. ?Thanks for all the help!? On Nov 9, 2017 8:44 AM, "Matthew Knepley" wrote: On Thu, Nov 9, 2017 at 12:14 AM, zakaryah . wrote: > Well the saga of my problem continues. As I described previously in an > epic thread, I'm using the SNES to solve problems involving an elastic > material on a rectangular grid, subjected to external forces. In any case, > I'm occasionally getting poor convergence using Newton's method with line > search. In troubleshooting by visualizing the residual, I saw that in data > sets which had good convergence, the residual was nevertheless > significantly larger along the boundary between different processors. > Likewise, in data sets with poor convergence, the residual became very > large on the boundary between different processors. The residual is not > significantly larger on the physical boundary, i.e. the global boundary. > When I run on a single process, convergence seems to be good on all data > sets. > > Any clues to fix this? > It sounds like something is wrong with communication across domains: - If this is FEM, it sounds like you are not adding contributions from the other domain to shared vertices/edges/faces - If this is FDM/FVM, maybe the ghosts are not updated What DM are you using? Are you using the Local assembly functions (FormFunctionLocal), or just FormFunction()? Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
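A rough code sketch of that DMCompositeGetLocalISs() / MatGetLocalSubmatrix() sequence (pack and J are stand-in names for the composite DM and the assembled Jacobian; error handling as usual):

  IS  *is;
  Mat  Jbb, Jbh, Jhb, Jhh;

  ierr = DMCompositeGetLocalISs(pack, &is);CHKERRQ(ierr);   /* is[0]: loading coefficient, is[1]: displacement DA */
  ierr = MatGetLocalSubmatrix(J, is[0], is[0], &Jbb);CHKERRQ(ierr);
  ierr = MatGetLocalSubmatrix(J, is[0], is[1], &Jbh);CHKERRQ(ierr);
  ierr = MatGetLocalSubmatrix(J, is[1], is[0], &Jhb);CHKERRQ(ierr);
  ierr = MatGetLocalSubmatrix(J, is[1], is[1], &Jhh);CHKERRQ(ierr);
  /* ... fill the blocks with MatSetValuesLocal() ... */
  ierr = MatRestoreLocalSubmatrix(J, is[1], is[1], &Jhh);CHKERRQ(ierr);
  ierr = MatRestoreLocalSubmatrix(J, is[1], is[0], &Jhb);CHKERRQ(ierr);
  ierr = MatRestoreLocalSubmatrix(J, is[0], is[1], &Jbh);CHKERRQ(ierr);
  ierr = MatRestoreLocalSubmatrix(J, is[0], is[0], &Jbb);CHKERRQ(ierr);
  ierr = ISDestroy(&is[0]);CHKERRQ(ierr);
  ierr = ISDestroy(&is[1]);CHKERRQ(ierr);
  ierr = PetscFree(is);CHKERRQ(ierr);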
URL: From yann.jobic at univ-amu.fr Thu Nov 9 12:20:18 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Thu, 9 Nov 2017 19:20:18 +0100 Subject: [petsc-users] DMPlexVecGetClosure for DM forest Message-ID: Hello, I'm trying to access to the values of a p4est forest. I know how to do that by converting my forest to a DMPlex, and then use DMPlexVecGetClosure over the converted DM. However, i want to assign a label and access to the values of the forest directly. Is it possible ? Thanks, Yann --- L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e par le logiciel antivirus Avast. https://www.avast.com/antivirus From knepley at gmail.com Thu Nov 9 12:22:20 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 9 Nov 2017 13:22:20 -0500 Subject: [petsc-users] DMPlexVecGetClosure for DM forest In-Reply-To: References: Message-ID: On Thu, Nov 9, 2017 at 1:20 PM, Yann Jobic wrote: > Hello, > I'm trying to access to the values of a p4est forest. > I know how to do that by converting my forest to a DMPlex, and then use > DMPlexVecGetClosure over the converted DM. > However, i want to assign a label and access to the values of the forest > directly. > What do you mean by "directly". p4est only has topology. We use a Section to map points to values, just like Plex. Thanks, Matt > Is it possible ? > Thanks, > Yann > > > --- > L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e par le > logiciel antivirus Avast. > https://www.avast.com/antivirus > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Nov 9 12:56:40 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 9 Nov 2017 12:56:40 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark: > OK, well, just go with the Linux machine for the regression test. I will > keep trying to reproduce this on my Mac with an O build. > Valgrind error occurs on linux machines with g-build. I cannot merge this branch to maint until the bug is fixed. 
Hong > > On Wed, Nov 8, 2017 at 12:24 PM, Hong wrote: > >> mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 >> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >> >> ==30976== Invalid read of size 16 >> ==30976== at 0x8550946: dswap_k_NEHALEM (in >> /usr/lib/openblas-base/libblas.so.3) >> ==30976== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >> .so.3) >> ==30976== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) >> ==30976== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >> ==30976== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> ==30976== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> ==30976== by 0x41056E: main (ex56.c:395) >> ==30976== Address 0xdc650d0 is 52,480 bytes inside a block of size >> 52,488 alloc'd >> ==30976== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >> mcheck-amd64-linux.so) >> ==30976== by 0x51590F6: PetscMallocAlign (mal.c:39) >> ==30976== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >> ==30976== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> ==30976== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> ==30976== by 0x41056E: main (ex56.c:395) >> >> You can fix it on branch hzhang/fix-submat_samerowdist. >> >> Hong >> >> >> On Wed, Nov 8, 2017 at 11:01 AM, Mark Adams wrote: >> >>> >>> >>> On Wed, Nov 8, 2017 at 11:09 AM, Hong wrote: >>> >>>> Mark: >>>> >>>>> Hong, is >>>>> > 0-cells: 12 12 0 0 >>>>> > 1-cells: 20 20 0 0 >>>>> > 2-cells: 11 11 0 0 >>>>> > 3-cells: 2 2 0 0 >>>>> >>>>> from the old version? >>>>> >>>> In O-build on my macPro, I get the above. In g-build, I get >>>> >>>>> >>>>> 0-cells: 8 8 8 8 >>>>> 1-cells: 12 12 12 12 >>>>> 2-cells: 6 6 6 6 >>>>> 3-cells: 1 1 1 1 >>>>> >>>> I get this on linux machine. >>>> Do you know why? >>>> >>> >>> I can not reproduce your O output. I will look at it later. >>> >>> Valgrind is failing on me right now. I will look into it but can you >>> valgrind it? >>> >>> >>>> >>>> Hong >>>> >>>>> >>>>> On Tue, Nov 7, 2017 at 10:13 PM, Hong wrote: >>>>> >>>>>> Mark: >>>>>> I removed option '-ex56_dm_view'. >>>>>> Hong >>>>>> >>>>>> Humm, this looks a little odd, but it may be OK. Is this this >>>>>>> diffing with the old non-repartition data? (more below) >>>>>>> >>>>>>> On Tue, Nov 7, 2017 at 11:45 AM, Hong wrote: >>>>>>> >>>>>>>> Mark, >>>>>>>> The fix is merged to next branch for tests which show diff as >>>>>>>> >>>>>>>> ******* Testing: testexamples_PARMETIS ******* >>>>>>>> 5c5 >>>>>>>> < 1 SNES Function norm 1.983e-10 >>>>>>>> --- >>>>>>>> > 1 SNES Function norm 1.990e-10 >>>>>>>> 10,13c10,13 >>>>>>>> < 0-cells: 8 8 8 8 >>>>>>>> < 1-cells: 12 12 12 12 >>>>>>>> < 2-cells: 6 6 6 6 >>>>>>>> < 3-cells: 1 1 1 1 >>>>>>>> >>>>>>>> >>>>>>> I assume this is the old. >>>>>>> >>>>>>> >>>>>>>> --- >>>>>>>> > 0-cells: 12 12 0 0 >>>>>>>> > 1-cells: 20 20 0 0 >>>>>>>> > 2-cells: 11 11 0 0 >>>>>>>> > 3-cells: 2 2 0 0 >>>>>>>> 15,18c15,18 >>>>>>>> >>>>>>>> >>>>>>> and this is the new. 
>>>>>>> >>>>>>> This is funny because the processors are not fully populated. This >>>>>>> can happen on coarse grids and indeed it should happen in a test with good >>>>>>> coverage. >>>>>>> >>>>>>> I assume these diffs are views from coarse grids? That is, in the >>>>>>> raw output files do you see fully populated fine grids, with no diffs, and >>>>>>> then the diffs come on coarse grids. >>>>>>> >>>>>>> Repartitioning the coarse grids can change the coarsening, It is >>>>>>> possible that repartitioning causes faster coarsening (it does a little) >>>>>>> and this faster coarsening is tripping the aggregation switch, which gives >>>>>>> us empty processors. >>>>>>> >>>>>>> Am I understanding this correctly ... >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> >>>>>>>> < boundary: 1 strata with value/size (1 (23)) >>>>>>>> < Face Sets: 4 strata with value/size (1 (1), 2 (1), 4 (1), 6 (1)) >>>>>>>> < marker: 1 strata with value/size (1 (15)) >>>>>>>> < depth: 4 strata with value/size (0 (8), 1 (12), 2 (6), 3 (1)) >>>>>>>> --- >>>>>>>> > boundary: 1 strata with value/size (1 (39)) >>>>>>>> > Face Sets: 5 strata with value/size (1 (2), 2 (2), 4 (2), 5 (1), 6 (1)) >>>>>>>> > marker: 1 strata with value/size (1 (27)) >>>>>>>> > depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), 3 (2)) >>>>>>>> >>>>>>>> see http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2017/11/07/examples_full_next-tmp.log >>>>>>>> >>>>>>>> I guess parmetis produces random partition on different machines (I made output file for ex56_1 on my imac). Please take a look at the differences. If the outputs are correct, I will remove option '-ex56_dm_view' >>>>>>>> >>>>>>>> Hong >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Nov 5, 2017 at 9:03 PM, Hong wrote: >>>>>>>> >>>>>>>>> Mark: >>>>>>>>> Bug is fixed in branch hzhang/fix-submat_samerowdist >>>>>>>>> https://bitbucket.org/petsc/petsc/branch/hzhang/fix-submat_s >>>>>>>>> amerowdist >>>>>>>>> >>>>>>>>> I also add the test runex56. Please test it and let me know if >>>>>>>>> there is a problem. >>>>>>>>> Hong >>>>>>>>> >>>>>>>>> Also, I have been using -petscpartition_type but now I see >>>>>>>>>> -pc_gamg_mat_partitioning_type. Is -petscpartition_type >>>>>>>>>> depreciated for GAMG? >>>>>>>>>> >>>>>>>>>> Is this some sort of auto generated portmanteau? I can not find >>>>>>>>>> pc_gamg_mat_partitioning_type in the source. >>>>>>>>>> >>>>>>>>>> On Thu, Nov 2, 2017 at 6:44 PM, Mark Adams >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Great, thanks, >>>>>>>>>>> >>>>>>>>>>> And could you please add these parameters to a regression test? >>>>>>>>>>> As I recall we have with-parmetis regression test. >>>>>>>>>>> >>>>>>>>>>> On Thu, Nov 2, 2017 at 6:35 PM, Hong wrote: >>>>>>>>>>> >>>>>>>>>>>> Mark: >>>>>>>>>>>> I used petsc/src/ksp/ksp/examples/tutorials/ex56.c :-( >>>>>>>>>>>> Now testing src/snes/examples/tutorials/ex56.c with your >>>>>>>>>>>> options, I can reproduce the error. >>>>>>>>>>>> I'll fix it. >>>>>>>>>>>> >>>>>>>>>>>> Hong >>>>>>>>>>>> >>>>>>>>>>>> Hong, >>>>>>>>>>>>> >>>>>>>>>>>>> I've tested with master and I get the same error. Maybe the >>>>>>>>>>>>> partitioning parameters are wrong. -pc_gamg_mat_partitioning_type is new to >>>>>>>>>>>>> me. >>>>>>>>>>>>> >>>>>>>>>>>>> Can you run this (snes ex56) w/o the error? 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 17:33 *master* *= ~*/Codes/petsc/src/snes/examples/tutorials*$ >>>>>>>>>>>>> make runex >>>>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -n >>>>>>>>>>>>> 4 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>>>> -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis >>>>>>>>>>>>> -mat_block_size 3 -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 >>>>>>>>>>>>> -pc_gamg_repartition true >>>>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>>>> >>>>>>>>>>>>> depth: 4 strata with value/size (0 (27), 1 (54), 2 (36), 3 >>>>>>>>>>>>> (8)) >>>>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>> -------------------------------------------------------------- >>>>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this object >>>>>>>>>>>>> type >>>>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Nov 2, 2017 at 1:35 PM, Hong >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Mark : >>>>>>>>>>>>>> I realize that using maint or master branch, I cannot >>>>>>>>>>>>>> reproduce the same error. >>>>>>>>>>>>>> For this example, you must use a parallel partitioner, >>>>>>>>>>>>>> e.g.,'current' gives me following error: >>>>>>>>>>>>>> [0]PETSC ERROR: This is the DEFAULT NO-OP partitioner, it >>>>>>>>>>>>>> currently only supports one domain per processor >>>>>>>>>>>>>> use -pc_gamg_mat_partitioning_type parmetis or chaco or >>>>>>>>>>>>>> ptscotch for more than one subdomain per processor >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please rebase your branch with maint or master, then see if >>>>>>>>>>>>>> you still have problem. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hong >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 11:07 AM, Hong >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Mark, >>>>>>>>>>>>>>>> I can reproduce this in an old branch, but not in current >>>>>>>>>>>>>>>> maint and master. >>>>>>>>>>>>>>>> Which branch are you using to produce this error? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am using a branch from Matt. Let me try to merge it with >>>>>>>>>>>>>>> master. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:28 AM, Mark Adams >>>>>>>>>>>>>>> > wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I am able to reproduce this with snes ex56 with 2 >>>>>>>>>>>>>>>>> processors and adding -pc_gamg_repartition true >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm not sure how to fix it. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 10:26 1 knepley/feature-plex-boxmesh-create *= >>>>>>>>>>>>>>>>> ~/Codes/petsc/src/snes/examples/tutorials$ make >>>>>>>>>>>>>>>>> PETSC_DIR=/Users/markadams/Codes/petsc >>>>>>>>>>>>>>>>> PETSC_ARCH=arch-macosx-gnu-g runex >>>>>>>>>>>>>>>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec >>>>>>>>>>>>>>>>> -n 2 ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it >>>>>>>>>>>>>>>>> 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>>>>>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>>>>>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>>>>>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>>>>>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason >>>>>>>>>>>>>>>>> -snes_monitor_short -ksp_monitor_short -snes_converged_reason >>>>>>>>>>>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>>>>>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>>>>>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>>>>>>>>> -mg_levels_pc_type jacobi -petscpartitioner_type simple -mat_block_size 3 >>>>>>>>>>>>>>>>> -matrap 0 -matptap_scalable -ex56_dm_view -run_type 1 -pc_gamg_repartition >>>>>>>>>>>>>>>>> true >>>>>>>>>>>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>>>>>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>>>>>>>>>>> 0 SNES Function norm 122.396 >>>>>>>>>>>>>>>>> 0 KSP Residual norm 122.396 >>>>>>>>>>>>>>>>> 1 KSP Residual norm 20.4696 >>>>>>>>>>>>>>>>> 2 KSP Residual norm 3.95009 >>>>>>>>>>>>>>>>> 3 KSP Residual norm 0.176181 >>>>>>>>>>>>>>>>> 4 KSP Residual norm 0.0208781 >>>>>>>>>>>>>>>>> 5 KSP Residual norm 0.00278873 >>>>>>>>>>>>>>>>> 6 KSP Residual norm 0.000482741 >>>>>>>>>>>>>>>>> 7 KSP Residual norm 4.68085e-05 >>>>>>>>>>>>>>>>> 8 KSP Residual norm 5.42381e-06 >>>>>>>>>>>>>>>>> 9 KSP Residual norm 5.12785e-07 >>>>>>>>>>>>>>>>> 10 KSP Residual norm 2.60389e-08 >>>>>>>>>>>>>>>>> 11 KSP Residual norm 4.96201e-09 >>>>>>>>>>>>>>>>> 12 KSP Residual norm 1.989e-10 >>>>>>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations >>>>>>>>>>>>>>>>> 12 >>>>>>>>>>>>>>>>> 1 SNES Function norm 1.990e-10 >>>>>>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>>>>>> type: plex >>>>>>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>>>>>> 0-cells: 12 12 >>>>>>>>>>>>>>>>> 1-cells: 20 20 >>>>>>>>>>>>>>>>> 2-cells: 11 11 >>>>>>>>>>>>>>>>> 3-cells: 2 2 >>>>>>>>>>>>>>>>> Labels: >>>>>>>>>>>>>>>>> boundary: 1 strata with value/size (1 (39)) >>>>>>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (2), 2 (2), 3 >>>>>>>>>>>>>>>>> (2), 5 (1), 6 (1)) >>>>>>>>>>>>>>>>> marker: 1 strata with value/size (1 (27)) >>>>>>>>>>>>>>>>> depth: 4 strata with value/size (0 (12), 1 (20), 2 (11), >>>>>>>>>>>>>>>>> 3 (2)) >>>>>>>>>>>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>>>>>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>>>>>>>>>>> 0 SNES 
Function norm 49.7106 >>>>>>>>>>>>>>>>> 0 KSP Residual norm 49.7106 >>>>>>>>>>>>>>>>> 1 KSP Residual norm 12.9252 >>>>>>>>>>>>>>>>> 2 KSP Residual norm 2.38019 >>>>>>>>>>>>>>>>> 3 KSP Residual norm 0.426307 >>>>>>>>>>>>>>>>> 4 KSP Residual norm 0.0692155 >>>>>>>>>>>>>>>>> 5 KSP Residual norm 0.0123092 >>>>>>>>>>>>>>>>> 6 KSP Residual norm 0.00184874 >>>>>>>>>>>>>>>>> 7 KSP Residual norm 0.000320761 >>>>>>>>>>>>>>>>> 8 KSP Residual norm 5.48957e-05 >>>>>>>>>>>>>>>>> 9 KSP Residual norm 9.90089e-06 >>>>>>>>>>>>>>>>> 10 KSP Residual norm 1.5127e-06 >>>>>>>>>>>>>>>>> 11 KSP Residual norm 2.82192e-07 >>>>>>>>>>>>>>>>> 12 KSP Residual norm 4.62364e-08 >>>>>>>>>>>>>>>>> 13 KSP Residual norm 7.99573e-09 >>>>>>>>>>>>>>>>> 14 KSP Residual norm 1.3028e-09 >>>>>>>>>>>>>>>>> 15 KSP Residual norm 2.174e-10 >>>>>>>>>>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations >>>>>>>>>>>>>>>>> 15 >>>>>>>>>>>>>>>>> 1 SNES Function norm 2.174e-10 >>>>>>>>>>>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>>>>>>>>>>> iterations 1 >>>>>>>>>>>>>>>>> DM Object: Mesh (ex56_) 2 MPI processes >>>>>>>>>>>>>>>>> type: plex >>>>>>>>>>>>>>>>> Mesh in 3 dimensions: >>>>>>>>>>>>>>>>> 0-cells: 45 45 >>>>>>>>>>>>>>>>> 1-cells: 96 96 >>>>>>>>>>>>>>>>> 2-cells: 68 68 >>>>>>>>>>>>>>>>> 3-cells: 16 16 >>>>>>>>>>>>>>>>> Labels: >>>>>>>>>>>>>>>>> marker: 1 strata with value/size (1 (129)) >>>>>>>>>>>>>>>>> Face Sets: 5 strata with value/size (1 (18), 2 (18), 3 >>>>>>>>>>>>>>>>> (18), 5 (9), 6 (9)) >>>>>>>>>>>>>>>>> boundary: 1 strata with value/size (1 (141)) >>>>>>>>>>>>>>>>> depth: 4 strata with value/size (0 (45), 1 (96), 2 (68), >>>>>>>>>>>>>>>>> 3 (16)) >>>>>>>>>>>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>>>>>>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>>>>>>>>>>>> 0 SNES Function norm 17.9091 >>>>>>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>> [0]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>>> [0]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>> [1]PETSC ERROR: No support for this operation for this >>>>>>>>>>>>>>>>> object type >>>>>>>>>>>>>>>>> [1]PETSC ERROR: unsorted iscol_local is not implemented yet >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Nov 2, 2017 at 9:51 AM, Mark Adams < >>>>>>>>>>>>>>>>> mfadams at lbl.gov> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 9:36 PM, Randy Michael Churchill < >>>>>>>>>>>>>>>>>> rchurchi at pppl.gov> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Doing some additional testing, the issue goes away when >>>>>>>>>>>>>>>>>>> removing the gamg preconditioner line from the petsc.rc: >>>>>>>>>>>>>>>>>>> -pc_type gamg >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yea, this is GAMG setup. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This is the code. findices is create with >>>>>>>>>>>>>>>>>> ISCreateStride, so it is sorted ... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Michael is repartitioning the coarse grids. Maybe we >>>>>>>>>>>>>>>>>> don't have a regression test with this... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I will try to reproduce this. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Michael: you can use hypre for now, or turn >>>>>>>>>>>>>>>>>> repartitioning off (eg, -fsa_fieldsplit_lambda_upper_pc_gamg_repartition >>>>>>>>>>>>>>>>>> false), but I'm not sure this will fix this. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> You don't have hypre parameters for all of your all of >>>>>>>>>>>>>>>>>> your solvers. I think 'boomeramg' is the default pc_hypre_type. That should >>>>>>>>>>>>>>>>>> be good enough for you. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> IS findices; >>>>>>>>>>>>>>>>>> PetscInt Istart,Iend; >>>>>>>>>>>>>>>>>> Mat Pnew; >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ierr = MatGetOwnershipRange(Pold, &Istart, >>>>>>>>>>>>>>>>>> &Iend);CHKERRQ(ierr); >>>>>>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>>>>>> ierr = PetscLogEventBegin(petsc_gamg_ >>>>>>>>>>>>>>>>>> setup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>>>> ierr = ISCreateStride(comm,Iend-Istar >>>>>>>>>>>>>>>>>> t,Istart,1,&findices);CHKERRQ(ierr); >>>>>>>>>>>>>>>>>> ierr = ISSetBlockSize(findices,f_bs);CHKERRQ(ierr); >>>>>>>>>>>>>>>>>> ierr = MatCreateSubMatrix(Pold, findices, >>>>>>>>>>>>>>>>>> new_eq_indices, MAT_INITIAL_MATRIX, &Pnew);CHKERRQ(ierr); >>>>>>>>>>>>>>>>>> ierr = ISDestroy(&findices);CHKERRQ(ierr); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> #if defined PETSC_GAMG_USE_LOG >>>>>>>>>>>>>>>>>> ierr = PetscLogEventEnd(petsc_gamg_se >>>>>>>>>>>>>>>>>> tup_events[SET15],0,0,0,0);CHKERRQ(ierr); >>>>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>>>> ierr = MatDestroy(a_P_inout);CHKERRQ(ierr); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /* output - repartitioned */ >>>>>>>>>>>>>>>>>> *a_P_inout = Pnew; >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Nov 1, 2017 at 8:23 PM, Hong >>>>>>>>>>>>>>>>>> > wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>>>>> Thanks, I'll check it tomorrow. >>>>>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> OK, this might not be completely satisfactory, because >>>>>>>>>>>>>>>>>>>>> it doesn't show the partitioning or how the matrix is created, but this >>>>>>>>>>>>>>>>>>>>> reproduces the problem. I wrote out my matrix, Amat, from the larger >>>>>>>>>>>>>>>>>>>>> simulation, and load it in this script. This must be run with MPI rank >>>>>>>>>>>>>>>>>>>>> greater than 1. This may be some combination of my petsc.rc, because when I >>>>>>>>>>>>>>>>>>>>> use the PetscInitialize with it, it throws the error, but when using >>>>>>>>>>>>>>>>>>>>> default (PETSC_NULL_CHARACTER) it runs fine. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 9:58 AM, Hong < >>>>>>>>>>>>>>>>>>>>> hzhang at mcs.anl.gov> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Randy: >>>>>>>>>>>>>>>>>>>>>> It could be a bug or a missing feature in our new >>>>>>>>>>>>>>>>>>>>>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>>>>>>>>>>>>>>>>>>>>> It would be helpful if you can provide us a simple >>>>>>>>>>>>>>>>>>>>>> example that produces this example. >>>>>>>>>>>>>>>>>>>>>> Hong >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I'm running a Fortran code that was just changed over >>>>>>>>>>>>>>>>>>>>>>> to using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a >>>>>>>>>>>>>>>>>>>>>>> KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" >>>>>>>>>>>>>>>>>>>>>>> (see full error below). 
I tried to trace down the difference in the source >>>>>>>>>>>>>>>>>>>>>>> files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) >>>>>>>>>>>>>>>>>>>>>>> doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It >>>>>>>>>>>>>>>>>>>>>>> seems the error is that the order of the columns locally are unsorted, >>>>>>>>>>>>>>>>>>>>>>> though I don't think I specify a column order in the creation of the matrix: >>>>>>>>>>>>>>>>>>>>>>> call MatCreate(this%comm,AA,ierr) >>>>>>>>>>>>>>>>>>>>>>> call MatSetSizes(AA,npetscloc,npets >>>>>>>>>>>>>>>>>>>>>>> cloc,nreal,nreal,ierr) >>>>>>>>>>>>>>>>>>>>>>> call MatSetType(AA,MATAIJ,ierr) >>>>>>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>>>>>> call MatGetOwnershipRange(AA,low,high,ierr) >>>>>>>>>>>>>>>>>>>>>>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>>>>>>>>>>>>>>>>>>>>>> call getNNZ(grid,npetscloc,low,high >>>>>>>>>>>>>>>>>>>>>>> ,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr) >>>>>>>>>>>>>>>>>>>>>>> call MatSeqAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,ierr) >>>>>>>>>>>>>>>>>>>>>>> call MatMPIAIJSetPreallocation(AA,P >>>>>>>>>>>>>>>>>>>>>>> ETSC_NULL_INTEGER,d_nnz,PETSC_ >>>>>>>>>>>>>>>>>>>>>>> NULL_INTEGER,o_nnz,ierr) >>>>>>>>>>>>>>>>>>>>>>> deallocate(d_nnz,o_nnz) >>>>>>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_IGNORE_OFF >>>>>>>>>>>>>>>>>>>>>>> _PROC_ENTRIES,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>>>>>> call MatSetOption(AA,MAT_KEEP_NONZE >>>>>>>>>>>>>>>>>>>>>>> RO_PATTERN,PETSC_TRUE,ierr) >>>>>>>>>>>>>>>>>>>>>>> call MatSetup(AA,ierr) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: --------------------- Error Message >>>>>>>>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>>>>>>>> -------------------------------- >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: No support for this operation for >>>>>>>>>>>>>>>>>>>>>>> this object type >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: unsorted iscol_local is not >>>>>>>>>>>>>>>>>>>>>>> implemented yet >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>>>>>>>>>>>>>>>>>>>> ocumentation/faq.html for trouble shooting. >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, >>>>>>>>>>>>>>>>>>>>>>> unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() >>>>>>>>>>>>>>>>>>>>>>> line 3418 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line >>>>>>>>>>>>>>>>>>>>>>> 3247 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>>> 8.0/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 >>>>>>>>>>>>>>>>>>>>>>> in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line >>>>>>>>>>>>>>>>>>>>>>> 383 in /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>>>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in >>>>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. >>>>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/pc/interface/precon.c >>>>>>>>>>>>>>>>>>>>>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in >>>>>>>>>>>>>>>>>>>>>>> /global/u1/r/rchurchi/petsc/3. 
>>>>>>>>>>>>>>>>>>>>>>> 8.0/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> R. Michael Churchill >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dnolte at dim.uchile.cl Thu Nov 9 13:19:38 2017 From: dnolte at dim.uchile.cl (David Nolte) Date: Thu, 9 Nov 2017 16:19:38 -0300 Subject: [petsc-users] GAMG advice In-Reply-To: References: <47a47b6b-ce8c-10f6-0ded-bf87e9af1bbd@dim.uchile.cl> <991cd7c4-bb92-ed2c-193d-7232c1ff6199@dim.uchile.cl> <6169118C-34FE-491C-BCB4-A86BECCFBAA9@mcs.anl.gov> <0779aa51-17c8-0ef3-fd01-1413ee1225ea@dim.uchile.cl> Message-ID: <2e817bc2-132d-87c6-c521-43149ce3cdcc@dim.uchile.cl> Hi Mark, thanks for clarifying. When I wrote the initial question I had somehow overlooked the fact that the GAMG standard smoother was Chebychev while ML uses SOR. All the other comments concerning threshold etc were based on this mistake. The following settings work quite well, of course LU is used on the coarse level. ??? -pc_type gamg ??? -pc_gamg_type agg ??? -pc_gamg_threshold 0.03 ??? -pc_gamg_square_graph 10??? ??? # no effect ? ??? -pc_gamg_sym_graph ??? -mg_levels_ksp_type richardson ??? -mg_levels_pc_type sor -pc_gamg_agg_nsmooths 0 does not seem to improve the convergence. The ksp view now looks like this: (does this seem reasonable?) KSP Object: 4 MPI processes ? type: fgmres ??? GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement ??? GMRES: happy breakdown tolerance 1e-30 ? maximum iterations=10000 ? tolerances:? relative=1e-06, absolute=1e-50, divergence=10000. ? right preconditioning ? using nonzero initial guess ? using UNPRECONDITIONED norm type for convergence test PC Object: 4 MPI processes ? type: gamg ??? MG: type is MULTIPLICATIVE, levels=5 cycles=v ????? Cycles per PCApply=1 ????? Using Galerkin computed coarse grid matrices ????? GAMG specific options ??????? Threshold for dropping small values from graph 0.03 ??????? AGG specific options ????????? Symmetric graph true ? Coarse grid solver -- level ------------------------------- ??? KSP Object:??? (mg_coarse_)???? 4 MPI processes ????? type: preonly ????? maximum iterations=10000, initial guess is zero ????? tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. ????? left preconditioning ????? using NONE norm type for convergence test ??? PC Object:??? (mg_coarse_)???? 4 MPI processes ????? type: bjacobi ??????? block Jacobi: number of blocks = 4 ??????? Local solve is same for all blocks, in the following KSP and PC objects: ????? KSP Object:????? (mg_coarse_sub_)?????? 1 MPI processes ??????? type: preonly ??????? maximum iterations=1, initial guess is zero ??????? tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. ??????? left preconditioning ??????? using NONE norm type for convergence test ????? PC Object:????? (mg_coarse_sub_)?????? 1 MPI processes ??????? type: lu ????????? 
LU: out-of-place factorization ????????? tolerance for zero pivot 2.22045e-14 ????????? using diagonal shift on blocks to prevent zero pivot [INBLOCKS] ????????? matrix ordering: nd ????????? factor fill ratio given 5., needed 1. ??????????? Factored matrix follows: ????????????? Mat Object:?????????????? 1 MPI processes ??????????????? type: seqaij ??????????????? rows=38, cols=38 ??????????????? package used to perform factorization: petsc ??????????????? total: nonzeros=1444, allocated nonzeros=1444 ??????????????? total number of mallocs used during MatSetValues calls =0 ????????????????? using I-node routines: found 8 nodes, limit used is 5 ??????? linear system matrix = precond matrix: ??????? Mat Object:???????? 1 MPI processes ????????? type: seqaij ????????? rows=38, cols=38 ????????? total: nonzeros=1444, allocated nonzeros=1444 ????????? total number of mallocs used during MatSetValues calls =0 ??????????? using I-node routines: found 8 nodes, limit used is 5 ????? linear system matrix = precond matrix: ????? Mat Object:?????? 4 MPI processes ??????? type: mpiaij ??????? rows=38, cols=38 ??????? total: nonzeros=1444, allocated nonzeros=1444 ??????? total number of mallocs used during MatSetValues calls =0 ????????? using I-node (on process 0) routines: found 8 nodes, limit used is 5 ? Down solver (pre-smoother) on level 1 ------------------------------- ??? KSP Object:??? (mg_levels_1_)???? 4 MPI processes ????? type: richardson ??????? Richardson: damping factor=1. ????? maximum iterations=2 ????? tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. ????? left preconditioning ????? using nonzero initial guess ????? using NONE norm type for convergence test ??? PC Object:??? (mg_levels_1_)???? 4 MPI processes ????? type: sor ??????? SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. ????? linear system matrix = precond matrix: ????? Mat Object:?????? 4 MPI processes ??????? type: mpiaij ??????? rows=168, cols=168 ??????? total: nonzeros=19874, allocated nonzeros=19874 ??????? total number of mallocs used during MatSetValues calls =0 ????????? using I-node (on process 0) routines: found 17 nodes, limit used is 5 ? Up solver (post-smoother) same as down solver (pre-smoother) ? Down solver (pre-smoother) on level 2 ------------------------------- ??? KSP Object:??? (mg_levels_2_)???? 4 MPI processes ????? type: richardson ??????? Richardson: damping factor=1. ????? maximum iterations=2 ????? tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. ????? left preconditioning ????? using nonzero initial guess ????? using NONE norm type for convergence test ??? PC Object:??? (mg_levels_2_)???? 4 MPI processes ????? type: sor ??????? SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. ????? linear system matrix = precond matrix: ????? Mat Object:?????? 4 MPI processes ??????? type: mpiaij ??????? rows=3572, cols=3572 ??????? total: nonzeros=963872, allocated nonzeros=963872 ??????? total number of mallocs used during MatSetValues calls =0 ????????? not using I-node (on process 0) routines ? Up solver (post-smoother) same as down solver (pre-smoother) ? Down solver (pre-smoother) on level 3 ------------------------------- ??? KSP Object:??? (mg_levels_3_)???? 4 MPI processes ????? type: richardson ??????? Richardson: damping factor=1. ????? maximum iterations=2 ????? tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. ????? left preconditioning ????? using nonzero initial guess ????? 
using NONE norm type for convergence test ??? PC Object:??? (mg_levels_3_)???? 4 MPI processes ????? type: sor ??????? SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. ????? linear system matrix = precond matrix: ????? Mat Object:?????? 4 MPI processes ??????? type: mpiaij ??????? rows=21179, cols=21179 ??????? total: nonzeros=1060605, allocated nonzeros=1060605 ??????? total number of mallocs used during MatSetValues calls =0 ????????? not using I-node (on process 0) routines ? Up solver (post-smoother) same as down solver (pre-smoother) ? Down solver (pre-smoother) on level 4 ------------------------------- ??? KSP Object:??? (mg_levels_4_)???? 4 MPI processes ????? type: richardson ??????? Richardson: damping factor=1. ????? maximum iterations=2 ????? tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. ????? left preconditioning ????? using nonzero initial guess ????? using NONE norm type for convergence test ??? PC Object:??? (mg_levels_4_)???? 4 MPI processes ????? type: sor ??????? SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. ????? linear system matrix = precond matrix: ????? Mat Object:?????? 4 MPI processes ??????? type: mpiaij ??????? rows=1745224, cols=1745224 ??????? total: nonzeros=99452608, allocated nonzeros=99452608 ??????? total number of mallocs used during MatSetValues calls =0 ????????? using I-node (on process 0) routines: found 254433 nodes, limit used is 5 ? Up solver (post-smoother) same as down solver (pre-smoother) ? linear system matrix = precond matrix: ? Mat Object:?? 4 MPI processes ??? type: mpiaij ??? rows=1745224, cols=1745224 ??? total: nonzeros=99452608, allocated nonzeros=99452608 ??? total number of mallocs used during MatSetValues calls =0 ????? using I-node (on process 0) routines: found 254433 nodes, limit used is 5 Thanks, David On 11/08/2017 10:11 PM, Mark Adams wrote: > > > On Wed, Nov 1, 2017 at 5:45 PM, David Nolte > wrote: > > Thanks Barry. > By simply replacing chebychev by richardson I get similar performance > with GAMG and ML > > > That too (I assumed you were using the same, I could not see cheby in > your view data). > > I guess SOR works for the coarse grid solver because the coarse grid > is small. It should help using lu. > ? > > (GAMG even slightly faster): > > > This is "random" fluctuations. > ? > > > -pc_type > gamg??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? > > -pc_gamg_type > agg??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? > > -pc_gamg_threshold > 0.03????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? > > -pc_gamg_square_graph 10 > -pc_gamg_sym_graph > -mg_levels_ksp_type > richardson?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? > > -mg_levels_pc_type sor > > Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix > is asymmetric? > > > yes, > ? > > For serial runs it doesn't seem to matter, > > > yes, > ? > > but in > parallel the PC setup hangs (after calls of > PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set. > > > yep, > ? > > > David > > > On 10/21/2017 12:10 AM, Barry Smith wrote: > >? ?David, > > > >? ? 
GAMG picks the number of levels based on how the coarsening > process etc proceeds. You cannot hardwire it to a particular > value. You can run with -info to get more info potentially on the > decisions GAMG is making. > > > >? ?Barry > > > >> On Oct 20, 2017, at 2:06 PM, David Nolte > wrote: > >> > >> PS: I didn't realize at first, it looks as if the -pc_mg_levels > 3 option > >> was not taken into account: > >> type: gamg > >>? ? ?MG: type is MULTIPLICATIVE, levels=1 cycles=v > >> > >> > >> > >> On 10/20/2017 03:32 PM, David Nolte wrote: > >>> Dear all, > >>> > >>> I have some problems using GAMG as a preconditioner for (F)GMRES. > >>> Background: I am solving the incompressible, unsteady > Navier-Stokes > >>> equations with a coupled mixed FEM approach, using P1/P1 > elements for > >>> velocity and pressure on an unstructured tetrahedron mesh with > about > >>> 2mio DOFs (and up to 15mio). The method is stabilized with > SUPG/PSPG, > >>> hence, no zeros on the diagonal of the pressure block. Time > >>> discretization with semi-implicit backward Euler. The flow is a > >>> convection dominated flow through a nozzle. > >>> > >>> So far, for this setup, I have been quite happy with a simple > FGMRES/ML > >>> solver for the full system (rather bruteforce, I admit, but > much faster > >>> than any block/Schur preconditioners I tried): > >>> > >>>? ? ?-ksp_converged_reason > >>>? ? ?-ksp_monitor_true_residual > >>>? ? ?-ksp_type fgmres > >>>? ? ?-ksp_rtol 1.0e-6 > >>>? ? ?-ksp_initial_guess_nonzero > >>> > >>>? ? ?-pc_type ml > >>>? ? ?-pc_ml_Threshold 0.03 > >>>? ? ?-pc_ml_maxNlevels 3 > >>> > >>> This setup converges in ~100 iterations (see below the > ksp_view output) > >>> to rtol: > >>> > >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true > resid norm > >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 > >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true > resid norm > >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 > >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true > resid norm > >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 > >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true > resid norm > >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 > >>> > >>> > >>> Now I'd like to try GAMG instead of ML. However, I don't know > how to set > >>> it up to get similar performance. > >>> The obvious/naive > >>> > >>>? ? ?-pc_type gamg > >>>? ? ?-pc_gamg_type agg > >>> > >>> # with and without > >>>? ? ?-pc_gamg_threshold 0.03 > >>>? ? 
?-pc_mg_levels 3 > >>> > >>> converges very slowly on 1 proc and much worse on 8 (~200k > dofs per > >>> proc), for instance: > >>> np = 1: > >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true > resid norm > >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 > >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true > resid norm > >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 > >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true > resid norm > >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 > >>> > >>> np = 8: > >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true > resid norm > >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 > >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true > resid norm > >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true > resid norm > >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >>> > >>> A very high threshold seems to improve the GAMG PC, for > instance with > >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations. > >>> What else should I try? > >>> > >>> I would very much appreciate any advice on configuring GAMG and > >>> differences w.r.t ML to be taken into account (not a multigrid > expert > >>> though). > >>> > >>> Thanks, best wishes > >>> David > >>> > >>> > >>> ------ > >>> ksp_view for -pc_type gamg? ? ? -pc_gamg_threshold 0.75 > -pc_mg_levels 3 > >>> > >>> KSP Object: 1 MPI processes > >>>? ?type: fgmres > >>>? ? ?GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >>> Orthogonalization with no iterative refinement > >>>? ? ?GMRES: happy breakdown tolerance 1e-30 > >>>? ?maximum iterations=10000 > >>>? ?tolerances:? relative=1e-06, absolute=1e-50, divergence=10000. > >>>? ?right preconditioning > >>>? ?using nonzero initial guess > >>>? ?using UNPRECONDITIONED norm type for convergence test > >>> PC Object: 1 MPI processes > >>>? ?type: gamg > >>>? ? ?MG: type is MULTIPLICATIVE, levels=1 cycles=v > >>>? ? ? ?Cycles per PCApply=1 > >>>? ? ? ?Using Galerkin computed coarse grid matrices > >>>? ? ? ?GAMG specific options > >>>? ? ? ? ?Threshold for dropping small values from graph 0.75 > >>>? ? ? ? ?AGG specific options > >>>? ? ? ? ? ?Symmetric graph false > >>>? ?Coarse grid solver -- level ------------------------------- > >>>? ? ?KSP Object:? ? (mg_levels_0_)? ? ?1 MPI processes > >>>? ? ? ?type: preonly > >>>? ? ? ?maximum iterations=2, initial guess is zero > >>>? ? ? ?tolerances:? relative=1e-05, absolute=1e-50, > divergence=10000. > >>>? ? ? ?left preconditioning > >>>? ? ? ?using NONE norm type for convergence test > >>>? ? ?PC Object:? ? (mg_levels_0_)? ? ?1 MPI processes > >>>? ? ? ?type: sor > >>>? ? ? ? ?SOR: type = local_symmetric, iterations = 1, local > iterations = > >>> 1, omega = 1. > >>>? ? ? ?linear system matrix = precond matrix: > >>>? ? ? ?Mat Object:? ? ? ?1 MPI processes > >>>? ? ? ? ?type: seqaij > >>>? ? ? ? ?rows=1745224, cols=1745224 > >>>? ? ? ? ?total: nonzeros=99452608, allocated nonzeros=99452608 > >>>? ? ? ? ?total number of mallocs used during MatSetValues calls =0 > >>>? ? ? ? ? ?using I-node routines: found 1037847 nodes, limit > used is 5 > >>>? ?linear system matrix = precond matrix: > >>>? ?Mat Object:? ?1 MPI processes > >>>? ? ?type: seqaij > >>>? ? ?rows=1745224, cols=1745224 > >>>? ? ?total: nonzeros=99452608, allocated nonzeros=99452608 > >>>? ? ?total number of mallocs used during MatSetValues calls =0 > >>>? ? ? 
?using I-node routines: found 1037847 nodes, limit used is 5 > >>> > >>> > >>> ------ > >>> ksp_view for -pc_type ml: > >>> > >>> KSP Object: 8 MPI processes > >>>? ?type: fgmres > >>>? ? ?GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >>> Orthogonalization with no iterative refinement > >>>? ? ?GMRES: happy breakdown tolerance 1e-30 > >>>? ?maximum iterations=10000 > >>>? ?tolerances:? relative=1e-06, absolute=1e-50, divergence=10000. > >>>? ?right preconditioning > >>>? ?using nonzero initial guess > >>>? ?using UNPRECONDITIONED norm type for convergence test > >>> PC Object: 8 MPI processes > >>>? ?type: ml > >>>? ? ?MG: type is MULTIPLICATIVE, levels=3 cycles=v > >>>? ? ? ?Cycles per PCApply=1 > >>>? ? ? ?Using Galerkin computed coarse grid matrices > >>>? ?Coarse grid solver -- level ------------------------------- > >>>? ? ?KSP Object:? ? (mg_coarse_)? ? ?8 MPI processes > >>>? ? ? ?type: preonly > >>>? ? ? ?maximum iterations=10000, initial guess is zero > >>>? ? ? ?tolerances:? relative=1e-05, absolute=1e-50, > divergence=10000. > >>>? ? ? ?left preconditioning > >>>? ? ? ?using NONE norm type for convergence test > >>>? ? ?PC Object:? ? (mg_coarse_)? ? ?8 MPI processes > >>>? ? ? ?type: redundant > >>>? ? ? ? ?Redundant preconditioner: First (color=0) of 8 PCs follows > >>>? ? ? ? ?KSP Object:? ? ? ? (mg_coarse_redundant_)? ? ? ? ?1 > MPI processes > >>>? ? ? ? ? ?type: preonly > >>>? ? ? ? ? ?maximum iterations=10000, initial guess is zero > >>>? ? ? ? ? ?tolerances:? relative=1e-05, absolute=1e-50, > divergence=10000. > >>>? ? ? ? ? ?left preconditioning > >>>? ? ? ? ? ?using NONE norm type for convergence test > >>>? ? ? ? ?PC Object:? ? ? ? (mg_coarse_redundant_)? ? ? ? ?1 MPI > processes > >>>? ? ? ? ? ?type: lu > >>>? ? ? ? ? ? ?LU: out-of-place factorization > >>>? ? ? ? ? ? ?tolerance for zero pivot 2.22045e-14 > >>>? ? ? ? ? ? ?using diagonal shift on blocks to prevent zero > pivot [INBLOCKS] > >>>? ? ? ? ? ? ?matrix ordering: nd > >>>? ? ? ? ? ? ?factor fill ratio given 5., needed 10.4795 > >>>? ? ? ? ? ? ? ?Factored matrix follows: > >>>? ? ? ? ? ? ? ? ?Mat Object:? ? ? ? ? ? ? ? ?1 MPI processes > >>>? ? ? ? ? ? ? ? ? ?type: seqaij > >>>? ? ? ? ? ? ? ? ? ?rows=6822, cols=6822 > >>>? ? ? ? ? ? ? ? ? ?package used to perform factorization: petsc > >>>? ? ? ? ? ? ? ? ? ?total: nonzeros=9575688, allocated > nonzeros=9575688 > >>>? ? ? ? ? ? ? ? ? ?total number of mallocs used during > MatSetValues calls =0 > >>>? ? ? ? ? ? ? ? ? ? ?not using I-node routines > >>>? ? ? ? ? ?linear system matrix = precond matrix: > >>>? ? ? ? ? ?Mat Object:? ? ? ? ? ?1 MPI processes > >>>? ? ? ? ? ? ?type: seqaij > >>>? ? ? ? ? ? ?rows=6822, cols=6822 > >>>? ? ? ? ? ? ?total: nonzeros=913758, allocated nonzeros=913758 > >>>? ? ? ? ? ? ?total number of mallocs used during MatSetValues > calls =0 > >>>? ? ? ? ? ? ? ?not using I-node routines > >>>? ? ? ?linear system matrix = precond matrix: > >>>? ? ? ?Mat Object:? ? ? ?8 MPI processes > >>>? ? ? ? ?type: mpiaij > >>>? ? ? ? ?rows=6822, cols=6822 > >>>? ? ? ? ?total: nonzeros=913758, allocated nonzeros=913758 > >>>? ? ? ? ?total number of mallocs used during MatSetValues calls =0 > >>>? ? ? ? ? ?not using I-node (on process 0) routines > >>>? ?Down solver (pre-smoother) on level 1 > ------------------------------- > >>>? ? ?KSP Object:? ? (mg_levels_1_)? ? ?8 MPI processes > >>>? ? ? ?type: richardson > >>>? ? ? ? ?Richardson: damping factor=1. > >>>? ? ? ?maximum iterations=2 > >>>? ? ? ?tolerances:? 
relative=1e-05, absolute=1e-50, > divergence=10000. > >>>? ? ? ?left preconditioning > >>>? ? ? ?using nonzero initial guess > >>>? ? ? ?using NONE norm type for convergence test > >>>? ? ?PC Object:? ? (mg_levels_1_)? ? ?8 MPI processes > >>>? ? ? ?type: sor > >>>? ? ? ? ?SOR: type = local_symmetric, iterations = 1, local > iterations = > >>> 1, omega = 1. > >>>? ? ? ?linear system matrix = precond matrix: > >>>? ? ? ?Mat Object:? ? ? ?8 MPI processes > >>>? ? ? ? ?type: mpiaij > >>>? ? ? ? ?rows=67087, cols=67087 > >>>? ? ? ? ?total: nonzeros=9722749, allocated nonzeros=9722749 > >>>? ? ? ? ?total number of mallocs used during MatSetValues calls =0 > >>>? ? ? ? ? ?not using I-node (on process 0) routines > >>>? ?Up solver (post-smoother) same as down solver (pre-smoother) > >>>? ?Down solver (pre-smoother) on level 2 > ------------------------------- > >>>? ? ?KSP Object:? ? (mg_levels_2_)? ? ?8 MPI processes > >>>? ? ? ?type: richardson > >>>? ? ? ? ?Richardson: damping factor=1. > >>>? ? ? ?maximum iterations=2 > >>>? ? ? ?tolerances:? relative=1e-05, absolute=1e-50, > divergence=10000. > >>>? ? ? ?left preconditioning > >>>? ? ? ?using nonzero initial guess > >>>? ? ? ?using NONE norm type for convergence test > >>>? ? ?PC Object:? ? (mg_levels_2_)? ? ?8 MPI processes > >>>? ? ? ?type: sor > >>>? ? ? ? ?SOR: type = local_symmetric, iterations = 1, local > iterations = > >>> 1, omega = 1. > >>>? ? ? ?linear system matrix = precond matrix: > >>>? ? ? ?Mat Object:? ? ? ?8 MPI processes > >>>? ? ? ? ?type: mpiaij > >>>? ? ? ? ?rows=1745224, cols=1745224 > >>>? ? ? ? ?total: nonzeros=99452608, allocated nonzeros=99452608 > >>>? ? ? ? ?total number of mallocs used during MatSetValues calls =0 > >>>? ? ? ? ? ?using I-node (on process 0) routines: found 126690 > nodes, > >>> limit used is 5 > >>>? ?Up solver (post-smoother) same as down solver (pre-smoother) > >>>? ?linear system matrix = precond matrix: > >>>? ?Mat Object:? ?8 MPI processes > >>>? ? ?type: mpiaij > >>>? ? ?rows=1745224, cols=1745224 > >>>? ? ?total: nonzeros=99452608, allocated nonzeros=99452608 > >>>? ? ?total number of mallocs used during MatSetValues calls =0 > >>>? ? ? ?using I-node (on process 0) routines: found 126690 > nodes, limit > >>> used is 5 > >>> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From zakaryah at gmail.com Thu Nov 9 15:33:38 2017 From: zakaryah at gmail.com (zakaryah .) Date: Thu, 9 Nov 2017 16:33:38 -0500 Subject: [petsc-users] Newton LS - better results on single processor In-Reply-To: References: Message-ID: Hi Stefano - when I referred to the iterations, I was trying to point out that my method solves a series of nonlinear systems, with the solution to the first problem being used to initialize the state vector for the second problem, etc. The reason I mentioned that was I thought perhaps I can expect the residuals from single process solve to differ from the residuals from multiprocess solve by a very small amount, say machine precision or the tolerance of the KSP/SNES, that would be fine normally. But, if there is a possibility that those differences are somehow amplified by each of the iterations (solution->initial state), that could explain what I see. I agree that it is more likely that I have a bug in my code but I'm having trouble finding it. 
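[To make the solve chain described above concrete, here is a minimal sketch (an editor's illustration, not the actual code under discussion) of a sequence of nonlinear solves in which each SNESSolve() reuses the previous converged solution as its initial guess; any small serial/parallel difference in one step becomes the starting point of the next. The helper name SolveLoadSteps and the commented-out SetLoadCoefficient() are hypothetical.]

#include <petscsnes.h>

PetscErrorCode SolveLoadSteps(SNES snes, Vec x, PetscInt nsteps)
{
  PetscInt            step;
  SNESConvergedReason reason;
  PetscErrorCode      ierr;

  PetscFunctionBeginUser;
  for (step = 0; step < nsteps; ++step) {
    /* update the problem data for this step, e.g. the loading coefficient:
       ierr = SetLoadCoefficient(snes, step);CHKERRQ(ierr);   (hypothetical) */
    ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);  /* x is both initial guess and solution */
    ierr = SNESGetConvergedReason(snes, &reason);CHKERRQ(ierr);
    if (reason < 0) {
      ierr = PetscPrintf(PetscObjectComm((PetscObject)snes),
                         "Load step %D: nonlinear solve diverged (reason %d)\n",
                         step, (int)reason);CHKERRQ(ierr);
      break;
    }
  }
  PetscFunctionReturn(0);
}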
I ran a small problem with -pc_type redundant -redundant_pc_type lu, as you suggested. What I think is the relevant portion of the output is here (i.e. there are small differences in the KSP residuals and SNES residuals): -n 1, first "iteration" as described above: 0 SNES Function norm 6.053565720454e-02 0 KSP Residual norm 4.883115701982e-05 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 8.173640409069e-20 1 KSP preconditioned resid norm 8.173640409069e-20 true resid norm 1.742143029296e-16 ||r(i)||/||b|| 2.877879104227e-15 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 6.053565720454e-02 gnorm 2.735518570862e-07 1 SNES Function norm 2.735518570862e-07 0 KSP Residual norm 1.298536630766e-10 0 KSP preconditioned resid norm 1.298536630766e-10 true resid norm 2.735518570862e-07 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 2.152782096751e-25 1 KSP preconditioned resid norm 2.152782096751e-25 true resid norm 4.755555202641e-22 ||r(i)||/||b|| 1.738447420279e-15 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 2.735518570862e-07 gnorm 1.917989238989e-17 2 SNES Function norm 1.917989238989e-17 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 -n 2, first "iteration" as described above: 0 SNES Function norm 6.053565720454e-02 0 KSP Residual norm 4.883115701982e-05 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.007084240718e-19 1 KSP preconditioned resid norm 1.007084240718e-19 true resid norm 1.868472589717e-16 ||r(i)||/||b|| 3.086565300520e-15 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 6.053565720454e-02 gnorm 2.735518570379e-07 1 SNES Function norm 2.735518570379e-07 0 KSP Residual norm 1.298536630342e-10 0 KSP preconditioned resid norm 1.298536630342e-10 true resid norm 2.735518570379e-07 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.885083482938e-25 1 KSP preconditioned resid norm 1.885083482938e-25 true resid norm 4.735707460766e-22 ||r(i)||/||b|| 1.731191852267e-15 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 2.735518570379e-07 gnorm 1.851472273258e-17 2 SNES Function norm 1.851472273258e-17 -n 1, final "iteration": 0 SNES Function norm 9.695669610792e+01 0 KSP Residual norm 7.898912593878e-03 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.720960785852e-17 1 KSP preconditioned resid norm 1.720960785852e-17 true resid norm 1.237111121391e-13 ||r(i)||/||b|| 1.275941911237e-15 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 9.695669610792e+01 gnorm 1.026572731653e-01 1 SNES Function norm 1.026572731653e-01 0 KSP Residual norm 1.382450412926e-04 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm 1.026572731653e-01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 5.018078565710e-20 1 KSP preconditioned resid norm 5.018078565710e-20 true resid norm 9.031463071676e-17 ||r(i)||/||b|| 8.797684560673e-16 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 1.026572731653e-01 gnorm 7.982937980399e-06 2 SNES Function norm 7.982937980399e-06 0 KSP Residual norm 4.223898196692e-08 0 KSP 
preconditioned resid norm 4.223898196692e-08 true resid norm 7.982937980399e-06 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.038123933240e-22 1 KSP preconditioned resid norm 1.038123933240e-22 true resid norm 3.213931469966e-20 ||r(i)||/||b|| 4.026000800530e-15 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 7.982937980399e-06 gnorm 9.776066323463e-13 3 SNES Function norm 9.776066323463e-13 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 -n 2, final "iteration": 0 SNES Function norm 9.695669610792e+01 0 KSP Residual norm 7.898912593878e-03 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.752819851736e-17 1 KSP preconditioned resid norm 1.752819851736e-17 true resid norm 1.017605437996e-13 ||r(i)||/||b|| 1.049546322064e-15 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 9.695669610792e+01 gnorm 1.026572731655e-01 1 SNES Function norm 1.026572731655e-01 0 KSP Residual norm 1.382450412926e-04 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm 1.026572731655e-01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.701690118486e-19 1 KSP preconditioned resid norm 1.701690118486e-19 true resid norm 9.077679331860e-17 ||r(i)||/||b|| 8.842704517606e-16 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 1.026572731655e-01 gnorm 7.982937883350e-06 2 SNES Function norm 7.982937883350e-06 0 KSP Residual norm 4.223898196594e-08 0 KSP preconditioned resid norm 4.223898196594e-08 true resid norm 7.982937883350e-06 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.471638984554e-23 1 KSP preconditioned resid norm 1.471638984554e-23 true resid norm 2.483672977401e-20 ||r(i)||/||b|| 3.111226735938e-15 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 7.982937883350e-06 gnorm 1.019121417798e-12 3 SNES Function norm 1.019121417798e-12 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 Of course these differences are still very small, but this is only true for such a small problem size. For a regular sized problem, the differences at the final iteration can exceed 1 and even 100 at a particular grid point (i.e. in a sense that doesn't scale with problem size). I also compared -n 1 and -n 2 with the -snes_monitor_solution -ksp_view_rhs -ksp_view_mat -ksp_view_solution options on a tiny problem (5x5x5), and I was not able to find any differences in the Jacobian or the vectors, but I'm suspicious that this could be due to the output format, because even for the tiny problem there are non-trivial differences in the residuals of both the SNES and the KSP. In all cases, the differences in the residuals are localized to the boundary between parts of the displacement vector owned by the two processes. The SNES residual with -n 2 typically looks discontinuous across that boundary. On Thu, Nov 9, 2017 at 11:16 AM, zakaryah . wrote: > Thanks Stefano, I will try what you suggest. > > ?Matt - my DM is a composite between the redundant field (loading > coefficient, which is included in the Newton solve in Riks' method) and the > displacements, which are represented by a 3D DA with 3 dof. I am using > finite difference. > > Probably my problem comes from confusion over how the composite DM is > organized. 
I am using FormFunction()?, and within that I call > DMCompositeGetLocalVectors(), DMCompositeScatter(), DMDAVecGetArray(), and > for the Jacobian, DMCompositeGetLocalISs() and MatGetLocalSubmatrix() to > split J into Jbb, Jbh, Jhb, and Jhh, where b is the loading coefficient, > and h is the displacements). The values of each submatrix are set using > MatSetValuesLocal(). > > ?I'm most suspicious of the part of the Jacobian routine where I calculate > the rows of Jhb, the columns of Jbh, and the corresponding values. I take > the DA coordinates and ix,iy,iz, then calculate the row of Jhb as ((((iz-info->gzs)*info->gym > + (iy-info->gys))*info->gxm + (ix-info->gxs))*info->dof+c), where info is > the DA local info and c is the degree of freedom. The same calculation is > performed for the column of Jbh. I suspect that the indexing of the DA > vector is not so simple, but I don't know for a fact that I'm doing this > incorrectly nor how to do this properly. > > ?Thanks for all the help!? > > > On Nov 9, 2017 8:44 AM, "Matthew Knepley" wrote: > > On Thu, Nov 9, 2017 at 12:14 AM, zakaryah . wrote: > >> Well the saga of my problem continues. As I described previously in an >> epic thread, I'm using the SNES to solve problems involving an elastic >> material on a rectangular grid, subjected to external forces. In any case, >> I'm occasionally getting poor convergence using Newton's method with line >> search. In troubleshooting by visualizing the residual, I saw that in data >> sets which had good convergence, the residual was nevertheless >> significantly larger along the boundary between different processors. >> Likewise, in data sets with poor convergence, the residual became very >> large on the boundary between different processors. The residual is not >> significantly larger on the physical boundary, i.e. the global boundary. >> When I run on a single process, convergence seems to be good on all data >> sets. >> >> Any clues to fix this? >> > > It sounds like something is wrong with communication across domains: > > - If this is FEM, it sounds like you are not adding contributions from > the other domain to shared vertices/edges/faces > > - If this is FDM/FVM, maybe the ghosts are not updated > > What DM are you using? Are you using the Local assembly functions > (FormFunctionLocal), or just FormFunction()? > > Thanks, > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 9 22:09:08 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 10 Nov 2017 04:09:08 +0000 Subject: [petsc-users] Newton methods that converge all the time In-Reply-To: References: Message-ID: <128FB4CE-01C0-4E1C-A423-01AA215ACB87@mcs.anl.gov> Henrik, Please describe in some detail how you are handling phase change. If you have if () tests of any sort in your FormFunction() or FormJacobian() this can kill Newton's method. If you are using "variable switching" this WILL kill Newtons' method. Are you monkeying with phase definitions in TSPostStep or with SNESLineSearchSetPostCheck(). This will also kill Newton's method. Barry > On Nov 7, 2017, at 3:19 AM, Buesing, Henrik wrote: > > Dear all, > > I am solving a system of nonlinear, transient PDEs. 
I am using Newton's method in every time step to solve the nonlinear algebraic equations. Of course, Newton's method only converges if the initial guess is sufficiently close to the solution. > > This is often not the case and Newton's method diverges. Then, I reduce the time step and try again. This can become prohibitively costly if the time steps get very small. I am thus looking for variants of Newton's method, which have a bigger convergence radius or ideally converge all the time. > > I tried out the pseudo-timestepping described in http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex1f.F.html. > > However, this converges even worse. I am seeing breakdown when I have phase changes (e.g. liquid to two-phase). > > I was under the impression that pseudo-timestepping should converge better. Thus, my question: > > Am I doing something wrong or is it possible that Newton's method converges and pseudo-timestepping does not? > > Thank you for any insight on this. > > Henrik > > > > > -- > Dipl.-Math. Henrik Büsing > Institute for Applied Geophysics and Geothermal Energy > E.ON Energy Research Center > RWTH Aachen University > ------------------------------------------------------ > Mathieustr. 10 | Tel +49 (0)241 80 49907 > 52074 Aachen, Germany | Fax +49 (0)241 80 49889 > ------------------------------------------------------ > http://www.eonerc.rwth-aachen.de/GGE > hbuesing at eonerc.rwth-aachen.de > ------------------------------------------------------
From bsmith at mcs.anl.gov Thu Nov 9 22:09:08 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 10 Nov 2017 04:09:08 +0000 Subject: [petsc-users] Help with PETSc signal handling In-Reply-To: <3357209.g0sAtxkE7J@moose> References: <1536133.zgCtiWFWTP@moose> <3357209.g0sAtxkE7J@moose> Message-ID: <7F04AB5F-971F-4CC7-948D-D3D530244291@mcs.anl.gov> > On Nov 7, 2017, at 6:47 AM, Gard Spreemann wrote: > > On Tuesday 7 November 2017 07:35:36 CET Mark Adams wrote: >> PETSc's signal handler is for segvs, etc. I don't know the details but I >> don't think we care about external signals. > > I see. I'll sketch what I'm trying to achieve in case someone can > think of another approach. > > I have some long-running SLEPc eigenvalue computations, and I'd like > to have SLURM signal my program that its time limit is drawing > near. In that case, my problem would set a flag and before the next > iteration of the SLEPc eigenvalue solver it would give up and save the > eigenvalues it has so far managed to obtain. > > The only workaround I can think of would be to let my program keep > track of its time limit on its own and check it at each iteration. Hmm, I would be more inclined to go with a model that avoided signals. Signals are best avoided, unless absolutely necessary, plus you need to have some external script/program to send in the signal requiring added complexity to your work flow. Why not have an input option of the time limit for the run and each "iteration" or whatever (maybe in a post step function) check if you are getting close to the time-limit and then do what you need. If each "iteration" takes a long time and you are afraid the program will end before it checks the next time at the end of an iteration you could record the time of each iteration so your monitor routine knows how long a timestep takes and can estimate if another timestep is possible within the current amount of time you have available and if there is not enough time trigger the saving.
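[A minimal sketch of this idea for the SLEPc eigensolve in question, assuming SLEPc's EPSSetStoppingTestFunction() hook; the names TimeBudget and StopOnTimeLimit and the option name used to fill the budget are illustrative, not from this thread. The test times each outer iteration and stops the solve early, keeping the pairs converged so far, if another iteration is unlikely to fit within the wall-clock budget.]

#include <slepceps.h>

typedef struct {
  PetscLogDouble t_start; /* set once, right before EPSSolve() */
  PetscLogDouble t_last;  /* time at the previous check */
  PetscLogDouble limit;   /* seconds allowed, e.g. read with PetscOptionsGetReal() */
} TimeBudget;

static PetscErrorCode StopOnTimeLimit(EPS eps, PetscInt its, PetscInt max_it,
                                      PetscInt nconv, PetscInt nev,
                                      EPSConvergedReason *reason, void *ctx)
{
  TimeBudget     *tb = (TimeBudget *)ctx;
  PetscLogDouble  now, last_it;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = PetscTime(&now);CHKERRQ(ierr);
  last_it    = now - tb->t_last;                 /* cost of the last outer iteration */
  tb->t_last = now;
  if (now - tb->t_start + last_it > tb->limit) { /* no room for another iteration */
    *reason = EPS_CONVERGED_USER;                /* keep the nconv pairs found so far */
    PetscFunctionReturn(0);
  }
  /* otherwise fall back to the usual stopping test */
  ierr = EPSStoppingBasic(eps, its, max_it, nconv, nev, reason, NULL);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* registration, after filling tb.limit and setting tb.t_start = tb.t_last via PetscTime():
   ierr = EPSSetStoppingTestFunction(eps, StopOnTimeLimit, &tb, NULL);CHKERRQ(ierr); */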
Barry > > There is no intention for PETSc to have handling of user-defined > signals? > > > Thanks. > > -- Gard > > From bsmith at mcs.anl.gov Thu Nov 9 22:09:09 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 10 Nov 2017 04:09:09 +0000 Subject: [petsc-users] Coloring of a finite volume unstructured mesh In-Reply-To: References: <2d31-5a01ce80-f-394b2b00@130530600> Message-ID: <6A33718C-48C3-4064-B8CD-77CA370B9DE1@mcs.anl.gov> To use the PETSc coloring based Jacobian computer (which uses finite differences) you absolutely have to be able to provide the nonzero structure of the Jacobian. Now once you provide the nonzero structure of the Jacobian the PETSc MatColoring routines can actually compute the coloring for you. So in other words you need not worry about the coloring at all, you just need to worry about providing the nonzero structure. Since you are using a finite volume method presumably all your coupling is between faces? In this case explicitly computing the nonzero structure of the Jacobian is probably pretty straightforward and you should just do it. Barry > On Nov 7, 2017, at 10:13 AM, Matthew Knepley wrote: > > On Tue, Nov 7, 2017 at 10:18 AM, SIERRA-AUSIN Javier wrote: > Hi, > > I would like to ask you concerning the computation of the Jacobian matrix via finite difference and coloring of the connectivity graph. > I wonder whether it is possible or not to color the Jacobian matrix of a given solver that evaluates the RHS with its associated connectivity in the global indeces of my solver (not PETSc). > As well, if it is possible to do this from an already partioned domain in parallel. > All of this is better explained in this post : https://scicomp.stackexchange.com/questions/28209/linking-petsc-with-an-already-parallel-in-house-finite-volume-solver > > The simplest thing you can do is to use the finite-difference Jacobian action (MatMFFD). This is setup automatically by SNES > if you give a FormFunction pointer, but no FormJacobian routine. Just tell the PETSc Vecs to use your ParMetis layout (by > setting the local sizes), and it should run fine in SNES. > > However, usually you need some kind of preconditioning. Thus you either have to form the Jacobian or some approximation. If > you cannot form an approximation, then you can use coloring. Once option is to create a DMPlex with your mesh information. > This can be done in parallel after you have already partitioned with ParMetis (as long as you know the "overlap" of vertices, or > adjacency of cells). Then the coloring can be done automatically using that DM information. Otherwise, you will have to supply > a coloring to the SNES. > > Thanks, > > Matt > > Thanks in advance, > > Javier. > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From bsmith at mcs.anl.gov Thu Nov 9 22:09:10 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 10 Nov 2017 04:09:10 +0000 Subject: [petsc-users] Newton LS - better results on single processor In-Reply-To: References: Message-ID: <6A14B327-7D3E-4A0E-81D1-0595DEC2B099@mcs.anl.gov> > On Nov 9, 2017, at 3:33 PM, zakaryah . wrote: > > Hi Stefano - when I referred to the iterations, I was trying to point out that my method solves a series of nonlinear systems, with the solution to the first problem being used to initialize the state vector for the second problem, etc. 
The reason I mentioned that was I thought perhaps I can expect the residuals from single process solve to differ from the residuals from multiprocess solve by a very small amount, say machine precision or the tolerance of the KSP/SNES, that would be fine normally. But, if there is a possibility that those differences are somehow amplified by each of the iterations (solution->initial state), that could explain what I see. > > I agree that it is more likely that I have a bug in my code but I'm having trouble finding it. Run a tiny problem on one and two processes with LU linear solver and the same mesh. So in the first case all values live on the first process and in the second the same first half live on one process and the second half on the second process. Now track the values in the actual vectors and matrices. For example you can just put in VecView() and MatView() on all objects you pass into the solver and then put them in the SNESComputeFunction/Jacobian routines. Print both the vectors inputed to these routines and the vectors/matrices created in the routines. The output differences from the two runs should be small, determine when they significantly vary. This will tell you the likely location of the bug in your source code. (For example if certain values of the Jacobian differ) Good luck, I've done this plenty of times and if it is a "parallelization" bug this will help you find it much faster than guessing where the problem is and trying code inspect to find the bug. Barry > > I ran a small problem with -pc_type redundant -redundant_pc_type lu, as you suggested. What I think is the relevant portion of the output is here (i.e. there are small differences in the KSP residuals and SNES residuals): > > -n 1, first "iteration" as described above: > > 0 SNES Function norm 6.053565720454e-02 > 0 KSP Residual norm 4.883115701982e-05 > > 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 8.173640409069e-20 > > 1 KSP preconditioned resid norm 8.173640409069e-20 true resid norm 1.742143029296e-16 ||r(i)||/||b|| 2.877879104227e-15 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 6.053565720454e-02 gnorm 2.735518570862e-07 > > 1 SNES Function norm 2.735518570862e-07 > > 0 KSP Residual norm 1.298536630766e-10 > > 0 KSP preconditioned resid norm 1.298536630766e-10 true resid norm 2.735518570862e-07 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 2.152782096751e-25 > > 1 KSP preconditioned resid norm 2.152782096751e-25 true resid norm 4.755555202641e-22 ||r(i)||/||b|| 1.738447420279e-15 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 2.735518570862e-07 gnorm 1.917989238989e-17 > > 2 SNES Function norm 1.917989238989e-17 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 > > > > -n 2, first "iteration" as described above: > > 0 SNES Function norm 6.053565720454e-02 > > 0 KSP Residual norm 4.883115701982e-05 > > 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 1.007084240718e-19 > > 1 KSP preconditioned resid norm 1.007084240718e-19 true resid norm 1.868472589717e-16 ||r(i)||/||b|| 3.086565300520e-15 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 6.053565720454e-02 gnorm 2.735518570379e-07 > > 1 SNES Function norm 
2.735518570379e-07 > > 0 KSP Residual norm 1.298536630342e-10 > > 0 KSP preconditioned resid norm 1.298536630342e-10 true resid norm 2.735518570379e-07 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 1.885083482938e-25 > > 1 KSP preconditioned resid norm 1.885083482938e-25 true resid norm 4.735707460766e-22 ||r(i)||/||b|| 1.731191852267e-15 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 2.735518570379e-07 gnorm 1.851472273258e-17 > > > 2 SNES Function norm 1.851472273258e-17 > > > -n 1, final "iteration": > 0 SNES Function norm 9.695669610792e+01 > > 0 KSP Residual norm 7.898912593878e-03 > > 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 1.720960785852e-17 > > 1 KSP preconditioned resid norm 1.720960785852e-17 true resid norm 1.237111121391e-13 ||r(i)||/||b|| 1.275941911237e-15 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 9.695669610792e+01 gnorm 1.026572731653e-01 > > 1 SNES Function norm 1.026572731653e-01 > > 0 KSP Residual norm 1.382450412926e-04 > > 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm 1.026572731653e-01 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 5.018078565710e-20 > > 1 KSP preconditioned resid norm 5.018078565710e-20 true resid norm 9.031463071676e-17 ||r(i)||/||b|| 8.797684560673e-16 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 1.026572731653e-01 gnorm 7.982937980399e-06 > > 2 SNES Function norm 7.982937980399e-06 > > 0 KSP Residual norm 4.223898196692e-08 > > 0 KSP preconditioned resid norm 4.223898196692e-08 true resid norm 7.982937980399e-06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 1.038123933240e-22 > > 1 KSP preconditioned resid norm 1.038123933240e-22 true resid norm 3.213931469966e-20 ||r(i)||/||b|| 4.026000800530e-15 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 7.982937980399e-06 gnorm 9.776066323463e-13 > > 3 SNES Function norm 9.776066323463e-13 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 > > -n 2, final "iteration": > > 0 SNES Function norm 9.695669610792e+01 > > 0 KSP Residual norm 7.898912593878e-03 > > 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 1.752819851736e-17 > > 1 KSP preconditioned resid norm 1.752819851736e-17 true resid norm 1.017605437996e-13 ||r(i)||/||b|| 1.049546322064e-15 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 9.695669610792e+01 gnorm 1.026572731655e-01 > > 1 SNES Function norm 1.026572731655e-01 > > 0 KSP Residual norm 1.382450412926e-04 > > 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm 1.026572731655e-01 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP Residual norm 1.701690118486e-19 > > 1 KSP preconditioned resid norm 1.701690118486e-19 true resid norm 9.077679331860e-17 ||r(i)||/||b|| 8.842704517606e-16 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 1.026572731655e-01 gnorm 7.982937883350e-06 > > 2 SNES Function norm 7.982937883350e-06 > > 0 KSP Residual norm 4.223898196594e-08 > > 0 KSP preconditioned resid norm 4.223898196594e-08 true resid norm 7.982937883350e-06 ||r(i)||/||b|| 1.000000000000e+00 > 
> 1 KSP Residual norm 1.471638984554e-23 > > 1 KSP preconditioned resid norm 1.471638984554e-23 true resid norm 2.483672977401e-20 ||r(i)||/||b|| 3.111226735938e-15 > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > Line search: Using full step: fnorm 7.982937883350e-06 gnorm 1.019121417798e-12 > > 3 SNES Function norm 1.019121417798e-12 > > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 > > > > Of course these differences are still very small, but this is only true for such a small problem size. For a regular sized problem, the differences at the final iteration can exceed 1 and even 100 at a particular grid point (i.e. in a sense that doesn't scale with problem size). > > I also compared -n 1 and -n 2 with the -snes_monitor_solution -ksp_view_rhs -ksp_view_mat -ksp_view_solution options on a tiny problem (5x5x5), and I was not able to find any differences in the Jacobian or the vectors, but I'm suspicious that this could be due to the output format, because even for the tiny problem there are non-trivial differences in the residuals of both the SNES and the KSP. > > In all cases, the differences in the residuals are localized to the boundary between parts of the displacement vector owned by the two processes. The SNES residual with -n 2 typically looks discontinuous across that boundary. > > > On Thu, Nov 9, 2017 at 11:16 AM, zakaryah . wrote: > Thanks Stefano, I will try what you suggest. > > ?Matt - my DM is a composite between the redundant field (loading coefficient, which is included in the Newton solve in Riks' method) and the displacements, which are represented by a 3D DA with 3 dof. I am using finite difference. > > Probably my problem comes from confusion over how the composite DM is organized. I am using FormFunction()?, and within that I call DMCompositeGetLocalVectors(), DMCompositeScatter(), DMDAVecGetArray(), and for the Jacobian, DMCompositeGetLocalISs() and MatGetLocalSubmatrix() to split J into Jbb, Jbh, Jhb, and Jhh, where b is the loading coefficient, and h is the displacements). The values of each submatrix are set using MatSetValuesLocal(). > > ?I'm most suspicious of the part of the Jacobian routine where I calculate the rows of Jhb, the columns of Jbh, and the corresponding values. I take the DA coordinates and ix,iy,iz, then calculate the row of Jhb as ((((iz-info->gzs)*info->gym + (iy-info->gys))*info->gxm + (ix-info->gxs))*info->dof+c), where info is the DA local info and c is the degree of freedom. The same calculation is performed for the column of Jbh. I suspect that the indexing of the DA vector is not so simple, but I don't know for a fact that I'm doing this incorrectly nor how to do this properly. > > ?Thanks for all the help!? > > > On Nov 9, 2017 8:44 AM, "Matthew Knepley" wrote: > On Thu, Nov 9, 2017 at 12:14 AM, zakaryah . wrote: > Well the saga of my problem continues. As I described previously in an epic thread, I'm using the SNES to solve problems involving an elastic material on a rectangular grid, subjected to external forces. In any case, I'm occasionally getting poor convergence using Newton's method with line search. In troubleshooting by visualizing the residual, I saw that in data sets which had good convergence, the residual was nevertheless significantly larger along the boundary between different processors. Likewise, in data sets with poor convergence, the residual became very large on the boundary between different processors. 
The residual is not significantly larger on the physical boundary, i.e. the global boundary. When I run on a single process, convergence seems to be good on all data sets. > > Any clues to fix this? > > It sounds like something is wrong with communication across domains: > > - If this is FEM, it sounds like you are not adding contributions from the other domain to shared vertices/edges/faces > > - If this is FDM/FVM, maybe the ghosts are not updated > > What DM are you using? Are you using the Local assembly functions (FormFunctionLocal), or just FormFunction()? > > Thanks, > > Matt > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > From jroman at dsic.upv.es Fri Nov 10 01:03:38 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 10 Nov 2017 08:03:38 +0100 Subject: [petsc-users] Help with PETSc signal handling In-Reply-To: <7F04AB5F-971F-4CC7-948D-D3D530244291@mcs.anl.gov> References: <1536133.zgCtiWFWTP@moose> <3357209.g0sAtxkE7J@moose> <7F04AB5F-971F-4CC7-948D-D3D530244291@mcs.anl.gov> Message-ID: <8563D4A7-C674-4A26-95E8-4377C9D32AA8@dsic.upv.es> This SLEPc example uses PetscTime() to exit EPSSolve() after a given elapsed time. http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex29.c.html Jose > El 10 nov 2017, a las 5:09, Smith, Barry F. escribi?: > > > >> On Nov 7, 2017, at 6:47 AM, Gard Spreemann wrote: >> >> On Tuesday 7 November 2017 07:35:36 CET Mark Adams wrote: >>> PETSc's signal handler is for segvs, etc. I don't know the details but I >>> don't think we care about external signals. >> >> I see. I'll sketch what I'm trying to achieve in case someone can >> think of another approach. >> >> I have some long-running SLEPc eigenvalue computations, and I'd like >> to have SLURM signal my program that its time limit is drawing >> near. In that case, my problem would set a flag and before the next >> iteration of the SLEPc eigenvalue solver it would give up and save the >> eigenvalues it has so far managed to obtain. >> >> The only workaround I can think of would be to let my program keep >> track of its time limit on its own and check it at each iteration. > > Hmm, I would be more inclined to go with a model that avoided signals. Signals are best avoided, unless absolutely necessary, plus you need to have some external script/program to send in the signal requiring added complexity to your work flow. > > Why not have an input option of the time limit for the run and each "iteration" or whatever (maybe in a post step function) check if you getting close to the time-limit and then do what you need. If each "iteration" takes a long time and you are afraid the program will end before it checks the next time at the end of an iteration you could track record the time of each iteration so your monitor routine knows how long a timestep takes and can estimate if another timestep is possible within the current amount of time you have available and if there is not enough time triggers the saving. > > Barry > > > >> >> There is no intention for PETSc to have handling of user-defined >> signals? >> >> >> Thanks. 
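For reference, a minimal sketch of the elapsed-time check suggested above. None of this comes from the thread: the option name -max_wall_time, the helper names, and the "cost of the previous iteration" heuristic are made up for illustration, and the routines would be called from whatever per-iteration hook the solver provides (a monitor, a post-step function, or the elapsed-time stopping mechanism in the SLEPc ex29 example linked above).

#include <petscsys.h>
#include <petsctime.h>

typedef struct {
  PetscLogDouble start;   /* wall-clock time when the solve started       */
  PetscLogDouble prev;    /* wall-clock time at the previous check        */
  PetscReal      limit;   /* budget in seconds, e.g. from -max_wall_time  */
} WallClockBudget;

PetscErrorCode WallClockBudgetBegin(WallClockBudget *b)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  b->limit = 3600.0;   /* default budget: one hour */
  ierr = PetscOptionsGetReal(NULL,NULL,"-max_wall_time",&b->limit,NULL);CHKERRQ(ierr);
  ierr = PetscTime(&b->start);CHKERRQ(ierr);
  b->prev = b->start;
  PetscFunctionReturn(0);
}

/* Call once at the end of each outer iteration.  Sets *stop to PETSC_TRUE when
   the elapsed time plus the cost of the previous iteration would exceed the
   budget, so the caller can save partial results and exit cleanly. */
PetscErrorCode WallClockBudgetCheck(WallClockBudget *b,PetscBool *stop)
{
  PetscErrorCode ierr;
  PetscLogDouble now,elapsed,last_iter;

  PetscFunctionBegin;
  ierr = PetscTime(&now);CHKERRQ(ierr);
  elapsed   = now - b->start;
  last_iter = now - b->prev;
  b->prev   = now;
  *stop = (elapsed + last_iter > (PetscLogDouble)b->limit) ? PETSC_TRUE : PETSC_FALSE;
  PetscFunctionReturn(0);
}

The caller would invoke WallClockBudgetBegin() once before the solve and WallClockBudgetCheck() at the end of every iteration; as soon as *stop comes back true, it writes out whatever eigenpairs have converged so far and returns.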
>> >> -- Gard >> >> > From yann.jobic at univ-amu.fr Fri Nov 10 04:13:19 2017 From: yann.jobic at univ-amu.fr (Yann JOBIC) Date: Fri, 10 Nov 2017 11:13:19 +0100 Subject: [petsc-users] DMPlexVecGetClosure for DM forest In-Reply-To: References: Message-ID: On 11/09/2017 07:22 PM, Matthew Knepley wrote: > On Thu, Nov 9, 2017 at 1:20 PM, Yann Jobic > wrote: > > Hello, > I'm trying to access to the values of a p4est forest. > I know how to do that by converting my forest to a DMPlex, and > then use DMPlexVecGetClosure over the converted DM. > However, i want to assign a label and access to the values of the > forest directly. > > > What do you mean by "directly". p4est only has topology. We use a > Section to map points to values, just like Plex. I feel stupid, but i don't know how to use a section to map points to values. In my code i use (from ex11.c and DMComputeL2GradientDiff_Plex) : ? ierr = DMGetDefaultSection(forest, §ion);CHKERRQ(ierr); ? ierr = DMForestGetCellChart(forest,&cStart,&cEnd);CHKERRQ(ierr); ? ierr = DMGetLocalVector(forest, &localX);CHKERRQ(ierr); ? ierr = DMGlobalToLocalBegin(forest, u, INSERT_VALUES, localX);CHKERRQ(ierr); ? ierr = DMGlobalToLocalEnd? (forest, u, INSERT_VALUES, localX);CHKERRQ(ierr); ? for (c = cStart; c < cEnd; c++) { ???? DMPlexVecGetClosure(forest, section, localX, c, NULL, &x); [...] And i would like to get "x" for a DM forest. I looked at dm/impls/forest/p4est/pforest.c, but it looks like quite difficult to import. I also search in vec/is/utils/vsectionis.c in order to find the correct section function, but i didn't catch how to use the right one. It looks so simple to use DMPlexVecGetClosure for DM Plex, getting into the code of DMPlexVecGetClosure is also kind of difficult, at my level i mean. Where can i find the correct way to do it ? Is there an example for what i want to do ? Thanks, Yann > > ?Thanks, > > ? ? Matt > > Is it possible ? > Thanks, > Yann > > > --- > L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e > par le logiciel antivirus Avast. > https://www.avast.com/antivirus > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- ___________________________ Yann JOBIC HPC engineer Polytech Marseille DME IUSTI-CNRS UMR 6595 Technop?le de Ch?teau Gombert 5 rue Enrico Fermi 13453 Marseille cedex 13 Tel : (33) 4 91 10 69 39 ou (33) 4 91 10 69 43 Fax : (33) 4 91 10 69 69 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gspr at nonempty.org Fri Nov 10 05:47:07 2017 From: gspr at nonempty.org (Gard Spreemann) Date: Fri, 10 Nov 2017 12:47:07 +0100 Subject: [petsc-users] Help with PETSc signal handling In-Reply-To: <7F04AB5F-971F-4CC7-948D-D3D530244291@mcs.anl.gov> References: <1536133.zgCtiWFWTP@moose> <3357209.g0sAtxkE7J@moose> <7F04AB5F-971F-4CC7-948D-D3D530244291@mcs.anl.gov> Message-ID: <1589255.SO8Doa0k5r@moose> On Friday 10 November 2017 04:09:08 CET Smith, Barry F. wrote: > Why not have an input option of the time limit for the run and each "iteration" or whatever (maybe in a post step function) check if you getting close to the time-limit and then do what you need. 
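(Coming back to the Section question in the DMForest message above: for reference, a minimal sketch of pulling values out of the local vector through the section directly. It is untested on a forest, it reuses the variable names from the snippet in that message (forest, u) and assumes an enclosing function with PetscErrorCode ierr declared, and it only reads the dofs stored on the cell point itself -- DMPlexVecGetClosure additionally gathers the dofs attached to the cell's faces, edges and vertices, so the two are not equivalent in general.)

  PetscSection       section;
  Vec                localX;
  const PetscScalar *array;
  PetscInt           c, cStart, cEnd, dof, off;

  ierr = DMGetDefaultSection(forest, &section);CHKERRQ(ierr);
  ierr = DMForestGetCellChart(forest, &cStart, &cEnd);CHKERRQ(ierr);
  ierr = DMGetLocalVector(forest, &localX);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(forest, u, INSERT_VALUES, localX);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(forest, u, INSERT_VALUES, localX);CHKERRQ(ierr);
  ierr = VecGetArrayRead(localX, &array);CHKERRQ(ierr);
  for (c = cStart; c < cEnd; ++c) {
    ierr = PetscSectionGetDof(section, c, &dof);CHKERRQ(ierr);    /* number of values stored on cell c   */
    ierr = PetscSectionGetOffset(section, c, &off);CHKERRQ(ierr); /* where they start in the local array */
    /* the values for cell c are array[off] .. array[off+dof-1] */
  }
  ierr = VecRestoreArrayRead(localX, &array);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(forest, &localX);CHKERRQ(ierr);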
If each "iteration" takes a long time and you are afraid the program will end before it checks the next time at the end of an iteration you could track record the time of each iteration so your monitor routine knows how long a timestep takes and can estimate if another timestep is possible within the current amount of time you have available and if there is not enough time triggers the saving. Yeah, this is my fallback idea. Thanks, and thanks also to Jose Roman for pointing out the example. -- Gard From florian.kauer at koalo.de Fri Nov 10 05:54:46 2017 From: florian.kauer at koalo.de (Florian Kauer) Date: Fri, 10 Nov 2017 12:54:46 +0100 Subject: [petsc-users] SNES without SNESSetJacobian, snes_fd or snes_mf In-Reply-To: <87fu9njwd4.fsf@jedbrown.org> References: <73fe939b-e1e1-67a0-5e1f-224902e975e7@koalo.de> <87fu9njwd4.fsf@jedbrown.org> Message-ID: <73dd3c63-a498-87d0-4d4e-2e96a9949891@koalo.de> Now everything makes sense, thank you! I actually had the problem that if I have an unconnected node, it is not taken into account when calculating without -snes_fd. So that is probably due to the coloring based Jacobian calculation. And the actual problem was that I did not completely follow Shri's advise from the "Extra Variable in DMCircuit" thread from 2014 to actually connect this node to the network.... With -snes_fd it was just working anyway so I did not care until I looked at the code again yesterday. So, when connecting the node everything runs smoothly :-) On 09.11.2017 16:38, Jed Brown wrote: > Florian Kauer writes: > >> Hi, >> what is the SNES solver actually doing when you do not provide a >> jacobian and not explicitly select either finite differencing >> approximation or matrix-free Newton-Krylov method? >> >> I just noticed that I mistakenly did this, but a good solution is found >> anyway (and fast). So what is actually happening? Simple fixed-point >> iteration? > > Presumably you are using a DM that can provide a coloring, in which case > the sparse Jacobian is assembled using finite differencing with coloring. > From mfadams at lbl.gov Fri Nov 10 07:58:04 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Nov 2017 08:58:04 -0500 Subject: [petsc-users] GAMG advice In-Reply-To: <2e817bc2-132d-87c6-c521-43149ce3cdcc@dim.uchile.cl> References: <47a47b6b-ce8c-10f6-0ded-bf87e9af1bbd@dim.uchile.cl> <991cd7c4-bb92-ed2c-193d-7232c1ff6199@dim.uchile.cl> <6169118C-34FE-491C-BCB4-A86BECCFBAA9@mcs.anl.gov> <0779aa51-17c8-0ef3-fd01-1413ee1225ea@dim.uchile.cl> <2e817bc2-132d-87c6-c521-43149ce3cdcc@dim.uchile.cl> Message-ID: On Thu, Nov 9, 2017 at 2:19 PM, David Nolte wrote: > Hi Mark, > > thanks for clarifying. > When I wrote the initial question I had somehow overlooked the fact that > the GAMG standard smoother was Chebychev while ML uses SOR. All the other > comments concerning threshold etc were based on this mistake. > > The following settings work quite well, of course LU is used on the coarse > level. > > -pc_type gamg > -pc_gamg_type agg > -pc_gamg_threshold 0.03 > -pc_gamg_square_graph 10 # no effect ? > -pc_gamg_sym_graph > -mg_levels_ksp_type richardson > -mg_levels_pc_type sor > > -pc_gamg_agg_nsmooths 0 does not seem to improve the convergence. > Looks reasonable. And this smoothing is good for elliptic operators convergence but it makes the operator more expensive. It's worth doing for elliptic operators but in my experience not for others. If you convergence rate does not change then you probably want -pc_gamg_agg_nsmooths 0. 
This is a cheaper (if smoothing does not help convergence a lot), simpler method and want to use it. > > The ksp view now looks like this: (does this seem reasonable?) > > > KSP Object: 4 MPI processes > type: fgmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10000 > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > right preconditioning > using nonzero initial guess > using UNPRECONDITIONED norm type for convergence test > PC Object: 4 MPI processes > type: gamg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > GAMG specific options > Threshold for dropping small values from graph 0.03 > AGG specific options > Symmetric graph true > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 4 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 4 MPI processes > type: bjacobi > block Jacobi: number of blocks = 4 > Local solve is same for all blocks, in the following KSP and PC > objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=38, cols=38 > package used to perform factorization: petsc > total: nonzeros=1444, allocated nonzeros=1444 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 8 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=38, cols=38 > total: nonzeros=1444, allocated nonzeros=1444 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 8 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=38, cols=38 > total: nonzeros=1444, allocated nonzeros=1444 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 8 nodes, limit used > is 5 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 4 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 4 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=168, cols=168 > total: nonzeros=19874, allocated nonzeros=19874 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 17 nodes, limit used > is 5 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 4 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 4 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=3572, cols=3572 > total: nonzeros=963872, allocated nonzeros=963872 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 4 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 4 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=21179, cols=21179 > total: nonzeros=1060605, allocated nonzeros=1060605 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 4 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 4 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=1745224, cols=1745224 > total: nonzeros=99452608, allocated nonzeros=99452608 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 254433 nodes, limit > used is 5 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=1745224, cols=1745224 > total: nonzeros=99452608, allocated nonzeros=99452608 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 254433 nodes, limit used > is 5 > > > > > Thanks, David > > > > > On 11/08/2017 10:11 PM, Mark Adams wrote: > > > > On Wed, Nov 1, 2017 at 5:45 PM, David Nolte wrote: > >> Thanks Barry. >> By simply replacing chebychev by richardson I get similar performance >> with GAMG and ML > > > That too (I assumed you were using the same, I could not see cheby in your > view data). 
> > I guess SOR works for the coarse grid solver because the coarse grid is > small. It should help using lu. > > >> (GAMG even slightly faster): >> > > This is "random" fluctuations. > > >> >> -pc_type >> gamg >> >> >> >> -pc_gamg_type >> agg >> >> >> >> -pc_gamg_threshold >> 0.03 >> >> >> >> -pc_gamg_square_graph 10 >> -pc_gamg_sym_graph >> -mg_levels_ksp_type >> richardson >> >> >> >> -mg_levels_pc_type sor >> >> Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix >> is asymmetric? > > > yes, > > >> For serial runs it doesn't seem to matter, > > > yes, > > >> but in >> parallel the PC setup hangs (after calls of >> PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set. >> > > yep, > > >> >> David >> >> >> On 10/21/2017 12:10 AM, Barry Smith wrote: >> > David, >> > >> > GAMG picks the number of levels based on how the coarsening process >> etc proceeds. You cannot hardwire it to a particular value. You can run >> with -info to get more info potentially on the decisions GAMG is making. >> > >> > Barry >> > >> >> On Oct 20, 2017, at 2:06 PM, David Nolte wrote: >> >> >> >> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 >> option >> >> was not taken into account: >> >> type: gamg >> >> MG: type is MULTIPLICATIVE, levels=1 cycles=v >> >> >> >> >> >> >> >> On 10/20/2017 03:32 PM, David Nolte wrote: >> >>> Dear all, >> >>> >> >>> I have some problems using GAMG as a preconditioner for (F)GMRES. >> >>> Background: I am solving the incompressible, unsteady Navier-Stokes >> >>> equations with a coupled mixed FEM approach, using P1/P1 elements for >> >>> velocity and pressure on an unstructured tetrahedron mesh with about >> >>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG, >> >>> hence, no zeros on the diagonal of the pressure block. Time >> >>> discretization with semi-implicit backward Euler. The flow is a >> >>> convection dominated flow through a nozzle. >> >>> >> >>> So far, for this setup, I have been quite happy with a simple >> FGMRES/ML >> >>> solver for the full system (rather bruteforce, I admit, but much >> faster >> >>> than any block/Schur preconditioners I tried): >> >>> >> >>> -ksp_converged_reason >> >>> -ksp_monitor_true_residual >> >>> -ksp_type fgmres >> >>> -ksp_rtol 1.0e-6 >> >>> -ksp_initial_guess_nonzero >> >>> >> >>> -pc_type ml >> >>> -pc_ml_Threshold 0.03 >> >>> -pc_ml_maxNlevels 3 >> >>> >> >>> This setup converges in ~100 iterations (see below the ksp_view >> output) >> >>> to rtol: >> >>> >> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm >> >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 >> >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm >> >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 >> >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm >> >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 >> >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm >> >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 >> >>> >> >>> >> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to >> set >> >>> it up to get similar performance. 
>> >>> The obvious/naive >> >>> >> >>> -pc_type gamg >> >>> -pc_gamg_type agg >> >>> >> >>> # with and without >> >>> -pc_gamg_threshold 0.03 >> >>> -pc_mg_levels 3 >> >>> >> >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per >> >>> proc), for instance: >> >>> np = 1: >> >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm >> >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 >> >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm >> >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 >> >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm >> >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 >> >>> >> >>> np = 8: >> >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm >> >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 >> >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >> >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >> >>> >> >>> A very high threshold seems to improve the GAMG PC, for instance with >> >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations. >> >>> What else should I try? >> >>> >> >>> I would very much appreciate any advice on configuring GAMG and >> >>> differences w.r.t ML to be taken into account (not a multigrid expert >> >>> though). >> >>> >> >>> Thanks, best wishes >> >>> David >> >>> >> >>> >> >>> ------ >> >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels >> 3 >> >>> >> >>> KSP Object: 1 MPI processes >> >>> type: fgmres >> >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> >>> Orthogonalization with no iterative refinement >> >>> GMRES: happy breakdown tolerance 1e-30 >> >>> maximum iterations=10000 >> >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. >> >>> right preconditioning >> >>> using nonzero initial guess >> >>> using UNPRECONDITIONED norm type for convergence test >> >>> PC Object: 1 MPI processes >> >>> type: gamg >> >>> MG: type is MULTIPLICATIVE, levels=1 cycles=v >> >>> Cycles per PCApply=1 >> >>> Using Galerkin computed coarse grid matrices >> >>> GAMG specific options >> >>> Threshold for dropping small values from graph 0.75 >> >>> AGG specific options >> >>> Symmetric graph false >> >>> Coarse grid solver -- level ------------------------------- >> >>> KSP Object: (mg_levels_0_) 1 MPI processes >> >>> type: preonly >> >>> maximum iterations=2, initial guess is zero >> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> >>> left preconditioning >> >>> using NONE norm type for convergence test >> >>> PC Object: (mg_levels_0_) 1 MPI processes >> >>> type: sor >> >>> SOR: type = local_symmetric, iterations = 1, local iterations >> = >> >>> 1, omega = 1. 
>> >>> linear system matrix = precond matrix: >> >>> Mat Object: 1 MPI processes >> >>> type: seqaij >> >>> rows=1745224, cols=1745224 >> >>> total: nonzeros=99452608, allocated nonzeros=99452608 >> >>> total number of mallocs used during MatSetValues calls =0 >> >>> using I-node routines: found 1037847 nodes, limit used is 5 >> >>> linear system matrix = precond matrix: >> >>> Mat Object: 1 MPI processes >> >>> type: seqaij >> >>> rows=1745224, cols=1745224 >> >>> total: nonzeros=99452608, allocated nonzeros=99452608 >> >>> total number of mallocs used during MatSetValues calls =0 >> >>> using I-node routines: found 1037847 nodes, limit used is 5 >> >>> >> >>> >> >>> ------ >> >>> ksp_view for -pc_type ml: >> >>> >> >>> KSP Object: 8 MPI processes >> >>> type: fgmres >> >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> >>> Orthogonalization with no iterative refinement >> >>> GMRES: happy breakdown tolerance 1e-30 >> >>> maximum iterations=10000 >> >>> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. >> >>> right preconditioning >> >>> using nonzero initial guess >> >>> using UNPRECONDITIONED norm type for convergence test >> >>> PC Object: 8 MPI processes >> >>> type: ml >> >>> MG: type is MULTIPLICATIVE, levels=3 cycles=v >> >>> Cycles per PCApply=1 >> >>> Using Galerkin computed coarse grid matrices >> >>> Coarse grid solver -- level ------------------------------- >> >>> KSP Object: (mg_coarse_) 8 MPI processes >> >>> type: preonly >> >>> maximum iterations=10000, initial guess is zero >> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> >>> left preconditioning >> >>> using NONE norm type for convergence test >> >>> PC Object: (mg_coarse_) 8 MPI processes >> >>> type: redundant >> >>> Redundant preconditioner: First (color=0) of 8 PCs follows >> >>> KSP Object: (mg_coarse_redundant_) 1 MPI >> processes >> >>> type: preonly >> >>> maximum iterations=10000, initial guess is zero >> >>> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> >>> left preconditioning >> >>> using NONE norm type for convergence test >> >>> PC Object: (mg_coarse_redundant_) 1 MPI >> processes >> >>> type: lu >> >>> LU: out-of-place factorization >> >>> tolerance for zero pivot 2.22045e-14 >> >>> using diagonal shift on blocks to prevent zero pivot >> [INBLOCKS] >> >>> matrix ordering: nd >> >>> factor fill ratio given 5., needed 10.4795 >> >>> Factored matrix follows: >> >>> Mat Object: 1 MPI processes >> >>> type: seqaij >> >>> rows=6822, cols=6822 >> >>> package used to perform factorization: petsc >> >>> total: nonzeros=9575688, allocated nonzeros=9575688 >> >>> total number of mallocs used during MatSetValues >> calls =0 >> >>> not using I-node routines >> >>> linear system matrix = precond matrix: >> >>> Mat Object: 1 MPI processes >> >>> type: seqaij >> >>> rows=6822, cols=6822 >> >>> total: nonzeros=913758, allocated nonzeros=913758 >> >>> total number of mallocs used during MatSetValues calls =0 >> >>> not using I-node routines >> >>> linear system matrix = precond matrix: >> >>> Mat Object: 8 MPI processes >> >>> type: mpiaij >> >>> rows=6822, cols=6822 >> >>> total: nonzeros=913758, allocated nonzeros=913758 >> >>> total number of mallocs used during MatSetValues calls =0 >> >>> not using I-node (on process 0) routines >> >>> Down solver (pre-smoother) on level 1 ------------------------------ >> - >> >>> KSP Object: (mg_levels_1_) 8 MPI processes >> >>> type: richardson >> >>> Richardson: damping factor=1. 
>> >>> maximum iterations=2 >> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> >>> left preconditioning >> >>> using nonzero initial guess >> >>> using NONE norm type for convergence test >> >>> PC Object: (mg_levels_1_) 8 MPI processes >> >>> type: sor >> >>> SOR: type = local_symmetric, iterations = 1, local iterations >> = >> >>> 1, omega = 1. >> >>> linear system matrix = precond matrix: >> >>> Mat Object: 8 MPI processes >> >>> type: mpiaij >> >>> rows=67087, cols=67087 >> >>> total: nonzeros=9722749, allocated nonzeros=9722749 >> >>> total number of mallocs used during MatSetValues calls =0 >> >>> not using I-node (on process 0) routines >> >>> Up solver (post-smoother) same as down solver (pre-smoother) >> >>> Down solver (pre-smoother) on level 2 ------------------------------ >> - >> >>> KSP Object: (mg_levels_2_) 8 MPI processes >> >>> type: richardson >> >>> Richardson: damping factor=1. >> >>> maximum iterations=2 >> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> >>> left preconditioning >> >>> using nonzero initial guess >> >>> using NONE norm type for convergence test >> >>> PC Object: (mg_levels_2_) 8 MPI processes >> >>> type: sor >> >>> SOR: type = local_symmetric, iterations = 1, local iterations >> = >> >>> 1, omega = 1. >> >>> linear system matrix = precond matrix: >> >>> Mat Object: 8 MPI processes >> >>> type: mpiaij >> >>> rows=1745224, cols=1745224 >> >>> total: nonzeros=99452608, allocated nonzeros=99452608 >> >>> total number of mallocs used during MatSetValues calls =0 >> >>> using I-node (on process 0) routines: found 126690 nodes, >> >>> limit used is 5 >> >>> Up solver (post-smoother) same as down solver (pre-smoother) >> >>> linear system matrix = precond matrix: >> >>> Mat Object: 8 MPI processes >> >>> type: mpiaij >> >>> rows=1745224, cols=1745224 >> >>> total: nonzeros=99452608, allocated nonzeros=99452608 >> >>> total number of mallocs used during MatSetValues calls =0 >> >>> using I-node (on process 0) routines: found 126690 nodes, limit >> >>> used is 5 >> >>> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Nov 10 08:06:51 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Nov 2017 09:06:51 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: > Mark: > >> OK, well, just go with the Linux machine for the regression test. I will >> keep trying to reproduce this on my Mac with an O build. >> > > Valgrind error occurs on linux machines with g-build. I cannot merge this > branch to maint until the bug is fixed. > Valgrind is failing on this run on my Mac. Moving to cg, like you I suppose. This takes forever. This is what I have so far. Did you get this far? 
07:48 hzhang/fix-submat_samerowdist *= /sandbox/adams/petsc/src/snes/examples/tutorials$ make PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 ==12414== Memcheck, a memory error detector ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info ==12415== Memcheck, a memory error detector ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 ==12415== ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 ==12414== [0] 27 global equations, 9 vertices [0] 27 equations in vector, 9 vertices Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 [0] 441 global equations, 147 vertices [0] 441 equations in vector, 147 vertices Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 [0] 4725 global equations, 1575 vertices [0] 4725 equations in vector, 1575 vertices -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Nov 10 08:57:25 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Nov 2017 09:57:25 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: This printed a little funny in gmail, snes/ex56 is running clean in the first few loops (appended), but the last one is the one with a reduced processor set. Still waiting. 
This is with 32 bit integers. I'm running another with 64 bit integers. ... [0] 27 global equations, 9 vertices [0] 27 equations in vector, 9 vertices Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 [0] 441 global equations, 147 vertices [0] 441 equations in vector, 147 vertices Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 [0] 4725 global equations, 1575 vertices [0] 4725 equations in vector, 1575 vertices On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: > > > On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: > >> Mark: >> >>> OK, well, just go with the Linux machine for the regression test. I will >>> keep trying to reproduce this on my Mac with an O build. >>> >> >> Valgrind error occurs on linux machines with g-build. I cannot merge this >> branch to maint until the bug is fixed. >> > > Valgrind is failing on this run on my Mac. Moving to cg, like you I > suppose. This takes forever. This is what I have so far. Did you get this > far? > > 07:48 hzhang/fix-submat_samerowdist *= /sandbox/adams/petsc/src/snes/examples/tutorials$ > make PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val > /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind ./ex56 > -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it > 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned > -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true > -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig > 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type > parmetis -mat_block_size 3 -run_type 1 > ==12414== Memcheck, a memory error detector > ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. > ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright > info > ==12415== Memcheck, a memory error detector > ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright > info > ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 > -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type > unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg > -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi > -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > ==12415== > ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 > -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type > unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg > -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi > -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > ==12414== > [0] 27 global equations, 9 vertices > [0] 27 equations in vector, 9 vertices > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > [0] 441 global equations, 147 vertices > [0] 441 equations in vector, 147 vertices > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > [0] 4725 global equations, 1575 vertices > [0] 4725 equations in vector, 1575 vertices > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Nov 10 09:38:09 2017 From: hzhang at mcs.anl.gov (Hong) Date: Fri, 10 Nov 2017 09:38:09 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Using petsc machine, I get hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials (hzhang/fix-submat_samerowdist) $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 ==28811== Memcheck, a memory error detector ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 ... ==28811== Invalid read of size 16 ==28811== at 0x8550946: dswap_k_NEHALEM (in /usr/lib/openblas-base/libblas.so.3) ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) ==28811== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) ==28811== by 0x41056E: main (ex56.c:395) ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of size 52,488 alloc'd ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) ==28811== by 0x41056E: main (ex56.c:395) ==28811== ==28812== Invalid read of size 16 ==28812== at 0x8550946: dswap_k_NEHALEM (in /usr/lib/openblas-base/libblas.so.3) ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) ==28812== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) ==28812== by 0x41056E: main (ex56.c:395) ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of size 52,488 alloc'd ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) ==28812== by 0x41056E: main (ex56.c:395) ==28812== ==28811== Invalid read of size 16 ==28811== at 0x8550A55: dswap_k_NEHALEM (in /usr/lib/openblas-base/libblas.so.3) ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) ==28811== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3.0) ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) ==28811== by 0x41056E: main (ex56.c:395) ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 alloc'd ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) ==28811== by 0x41056E: main (ex56.c:395) ==28811== ==28812== Invalid read of size 16 ==28812== 
at 0x8550A55: dswap_k_NEHALEM (in /usr/lib/openblas-base/libblas.so.3) ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) ==28812== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3.0) ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) ==28812== by 0x41056E: main (ex56.c:395) ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 alloc'd ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) ==28812== by 0x41056E: main (ex56.c:395) ==28812== [0] 27 global equations, 9 vertices [0] 27 equations in vector, 9 vertices Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 [0] 441 global equations, 147 vertices [0] 441 equations in vector, 147 vertices hangs here ... Hong On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams wrote: > This printed a little funny in gmail, snes/ex56 is running clean in the > first few loops (appended), but the last one is the one with a reduced > processor set. Still waiting. This is with 32 bit integers. I'm running > another with 64 bit integers. > > ... > [0] 27 global equations, 9 vertices > [0] 27 equations in vector, 9 vertices > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > [0] 441 global equations, 147 vertices > [0] 441 equations in vector, 147 vertices > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > [0] 4725 global equations, 1575 vertices > [0] 4725 equations in vector, 1575 vertices > > > > > > > > > On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: > >> >> >> On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: >> >>> Mark: >>> >>>> OK, well, just go with the Linux machine for the regression test. I >>>> will keep trying to reproduce this on my Mac with an O build. >>>> >>> >>> Valgrind error occurs on linux machines with g-build. I cannot merge >>> this branch to maint until the bug is fixed. >>> >> >> Valgrind is failing on this run on my Mac. Moving to cg, like you I >> suppose. This takes forever. This is what I have so far. Did you get this >> far? >> >> 07:48 hzhang/fix-submat_samerowdist *= /sandbox/adams/petsc/src/snes/examples/tutorials$ >> make PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val >> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind >> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >> ==12414== Memcheck, a memory error detector >> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >> info >> ==12415== Memcheck, a memory error detector >> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. >> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >> info >> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order >> 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 >> -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >> ==12415== >> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order >> 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 >> -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >> ==12414== >> [0] 27 global equations, 9 vertices >> [0] 27 equations in vector, 9 vertices >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> [0] 441 global equations, 147 vertices >> [0] 441 equations in vector, 147 vertices >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> [0] 4725 global equations, 1575 vertices >> [0] 4725 equations in vector, 1575 vertices >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Nov 10 09:56:50 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Nov 2017 10:56:50 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: This is comming from blas. How did you configure blas? On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: > Using petsc machine, I get > hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials > (hzhang/fix-submat_samerowdist) > $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 > -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol > 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg > -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi > -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > > ==28811== Memcheck, a memory error detector > ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright > info > ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 > -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type > unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg > -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi > -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > ... > ==28811== Invalid read of size 16 > ==28811== at 0x8550946: dswap_k_NEHALEM (in /usr/lib/openblas-base/ > libblas.so.3) > ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) > ==28811== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) > ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > ==28811== by 0x41056E: main (ex56.c:395) > ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of size 52,488 > alloc'd > ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_ > memcheck-amd64-linux.so) > ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) > ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > ==28811== by 0x41056E: main (ex56.c:395) > ==28811== > ==28812== Invalid read of size 16 > ==28812== at 0x8550946: dswap_k_NEHALEM (in /usr/lib/openblas-base/ > libblas.so.3) > ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) > ==28812== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) > ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > ==28812== by 0x41056E: main (ex56.c:395) > ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of size 52,488 > alloc'd > ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_ > memcheck-amd64-linux.so) > ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) > ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > ==28812== by 0x41056E: main (ex56.c:395) > ==28812== > ==28811== Invalid read of size 16 > ==28811== at 0x8550A55: dswap_k_NEHALEM (in /usr/lib/openblas-base/ > libblas.so.3) > ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) > ==28811== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3.0) > ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) > ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > ==28811== by 0x41056E: main (ex56.c:395) > ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 alloc'd > ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_ > memcheck-amd64-linux.so) > ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) > ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) > ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > ==28811== by 
0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > ==28811== by 0x41056E: main (ex56.c:395) > ==28811== > ==28812== Invalid read of size 16 > ==28812== at 0x8550A55: dswap_k_NEHALEM (in /usr/lib/openblas-base/ > libblas.so.3) > ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas.so.3) > ==28812== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3.0) > ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) > ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > ==28812== by 0x41056E: main (ex56.c:395) > ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 alloc'd > ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_ > memcheck-amd64-linux.so) > ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) > ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) > ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > ==28812== by 0x41056E: main (ex56.c:395) > ==28812== > [0] 27 global equations, 9 vertices > [0] 27 equations in vector, 9 vertices > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > [0] 441 global equations, 147 vertices > [0] 441 equations in vector, 147 vertices > > hangs here ... > > Hong > > On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams wrote: > >> This printed a little funny in gmail, snes/ex56 is running clean in the >> first few loops (appended), but the last one is the one with a reduced >> processor set. Still waiting. This is with 32 bit integers. I'm running >> another with 64 bit integers. >> >> ... >> [0] 27 global equations, 9 vertices >> [0] 27 equations in vector, 9 vertices >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> [0] 441 global equations, 147 vertices >> [0] 441 equations in vector, 147 vertices >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> [0] 4725 global equations, 1575 vertices >> [0] 4725 equations in vector, 1575 vertices >> >> >> >> >> >> >> >> >> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: >> >>> >>> >>> On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: >>> >>>> Mark: >>>> >>>>> OK, well, just go with the Linux machine for the regression test. I >>>>> will keep trying to reproduce this on my Mac with an O build. >>>>> >>>> >>>> Valgrind error occurs on linux machines with g-build. I cannot merge >>>> this branch to maint until the bug is fixed. >>>> >>> >>> Valgrind is failing on this run on my Mac. Moving to cg, like you I >>> suppose. This takes forever. This is what I have so far. Did you get this >>> far? 
>>> >>> 07:48 hzhang/fix-submat_samerowdist *= /sandbox/adams/petsc/src/snes/examples/tutorials$ >>> make PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val >>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind >>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >>> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>> ==12414== Memcheck, a memory error detector >>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. >>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >>> info >>> ==12415== Memcheck, a memory error detector >>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. >>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >>> info >>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order >>> 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 >>> -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >>> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>> ==12415== >>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order >>> 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 >>> -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >>> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>> ==12414== >>> [0] 27 global equations, 9 vertices >>> [0] 27 equations in vector, 9 vertices >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> [0] 441 global equations, 147 vertices >>> [0] 441 equations in vector, 147 vertices >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> [0] 4725 global equations, 1575 vertices >>> [0] 4725 equations in vector, 1575 vertices >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Nov 10 09:59:07 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Nov 2017 10:59:07 -0500 Subject: [petsc-users] unsorted local columns in 3.8? 
In-Reply-To: References: Message-ID: This must be a configure issue. I don't see these warnings:

#!/usr/bin/python
if __name__ == '__main__':
    import sys
    import os
    sys.path.insert(0, os.path.abspath('config'))
    import configure
    configure_options = [
        '--with-cc=clang',
        '--with-cc++=clang++',
        '--download-mpich=1',
        '--download-metis=1',
        '--download-superlu=1',
        '--download-superlu_dist=1',
        '--download-parmetis=1',
        '--download-fblaslapack=1',
        '--download-p4est=1',
        '--with-debugging=1',
        '--with-batch=0',
        'PETSC_ARCH=arch-linux2-c-dbg32',
        '--with-openmp=0',
        '--download-p4est=0'
    ]
    configure.petsc_configure(configure_options)

~

On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams wrote:
> This is coming from blas. How did you configure blas? > > On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: > >> Using petsc machine, I get >> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials >> (hzhang/fix-submat_samerowdist) >> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 >> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >> >> ==28811== Memcheck, a memory error detector >> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. >> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >> info >> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order >> 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 >> -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >> ...
>> ==28811== Invalid read of size 16 >> ==28811== at 0x8550946: dswap_k_NEHALEM (in >> /usr/lib/openblas-base/libblas.so.3) >> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >> .so.3) >> ==28811== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) >> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> ==28811== by 0x41056E: main (ex56.c:395) >> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of size >> 52,488 alloc'd >> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >> mcheck-amd64-linux.so) >> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> ==28811== by 0x41056E: main (ex56.c:395) >> ==28811== >> ==28812== Invalid read of size 16 >> ==28812== at 0x8550946: dswap_k_NEHALEM (in >> /usr/lib/openblas-base/libblas.so.3) >> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >> .so.3) >> ==28812== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) >> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> ==28812== by 0x41056E: main (ex56.c:395) >> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of size >> 52,488 alloc'd >> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >> mcheck-amd64-linux.so) >> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> ==28812== by 0x41056E: main (ex56.c:395) >> ==28812== >> ==28811== Invalid read of size 16 >> ==28811== at 0x8550A55: dswap_k_NEHALEM (in >> /usr/lib/openblas-base/libblas.so.3) >> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >> .so.3) >> ==28811== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3.0) >> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >> ==28811== by 0x41056E: main (ex56.c:395) >> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 alloc'd >> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >> mcheck-amd64-linux.so) >> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >> ==28811== by 0x41056E: main (ex56.c:395) >> ==28811== >> ==28812== Invalid read of size 16 >> ==28812== at 0x8550A55: dswap_k_NEHALEM (in >> /usr/lib/openblas-base/libblas.so.3) >> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >> .so.3) >> ==28812== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3.0) >> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >> ==28812== by 0x41056E: main (ex56.c:395) >> ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 alloc'd >> ==28812== at 0x4C2D110: memalign (in 
/usr/lib/valgrind/vgpreload_me >> mcheck-amd64-linux.so) >> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >> ==28812== by 0x41056E: main (ex56.c:395) >> ==28812== >> [0] 27 global equations, 9 vertices >> [0] 27 equations in vector, 9 vertices >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> [0] 441 global equations, 147 vertices >> [0] 441 equations in vector, 147 vertices >> >> hangs here ... >> >> Hong >> >> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams wrote: >> >>> This printed a little funny in gmail, snes/ex56 is running clean in the >>> first few loops (appended), but the last one is the one with a reduced >>> processor set. Still waiting. This is with 32 bit integers. I'm running >>> another with 64 bit integers. >>> >>> ... >>> [0] 27 global equations, 9 vertices >>> [0] 27 equations in vector, 9 vertices >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> [0] 441 global equations, 147 vertices >>> [0] 441 equations in vector, 147 vertices >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> [0] 4725 global equations, 1575 vertices >>> [0] 4725 equations in vector, 1575 vertices >>> >>> >>> >>> >>> >>> >>> >>> >>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: >>> >>>> >>>> >>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: >>>> >>>>> Mark: >>>>> >>>>>> OK, well, just go with the Linux machine for the regression test. I >>>>>> will keep trying to reproduce this on my Mac with an O build. >>>>>> >>>>> >>>>> Valgrind error occurs on linux machines with g-build. I cannot merge >>>>> this branch to maint until the bug is fixed. >>>>> >>>> >>>> Valgrind is failing on this run on my Mac. Moving to cg, like you I >>>> suppose. This takes forever. This is what I have so far. Did you get this >>>> far? >>>> >>>> 07:48 hzhang/fix-submat_samerowdist *= /sandbox/adams/petsc/src/snes/examples/tutorials$ >>>> make PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val >>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind >>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>> ==12414== Memcheck, a memory error detector >>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>> al. >>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >>>> info >>>> ==12415== Memcheck, a memory error detector >>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>> al. 
>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >>>> info >>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>> ==12415== >>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>> ==12414== >>>> [0] 27 global equations, 9 vertices >>>> [0] 27 equations in vector, 9 vertices >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>> [0] 441 global equations, 147 vertices >>>> [0] 441 equations in vector, 147 vertices >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>> [0] 4725 global equations, 1575 vertices >>>> [0] 4725 equations in vector, 1575 vertices >>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Nov 10 10:03:33 2017 From: hzhang at mcs.anl.gov (Hong) Date: Fri, 10 Nov 2017 10:03:33 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: I use Using configure Options: --download-metis --download-mpich --download-mumps --download-parmetis --download-scalapack --download-superlu --download-superlu_dist --download-suitesparse --download-hypre --download-ptscotch --download-chaco --with-ctable=1 --download-cmake --with-cc=gcc --with-cxx=g++ --with-debugging=1 --with-visibility=0 --with-fc=gfortran Hong On Fri, Nov 10, 2017 at 9:59 AM, Mark Adams wrote: > This must be a configure issue. I don't see these warning: > > #!/usr/bin/python > if __name__ == '__main__': > import sys > import os > sys.path.insert(0, os.path.abspath('config')) > import configure > configure_options = [ > '--with-cc=clang', > '--with-cc++=clang++', > '--download-mpich=1', > '--download-metis=1', > '--download-superlu=1', > '--download-superlu_dist=1', > '--download-parmetis=1', > '--download-fblaslapack=1', > '--download-p4est=1', > '--with-debugging=1', > '--with-batch=0', > 'PETSC_ARCH=arch-linux2-c-dbg32', > '--with-openmp=0', > '--download-p4est=0' > ] > configure.petsc_configure(configure_options) > > ~ > > > > > > On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams wrote: > >> This is comming from blas. How did you configure blas? 
>> >> On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: >> >>> Using petsc machine, I get >>> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials >>> (hzhang/fix-submat_samerowdist) >>> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 >>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >>> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>> >>> ==28811== Memcheck, a memory error detector >>> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. >>> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >>> info >>> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order >>> 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 >>> -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type jacobi >>> -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>> ... 
>>> ==28811== Invalid read of size 16 >>> ==28811== at 0x8550946: dswap_k_NEHALEM (in >>> /usr/lib/openblas-base/libblas.so.3) >>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>> .so.3) >>> ==28811== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) >>> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>> ==28811== by 0x41056E: main (ex56.c:395) >>> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of size >>> 52,488 alloc'd >>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>> mcheck-amd64-linux.so) >>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >>> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>> ==28811== by 0x41056E: main (ex56.c:395) >>> ==28811== >>> ==28812== Invalid read of size 16 >>> ==28812== at 0x8550946: dswap_k_NEHALEM (in >>> /usr/lib/openblas-base/libblas.so.3) >>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>> .so.3) >>> ==28812== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3.0) >>> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>> ==28812== by 0x41056E: main (ex56.c:395) >>> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of size >>> 52,488 alloc'd >>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>> mcheck-amd64-linux.so) >>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >>> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>> ==28812== by 0x41056E: main (ex56.c:395) >>> ==28812== >>> ==28811== Invalid read of size 16 >>> ==28811== at 0x8550A55: dswap_k_NEHALEM (in >>> /usr/lib/openblas-base/libblas.so.3) >>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>> .so.3) >>> ==28811== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3.0) >>> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>> ==28811== by 0x41056E: main (ex56.c:395) >>> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 >>> alloc'd >>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>> mcheck-amd64-linux.so) >>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >>> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>> ==28811== by 0x41056E: main (ex56.c:395) >>> ==28811== >>> ==28812== Invalid read of size 16 >>> ==28812== at 0x8550A55: dswap_k_NEHALEM (in >>> /usr/lib/openblas-base/libblas.so.3) >>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>> .so.3) >>> ==28812== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3.0) >>> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>> ==28812== by 0x41056E: main (ex56.c:395) >>> ==28812== Address 0xdc11f30 is 64 bytes 
inside a block of size 72 >>> alloc'd >>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>> mcheck-amd64-linux.so) >>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >>> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>> ==28812== by 0x41056E: main (ex56.c:395) >>> ==28812== >>> [0] 27 global equations, 9 vertices >>> [0] 27 equations in vector, 9 vertices >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> [0] 441 global equations, 147 vertices >>> [0] 441 equations in vector, 147 vertices >>> >>> hangs here ... >>> >>> Hong >>> >>> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams wrote: >>> >>>> This printed a little funny in gmail, snes/ex56 is running clean in the >>>> first few loops (appended), but the last one is the one with a reduced >>>> processor set. Still waiting. This is with 32 bit integers. I'm running >>>> another with 64 bit integers. >>>> >>>> ... >>>> [0] 27 global equations, 9 vertices >>>> [0] 27 equations in vector, 9 vertices >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>> [0] 441 global equations, 147 vertices >>>> [0] 441 equations in vector, 147 vertices >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>> [0] 4725 global equations, 1575 vertices >>>> [0] 4725 equations in vector, 1575 vertices >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: >>>> >>>>> >>>>> >>>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: >>>>> >>>>>> Mark: >>>>>> >>>>>>> OK, well, just go with the Linux machine for the regression test. I >>>>>>> will keep trying to reproduce this on my Mac with an O build. >>>>>>> >>>>>> >>>>>> Valgrind error occurs on linux machines with g-build. I cannot merge >>>>>> this branch to maint until the bug is fixed. >>>>>> >>>>> >>>>> Valgrind is failing on this run on my Mac. Moving to cg, like you I >>>>> suppose. This takes forever. This is what I have so far. Did you get this >>>>> far? >>>>> >>>>> 07:48 hzhang/fix-submat_samerowdist *= /sandbox/adams/petsc/src/snes/examples/tutorials$ >>>>> make PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val >>>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind >>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>> ==12414== Memcheck, a memory error detector >>>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>>> al. >>>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>> copyright info >>>>> ==12415== Memcheck, a memory error detector >>>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>>> al. 
>>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>> copyright info >>>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>> ==12415== >>>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>> ==12414== >>>>> [0] 27 global equations, 9 vertices >>>>> [0] 27 equations in vector, 9 vertices >>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>> [0] 441 global equations, 147 vertices >>>>> [0] 441 equations in vector, 147 vertices >>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>> [0] 4725 global equations, 1575 vertices >>>>> [0] 4725 equations in vector, 1575 vertices >>>>> >>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Nov 10 13:07:02 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Nov 2017 14:07:02 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: I would add: '--download-fblaslapack=1', This is what I have on my Linux machine (cg at ANL) and it runs clean. On Fri, Nov 10, 2017 at 11:03 AM, Hong wrote: > I use > Using configure Options: --download-metis --download-mpich > --download-mumps --download-parmetis --download-scalapack > --download-superlu --download-superlu_dist --download-suitesparse > --download-hypre --download-ptscotch --download-chaco --with-ctable=1 > --download-cmake --with-cc=gcc --with-cxx=g++ --with-debugging=1 > --with-visibility=0 --with-fc=gfortran > Hong > > On Fri, Nov 10, 2017 at 9:59 AM, Mark Adams wrote: > >> This must be a configure issue. 
I don't see these warning: >> >> #!/usr/bin/python >> if __name__ == '__main__': >> import sys >> import os >> sys.path.insert(0, os.path.abspath('config')) >> import configure >> configure_options = [ >> '--with-cc=clang', >> '--with-cc++=clang++', >> '--download-mpich=1', >> '--download-metis=1', >> '--download-superlu=1', >> '--download-superlu_dist=1', >> '--download-parmetis=1', >> '--download-fblaslapack=1', >> '--download-p4est=1', >> '--with-debugging=1', >> '--with-batch=0', >> 'PETSC_ARCH=arch-linux2-c-dbg32', >> '--with-openmp=0', >> '--download-p4est=0' >> ] >> configure.petsc_configure(configure_options) >> >> ~ >> >> >> >> >> >> On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams wrote: >> >>> This is comming from blas. How did you configure blas? >>> >>> On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: >>> >>>> Using petsc machine, I get >>>> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials >>>> (hzhang/fix-submat_samerowdist) >>>> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 >>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>> >>>> ==28811== Memcheck, a memory error detector >>>> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>> al. >>>> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright >>>> info >>>> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>> ... 
>>>> ==28811== Invalid read of size 16 >>>> ==28811== at 0x8550946: dswap_k_NEHALEM (in >>>> /usr/lib/openblas-base/libblas.so.3) >>>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>> .so.3) >>>> ==28811== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3 >>>> .0) >>>> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>> ==28811== by 0x41056E: main (ex56.c:395) >>>> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of size >>>> 52,488 alloc'd >>>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>> mcheck-amd64-linux.so) >>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>> ==28811== by 0x41056E: main (ex56.c:395) >>>> ==28811== >>>> ==28812== Invalid read of size 16 >>>> ==28812== at 0x8550946: dswap_k_NEHALEM (in >>>> /usr/lib/openblas-base/libblas.so.3) >>>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>> .so.3) >>>> ==28812== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3 >>>> .0) >>>> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>> ==28812== by 0x41056E: main (ex56.c:395) >>>> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of size >>>> 52,488 alloc'd >>>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>> mcheck-amd64-linux.so) >>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>> ==28812== by 0x41056E: main (ex56.c:395) >>>> ==28812== >>>> ==28811== Invalid read of size 16 >>>> ==28811== at 0x8550A55: dswap_k_NEHALEM (in >>>> /usr/lib/openblas-base/libblas.so.3) >>>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>> .so.3) >>>> ==28811== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3 >>>> .0) >>>> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>> ==28811== by 0x41056E: main (ex56.c:395) >>>> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 >>>> alloc'd >>>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>> mcheck-amd64-linux.so) >>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>> ==28811== by 0x41056E: main (ex56.c:395) >>>> ==28811== >>>> ==28812== Invalid read of size 16 >>>> ==28812== at 0x8550A55: dswap_k_NEHALEM (in >>>> /usr/lib/openblas-base/libblas.so.3) >>>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>> .so.3) >>>> ==28812== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3 >>>> .0) >>>> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) 
>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>> ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 >>>> alloc'd >>>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>> mcheck-amd64-linux.so) >>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>> ==28812== by 0x41056E: main (ex56.c:395) >>>> ==28812== >>>> [0] 27 global equations, 9 vertices >>>> [0] 27 equations in vector, 9 vertices >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>> [0] 441 global equations, 147 vertices >>>> [0] 441 equations in vector, 147 vertices >>>> >>>> hangs here ... >>>> >>>> Hong >>>> >>>> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams wrote: >>>> >>>>> This printed a little funny in gmail, snes/ex56 is running clean in >>>>> the first few loops (appended), but the last one is the one with a reduced >>>>> processor set. Still waiting. This is with 32 bit integers. I'm running >>>>> another with 64 bit integers. >>>>> >>>>> ... >>>>> [0] 27 global equations, 9 vertices >>>>> [0] 27 equations in vector, 9 vertices >>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>> [0] 441 global equations, 147 vertices >>>>> [0] 441 equations in vector, 147 vertices >>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>> [0] 4725 global equations, 1575 vertices >>>>> [0] 4725 equations in vector, 1575 vertices >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: >>>>> >>>>>> >>>>>> >>>>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: >>>>>> >>>>>>> Mark: >>>>>>> >>>>>>>> OK, well, just go with the Linux machine for the regression test. I >>>>>>>> will keep trying to reproduce this on my Mac with an O build. >>>>>>>> >>>>>>> >>>>>>> Valgrind error occurs on linux machines with g-build. I cannot merge >>>>>>> this branch to maint until the bug is fixed. >>>>>>> >>>>>> >>>>>> Valgrind is failing on this run on my Mac. Moving to cg, like you I >>>>>> suppose. This takes forever. This is what I have so far. Did you get this >>>>>> far? >>>>>> >>>>>> 07:48 hzhang/fix-submat_samerowdist *= /sandbox/adams/petsc/src/snes/examples/tutorials$ >>>>>> make PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val >>>>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind >>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>> ==12414== Memcheck, a memory error detector >>>>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>>>> al. 
>>>>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>>> copyright info >>>>>> ==12415== Memcheck, a memory error detector >>>>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>>>> al. >>>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>>> copyright info >>>>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>> ==12415== >>>>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>> ==12414== >>>>>> [0] 27 global equations, 9 vertices >>>>>> [0] 27 equations in vector, 9 vertices >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>>> [0] 441 global equations, 147 vertices >>>>>> [0] 441 equations in vector, 147 vertices >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>>> [0] 4725 global equations, 1575 vertices >>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Nov 10 16:17:12 2017 From: hzhang at mcs.anl.gov (Hong) Date: Fri, 10 Nov 2017 16:17:12 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Mark: > I would add: > > '--download-fblaslapack=1', > > This is what I have on my Linux machine (cg at ANL) and it runs clean. > I do not see any error report from nightly tests for ex56. Should I merge this branch to maint? Hong > > On Fri, Nov 10, 2017 at 11:03 AM, Hong wrote: > >> I use >> Using configure Options: --download-metis --download-mpich >> --download-mumps --download-parmetis --download-scalapack >> --download-superlu --download-superlu_dist --download-suitesparse >> --download-hypre --download-ptscotch --download-chaco --with-ctable=1 >> --download-cmake --with-cc=gcc --with-cxx=g++ --with-debugging=1 >> --with-visibility=0 --with-fc=gfortran >> Hong >> >> On Fri, Nov 10, 2017 at 9:59 AM, Mark Adams wrote: >> >>> This must be a configure issue. 
I don't see these warning: >>> >>> #!/usr/bin/python >>> if __name__ == '__main__': >>> import sys >>> import os >>> sys.path.insert(0, os.path.abspath('config')) >>> import configure >>> configure_options = [ >>> '--with-cc=clang', >>> '--with-cc++=clang++', >>> '--download-mpich=1', >>> '--download-metis=1', >>> '--download-superlu=1', >>> '--download-superlu_dist=1', >>> '--download-parmetis=1', >>> '--download-fblaslapack=1', >>> '--download-p4est=1', >>> '--with-debugging=1', >>> '--with-batch=0', >>> 'PETSC_ARCH=arch-linux2-c-dbg32', >>> '--with-openmp=0', >>> '--download-p4est=0' >>> ] >>> configure.petsc_configure(configure_options) >>> >>> ~ >>> >>> >>> >>> >>> >>> On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams wrote: >>> >>>> This is comming from blas. How did you configure blas? >>>> >>>> On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: >>>> >>>>> Using petsc machine, I get >>>>> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials >>>>> (hzhang/fix-submat_samerowdist) >>>>> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>> >>>>> ==28811== Memcheck, a memory error detector >>>>> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>>> al. >>>>> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>> copyright info >>>>> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>> ... 
>>>>> ==28811== Invalid read of size 16 >>>>> ==28811== at 0x8550946: dswap_k_NEHALEM (in >>>>> /usr/lib/openblas-base/libblas.so.3) >>>>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>>> .so.3) >>>>> ==28811== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3 >>>>> .0) >>>>> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>>> ==28811== by 0x41056E: main (ex56.c:395) >>>>> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of size >>>>> 52,488 alloc'd >>>>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>>> mcheck-amd64-linux.so) >>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>>> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>>> ==28811== by 0x41056E: main (ex56.c:395) >>>>> ==28811== >>>>> ==28812== Invalid read of size 16 >>>>> ==28812== at 0x8550946: dswap_k_NEHALEM (in >>>>> /usr/lib/openblas-base/libblas.so.3) >>>>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>>> .so.3) >>>>> ==28812== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3 >>>>> .0) >>>>> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>>> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of size >>>>> 52,488 alloc'd >>>>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>>> mcheck-amd64-linux.so) >>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>>> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>>> ==28812== >>>>> ==28811== Invalid read of size 16 >>>>> ==28811== at 0x8550A55: dswap_k_NEHALEM (in >>>>> /usr/lib/openblas-base/libblas.so.3) >>>>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>>> .so.3) >>>>> ==28811== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3 >>>>> .0) >>>>> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>>> ==28811== by 0x41056E: main (ex56.c:395) >>>>> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 >>>>> alloc'd >>>>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>>> mcheck-amd64-linux.so) >>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>>> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>>> ==28811== by 0x41056E: main (ex56.c:395) >>>>> ==28811== >>>>> ==28812== Invalid read of size 16 >>>>> ==28812== at 0x8550A55: dswap_k_NEHALEM (in >>>>> /usr/lib/openblas-base/libblas.so.3) >>>>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>>> .so.3) >>>>> ==28812== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3 >>>>> .0) >>>>> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature 
(dt.c:582) >>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>>> ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 >>>>> alloc'd >>>>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>>> mcheck-amd64-linux.so) >>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>>> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>>> ==28812== >>>>> [0] 27 global equations, 9 vertices >>>>> [0] 27 equations in vector, 9 vertices >>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>> [0] 441 global equations, 147 vertices >>>>> [0] 441 equations in vector, 147 vertices >>>>> >>>>> hangs here ... >>>>> >>>>> Hong >>>>> >>>>> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams wrote: >>>>> >>>>>> This printed a little funny in gmail, snes/ex56 is running clean in >>>>>> the first few loops (appended), but the last one is the one with a reduced >>>>>> processor set. Still waiting. This is with 32 bit integers. I'm running >>>>>> another with 64 bit integers. >>>>>> >>>>>> ... >>>>>> [0] 27 global equations, 9 vertices >>>>>> [0] 27 equations in vector, 9 vertices >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>>> [0] 441 global equations, 147 vertices >>>>>> [0] 441 equations in vector, 147 vertices >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>>> [0] 4725 global equations, 1575 vertices >>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: >>>>>>> >>>>>>>> Mark: >>>>>>>> >>>>>>>>> OK, well, just go with the Linux machine for the regression test. >>>>>>>>> I will keep trying to reproduce this on my Mac with an O build. >>>>>>>>> >>>>>>>> >>>>>>>> Valgrind error occurs on linux machines with g-build. I cannot >>>>>>>> merge this branch to maint until the bug is fixed. >>>>>>>> >>>>>>> >>>>>>> Valgrind is failing on this run on my Mac. Moving to cg, like you I >>>>>>> suppose. This takes forever. This is what I have so far. Did you get this >>>>>>> far? 
>>>>>>> >>>>>>> 07:48 hzhang/fix-submat_samerowdist *= /sandbox/adams/petsc/src/snes/examples/tutorials$ >>>>>>> make PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val >>>>>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind >>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>>> ==12414== Memcheck, a memory error detector >>>>>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward >>>>>>> et al. >>>>>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>>>> copyright info >>>>>>> ==12415== Memcheck, a memory error detector >>>>>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward >>>>>>> et al. >>>>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>>>> copyright info >>>>>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>>> ==12415== >>>>>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>>> ==12414== >>>>>>> [0] 27 global equations, 9 vertices >>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations >>>>>>> 1 >>>>>>> [0] 441 global equations, 147 vertices >>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations >>>>>>> 1 >>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> 
>>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Fri Nov 10 16:18:10 2017 From: zakaryah at gmail.com (zakaryah .) Date: Fri, 10 Nov 2017 17:18:10 -0500 Subject: [petsc-users] Newton LS - better results on single processor In-Reply-To: <6A14B327-7D3E-4A0E-81D1-0595DEC2B099@mcs.anl.gov> References: <6A14B327-7D3E-4A0E-81D1-0595DEC2B099@mcs.anl.gov> Message-ID: Thanks for the advice. I put a VecView() on the solution vector, both before and after all the solves, and in the SNES Function and Jacobian, as well as a MatView() in the Jacobian, and a VecView() on the residual after the solves. Then I run a tiny problem with -pc_type redundant -redundant_pc_type lu, as Stefano suggested, and I compare the vectors and matrices with -n 1 to -n 2. Although the monitor shows somewhat different residuals for the KSP and for the SNES, the differences are very small. For example, after the first SNESSolve(), the SNES residual is 6.5e-18 with -n 1 and 6.0e-18 with -n 2, and of course I don't care about that tiny difference unless it indicates some mistake in my code. The VecView() and MatView() show that the state vector, the function vector, and the function Jacobian are identical (up to the default output precision of the view routines). The residuals are of course slightly different. For this problem it took 8 Riks iterations for the loading coefficient to reach 1 (i.e. to finish the iterations). For the last solve, the residuals and their differences were larger: 8.4e-15 with -n 1 and 8.7e-15 with -n 2. I think this supports my hypothesis that the iterations which feed one SNESSolve() solution into the initial guess for the next solve can amplify small differences. To check a bit deeper, I removed all of the view calls except to the SNES residual, and ran on a more realistic problem size, with the SNES defaults (and KSP defaults, PC defaults, etc). I will call the residual after the first SNESSolve Ri, and the residual after the last SNESSolve Rf. With -n 1, Ri and Rf are both spatially smooth (as I expect). I think the standard deviation (over space) of the residual is a useful way to quantify its amplitude; for -n 1, sd(Ri) = 5.1e-13 and sd(Rf) = 6.8e-12. With -n 2, both Ri and Rf have a discontinuity in x, at x=0, x=20, x=21, and x=41 (i.e. at the global boundary and at the boundary between the two processes). For -n 2, sd(Ri) = 1.2e-12 and sd(Rf) = 5.7e-12, with all of the additional fluctuations coming from those boundary coordinates. In other words, the fluctuations of the residual are an order of magnitude larger at the boundary between processes. If I consider the residuals as a function of the other dimensions (y or z instead of x), the entire range of which is owned by each processor, I don't see any discontinuity. I suppose that all of this has something to do with the spectrums of the matrices involved in the solve but I don't know enough to improve the results I'm obtaining. On Thu, Nov 9, 2017 at 11:09 PM, Smith, Barry F. wrote: > > > > On Nov 9, 2017, at 3:33 PM, zakaryah . wrote: > > > > Hi Stefano - when I referred to the iterations, I was trying to point > out that my method solves a series of nonlinear systems, with the solution > to the first problem being used to initialize the state vector for the > second problem, etc. 
The reason I mentioned that was I thought perhaps I > can expect the residuals from single process solve to differ from the > residuals from multiprocess solve by a very small amount, say machine > precision or the tolerance of the KSP/SNES, that would be fine normally. > But, if there is a possibility that those differences are somehow amplified > by each of the iterations (solution->initial state), that could explain > what I see. > > > > I agree that it is more likely that I have a bug in my code but I'm > having trouble finding it. > > Run a tiny problem on one and two processes with LU linear solver and > the same mesh. So in the first case all values live on the first process > and in the second the same first half live on one process and the second > half on the second process. > > Now track the values in the actual vectors and matrices. For example you > can just put in VecView() and MatView() on all objects you pass into the > solver and then put them in the SNESComputeFunction/Jacobian routines. > Print both the vectors inputed to these routines and the vectors/matrices > created in the routines. The output differences from the two runs should be > small, determine when they significantly vary. This will tell you the > likely location of the bug in your source code. (For example if certain > values of the Jacobian differ) > > Good luck, I've done this plenty of times and if it is a > "parallelization" bug this will help you find it much faster than guessing > where the problem is and trying code inspect to find the bug. > > Barry > > > > > I ran a small problem with -pc_type redundant -redundant_pc_type lu, as > you suggested. What I think is the relevant portion of the output is here > (i.e. there are small differences in the KSP residuals and SNES residuals): > > > > -n 1, first "iteration" as described above: > > > > 0 SNES Function norm 6.053565720454e-02 > > 0 KSP Residual norm 4.883115701982e-05 > > > > 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm > 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 8.173640409069e-20 > > > > 1 KSP preconditioned resid norm 8.173640409069e-20 true resid norm > 1.742143029296e-16 ||r(i)||/||b|| 2.877879104227e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 6.053565720454e-02 gnorm > 2.735518570862e-07 > > > > 1 SNES Function norm 2.735518570862e-07 > > > > 0 KSP Residual norm 1.298536630766e-10 > > > > 0 KSP preconditioned resid norm 1.298536630766e-10 true resid norm > 2.735518570862e-07 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 2.152782096751e-25 > > > > 1 KSP preconditioned resid norm 2.152782096751e-25 true resid norm > 4.755555202641e-22 ||r(i)||/||b|| 1.738447420279e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 2.735518570862e-07 gnorm > 1.917989238989e-17 > > > > 2 SNES Function norm 1.917989238989e-17 > > > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 > > > > > > > > -n 2, first "iteration" as described above: > > > > 0 SNES Function norm 6.053565720454e-02 > > > > 0 KSP Residual norm 4.883115701982e-05 > > > > 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm > 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.007084240718e-19 > > > > 1 KSP preconditioned resid norm 1.007084240718e-19 true resid norm > 1.868472589717e-16 ||r(i)||/||b|| 3.086565300520e-15 
> > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 6.053565720454e-02 gnorm > 2.735518570379e-07 > > > > 1 SNES Function norm 2.735518570379e-07 > > > > 0 KSP Residual norm 1.298536630342e-10 > > > > 0 KSP preconditioned resid norm 1.298536630342e-10 true resid norm > 2.735518570379e-07 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.885083482938e-25 > > > > 1 KSP preconditioned resid norm 1.885083482938e-25 true resid norm > 4.735707460766e-22 ||r(i)||/||b|| 1.731191852267e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 2.735518570379e-07 gnorm > 1.851472273258e-17 > > > > > > 2 SNES Function norm 1.851472273258e-17 > > > > > > -n 1, final "iteration": > > 0 SNES Function norm 9.695669610792e+01 > > > > 0 KSP Residual norm 7.898912593878e-03 > > > > 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm > 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.720960785852e-17 > > > > 1 KSP preconditioned resid norm 1.720960785852e-17 true resid norm > 1.237111121391e-13 ||r(i)||/||b|| 1.275941911237e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 9.695669610792e+01 gnorm > 1.026572731653e-01 > > > > 1 SNES Function norm 1.026572731653e-01 > > > > 0 KSP Residual norm 1.382450412926e-04 > > > > 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm > 1.026572731653e-01 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 5.018078565710e-20 > > > > 1 KSP preconditioned resid norm 5.018078565710e-20 true resid norm > 9.031463071676e-17 ||r(i)||/||b|| 8.797684560673e-16 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 1.026572731653e-01 gnorm > 7.982937980399e-06 > > > > 2 SNES Function norm 7.982937980399e-06 > > > > 0 KSP Residual norm 4.223898196692e-08 > > > > 0 KSP preconditioned resid norm 4.223898196692e-08 true resid norm > 7.982937980399e-06 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.038123933240e-22 > > > > 1 KSP preconditioned resid norm 1.038123933240e-22 true resid norm > 3.213931469966e-20 ||r(i)||/||b|| 4.026000800530e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 7.982937980399e-06 gnorm > 9.776066323463e-13 > > > > 3 SNES Function norm 9.776066323463e-13 > > > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 > > > > -n 2, final "iteration": > > > > 0 SNES Function norm 9.695669610792e+01 > > > > 0 KSP Residual norm 7.898912593878e-03 > > > > 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm > 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.752819851736e-17 > > > > 1 KSP preconditioned resid norm 1.752819851736e-17 true resid norm > 1.017605437996e-13 ||r(i)||/||b|| 1.049546322064e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 9.695669610792e+01 gnorm > 1.026572731655e-01 > > > > 1 SNES Function norm 1.026572731655e-01 > > > > 0 KSP Residual norm 1.382450412926e-04 > > > > 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm > 1.026572731655e-01 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.701690118486e-19 > > > > 1 KSP preconditioned resid norm 1.701690118486e-19 true resid norm > 
9.077679331860e-17 ||r(i)||/||b|| 8.842704517606e-16 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 1.026572731655e-01 gnorm > 7.982937883350e-06 > > > > 2 SNES Function norm 7.982937883350e-06 > > > > 0 KSP Residual norm 4.223898196594e-08 > > > > 0 KSP preconditioned resid norm 4.223898196594e-08 true resid norm > 7.982937883350e-06 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.471638984554e-23 > > > > 1 KSP preconditioned resid norm 1.471638984554e-23 true resid norm > 2.483672977401e-20 ||r(i)||/||b|| 3.111226735938e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 7.982937883350e-06 gnorm > 1.019121417798e-12 > > > > 3 SNES Function norm 1.019121417798e-12 > > > > > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 > > > > > > > > Of course these differences are still very small, but this is only true > for such a small problem size. For a regular sized problem, the > differences at the final iteration can exceed 1 and even 100 at a > particular grid point (i.e. in a sense that doesn't scale with problem > size). > > > > I also compared -n 1 and -n 2 with the -snes_monitor_solution > -ksp_view_rhs -ksp_view_mat -ksp_view_solution options on a tiny problem > (5x5x5), and I was not able to find any differences in the Jacobian or the > vectors, but I'm suspicious that this could be due to the output format, > because even for the tiny problem there are non-trivial differences in the > residuals of both the SNES and the KSP. > > > > In all cases, the differences in the residuals are localized to the > boundary between parts of the displacement vector owned by the two > processes. The SNES residual with -n 2 typically looks discontinuous > across that boundary. > > > > > > On Thu, Nov 9, 2017 at 11:16 AM, zakaryah . wrote: > > Thanks Stefano, I will try what you suggest. > > > > ?Matt - my DM is a composite between the redundant field (loading > coefficient, which is included in the Newton solve in Riks' method) and the > displacements, which are represented by a 3D DA with 3 dof. I am using > finite difference. > > > > Probably my problem comes from confusion over how the composite DM is > organized. I am using FormFunction()?, and within that I call > DMCompositeGetLocalVectors(), DMCompositeScatter(), DMDAVecGetArray(), and > for the Jacobian, DMCompositeGetLocalISs() and MatGetLocalSubmatrix() to > split J into Jbb, Jbh, Jhb, and Jhh, where b is the loading coefficient, > and h is the displacements). The values of each submatrix are set using > MatSetValuesLocal(). > > > > ?I'm most suspicious of the part of the Jacobian routine where I > calculate the rows of Jhb, the columns of Jbh, and the corresponding > values. I take the DA coordinates and ix,iy,iz, then calculate the row of > Jhb as ((((iz-info->gzs)*info->gym + (iy-info->gys))*info->gxm + > (ix-info->gxs))*info->dof+c), where info is the DA local info and c is the > degree of freedom. The same calculation is performed for the column of > Jbh. I suspect that the indexing of the DA vector is not so simple, but I > don't know for a fact that I'm doing this incorrectly nor how to do this > properly. > > > > ?Thanks for all the help!? > > > > > > On Nov 9, 2017 8:44 AM, "Matthew Knepley" wrote: > > On Thu, Nov 9, 2017 at 12:14 AM, zakaryah . wrote: > > Well the saga of my problem continues. 
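
[A rough restatement of the index formula quoted a few paragraphs up, written as a standalone helper so it can be checked in isolation. The helper name GhostedLocalIndex and everything around it are illustrative only, not taken from the poster's code; whether this is really the index the composite's local-to-global mapping expects is exactly the question being asked, so treat this as a sketch of the formula, not a confirmation that it is correct.]

    #include <petscdmda.h>

    /* Ghosted local index for a structured 3D DMDA with interlaced dofs:
       the dof index c runs fastest, then x, then y, then z.  Offsets are
       relative to the ghosted corner (gxs, gys, gzs) and the strides use
       the ghosted widths gxm, gym, as in the formula quoted above. */
    static PetscInt GhostedLocalIndex(const DMDALocalInfo *info,
                                      PetscInt ix, PetscInt iy, PetscInt iz,
                                      PetscInt c)
    {
      return ((((iz - info->gzs) * info->gym
              + (iy - info->gys)) * info->gxm
              + (ix - info->gxs)) * info->dof + c);
    }

[And for the VecView()/MatView() comparison suggested earlier in the thread, a minimal sketch of what that instrumentation might look like; ComputeResidual is a placeholder for whatever the existing residual evaluation does, nothing here is taken from the poster's code.]

    #include <petscsnes.h>

    extern PetscErrorCode ComputeResidual(Vec X, Vec F, void *ctx);  /* placeholder for the existing code */

    /* Wrap the residual evaluation so the state passed in and the residual
       handed back can be diffed between the -n 1 and -n 2 runs. */
    PetscErrorCode FormFunctionInstrumented(SNES snes, Vec X, Vec F, void *ctx)
    {
      PetscErrorCode ierr;
      PetscFunctionBeginUser;
      ierr = VecView(X, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* state handed to the residual routine */
      ierr = ComputeResidual(X, F, ctx);CHKERRQ(ierr);             /* existing residual evaluation */
      ierr = VecView(F, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* residual handed back to SNES */
      PetscFunctionReturn(0);
    }
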
As I described previously in an > epic thread, I'm using the SNES to solve problems involving an elastic > material on a rectangular grid, subjected to external forces. In any case, > I'm occasionally getting poor convergence using Newton's method with line > search. In troubleshooting by visualizing the residual, I saw that in data > sets which had good convergence, the residual was nevertheless > significantly larger along the boundary between different processors. > Likewise, in data sets with poor convergence, the residual became very > large on the boundary between different processors. The residual is not > significantly larger on the physical boundary, i.e. the global boundary. > When I run on a single process, convergence seems to be good on all data > sets. > > > > Any clues to fix this? > > > > It sounds like something is wrong with communication across domains: > > > > - If this is FEM, it sounds like you are not adding contributions from > the other domain to shared vertices/edges/faces > > > > - If this is FDM/FVM, maybe the ghosts are not updated > > > > What DM are you using? Are you using the Local assembly functions > (FormFunctionLocal), or just FormFunction()? > > > > Thanks, > > > > Matt > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Nov 10 16:25:22 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Nov 2017 17:25:22 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: I don't see a problem with this. As far as what branch to merge it into, that is a Barry/Satish/etc decision. On Fri, Nov 10, 2017 at 5:17 PM, Hong wrote: > Mark: > >> I would add: >> >> '--download-fblaslapack=1', >> >> This is what I have on my Linux machine (cg at ANL) and it runs clean. >> > I do not see any error report from nightly tests for ex56. Should I merge > this branch to maint? > Hong > >> >> On Fri, Nov 10, 2017 at 11:03 AM, Hong wrote: >> >>> I use >>> Using configure Options: --download-metis --download-mpich >>> --download-mumps --download-parmetis --download-scalapack >>> --download-superlu --download-superlu_dist --download-suitesparse >>> --download-hypre --download-ptscotch --download-chaco --with-ctable=1 >>> --download-cmake --with-cc=gcc --with-cxx=g++ --with-debugging=1 >>> --with-visibility=0 --with-fc=gfortran >>> Hong >>> >>> On Fri, Nov 10, 2017 at 9:59 AM, Mark Adams wrote: >>> >>>> This must be a configure issue. I don't see these warning: >>>> >>>> #!/usr/bin/python >>>> if __name__ == '__main__': >>>> import sys >>>> import os >>>> sys.path.insert(0, os.path.abspath('config')) >>>> import configure >>>> configure_options = [ >>>> '--with-cc=clang', >>>> '--with-cc++=clang++', >>>> '--download-mpich=1', >>>> '--download-metis=1', >>>> '--download-superlu=1', >>>> '--download-superlu_dist=1', >>>> '--download-parmetis=1', >>>> '--download-fblaslapack=1', >>>> '--download-p4est=1', >>>> '--with-debugging=1', >>>> '--with-batch=0', >>>> 'PETSC_ARCH=arch-linux2-c-dbg32', >>>> '--with-openmp=0', >>>> '--download-p4est=0' >>>> ] >>>> configure.petsc_configure(configure_options) >>>> >>>> ~ >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams wrote: >>>> >>>>> This is comming from blas. 
How did you configure blas? >>>>> >>>>> On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: >>>>> >>>>>> Using petsc machine, I get >>>>>> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials >>>>>> (hzhang/fix-submat_samerowdist) >>>>>> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>> >>>>>> ==28811== Memcheck, a memory error detector >>>>>> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et >>>>>> al. >>>>>> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>>> copyright info >>>>>> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>> ... 
>>>>>> ==28811== Invalid read of size 16 >>>>>> ==28811== at 0x8550946: dswap_k_NEHALEM (in >>>>>> /usr/lib/openblas-base/libblas.so.3) >>>>>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>>>> .so.3) >>>>>> ==28811== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3 >>>>>> .0) >>>>>> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>>>> ==28811== by 0x41056E: main (ex56.c:395) >>>>>> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of size >>>>>> 52,488 alloc'd >>>>>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>>>> mcheck-amd64-linux.so) >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>>>> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>>>> ==28811== by 0x41056E: main (ex56.c:395) >>>>>> ==28811== >>>>>> ==28812== Invalid read of size 16 >>>>>> ==28812== at 0x8550946: dswap_k_NEHALEM (in >>>>>> /usr/lib/openblas-base/libblas.so.3) >>>>>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>>>> .so.3) >>>>>> ==28812== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3 >>>>>> .0) >>>>>> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>>>> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of size >>>>>> 52,488 alloc'd >>>>>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>>>> mcheck-amd64-linux.so) >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>>>> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >>>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>>>> ==28812== >>>>>> ==28811== Invalid read of size 16 >>>>>> ==28811== at 0x8550A55: dswap_k_NEHALEM (in >>>>>> /usr/lib/openblas-base/libblas.so.3) >>>>>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>>>> .so.3) >>>>>> ==28811== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3 >>>>>> .0) >>>>>> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>>>> ==28811== by 0x41056E: main (ex56.c:395) >>>>>> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 >>>>>> alloc'd >>>>>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>>>> mcheck-amd64-linux.so) >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>>>> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>>>> ==28811== by 0x41056E: main (ex56.c:395) >>>>>> ==28811== >>>>>> ==28812== Invalid read of size 16 >>>>>> ==28812== at 0x8550A55: dswap_k_NEHALEM (in >>>>>> /usr/lib/openblas-base/libblas.so.3) >>>>>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas >>>>>> .so.3) >>>>>> ==28812== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3 >>>>>> .0) >>>>>> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature 
(dt.c:508) >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>>>> ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 >>>>>> alloc'd >>>>>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me >>>>>> mcheck-amd64-linux.so) >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >>>>>> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >>>>>> ==28812== by 0x41056E: main (ex56.c:395) >>>>>> ==28812== >>>>>> [0] 27 global equations, 9 vertices >>>>>> [0] 27 equations in vector, 9 vertices >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>>>> [0] 441 global equations, 147 vertices >>>>>> [0] 441 equations in vector, 147 vertices >>>>>> >>>>>> hangs here ... >>>>>> >>>>>> Hong >>>>>> >>>>>> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams wrote: >>>>>> >>>>>>> This printed a little funny in gmail, snes/ex56 is running clean in >>>>>>> the first few loops (appended), but the last one is the one with a reduced >>>>>>> processor set. Still waiting. This is with 32 bit integers. I'm running >>>>>>> another with 64 bit integers. >>>>>>> >>>>>>> ... >>>>>>> [0] 27 global equations, 9 vertices >>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations >>>>>>> 1 >>>>>>> [0] 441 global equations, 147 vertices >>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations >>>>>>> 1 >>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>> [0] 4725 equations in vector, 1575 vertices >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: >>>>>>>> >>>>>>>>> Mark: >>>>>>>>> >>>>>>>>>> OK, well, just go with the Linux machine for the regression test. >>>>>>>>>> I will keep trying to reproduce this on my Mac with an O build. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Valgrind error occurs on linux machines with g-build. I cannot >>>>>>>>> merge this branch to maint until the bug is fixed. >>>>>>>>> >>>>>>>> >>>>>>>> Valgrind is failing on this run on my Mac. Moving to cg, like you I >>>>>>>> suppose. This takes forever. This is what I have so far. Did you get this >>>>>>>> far? 
>>>>>>>> >>>>>>>> 07:48 hzhang/fix-submat_samerowdist *= >>>>>>>> /sandbox/adams/petsc/src/snes/examples/tutorials$ make >>>>>>>> PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val >>>>>>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind >>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 >>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg >>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>>>> ==12414== Memcheck, a memory error detector >>>>>>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward >>>>>>>> et al. >>>>>>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>>>>> copyright info >>>>>>>> ==12415== Memcheck, a memory error detector >>>>>>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward >>>>>>>> et al. >>>>>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >>>>>>>> copyright info >>>>>>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>>>> ==12415== >>>>>>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 >>>>>>>> ==12414== >>>>>>>> [0] 27 global equations, 9 vertices >>>>>>>> [0] 27 equations in vector, 9 vertices >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>> iterations 1 >>>>>>>> [0] 441 global equations, 147 vertices >>>>>>>> [0] 441 equations in vector, 147 vertices >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >>>>>>>> iterations 1 >>>>>>>> [0] 4725 global equations, 1575 vertices >>>>>>>> [0] 4725 
equations in vector, 1575 vertices >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Nov 10 17:04:55 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 10 Nov 2017 17:04:55 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: I don't see any changes hzhang/fix-submat_samerowdist in past few days. So how did this error get fixed? please do not merge yet. Satish On Fri, 10 Nov 2017, Mark Adams wrote: > I don't see a problem with this. As far as what branch to merge it into, > that is a Barry/Satish/etc decision. > > On Fri, Nov 10, 2017 at 5:17 PM, Hong wrote: > > > Mark: > > > >> I would add: > >> > >> '--download-fblaslapack=1', > >> > >> This is what I have on my Linux machine (cg at ANL) and it runs clean. > >> > > I do not see any error report from nightly tests for ex56. Should I merge > > this branch to maint? > > Hong > > > >> > >> On Fri, Nov 10, 2017 at 11:03 AM, Hong wrote: > >> > >>> I use > >>> Using configure Options: --download-metis --download-mpich > >>> --download-mumps --download-parmetis --download-scalapack > >>> --download-superlu --download-superlu_dist --download-suitesparse > >>> --download-hypre --download-ptscotch --download-chaco --with-ctable=1 > >>> --download-cmake --with-cc=gcc --with-cxx=g++ --with-debugging=1 > >>> --with-visibility=0 --with-fc=gfortran > >>> Hong > >>> > >>> On Fri, Nov 10, 2017 at 9:59 AM, Mark Adams wrote: > >>> > >>>> This must be a configure issue. I don't see these warning: > >>>> > >>>> #!/usr/bin/python > >>>> if __name__ == '__main__': > >>>> import sys > >>>> import os > >>>> sys.path.insert(0, os.path.abspath('config')) > >>>> import configure > >>>> configure_options = [ > >>>> '--with-cc=clang', > >>>> '--with-cc++=clang++', > >>>> '--download-mpich=1', > >>>> '--download-metis=1', > >>>> '--download-superlu=1', > >>>> '--download-superlu_dist=1', > >>>> '--download-parmetis=1', > >>>> '--download-fblaslapack=1', > >>>> '--download-p4est=1', > >>>> '--with-debugging=1', > >>>> '--with-batch=0', > >>>> 'PETSC_ARCH=arch-linux2-c-dbg32', > >>>> '--with-openmp=0', > >>>> '--download-p4est=0' > >>>> ] > >>>> configure.petsc_configure(configure_options) > >>>> > >>>> ~ > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams wrote: > >>>> > >>>>> This is comming from blas. How did you configure blas? 
> >>>>> > >>>>> On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: > >>>>> > >>>>>> Using petsc machine, I get > >>>>>> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials > >>>>>> (hzhang/fix-submat_samerowdist) > >>>>>> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 > >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol > >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg > >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > >>>>>> > >>>>>> ==28811== Memcheck, a memory error detector > >>>>>> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et > >>>>>> al. > >>>>>> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > >>>>>> copyright info > >>>>>> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol > >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg > >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > >>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > >>>>>> ... 
> >>>>>> ==28811== Invalid read of size 16 > >>>>>> ==28811== at 0x8550946: dswap_k_NEHALEM (in > >>>>>> /usr/lib/openblas-base/libblas.so.3) > >>>>>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas > >>>>>> .so.3) > >>>>>> ==28811== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3 > >>>>>> .0) > >>>>>> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > >>>>>> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of size > >>>>>> 52,488 alloc'd > >>>>>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me > >>>>>> mcheck-amd64-linux.so) > >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) > >>>>>> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > >>>>>> ==28811== > >>>>>> ==28812== Invalid read of size 16 > >>>>>> ==28812== at 0x8550946: dswap_k_NEHALEM (in > >>>>>> /usr/lib/openblas-base/libblas.so.3) > >>>>>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas > >>>>>> .so.3) > >>>>>> ==28812== by 0x75B33B2: dgetri_ (in /usr/lib/lapack/liblapack.so.3 > >>>>>> .0) > >>>>>> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > >>>>>> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of size > >>>>>> 52,488 alloc'd > >>>>>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me > >>>>>> mcheck-amd64-linux.so) > >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) > >>>>>> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > >>>>>> ==28812== > >>>>>> ==28811== Invalid read of size 16 > >>>>>> ==28811== at 0x8550A55: dswap_k_NEHALEM (in > >>>>>> /usr/lib/openblas-base/libblas.so.3) > >>>>>> ==28811== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas > >>>>>> .so.3) > >>>>>> ==28811== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3 > >>>>>> .0) > >>>>>> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) > >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > >>>>>> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 > >>>>>> alloc'd > >>>>>> ==28811== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me > >>>>>> mcheck-amd64-linux.so) > >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) > >>>>>> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) > >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > >>>>>> ==28811== > >>>>>> ==28812== Invalid read of size 16 > >>>>>> ==28812== at 0x8550A55: dswap_k_NEHALEM (in > >>>>>> /usr/lib/openblas-base/libblas.so.3) > >>>>>> ==28812== by 0x7C6797F: dswap_ (in /usr/lib/openblas-base/libblas > >>>>>> .so.3) > 
>>>>>> ==28812== by 0x7675179: dsteqr_ (in /usr/lib/lapack/liblapack.so.3 > >>>>>> .0) > >>>>>> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) > >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > >>>>>> ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 > >>>>>> alloc'd > >>>>>> ==28812== at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_me > >>>>>> mcheck-amd64-linux.so) > >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) > >>>>>> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) > >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > >>>>>> ==28812== > >>>>>> [0] 27 global equations, 9 vertices > >>>>>> [0] 27 equations in vector, 9 vertices > >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > >>>>>> [0] 441 global equations, 147 vertices > >>>>>> [0] 441 equations in vector, 147 vertices > >>>>>> > >>>>>> hangs here ... > >>>>>> > >>>>>> Hong > >>>>>> > >>>>>> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams wrote: > >>>>>> > >>>>>>> This printed a little funny in gmail, snes/ex56 is running clean in > >>>>>>> the first few loops (appended), but the last one is the one with a reduced > >>>>>>> processor set. Still waiting. This is with 32 bit integers. I'm running > >>>>>>> another with 64 bit integers. > >>>>>>> > >>>>>>> ... > >>>>>>> [0] 27 global equations, 9 vertices > >>>>>>> [0] 27 equations in vector, 9 vertices > >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations > >>>>>>> 1 > >>>>>>> [0] 441 global equations, 147 vertices > >>>>>>> [0] 441 equations in vector, 147 vertices > >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations > >>>>>>> 1 > >>>>>>> [0] 4725 global equations, 1575 vertices > >>>>>>> [0] 4725 equations in vector, 1575 vertices > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams wrote: > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong wrote: > >>>>>>>> > >>>>>>>>> Mark: > >>>>>>>>> > >>>>>>>>>> OK, well, just go with the Linux machine for the regression test. > >>>>>>>>>> I will keep trying to reproduce this on my Mac with an O build. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> Valgrind error occurs on linux machines with g-build. I cannot > >>>>>>>>> merge this branch to maint until the bug is fixed. > >>>>>>>>> > >>>>>>>> > >>>>>>>> Valgrind is failing on this run on my Mac. Moving to cg, like you I > >>>>>>>> suppose. This takes forever. This is what I have so far. Did you get this > >>>>>>>> far? 
> >>>>>>>> > >>>>>>>> 07:48 hzhang/fix-submat_samerowdist *= > >>>>>>>> /sandbox/adams/petsc/src/snes/examples/tutorials$ make > >>>>>>>> PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 val > >>>>>>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 valgrind > >>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 -snes_max_it 2 > >>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type > >>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg > >>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > >>>>>>>> ==12414== Memcheck, a memory error detector > >>>>>>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward > >>>>>>>> et al. > >>>>>>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > >>>>>>>> copyright info > >>>>>>>> ==12415== Memcheck, a memory error detector > >>>>>>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward > >>>>>>>> et al. > >>>>>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > >>>>>>>> copyright info > >>>>>>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol > >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg > >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > >>>>>>>> ==12415== > >>>>>>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol > >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg > >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -snes_converged_reason > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 -mg_levels_ksp_type > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 -run_type 1 > >>>>>>>> ==12414== > >>>>>>>> [0] 27 global equations, 9 vertices > >>>>>>>> [0] 27 equations in vector, 9 vertices > >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > >>>>>>>> iterations 1 > >>>>>>>> [0] 441 global equations, 147 vertices > >>>>>>>> [0] 441 equations in vector, 147 vertices > >>>>>>>> Nonlinear solve converged due to 
CONVERGED_FNORM_RELATIVE > >>>>>>>> iterations 1 > >>>>>>>> [0] 4725 global equations, 1575 vertices > >>>>>>>> [0] 4725 equations in vector, 1575 vertices > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > From hzhang at mcs.anl.gov Fri Nov 10 17:09:01 2017 From: hzhang at mcs.anl.gov (Hong) Date: Fri, 10 Nov 2017 17:09:01 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Satish : > I don't see any changes hzhang/fix-submat_samerowdist in past few > days. So how did this error get fixed? > I've never seen valgrind error for ex56 from nightly tests. Mark told me about Valgrind error on his mac, and I reproduced it on our linux machine with my own configure, which directs to lapack. > > please do not merge yet. > Sure. Hong > > On Fri, 10 Nov 2017, Mark Adams wrote: > > > I don't see a problem with this. As far as what branch to merge it into, > > that is a Barry/Satish/etc decision. > > > > On Fri, Nov 10, 2017 at 5:17 PM, Hong wrote: > > > > > Mark: > > > > > >> I would add: > > >> > > >> '--download-fblaslapack=1', > > >> > > >> This is what I have on my Linux machine (cg at ANL) and it runs clean. > > >> > > > I do not see any error report from nightly tests for ex56. Should I > merge > > > this branch to maint? > > > Hong > > > > > >> > > >> On Fri, Nov 10, 2017 at 11:03 AM, Hong wrote: > > >> > > >>> I use > > >>> Using configure Options: --download-metis --download-mpich > > >>> --download-mumps --download-parmetis --download-scalapack > > >>> --download-superlu --download-superlu_dist --download-suitesparse > > >>> --download-hypre --download-ptscotch --download-chaco --with-ctable=1 > > >>> --download-cmake --with-cc=gcc --with-cxx=g++ --with-debugging=1 > > >>> --with-visibility=0 --with-fc=gfortran > > >>> Hong > > >>> > > >>> On Fri, Nov 10, 2017 at 9:59 AM, Mark Adams wrote: > > >>> > > >>>> This must be a configure issue. I don't see these warning: > > >>>> > > >>>> #!/usr/bin/python > > >>>> if __name__ == '__main__': > > >>>> import sys > > >>>> import os > > >>>> sys.path.insert(0, os.path.abspath('config')) > > >>>> import configure > > >>>> configure_options = [ > > >>>> '--with-cc=clang', > > >>>> '--with-cc++=clang++', > > >>>> '--download-mpich=1', > > >>>> '--download-metis=1', > > >>>> '--download-superlu=1', > > >>>> '--download-superlu_dist=1', > > >>>> '--download-parmetis=1', > > >>>> '--download-fblaslapack=1', > > >>>> '--download-p4est=1', > > >>>> '--with-debugging=1', > > >>>> '--with-batch=0', > > >>>> 'PETSC_ARCH=arch-linux2-c-dbg32', > > >>>> '--with-openmp=0', > > >>>> '--download-p4est=0' > > >>>> ] > > >>>> configure.petsc_configure(configure_options) > > >>>> > > >>>> ~ > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams > wrote: > > >>>> > > >>>>> This is comming from blas. How did you configure blas? 
> > >>>>> > > >>>>> On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: > > >>>>> > > >>>>>> Using petsc machine, I get > > >>>>>> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials > > >>>>>> (hzhang/fix-submat_samerowdist) > > >>>>>> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 > > >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg > -ksp_rtol > > >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type > gamg > > >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > -pc_gamg_coarse_eq_limit 10 > > >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -snes_converged_reason > > >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > -mg_levels_ksp_type > > >>>>>> chebyshev -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 > > >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > > >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 > -run_type 1 > > >>>>>> > > >>>>>> ==28811== Memcheck, a memory error detector > > >>>>>> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian > Seward et > > >>>>>> al. > > >>>>>> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > > >>>>>> copyright info > > >>>>>> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > > >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg > -ksp_rtol > > >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type > gamg > > >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > -pc_gamg_coarse_eq_limit 10 > > >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -snes_converged_reason > > >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > -mg_levels_ksp_type > > >>>>>> chebyshev -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 > > >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > > >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 > -run_type 1 > > >>>>>> ... 
> > >>>>>> ==28811== Invalid read of size 16 > > >>>>>> ==28811== at 0x8550946: dswap_k_NEHALEM (in > > >>>>>> /usr/lib/openblas-base/libblas.so.3) > > >>>>>> ==28811== by 0x7C6797F: dswap_ (in > /usr/lib/openblas-base/libblas > > >>>>>> .so.3) > > >>>>>> ==28811== by 0x75B33B2: dgetri_ (in > /usr/lib/lapack/liblapack.so.3 > > >>>>>> .0) > > >>>>>> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > > >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > > >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > > >>>>>> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of > size > > >>>>>> 52,488 alloc'd > > >>>>>> ==28811== at 0x4C2D110: memalign (in > /usr/lib/valgrind/vgpreload_me > > >>>>>> mcheck-amd64-linux.so) > > >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) > > >>>>>> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > > >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > > >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > > >>>>>> ==28811== > > >>>>>> ==28812== Invalid read of size 16 > > >>>>>> ==28812== at 0x8550946: dswap_k_NEHALEM (in > > >>>>>> /usr/lib/openblas-base/libblas.so.3) > > >>>>>> ==28812== by 0x7C6797F: dswap_ (in > /usr/lib/openblas-base/libblas > > >>>>>> .so.3) > > >>>>>> ==28812== by 0x75B33B2: dgetri_ (in > /usr/lib/lapack/liblapack.so.3 > > >>>>>> .0) > > >>>>>> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > > >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > > >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > > >>>>>> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of > size > > >>>>>> 52,488 alloc'd > > >>>>>> ==28812== at 0x4C2D110: memalign (in > /usr/lib/valgrind/vgpreload_me > > >>>>>> mcheck-amd64-linux.so) > > >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) > > >>>>>> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > > >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > > >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > > >>>>>> ==28812== > > >>>>>> ==28811== Invalid read of size 16 > > >>>>>> ==28811== at 0x8550A55: dswap_k_NEHALEM (in > > >>>>>> /usr/lib/openblas-base/libblas.so.3) > > >>>>>> ==28811== by 0x7C6797F: dswap_ (in > /usr/lib/openblas-base/libblas > > >>>>>> .so.3) > > >>>>>> ==28811== by 0x7675179: dsteqr_ (in > /usr/lib/lapack/liblapack.so.3 > > >>>>>> .0) > > >>>>>> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) > > >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > > >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > > >>>>>> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 > > >>>>>> alloc'd > > >>>>>> ==28811== at 0x4C2D110: memalign (in > /usr/lib/valgrind/vgpreload_me > > >>>>>> mcheck-amd64-linux.so) > > >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) > > >>>>>> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) > > >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > > >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > > >>>>>> ==28811== > > >>>>>> ==28812== Invalid read of size 16 > > >>>>>> ==28812== at 0x8550A55: 
dswap_k_NEHALEM (in > > >>>>>> /usr/lib/openblas-base/libblas.so.3) > > >>>>>> ==28812== by 0x7C6797F: dswap_ (in > /usr/lib/openblas-base/libblas > > >>>>>> .so.3) > > >>>>>> ==28812== by 0x7675179: dsteqr_ (in > /usr/lib/lapack/liblapack.so.3 > > >>>>>> .0) > > >>>>>> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) > > >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > > >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > > >>>>>> ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 > > >>>>>> alloc'd > > >>>>>> ==28812== at 0x4C2D110: memalign (in > /usr/lib/valgrind/vgpreload_me > > >>>>>> mcheck-amd64-linux.so) > > >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) > > >>>>>> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) > > >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > > >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > > >>>>>> ==28812== > > >>>>>> [0] 27 global equations, 9 vertices > > >>>>>> [0] 27 equations in vector, 9 vertices > > >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > iterations 1 > > >>>>>> [0] 441 global equations, 147 vertices > > >>>>>> [0] 441 equations in vector, 147 vertices > > >>>>>> > > >>>>>> hangs here ... > > >>>>>> > > >>>>>> Hong > > >>>>>> > > >>>>>> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams > wrote: > > >>>>>> > > >>>>>>> This printed a little funny in gmail, snes/ex56 is running clean > in > > >>>>>>> the first few loops (appended), but the last one is the one with > a reduced > > >>>>>>> processor set. Still waiting. This is with 32 bit integers. I'm > running > > >>>>>>> another with 64 bit integers. > > >>>>>>> > > >>>>>>> ... > > >>>>>>> [0] 27 global equations, 9 vertices > > >>>>>>> [0] 27 equations in vector, 9 vertices > > >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > iterations > > >>>>>>> 1 > > >>>>>>> [0] 441 global equations, 147 vertices > > >>>>>>> [0] 441 equations in vector, 147 vertices > > >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > iterations > > >>>>>>> 1 > > >>>>>>> [0] 4725 global equations, 1575 vertices > > >>>>>>> [0] 4725 equations in vector, 1575 vertices > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams > wrote: > > >>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong > wrote: > > >>>>>>>> > > >>>>>>>>> Mark: > > >>>>>>>>> > > >>>>>>>>>> OK, well, just go with the Linux machine for the regression > test. > > >>>>>>>>>> I will keep trying to reproduce this on my Mac with an O > build. > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Valgrind error occurs on linux machines with g-build. I cannot > > >>>>>>>>> merge this branch to maint until the bug is fixed. > > >>>>>>>>> > > >>>>>>>> > > >>>>>>>> Valgrind is failing on this run on my Mac. Moving to cg, like > you I > > >>>>>>>> suppose. This takes forever. This is what I have so far. Did > you get this > > >>>>>>>> far? 
> > >>>>>>>> > > >>>>>>>> 07:48 hzhang/fix-submat_samerowdist *= > > >>>>>>>> /sandbox/adams/petsc/src/snes/examples/tutorials$ make > > >>>>>>>> PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 > val > > >>>>>>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 > valgrind > > >>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 > -snes_max_it 2 > > >>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type > > >>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type > agg > > >>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -snes_converged_reason > > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > -mg_levels_ksp_type > > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 > > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -mg_levels_pc_type > > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size > 3 -run_type 1 > > >>>>>>>> ==12414== Memcheck, a memory error detector > > >>>>>>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian > Seward > > >>>>>>>> et al. > > >>>>>>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > > >>>>>>>> copyright info > > >>>>>>>> ==12415== Memcheck, a memory error detector > > >>>>>>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian > Seward > > >>>>>>>> et al. > > >>>>>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > > >>>>>>>> copyright info > > >>>>>>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > > >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg > -ksp_rtol > > >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 > -pc_type gamg > > >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > -pc_gamg_coarse_eq_limit 10 > > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -snes_converged_reason > > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > -mg_levels_ksp_type > > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 > > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -mg_levels_pc_type > > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size > 3 -run_type 1 > > >>>>>>>> ==12415== > > >>>>>>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > > >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg > -ksp_rtol > > >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 > -pc_type gamg > > >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > -pc_gamg_coarse_eq_limit 10 > > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > -snes_converged_reason > > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > -mg_levels_ksp_type > > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 > > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -mg_levels_pc_type > > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size > 3 -run_type 1 > > >>>>>>>> ==12414== > > >>>>>>>> [0] 27 global equations, 9 vertices > > >>>>>>>> [0] 27 equations in vector, 9 vertices > > >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > >>>>>>>> iterations 1 > > 
>>>>>>>> [0] 441 global equations, 147 vertices > > >>>>>>>> [0] 441 equations in vector, 147 vertices > > >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > >>>>>>>> iterations 1 > > >>>>>>>> [0] 4725 global equations, 1575 vertices > > >>>>>>>> [0] 4725 equations in vector, 1575 vertices > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>> > > >> > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Nov 10 19:30:04 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 10 Nov 2017 19:30:04 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: Ok - I just ran this example with valgrind (and parmetis) on my laptop - and got no valgrind errors. Will have a build tonight of this branch in next-tmp - and then look at merging it tomorrow. BTW: if the destination for this branch is maint - then its best to use the name 'hzhang/fix-submat_samerowdist/maint' to make it more obvious. Satish On Fri, 10 Nov 2017, Hong wrote: > Satish : > > > I don't see any changes hzhang/fix-submat_samerowdist in past few > > days. So how did this error get fixed? > > > > I've never seen valgrind error for ex56 from nightly tests. > Mark told me about Valgrind error on his mac, and I reproduced it on our > linux machine with my own configure, > which directs to lapack. > > > > > please do not merge yet. > > > Sure. > > Hong > > > > > On Fri, 10 Nov 2017, Mark Adams wrote: > > > > > I don't see a problem with this. As far as what branch to merge it into, > > > that is a Barry/Satish/etc decision. > > > > > > On Fri, Nov 10, 2017 at 5:17 PM, Hong wrote: > > > > > > > Mark: > > > > > > > >> I would add: > > > >> > > > >> '--download-fblaslapack=1', > > > >> > > > >> This is what I have on my Linux machine (cg at ANL) and it runs clean. > > > >> > > > > I do not see any error report from nightly tests for ex56. Should I > > merge > > > > this branch to maint? > > > > Hong > > > > > > > >> > > > >> On Fri, Nov 10, 2017 at 11:03 AM, Hong wrote: > > > >> > > > >>> I use > > > >>> Using configure Options: --download-metis --download-mpich > > > >>> --download-mumps --download-parmetis --download-scalapack > > > >>> --download-superlu --download-superlu_dist --download-suitesparse > > > >>> --download-hypre --download-ptscotch --download-chaco --with-ctable=1 > > > >>> --download-cmake --with-cc=gcc --with-cxx=g++ --with-debugging=1 > > > >>> --with-visibility=0 --with-fc=gfortran > > > >>> Hong > > > >>> > > > >>> On Fri, Nov 10, 2017 at 9:59 AM, Mark Adams wrote: > > > >>> > > > >>>> This must be a configure issue. 
I don't see these warning: > > > >>>> > > > >>>> #!/usr/bin/python > > > >>>> if __name__ == '__main__': > > > >>>> import sys > > > >>>> import os > > > >>>> sys.path.insert(0, os.path.abspath('config')) > > > >>>> import configure > > > >>>> configure_options = [ > > > >>>> '--with-cc=clang', > > > >>>> '--with-cc++=clang++', > > > >>>> '--download-mpich=1', > > > >>>> '--download-metis=1', > > > >>>> '--download-superlu=1', > > > >>>> '--download-superlu_dist=1', > > > >>>> '--download-parmetis=1', > > > >>>> '--download-fblaslapack=1', > > > >>>> '--download-p4est=1', > > > >>>> '--with-debugging=1', > > > >>>> '--with-batch=0', > > > >>>> 'PETSC_ARCH=arch-linux2-c-dbg32', > > > >>>> '--with-openmp=0', > > > >>>> '--download-p4est=0' > > > >>>> ] > > > >>>> configure.petsc_configure(configure_options) > > > >>>> > > > >>>> ~ > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams > > wrote: > > > >>>> > > > >>>>> This is comming from blas. How did you configure blas? > > > >>>>> > > > >>>>> On Fri, Nov 10, 2017 at 10:38 AM, Hong wrote: > > > >>>>> > > > >>>>>> Using petsc machine, I get > > > >>>>>> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials > > > >>>>>> (hzhang/fix-submat_samerowdist) > > > >>>>>> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 > > > >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg > > -ksp_rtol > > > >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type > > gamg > > > >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > > -pc_gamg_coarse_eq_limit 10 > > > >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > > >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > > -snes_converged_reason > > > >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > > -mg_levels_ksp_type > > > >>>>>> chebyshev -mg_levels_esteig_ksp_type cg > > -mg_levels_esteig_ksp_max_it 10 > > > >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > > > >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 > > -run_type 1 > > > >>>>>> > > > >>>>>> ==28811== Memcheck, a memory error detector > > > >>>>>> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian > > Seward et > > > >>>>>> al. > > > >>>>>> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > > > >>>>>> copyright info > > > >>>>>> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > > > >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg > > -ksp_rtol > > > >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type > > gamg > > > >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > > -pc_gamg_coarse_eq_limit 10 > > > >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > > >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > > -snes_converged_reason > > > >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > > -mg_levels_ksp_type > > > >>>>>> chebyshev -mg_levels_esteig_ksp_type cg > > -mg_levels_esteig_ksp_max_it 10 > > > >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type > > > >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 > > -run_type 1 > > > >>>>>> ... 
> > > >>>>>> ==28811== Invalid read of size 16 > > > >>>>>> ==28811== at 0x8550946: dswap_k_NEHALEM (in > > > >>>>>> /usr/lib/openblas-base/libblas.so.3) > > > >>>>>> ==28811== by 0x7C6797F: dswap_ (in > > /usr/lib/openblas-base/libblas > > > >>>>>> .so.3) > > > >>>>>> ==28811== by 0x75B33B2: dgetri_ (in > > /usr/lib/lapack/liblapack.so.3 > > > >>>>>> .0) > > > >>>>>> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > > > >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > > > >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > > > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > > > >>>>>> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of > > size > > > >>>>>> 52,488 alloc'd > > > >>>>>> ==28811== at 0x4C2D110: memalign (in > > /usr/lib/valgrind/vgpreload_me > > > >>>>>> mcheck-amd64-linux.so) > > > >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) > > > >>>>>> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > > > >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > > > >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > > > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > > > >>>>>> ==28811== > > > >>>>>> ==28812== Invalid read of size 16 > > > >>>>>> ==28812== at 0x8550946: dswap_k_NEHALEM (in > > > >>>>>> /usr/lib/openblas-base/libblas.so.3) > > > >>>>>> ==28812== by 0x7C6797F: dswap_ (in > > /usr/lib/openblas-base/libblas > > > >>>>>> .so.3) > > > >>>>>> ==28812== by 0x75B33B2: dgetri_ (in > > /usr/lib/lapack/liblapack.so.3 > > > >>>>>> .0) > > > >>>>>> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) > > > >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > > > >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > > > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > > > >>>>>> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of > > size > > > >>>>>> 52,488 alloc'd > > > >>>>>> ==28812== at 0x4C2D110: memalign (in > > /usr/lib/valgrind/vgpreload_me > > > >>>>>> mcheck-amd64-linux.so) > > > >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) > > > >>>>>> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) > > > >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) > > > >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) > > > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > > > >>>>>> ==28812== > > > >>>>>> ==28811== Invalid read of size 16 > > > >>>>>> ==28811== at 0x8550A55: dswap_k_NEHALEM (in > > > >>>>>> /usr/lib/openblas-base/libblas.so.3) > > > >>>>>> ==28811== by 0x7C6797F: dswap_ (in > > /usr/lib/openblas-base/libblas > > > >>>>>> .so.3) > > > >>>>>> ==28811== by 0x7675179: dsteqr_ (in > > /usr/lib/lapack/liblapack.so.3 > > > >>>>>> .0) > > > >>>>>> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) > > > >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > > > >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > > > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) > > > >>>>>> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size 72 > > > >>>>>> alloc'd > > > >>>>>> ==28811== at 0x4C2D110: memalign (in > > /usr/lib/valgrind/vgpreload_me > > > >>>>>> mcheck-amd64-linux.so) > > > >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) > > > >>>>>> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) > > > >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > > > >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > > > >>>>>> 
==28811== by 0x41056E: main (ex56.c:395) > > > >>>>>> ==28811== > > > >>>>>> ==28812== Invalid read of size 16 > > > >>>>>> ==28812== at 0x8550A55: dswap_k_NEHALEM (in > > > >>>>>> /usr/lib/openblas-base/libblas.so.3) > > > >>>>>> ==28812== by 0x7C6797F: dswap_ (in > > /usr/lib/openblas-base/libblas > > > >>>>>> .so.3) > > > >>>>>> ==28812== by 0x7675179: dsteqr_ (in > > /usr/lib/lapack/liblapack.so.3 > > > >>>>>> .0) > > > >>>>>> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) > > > >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > > > >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > > > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > > > >>>>>> ==28812== Address 0xdc11f30 is 64 bytes inside a block of size 72 > > > >>>>>> alloc'd > > > >>>>>> ==28812== at 0x4C2D110: memalign (in > > /usr/lib/valgrind/vgpreload_me > > > >>>>>> mcheck-amd64-linux.so) > > > >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) > > > >>>>>> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) > > > >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature (dt.c:582) > > > >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) > > > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) > > > >>>>>> ==28812== > > > >>>>>> [0] 27 global equations, 9 vertices > > > >>>>>> [0] 27 equations in vector, 9 vertices > > > >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > iterations 1 > > > >>>>>> [0] 441 global equations, 147 vertices > > > >>>>>> [0] 441 equations in vector, 147 vertices > > > >>>>>> > > > >>>>>> hangs here ... > > > >>>>>> > > > >>>>>> Hong > > > >>>>>> > > > >>>>>> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams > > wrote: > > > >>>>>> > > > >>>>>>> This printed a little funny in gmail, snes/ex56 is running clean > > in > > > >>>>>>> the first few loops (appended), but the last one is the one with > > a reduced > > > >>>>>>> processor set. Still waiting. This is with 32 bit integers. I'm > > running > > > >>>>>>> another with 64 bit integers. > > > >>>>>>> > > > >>>>>>> ... > > > >>>>>>> [0] 27 global equations, 9 vertices > > > >>>>>>> [0] 27 equations in vector, 9 vertices > > > >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > iterations > > > >>>>>>> 1 > > > >>>>>>> [0] 441 global equations, 147 vertices > > > >>>>>>> [0] 441 equations in vector, 147 vertices > > > >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > iterations > > > >>>>>>> 1 > > > >>>>>>> [0] 4725 global equations, 1575 vertices > > > >>>>>>> [0] 4725 equations in vector, 1575 vertices > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams > > wrote: > > > >>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong > > wrote: > > > >>>>>>>> > > > >>>>>>>>> Mark: > > > >>>>>>>>> > > > >>>>>>>>>> OK, well, just go with the Linux machine for the regression > > test. > > > >>>>>>>>>> I will keep trying to reproduce this on my Mac with an O > > build. > > > >>>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> Valgrind error occurs on linux machines with g-build. I cannot > > > >>>>>>>>> merge this branch to maint until the bug is fixed. > > > >>>>>>>>> > > > >>>>>>>> > > > >>>>>>>> Valgrind is failing on this run on my Mac. Moving to cg, like > > you I > > > >>>>>>>> suppose. This takes forever. This is what I have so far. Did > > you get this > > > >>>>>>>> far? 
> > > >>>>>>>> > > > >>>>>>>> 07:48 hzhang/fix-submat_samerowdist *= > > > >>>>>>>> /sandbox/adams/petsc/src/snes/examples/tutorials$ make > > > >>>>>>>> PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 > > val > > > >>>>>>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 > > valgrind > > > >>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 > > -snes_max_it 2 > > > >>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type > > > >>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type > > agg > > > >>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 > > > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > > -snes_converged_reason > > > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > > -mg_levels_ksp_type > > > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg > > -mg_levels_esteig_ksp_max_it 10 > > > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > > -mg_levels_pc_type > > > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size > > 3 -run_type 1 > > > >>>>>>>> ==12414== Memcheck, a memory error detector > > > >>>>>>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian > > Seward > > > >>>>>>>> et al. > > > >>>>>>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > > > >>>>>>>> copyright info > > > >>>>>>>> ==12415== Memcheck, a memory error detector > > > >>>>>>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian > > Seward > > > >>>>>>>> et al. > > > >>>>>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for > > > >>>>>>>> copyright info > > > >>>>>>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > > > >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg > > -ksp_rtol > > > >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 > > -pc_type gamg > > > >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > > -pc_gamg_coarse_eq_limit 10 > > > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > > -snes_converged_reason > > > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > > -mg_levels_ksp_type > > > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg > > -mg_levels_esteig_ksp_max_it 10 > > > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > > -mg_levels_pc_type > > > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size > > 3 -run_type 1 > > > >>>>>>>> ==12415== > > > >>>>>>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 > > > >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg > > -ksp_rtol > > > >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 > > -pc_type gamg > > > >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 > > -pc_gamg_coarse_eq_limit 10 > > > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 > > > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 > > -snes_converged_reason > > > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 > > -mg_levels_ksp_type > > > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg > > -mg_levels_esteig_ksp_max_it 10 > > > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > > -mg_levels_pc_type > > > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size > > 3 -run_type 1 > > > >>>>>>>> ==12414== > > > >>>>>>>> [0] 27 global equations, 9 vertices > > > 
>>>>>>>> [0] 27 equations in vector, 9 vertices > > > >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > > >>>>>>>> iterations 1 > > > >>>>>>>> [0] 441 global equations, 147 vertices > > > >>>>>>>> [0] 441 equations in vector, 147 vertices > > > >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE > > > >>>>>>>> iterations 1 > > > >>>>>>>> [0] 4725 global equations, 1575 vertices > > > >>>>>>>> [0] 4725 equations in vector, 1575 vertices > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > > > > > > > > > > From mfadams at lbl.gov Fri Nov 10 19:40:17 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Nov 2017 20:40:17 -0500 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: On Fri, Nov 10, 2017 at 6:09 PM, Hong wrote: > Satish : > >> I don't see any changes hzhang/fix-submat_samerowdist in past few >> days. So how did this error get fixed? >> > > I've never seen valgrind error for ex56 from nightly tests. > Mark told me about Valgrind error on his mac, > Valgrind segv'ed on my Mac. (This makes me suspicious because valgrind has been working OK for my on my Mac, but let move on.) I ran on Linux (CG at ANL) and it ran fine. The error that Hong is seeing is in LAPACK. I download LAPACK (on CG, not my Mac) but Hong does not seem to be downloading LAPACK. So I think that the lapack that configure is picking up throwing these valgrind warnings. and I reproduced it on our linux machine with my own configure, > which directs to lapack. > >> >> please do not merge yet. >> > Sure. > > Hong > >> >> On Fri, 10 Nov 2017, Mark Adams wrote: >> >> > I don't see a problem with this. As far as what branch to merge it into, >> > that is a Barry/Satish/etc decision. >> > >> > On Fri, Nov 10, 2017 at 5:17 PM, Hong wrote: >> > >> > > Mark: >> > > >> > >> I would add: >> > >> >> > >> '--download-fblaslapack=1', >> > >> >> > >> This is what I have on my Linux machine (cg at ANL) and it runs >> clean. >> > >> >> > > I do not see any error report from nightly tests for ex56. Should I >> merge >> > > this branch to maint? >> > > Hong >> > > >> > >> >> > >> On Fri, Nov 10, 2017 at 11:03 AM, Hong wrote: >> > >> >> > >>> I use >> > >>> Using configure Options: --download-metis --download-mpich >> > >>> --download-mumps --download-parmetis --download-scalapack >> > >>> --download-superlu --download-superlu_dist --download-suitesparse >> > >>> --download-hypre --download-ptscotch --download-chaco >> --with-ctable=1 >> > >>> --download-cmake --with-cc=gcc --with-cxx=g++ --with-debugging=1 >> > >>> --with-visibility=0 --with-fc=gfortran >> > >>> Hong >> > >>> >> > >>> On Fri, Nov 10, 2017 at 9:59 AM, Mark Adams >> wrote: >> > >>> >> > >>>> This must be a configure issue. 
I don't see these warning: >> > >>>> >> > >>>> #!/usr/bin/python >> > >>>> if __name__ == '__main__': >> > >>>> import sys >> > >>>> import os >> > >>>> sys.path.insert(0, os.path.abspath('config')) >> > >>>> import configure >> > >>>> configure_options = [ >> > >>>> '--with-cc=clang', >> > >>>> '--with-cc++=clang++', >> > >>>> '--download-mpich=1', >> > >>>> '--download-metis=1', >> > >>>> '--download-superlu=1', >> > >>>> '--download-superlu_dist=1', >> > >>>> '--download-parmetis=1', >> > >>>> '--download-fblaslapack=1', >> > >>>> '--download-p4est=1', >> > >>>> '--with-debugging=1', >> > >>>> '--with-batch=0', >> > >>>> 'PETSC_ARCH=arch-linux2-c-dbg32', >> > >>>> '--with-openmp=0', >> > >>>> '--download-p4est=0' >> > >>>> ] >> > >>>> configure.petsc_configure(configure_options) >> > >>>> >> > >>>> ~ >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>>> On Fri, Nov 10, 2017 at 10:56 AM, Mark Adams >> wrote: >> > >>>> >> > >>>>> This is comming from blas. How did you configure blas? >> > >>>>> >> > >>>>> On Fri, Nov 10, 2017 at 10:38 AM, Hong >> wrote: >> > >>>>> >> > >>>>>> Using petsc machine, I get >> > >>>>>> hzhang at petsc /sandbox/hzhang/petsc/src/snes/examples/tutorials >> > >>>>>> (hzhang/fix-submat_samerowdist) >> > >>>>>> $ mpiexec -n 2 valgrind ./ex56 -cells 2,2,1 -max_conv_its 3 >> > >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg >> -ksp_rtol >> > >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 >> -pc_type gamg >> > >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_coarse_eq_limit 10 >> > >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> > >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >> -snes_converged_reason >> > >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >> -mg_levels_ksp_type >> > >>>>>> chebyshev -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 >> > >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >> > >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 >> -run_type 1 >> > >>>>>> >> > >>>>>> ==28811== Memcheck, a memory error detector >> > >>>>>> ==28811== Copyright (C) 2002-2013, and GNU GPL'd, by Julian >> Seward et >> > >>>>>> al. >> > >>>>>> ==28811== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >> > >>>>>> copyright info >> > >>>>>> ==28811== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >> > >>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg >> -ksp_rtol >> > >>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 >> -pc_type gamg >> > >>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_coarse_eq_limit 10 >> > >>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> > >>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >> -snes_converged_reason >> > >>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >> -mg_levels_ksp_type >> > >>>>>> chebyshev -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 >> > >>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type >> > >>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size 3 >> -run_type 1 >> > >>>>>> ... 
>> > >>>>>> ==28811== Invalid read of size 16 >> > >>>>>> ==28811== at 0x8550946: dswap_k_NEHALEM (in >> > >>>>>> /usr/lib/openblas-base/libblas.so.3) >> > >>>>>> ==28811== by 0x7C6797F: dswap_ (in >> /usr/lib/openblas-base/libblas >> > >>>>>> .so.3) >> > >>>>>> ==28811== by 0x75B33B2: dgetri_ (in >> /usr/lib/lapack/liblapack.so.3 >> > >>>>>> .0) >> > >>>>>> ==28811== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >> > >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> > >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) >> > >>>>>> ==28811== Address 0xdc650d0 is 52,480 bytes inside a block of >> size >> > >>>>>> 52,488 alloc'd >> > >>>>>> ==28811== at 0x4C2D110: memalign (in >> /usr/lib/valgrind/vgpreload_me >> > >>>>>> mcheck-amd64-linux.so) >> > >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >> > >>>>>> ==28811== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >> > >>>>>> ==28811== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> > >>>>>> ==28811== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) >> > >>>>>> ==28811== >> > >>>>>> ==28812== Invalid read of size 16 >> > >>>>>> ==28812== at 0x8550946: dswap_k_NEHALEM (in >> > >>>>>> /usr/lib/openblas-base/libblas.so.3) >> > >>>>>> ==28812== by 0x7C6797F: dswap_ (in >> /usr/lib/openblas-base/libblas >> > >>>>>> .so.3) >> > >>>>>> ==28812== by 0x75B33B2: dgetri_ (in >> /usr/lib/lapack/liblapack.so.3 >> > >>>>>> .0) >> > >>>>>> ==28812== by 0x5E3CA5C: PetscFESetUp_Basic (dtfe.c:4012) >> > >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> > >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) >> > >>>>>> ==28812== Address 0xd9c7600 is 52,480 bytes inside a block of >> size >> > >>>>>> 52,488 alloc'd >> > >>>>>> ==28812== at 0x4C2D110: memalign (in >> /usr/lib/valgrind/vgpreload_me >> > >>>>>> mcheck-amd64-linux.so) >> > >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >> > >>>>>> ==28812== by 0x5E3C169: PetscFESetUp_Basic (dtfe.c:3983) >> > >>>>>> ==28812== by 0x5E320C9: PetscFESetUp (dtfe.c:3274) >> > >>>>>> ==28812== by 0x5E5786F: PetscFECreateDefault (dtfe.c:6749) >> > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) >> > >>>>>> ==28812== >> > >>>>>> ==28811== Invalid read of size 16 >> > >>>>>> ==28811== at 0x8550A55: dswap_k_NEHALEM (in >> > >>>>>> /usr/lib/openblas-base/libblas.so.3) >> > >>>>>> ==28811== by 0x7C6797F: dswap_ (in >> /usr/lib/openblas-base/libblas >> > >>>>>> .so.3) >> > >>>>>> ==28811== by 0x7675179: dsteqr_ (in >> /usr/lib/lapack/liblapack.so.3 >> > >>>>>> .0) >> > >>>>>> ==28811== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >> > >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature >> (dt.c:582) >> > >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >> > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) >> > >>>>>> ==28811== Address 0xd99cbe0 is 64 bytes inside a block of size >> 72 >> > >>>>>> alloc'd >> > >>>>>> ==28811== at 0x4C2D110: memalign (in >> /usr/lib/valgrind/vgpreload_me >> > >>>>>> mcheck-amd64-linux.so) >> > >>>>>> ==28811== by 0x51590F6: PetscMallocAlign (mal.c:39) >> > >>>>>> ==28811== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >> > >>>>>> ==28811== by 0x5E00BD8: PetscDTGaussTensorQuadrature >> (dt.c:582) >> > >>>>>> ==28811== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >> > >>>>>> ==28811== by 0x41056E: main (ex56.c:395) >> > >>>>>> ==28811== >> > 
>>>>>> ==28812== Invalid read of size 16 >> > >>>>>> ==28812== at 0x8550A55: dswap_k_NEHALEM (in >> > >>>>>> /usr/lib/openblas-base/libblas.so.3) >> > >>>>>> ==28812== by 0x7C6797F: dswap_ (in >> /usr/lib/openblas-base/libblas >> > >>>>>> .so.3) >> > >>>>>> ==28812== by 0x7675179: dsteqr_ (in >> /usr/lib/lapack/liblapack.so.3 >> > >>>>>> .0) >> > >>>>>> ==28812== by 0x5DFFA22: PetscDTGaussQuadrature (dt.c:508) >> > >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature >> (dt.c:582) >> > >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >> > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) >> > >>>>>> ==28812== Address 0xdc11f30 is 64 bytes inside a block of size >> 72 >> > >>>>>> alloc'd >> > >>>>>> ==28812== at 0x4C2D110: memalign (in >> /usr/lib/valgrind/vgpreload_me >> > >>>>>> mcheck-amd64-linux.so) >> > >>>>>> ==28812== by 0x51590F6: PetscMallocAlign (mal.c:39) >> > >>>>>> ==28812== by 0x5DFF766: PetscDTGaussQuadrature (dt.c:504) >> > >>>>>> ==28812== by 0x5E00BD8: PetscDTGaussTensorQuadrature >> (dt.c:582) >> > >>>>>> ==28812== by 0x5E57D7A: PetscFECreateDefault (dtfe.c:6763) >> > >>>>>> ==28812== by 0x41056E: main (ex56.c:395) >> > >>>>>> ==28812== >> > >>>>>> [0] 27 global equations, 9 vertices >> > >>>>>> [0] 27 equations in vector, 9 vertices >> > >>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >> iterations 1 >> > >>>>>> [0] 441 global equations, 147 vertices >> > >>>>>> [0] 441 equations in vector, 147 vertices >> > >>>>>> >> > >>>>>> hangs here ... >> > >>>>>> >> > >>>>>> Hong >> > >>>>>> >> > >>>>>> On Fri, Nov 10, 2017 at 8:57 AM, Mark Adams >> wrote: >> > >>>>>> >> > >>>>>>> This printed a little funny in gmail, snes/ex56 is running >> clean in >> > >>>>>>> the first few loops (appended), but the last one is the one >> with a reduced >> > >>>>>>> processor set. Still waiting. This is with 32 bit integers. I'm >> running >> > >>>>>>> another with 64 bit integers. >> > >>>>>>> >> > >>>>>>> ... >> > >>>>>>> [0] 27 global equations, 9 vertices >> > >>>>>>> [0] 27 equations in vector, 9 vertices >> > >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >> iterations >> > >>>>>>> 1 >> > >>>>>>> [0] 441 global equations, 147 vertices >> > >>>>>>> [0] 441 equations in vector, 147 vertices >> > >>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >> iterations >> > >>>>>>> 1 >> > >>>>>>> [0] 4725 global equations, 1575 vertices >> > >>>>>>> [0] 4725 equations in vector, 1575 vertices >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> On Fri, Nov 10, 2017 at 9:06 AM, Mark Adams >> wrote: >> > >>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> On Thu, Nov 9, 2017 at 1:56 PM, Hong >> wrote: >> > >>>>>>>> >> > >>>>>>>>> Mark: >> > >>>>>>>>> >> > >>>>>>>>>> OK, well, just go with the Linux machine for the regression >> test. >> > >>>>>>>>>> I will keep trying to reproduce this on my Mac with an O >> build. >> > >>>>>>>>>> >> > >>>>>>>>> >> > >>>>>>>>> Valgrind error occurs on linux machines with g-build. I cannot >> > >>>>>>>>> merge this branch to maint until the bug is fixed. >> > >>>>>>>>> >> > >>>>>>>> >> > >>>>>>>> Valgrind is failing on this run on my Mac. Moving to cg, like >> you I >> > >>>>>>>> suppose. This takes forever. This is what I have so far. Did >> you get this >> > >>>>>>>> far? 
>> > >>>>>>>> >> > >>>>>>>> 07:48 hzhang/fix-submat_samerowdist *= >> > >>>>>>>> /sandbox/adams/petsc/src/snes/examples/tutorials$ make >> > >>>>>>>> PETSC_DIR=/sandbox/adams/petsc PETSC_ARCH=arch-linux2-c-dbg32 >> val >> > >>>>>>>> /sandbox/adams/petsc/arch-linux2-c-dbg32/bin/mpiexec -n 2 >> valgrind >> > >>>>>>>> ./ex56 -cells 2,2,1 -max_conv_its 3 -petscspace_order 2 >> -snes_max_it 2 >> > >>>>>>>> -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-11 -ksp_norm_type >> > >>>>>>>> unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type >> agg >> > >>>>>>>> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 >> > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >> -snes_converged_reason >> > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >> -mg_levels_ksp_type >> > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 >> > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >> -mg_levels_pc_type >> > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size >> 3 -run_type 1 >> > >>>>>>>> ==12414== Memcheck, a memory error detector >> > >>>>>>>> ==12414== Copyright (C) 2002-2013, and GNU GPL'd, by Julian >> Seward >> > >>>>>>>> et al. >> > >>>>>>>> ==12414== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >> > >>>>>>>> copyright info >> > >>>>>>>> ==12415== Memcheck, a memory error detector >> > >>>>>>>> ==12415== Copyright (C) 2002-2013, and GNU GPL'd, by Julian >> Seward >> > >>>>>>>> et al. >> > >>>>>>>> ==12415== Using Valgrind-3.10.1 and LibVEX; rerun with -h for >> > >>>>>>>> copyright info >> > >>>>>>>> ==12415== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >> > >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type >> cg -ksp_rtol >> > >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 >> -pc_type gamg >> > >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_coarse_eq_limit 10 >> > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >> -snes_converged_reason >> > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >> -mg_levels_ksp_type >> > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 >> > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >> -mg_levels_pc_type >> > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size >> 3 -run_type 1 >> > >>>>>>>> ==12415== >> > >>>>>>>> ==12414== Command: ./ex56 -cells 2,2,1 -max_conv_its 3 >> > >>>>>>>> -petscspace_order 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type >> cg -ksp_rtol >> > >>>>>>>> 1.e-11 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 >> -pc_type gamg >> > >>>>>>>> -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_coarse_eq_limit 10 >> > >>>>>>>> -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 >> > >>>>>>>> -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 >> -snes_converged_reason >> > >>>>>>>> -use_mat_nearnullspace true -mg_levels_ksp_max_it 1 >> -mg_levels_ksp_type >> > >>>>>>>> chebyshev -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 >> > >>>>>>>> -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >> -mg_levels_pc_type >> > >>>>>>>> jacobi -pc_gamg_mat_partitioning_type parmetis -mat_block_size >> 3 -run_type 1 >> > >>>>>>>> ==12414== >> > >>>>>>>> [0] 27 global equations, 9 vertices >> > >>>>>>>> [0] 27 equations in vector, 9 vertices >> > >>>>>>>> Nonlinear 
solve converged due to CONVERGED_FNORM_RELATIVE >> > >>>>>>>> iterations 1 >> > >>>>>>>> [0] 441 global equations, 147 vertices >> > >>>>>>>> [0] 441 equations in vector, 147 vertices >> > >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE >> > >>>>>>>> iterations 1 >> > >>>>>>>> [0] 4725 global equations, 1575 vertices >> > >>>>>>>> [0] 4725 equations in vector, 1575 vertices >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>> >> > >>>>>> >> > >>>>> >> > >>>> >> > >>> >> > >> >> > > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Nov 11 09:48:20 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 11 Nov 2017 09:48:20 -0600 Subject: [petsc-users] unsorted local columns in 3.8? In-Reply-To: References: Message-ID: I've merged this branch to maint. Satish On Sat, 11 Nov 2017, Satish Balay wrote: > Will have a build tonight of this branch in next-tmp - and then look > at merging it tomorrow. From bsmith at mcs.anl.gov Sat Nov 11 14:12:14 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sat, 11 Nov 2017 20:12:14 +0000 Subject: [petsc-users] Newton LS - better results on single processor In-Reply-To: References: <6A14B327-7D3E-4A0E-81D1-0595DEC2B099@mcs.anl.gov> Message-ID: > On Nov 10, 2017, at 4:18 PM, zakaryah . wrote: > > Thanks for the advice. > > I put a VecView() on the solution vector, both before and after all the solves, and in the SNES Function and Jacobian, as well as a MatView() in the Jacobian, and a VecView() on the residual after the solves. Then I run a tiny problem with -pc_type redundant -redundant_pc_type lu, as Stefano suggested, and I compare the vectors and matrices with -n 1 to -n 2. Although the monitor shows somewhat different residuals for the KSP and for the SNES, the differences are very small. For example, after the first SNESSolve(), the SNES residual is 6.5e-18 with -n 1 and 6.0e-18 with -n 2, and of course I don't care about that tiny difference unless it indicates some mistake in my code. The VecView() and MatView() show that the state vector, the function vector, and the function Jacobian are identical (up to the default output precision of the view routines). The residuals are of course slightly different. > > For this problem it took 8 Riks iterations for the loading coefficient to reach 1 (i.e. to finish the iterations). For the last solve, the residuals and their differences were larger: 8.4e-15 with -n 1 and 8.7e-15 with -n 2. I think this supports my hypothesis that the iterations which feed one SNESSolve() solution into the initial guess for the next solve can amplify small differences. > > To check a bit deeper, I removed all of the view calls except to the SNES residual, and ran on a more realistic problem size, with the SNES defaults (and KSP defaults, PC defaults, etc). > > I will call the residual after the first SNESSolve Ri, and the residual after the last SNESSolve Rf. With -n 1, Ri and Rf are both spatially smooth (as I expect). I think the standard deviation (over space) of the residual is a useful way to quantify its amplitude; for -n 1, sd(Ri) = 5.1e-13 and sd(Rf) = 6.8e-12. With -n 2, both Ri and Rf have a discontinuity in x, at x=0, x=20, x=21, and x=41 (i.e. at the global boundary and at the boundary between the two processes). For -n 2, sd(Ri) = 1.2e-12 and sd(Rf) = 5.7e-12, with all of the additional fluctuations coming from those boundary coordinates. 
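(For reference, the VecView()/MatView() instrumentation described above amounts to something like the sketch below. The callback signatures are the standard SNES ones; the user context and the actual assembly of the residual and Jacobian are elided, so treat this as an illustrative outline rather than the poster's actual code.)

  /* Dump every object the solver sees so that the -n 1 and -n 2 runs can be
     compared piece by piece. */
  static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
  {
    PetscErrorCode ierr;
    PetscFunctionBeginUser;
    ierr = VecView(x, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* state coming in */
    /* ... assemble the residual f(x) here ... */
    ierr = VecView(f, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* residual going out */
    PetscFunctionReturn(0);
  }

  static PetscErrorCode FormJacobian(SNES snes, Vec x, Mat J, Mat P, void *ctx)
  {
    PetscErrorCode ierr;
    PetscFunctionBeginUser;
    /* ... fill P (and J, if different) from x here ... */
    ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatView(P, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* Jacobian entries */
    PetscFunctionReturn(0);
  }

The first object whose printed values differ between the serial and parallel logs marks where the two runs start to diverge.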
In other words, the fluctuations of the residual are an order of magnitude larger at the boundary between processes. So the residual is order 5e-11 at the boundaries and order 5e-12 elsewhere? This is nothing. Residuals that small are close to meaningless anyways. If the residuals are 5e-6 at the boundaries and 5e-12 elsewhere then I would be worried. > > If I consider the residuals as a function of the other dimensions (y or z instead of x), the entire range of which is owned by each processor, I don't see any discontinuity. > > I suppose that all of this has something to do with the spectrums of the matrices involved in the solve but I don't know enough to improve the results I'm obtaining. > > On Thu, Nov 9, 2017 at 11:09 PM, Smith, Barry F. wrote: > > > > On Nov 9, 2017, at 3:33 PM, zakaryah . wrote: > > > > Hi Stefano - when I referred to the iterations, I was trying to point out that my method solves a series of nonlinear systems, with the solution to the first problem being used to initialize the state vector for the second problem, etc. The reason I mentioned that was I thought perhaps I can expect the residuals from single process solve to differ from the residuals from multiprocess solve by a very small amount, say machine precision or the tolerance of the KSP/SNES, that would be fine normally. But, if there is a possibility that those differences are somehow amplified by each of the iterations (solution->initial state), that could explain what I see. > > > > I agree that it is more likely that I have a bug in my code but I'm having trouble finding it. > > Run a tiny problem on one and two processes with LU linear solver and the same mesh. So in the first case all values live on the first process and in the second the same first half live on one process and the second half on the second process. > > Now track the values in the actual vectors and matrices. For example you can just put in VecView() and MatView() on all objects you pass into the solver and then put them in the SNESComputeFunction/Jacobian routines. Print both the vectors inputed to these routines and the vectors/matrices created in the routines. The output differences from the two runs should be small, determine when they significantly vary. This will tell you the likely location of the bug in your source code. (For example if certain values of the Jacobian differ) > > Good luck, I've done this plenty of times and if it is a "parallelization" bug this will help you find it much faster than guessing where the problem is and trying code inspect to find the bug. > > Barry > > > > > I ran a small problem with -pc_type redundant -redundant_pc_type lu, as you suggested. What I think is the relevant portion of the output is here (i.e. 
there are small differences in the KSP residuals and SNES residuals): > > > > -n 1, first "iteration" as described above: > > > > 0 SNES Function norm 6.053565720454e-02 > > 0 KSP Residual norm 4.883115701982e-05 > > > > 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 8.173640409069e-20 > > > > 1 KSP preconditioned resid norm 8.173640409069e-20 true resid norm 1.742143029296e-16 ||r(i)||/||b|| 2.877879104227e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 6.053565720454e-02 gnorm 2.735518570862e-07 > > > > 1 SNES Function norm 2.735518570862e-07 > > > > 0 KSP Residual norm 1.298536630766e-10 > > > > 0 KSP preconditioned resid norm 1.298536630766e-10 true resid norm 2.735518570862e-07 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 2.152782096751e-25 > > > > 1 KSP preconditioned resid norm 2.152782096751e-25 true resid norm 4.755555202641e-22 ||r(i)||/||b|| 1.738447420279e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 2.735518570862e-07 gnorm 1.917989238989e-17 > > > > 2 SNES Function norm 1.917989238989e-17 > > > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 > > > > > > > > -n 2, first "iteration" as described above: > > > > 0 SNES Function norm 6.053565720454e-02 > > > > 0 KSP Residual norm 4.883115701982e-05 > > > > 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.007084240718e-19 > > > > 1 KSP preconditioned resid norm 1.007084240718e-19 true resid norm 1.868472589717e-16 ||r(i)||/||b|| 3.086565300520e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 6.053565720454e-02 gnorm 2.735518570379e-07 > > > > 1 SNES Function norm 2.735518570379e-07 > > > > 0 KSP Residual norm 1.298536630342e-10 > > > > 0 KSP preconditioned resid norm 1.298536630342e-10 true resid norm 2.735518570379e-07 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.885083482938e-25 > > > > 1 KSP preconditioned resid norm 1.885083482938e-25 true resid norm 4.735707460766e-22 ||r(i)||/||b|| 1.731191852267e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 2.735518570379e-07 gnorm 1.851472273258e-17 > > > > > > 2 SNES Function norm 1.851472273258e-17 > > > > > > -n 1, final "iteration": > > 0 SNES Function norm 9.695669610792e+01 > > > > 0 KSP Residual norm 7.898912593878e-03 > > > > 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.720960785852e-17 > > > > 1 KSP preconditioned resid norm 1.720960785852e-17 true resid norm 1.237111121391e-13 ||r(i)||/||b|| 1.275941911237e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 9.695669610792e+01 gnorm 1.026572731653e-01 > > > > 1 SNES Function norm 1.026572731653e-01 > > > > 0 KSP Residual norm 1.382450412926e-04 > > > > 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm 1.026572731653e-01 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 5.018078565710e-20 > > > > 1 KSP preconditioned resid norm 5.018078565710e-20 true resid norm 9.031463071676e-17 ||r(i)||/||b|| 8.797684560673e-16 > > > 
> Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 1.026572731653e-01 gnorm 7.982937980399e-06 > > > > 2 SNES Function norm 7.982937980399e-06 > > > > 0 KSP Residual norm 4.223898196692e-08 > > > > 0 KSP preconditioned resid norm 4.223898196692e-08 true resid norm 7.982937980399e-06 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.038123933240e-22 > > > > 1 KSP preconditioned resid norm 1.038123933240e-22 true resid norm 3.213931469966e-20 ||r(i)||/||b|| 4.026000800530e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 7.982937980399e-06 gnorm 9.776066323463e-13 > > > > 3 SNES Function norm 9.776066323463e-13 > > > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 > > > > -n 2, final "iteration": > > > > 0 SNES Function norm 9.695669610792e+01 > > > > 0 KSP Residual norm 7.898912593878e-03 > > > > 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.752819851736e-17 > > > > 1 KSP preconditioned resid norm 1.752819851736e-17 true resid norm 1.017605437996e-13 ||r(i)||/||b|| 1.049546322064e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 9.695669610792e+01 gnorm 1.026572731655e-01 > > > > 1 SNES Function norm 1.026572731655e-01 > > > > 0 KSP Residual norm 1.382450412926e-04 > > > > 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm 1.026572731655e-01 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.701690118486e-19 > > > > 1 KSP preconditioned resid norm 1.701690118486e-19 true resid norm 9.077679331860e-17 ||r(i)||/||b|| 8.842704517606e-16 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 1.026572731655e-01 gnorm 7.982937883350e-06 > > > > 2 SNES Function norm 7.982937883350e-06 > > > > 0 KSP Residual norm 4.223898196594e-08 > > > > 0 KSP preconditioned resid norm 4.223898196594e-08 true resid norm 7.982937883350e-06 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP Residual norm 1.471638984554e-23 > > > > 1 KSP preconditioned resid norm 1.471638984554e-23 true resid norm 2.483672977401e-20 ||r(i)||/||b|| 3.111226735938e-15 > > > > Linear solve converged due to CONVERGED_RTOL iterations 1 > > > > Line search: Using full step: fnorm 7.982937883350e-06 gnorm 1.019121417798e-12 > > > > 3 SNES Function norm 1.019121417798e-12 > > > > > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 > > > > > > > > Of course these differences are still very small, but this is only true for such a small problem size. For a regular sized problem, the differences at the final iteration can exceed 1 and even 100 at a particular grid point (i.e. in a sense that doesn't scale with problem size). > > > > I also compared -n 1 and -n 2 with the -snes_monitor_solution -ksp_view_rhs -ksp_view_mat -ksp_view_solution options on a tiny problem (5x5x5), and I was not able to find any differences in the Jacobian or the vectors, but I'm suspicious that this could be due to the output format, because even for the tiny problem there are non-trivial differences in the residuals of both the SNES and the KSP. > > > > In all cases, the differences in the residuals are localized to the boundary between parts of the displacement vector owned by the two processes. 
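One common source of exactly this symptom, residual jumps that follow the process decomposition rather than the physical boundary, is assembling the function from the global vector without first scattering into a local, ghosted vector, so that values owned by the neighbouring process are stale. Whether that is the issue in this particular code is not established in the thread (Matt raises the same possibility in the quoted exchange below), but it is the first thing to rule out. A minimal sketch of the ghost-update pattern for a DMDA field, with the stencil computation elided and the DM passed as a hypothetical user context, is:

  PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
  {
    DM             da = (DM) ctx;   /* hypothetical: the DMDA arrives as the context */
    Vec            Xloc;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = DMGetLocalVector(da, &Xloc);CHKERRQ(ierr);
    /* refresh the ghost points held by neighbouring processes */
    ierr = DMGlobalToLocalBegin(da, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
    ierr = DMGlobalToLocalEnd(da, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
    /* ... get arrays from Xloc, apply the stencil, write the result into F ... */
    ierr = DMRestoreLocalVector(da, &Xloc);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }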
The SNES residual with -n 2 typically looks discontinuous across that boundary. > > > > > > On Thu, Nov 9, 2017 at 11:16 AM, zakaryah . wrote: > > Thanks Stefano, I will try what you suggest. > > > > ?Matt - my DM is a composite between the redundant field (loading coefficient, which is included in the Newton solve in Riks' method) and the displacements, which are represented by a 3D DA with 3 dof. I am using finite difference. > > > > Probably my problem comes from confusion over how the composite DM is organized. I am using FormFunction()?, and within that I call DMCompositeGetLocalVectors(), DMCompositeScatter(), DMDAVecGetArray(), and for the Jacobian, DMCompositeGetLocalISs() and MatGetLocalSubmatrix() to split J into Jbb, Jbh, Jhb, and Jhh, where b is the loading coefficient, and h is the displacements). The values of each submatrix are set using MatSetValuesLocal(). > > > > ?I'm most suspicious of the part of the Jacobian routine where I calculate the rows of Jhb, the columns of Jbh, and the corresponding values. I take the DA coordinates and ix,iy,iz, then calculate the row of Jhb as ((((iz-info->gzs)*info->gym + (iy-info->gys))*info->gxm + (ix-info->gxs))*info->dof+c), where info is the DA local info and c is the degree of freedom. The same calculation is performed for the column of Jbh. I suspect that the indexing of the DA vector is not so simple, but I don't know for a fact that I'm doing this incorrectly nor how to do this properly. > > > > ?Thanks for all the help!? > > > > > > On Nov 9, 2017 8:44 AM, "Matthew Knepley" wrote: > > On Thu, Nov 9, 2017 at 12:14 AM, zakaryah . wrote: > > Well the saga of my problem continues. As I described previously in an epic thread, I'm using the SNES to solve problems involving an elastic material on a rectangular grid, subjected to external forces. In any case, I'm occasionally getting poor convergence using Newton's method with line search. In troubleshooting by visualizing the residual, I saw that in data sets which had good convergence, the residual was nevertheless significantly larger along the boundary between different processors. Likewise, in data sets with poor convergence, the residual became very large on the boundary between different processors. The residual is not significantly larger on the physical boundary, i.e. the global boundary. When I run on a single process, convergence seems to be good on all data sets. > > > > Any clues to fix this? > > > > It sounds like something is wrong with communication across domains: > > > > - If this is FEM, it sounds like you are not adding contributions from the other domain to shared vertices/edges/faces > > > > - If this is FDM/FVM, maybe the ghosts are not updated > > > > What DM are you using? Are you using the Local assembly functions (FormFunctionLocal), or just FormFunction()? > > > > Thanks, > > > > Matt > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > From knepley at gmail.com Sun Nov 12 10:50:50 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 12 Nov 2017 11:50:50 -0500 Subject: [petsc-users] multi level octree AMR In-Reply-To: <0c4b331e-f56e-f376-053c-32cce996cd86@univ-amu.fr> References: <0c4b331e-f56e-f376-053c-32cce996cd86@univ-amu.fr> Message-ID: On Thu, Nov 9, 2017 at 10:35 AM, Yann JOBIC wrote: > Hi, > > I succeed in having a basic AMR running on my FE advection/diffusion > problem ( https://mycore.core-cloud.net/index.php/s/DmiiTwKUpV9z5qL) > > I now want to have multiple levels in my octree AMR from p4est. > Can we be more specific here. I assume p4est did more than 1 level of refinement to create the adapted mesh. Do you mean several meshes (a hierarchy)? > I tried a lot with IS tagged arrays, but i didn't succeed so far. When i > get from VecTaggerAbsoluteSetBox an IS array of the cells, it seems that > the DM cell numbering is changed after multiple time iterations (from > TSSolve), even if the DM is not changed at all. If the DM does not change, the point numbering and the dof numbering stay the same. > I tried to identify the cells to refine/coarse with ISDifference and > ISExpand, from two different time solutions of the same DM. That made me > wondering if i'm using the correct tool (IS array). > Can you give a pseudocode description of what you are doing? It sounds like you just want to call DMAdaptLabel() a few times in a row, with different labels and DMs. Thanks, Matt > Am i in the right direction ? > > Thanks, > > Regards, > > Yann > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Nov 12 11:31:25 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 12 Nov 2017 12:31:25 -0500 Subject: [petsc-users] DMPlexVecGetClosure for DM forest In-Reply-To: References: Message-ID: On Fri, Nov 10, 2017 at 5:13 AM, Yann JOBIC wrote: > On 11/09/2017 07:22 PM, Matthew Knepley wrote: > > On Thu, Nov 9, 2017 at 1:20 PM, Yann Jobic wrote: > >> Hello, >> I'm trying to access to the values of a p4est forest. >> I know how to do that by converting my forest to a DMPlex, and then use >> DMPlexVecGetClosure over the converted DM. >> However, i want to assign a label and access to the values of the forest >> directly. >> > > What do you mean by "directly". p4est only has topology. We use a Section > to map points to values, just like Plex. > > I feel stupid, but i don't know how to use a section to map points to > values. > There is a manual section on it, however it is quite simple. Everything in the mesh is numbered consecutively, so each point (cell, face, edge, vertex) has a number. The Section maps this number p to a pair (dof, off) p --> (dof, off) where dof is a number of degrees of freedom, and off is an offset into the Vec holding these values. > In my code i use (from ex11.c and DMComputeL2GradientDiff_Plex) : > ierr = DMGetDefaultSection(forest, §ion);CHKERRQ(ierr); > This default section is the Section for the solution field. Before here just call ierr = DMConvert(forest, DMPLEX, &plex);CHKERRQ(ierr); and use that in all the subsequent calls. 
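In outline, the adjusted version of the loop quoted below might look like this sketch (untested; it assumes, as the advice above implies, that the global vector u built on the forest is compatible with the converted Plex, and the per-cell computation is elided):

  DM           plex;
  PetscSection section;
  Vec          localX;
  PetscScalar *x = NULL;
  PetscInt     c, cStart, cEnd;

  ierr = DMConvert(forest, DMPLEX, &plex);CHKERRQ(ierr);
  ierr = DMGetDefaultSection(plex, &section);CHKERRQ(ierr);
  ierr = DMPlexGetHeightStratum(plex, 0, &cStart, &cEnd);CHKERRQ(ierr);  /* cells */
  ierr = DMGetLocalVector(plex, &localX);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(plex, u, INSERT_VALUES, localX);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(plex, u, INSERT_VALUES, localX);CHKERRQ(ierr);
  for (c = cStart; c < cEnd; ++c) {
    ierr = DMPlexVecGetClosure(plex, section, localX, c, NULL, &x);CHKERRQ(ierr);
    /* ... use the closure values in x for this cell ... */
    ierr = DMPlexVecRestoreClosure(plex, section, localX, c, NULL, &x);CHKERRQ(ierr);
  }
  ierr = DMRestoreLocalVector(plex, &localX);CHKERRQ(ierr);
  ierr = DMDestroy(&plex);CHKERRQ(ierr);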
Thanks, Matt ierr = DMForestGetCellChart(forest,&cStart,&cEnd);CHKERRQ(ierr); > ierr = DMGetLocalVector(forest, &localX);CHKERRQ(ierr); > ierr = DMGlobalToLocalBegin(forest, u, INSERT_VALUES, > localX);CHKERRQ(ierr); > ierr = DMGlobalToLocalEnd (forest, u, INSERT_VALUES, > localX);CHKERRQ(ierr); > > for (c = cStart; c < cEnd; c++) { > > DMPlexVecGetClosure(forest, section, localX, c, NULL, &x); > > [...] > > And i would like to get "x" for a DM forest. > > I looked at dm/impls/forest/p4est/pforest.c, but it looks like quite > difficult to import. > I also search in vec/is/utils/vsectionis.c in order to find the correct > section function, but i didn't catch how to use the right one. > > It looks so simple to use DMPlexVecGetClosure for DM Plex, getting into > the code of DMPlexVecGetClosure is also kind of difficult, at my level i > mean. > > Where can i find the correct way to do it ? Is there an example for what i > want to do ? > > Thanks, > > Yann > > > Thanks, > > Matt > > >> Is it possible ? >> Thanks, >> Yann >> >> >> --- >> L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e par le >> logiciel antivirus Avast. >> https://www.avast.com/antivirus >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > ___________________________ > > Yann JOBIC > HPC engineer > Polytech Marseille DME > IUSTI-CNRS UMR 6595 > Technop?le de Ch?teau Gombert5 rue Enrico Fermi > 13453 Marseille cedex 13 > Tel : (33) 4 91 10 69 39 > ou (33) 4 91 10 69 43 > Fax : (33) 4 91 10 69 69 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Sun Nov 12 22:35:10 2017 From: epscodes at gmail.com (Xiangdong) Date: Sun, 12 Nov 2017 23:35:10 -0500 Subject: [petsc-users] questions about vectorization Message-ID: Hello everyone, Can someone comment on the vectorization of PETSc? For example, for the MatMult function, will it perform better or run faster if it is compiled with avx2 or avx512? Thank you. Best, Xiangdong -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 13 06:04:07 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 13 Nov 2017 12:04:07 +0000 Subject: [petsc-users] Coloring of a finite volume unstructured mesh In-Reply-To: <7a19-5a095380-7-1097f660@264749574> References: <7a19-5a095380-7-1097f660@264749574> Message-ID: > On Nov 13, 2017, at 2:10 AM, SIERRA-AUSIN Javier wrote: > > Hi thanks for your answer, > > I would like to precise that in my particular case I deal with an unstructured grid with an stencil that takes two distance neighbour (center of the cell). And indeed, you are right the coupling is between faces but since it is not order 1 (upwind), the reconstruction of the face state takes several (2 in my case) values at the cell of neighbors. > So that in 2d if we perturbe a cell we would perturbe 13 around (distance 2 neighbours) and in 3d, 25 neighbors. > My question is how I can provide PETSc that structure of the nonzero structure of the Jacobian? With AO? No, AO has nothing to do with it. 
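As the rest of this reply spells out, providing that structure just means preallocating the matrix and then calling MatSetValues() once, with zeros if you like, for every entry the stencil could ever touch, before handing the matrix to the coloring machinery. A minimal sketch follows; the names nlocal, rstart, ncols[i], cols[i][] (global column indices of the distance-2 neighbours of local cell i), dnnz/onnz (on- and off-process counts per row) and the zero-filled buffer zeros are hypothetical placeholders for whatever the in-house code already knows about its mesh:

  Mat      A;
  PetscInt i;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, 0, dnnz);CHKERRQ(ierr);          /* ignored if MPIAIJ */
  ierr = MatMPIAIJSetPreallocation(A, 0, dnnz, 0, onnz);CHKERRQ(ierr); /* ignored if SeqAIJ */
  for (i = 0; i < nlocal; ++i) {
    PetscInt row = rstart + i;   /* global row owned by this process */
    ierr = MatSetValues(A, 1, &row, ncols[i], cols[i], zeros, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  /* A now carries the nonzero pattern; MatColoringCreate()/MatColoringApply()
     can color it, and SNESComputeJacobianDefaultColor() can then fill it by
     finite differencing the residual. */

(In a purely sequential run every neighbour is on-process, so dnnz is simply the full row count there.)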
> Would it be like src/snes/examples/tutorials/ex10d/ex10.c, but having an adjacency matrix with the stencil connectivity instead of the geometrical/parmetis connectivity? Yes, pretty much. Note that example uses MatSetValuesLocal() but most likely you will use MatSetValues() So basically you need to preallocate the matrix with enough space and then call MatSetValues() to indicate all possible nonzero entries in the Jacobian due to the stencil. > If not I would really appreciate an example illustrating how I should do it. > > Thanks in advance!! > > Javier > > > > On Friday, November 10, 2017 05:09 CET, "Smith, Barry F." wrote: > >> >> To use the PETSc coloring based Jacobian computer (which uses finite differences) you absolutely have to be able to provide the nonzero structure of the Jacobian. >> >> Now once you provide the nonzero structure of the Jacobian the PETSc MatColoring routines can actually compute the coloring for you. >> >> So in other words you need not worry about the coloring at all, you just need to worry about providing the nonzero structure. Since you are using a finite volume method presumably all your coupling is between faces? In this case explicitly computing the nonzero structure of the Jacobian is probably pretty straightforward and you should just do it. >> >> Barry >> >> >> > On Nov 7, 2017, at 10:13 AM, Matthew Knepley wrote: >> > >> > On Tue, Nov 7, 2017 at 10:18 AM, SIERRA-AUSIN Javier wrote: >> > Hi, >> > >> > I would like to ask you concerning the computation of the Jacobian matrix via finite difference and coloring of the connectivity graph. >> > I wonder whether it is possible or not to color the Jacobian matrix of a given solver that evaluates the RHS with its associated connectivity in the global indeces of my solver (not PETSc). >> > As well, if it is possible to do this from an already partioned domain in parallel. >> > All of this is better explained in this post : https://scicomp.stackexchange.com/questions/28209/linking-petsc-with-an-already-parallel-in-house-finite-volume-solver >> > >> > The simplest thing you can do is to use the finite-difference Jacobian action (MatMFFD). This is setup automatically by SNES >> > if you give a FormFunction pointer, but no FormJacobian routine. Just tell the PETSc Vecs to use your ParMetis layout (by >> > setting the local sizes), and it should run fine in SNES. >> > >> > However, usually you need some kind of preconditioning. Thus you either have to form the Jacobian or some approximation. If >> > you cannot form an approximation, then you can use coloring. Once option is to create a DMPlex with your mesh information. >> > This can be done in parallel after you have already partitioned with ParMetis (as long as you know the "overlap" of vertices, or >> > adjacency of cells). Then the coloring can be done automatically using that DM information. Otherwise, you will have to supply >> > a coloring to the SNES. >> > >> > Thanks, >> > >> > Matt >> > >> > Thanks in advance, >> > >> > Javier. >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> > > > > From mfadams at lbl.gov Mon Nov 13 07:26:50 2017 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 13 Nov 2017 08:26:50 -0500 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: On Sun, Nov 12, 2017 at 11:35 PM, Xiangdong wrote: > Hello everyone, > > Can someone comment on the vectorization of PETSc? For example, for the > MatMult function, will it perform better or run faster if it is compiled > with avx2 or avx512? > There are no AVX instructions that I know of in PETSCc, but there are kernels from 3rd parties that are: 1) the 'hypre' AMG solver and 2) the new MKL sparse matrix class (wraps MKL, which is probably vectorized). > > Thank you. > > Best, > Xiangdong > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Nov 13 07:46:19 2017 From: jed at jedbrown.org (Jed Brown) Date: Mon, 13 Nov 2017 06:46:19 -0700 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: <87mv3q9tqs.fsf@jedbrown.org> Mark Adams writes: > On Sun, Nov 12, 2017 at 11:35 PM, Xiangdong wrote: > >> Hello everyone, >> >> Can someone comment on the vectorization of PETSc? For example, for the >> MatMult function, will it perform better or run faster if it is compiled >> with avx2 or avx512? >> > > There are no AVX instructions that I know of in PETSCc, but there are > kernels from 3rd parties that are: 1) the 'hypre' AMG solver and 2) the new > MKL sparse matrix class (wraps MKL, which is probably vectorized). There are explicit AVX/AVX512F instructions for the SELL matrix format. Many other operations will be hit-or-miss whether the compiler actually produces vectorized code or whether there would be any benefit in doing so (most operations in PETSc are limited by memory bandwidth rather than vectorization; that includes matrix-vector products on most current architectures except KNL with MCDRAM, where the SELL format is beneficial). From hongzhang at anl.gov Mon Nov 13 10:32:03 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 13 Nov 2017 16:32:03 +0000 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: Most operations in PETSc would not benefit much from vectorization since they are memory-bounded. But this does not discourage you from compiling PETSc with AVX2/AVX512. We have added a new matrix format (currently named ELL, but will be changed to SELL shortly) that can make MatMult ~2X faster than the AIJ format. The MatMult kernel is hand-optimized with AVX intrinsics. It works on any Intel processors that support AVX or AVX2 or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we have been optimizing the AIJ MatMult kernel for these architectures as well. And one has to use AVX compiler flags in order to take advantage of the optimized kernels and the new matrix format. Hong (Mr.) > On Nov 12, 2017, at 10:35 PM, Xiangdong wrote: > > Hello everyone, > > Can someone comment on the vectorization of PETSc? For example, for the MatMult function, will it perform better or run faster if it is compiled with avx2 or avx512? > > Thank you. 
> > Best, > Xiangdong From tobias.jawecki at tuwien.ac.at Mon Nov 13 12:22:11 2017 From: tobias.jawecki at tuwien.ac.at (Tobias Jawecki) Date: Mon, 13 Nov 2017 19:22:11 +0100 Subject: [petsc-users] Lapack with Quadruple Precision in PETSc and SLEPc Message-ID: <2ccd1b2c-c6b0-34e6-482f-68831343a5af@tuwien.ac.at> Dear all, I am interested in computations with higher precision. The application is mainly error analysis of high order Magnus integrators. In some cases the asymptotic behavior of the error can only be observed when the error is already on double precision and round-off errors of the subroutines can matter. Computation with higher precision then helps to compute a reference solution. Using PETSc/SLEPc with the float128 setting works for me, but I do have some questions on how it works with Lapack/Blas. Does PETSc use the standard Lapack package with double precision changed to quadruple precision? I did not find a lot about using the standard Lapack with quadruple precision online, I just saw that all the routines in Lapack expect double precision input (according to netlib.org). For example the Lapack routine dstevr for the eigenvalue computation of small problems which appears in the Lanczos code of SLEPc. I assume some Lapack methods use parameters which are adjusted on double precision? To give an example what kind of parameters would need to be considered: Using the Matrix Function structure of SLEPc to compute the Matrix Exponential Function with Krylov methods seems to use a Pad? approximation for the small problem. The used Pad? approximation is of order 6 with scaling and squaring s.t. |H 2^(-s)|<0.5. This choice of parameter leads to an error smaller than 1e-16 (N. Higham book on Matrix Functions) for approximating the Matrix Exponential of the small problem. The solution is then correct only up to double precision even if higher precision is used for the computations. My question is how PETSc and SLEPc handle the Lapack methods which are needed on quadruple precision. My main concern is that some Lapack methods use parameters which are chosen so that the results are correct up to double precision. Thanks for your efforts and greetings, Tobias Jawecki From bsmith at mcs.anl.gov Mon Nov 13 12:50:10 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 13 Nov 2017 18:50:10 +0000 Subject: [petsc-users] Lapack with Quadruple Precision in PETSc and SLEPc In-Reply-To: <2ccd1b2c-c6b0-34e6-482f-68831343a5af@tuwien.ac.at> References: <2ccd1b2c-c6b0-34e6-482f-68831343a5af@tuwien.ac.at> Message-ID: <84C471AF-F20C-40FE-A6C0-004E591B8FD0@mcs.anl.gov> Tobias, When you use PETSc in quad precision you need to ./configure with --download-f2cblaslapack this uses a version of BLAS/LAPACK obtained by running f2c on the reference version of BLAS/LAPACK (that is, fortran code from netlib) and then massages the source code for quad precision. The tool we use to do the conversion and massaging is $PETSC_DIR/bin/maint/toclapack.sh It is possible that we do not do all the conversions for everything in LAPACK properly for quad precision. What we have tested with PETSc all seems to handle the full precision but the truth is we do not necessary test all the parts of LAPACK that may depend on such constants. If you find things we missed we would eagerly accept your corrections. Barry > On Nov 13, 2017, at 12:22 PM, Tobias Jawecki wrote: > > Dear all, > > I am interested in computations with higher precision. The application is mainly error analysis of high order Magnus integrators. 
In some cases the asymptotic behavior of the error can only be observed when the error is already on double precision and round-off errors of the subroutines can matter. Computation with higher precision then helps to compute a reference solution. > > Using PETSc/SLEPc with the float128 setting works for me, but I do have some questions on how it works with Lapack/Blas. > Does PETSc use the standard Lapack package with double precision changed to quadruple precision? > I did not find a lot about using the standard Lapack with quadruple precision online, I just saw that all the routines in Lapack expect double precision input (according to netlib.org). > > For example the Lapack routine dstevr for the eigenvalue computation of small problems which appears in the Lanczos code of SLEPc. > > I assume some Lapack methods use parameters which are adjusted on double precision? > > To give an example what kind of parameters would need to be considered: > Using the Matrix Function structure of SLEPc to compute the Matrix Exponential Function with Krylov methods seems to use a Pad? approximation for the small problem. The used Pad? approximation is of order 6 with scaling and squaring s.t. |H 2^(-s)|<0.5. This choice of parameter leads to an error smaller than 1e-16 (N. Higham book on Matrix Functions) for approximating the Matrix Exponential of the small problem. The solution is then correct only up to double precision even if higher precision is used for the computations. > > My question is how PETSc and SLEPc handle the Lapack methods which are needed on quadruple precision. My main concern is that some Lapack methods use parameters which are chosen so that the results are correct up to double precision. > > Thanks for your efforts and greetings, > Tobias Jawecki From jroman at dsic.upv.es Mon Nov 13 12:52:40 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 13 Nov 2017 19:52:40 +0100 Subject: [petsc-users] Lapack with Quadruple Precision in PETSc and SLEPc In-Reply-To: <84C471AF-F20C-40FE-A6C0-004E591B8FD0@mcs.anl.gov> References: <2ccd1b2c-c6b0-34e6-482f-68831343a5af@tuwien.ac.at> <84C471AF-F20C-40FE-A6C0-004E591B8FD0@mcs.anl.gov> Message-ID: Yes. To complement Barry?s answer: The matrix exponential is a particular case, since it is not directly available in LAPACK. First of all, I would suggest to upgrade to slepc-3.8 that has a new implementation of Higham?s method (Pad? up to order 13). This might be more accurate than the basic Pad? of order 6, but still it relies on some parameters that assume double precision (theta_m in Higham?s paper SIMAX 2005). So in this case the problem is in the SLEPc implementation, and has nothing to do with LAPACK. I have to think what needs to be changed for quad precision. Jose > El 13 nov 2017, a las 19:50, Smith, Barry F. escribi?: > > > > Tobias, > > When you use PETSc in quad precision you need to ./configure with --download-f2cblaslapack this uses a version of BLAS/LAPACK obtained by running f2c on the reference version of BLAS/LAPACK (that is, fortran code from netlib) and then massages the source code for quad precision. > > The tool we use to do the conversion and massaging is $PETSC_DIR/bin/maint/toclapack.sh > > It is possible that we do not do all the conversions for everything in LAPACK properly for quad precision. What we have tested with PETSc all seems to handle the full precision but the truth is we do not necessary test all the parts of LAPACK that may depend on such constants. 
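For example, a quad-precision build can be configured along these lines (other options elided; GNU compilers shown, since __float128 support comes from gcc and libquadmath):

./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran \
            --with-precision=__float128 --download-f2cblaslapack

PETSc and SLEPc then link against the converted reference BLAS/LAPACK rather than an optimized double-precision library.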
> > If you find things we missed we would eagerly accept your corrections. > > Barry > >> On Nov 13, 2017, at 12:22 PM, Tobias Jawecki wrote: >> >> Dear all, >> >> I am interested in computations with higher precision. The application is mainly error analysis of high order Magnus integrators. In some cases the asymptotic behavior of the error can only be observed when the error is already on double precision and round-off errors of the subroutines can matter. Computation with higher precision then helps to compute a reference solution. >> >> Using PETSc/SLEPc with the float128 setting works for me, but I do have some questions on how it works with Lapack/Blas. >> Does PETSc use the standard Lapack package with double precision changed to quadruple precision? >> I did not find a lot about using the standard Lapack with quadruple precision online, I just saw that all the routines in Lapack expect double precision input (according to netlib.org). >> >> For example the Lapack routine dstevr for the eigenvalue computation of small problems which appears in the Lanczos code of SLEPc. >> >> I assume some Lapack methods use parameters which are adjusted on double precision? >> >> To give an example what kind of parameters would need to be considered: >> Using the Matrix Function structure of SLEPc to compute the Matrix Exponential Function with Krylov methods seems to use a Pad? approximation for the small problem. The used Pad? approximation is of order 6 with scaling and squaring s.t. |H 2^(-s)|<0.5. This choice of parameter leads to an error smaller than 1e-16 (N. Higham book on Matrix Functions) for approximating the Matrix Exponential of the small problem. The solution is then correct only up to double precision even if higher precision is used for the computations. >> >> My question is how PETSc and SLEPc handle the Lapack methods which are needed on quadruple precision. My main concern is that some Lapack methods use parameters which are chosen so that the results are correct up to double precision. >> >> Thanks for your efforts and greetings, >> Tobias Jawecki > From gregory.meyer at gmail.com Mon Nov 13 13:07:06 2017 From: gregory.meyer at gmail.com (Greg Meyer) Date: Mon, 13 Nov 2017 19:07:06 +0000 Subject: [petsc-users] Building library with PETSc makefile Message-ID: Hi, I'm extending PETSc for my particular application and looking to make my own library. It would be great to do this using PETSc's makefile structure, since I would like to build it based on how PETSc was configured (static vs. shared, with appropriate linker flags, etc). However I've had a bit of trouble parsing the petsc makefile structure to figure out what variables, commands, etc. I should put in my makefile target to do so. Can anyone provide a sample makefile for building a custom library, shared or static depending on PETSc configuration? Thanks in advance, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Nov 13 13:33:17 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 13 Nov 2017 13:33:17 -0600 Subject: [petsc-users] Building library with PETSc makefile In-Reply-To: References: Message-ID: You might want to check on ctetgen on how its using PETSc makefiles to build ctetgen library. 
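In the same spirit, a bare-bones sketch of such a makefile (library and source names are made up; the two included files define the compilers, flags, AR, AR_FLAGS, RANLIB, CLINKER, PETSC_LIB, etc. for the configured PETSC_DIR/PETSC_ARCH, and recipe lines must start with a tab):

CFLAGS   =
SOURCESC = mylib1.c mylib2.c
OBJS     = $(SOURCESC:.c=.o)

include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

libmylib.a: $(OBJS)
	${AR} ${AR_FLAGS} $@ $(OBJS)
	${RANLIB} $@

libmylib.so: $(OBJS)
	${CLINKER} -shared -o $@ $(OBJS) ${PETSC_LIB}

Which of the two targets to build can then be decided from the variables the PETSc conf files define, so the library follows the static/shared choice made when PETSc itself was configured.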
You can get this with --download-ctetgen or https://bitbucket.org/petsc/ctetgen [this uses the 'all-legacy' infrastructure - not the currently used 'all-gnumake'] Satish On Mon, 13 Nov 2017, Greg Meyer wrote: > Hi, > > I'm extending PETSc for my particular application and looking to make my > own library. It would be great to do this using PETSc's makefile structure, > since I would like to build it based on how PETSc was configured (static > vs. shared, with appropriate linker flags, etc). However I've had a bit of > trouble parsing the petsc makefile structure to figure out what variables, > commands, etc. I should put in my makefile target to do so. > > Can anyone provide a sample makefile for building a custom library, shared > or static depending on PETSc configuration? > > Thanks in advance, > Greg > From stefano.zampini at gmail.com Mon Nov 13 14:41:54 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 13 Nov 2017 23:41:54 +0300 Subject: [petsc-users] Building library with PETSc makefile In-Reply-To: References: Message-ID: Here's another example https://bitbucket.org/dalcinl/petiga/ Il 13 Nov 2017 10:33 PM, "Satish Balay" ha scritto: > You might want to check on ctetgen on how its using PETSc makefiles to > build ctetgen library. > > You can get this with --download-ctetgen or https://bitbucket.org/petsc/ > ctetgen > > [this uses the 'all-legacy' infrastructure - not the currently used > 'all-gnumake'] > > Satish > > On Mon, 13 Nov 2017, Greg Meyer wrote: > > > Hi, > > > > I'm extending PETSc for my particular application and looking to make my > > own library. It would be great to do this using PETSc's makefile > structure, > > since I would like to build it based on how PETSc was configured (static > > vs. shared, with appropriate linker flags, etc). However I've had a bit > of > > trouble parsing the petsc makefile structure to figure out what > variables, > > commands, etc. I should put in my makefile target to do so. > > > > Can anyone provide a sample makefile for building a custom library, > shared > > or static depending on PETSc configuration? > > > > Thanks in advance, > > Greg > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Mon Nov 13 22:49:47 2017 From: epscodes at gmail.com (Xiangdong) Date: Mon, 13 Nov 2017 23:49:47 -0500 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: 1) How about the vectorization of BAIJ format? If the block size s is 2 or 4, would it be ideal for AVXs? Do I need to do anything special (more than AVX flag) for the compiler to vectorize it? 2) Could you please update the linear solver table to label the preconditioners/solvers compatible with ELL format? http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html Thank you. Xiangdong On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong wrote: > Most operations in PETSc would not benefit much from vectorization since > they are memory-bounded. But this does not discourage you from compiling > PETSc with AVX2/AVX512. We have added a new matrix format (currently named > ELL, but will be changed to SELL shortly) that can make MatMult ~2X faster > than the AIJ format. The MatMult kernel is hand-optimized with AVX > intrinsics. It works on any Intel processors that support AVX or AVX2 or > AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we > have been optimizing the AIJ MatMult kernel for these architectures as > well. 
And one has to use AVX compiler flags in order to take advantage of > the optimized kernels and the new matrix format. > > Hong (Mr.) > > > On Nov 12, 2017, at 10:35 PM, Xiangdong wrote: > > > > Hello everyone, > > > > Can someone comment on the vectorization of PETSc? For example, for the > MatMult function, will it perform better or run faster if it is compiled > with avx2 or avx512? > > > > Thank you. > > > > Best, > > Xiangdong > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Nov 14 14:13:03 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 14 Nov 2017 20:13:03 +0000 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: On Nov 13, 2017, at 10:49 PM, Xiangdong > wrote: 1) How about the vectorization of BAIJ format? BAIJ kernels are optimized with manual unrolling, but not with AVX intrinsics. So the vectorization relies on the compiler's ability. It may or may not get vectorized depending on the compiler's optimization decisions. But vectorization is not essential for the performance of most BAIJ kernels. If the block size s is 2 or 4, would it be ideal for AVXs? Do I need to do anything special (more than AVX flag) for the compiler to vectorize it? In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal for AVX512. But other block sizes would make vectorization less profitable because of the remainders. 2) Could you please update the linear solver table to label the preconditioners/solvers compatible with ELL format? http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html This is still in a working progress. The easiest thing to do would be to use ELL for the Jacobian matrix and other formats (e.g. AIJ) for the preconditioners. Then you would not need to worry about which preconditioners are compatible. An example can be found at ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c. For preconditioners such as block jacobi and mg (with bjacobi or with sor), you can use ELL for both the preconditioner and the Jacobian, and expect a considerable gain since MatMult is the dominating operation. The makefile for ex5adj includes a few use cases that demonstrate how ELL plays with various preconditioners. Hong (Mr.) Thank you. Xiangdong On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong > wrote: Most operations in PETSc would not benefit much from vectorization since they are memory-bounded. But this does not discourage you from compiling PETSc with AVX2/AVX512. We have added a new matrix format (currently named ELL, but will be changed to SELL shortly) that can make MatMult ~2X faster than the AIJ format. The MatMult kernel is hand-optimized with AVX intrinsics. It works on any Intel processors that support AVX or AVX2 or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we have been optimizing the AIJ MatMult kernel for these architectures as well. And one has to use AVX compiler flags in order to take advantage of the optimized kernels and the new matrix format. Hong (Mr.) > On Nov 12, 2017, at 10:35 PM, Xiangdong > wrote: > > Hello everyone, > > Can someone comment on the vectorization of PETSc? For example, for the MatMult function, will it perform better or run faster if it is compiled with avx2 or avx512? > > Thank you. > > Best, > Xiangdong -------------- next part -------------- An HTML attachment was scrubbed... 
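Following the block-size guidance above, a small fragment (to be placed inside a function; sizes are illustrative and error checking is omitted) that creates a parallel BAIJ matrix with block size 4 and inserts one 4x4 block:

Mat         A;
PetscInt    bs = 4, nblk = 100;        /* 100 local block rows, purely illustrative */
PetscInt    ib = 0, jb = 0;            /* block row / block column indices */
PetscScalar blk[16] = {0.0};           /* one 4x4 block, row-major by default */

MatCreateBAIJ(PETSC_COMM_WORLD,bs,bs*nblk,bs*nblk,PETSC_DETERMINE,PETSC_DETERMINE,
              7,NULL,6,NULL,&A);       /* preallocate ~7 diagonal + 6 off-diagonal blocks per block row */
MatSetValuesBlocked(A,1,&ib,1,&jb,blk,INSERT_VALUES);  /* indices here are block indices, not point indices */
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);

Whether the unrolled bs=4 kernels are actually turned into AVX code is still up to the compiler, as noted above.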
URL: From rtmills at anl.gov Tue Nov 14 15:56:33 2017 From: rtmills at anl.gov (Richard Tran Mills) Date: Tue, 14 Nov 2017 13:56:33 -0800 Subject: [petsc-users] questions about vectorization In-Reply-To: <35272c2bb769472b852a569f88718de9@CY1PR09MB0741.namprd09.prod.outlook.com> References: <35272c2bb769472b852a569f88718de9@CY1PR09MB0741.namprd09.prod.outlook.com> Message-ID: Xiangdong, If you are running on an Intel-based system with support for recent instruction sets like AVX2 or AVX-512, and you have access to the Intel compilers, then telling the compiler to target these instruction sets (e.g., "-xCORE-AVX2" or "-xMIC-AVX512") will probably give you some noticeable gain in performance. It will be much less than you would expect from something very CPU-bound like xGEMM code, but, in my experience, it will be noticeable (remember, even if you have a memory-bound code, your code's performance won't be bound by the memory subsystem 100% of the time). I don't know how well the non-Intel compilers are able to auto-vectorize, so your mileage may vary for those. As Hong has pointed out, there are some places in the PETSc source in which we have introduced code using AVX/AVX512 intrinsics. For those codes, you should see benefit with any compiler that supports these intrinsics, as one is not relying on the auto-vectorizer then. Best regards, Richard On Mon, Nov 13, 2017 at 8:32 AM, Zhang, Hong wrote: > Most operations in PETSc would not benefit much from vectorization since > they are memory-bounded. But this does not discourage you from compiling > PETSc with AVX2/AVX512. We have added a new matrix format (currently named > ELL, but will be changed to SELL shortly) that can make MatMult ~2X faster > than the AIJ format. The MatMult kernel is hand-optimized with AVX > intrinsics. It works on any Intel processors that support AVX or AVX2 or > AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we > have been optimizing the AIJ MatMult kernel for these architectures as > well. And one has to use AVX compiler flags in order to take advantage of > the optimized kernels and the new matrix format. > > Hong (Mr.) > > > On Nov 12, 2017, at 10:35 PM, Xiangdong wrote: > > > > Hello everyone, > > > > Can someone comment on the vectorization of PETSc? For example, for the > MatMult function, will it perform better or run faster if it is compiled > with avx2 or avx512? > > > > Thank you. > > > > Best, > > Xiangdong > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rtmills at anl.gov Tue Nov 14 16:40:55 2017 From: rtmills at anl.gov (Richard Tran Mills) Date: Tue, 14 Nov 2017 14:40:55 -0800 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong wrote: > > > On Nov 13, 2017, at 10:49 PM, Xiangdong wrote: > > 1) How about the vectorization of BAIJ format? > > > BAIJ kernels are optimized with manual unrolling, but not with AVX > intrinsics. So the vectorization relies on the compiler's ability. > It may or may not get vectorized depending on the compiler's optimization > decisions. But vectorization is not essential for the performance of most > BAIJ kernels. > I know that this has come up in previous discussions, but I'm guessing that the manual unrolling actually impedes the ability of many modern compilers to optimize the BAIJ calculations. I suppose we ought to have a switch to enable or disable the use of the unrolled versions? 
(And, further down the road, some sort of performance model to tell us what the setting for the switch should be...) --Richard > If the block size s is 2 or 4, would it be ideal for AVXs? Do I need to do > anything special (more than AVX flag) for the compiler to vectorize it? > > > In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal > for AVX512. But other block sizes would make vectorization less profitable > because of the remainders. > > 2) Could you please update the linear solver table to label the > preconditioners/solvers compatible with ELL format? > http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html > > > This is still in a working progress. The easiest thing to do would be to > use ELL for the Jacobian matrix and other formats (e.g. AIJ) for the > preconditioners. > Then you would not need to worry about which preconditioners are > compatible. An example can be found at ts/examples/tutorials/ > advection-diffusion-reaction/ex5adj.c. > For preconditioners such as block jacobi and mg (with bjacobi or with > sor), you can use ELL for both the preconditioner and the Jacobian, > and expect a considerable gain since MatMult is the dominating operation. > > The makefile for ex5adj includes a few use cases that demonstrate how ELL > plays with various preconditioners. > > Hong (Mr.) > > Thank you. > > Xiangdong > > On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong wrote: > >> Most operations in PETSc would not benefit much from vectorization since >> they are memory-bounded. But this does not discourage you from compiling >> PETSc with AVX2/AVX512. We have added a new matrix format (currently named >> ELL, but will be changed to SELL shortly) that can make MatMult ~2X faster >> than the AIJ format. The MatMult kernel is hand-optimized with AVX >> intrinsics. It works on any Intel processors that support AVX or AVX2 or >> AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we >> have been optimizing the AIJ MatMult kernel for these architectures as >> well. And one has to use AVX compiler flags in order to take advantage of >> the optimized kernels and the new matrix format. >> >> Hong (Mr.) >> >> > On Nov 12, 2017, at 10:35 PM, Xiangdong wrote: >> > >> > Hello everyone, >> > >> > Can someone comment on the vectorization of PETSc? For example, for the >> MatMult function, will it perform better or run faster if it is compiled >> with avx2 or avx512? >> > >> > Thank you. >> > >> > Best, >> > Xiangdong >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 14 16:42:56 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 14 Nov 2017 22:42:56 +0000 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: <4ED000A6-61AA-4778-A07B-6A1D625BB941@mcs.anl.gov> Use MKL versions of block formats? > On Nov 14, 2017, at 4:40 PM, Richard Tran Mills wrote: > > On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong wrote: > > >> On Nov 13, 2017, at 10:49 PM, Xiangdong wrote: >> >> 1) How about the vectorization of BAIJ format? > > BAIJ kernels are optimized with manual unrolling, but not with AVX intrinsics. So the vectorization relies on the compiler's ability. > It may or may not get vectorized depending on the compiler's optimization decisions. But vectorization is not essential for the performance of most BAIJ kernels. 
> > I know that this has come up in previous discussions, but I'm guessing that the manual unrolling actually impedes the ability of many modern compilers to optimize the BAIJ calculations. I suppose we ought to have a switch to enable or disable the use of the unrolled versions? (And, further down the road, some sort of performance model to tell us what the setting for the switch should be...) > > --Richard > > >> If the block size s is 2 or 4, would it be ideal for AVXs? Do I need to do anything special (more than AVX flag) for the compiler to vectorize it? > > In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal for AVX512. But other block sizes would make vectorization less profitable because of the remainders. > >> 2) Could you please update the linear solver table to label the preconditioners/solvers compatible with ELL format? >> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html > > This is still in a working progress. The easiest thing to do would be to use ELL for the Jacobian matrix and other formats (e.g. AIJ) for the preconditioners. > Then you would not need to worry about which preconditioners are compatible. An example can be found at ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c. > For preconditioners such as block jacobi and mg (with bjacobi or with sor), you can use ELL for both the preconditioner and the Jacobian, > and expect a considerable gain since MatMult is the dominating operation. > > The makefile for ex5adj includes a few use cases that demonstrate how ELL plays with various preconditioners. > > Hong (Mr.) > >> Thank you. >> >> Xiangdong >> >> On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong wrote: >> Most operations in PETSc would not benefit much from vectorization since they are memory-bounded. But this does not discourage you from compiling PETSc with AVX2/AVX512. We have added a new matrix format (currently named ELL, but will be changed to SELL shortly) that can make MatMult ~2X faster than the AIJ format. The MatMult kernel is hand-optimized with AVX intrinsics. It works on any Intel processors that support AVX or AVX2 or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we have been optimizing the AIJ MatMult kernel for these architectures as well. And one has to use AVX compiler flags in order to take advantage of the optimized kernels and the new matrix format. >> >> Hong (Mr.) >> >> > On Nov 12, 2017, at 10:35 PM, Xiangdong wrote: >> > >> > Hello everyone, >> > >> > Can someone comment on the vectorization of PETSc? For example, for the MatMult function, will it perform better or run faster if it is compiled with avx2 or avx512? >> > >> > Thank you. >> > >> > Best, >> > Xiangdong >> >> > > From rtmills at anl.gov Tue Nov 14 16:59:50 2017 From: rtmills at anl.gov (Richard Tran Mills) Date: Tue, 14 Nov 2017 14:59:50 -0800 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: Yes, that's worth a try. Xiangdong, if you want to employ the MKL implementations for BAIJ MatMult() and friends, you can do so by configuring petsc-master with a recent version of MKL and then using the option "-mat_type baijmkl" (on the command line or set in your PETSC_OPTIONS environment variable). Note that the above requires a version of MKL that is recent enough to have the sparse inspector-executor routines. MKL is now free, so I recommend installing the latest version. 
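A possible way to try this on an application that already reads runtime options (the executable name is made up):

mpiexec -n 8 ./app -mat_type baijmkl -log_view

Comparing the MatMult line of -log_view with and without the option shows whether the MKL kernels help for your particular matrices.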
(You can also try using the sparse MKL routines with AIJ format matrices by using either "-mat_type aijmkl" or "-mat_seqaij_type seqaijmkl". This will use MKL for MatMult()-type operations and some sparse matrix-matrix products.) Best regards, Richard On Tue, Nov 14, 2017 at 2:42 PM, Smith, Barry F. wrote: > > Use MKL versions of block formats? > > > On Nov 14, 2017, at 4:40 PM, Richard Tran Mills wrote: > > > > On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong wrote: > > > > > >> On Nov 13, 2017, at 10:49 PM, Xiangdong wrote: > >> > >> 1) How about the vectorization of BAIJ format? > > > > BAIJ kernels are optimized with manual unrolling, but not with AVX > intrinsics. So the vectorization relies on the compiler's ability. > > It may or may not get vectorized depending on the compiler's > optimization decisions. But vectorization is not essential for the > performance of most BAIJ kernels. > > > > I know that this has come up in previous discussions, but I'm guessing > that the manual unrolling actually impedes the ability of many modern > compilers to optimize the BAIJ calculations. I suppose we ought to have a > switch to enable or disable the use of the unrolled versions? (And, further > down the road, some sort of performance model to tell us what the setting > for the switch should be...) > > > > --Richard > > > > > >> If the block size s is 2 or 4, would it be ideal for AVXs? Do I need to > do anything special (more than AVX flag) for the compiler to vectorize it? > > > > In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal > for AVX512. But other block sizes would make vectorization less profitable > because of the remainders. > > > >> 2) Could you please update the linear solver table to label the > preconditioners/solvers compatible with ELL format? > >> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html > > > > This is still in a working progress. The easiest thing to do would be to > use ELL for the Jacobian matrix and other formats (e.g. AIJ) for the > preconditioners. > > Then you would not need to worry about which preconditioners are > compatible. An example can be found at ts/examples/tutorials/ > advection-diffusion-reaction/ex5adj.c. > > For preconditioners such as block jacobi and mg (with bjacobi or with > sor), you can use ELL for both the preconditioner and the Jacobian, > > and expect a considerable gain since MatMult is the dominating operation. > > > > The makefile for ex5adj includes a few use cases that demonstrate how > ELL plays with various preconditioners. > > > > Hong (Mr.) > > > >> Thank you. > >> > >> Xiangdong > >> > >> On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong > wrote: > >> Most operations in PETSc would not benefit much from vectorization > since they are memory-bounded. But this does not discourage you from > compiling PETSc with AVX2/AVX512. We have added a new matrix format > (currently named ELL, but will be changed to SELL shortly) that can make > MatMult ~2X faster than the AIJ format. The MatMult kernel is > hand-optimized with AVX intrinsics. It works on any Intel processors that > support AVX or AVX2 or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. > On the other hand, we have been optimizing the AIJ MatMult kernel for these > architectures as well. And one has to use AVX compiler flags in order to > take advantage of the optimized kernels and the new matrix format. > >> > >> Hong (Mr.) 
> >> > >> > On Nov 12, 2017, at 10:35 PM, Xiangdong wrote: > >> > > >> > Hello everyone, > >> > > >> > Can someone comment on the vectorization of PETSc? For example, for > the MatMult function, will it perform better or run faster if it is > compiled with avx2 or avx512? > >> > > >> > Thank you. > >> > > >> > Best, > >> > Xiangdong > >> > >> > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Tue Nov 14 22:54:55 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Wed, 15 Nov 2017 17:54:55 +1300 Subject: [petsc-users] ISGlobalToLocalMappingApplyBlock Message-ID: <5b548602-6be9-a0e0-fe9b-789ed4289ea4@auckland.ac.nz> hi I'm trying to use ISGlobalToLocalMappingApplyBlock() and am a bit puzzled about the results it's giving. I've attached a small test to illustrate. It just sets up a local-to-global mapping with 10 elements. Running on two processes the first has global indices 0 - 4 and the the second has 5 - 9. I then try to find the local index corresponding to global index 8. If I set the blocksize parameter to 1, it correctly gives the results -1 on rank 0 and 3 on rank 1. But if I set the blocksize to 2 (or more), the results are -253701943 on rank 0 and -1 on rank 1. Neither of these are what I expected- I thought they should be the same as in the blocksize 1 case. I'm presuming the global indices I pass in to ISGlobalToLocalMappingApplyBlock() should be global block indices (i.e. not scaled up by blocksize). If I do scale them up it doesn't give the answers I expect either. Or am I wrong to expect this to give the same results regardless of blocksize? Cheers, Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- A non-text attachment was scrubbed... Name: testl2g.F90 Type: text/x-fortran Size: 3011 bytes Desc: not available URL: From matteo.semplice at unito.it Wed Nov 15 02:11:22 2017 From: matteo.semplice at unito.it (Matteo Semplice) Date: Wed, 15 Nov 2017 09:11:22 +0100 Subject: [petsc-users] indices into Vec/Mat associated to a DMPlex Message-ID: Hi. I am struggling with indices into matrices associated to a DMPLex mesh. I can explain my problem to the following minimal example. Let's say I want to assemble the matrix to solve an equation (say Laplace) with data attached to cells and the finite volume method. In principle I - loop over the faces of the DMPlex (height=1) - for each face, find neighbouring cells (say points i and j in the DMPlex) and compute the contributions coming from that face (in the example it would be something like (u_j-u_i)* with x=centroids, n=scaled normal to face and <,> the inner product) - insert the contributions +/- in rows/columns n(i) and n(j) of the matrix where n(k) is the index into Vec/Mat of the unknown associated to the k-th cell My problem is how to find out n(k). I assume that the Section should be able to tell me, but I cannot find the correct function to call. I see that FEM methods can use DMPlexMatSetClosure but here we'd need a DMPlexMatSetCone... On a side note, I noticed that the grad field in PetscFVFaceGeom is not computed by DMPlexComputeGeometryFVM. Is it meant or could it be used to store the precomputed? values? But even if so, this wouldn't sovle the problem of where to put the elements in the matrix, right? 
Matteo From dave.mayhem23 at gmail.com Wed Nov 15 02:34:17 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Wed, 15 Nov 2017 08:34:17 +0000 Subject: [petsc-users] ISGlobalToLocalMappingApplyBlock In-Reply-To: <5b548602-6be9-a0e0-fe9b-789ed4289ea4@auckland.ac.nz> References: <5b548602-6be9-a0e0-fe9b-789ed4289ea4@auckland.ac.nz> Message-ID: On Wed, 15 Nov 2017 at 05:55, Adrian Croucher wrote: > hi > > I'm trying to use ISGlobalToLocalMappingApplyBlock() and am a bit > puzzled about the results it's giving. > > I've attached a small test to illustrate. It just sets up a > local-to-global mapping with 10 elements. Running on two processes the > first has global indices 0 - 4 and the the second has 5 - 9. I then try > to find the local index corresponding to global index 8. > > If I set the blocksize parameter to 1, it correctly gives the results -1 > on rank 0 and 3 on rank 1. > > But if I set the blocksize to 2 (or more), the results are -253701943 on > rank 0 and -1 on rank 1. Neither of these are what I expected- I thought > they should be the same as in the blocksize 1 case. The man page says to use "block global numbering" > > I'm presuming the global indices I pass in to > ISGlobalToLocalMappingApplyBlock() should be global block indices (i.e. > not scaled up by blocksize). Yes, the indices should relate to the blocks If I do scale them up it doesn't give the > answers I expect either. > > Or am I wrong to expect this to give the same results regardless of > blocksize? Yep. However the large negative number being printed looks an uninitialized variable. This seems odd as with mode = MASK nout should equal N and any requested block indices not in the IS should result in -1 being inserted in your local_indices array. What's the value of nout? Thanks, Dave > > Cheers, Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Nov 15 04:39:09 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 Nov 2017 05:39:09 -0500 Subject: [petsc-users] indices into Vec/Mat associated to a DMPlex In-Reply-To: References: Message-ID: On Wed, Nov 15, 2017 at 3:11 AM, Matteo Semplice wrote: > Hi. > > I am struggling with indices into matrices associated to a DMPLex mesh. I > can explain my problem to the following minimal example. > > Let's say I want to assemble the matrix to solve an equation (say Laplace) > with data attached to cells and the finite volume method. In principle I > > - loop over the faces of the DMPlex (height=1) > > - for each face, find neighbouring cells (say points i and j in the > DMPlex) and compute the contributions coming from that face (in the example > it would be something like (u_j-u_i)* with x=centroids, n=scaled > normal to face and <,> the inner product) > > - insert the contributions +/- in rows/columns n(i) and n(j) of > the matrix where n(k) is the index into Vec/Mat of the unknown associated > to the k-th cell > > My problem is how to find out n(k). I assume that the Section should be > able to tell me, but I cannot find the correct function to call. I see that > FEM methods can use DMPlexMatSetClosure but here we'd need a > DMPlexMatSetCone... > Everything always reduces to raw Section calls. 
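In sketch form, the face loop asked about would then look roughly like this (dm and a preallocated Mat A are assumed, one unknown per cell, ComputeFaceCoefficient stands in for the user's flux weight, and DMGetDefaultGlobalSection is the 3.8-era name of the global-section getter):

PetscSection gsec;
PetscInt     fStart,fEnd,f;

DMGetDefaultGlobalSection(dm,&gsec);
DMPlexGetHeightStratum(dm,1,&fStart,&fEnd);      /* faces have height 1 */
for (f=fStart; f<fEnd; f++) {                    /* NOTE: each face must be assembled by only one rank */
  const PetscInt *cells;
  PetscInt        ncells,ri,rj;
  PetscScalar     w;

  DMPlexGetSupportSize(dm,f,&ncells);
  if (ncells != 2) continue;                     /* boundary faces skipped in this sketch */
  DMPlexGetSupport(dm,f,&cells);
  PetscSectionGetOffset(gsec,cells[0],&ri);      /* global-section offset = global row index */
  PetscSectionGetOffset(gsec,cells[1],&rj);
  if (ri < 0) ri = -(ri+1);                      /* unowned points come back encoded as -(off+1) */
  if (rj < 0) rj = -(rj+1);
  w = ComputeFaceCoefficient(dm,f,cells[0],cells[1]);   /* placeholder for the <n, x_j - x_i> weight */
  MatSetValue(A,ri,ri,-w,ADD_VALUES);
  MatSetValue(A,ri,rj, w,ADD_VALUES);
  MatSetValue(A,rj,rj,-w,ADD_VALUES);
  MatSetValue(A,rj,ri, w,ADD_VALUES);
}
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);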
For instance, for cell c PetscSectionGetDof(sec, c, &dof); PetscSectionGetOffset(sec, c, &off); give the number of degrees of freedom on the cell, and the offset into storage (a local vector for the local section, and global vector for the global section). The Closure stuff just calls this for every point in the closure. > On a side note, I noticed that the grad field in PetscFVFaceGeom is not > computed by DMPlexComputeGeometryFVM. Is it meant or could it be used to > store the precomputed values? There are separate functions for reconstructing the gradient since it is so expensive (and can be inaccurate for some unstructured cases at boundaries). > But even if so, this wouldn't sovle the problem of where to put the > elements in the matrix, right? No. Matt > > Matteo -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at unito.it Wed Nov 15 05:09:21 2017 From: matteo.semplice at unito.it (Matteo Semplice) Date: Wed, 15 Nov 2017 12:09:21 +0100 Subject: [petsc-users] indices into Vec/Mat associated to a DMPlex In-Reply-To: References: Message-ID: On 15/11/2017 11:39, Matthew Knepley wrote: > On Wed, Nov 15, 2017 at 3:11 AM, Matteo Semplice > > wrote: > > Hi. > > I am struggling with indices into matrices associated to a DMPLex > mesh. I can explain my problem to the following minimal example. > > Let's say I want to assemble the matrix to solve an equation (say > Laplace) with data attached to cells and the finite volume method. > In principle I > > - loop over the faces of the DMPlex (height=1) > > - for each face, find neighbouring cells (say points i and j in > the DMPlex) and compute the contributions coming from that face > (in the example it would be something like (u_j-u_i)* > with x=centroids, n=scaled normal to face and <,> the inner product) > > - insert the contributions +/- in rows/columns n(i) and > n(j) of the matrix where n(k) is the index into Vec/Mat of the > unknown associated to the k-th cell > > My problem is how to find out n(k). I assume that the Section > should be able to tell me, but I cannot find the correct function > to call. I see that FEM methods can use DMPlexMatSetClosure but > here we'd need a DMPlexMatSetCone... > > > Everything always reduces to raw Section calls. For instance, for cell c > > ? PetscSectionGetDof(sec, c, &dof); > ? PetscSectionGetOffset(sec, c, &off); > > give the number of degrees of freedom on the cell, and the offset into > storage (a local vector for the local section, and global vector for > the global section). > The Closure stuff just calls this for every point in the closure. All right, so the offset is also the (global) index and that is to be used in calls to MatSetValues. Thanks a lot! ??? Matteo -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Nov 15 05:16:04 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 Nov 2017 06:16:04 -0500 Subject: [petsc-users] indices into Vec/Mat associated to a DMPlex In-Reply-To: References: Message-ID: On Wed, Nov 15, 2017 at 6:09 AM, Matteo Semplice wrote: > On 15/11/2017 11:39, Matthew Knepley wrote: > > On Wed, Nov 15, 2017 at 3:11 AM, Matteo Semplice > wrote: > >> Hi. >> >> I am struggling with indices into matrices associated to a DMPLex mesh. 
I >> can explain my problem to the following minimal example. >> >> Let's say I want to assemble the matrix to solve an equation (say >> Laplace) with data attached to cells and the finite volume method. In >> principle I >> >> - loop over the faces of the DMPlex (height=1) >> >> - for each face, find neighbouring cells (say points i and j in the >> DMPlex) and compute the contributions coming from that face (in the example >> it would be something like (u_j-u_i)* with x=centroids, n=scaled >> normal to face and <,> the inner product) >> >> - insert the contributions +/- in rows/columns n(i) and n(j) >> of the matrix where n(k) is the index into Vec/Mat of the unknown >> associated to the k-th cell >> >> My problem is how to find out n(k). I assume that the Section should be >> able to tell me, but I cannot find the correct function to call. I see that >> FEM methods can use DMPlexMatSetClosure but here we'd need a >> DMPlexMatSetCone... >> > > Everything always reduces to raw Section calls. For instance, for cell c > > PetscSectionGetDof(sec, c, &dof); > PetscSectionGetOffset(sec, c, &off); > > give the number of degrees of freedom on the cell, and the offset into > storage (a local vector for the local section, and global vector for the > global section). > The Closure stuff just calls this for every point in the closure. > > > All right, so the offset is also the (global) index and that is to be used > in calls to MatSetValues. > Not quite. For all owned dofs, it is (in the global section). For all unowned dofs, it is -(off+1), so you have to convert it if you want to set off-process values. Thanks, Matt > Thanks a lot! > > Matteo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Nov 15 06:38:55 2017 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 15 Nov 2017 07:38:55 -0500 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? Message-ID: I've found ILU(0) or (1) to be working well for my problem, but the petsc implementation is serial only. Running with -pc_type hypre -pc_hypre_type pilut with default settings has considerably worse convergence. I've tried using -pc_hypre_pilut_factorrowsize (number of actual elements in row) to trick it into doing ILU(0), to no effect. Is there any way to recover classical ILU(k) from pilut? Hypre's docs state pilut is no longer supported, and Euclid should be used for anything moving forward. pc_hypre_boomeramg has options for Euclid smoothers. Any hope of a pc_hypre_type euclid? Partially unrelated, PC block-jacobi fails with MFFD type not supported, but additive schwarz with 0 overlap, which I think is identical, works fine. Is this a bug? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 15 07:47:55 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 15 Nov 2017 13:47:55 +0000 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? In-Reply-To: References: Message-ID: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> > On Nov 15, 2017, at 6:38 AM, Mark Lohry wrote: > > I've found ILU(0) or (1) to be working well for my problem, but the petsc implementation is serial only. 
Running with -pc_type hypre -pc_hypre_type pilut with default settings has considerably worse convergence. I've tried using -pc_hypre_pilut_factorrowsize (number of actual elements in row) to trick it into doing ILU(0), to no effect. > > Is there any way to recover classical ILU(k) from pilut? > > Hypre's docs state pilut is no longer supported, and Euclid should be used for anything moving forward. pc_hypre_boomeramg has options for Euclid smoothers. Any hope of a pc_hypre_type euclid? Not unless someone outside the PETSc team decides to put it back in. > > > Partially unrelated, PC block-jacobi fails with MFFD type not supported, but additive schwarz with 0 overlap, which I think is identical, works fine. Is this a bug? Huh, is this related to hypre, or plan PETSc? Please send all information, command line options etc that reproduce the problem, preferably on a PETSc example. Barry > > > Thanks, > Mark From mlohry at gmail.com Wed Nov 15 07:55:37 2017 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 15 Nov 2017 08:55:37 -0500 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? In-Reply-To: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> References: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> Message-ID: > > > > Partially unrelated, PC block-jacobi fails with MFFD type not supported, > but additive schwarz with 0 overlap, which I think is identical, works > fine. Is this a bug? > > Huh, is this related to hypre, or plan PETSc? Please send all > information, command line options etc that reproduce the problem, > preferably on a PETSc example. Unrelated to hypre, pure petsc. Using: SNESSetJacobian(ctx.snes, ctx.JPre, ctx.JPre, SNESComputeJacobianDefaultColor, fdcoloring); and -snes_mf_operator, -pc_type asm works as expected, ksp_view: PC Object: 32 MPI processes type: asm total subdomain blocks = 32, amount of overlap = 1 restriction/interpolation type - RESTRICT Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (sub_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. 
Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=3600, cols=3600 package used to perform factorization: petsc total: nonzeros=690000, allocated nonzeros=690000 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 720 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=3600, cols=3600 total: nonzeros=690000, allocated nonzeros=690000 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 720 nodes, limit used is 5 linear system matrix followed by preconditioner matrix: Mat Object: 32 MPI processes type: mffd rows=76800, cols=76800 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: 32 MPI processes type: mpiaij rows=76800, cols=76800 total: nonzeros=16320000, allocated nonzeros=16320000 total number of mallocs used during MatSetValues calls =0 -pc_type bjacobi: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Not coded for this matrix type [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.8.1, Nov, 04, 2017 [0]PETSC ERROR: maDG on a arch-linux2-c-opt named templeton by mlohry Wed Nov 15 08:53:20 2017 [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/dev/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=mpicc --with-cxx=mpic++ --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 --with-shared-libraries=1 --download-parmetis --download-metis --download-hypre=yes --download-superlu_dist=yes --with-64-bit-indices [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 307 in /home/mlohry/dev/build/external/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 119 in /home/mlohry/dev/build/external/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c [0]PETSC ERROR: #3 PCSetUp() line 924 in /home/mlohry/dev/build/external/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #4 KSPSetUp() line 381 in /home/mlohry/dev/build/external/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #5 KSPSolve() line 612 in /home/mlohry/dev/build/external/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 224 in /home/mlohry/dev/build/external/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #7 SNESSolve() line 4108 in /home/mlohry/dev/build/external/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #8 TS_SNESSolve() line 176 in /home/mlohry/dev/build/external/petsc/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #9 TSStep_Theta() line 216 in /home/mlohry/dev/build/external/petsc/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #10 TSStep() line 4120 in /home/mlohry/dev/build/external/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #11 TSSolve() line 4373 in /home/mlohry/dev/build/external/petsc/src/ts/interface/ts.c On Wed, Nov 15, 2017 at 8:47 AM, Smith, Barry F. wrote: > > > > On Nov 15, 2017, at 6:38 AM, Mark Lohry wrote: > > > > I've found ILU(0) or (1) to be working well for my problem, but the > petsc implementation is serial only. Running with -pc_type hypre > -pc_hypre_type pilut with default settings has considerably worse > convergence. 
I've tried using -pc_hypre_pilut_factorrowsize (number of > actual elements in row) to trick it into doing ILU(0), to no effect. > > > > Is there any way to recover classical ILU(k) from pilut? > > > > Hypre's docs state pilut is no longer supported, and Euclid should be > used for anything moving forward. pc_hypre_boomeramg has options for Euclid > smoothers. Any hope of a pc_hypre_type euclid? > > Not unless someone outside the PETSc team decides to put it back in. > > > > > > Partially unrelated, PC block-jacobi fails with MFFD type not supported, > but additive schwarz with 0 overlap, which I think is identical, works > fine. Is this a bug? > > Huh, is this related to hypre, or plan PETSc? Please send all > information, command line options etc that reproduce the problem, > preferably on a PETSc example. > > Barry > > > > > > > Thanks, > > Mark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From evanum at gmail.com Wed Nov 15 11:53:48 2017 From: evanum at gmail.com (Evan Um) Date: Wed, 15 Nov 2017 09:53:48 -0800 Subject: [petsc-users] IDR availalbe in PETSC? Message-ID: Dear PETSC users, I was wondering if anyone already tried/developed an induced dimension reduction (IDR) solver for PETSC? I think that it is a useful one but I couldn't find its example with PETSC. If you have any idea about IDR routines for PETSC, please let me know. Thanks! Best, Evan -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Wed Nov 15 14:26:11 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 15 Nov 2017 13:26:11 -0700 Subject: [petsc-users] superlu_dist produces random results Message-ID: Hi, There is a heat conduction problem. When superlu_dist is used as a preconditioner, we have random results from different runs. Is there a random algorithm in superlu_dist? If we use ASM or MUMPS as the preconditioner, we then don't have this issue. 
run 1: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.013384e-02 2 Linear |R| = 4.020995e-08 1 Nonlinear |R| = 1.404678e-02 0 Linear |R| = 1.404678e-02 1 Linear |R| = 5.104757e-08 2 Linear |R| = 7.699637e-14 2 Nonlinear |R| = 5.106418e-08 run 2: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.013384e-02 2 Linear |R| = 4.020995e-08 1 Nonlinear |R| = 1.404678e-02 0 Linear |R| = 1.404678e-02 1 Linear |R| = 5.109913e-08 2 Linear |R| = 7.189091e-14 2 Nonlinear |R| = 5.111591e-08 run 3: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.013384e-02 2 Linear |R| = 4.020995e-08 1 Nonlinear |R| = 1.404678e-02 0 Linear |R| = 1.404678e-02 1 Linear |R| = 5.104942e-08 2 Linear |R| = 7.465572e-14 2 Nonlinear |R| = 5.106642e-08 run 4: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.013384e-02 2 Linear |R| = 4.020995e-08 1 Nonlinear |R| = 1.404678e-02 0 Linear |R| = 1.404678e-02 1 Linear |R| = 5.102730e-08 2 Linear |R| = 7.132220e-14 2 Nonlinear |R| = 5.104442e-08 Solver details: SNES Object: 8 MPI processes type: newtonls maximum iterations=15, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 total number of linear solver iterations=4 total number of function evaluations=7 norm schedule ALWAYS SNESLineSearch Object: 8 MPI processes type: basic maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 8 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=100, initial guess is zero tolerances: relative=1e-06, absolute=1e-50, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 8 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 0., needed 0. Factored matrix follows: Mat Object: 8 MPI processes type: superlu_dist rows=7925, cols=7925 package used to perform factorization: superlu_dist total: nonzeros=0, allocated nonzeros=0 total number of mallocs used during MatSetValues calls =0 SuperLU_DIST run parameters: Process grid nprow 4 x npcol 2 Equilibrate matrix TRUE Matrix input mode 1 Replace tiny pivots FALSE Use iterative refinement TRUE Processors in row 4 col partition 2 Row permutation LargeDiag Column permutation METIS_AT_PLUS_A Parallel symbolic factorization FALSE Repeated factorization SamePattern linear system matrix followed by preconditioner matrix: Mat Object: 8 MPI processes type: mffd rows=7925, cols=7925 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: () 8 MPI processes type: mpiaij rows=7925, cols=7925 total: nonzeros=63587, allocated nonzeros=63865 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 15 14:59:40 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 15 Nov 2017 20:59:40 +0000 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: References: Message-ID: Meaningless differences > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: > > Hi, > > There is a heat conduction problem. 
When superlu_dist is used as a preconditioner, we have random results from different runs. Is there a random algorithm in superlu_dist? If we use ASM or MUMPS as the preconditioner, we then don't have this issue. > > run 1: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.013384e-02 > 2 Linear |R| = 4.020995e-08 > 1 Nonlinear |R| = 1.404678e-02 > 0 Linear |R| = 1.404678e-02 > 1 Linear |R| = 5.104757e-08 > 2 Linear |R| = 7.699637e-14 > 2 Nonlinear |R| = 5.106418e-08 > > > run 2: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.013384e-02 > 2 Linear |R| = 4.020995e-08 > 1 Nonlinear |R| = 1.404678e-02 > 0 Linear |R| = 1.404678e-02 > 1 Linear |R| = 5.109913e-08 > 2 Linear |R| = 7.189091e-14 > 2 Nonlinear |R| = 5.111591e-08 > > run 3: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.013384e-02 > 2 Linear |R| = 4.020995e-08 > 1 Nonlinear |R| = 1.404678e-02 > 0 Linear |R| = 1.404678e-02 > 1 Linear |R| = 5.104942e-08 > 2 Linear |R| = 7.465572e-14 > 2 Nonlinear |R| = 5.106642e-08 > > run 4: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.013384e-02 > 2 Linear |R| = 4.020995e-08 > 1 Nonlinear |R| = 1.404678e-02 > 0 Linear |R| = 1.404678e-02 > 1 Linear |R| = 5.102730e-08 > 2 Linear |R| = 7.132220e-14 > 2 Nonlinear |R| = 5.104442e-08 > > Solver details: > > SNES Object: 8 MPI processes > type: newtonls > maximum iterations=15, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > total number of linear solver iterations=4 > total number of function evaluations=7 > norm schedule ALWAYS > SNESLineSearch Object: 8 MPI processes > type: basic > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 8 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=100, initial guess is zero > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 8 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0., needed 0. 
> Factored matrix follows: > Mat Object: 8 MPI processes > type: superlu_dist > rows=7925, cols=7925 > package used to perform factorization: superlu_dist > total: nonzeros=0, allocated nonzeros=0 > total number of mallocs used during MatSetValues calls =0 > SuperLU_DIST run parameters: > Process grid nprow 4 x npcol 2 > Equilibrate matrix TRUE > Matrix input mode 1 > Replace tiny pivots FALSE > Use iterative refinement TRUE > Processors in row 4 col partition 2 > Row permutation LargeDiag > Column permutation METIS_AT_PLUS_A > Parallel symbolic factorization FALSE > Repeated factorization SamePattern > linear system matrix followed by preconditioner matrix: > Mat Object: 8 MPI processes > type: mffd > rows=7925, cols=7925 > Matrix-free approximation: > err=1.49012e-08 (relative error in function evaluation) > Using wp compute h routine > Does not compute normU > Mat Object: () 8 MPI processes > type: mpiaij > rows=7925, cols=7925 > total: nonzeros=63587, allocated nonzeros=63865 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > > > Fande, > > From fande.kong at inl.gov Wed Nov 15 15:36:57 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 15 Nov 2017 14:36:57 -0700 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: References: Message-ID: Hi Barry, Thanks for your reply. I was wondering why this happens only when we use superlu_dist. I am trying to understand the algorithm in superlu_dist. If we use ASM or MUMPS, we do not produce these differences. The differences actually are NOT meaningless. In fact, we have a real transient application that presents this issue. When we run the simulation with superlu_dist in parallel for thousands of time steps, the final physics solution looks totally different from different runs. The differences are not acceptable any more. For a steady problem, the difference may be meaningless. But it is significant for the transient problem. This makes the solution not reproducible, and we can not even set a targeting solution in the test system because the solution is so different from one run to another. I guess there might/may be a tiny bug in superlu_dist or the PETSc interface to superlu_dist. Fande, On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. wrote: > > Meaningless differences > > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: > > > > Hi, > > > > There is a heat conduction problem. When superlu_dist is used as a > preconditioner, we have random results from different runs. Is there a > random algorithm in superlu_dist? If we use ASM or MUMPS as the > preconditioner, we then don't have this issue. 
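(For reference, the superlu_dist / MUMPS / ASM comparison described above can be driven entirely through the same PC interface; a minimal sketch of switching the direct-solver package in code, assuming an already configured SNES object named "snes" -- variable names here are illustrative, not the application's actual code, and the usual command-line equivalent is -pc_type lu -pc_factor_mat_solver_package superlu_dist:

    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
    ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
    /* swap in MATSOLVERMUMPS here, or PCSetType(pc,PCASM), to reproduce the
       MUMPS / ASM runs with everything else left identical */

Keeping everything else fixed and only changing this one call is what makes the run-to-run comparison between the packages meaningful.)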
> > > > run 1: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020995e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 5.104757e-08 > > 2 Linear |R| = 7.699637e-14 > > 2 Nonlinear |R| = 5.106418e-08 > > > > > > run 2: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020995e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 5.109913e-08 > > 2 Linear |R| = 7.189091e-14 > > 2 Nonlinear |R| = 5.111591e-08 > > > > run 3: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020995e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 5.104942e-08 > > 2 Linear |R| = 7.465572e-14 > > 2 Nonlinear |R| = 5.106642e-08 > > > > run 4: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020995e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 5.102730e-08 > > 2 Linear |R| = 7.132220e-14 > > 2 Nonlinear |R| = 5.104442e-08 > > > > Solver details: > > > > SNES Object: 8 MPI processes > > type: newtonls > > maximum iterations=15, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > > total number of linear solver iterations=4 > > total number of function evaluations=7 > > norm schedule ALWAYS > > SNESLineSearch Object: 8 MPI processes > > type: basic > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 8 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=100, initial guess is zero > > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > > right preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: 8 MPI processes > > type: lu > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0., needed 0. 
> > Factored matrix follows: > > Mat Object: 8 MPI processes > > type: superlu_dist > > rows=7925, cols=7925 > > package used to perform factorization: superlu_dist > > total: nonzeros=0, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > SuperLU_DIST run parameters: > > Process grid nprow 4 x npcol 2 > > Equilibrate matrix TRUE > > Matrix input mode 1 > > Replace tiny pivots FALSE > > Use iterative refinement TRUE > > Processors in row 4 col partition 2 > > Row permutation LargeDiag > > Column permutation METIS_AT_PLUS_A > > Parallel symbolic factorization FALSE > > Repeated factorization SamePattern > > linear system matrix followed by preconditioner matrix: > > Mat Object: 8 MPI processes > > type: mffd > > rows=7925, cols=7925 > > Matrix-free approximation: > > err=1.49012e-08 (relative error in function evaluation) > > Using wp compute h routine > > Does not compute normU > > Mat Object: () 8 MPI processes > > type: mpiaij > > rows=7925, cols=7925 > > total: nonzeros=63587, allocated nonzeros=63865 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > > > > > Fande, > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Nov 15 15:51:31 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 Nov 2017 16:51:31 -0500 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: References: Message-ID: On Wed, Nov 15, 2017 at 4:36 PM, Kong, Fande wrote: > Hi Barry, > > Thanks for your reply. I was wondering why this happens only when we use > superlu_dist. I am trying to understand the algorithm in superlu_dist. If > we use ASM or MUMPS, we do not produce these differences. > > The differences actually are NOT meaningless. In fact, we have a real > transient application that presents this issue. When we run the > simulation with superlu_dist in parallel for thousands of time steps, the > final physics solution looks totally different from different runs. The > differences are not acceptable any more. For a steady problem, the > difference may be meaningless. But it is significant for the transient > problem. > Are you sure this formulation is stable? It does not seem like it. Matt > This makes the solution not reproducible, and we can not even set a > targeting solution in the test system because the solution is so different > from one run to another. I guess there might/may be a tiny bug in > superlu_dist or the PETSc interface to superlu_dist. > > > Fande, > > > > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. > wrote: > >> >> Meaningless differences >> >> >> > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: >> > >> > Hi, >> > >> > There is a heat conduction problem. When superlu_dist is used as a >> preconditioner, we have random results from different runs. Is there a >> random algorithm in superlu_dist? If we use ASM or MUMPS as the >> preconditioner, we then don't have this issue. 
>> > >> > run 1: >> > >> > 0 Nonlinear |R| = 9.447423e+03 >> > 0 Linear |R| = 9.447423e+03 >> > 1 Linear |R| = 1.013384e-02 >> > 2 Linear |R| = 4.020995e-08 >> > 1 Nonlinear |R| = 1.404678e-02 >> > 0 Linear |R| = 1.404678e-02 >> > 1 Linear |R| = 5.104757e-08 >> > 2 Linear |R| = 7.699637e-14 >> > 2 Nonlinear |R| = 5.106418e-08 >> > >> > >> > run 2: >> > >> > 0 Nonlinear |R| = 9.447423e+03 >> > 0 Linear |R| = 9.447423e+03 >> > 1 Linear |R| = 1.013384e-02 >> > 2 Linear |R| = 4.020995e-08 >> > 1 Nonlinear |R| = 1.404678e-02 >> > 0 Linear |R| = 1.404678e-02 >> > 1 Linear |R| = 5.109913e-08 >> > 2 Linear |R| = 7.189091e-14 >> > 2 Nonlinear |R| = 5.111591e-08 >> > >> > run 3: >> > >> > 0 Nonlinear |R| = 9.447423e+03 >> > 0 Linear |R| = 9.447423e+03 >> > 1 Linear |R| = 1.013384e-02 >> > 2 Linear |R| = 4.020995e-08 >> > 1 Nonlinear |R| = 1.404678e-02 >> > 0 Linear |R| = 1.404678e-02 >> > 1 Linear |R| = 5.104942e-08 >> > 2 Linear |R| = 7.465572e-14 >> > 2 Nonlinear |R| = 5.106642e-08 >> > >> > run 4: >> > >> > 0 Nonlinear |R| = 9.447423e+03 >> > 0 Linear |R| = 9.447423e+03 >> > 1 Linear |R| = 1.013384e-02 >> > 2 Linear |R| = 4.020995e-08 >> > 1 Nonlinear |R| = 1.404678e-02 >> > 0 Linear |R| = 1.404678e-02 >> > 1 Linear |R| = 5.102730e-08 >> > 2 Linear |R| = 7.132220e-14 >> > 2 Nonlinear |R| = 5.104442e-08 >> > >> > Solver details: >> > >> > SNES Object: 8 MPI processes >> > type: newtonls >> > maximum iterations=15, maximum function evaluations=10000 >> > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 >> > total number of linear solver iterations=4 >> > total number of function evaluations=7 >> > norm schedule ALWAYS >> > SNESLineSearch Object: 8 MPI processes >> > type: basic >> > maxstep=1.000000e+08, minlambda=1.000000e-12 >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, >> lambda=1.000000e-08 >> > maximum iterations=40 >> > KSP Object: 8 MPI processes >> > type: gmres >> > restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> > happy breakdown tolerance 1e-30 >> > maximum iterations=100, initial guess is zero >> > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. >> > right preconditioning >> > using UNPRECONDITIONED norm type for convergence test >> > PC Object: 8 MPI processes >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: natural >> > factor fill ratio given 0., needed 0. 
>> > Factored matrix follows: >> > Mat Object: 8 MPI processes >> > type: superlu_dist >> > rows=7925, cols=7925 >> > package used to perform factorization: superlu_dist >> > total: nonzeros=0, allocated nonzeros=0 >> > total number of mallocs used during MatSetValues calls =0 >> > SuperLU_DIST run parameters: >> > Process grid nprow 4 x npcol 2 >> > Equilibrate matrix TRUE >> > Matrix input mode 1 >> > Replace tiny pivots FALSE >> > Use iterative refinement TRUE >> > Processors in row 4 col partition 2 >> > Row permutation LargeDiag >> > Column permutation METIS_AT_PLUS_A >> > Parallel symbolic factorization FALSE >> > Repeated factorization SamePattern >> > linear system matrix followed by preconditioner matrix: >> > Mat Object: 8 MPI processes >> > type: mffd >> > rows=7925, cols=7925 >> > Matrix-free approximation: >> > err=1.49012e-08 (relative error in function evaluation) >> > Using wp compute h routine >> > Does not compute normU >> > Mat Object: () 8 MPI processes >> > type: mpiaij >> > rows=7925, cols=7925 >> > total: nonzeros=63587, allocated nonzeros=63865 >> > total number of mallocs used during MatSetValues calls =0 >> > not using I-node (on process 0) routines >> > >> > >> > Fande, >> > >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 15 15:52:18 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 15 Nov 2017 21:52:18 +0000 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: References: Message-ID: > On Nov 15, 2017, at 3:36 PM, Kong, Fande wrote: > > Hi Barry, > > Thanks for your reply. I was wondering why this happens only when we use superlu_dist. I am trying to understand the algorithm in superlu_dist. If we use ASM or MUMPS, we do not produce these differences. > > The differences actually are NOT meaningless. In fact, we have a real transient application that presents this issue. When we run the simulation with superlu_dist in parallel for thousands of time steps, the final physics solution looks totally different from different runs. The differences are not acceptable any more. For a steady problem, the difference may be meaningless. But it is significant for the transient problem. I submit that the "physics solution" of all of these runs is equally right and equally wrong. If the solutions are very different due to a small perturbation than something is wrong with the model or the integrator, I don't think you can blame the linear solver (see below) > > This makes the solution not reproducible, and we can not even set a targeting solution in the test system because the solution is so different from one run to another. I guess there might/may be a tiny bug in superlu_dist or the PETSc interface to superlu_dist. This is possible but it is also possible this is due to normal round off inside of SuperLU dist. Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't really matter exactly how well SuperLU_Dist does. The nonlinear iteration does essential defect correction for you; are you making sure that the nonlinear iteration always works for every timestep? For example confirm that SNESGetConvergedReason() is always positive. > > > Fande, > > > > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. 
wrote: > > Meaningless differences > > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: > > > > Hi, > > > > There is a heat conduction problem. When superlu_dist is used as a preconditioner, we have random results from different runs. Is there a random algorithm in superlu_dist? If we use ASM or MUMPS as the preconditioner, we then don't have this issue. > > > > run 1: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020995e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 5.104757e-08 > > 2 Linear |R| = 7.699637e-14 > > 2 Nonlinear |R| = 5.106418e-08 > > > > > > run 2: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020995e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 5.109913e-08 > > 2 Linear |R| = 7.189091e-14 > > 2 Nonlinear |R| = 5.111591e-08 > > > > run 3: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020995e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 5.104942e-08 > > 2 Linear |R| = 7.465572e-14 > > 2 Nonlinear |R| = 5.106642e-08 > > > > run 4: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020995e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 5.102730e-08 > > 2 Linear |R| = 7.132220e-14 > > 2 Nonlinear |R| = 5.104442e-08 > > > > Solver details: > > > > SNES Object: 8 MPI processes > > type: newtonls > > maximum iterations=15, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > > total number of linear solver iterations=4 > > total number of function evaluations=7 > > norm schedule ALWAYS > > SNESLineSearch Object: 8 MPI processes > > type: basic > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 8 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=100, initial guess is zero > > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > > right preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: 8 MPI processes > > type: lu > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0., needed 0. 
> > Factored matrix follows: > > Mat Object: 8 MPI processes > > type: superlu_dist > > rows=7925, cols=7925 > > package used to perform factorization: superlu_dist > > total: nonzeros=0, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > SuperLU_DIST run parameters: > > Process grid nprow 4 x npcol 2 > > Equilibrate matrix TRUE > > Matrix input mode 1 > > Replace tiny pivots FALSE > > Use iterative refinement TRUE > > Processors in row 4 col partition 2 > > Row permutation LargeDiag > > Column permutation METIS_AT_PLUS_A > > Parallel symbolic factorization FALSE > > Repeated factorization SamePattern > > linear system matrix followed by preconditioner matrix: > > Mat Object: 8 MPI processes > > type: mffd > > rows=7925, cols=7925 > > Matrix-free approximation: > > err=1.49012e-08 (relative error in function evaluation) > > Using wp compute h routine > > Does not compute normU > > Mat Object: () 8 MPI processes > > type: mpiaij > > rows=7925, cols=7925 > > total: nonzeros=63587, allocated nonzeros=63865 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > > > > > Fande, > > > > > > From a.croucher at auckland.ac.nz Wed Nov 15 16:15:57 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Thu, 16 Nov 2017 11:15:57 +1300 Subject: [petsc-users] ISGlobalToLocalMappingApplyBlock In-Reply-To: References: <5b548602-6be9-a0e0-fe9b-789ed4289ea4@auckland.ac.nz> Message-ID: <69b9a876-cb48-51e1-42f8-af9a0b6c890f@auckland.ac.nz> hi Dave, On 15/11/17 21:34, Dave May wrote: > > > Or am I wrong to expect this to give the same results regardless of > blocksize? > > > > Yep. Maybe I am not using this function correctly then. The man page says it "Provides the local block numbering for a list of integers specified with a block global numbering." So I thought if I put in global block indices it would give me the corresponding local block indices- which would be the same regardless of the size of each block. > > However the large negative number being printed looks an uninitialized > variable. This seems odd as with mode = MASK nout should equal N and > any requested block indices not in the IS should result in -1 being > inserted in your local_indices array. > > What's the value of nout? nout returns 1 on both ranks, as expected. - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Nov 15 16:18:20 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 15 Nov 2017 17:18:20 -0500 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: References: Message-ID: To be clear: these differences completely go away with MUMPS? Can you valgrind this? We have seen some valgrind warning from MUMPS from BLAS routines. It could be that your BLAS is buggy (and SuperLU uses some BLAS routines that MUMPS does not). I think SuperLU does more/different pivoting than MUMPS. What BLAS are you using? (download, MKL, ...) On Wed, Nov 15, 2017 at 4:52 PM, Smith, Barry F. wrote: > > > > On Nov 15, 2017, at 3:36 PM, Kong, Fande wrote: > > > > Hi Barry, > > > > Thanks for your reply. I was wondering why this happens only when we use > superlu_dist. I am trying to understand the algorithm in superlu_dist. If > we use ASM or MUMPS, we do not produce these differences. 
> > > > The differences actually are NOT meaningless. In fact, we have a real > transient application that presents this issue. When we run the > simulation with superlu_dist in parallel for thousands of time steps, the > final physics solution looks totally different from different runs. The > differences are not acceptable any more. For a steady problem, the > difference may be meaningless. But it is significant for the transient > problem. > > I submit that the "physics solution" of all of these runs is equally > right and equally wrong. If the solutions are very different due to a small > perturbation than something is wrong with the model or the integrator, I > don't think you can blame the linear solver (see below) > > > > This makes the solution not reproducible, and we can not even set a > targeting solution in the test system because the solution is so different > from one run to another. I guess there might/may be a tiny bug in > superlu_dist or the PETSc interface to superlu_dist. > > This is possible but it is also possible this is due to normal round off > inside of SuperLU dist. > > Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't > really matter exactly how well SuperLU_Dist does. The nonlinear iteration > does essential defect correction for you; are you making sure that the > nonlinear iteration always works for every timestep? For example confirm > that SNESGetConvergedReason() is always positive. > > > > > > > > Fande, > > > > > > > > > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. > wrote: > > > > Meaningless differences > > > > > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: > > > > > > Hi, > > > > > > There is a heat conduction problem. When superlu_dist is used as a > preconditioner, we have random results from different runs. Is there a > random algorithm in superlu_dist? If we use ASM or MUMPS as the > preconditioner, we then don't have this issue. 
> > > > > > run 1: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.104757e-08 > > > 2 Linear |R| = 7.699637e-14 > > > 2 Nonlinear |R| = 5.106418e-08 > > > > > > > > > run 2: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.109913e-08 > > > 2 Linear |R| = 7.189091e-14 > > > 2 Nonlinear |R| = 5.111591e-08 > > > > > > run 3: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.104942e-08 > > > 2 Linear |R| = 7.465572e-14 > > > 2 Nonlinear |R| = 5.106642e-08 > > > > > > run 4: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.102730e-08 > > > 2 Linear |R| = 7.132220e-14 > > > 2 Nonlinear |R| = 5.104442e-08 > > > > > > Solver details: > > > > > > SNES Object: 8 MPI processes > > > type: newtonls > > > maximum iterations=15, maximum function evaluations=10000 > > > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > > > total number of linear solver iterations=4 > > > total number of function evaluations=7 > > > norm schedule ALWAYS > > > SNESLineSearch Object: 8 MPI processes > > > type: basic > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > > maximum iterations=40 > > > KSP Object: 8 MPI processes > > > type: gmres > > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > > happy breakdown tolerance 1e-30 > > > maximum iterations=100, initial guess is zero > > > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > > > right preconditioning > > > using UNPRECONDITIONED norm type for convergence test > > > PC Object: 8 MPI processes > > > type: lu > > > out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0., needed 0. 
> > > Factored matrix follows: > > > Mat Object: 8 MPI processes > > > type: superlu_dist > > > rows=7925, cols=7925 > > > package used to perform factorization: superlu_dist > > > total: nonzeros=0, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > SuperLU_DIST run parameters: > > > Process grid nprow 4 x npcol 2 > > > Equilibrate matrix TRUE > > > Matrix input mode 1 > > > Replace tiny pivots FALSE > > > Use iterative refinement TRUE > > > Processors in row 4 col partition 2 > > > Row permutation LargeDiag > > > Column permutation METIS_AT_PLUS_A > > > Parallel symbolic factorization FALSE > > > Repeated factorization SamePattern > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: 8 MPI processes > > > type: mffd > > > rows=7925, cols=7925 > > > Matrix-free approximation: > > > err=1.49012e-08 (relative error in function evaluation) > > > Using wp compute h routine > > > Does not compute normU > > > Mat Object: () 8 MPI processes > > > type: mpiaij > > > rows=7925, cols=7925 > > > total: nonzeros=63587, allocated nonzeros=63865 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > > > > > > > Fande, > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Wed Nov 15 16:24:35 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 15 Nov 2017 15:24:35 -0700 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: References: Message-ID: On Wed, Nov 15, 2017 at 2:52 PM, Smith, Barry F. wrote: > > > > On Nov 15, 2017, at 3:36 PM, Kong, Fande wrote: > > > > Hi Barry, > > > > Thanks for your reply. I was wondering why this happens only when we use > superlu_dist. I am trying to understand the algorithm in superlu_dist. If > we use ASM or MUMPS, we do not produce these differences. > > > > The differences actually are NOT meaningless. In fact, we have a real > transient application that presents this issue. When we run the > simulation with superlu_dist in parallel for thousands of time steps, the > final physics solution looks totally different from different runs. The > differences are not acceptable any more. For a steady problem, the > difference may be meaningless. But it is significant for the transient > problem. > > I submit that the "physics solution" of all of these runs is equally > right and equally wrong. If the solutions are very different due to a small > perturbation than something is wrong with the model or the integrator, I > don't think you can blame the linear solver (see below) > > > > This makes the solution not reproducible, and we can not even set a > targeting solution in the test system because the solution is so different > from one run to another. I guess there might/may be a tiny bug in > superlu_dist or the PETSc interface to superlu_dist. > > This is possible but it is also possible this is due to normal round off > inside of SuperLU dist. > > Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't > really matter exactly how well SuperLU_Dist does. The nonlinear iteration > does essential defect correction for you; are you making sure that the > nonlinear iteration always works for every timestep? For example confirm > that SNESGetConvergedReason() is always positive. > Definitely it could be something wrong on my side. But let us focus on the simple question first. 
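(A minimal sketch of the per-step check Barry suggests, assuming a transient loop that calls SNESSolve once per time step; "snes", "x" and "step" are illustrative names, not the application's actual variables, and running with -snes_converged_reason prints the same information automatically:

    SNESConvergedReason reason;
    PetscErrorCode      ierr;

    ierr = SNESSolve(snes,NULL,x);CHKERRQ(ierr);
    ierr = SNESGetConvergedReason(snes,&reason);CHKERRQ(ierr);
    if (reason <= 0) {
      /* converged reasons are positive; zero or negative means this step's
         nonlinear solve did not actually converge */
      ierr = PetscPrintf(PETSC_COMM_WORLD,"step %D: nonlinear solve failed, reason %D\n",step,(PetscInt)reason);CHKERRQ(ierr);
      /* reject the step (cut dt and retry, or stop) rather than marching on */
    }

If any step silently fails to converge, small solver-level perturbations can grow into the large end-of-simulation differences described above.)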
To make the discussion a little simpler, let us back to the simple problem (heat conduction). Now I want to understand why this happens to superlu_dist only. When we are using ASM or MUMPS, why we can not see the differences from one run to another? I posted the residual histories for MUMPS and ASM. We can not see any differences in terms of the residual norms when using MUMPS or ASM. Does superlu_dist have higher round off than other solvers? MUMPS run1: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.013384e-02 2 Linear |R| = 4.020993e-08 1 Nonlinear |R| = 1.404678e-02 0 Linear |R| = 1.404678e-02 1 Linear |R| = 4.836162e-08 2 Linear |R| = 7.055620e-14 2 Nonlinear |R| = 4.836392e-08 MUMPS run2: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.013384e-02 2 Linear |R| = 4.020993e-08 1 Nonlinear |R| = 1.404678e-02 0 Linear |R| = 1.404678e-02 1 Linear |R| = 4.836162e-08 2 Linear |R| = 7.055620e-14 2 Nonlinear |R| = 4.836392e-08 MUMPS run3: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.013384e-02 2 Linear |R| = 4.020993e-08 1 Nonlinear |R| = 1.404678e-02 0 Linear |R| = 1.404678e-02 1 Linear |R| = 4.836162e-08 2 Linear |R| = 7.055620e-14 2 Nonlinear |R| = 4.836392e-08 MUMPS run4: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.013384e-02 2 Linear |R| = 4.020993e-08 1 Nonlinear |R| = 1.404678e-02 0 Linear |R| = 1.404678e-02 1 Linear |R| = 4.836162e-08 2 Linear |R| = 7.055620e-14 2 Nonlinear |R| = 4.836392e-08 ASM run1: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 6.189229e+03 2 Linear |R| = 3.252487e+02 3 Linear |R| = 3.485174e+01 4 Linear |R| = 8.600695e+00 5 Linear |R| = 3.333942e+00 6 Linear |R| = 1.706112e+00 7 Linear |R| = 5.047863e-01 8 Linear |R| = 2.337297e-01 9 Linear |R| = 1.071627e-01 10 Linear |R| = 4.692177e-02 11 Linear |R| = 1.340717e-02 12 Linear |R| = 4.753951e-03 1 Nonlinear |R| = 2.320271e-02 0 Linear |R| = 2.320271e-02 1 Linear |R| = 4.367880e-03 2 Linear |R| = 1.407852e-03 3 Linear |R| = 6.036360e-04 4 Linear |R| = 1.867661e-04 5 Linear |R| = 8.760076e-05 6 Linear |R| = 3.260519e-05 7 Linear |R| = 1.435418e-05 8 Linear |R| = 4.532875e-06 9 Linear |R| = 2.439053e-06 10 Linear |R| = 7.998549e-07 11 Linear |R| = 2.428064e-07 12 Linear |R| = 4.766918e-08 13 Linear |R| = 1.713748e-08 2 Nonlinear |R| = 3.671573e-07 ASM run2: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 6.189229e+03 2 Linear |R| = 3.252487e+02 3 Linear |R| = 3.485174e+01 4 Linear |R| = 8.600695e+00 5 Linear |R| = 3.333942e+00 6 Linear |R| = 1.706112e+00 7 Linear |R| = 5.047863e-01 8 Linear |R| = 2.337297e-01 9 Linear |R| = 1.071627e-01 10 Linear |R| = 4.692177e-02 11 Linear |R| = 1.340717e-02 12 Linear |R| = 4.753951e-03 1 Nonlinear |R| = 2.320271e-02 0 Linear |R| = 2.320271e-02 1 Linear |R| = 4.367880e-03 2 Linear |R| = 1.407852e-03 3 Linear |R| = 6.036360e-04 4 Linear |R| = 1.867661e-04 5 Linear |R| = 8.760076e-05 6 Linear |R| = 3.260519e-05 7 Linear |R| = 1.435418e-05 8 Linear |R| = 4.532875e-06 9 Linear |R| = 2.439053e-06 10 Linear |R| = 7.998549e-07 11 Linear |R| = 2.428064e-07 12 Linear |R| = 4.766918e-08 13 Linear |R| = 1.713748e-08 2 Nonlinear |R| = 3.671573e-07 ASM run3: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 6.189229e+03 2 Linear |R| = 3.252487e+02 3 Linear |R| = 3.485174e+01 4 Linear |R| = 8.600695e+00 5 Linear |R| = 3.333942e+00 6 Linear |R| = 1.706112e+00 7 Linear |R| = 5.047863e-01 8 Linear |R| = 
2.337297e-01 9 Linear |R| = 1.071627e-01 10 Linear |R| = 4.692177e-02 11 Linear |R| = 1.340717e-02 12 Linear |R| = 4.753951e-03 1 Nonlinear |R| = 2.320271e-02 0 Linear |R| = 2.320271e-02 1 Linear |R| = 4.367880e-03 2 Linear |R| = 1.407852e-03 3 Linear |R| = 6.036360e-04 4 Linear |R| = 1.867661e-04 5 Linear |R| = 8.760076e-05 6 Linear |R| = 3.260519e-05 7 Linear |R| = 1.435418e-05 8 Linear |R| = 4.532875e-06 9 Linear |R| = 2.439053e-06 10 Linear |R| = 7.998549e-07 11 Linear |R| = 2.428064e-07 12 Linear |R| = 4.766918e-08 13 Linear |R| = 1.713748e-08 2 Nonlinear |R| = 3.671573e-07 ASM run4: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 6.189229e+03 2 Linear |R| = 3.252487e+02 3 Linear |R| = 3.485174e+01 4 Linear |R| = 8.600695e+00 5 Linear |R| = 3.333942e+00 6 Linear |R| = 1.706112e+00 7 Linear |R| = 5.047863e-01 8 Linear |R| = 2.337297e-01 9 Linear |R| = 1.071627e-01 10 Linear |R| = 4.692177e-02 11 Linear |R| = 1.340717e-02 12 Linear |R| = 4.753951e-03 1 Nonlinear |R| = 2.320271e-02 0 Linear |R| = 2.320271e-02 1 Linear |R| = 4.367880e-03 2 Linear |R| = 1.407852e-03 3 Linear |R| = 6.036360e-04 4 Linear |R| = 1.867661e-04 5 Linear |R| = 8.760076e-05 6 Linear |R| = 3.260519e-05 7 Linear |R| = 1.435418e-05 8 Linear |R| = 4.532875e-06 9 Linear |R| = 2.439053e-06 10 Linear |R| = 7.998549e-07 11 Linear |R| = 2.428064e-07 12 Linear |R| = 4.766918e-08 13 Linear |R| = 1.713748e-08 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > Fande, > > > > > > > > > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. > wrote: > > > > Meaningless differences > > > > > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: > > > > > > Hi, > > > > > > There is a heat conduction problem. When superlu_dist is used as a > preconditioner, we have random results from different runs. Is there a > random algorithm in superlu_dist? If we use ASM or MUMPS as the > preconditioner, we then don't have this issue. 
> > > > > > run 1: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.104757e-08 > > > 2 Linear |R| = 7.699637e-14 > > > 2 Nonlinear |R| = 5.106418e-08 > > > > > > > > > run 2: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.109913e-08 > > > 2 Linear |R| = 7.189091e-14 > > > 2 Nonlinear |R| = 5.111591e-08 > > > > > > run 3: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.104942e-08 > > > 2 Linear |R| = 7.465572e-14 > > > 2 Nonlinear |R| = 5.106642e-08 > > > > > > run 4: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.102730e-08 > > > 2 Linear |R| = 7.132220e-14 > > > 2 Nonlinear |R| = 5.104442e-08 > > > > > > Solver details: > > > > > > SNES Object: 8 MPI processes > > > type: newtonls > > > maximum iterations=15, maximum function evaluations=10000 > > > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > > > total number of linear solver iterations=4 > > > total number of function evaluations=7 > > > norm schedule ALWAYS > > > SNESLineSearch Object: 8 MPI processes > > > type: basic > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > > maximum iterations=40 > > > KSP Object: 8 MPI processes > > > type: gmres > > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > > happy breakdown tolerance 1e-30 > > > maximum iterations=100, initial guess is zero > > > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > > > right preconditioning > > > using UNPRECONDITIONED norm type for convergence test > > > PC Object: 8 MPI processes > > > type: lu > > > out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0., needed 0. 
> > > Factored matrix follows: > > > Mat Object: 8 MPI processes > > > type: superlu_dist > > > rows=7925, cols=7925 > > > package used to perform factorization: superlu_dist > > > total: nonzeros=0, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > SuperLU_DIST run parameters: > > > Process grid nprow 4 x npcol 2 > > > Equilibrate matrix TRUE > > > Matrix input mode 1 > > > Replace tiny pivots FALSE > > > Use iterative refinement TRUE > > > Processors in row 4 col partition 2 > > > Row permutation LargeDiag > > > Column permutation METIS_AT_PLUS_A > > > Parallel symbolic factorization FALSE > > > Repeated factorization SamePattern > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: 8 MPI processes > > > type: mffd > > > rows=7925, cols=7925 > > > Matrix-free approximation: > > > err=1.49012e-08 (relative error in function evaluation) > > > Using wp compute h routine > > > Does not compute normU > > > Mat Object: () 8 MPI processes > > > type: mpiaij > > > rows=7925, cols=7925 > > > total: nonzeros=63587, allocated nonzeros=63865 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > > > > > > > Fande, > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 15 16:35:41 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 15 Nov 2017 22:35:41 +0000 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: References: Message-ID: <16D20022-A815-4EAD-98D3-E887894B7FA8@mcs.anl.gov> Since the convergence labeled linear does not converge to 14 digits in one iteration I am assuming you are using lagged preconditioning and or lagged Jacobian? What happens if you do no lagging and solve each linear solve with a new LU factorization? Barry > On Nov 15, 2017, at 4:24 PM, Kong, Fande wrote: > > > > On Wed, Nov 15, 2017 at 2:52 PM, Smith, Barry F. wrote: > > > > On Nov 15, 2017, at 3:36 PM, Kong, Fande wrote: > > > > Hi Barry, > > > > Thanks for your reply. I was wondering why this happens only when we use superlu_dist. I am trying to understand the algorithm in superlu_dist. If we use ASM or MUMPS, we do not produce these differences. > > > > The differences actually are NOT meaningless. In fact, we have a real transient application that presents this issue. When we run the simulation with superlu_dist in parallel for thousands of time steps, the final physics solution looks totally different from different runs. The differences are not acceptable any more. For a steady problem, the difference may be meaningless. But it is significant for the transient problem. > > I submit that the "physics solution" of all of these runs is equally right and equally wrong. If the solutions are very different due to a small perturbation than something is wrong with the model or the integrator, I don't think you can blame the linear solver (see below) > > > > This makes the solution not reproducible, and we can not even set a targeting solution in the test system because the solution is so different from one run to another. I guess there might/may be a tiny bug in superlu_dist or the PETSc interface to superlu_dist. > > This is possible but it is also possible this is due to normal round off inside of SuperLU dist. > > Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't really matter exactly how well SuperLU_Dist does. 
The nonlinear iteration does essential defect correction for you; are you making sure that the nonlinear iteration always works for every timestep? For example confirm that SNESGetConvergedReason() is always positive. > > Definitely it could be something wrong on my side. But let us focus on the simple question first. > > To make the discussion a little simpler, let us back to the simple problem (heat conduction). Now I want to understand why this happens to superlu_dist only. When we are using ASM or MUMPS, why we can not see the differences from one run to another? I posted the residual histories for MUMPS and ASM. We can not see any differences in terms of the residual norms when using MUMPS or ASM. Does superlu_dist have higher round off than other solvers? > > > > MUMPS run1: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.013384e-02 > 2 Linear |R| = 4.020993e-08 > 1 Nonlinear |R| = 1.404678e-02 > 0 Linear |R| = 1.404678e-02 > 1 Linear |R| = 4.836162e-08 > 2 Linear |R| = 7.055620e-14 > 2 Nonlinear |R| = 4.836392e-08 > > MUMPS run2: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.013384e-02 > 2 Linear |R| = 4.020993e-08 > 1 Nonlinear |R| = 1.404678e-02 > 0 Linear |R| = 1.404678e-02 > 1 Linear |R| = 4.836162e-08 > 2 Linear |R| = 7.055620e-14 > 2 Nonlinear |R| = 4.836392e-08 > > MUMPS run3: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.013384e-02 > 2 Linear |R| = 4.020993e-08 > 1 Nonlinear |R| = 1.404678e-02 > 0 Linear |R| = 1.404678e-02 > 1 Linear |R| = 4.836162e-08 > 2 Linear |R| = 7.055620e-14 > 2 Nonlinear |R| = 4.836392e-08 > > MUMPS run4: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.013384e-02 > 2 Linear |R| = 4.020993e-08 > 1 Nonlinear |R| = 1.404678e-02 > 0 Linear |R| = 1.404678e-02 > 1 Linear |R| = 4.836162e-08 > 2 Linear |R| = 7.055620e-14 > 2 Nonlinear |R| = 4.836392e-08 > > > > ASM run1: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 6.189229e+03 > 2 Linear |R| = 3.252487e+02 > 3 Linear |R| = 3.485174e+01 > 4 Linear |R| = 8.600695e+00 > 5 Linear |R| = 3.333942e+00 > 6 Linear |R| = 1.706112e+00 > 7 Linear |R| = 5.047863e-01 > 8 Linear |R| = 2.337297e-01 > 9 Linear |R| = 1.071627e-01 > 10 Linear |R| = 4.692177e-02 > 11 Linear |R| = 1.340717e-02 > 12 Linear |R| = 4.753951e-03 > 1 Nonlinear |R| = 2.320271e-02 > 0 Linear |R| = 2.320271e-02 > 1 Linear |R| = 4.367880e-03 > 2 Linear |R| = 1.407852e-03 > 3 Linear |R| = 6.036360e-04 > 4 Linear |R| = 1.867661e-04 > 5 Linear |R| = 8.760076e-05 > 6 Linear |R| = 3.260519e-05 > 7 Linear |R| = 1.435418e-05 > 8 Linear |R| = 4.532875e-06 > 9 Linear |R| = 2.439053e-06 > 10 Linear |R| = 7.998549e-07 > 11 Linear |R| = 2.428064e-07 > 12 Linear |R| = 4.766918e-08 > 13 Linear |R| = 1.713748e-08 > 2 Nonlinear |R| = 3.671573e-07 > > > ASM run2: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 6.189229e+03 > 2 Linear |R| = 3.252487e+02 > 3 Linear |R| = 3.485174e+01 > 4 Linear |R| = 8.600695e+00 > 5 Linear |R| = 3.333942e+00 > 6 Linear |R| = 1.706112e+00 > 7 Linear |R| = 5.047863e-01 > 8 Linear |R| = 2.337297e-01 > 9 Linear |R| = 1.071627e-01 > 10 Linear |R| = 4.692177e-02 > 11 Linear |R| = 1.340717e-02 > 12 Linear |R| = 4.753951e-03 > 1 Nonlinear |R| = 2.320271e-02 > 0 Linear |R| = 2.320271e-02 > 1 Linear |R| = 4.367880e-03 > 2 Linear |R| = 1.407852e-03 > 3 Linear |R| = 6.036360e-04 > 4 Linear |R| = 1.867661e-04 > 5 Linear |R| = 
8.760076e-05 > 6 Linear |R| = 3.260519e-05 > 7 Linear |R| = 1.435418e-05 > 8 Linear |R| = 4.532875e-06 > 9 Linear |R| = 2.439053e-06 > 10 Linear |R| = 7.998549e-07 > 11 Linear |R| = 2.428064e-07 > 12 Linear |R| = 4.766918e-08 > 13 Linear |R| = 1.713748e-08 > 2 Nonlinear |R| = 3.671573e-07 > > ASM run3: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 6.189229e+03 > 2 Linear |R| = 3.252487e+02 > 3 Linear |R| = 3.485174e+01 > 4 Linear |R| = 8.600695e+00 > 5 Linear |R| = 3.333942e+00 > 6 Linear |R| = 1.706112e+00 > 7 Linear |R| = 5.047863e-01 > 8 Linear |R| = 2.337297e-01 > 9 Linear |R| = 1.071627e-01 > 10 Linear |R| = 4.692177e-02 > 11 Linear |R| = 1.340717e-02 > 12 Linear |R| = 4.753951e-03 > 1 Nonlinear |R| = 2.320271e-02 > 0 Linear |R| = 2.320271e-02 > 1 Linear |R| = 4.367880e-03 > 2 Linear |R| = 1.407852e-03 > 3 Linear |R| = 6.036360e-04 > 4 Linear |R| = 1.867661e-04 > 5 Linear |R| = 8.760076e-05 > 6 Linear |R| = 3.260519e-05 > 7 Linear |R| = 1.435418e-05 > 8 Linear |R| = 4.532875e-06 > 9 Linear |R| = 2.439053e-06 > 10 Linear |R| = 7.998549e-07 > 11 Linear |R| = 2.428064e-07 > 12 Linear |R| = 4.766918e-08 > 13 Linear |R| = 1.713748e-08 > 2 Nonlinear |R| = 3.671573e-07 > > > > ASM run4: > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 6.189229e+03 > 2 Linear |R| = 3.252487e+02 > 3 Linear |R| = 3.485174e+01 > 4 Linear |R| = 8.600695e+00 > 5 Linear |R| = 3.333942e+00 > 6 Linear |R| = 1.706112e+00 > 7 Linear |R| = 5.047863e-01 > 8 Linear |R| = 2.337297e-01 > 9 Linear |R| = 1.071627e-01 > 10 Linear |R| = 4.692177e-02 > 11 Linear |R| = 1.340717e-02 > 12 Linear |R| = 4.753951e-03 > 1 Nonlinear |R| = 2.320271e-02 > 0 Linear |R| = 2.320271e-02 > 1 Linear |R| = 4.367880e-03 > 2 Linear |R| = 1.407852e-03 > 3 Linear |R| = 6.036360e-04 > 4 Linear |R| = 1.867661e-04 > 5 Linear |R| = 8.760076e-05 > 6 Linear |R| = 3.260519e-05 > 7 Linear |R| = 1.435418e-05 > 8 Linear |R| = 4.532875e-06 > 9 Linear |R| = 2.439053e-06 > 10 Linear |R| = 7.998549e-07 > 11 Linear |R| = 2.428064e-07 > 12 Linear |R| = 4.766918e-08 > 13 Linear |R| = 1.713748e-08 > 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > > > > > > Fande, > > > > > > > > > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. wrote: > > > > Meaningless differences > > > > > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: > > > > > > Hi, > > > > > > There is a heat conduction problem. When superlu_dist is used as a preconditioner, we have random results from different runs. Is there a random algorithm in superlu_dist? If we use ASM or MUMPS as the preconditioner, we then don't have this issue. 
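(Barry's question above about lagging can be exercised from the command line with -snes_lag_jacobian 1 -snes_lag_preconditioner 1, or in code; a minimal sketch, again assuming an existing SNES object named "snes":

    /* a lag of 1 rebuilds at every Newton iteration, so every linear solve
       is preceded by a freshly computed preconditioner, i.e. a brand-new
       LU factorization */
    ierr = SNESSetLagJacobian(snes,1);CHKERRQ(ierr);
    ierr = SNESSetLagPreconditioner(snes,1);CHKERRQ(ierr);

This isolates whether the run-to-run differences come from reusing a stale factorization or from the factorization itself.)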
> > > > > > run 1: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.104757e-08 > > > 2 Linear |R| = 7.699637e-14 > > > 2 Nonlinear |R| = 5.106418e-08 > > > > > > > > > run 2: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.109913e-08 > > > 2 Linear |R| = 7.189091e-14 > > > 2 Nonlinear |R| = 5.111591e-08 > > > > > > run 3: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.104942e-08 > > > 2 Linear |R| = 7.465572e-14 > > > 2 Nonlinear |R| = 5.106642e-08 > > > > > > run 4: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020995e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 5.102730e-08 > > > 2 Linear |R| = 7.132220e-14 > > > 2 Nonlinear |R| = 5.104442e-08 > > > > > > Solver details: > > > > > > SNES Object: 8 MPI processes > > > type: newtonls > > > maximum iterations=15, maximum function evaluations=10000 > > > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > > > total number of linear solver iterations=4 > > > total number of function evaluations=7 > > > norm schedule ALWAYS > > > SNESLineSearch Object: 8 MPI processes > > > type: basic > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > > maximum iterations=40 > > > KSP Object: 8 MPI processes > > > type: gmres > > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > > happy breakdown tolerance 1e-30 > > > maximum iterations=100, initial guess is zero > > > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > > > right preconditioning > > > using UNPRECONDITIONED norm type for convergence test > > > PC Object: 8 MPI processes > > > type: lu > > > out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0., needed 0. 
> > > Factored matrix follows: > > > Mat Object: 8 MPI processes > > > type: superlu_dist > > > rows=7925, cols=7925 > > > package used to perform factorization: superlu_dist > > > total: nonzeros=0, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > SuperLU_DIST run parameters: > > > Process grid nprow 4 x npcol 2 > > > Equilibrate matrix TRUE > > > Matrix input mode 1 > > > Replace tiny pivots FALSE > > > Use iterative refinement TRUE > > > Processors in row 4 col partition 2 > > > Row permutation LargeDiag > > > Column permutation METIS_AT_PLUS_A > > > Parallel symbolic factorization FALSE > > > Repeated factorization SamePattern > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: 8 MPI processes > > > type: mffd > > > rows=7925, cols=7925 > > > Matrix-free approximation: > > > err=1.49012e-08 (relative error in function evaluation) > > > Using wp compute h routine > > > Does not compute normU > > > Mat Object: () 8 MPI processes > > > type: mpiaij > > > rows=7925, cols=7925 > > > total: nonzeros=63587, allocated nonzeros=63865 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > > > > > > > Fande, > > > > > > > > > > From jed at jedbrown.org Wed Nov 15 16:48:05 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 15 Nov 2017 15:48:05 -0700 Subject: [petsc-users] IDR availalbe in PETSC? In-Reply-To: References: Message-ID: <87375f17mi.fsf@jedbrown.org> There isn't an IDR in PETSc, but there is BCGSL which usually performs similarly. Contributions welcome. Evan Um writes: > Dear PETSC users, > > I was wondering if anyone already tried/developed an induced dimension > reduction (IDR) solver for PETSC? I think that it is a useful one but I > couldn't find its example with PETSC. If you have any idea about IDR > routines for PETSC, please let me know. Thanks! > > Best, > Evan From a.croucher at auckland.ac.nz Wed Nov 15 16:52:10 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Thu, 16 Nov 2017 11:52:10 +1300 Subject: [petsc-users] ISGlobalToLocalMappingApplyBlock In-Reply-To: <69b9a876-cb48-51e1-42f8-af9a0b6c890f@auckland.ac.nz> References: <5b548602-6be9-a0e0-fe9b-789ed4289ea4@auckland.ac.nz> <69b9a876-cb48-51e1-42f8-af9a0b6c890f@auckland.ac.nz> Message-ID: <94d0e9c4-e218-4fce-c6c5-1896dc9eea00@auckland.ac.nz> I actually attached the wrong test program last time- I've attached the right one here, which is much simpler. It test global indices 0, 1, ... 9. If I run on 2 processes, the local indices it returns are: rank 0: 0, 1, 2, 3, 4, 0, 0, 0, -253701943, 0 rank 1: -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 The results I expected are: rank 0: 0, 1, 2, 3, 4, -1, -1, -1, -1, -1 rank 1: -1, -1, -1, -1, -1, 0, 1, 2, 3, 4 So the results for global indices 0, 1,... 4 are what I expected, on both ranks. But the results for global indices 5, 6, ... 9 are not. I tried increasing the blocksize to 3 or 4, and the results were exactly the same. It only gives the results I expected if I change the blocksize to 1. - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: testl2g2.F90 Type: text/x-fortran Size: 1185 bytes Desc: not available URL: From fande.kong at inl.gov Wed Nov 15 16:52:36 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 15 Nov 2017 15:52:36 -0700 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: <16D20022-A815-4EAD-98D3-E887894B7FA8@mcs.anl.gov> References: <16D20022-A815-4EAD-98D3-E887894B7FA8@mcs.anl.gov> Message-ID: On Wed, Nov 15, 2017 at 3:35 PM, Smith, Barry F. wrote: > > Since the convergence labeled linear does not converge to 14 digits in > one iteration I am assuming you are using lagged preconditioning and or > lagged Jacobian? > We are using Jacobian-free Newton. So Jacobian is different from the preconditioning matrix. > > What happens if you do no lagging and solve each linear solve with a > new LU factorization? > We have the following results without using Jacobian-free Newton. Again, superlu_dist produces differences, while MUMPS gives the same results in terms of the residual norms. Fande, Superlu_dist run1: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.322285e-11 1 Nonlinear |R| = 1.666987e-11 Superlu_dist run2: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.322171e-11 1 Nonlinear |R| = 1.666977e-11 Superlu_dist run3: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.321964e-11 1 Nonlinear |R| = 1.666959e-11 Superlu_dist run4: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.321978e-11 1 Nonlinear |R| = 1.668688e-11 MUMPS run1: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.360637e-11 1 Nonlinear |R| = 1.654334e-11 MUMPS run 2: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.360637e-11 1 Nonlinear |R| = 1.654334e-11 MUMPS run 3: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.360637e-11 1 Nonlinear |R| = 1.654334e-11 MUMPS run4: 0 Nonlinear |R| = 9.447423e+03 0 Linear |R| = 9.447423e+03 1 Linear |R| = 1.360637e-11 1 Nonlinear |R| = 1.654334e-11 > > Barry > > > > On Nov 15, 2017, at 4:24 PM, Kong, Fande wrote: > > > > > > > > On Wed, Nov 15, 2017 at 2:52 PM, Smith, Barry F. > wrote: > > > > > > > On Nov 15, 2017, at 3:36 PM, Kong, Fande wrote: > > > > > > Hi Barry, > > > > > > Thanks for your reply. I was wondering why this happens only when we > use superlu_dist. I am trying to understand the algorithm in superlu_dist. > If we use ASM or MUMPS, we do not produce these differences. > > > > > > The differences actually are NOT meaningless. In fact, we have a real > transient application that presents this issue. When we run the > simulation with superlu_dist in parallel for thousands of time steps, the > final physics solution looks totally different from different runs. The > differences are not acceptable any more. For a steady problem, the > difference may be meaningless. But it is significant for the transient > problem. > > > > I submit that the "physics solution" of all of these runs is equally > right and equally wrong. If the solutions are very different due to a small > perturbation than something is wrong with the model or the integrator, I > don't think you can blame the linear solver (see below) > > > > > > This makes the solution not reproducible, and we can not even set a > targeting solution in the test system because the solution is so different > from one run to another. I guess there might/may be a tiny bug in > superlu_dist or the PETSc interface to superlu_dist. 
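(For context, the "mffd" operator in the -snes_view output combined with an assembled mpiaij preconditioning matrix is the standard Jacobian-free Newton-Krylov arrangement: GMRES applies the Jacobian matrix-free, while superlu_dist or MUMPS factor only the assembled matrix. A rough sketch of one way this is wired up -- "P", "FormPrecondMatrix" and "ctx" are placeholders for the application's own assembled matrix, assembly routine and context, not the actual code in this thread:

    Mat            Jmf;             /* matrix-free (MFFD) operator for GMRES   */
    PetscErrorCode ierr;

    ierr = MatCreateSNESMF(snes,&Jmf);CHKERRQ(ierr);
    /* Jmf is applied matrix-free; P is the assembled mpiaij matrix that the
       LU package actually factors */
    ierr = SNESSetJacobian(snes,Jmf,P,FormPrecondMatrix,&ctx);CHKERRQ(ierr);
    /* equivalently, many codes assemble P in their usual Jacobian routine
       and simply run with -snes_mf_operator */

With this setup the factored matrix is only a preconditioner, which is why the outer Newton iteration is expected to absorb modest differences in the factorization.)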
> > > > This is possible but it is also possible this is due to normal round > off inside of SuperLU dist. > > > > Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't > really matter exactly how well SuperLU_Dist does. The nonlinear iteration > does essential defect correction for you; are you making sure that the > nonlinear iteration always works for every timestep? For example confirm > that SNESGetConvergedReason() is always positive. > > > > Definitely it could be something wrong on my side. But let us focus on > the simple question first. > > > > To make the discussion a little simpler, let us back to the simple > problem (heat conduction). Now I want to understand why this happens to > superlu_dist only. When we are using ASM or MUMPS, why we can not see the > differences from one run to another? I posted the residual histories for > MUMPS and ASM. We can not see any differences in terms of the residual > norms when using MUMPS or ASM. Does superlu_dist have higher round off than > other solvers? > > > > > > > > MUMPS run1: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020993e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 4.836162e-08 > > 2 Linear |R| = 7.055620e-14 > > 2 Nonlinear |R| = 4.836392e-08 > > > > MUMPS run2: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020993e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 4.836162e-08 > > 2 Linear |R| = 7.055620e-14 > > 2 Nonlinear |R| = 4.836392e-08 > > > > MUMPS run3: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020993e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 4.836162e-08 > > 2 Linear |R| = 7.055620e-14 > > 2 Nonlinear |R| = 4.836392e-08 > > > > MUMPS run4: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020993e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 4.836162e-08 > > 2 Linear |R| = 7.055620e-14 > > 2 Nonlinear |R| = 4.836392e-08 > > > > > > > > ASM run1: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 6.189229e+03 > > 2 Linear |R| = 3.252487e+02 > > 3 Linear |R| = 3.485174e+01 > > 4 Linear |R| = 8.600695e+00 > > 5 Linear |R| = 3.333942e+00 > > 6 Linear |R| = 1.706112e+00 > > 7 Linear |R| = 5.047863e-01 > > 8 Linear |R| = 2.337297e-01 > > 9 Linear |R| = 1.071627e-01 > > 10 Linear |R| = 4.692177e-02 > > 11 Linear |R| = 1.340717e-02 > > 12 Linear |R| = 4.753951e-03 > > 1 Nonlinear |R| = 2.320271e-02 > > 0 Linear |R| = 2.320271e-02 > > 1 Linear |R| = 4.367880e-03 > > 2 Linear |R| = 1.407852e-03 > > 3 Linear |R| = 6.036360e-04 > > 4 Linear |R| = 1.867661e-04 > > 5 Linear |R| = 8.760076e-05 > > 6 Linear |R| = 3.260519e-05 > > 7 Linear |R| = 1.435418e-05 > > 8 Linear |R| = 4.532875e-06 > > 9 Linear |R| = 2.439053e-06 > > 10 Linear |R| = 7.998549e-07 > > 11 Linear |R| = 2.428064e-07 > > 12 Linear |R| = 4.766918e-08 > > 13 Linear |R| = 1.713748e-08 > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > ASM run2: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 6.189229e+03 > > 2 Linear |R| = 3.252487e+02 > > 3 Linear |R| = 3.485174e+01 > > 4 Linear |R| = 
8.600695e+00 > > 5 Linear |R| = 3.333942e+00 > > 6 Linear |R| = 1.706112e+00 > > 7 Linear |R| = 5.047863e-01 > > 8 Linear |R| = 2.337297e-01 > > 9 Linear |R| = 1.071627e-01 > > 10 Linear |R| = 4.692177e-02 > > 11 Linear |R| = 1.340717e-02 > > 12 Linear |R| = 4.753951e-03 > > 1 Nonlinear |R| = 2.320271e-02 > > 0 Linear |R| = 2.320271e-02 > > 1 Linear |R| = 4.367880e-03 > > 2 Linear |R| = 1.407852e-03 > > 3 Linear |R| = 6.036360e-04 > > 4 Linear |R| = 1.867661e-04 > > 5 Linear |R| = 8.760076e-05 > > 6 Linear |R| = 3.260519e-05 > > 7 Linear |R| = 1.435418e-05 > > 8 Linear |R| = 4.532875e-06 > > 9 Linear |R| = 2.439053e-06 > > 10 Linear |R| = 7.998549e-07 > > 11 Linear |R| = 2.428064e-07 > > 12 Linear |R| = 4.766918e-08 > > 13 Linear |R| = 1.713748e-08 > > 2 Nonlinear |R| = 3.671573e-07 > > > > ASM run3: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 6.189229e+03 > > 2 Linear |R| = 3.252487e+02 > > 3 Linear |R| = 3.485174e+01 > > 4 Linear |R| = 8.600695e+00 > > 5 Linear |R| = 3.333942e+00 > > 6 Linear |R| = 1.706112e+00 > > 7 Linear |R| = 5.047863e-01 > > 8 Linear |R| = 2.337297e-01 > > 9 Linear |R| = 1.071627e-01 > > 10 Linear |R| = 4.692177e-02 > > 11 Linear |R| = 1.340717e-02 > > 12 Linear |R| = 4.753951e-03 > > 1 Nonlinear |R| = 2.320271e-02 > > 0 Linear |R| = 2.320271e-02 > > 1 Linear |R| = 4.367880e-03 > > 2 Linear |R| = 1.407852e-03 > > 3 Linear |R| = 6.036360e-04 > > 4 Linear |R| = 1.867661e-04 > > 5 Linear |R| = 8.760076e-05 > > 6 Linear |R| = 3.260519e-05 > > 7 Linear |R| = 1.435418e-05 > > 8 Linear |R| = 4.532875e-06 > > 9 Linear |R| = 2.439053e-06 > > 10 Linear |R| = 7.998549e-07 > > 11 Linear |R| = 2.428064e-07 > > 12 Linear |R| = 4.766918e-08 > > 13 Linear |R| = 1.713748e-08 > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > ASM run4: > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 6.189229e+03 > > 2 Linear |R| = 3.252487e+02 > > 3 Linear |R| = 3.485174e+01 > > 4 Linear |R| = 8.600695e+00 > > 5 Linear |R| = 3.333942e+00 > > 6 Linear |R| = 1.706112e+00 > > 7 Linear |R| = 5.047863e-01 > > 8 Linear |R| = 2.337297e-01 > > 9 Linear |R| = 1.071627e-01 > > 10 Linear |R| = 4.692177e-02 > > 11 Linear |R| = 1.340717e-02 > > 12 Linear |R| = 4.753951e-03 > > 1 Nonlinear |R| = 2.320271e-02 > > 0 Linear |R| = 2.320271e-02 > > 1 Linear |R| = 4.367880e-03 > > 2 Linear |R| = 1.407852e-03 > > 3 Linear |R| = 6.036360e-04 > > 4 Linear |R| = 1.867661e-04 > > 5 Linear |R| = 8.760076e-05 > > 6 Linear |R| = 3.260519e-05 > > 7 Linear |R| = 1.435418e-05 > > 8 Linear |R| = 4.532875e-06 > > 9 Linear |R| = 2.439053e-06 > > 10 Linear |R| = 7.998549e-07 > > 11 Linear |R| = 2.428064e-07 > > 12 Linear |R| = 4.766918e-08 > > 13 Linear |R| = 1.713748e-08 > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > > > > > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. > wrote: > > > > > > Meaningless differences > > > > > > > > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: > > > > > > > > Hi, > > > > > > > > There is a heat conduction problem. When superlu_dist is used as a > preconditioner, we have random results from different runs. Is there a > random algorithm in superlu_dist? If we use ASM or MUMPS as the > preconditioner, we then don't have this issue. 
> > > > > > > > run 1: > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > 0 Linear |R| = 9.447423e+03 > > > > 1 Linear |R| = 1.013384e-02 > > > > 2 Linear |R| = 4.020995e-08 > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > 0 Linear |R| = 1.404678e-02 > > > > 1 Linear |R| = 5.104757e-08 > > > > 2 Linear |R| = 7.699637e-14 > > > > 2 Nonlinear |R| = 5.106418e-08 > > > > > > > > > > > > run 2: > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > 0 Linear |R| = 9.447423e+03 > > > > 1 Linear |R| = 1.013384e-02 > > > > 2 Linear |R| = 4.020995e-08 > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > 0 Linear |R| = 1.404678e-02 > > > > 1 Linear |R| = 5.109913e-08 > > > > 2 Linear |R| = 7.189091e-14 > > > > 2 Nonlinear |R| = 5.111591e-08 > > > > > > > > run 3: > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > 0 Linear |R| = 9.447423e+03 > > > > 1 Linear |R| = 1.013384e-02 > > > > 2 Linear |R| = 4.020995e-08 > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > 0 Linear |R| = 1.404678e-02 > > > > 1 Linear |R| = 5.104942e-08 > > > > 2 Linear |R| = 7.465572e-14 > > > > 2 Nonlinear |R| = 5.106642e-08 > > > > > > > > run 4: > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > 0 Linear |R| = 9.447423e+03 > > > > 1 Linear |R| = 1.013384e-02 > > > > 2 Linear |R| = 4.020995e-08 > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > 0 Linear |R| = 1.404678e-02 > > > > 1 Linear |R| = 5.102730e-08 > > > > 2 Linear |R| = 7.132220e-14 > > > > 2 Nonlinear |R| = 5.104442e-08 > > > > > > > > Solver details: > > > > > > > > SNES Object: 8 MPI processes > > > > type: newtonls > > > > maximum iterations=15, maximum function evaluations=10000 > > > > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > > > > total number of linear solver iterations=4 > > > > total number of function evaluations=7 > > > > norm schedule ALWAYS > > > > SNESLineSearch Object: 8 MPI processes > > > > type: basic > > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > > > maximum iterations=40 > > > > KSP Object: 8 MPI processes > > > > type: gmres > > > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > > > happy breakdown tolerance 1e-30 > > > > maximum iterations=100, initial guess is zero > > > > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > > > > right preconditioning > > > > using UNPRECONDITIONED norm type for convergence test > > > > PC Object: 8 MPI processes > > > > type: lu > > > > out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > matrix ordering: natural > > > > factor fill ratio given 0., needed 0. 
> > > > Factored matrix follows: > > > > Mat Object: 8 MPI processes > > > > type: superlu_dist > > > > rows=7925, cols=7925 > > > > package used to perform factorization: superlu_dist > > > > total: nonzeros=0, allocated nonzeros=0 > > > > total number of mallocs used during MatSetValues calls =0 > > > > SuperLU_DIST run parameters: > > > > Process grid nprow 4 x npcol 2 > > > > Equilibrate matrix TRUE > > > > Matrix input mode 1 > > > > Replace tiny pivots FALSE > > > > Use iterative refinement TRUE > > > > Processors in row 4 col partition 2 > > > > Row permutation LargeDiag > > > > Column permutation METIS_AT_PLUS_A > > > > Parallel symbolic factorization FALSE > > > > Repeated factorization SamePattern > > > > linear system matrix followed by preconditioner matrix: > > > > Mat Object: 8 MPI processes > > > > type: mffd > > > > rows=7925, cols=7925 > > > > Matrix-free approximation: > > > > err=1.49012e-08 (relative error in function evaluation) > > > > Using wp compute h routine > > > > Does not compute normU > > > > Mat Object: () 8 MPI processes > > > > type: mpiaij > > > > rows=7925, cols=7925 > > > > total: nonzeros=63587, allocated nonzeros=63865 > > > > total number of mallocs used during MatSetValues calls =0 > > > > not using I-node (on process 0) routines > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 15 17:04:49 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 15 Nov 2017 23:04:49 +0000 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: References: <16D20022-A815-4EAD-98D3-E887894B7FA8@mcs.anl.gov> Message-ID: <053458DA-CA4C-4E3D-A11E-D9C4FCE438E3@mcs.anl.gov> Do the ASM runs for thousands of time-steps produce the same final "physical results" as the MUMPS run for thousands of timesteps? While with SuperLU you get a very different "physical results"? Barry > On Nov 15, 2017, at 4:52 PM, Kong, Fande wrote: > > > > On Wed, Nov 15, 2017 at 3:35 PM, Smith, Barry F. wrote: > > Since the convergence labeled linear does not converge to 14 digits in one iteration I am assuming you are using lagged preconditioning and or lagged Jacobian? > > We are using Jacobian-free Newton. So Jacobian is different from the preconditioning matrix. > > > What happens if you do no lagging and solve each linear solve with a new LU factorization? > > We have the following results without using Jacobian-free Newton. Again, superlu_dist produces differences, while MUMPS gives the same results in terms of the residual norms. 
> > > Fande, > > > Superlu_dist run1: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.322285e-11 > 1 Nonlinear |R| = 1.666987e-11 > > > Superlu_dist run2: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.322171e-11 > 1 Nonlinear |R| = 1.666977e-11 > > > Superlu_dist run3: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.321964e-11 > 1 Nonlinear |R| = 1.666959e-11 > > > Superlu_dist run4: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.321978e-11 > 1 Nonlinear |R| = 1.668688e-11 > > > MUMPS run1: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.360637e-11 > 1 Nonlinear |R| = 1.654334e-11 > > MUMPS run 2: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.360637e-11 > 1 Nonlinear |R| = 1.654334e-11 > > MUMPS run 3: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.360637e-11 > 1 Nonlinear |R| = 1.654334e-11 > > MUMPS run4: > > 0 Nonlinear |R| = 9.447423e+03 > 0 Linear |R| = 9.447423e+03 > 1 Linear |R| = 1.360637e-11 > 1 Nonlinear |R| = 1.654334e-11 > > > > > > > > > > Barry > > > > On Nov 15, 2017, at 4:24 PM, Kong, Fande wrote: > > > > > > > > On Wed, Nov 15, 2017 at 2:52 PM, Smith, Barry F. wrote: > > > > > > > On Nov 15, 2017, at 3:36 PM, Kong, Fande wrote: > > > > > > Hi Barry, > > > > > > Thanks for your reply. I was wondering why this happens only when we use superlu_dist. I am trying to understand the algorithm in superlu_dist. If we use ASM or MUMPS, we do not produce these differences. > > > > > > The differences actually are NOT meaningless. In fact, we have a real transient application that presents this issue. When we run the simulation with superlu_dist in parallel for thousands of time steps, the final physics solution looks totally different from different runs. The differences are not acceptable any more. For a steady problem, the difference may be meaningless. But it is significant for the transient problem. > > > > I submit that the "physics solution" of all of these runs is equally right and equally wrong. If the solutions are very different due to a small perturbation than something is wrong with the model or the integrator, I don't think you can blame the linear solver (see below) > > > > > > This makes the solution not reproducible, and we can not even set a targeting solution in the test system because the solution is so different from one run to another. I guess there might/may be a tiny bug in superlu_dist or the PETSc interface to superlu_dist. > > > > This is possible but it is also possible this is due to normal round off inside of SuperLU dist. > > > > Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't really matter exactly how well SuperLU_Dist does. The nonlinear iteration does essential defect correction for you; are you making sure that the nonlinear iteration always works for every timestep? For example confirm that SNESGetConvergedReason() is always positive. > > > > Definitely it could be something wrong on my side. But let us focus on the simple question first. > > > > To make the discussion a little simpler, let us back to the simple problem (heat conduction). Now I want to understand why this happens to superlu_dist only. When we are using ASM or MUMPS, why we can not see the differences from one run to another? I posted the residual histories for MUMPS and ASM. 
We can not see any differences in terms of the residual norms when using MUMPS or ASM. Does superlu_dist have higher round off than other solvers? > > > > > > > > MUMPS run1: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020993e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 4.836162e-08 > > 2 Linear |R| = 7.055620e-14 > > 2 Nonlinear |R| = 4.836392e-08 > > > > MUMPS run2: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020993e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 4.836162e-08 > > 2 Linear |R| = 7.055620e-14 > > 2 Nonlinear |R| = 4.836392e-08 > > > > MUMPS run3: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020993e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 4.836162e-08 > > 2 Linear |R| = 7.055620e-14 > > 2 Nonlinear |R| = 4.836392e-08 > > > > MUMPS run4: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.013384e-02 > > 2 Linear |R| = 4.020993e-08 > > 1 Nonlinear |R| = 1.404678e-02 > > 0 Linear |R| = 1.404678e-02 > > 1 Linear |R| = 4.836162e-08 > > 2 Linear |R| = 7.055620e-14 > > 2 Nonlinear |R| = 4.836392e-08 > > > > > > > > ASM run1: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 6.189229e+03 > > 2 Linear |R| = 3.252487e+02 > > 3 Linear |R| = 3.485174e+01 > > 4 Linear |R| = 8.600695e+00 > > 5 Linear |R| = 3.333942e+00 > > 6 Linear |R| = 1.706112e+00 > > 7 Linear |R| = 5.047863e-01 > > 8 Linear |R| = 2.337297e-01 > > 9 Linear |R| = 1.071627e-01 > > 10 Linear |R| = 4.692177e-02 > > 11 Linear |R| = 1.340717e-02 > > 12 Linear |R| = 4.753951e-03 > > 1 Nonlinear |R| = 2.320271e-02 > > 0 Linear |R| = 2.320271e-02 > > 1 Linear |R| = 4.367880e-03 > > 2 Linear |R| = 1.407852e-03 > > 3 Linear |R| = 6.036360e-04 > > 4 Linear |R| = 1.867661e-04 > > 5 Linear |R| = 8.760076e-05 > > 6 Linear |R| = 3.260519e-05 > > 7 Linear |R| = 1.435418e-05 > > 8 Linear |R| = 4.532875e-06 > > 9 Linear |R| = 2.439053e-06 > > 10 Linear |R| = 7.998549e-07 > > 11 Linear |R| = 2.428064e-07 > > 12 Linear |R| = 4.766918e-08 > > 13 Linear |R| = 1.713748e-08 > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > ASM run2: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 6.189229e+03 > > 2 Linear |R| = 3.252487e+02 > > 3 Linear |R| = 3.485174e+01 > > 4 Linear |R| = 8.600695e+00 > > 5 Linear |R| = 3.333942e+00 > > 6 Linear |R| = 1.706112e+00 > > 7 Linear |R| = 5.047863e-01 > > 8 Linear |R| = 2.337297e-01 > > 9 Linear |R| = 1.071627e-01 > > 10 Linear |R| = 4.692177e-02 > > 11 Linear |R| = 1.340717e-02 > > 12 Linear |R| = 4.753951e-03 > > 1 Nonlinear |R| = 2.320271e-02 > > 0 Linear |R| = 2.320271e-02 > > 1 Linear |R| = 4.367880e-03 > > 2 Linear |R| = 1.407852e-03 > > 3 Linear |R| = 6.036360e-04 > > 4 Linear |R| = 1.867661e-04 > > 5 Linear |R| = 8.760076e-05 > > 6 Linear |R| = 3.260519e-05 > > 7 Linear |R| = 1.435418e-05 > > 8 Linear |R| = 4.532875e-06 > > 9 Linear |R| = 2.439053e-06 > > 10 Linear |R| = 7.998549e-07 > > 11 Linear |R| = 2.428064e-07 > > 12 Linear |R| = 4.766918e-08 > > 13 Linear |R| = 1.713748e-08 > > 2 Nonlinear |R| = 3.671573e-07 > > > > ASM run3: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 
6.189229e+03 > > 2 Linear |R| = 3.252487e+02 > > 3 Linear |R| = 3.485174e+01 > > 4 Linear |R| = 8.600695e+00 > > 5 Linear |R| = 3.333942e+00 > > 6 Linear |R| = 1.706112e+00 > > 7 Linear |R| = 5.047863e-01 > > 8 Linear |R| = 2.337297e-01 > > 9 Linear |R| = 1.071627e-01 > > 10 Linear |R| = 4.692177e-02 > > 11 Linear |R| = 1.340717e-02 > > 12 Linear |R| = 4.753951e-03 > > 1 Nonlinear |R| = 2.320271e-02 > > 0 Linear |R| = 2.320271e-02 > > 1 Linear |R| = 4.367880e-03 > > 2 Linear |R| = 1.407852e-03 > > 3 Linear |R| = 6.036360e-04 > > 4 Linear |R| = 1.867661e-04 > > 5 Linear |R| = 8.760076e-05 > > 6 Linear |R| = 3.260519e-05 > > 7 Linear |R| = 1.435418e-05 > > 8 Linear |R| = 4.532875e-06 > > 9 Linear |R| = 2.439053e-06 > > 10 Linear |R| = 7.998549e-07 > > 11 Linear |R| = 2.428064e-07 > > 12 Linear |R| = 4.766918e-08 > > 13 Linear |R| = 1.713748e-08 > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > ASM run4: > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 6.189229e+03 > > 2 Linear |R| = 3.252487e+02 > > 3 Linear |R| = 3.485174e+01 > > 4 Linear |R| = 8.600695e+00 > > 5 Linear |R| = 3.333942e+00 > > 6 Linear |R| = 1.706112e+00 > > 7 Linear |R| = 5.047863e-01 > > 8 Linear |R| = 2.337297e-01 > > 9 Linear |R| = 1.071627e-01 > > 10 Linear |R| = 4.692177e-02 > > 11 Linear |R| = 1.340717e-02 > > 12 Linear |R| = 4.753951e-03 > > 1 Nonlinear |R| = 2.320271e-02 > > 0 Linear |R| = 2.320271e-02 > > 1 Linear |R| = 4.367880e-03 > > 2 Linear |R| = 1.407852e-03 > > 3 Linear |R| = 6.036360e-04 > > 4 Linear |R| = 1.867661e-04 > > 5 Linear |R| = 8.760076e-05 > > 6 Linear |R| = 3.260519e-05 > > 7 Linear |R| = 1.435418e-05 > > 8 Linear |R| = 4.532875e-06 > > 9 Linear |R| = 2.439053e-06 > > 10 Linear |R| = 7.998549e-07 > > 11 Linear |R| = 2.428064e-07 > > 12 Linear |R| = 4.766918e-08 > > 13 Linear |R| = 1.713748e-08 > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > > > > > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. wrote: > > > > > > Meaningless differences > > > > > > > > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande wrote: > > > > > > > > Hi, > > > > > > > > There is a heat conduction problem. When superlu_dist is used as a preconditioner, we have random results from different runs. Is there a random algorithm in superlu_dist? If we use ASM or MUMPS as the preconditioner, we then don't have this issue. 
> > > > > > > > run 1: > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > 0 Linear |R| = 9.447423e+03 > > > > 1 Linear |R| = 1.013384e-02 > > > > 2 Linear |R| = 4.020995e-08 > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > 0 Linear |R| = 1.404678e-02 > > > > 1 Linear |R| = 5.104757e-08 > > > > 2 Linear |R| = 7.699637e-14 > > > > 2 Nonlinear |R| = 5.106418e-08 > > > > > > > > > > > > run 2: > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > 0 Linear |R| = 9.447423e+03 > > > > 1 Linear |R| = 1.013384e-02 > > > > 2 Linear |R| = 4.020995e-08 > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > 0 Linear |R| = 1.404678e-02 > > > > 1 Linear |R| = 5.109913e-08 > > > > 2 Linear |R| = 7.189091e-14 > > > > 2 Nonlinear |R| = 5.111591e-08 > > > > > > > > run 3: > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > 0 Linear |R| = 9.447423e+03 > > > > 1 Linear |R| = 1.013384e-02 > > > > 2 Linear |R| = 4.020995e-08 > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > 0 Linear |R| = 1.404678e-02 > > > > 1 Linear |R| = 5.104942e-08 > > > > 2 Linear |R| = 7.465572e-14 > > > > 2 Nonlinear |R| = 5.106642e-08 > > > > > > > > run 4: > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > 0 Linear |R| = 9.447423e+03 > > > > 1 Linear |R| = 1.013384e-02 > > > > 2 Linear |R| = 4.020995e-08 > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > 0 Linear |R| = 1.404678e-02 > > > > 1 Linear |R| = 5.102730e-08 > > > > 2 Linear |R| = 7.132220e-14 > > > > 2 Nonlinear |R| = 5.104442e-08 > > > > > > > > Solver details: > > > > > > > > SNES Object: 8 MPI processes > > > > type: newtonls > > > > maximum iterations=15, maximum function evaluations=10000 > > > > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > > > > total number of linear solver iterations=4 > > > > total number of function evaluations=7 > > > > norm schedule ALWAYS > > > > SNESLineSearch Object: 8 MPI processes > > > > type: basic > > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > > > maximum iterations=40 > > > > KSP Object: 8 MPI processes > > > > type: gmres > > > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > > > happy breakdown tolerance 1e-30 > > > > maximum iterations=100, initial guess is zero > > > > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > > > > right preconditioning > > > > using UNPRECONDITIONED norm type for convergence test > > > > PC Object: 8 MPI processes > > > > type: lu > > > > out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > matrix ordering: natural > > > > factor fill ratio given 0., needed 0. 
> > > > Factored matrix follows: > > > > Mat Object: 8 MPI processes > > > > type: superlu_dist > > > > rows=7925, cols=7925 > > > > package used to perform factorization: superlu_dist > > > > total: nonzeros=0, allocated nonzeros=0 > > > > total number of mallocs used during MatSetValues calls =0 > > > > SuperLU_DIST run parameters: > > > > Process grid nprow 4 x npcol 2 > > > > Equilibrate matrix TRUE > > > > Matrix input mode 1 > > > > Replace tiny pivots FALSE > > > > Use iterative refinement TRUE > > > > Processors in row 4 col partition 2 > > > > Row permutation LargeDiag > > > > Column permutation METIS_AT_PLUS_A > > > > Parallel symbolic factorization FALSE > > > > Repeated factorization SamePattern > > > > linear system matrix followed by preconditioner matrix: > > > > Mat Object: 8 MPI processes > > > > type: mffd > > > > rows=7925, cols=7925 > > > > Matrix-free approximation: > > > > err=1.49012e-08 (relative error in function evaluation) > > > > Using wp compute h routine > > > > Does not compute normU > > > > Mat Object: () 8 MPI processes > > > > type: mpiaij > > > > rows=7925, cols=7925 > > > > total: nonzeros=63587, allocated nonzeros=63865 > > > > total number of mallocs used during MatSetValues calls =0 > > > > not using I-node (on process 0) routines > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > > From fande.kong at inl.gov Wed Nov 15 17:17:51 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 15 Nov 2017 16:17:51 -0700 Subject: [petsc-users] superlu_dist produces random results In-Reply-To: <053458DA-CA4C-4E3D-A11E-D9C4FCE438E3@mcs.anl.gov> References: <16D20022-A815-4EAD-98D3-E887894B7FA8@mcs.anl.gov> <053458DA-CA4C-4E3D-A11E-D9C4FCE438E3@mcs.anl.gov> Message-ID: Thanks, Barry, On Wed, Nov 15, 2017 at 4:04 PM, Smith, Barry F. wrote: > > Do the ASM runs for thousands of time-steps produce the same final > "physical results" as the MUMPS run for thousands of timesteps? While with > SuperLU you get a very different "physical results"? > Let me update a little bit more. The simulation with SuperLU may fail at certain time step. Sometime we can also run the simulation successfully for the whole time range. It is totally random. We will try ASM and MUMPS. Fande, > > Barry > > > > On Nov 15, 2017, at 4:52 PM, Kong, Fande wrote: > > > > > > > > On Wed, Nov 15, 2017 at 3:35 PM, Smith, Barry F. > wrote: > > > > Since the convergence labeled linear does not converge to 14 digits in > one iteration I am assuming you are using lagged preconditioning and or > lagged Jacobian? > > > > We are using Jacobian-free Newton. So Jacobian is different from the > preconditioning matrix. > > > > > > What happens if you do no lagging and solve each linear solve with a > new LU factorization? > > > > We have the following results without using Jacobian-free Newton. Again, > superlu_dist produces differences, while MUMPS gives the same results in > terms of the residual norms. 
> > > > > > Fande, > > > > > > Superlu_dist run1: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.322285e-11 > > 1 Nonlinear |R| = 1.666987e-11 > > > > > > Superlu_dist run2: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.322171e-11 > > 1 Nonlinear |R| = 1.666977e-11 > > > > > > Superlu_dist run3: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.321964e-11 > > 1 Nonlinear |R| = 1.666959e-11 > > > > > > Superlu_dist run4: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.321978e-11 > > 1 Nonlinear |R| = 1.668688e-11 > > > > > > MUMPS run1: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.360637e-11 > > 1 Nonlinear |R| = 1.654334e-11 > > > > MUMPS run 2: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.360637e-11 > > 1 Nonlinear |R| = 1.654334e-11 > > > > MUMPS run 3: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.360637e-11 > > 1 Nonlinear |R| = 1.654334e-11 > > > > MUMPS run4: > > > > 0 Nonlinear |R| = 9.447423e+03 > > 0 Linear |R| = 9.447423e+03 > > 1 Linear |R| = 1.360637e-11 > > 1 Nonlinear |R| = 1.654334e-11 > > > > > > > > > > > > > > > > > > > > Barry > > > > > > > On Nov 15, 2017, at 4:24 PM, Kong, Fande wrote: > > > > > > > > > > > > On Wed, Nov 15, 2017 at 2:52 PM, Smith, Barry F. > wrote: > > > > > > > > > > On Nov 15, 2017, at 3:36 PM, Kong, Fande wrote: > > > > > > > > Hi Barry, > > > > > > > > Thanks for your reply. I was wondering why this happens only when we > use superlu_dist. I am trying to understand the algorithm in superlu_dist. > If we use ASM or MUMPS, we do not produce these differences. > > > > > > > > The differences actually are NOT meaningless. In fact, we have a > real transient application that presents this issue. When we run the > simulation with superlu_dist in parallel for thousands of time steps, the > final physics solution looks totally different from different runs. The > differences are not acceptable any more. For a steady problem, the > difference may be meaningless. But it is significant for the transient > problem. > > > > > > I submit that the "physics solution" of all of these runs is equally > right and equally wrong. If the solutions are very different due to a small > perturbation than something is wrong with the model or the integrator, I > don't think you can blame the linear solver (see below) > > > > > > > > This makes the solution not reproducible, and we can not even set a > targeting solution in the test system because the solution is so different > from one run to another. I guess there might/may be a tiny bug in > superlu_dist or the PETSc interface to superlu_dist. > > > > > > This is possible but it is also possible this is due to normal round > off inside of SuperLU dist. > > > > > > Since you have SuperLU_Dist inside a nonlinear iteration it > shouldn't really matter exactly how well SuperLU_Dist does. The nonlinear > iteration does essential defect correction for you; are you making sure > that the nonlinear iteration always works for every timestep? For example > confirm that SNESGetConvergedReason() is always positive. > > > > > > Definitely it could be something wrong on my side. But let us focus > on the simple question first. 
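For reference, a minimal C sketch of the per-timestep check suggested above (the SNES object snes and the solution vector x are assumed to exist already; the particular error code used here is an assumption):

    #include <petscsnes.h>

    SNESConvergedReason reason;
    PetscErrorCode      ierr;

    ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);
    ierr = SNESGetConvergedReason(snes, &reason);CHKERRQ(ierr);
    if (reason <= 0) {
      /* a non-positive reason means the nonlinear solve did not converge */
      SETERRQ1(PETSC_COMM_WORLD, PETSC_ERR_NOT_CONVERGED,
               "Nonlinear solve failed for this time step, reason %d", (int)reason);
    }

The same guard is available purely at runtime with -snes_error_if_not_converged.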
> > > > > > To make the discussion a little simpler, let us back to the simple > problem (heat conduction). Now I want to understand why this happens to > superlu_dist only. When we are using ASM or MUMPS, why we can not see the > differences from one run to another? I posted the residual histories for > MUMPS and ASM. We can not see any differences in terms of the residual > norms when using MUMPS or ASM. Does superlu_dist have higher round off than > other solvers? > > > > > > > > > > > > MUMPS run1: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020993e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 4.836162e-08 > > > 2 Linear |R| = 7.055620e-14 > > > 2 Nonlinear |R| = 4.836392e-08 > > > > > > MUMPS run2: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020993e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 4.836162e-08 > > > 2 Linear |R| = 7.055620e-14 > > > 2 Nonlinear |R| = 4.836392e-08 > > > > > > MUMPS run3: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020993e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 4.836162e-08 > > > 2 Linear |R| = 7.055620e-14 > > > 2 Nonlinear |R| = 4.836392e-08 > > > > > > MUMPS run4: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 1.013384e-02 > > > 2 Linear |R| = 4.020993e-08 > > > 1 Nonlinear |R| = 1.404678e-02 > > > 0 Linear |R| = 1.404678e-02 > > > 1 Linear |R| = 4.836162e-08 > > > 2 Linear |R| = 7.055620e-14 > > > 2 Nonlinear |R| = 4.836392e-08 > > > > > > > > > > > > ASM run1: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 6.189229e+03 > > > 2 Linear |R| = 3.252487e+02 > > > 3 Linear |R| = 3.485174e+01 > > > 4 Linear |R| = 8.600695e+00 > > > 5 Linear |R| = 3.333942e+00 > > > 6 Linear |R| = 1.706112e+00 > > > 7 Linear |R| = 5.047863e-01 > > > 8 Linear |R| = 2.337297e-01 > > > 9 Linear |R| = 1.071627e-01 > > > 10 Linear |R| = 4.692177e-02 > > > 11 Linear |R| = 1.340717e-02 > > > 12 Linear |R| = 4.753951e-03 > > > 1 Nonlinear |R| = 2.320271e-02 > > > 0 Linear |R| = 2.320271e-02 > > > 1 Linear |R| = 4.367880e-03 > > > 2 Linear |R| = 1.407852e-03 > > > 3 Linear |R| = 6.036360e-04 > > > 4 Linear |R| = 1.867661e-04 > > > 5 Linear |R| = 8.760076e-05 > > > 6 Linear |R| = 3.260519e-05 > > > 7 Linear |R| = 1.435418e-05 > > > 8 Linear |R| = 4.532875e-06 > > > 9 Linear |R| = 2.439053e-06 > > > 10 Linear |R| = 7.998549e-07 > > > 11 Linear |R| = 2.428064e-07 > > > 12 Linear |R| = 4.766918e-08 > > > 13 Linear |R| = 1.713748e-08 > > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > > ASM run2: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 6.189229e+03 > > > 2 Linear |R| = 3.252487e+02 > > > 3 Linear |R| = 3.485174e+01 > > > 4 Linear |R| = 8.600695e+00 > > > 5 Linear |R| = 3.333942e+00 > > > 6 Linear |R| = 1.706112e+00 > > > 7 Linear |R| = 5.047863e-01 > > > 8 Linear |R| = 2.337297e-01 > > > 9 Linear |R| = 1.071627e-01 > > > 10 Linear |R| = 4.692177e-02 > > > 11 Linear |R| = 1.340717e-02 > > > 12 Linear |R| = 4.753951e-03 > > > 1 Nonlinear |R| = 2.320271e-02 > > > 0 Linear |R| = 2.320271e-02 > > > 1 Linear |R| = 
4.367880e-03 > > > 2 Linear |R| = 1.407852e-03 > > > 3 Linear |R| = 6.036360e-04 > > > 4 Linear |R| = 1.867661e-04 > > > 5 Linear |R| = 8.760076e-05 > > > 6 Linear |R| = 3.260519e-05 > > > 7 Linear |R| = 1.435418e-05 > > > 8 Linear |R| = 4.532875e-06 > > > 9 Linear |R| = 2.439053e-06 > > > 10 Linear |R| = 7.998549e-07 > > > 11 Linear |R| = 2.428064e-07 > > > 12 Linear |R| = 4.766918e-08 > > > 13 Linear |R| = 1.713748e-08 > > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > ASM run3: > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 6.189229e+03 > > > 2 Linear |R| = 3.252487e+02 > > > 3 Linear |R| = 3.485174e+01 > > > 4 Linear |R| = 8.600695e+00 > > > 5 Linear |R| = 3.333942e+00 > > > 6 Linear |R| = 1.706112e+00 > > > 7 Linear |R| = 5.047863e-01 > > > 8 Linear |R| = 2.337297e-01 > > > 9 Linear |R| = 1.071627e-01 > > > 10 Linear |R| = 4.692177e-02 > > > 11 Linear |R| = 1.340717e-02 > > > 12 Linear |R| = 4.753951e-03 > > > 1 Nonlinear |R| = 2.320271e-02 > > > 0 Linear |R| = 2.320271e-02 > > > 1 Linear |R| = 4.367880e-03 > > > 2 Linear |R| = 1.407852e-03 > > > 3 Linear |R| = 6.036360e-04 > > > 4 Linear |R| = 1.867661e-04 > > > 5 Linear |R| = 8.760076e-05 > > > 6 Linear |R| = 3.260519e-05 > > > 7 Linear |R| = 1.435418e-05 > > > 8 Linear |R| = 4.532875e-06 > > > 9 Linear |R| = 2.439053e-06 > > > 10 Linear |R| = 7.998549e-07 > > > 11 Linear |R| = 2.428064e-07 > > > 12 Linear |R| = 4.766918e-08 > > > 13 Linear |R| = 1.713748e-08 > > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > > > > > ASM run4: > > > 0 Nonlinear |R| = 9.447423e+03 > > > 0 Linear |R| = 9.447423e+03 > > > 1 Linear |R| = 6.189229e+03 > > > 2 Linear |R| = 3.252487e+02 > > > 3 Linear |R| = 3.485174e+01 > > > 4 Linear |R| = 8.600695e+00 > > > 5 Linear |R| = 3.333942e+00 > > > 6 Linear |R| = 1.706112e+00 > > > 7 Linear |R| = 5.047863e-01 > > > 8 Linear |R| = 2.337297e-01 > > > 9 Linear |R| = 1.071627e-01 > > > 10 Linear |R| = 4.692177e-02 > > > 11 Linear |R| = 1.340717e-02 > > > 12 Linear |R| = 4.753951e-03 > > > 1 Nonlinear |R| = 2.320271e-02 > > > 0 Linear |R| = 2.320271e-02 > > > 1 Linear |R| = 4.367880e-03 > > > 2 Linear |R| = 1.407852e-03 > > > 3 Linear |R| = 6.036360e-04 > > > 4 Linear |R| = 1.867661e-04 > > > 5 Linear |R| = 8.760076e-05 > > > 6 Linear |R| = 3.260519e-05 > > > 7 Linear |R| = 1.435418e-05 > > > 8 Linear |R| = 4.532875e-06 > > > 9 Linear |R| = 2.439053e-06 > > > 10 Linear |R| = 7.998549e-07 > > > 11 Linear |R| = 2.428064e-07 > > > 12 Linear |R| = 4.766918e-08 > > > 13 Linear |R| = 1.713748e-08 > > > 2 Nonlinear |R| = 3.671573e-07 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > > > > > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. > wrote: > > > > > > > > Meaningless differences > > > > > > > > > > > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande > wrote: > > > > > > > > > > Hi, > > > > > > > > > > There is a heat conduction problem. When superlu_dist is used as a > preconditioner, we have random results from different runs. Is there a > random algorithm in superlu_dist? If we use ASM or MUMPS as the > preconditioner, we then don't have this issue. 
> > > > > > > > > > run 1: > > > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > > 0 Linear |R| = 9.447423e+03 > > > > > 1 Linear |R| = 1.013384e-02 > > > > > 2 Linear |R| = 4.020995e-08 > > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > > 0 Linear |R| = 1.404678e-02 > > > > > 1 Linear |R| = 5.104757e-08 > > > > > 2 Linear |R| = 7.699637e-14 > > > > > 2 Nonlinear |R| = 5.106418e-08 > > > > > > > > > > > > > > > run 2: > > > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > > 0 Linear |R| = 9.447423e+03 > > > > > 1 Linear |R| = 1.013384e-02 > > > > > 2 Linear |R| = 4.020995e-08 > > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > > 0 Linear |R| = 1.404678e-02 > > > > > 1 Linear |R| = 5.109913e-08 > > > > > 2 Linear |R| = 7.189091e-14 > > > > > 2 Nonlinear |R| = 5.111591e-08 > > > > > > > > > > run 3: > > > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > > 0 Linear |R| = 9.447423e+03 > > > > > 1 Linear |R| = 1.013384e-02 > > > > > 2 Linear |R| = 4.020995e-08 > > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > > 0 Linear |R| = 1.404678e-02 > > > > > 1 Linear |R| = 5.104942e-08 > > > > > 2 Linear |R| = 7.465572e-14 > > > > > 2 Nonlinear |R| = 5.106642e-08 > > > > > > > > > > run 4: > > > > > > > > > > 0 Nonlinear |R| = 9.447423e+03 > > > > > 0 Linear |R| = 9.447423e+03 > > > > > 1 Linear |R| = 1.013384e-02 > > > > > 2 Linear |R| = 4.020995e-08 > > > > > 1 Nonlinear |R| = 1.404678e-02 > > > > > 0 Linear |R| = 1.404678e-02 > > > > > 1 Linear |R| = 5.102730e-08 > > > > > 2 Linear |R| = 7.132220e-14 > > > > > 2 Nonlinear |R| = 5.104442e-08 > > > > > > > > > > Solver details: > > > > > > > > > > SNES Object: 8 MPI processes > > > > > type: newtonls > > > > > maximum iterations=15, maximum function evaluations=10000 > > > > > tolerances: relative=1e-08, absolute=1e-11, solution=1e-50 > > > > > total number of linear solver iterations=4 > > > > > total number of function evaluations=7 > > > > > norm schedule ALWAYS > > > > > SNESLineSearch Object: 8 MPI processes > > > > > type: basic > > > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > > > > maximum iterations=40 > > > > > KSP Object: 8 MPI processes > > > > > type: gmres > > > > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > > > > happy breakdown tolerance 1e-30 > > > > > maximum iterations=100, initial guess is zero > > > > > tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > > > > > right preconditioning > > > > > using UNPRECONDITIONED norm type for convergence test > > > > > PC Object: 8 MPI processes > > > > > type: lu > > > > > out-of-place factorization > > > > > tolerance for zero pivot 2.22045e-14 > > > > > matrix ordering: natural > > > > > factor fill ratio given 0., needed 0. 
> > > > > Factored matrix follows: > > > > > Mat Object: 8 MPI processes > > > > > type: superlu_dist > > > > > rows=7925, cols=7925 > > > > > package used to perform factorization: superlu_dist > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > total number of mallocs used during MatSetValues calls > =0 > > > > > SuperLU_DIST run parameters: > > > > > Process grid nprow 4 x npcol 2 > > > > > Equilibrate matrix TRUE > > > > > Matrix input mode 1 > > > > > Replace tiny pivots FALSE > > > > > Use iterative refinement TRUE > > > > > Processors in row 4 col partition 2 > > > > > Row permutation LargeDiag > > > > > Column permutation METIS_AT_PLUS_A > > > > > Parallel symbolic factorization FALSE > > > > > Repeated factorization SamePattern > > > > > linear system matrix followed by preconditioner matrix: > > > > > Mat Object: 8 MPI processes > > > > > type: mffd > > > > > rows=7925, cols=7925 > > > > > Matrix-free approximation: > > > > > err=1.49012e-08 (relative error in function evaluation) > > > > > Using wp compute h routine > > > > > Does not compute normU > > > > > Mat Object: () 8 MPI processes > > > > > type: mpiaij > > > > > rows=7925, cols=7925 > > > > > total: nonzeros=63587, allocated nonzeros=63865 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > not using I-node (on process 0) routines > > > > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Wed Nov 15 19:39:33 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Thu, 16 Nov 2017 14:39:33 +1300 Subject: [petsc-users] ISGlobalToLocalMappingApplyBlock In-Reply-To: <94d0e9c4-e218-4fce-c6c5-1896dc9eea00@auckland.ac.nz> References: <5b548602-6be9-a0e0-fe9b-789ed4289ea4@auckland.ac.nz> <69b9a876-cb48-51e1-42f8-af9a0b6c890f@auckland.ac.nz> <94d0e9c4-e218-4fce-c6c5-1896dc9eea00@auckland.ac.nz> Message-ID: <7304d690-40c5-ded9-8588-9aed3996c530@auckland.ac.nz> I've debugged into the ISGlobalToLocalMappingApplyBlock() function and it seems to me the bounds checking in there is not correct when the blocksize is > 1. It checks against the same bounds, scaled up by the blocksize, in both the block and non-block versions of the function. I think for the block version the bounds should not be scaled. I've just created a pull request (acroucher/fix-IS-global-to-local-mapping-block) with a suggested fix. - Adrian On 16/11/17 11:52, Adrian Croucher wrote: > I actually attached the wrong test program last time- I've attached > the right one here, which is much simpler. It test global indices 0, > 1, ... 9. > > If I run on 2 processes, the local indices it returns are: > > rank 0: 0, 1, 2, 3, 4, 0, 0, 0, -253701943, 0 > rank 1: -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 > > The results I expected are: > > rank 0: 0, 1, 2, 3, 4, -1, -1, -1, -1, -1 > rank 1: -1, -1, -1, -1, -1, 0, 1, 2, 3, 4 > > So the results for global indices 0, 1,... 4 are what I expected, on > both ranks. But the results for global indices 5, 6, ... 9 are not. > > I tried increasing the blocksize to 3 or 4, and the results were > exactly the same. > > It only gives the results I expected if I change the blocksize to 1. 
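The Fortran reproducer (testl2g2.F90) is not included in the archive; a rough, self-contained C equivalent of the test described above would look like the sketch below. A block size of 2 is assumed, since the thread only states that the behaviour changes when the block size is set to 1; with IS_GTOLM_MASK, global blocks not owned locally should come back as -1.

    #include <petscis.h>

    /* intended to be run on 2 MPI processes */
    int main(int argc, char **argv)
    {
      ISLocalToGlobalMapping map;
      PetscInt               bs = 2, nblocks = 5, i, nout;
      PetscInt               owned[5], globals[10], locals[10];
      PetscMPIInt            rank;
      PetscErrorCode         ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);

      /* rank 0 owns blocks 0..4, rank 1 owns blocks 5..9 */
      for (i = 0; i < nblocks; i++) owned[i] = rank*nblocks + i;
      ierr = ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, bs, nblocks, owned,
                                          PETSC_COPY_VALUES, &map);CHKERRQ(ierr);

      /* map every global block index 0..9 to its local block index (or -1) */
      for (i = 0; i < 10; i++) globals[i] = i;
      ierr = ISGlobalToLocalMappingApplyBlock(map, IS_GTOLM_MASK, 10, globals,
                                              &nout, locals);CHKERRQ(ierr);
      for (i = 0; i < 10; i++) {
        ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
                 "[%d] global block %D -> local block %D\n",
                 rank, globals[i], locals[i]);CHKERRQ(ierr);
      }
      ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);CHKERRQ(ierr);

      ierr = ISLocalToGlobalMappingDestroy(&map);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Run on two MPI processes, rank 0 should report local blocks 0..4 followed by five -1 values and rank 1 the reverse, matching the expected output quoted above.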
> > - Adrian > -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 From jed at jedbrown.org Wed Nov 15 21:40:56 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 15 Nov 2017 20:40:56 -0700 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? In-Reply-To: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> References: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> Message-ID: <871sky28mv.fsf@jedbrown.org> "Smith, Barry F." writes: >> On Nov 15, 2017, at 6:38 AM, Mark Lohry wrote: >> >> I've found ILU(0) or (1) to be working well for my problem, but the petsc implementation is serial only. Running with -pc_type hypre -pc_hypre_type pilut with default settings has considerably worse convergence. I've tried using -pc_hypre_pilut_factorrowsize (number of actual elements in row) to trick it into doing ILU(0), to no effect. >> >> Is there any way to recover classical ILU(k) from pilut? >> >> Hypre's docs state pilut is no longer supported, and Euclid should be used for anything moving forward. pc_hypre_boomeramg has options for Euclid smoothers. Any hope of a pc_hypre_type euclid? > > Not unless someone outside the PETSc team decides to put it back in. PETSc used to have a Euclid interface. My recollection is that Barry removed it because users were finding too many bugs in Euclid and upstream wasn't fixing them. A contributed revival of the interface won't fix the upstream problem. From bsmith at mcs.anl.gov Wed Nov 15 21:50:00 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 16 Nov 2017 03:50:00 +0000 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? In-Reply-To: <871sky28mv.fsf@jedbrown.org> References: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> <871sky28mv.fsf@jedbrown.org> Message-ID: > On Nov 15, 2017, at 9:40 PM, Jed Brown wrote: > > "Smith, Barry F." writes: > >>> On Nov 15, 2017, at 6:38 AM, Mark Lohry wrote: >>> >>> I've found ILU(0) or (1) to be working well for my problem, but the petsc implementation is serial only. Running with -pc_type hypre -pc_hypre_type pilut with default settings has considerably worse convergence. I've tried using -pc_hypre_pilut_factorrowsize (number of actual elements in row) to trick it into doing ILU(0), to no effect. >>> >>> Is there any way to recover classical ILU(k) from pilut? >>> >>> Hypre's docs state pilut is no longer supported, and Euclid should be used for anything moving forward. pc_hypre_boomeramg has options for Euclid smoothers. Any hope of a pc_hypre_type euclid? >> >> Not unless someone outside the PETSc team decides to put it back in. > > PETSc used to have a Euclid interface. My recollection is that Barry > removed it because users were finding too many bugs in Euclid and > upstream wasn't fixing them. A contributed revival of the interface > won't fix the upstream problem. The hypre team now claims they care about Euclid. But given the limitations of ILU in parallel I can't imagine anyone cares all that much. From mlohry at gmail.com Wed Nov 15 21:57:48 2017 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 15 Nov 2017 22:57:48 -0500 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? In-Reply-To: References: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> <871sky28mv.fsf@jedbrown.org> Message-ID: What are the limitations of ILU in parallel you're referring to? Does Schwarz+local ILU typically fare better? 
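For concreteness, the "Schwarz + local ILU" setup asked about here is normally selected purely through runtime options; a hedged sketch (the fill level and overlap below are arbitrary illustrative choices, not recommendations):

    # one-level additive Schwarz, ILU(1) on each (overlapping) subdomain
    -ksp_type gmres -pc_type asm -pc_asm_overlap 1 -sub_pc_type ilu -sub_pc_factor_levels 1

    # zero-overlap variant: block Jacobi with ILU(0) on each diagonal block
    -ksp_type gmres -pc_type bjacobi -sub_pc_type ilu -sub_pc_factor_levels 0

Each rank then factors only its own local block with PETSc's serial ILU, which is the usual way ILU is applied in parallel with PETSc.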
On Nov 15, 2017 10:50 PM, "Smith, Barry F." wrote: > > > > On Nov 15, 2017, at 9:40 PM, Jed Brown wrote: > > > > "Smith, Barry F." writes: > > > >>> On Nov 15, 2017, at 6:38 AM, Mark Lohry wrote: > >>> > >>> I've found ILU(0) or (1) to be working well for my problem, but the > petsc implementation is serial only. Running with -pc_type hypre > -pc_hypre_type pilut with default settings has considerably worse > convergence. I've tried using -pc_hypre_pilut_factorrowsize (number of > actual elements in row) to trick it into doing ILU(0), to no effect. > >>> > >>> Is there any way to recover classical ILU(k) from pilut? > >>> > >>> Hypre's docs state pilut is no longer supported, and Euclid should be > used for anything moving forward. pc_hypre_boomeramg has options for Euclid > smoothers. Any hope of a pc_hypre_type euclid? > >> > >> Not unless someone outside the PETSc team decides to put it back in. > > > > PETSc used to have a Euclid interface. My recollection is that Barry > > removed it because users were finding too many bugs in Euclid and > > upstream wasn't fixing them. A contributed revival of the interface > > won't fix the upstream problem. > > The hypre team now claims they care about Euclid. But given the > limitations of ILU in parallel I can't imagine anyone cares all that much. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 15 22:01:15 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 16 Nov 2017 04:01:15 +0000 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? In-Reply-To: References: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> <871sky28mv.fsf@jedbrown.org> Message-ID: > On Nov 15, 2017, at 9:57 PM, Mark Lohry wrote: > > What are the limitations of ILU in parallel you're referring to? Does Schwarz+local ILU typically fare better? If ILU works fine for scalably in parallel that is great. Most of the PETSc team has an explicit bias against ILU generally speaking. Barry > > On Nov 15, 2017 10:50 PM, "Smith, Barry F." wrote: > > > > On Nov 15, 2017, at 9:40 PM, Jed Brown wrote: > > > > "Smith, Barry F." writes: > > > >>> On Nov 15, 2017, at 6:38 AM, Mark Lohry wrote: > >>> > >>> I've found ILU(0) or (1) to be working well for my problem, but the petsc implementation is serial only. Running with -pc_type hypre -pc_hypre_type pilut with default settings has considerably worse convergence. I've tried using -pc_hypre_pilut_factorrowsize (number of actual elements in row) to trick it into doing ILU(0), to no effect. > >>> > >>> Is there any way to recover classical ILU(k) from pilut? > >>> > >>> Hypre's docs state pilut is no longer supported, and Euclid should be used for anything moving forward. pc_hypre_boomeramg has options for Euclid smoothers. Any hope of a pc_hypre_type euclid? > >> > >> Not unless someone outside the PETSc team decides to put it back in. > > > > PETSc used to have a Euclid interface. My recollection is that Barry > > removed it because users were finding too many bugs in Euclid and > > upstream wasn't fixing them. A contributed revival of the interface > > won't fix the upstream problem. > > The hypre team now claims they care about Euclid. But given the limitations of ILU in parallel I can't imagine anyone cares all that much. > > From knepley at gmail.com Thu Nov 16 06:53:02 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Nov 2017 07:53:02 -0500 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? 
In-Reply-To: References: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> <871sky28mv.fsf@jedbrown.org> Message-ID: On Wed, Nov 15, 2017 at 10:57 PM, Mark Lohry wrote: > What are the limitations of ILU in parallel you're referring to? Does > Schwarz+local ILU typically fare better? > Anecdotally, the sweet spot for ILU(k) k > 0 is extremely small. For smaller problems, sparse direct is so good its hard to win with ILU(k) since you do at least a few iterates. For larger problems, ILU(k) runs out of gas or memory fairly fast, and its better to find a method tailored to the problem. Matt > On Nov 15, 2017 10:50 PM, "Smith, Barry F." wrote: > >> >> >> > On Nov 15, 2017, at 9:40 PM, Jed Brown wrote: >> > >> > "Smith, Barry F." writes: >> > >> >>> On Nov 15, 2017, at 6:38 AM, Mark Lohry wrote: >> >>> >> >>> I've found ILU(0) or (1) to be working well for my problem, but the >> petsc implementation is serial only. Running with -pc_type hypre >> -pc_hypre_type pilut with default settings has considerably worse >> convergence. I've tried using -pc_hypre_pilut_factorrowsize (number of >> actual elements in row) to trick it into doing ILU(0), to no effect. >> >>> >> >>> Is there any way to recover classical ILU(k) from pilut? >> >>> >> >>> Hypre's docs state pilut is no longer supported, and Euclid should be >> used for anything moving forward. pc_hypre_boomeramg has options for Euclid >> smoothers. Any hope of a pc_hypre_type euclid? >> >> >> >> Not unless someone outside the PETSc team decides to put it back in. >> > >> > PETSc used to have a Euclid interface. My recollection is that Barry >> > removed it because users were finding too many bugs in Euclid and >> > upstream wasn't fixing them. A contributed revival of the interface >> > won't fix the upstream problem. >> >> The hypre team now claims they care about Euclid. But given the >> limitations of ILU in parallel I can't imagine anyone cares all that much. >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Thu Nov 16 07:53:45 2017 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 16 Nov 2017 08:53:45 -0500 Subject: [petsc-users] Possible to recover ILU(k) from hypre/pilut? In-Reply-To: References: <022F4CE4-2E18-40C4-835B-2825F6D65F73@mcs.anl.gov> <871sky28mv.fsf@jedbrown.org> Message-ID: Good to know, thanks. Maybe plausible that ILU(0) works well here since the system is much denser than comparable low order methods? ILU(1) gave maybe 10% fewer gmres iterations than ILU(0), so a net loss there. My system is mostly equivalent to what's used with ILU(0) variants effectively in Persson & Peraire - "newton-gmres preconditioning for discontinuous galerkin discretizations of the navier-stokes equations". I'll have to run comparisons with the sparse direct solvers in the near future. On Thu, Nov 16, 2017 at 7:53 AM, Matthew Knepley wrote: > On Wed, Nov 15, 2017 at 10:57 PM, Mark Lohry wrote: > >> What are the limitations of ILU in parallel you're referring to? Does >> Schwarz+local ILU typically fare better? >> > > > Anecdotally, the sweet spot for ILU(k) k > 0 is extremely small. For > smaller problems, sparse direct is so good its > hard to win with ILU(k) since you do at least a few iterates. 
For larger > problems, ILU(k) runs out of gas or memory > fairly fast, and its better to find a method tailored to the problem. > > Matt > > >> On Nov 15, 2017 10:50 PM, "Smith, Barry F." wrote: >> >>> >>> >>> > On Nov 15, 2017, at 9:40 PM, Jed Brown wrote: >>> > >>> > "Smith, Barry F." writes: >>> > >>> >>> On Nov 15, 2017, at 6:38 AM, Mark Lohry wrote: >>> >>> >>> >>> I've found ILU(0) or (1) to be working well for my problem, but the >>> petsc implementation is serial only. Running with -pc_type hypre >>> -pc_hypre_type pilut with default settings has considerably worse >>> convergence. I've tried using -pc_hypre_pilut_factorrowsize (number of >>> actual elements in row) to trick it into doing ILU(0), to no effect. >>> >>> >>> >>> Is there any way to recover classical ILU(k) from pilut? >>> >>> >>> >>> Hypre's docs state pilut is no longer supported, and Euclid should >>> be used for anything moving forward. pc_hypre_boomeramg has options for >>> Euclid smoothers. Any hope of a pc_hypre_type euclid? >>> >> >>> >> Not unless someone outside the PETSc team decides to put it back in. >>> > >>> > PETSc used to have a Euclid interface. My recollection is that Barry >>> > removed it because users were finding too many bugs in Euclid and >>> > upstream wasn't fixing them. A contributed revival of the interface >>> > won't fix the upstream problem. >>> >>> The hypre team now claims they care about Euclid. But given the >>> limitations of ILU in parallel I can't imagine anyone cares all that much. >>> >>> >>> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Thu Nov 16 21:25:56 2017 From: zakaryah at gmail.com (zakaryah .) Date: Thu, 16 Nov 2017 22:25:56 -0500 Subject: [petsc-users] Auxiliary fields for multigrid Message-ID: I have equations which depend on some external data. I'm not sure about the approach for making this compatible with multigrid - i.e. making sure those external fields are properly refined/coarsened. Are there PETSc examples that do this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Fri Nov 17 06:47:54 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 17 Nov 2017 12:47:54 +0000 Subject: [petsc-users] Auxiliary fields for multigrid In-Reply-To: References: Message-ID: On Fri, 17 Nov 2017 at 04:26, zakaryah . wrote: > I have equations which depend on some external data. I'm not sure about > the approach for making this compatible with multigrid - i.e. making sure > those external fields are properly refined/coarsened. > Multi grid does not strictly require you to restrict your external data. For example (I) algebraic mg (PCGAMG) only requires only the fine level operator: (I) you can define coarse level operators with geometric mg using Galerkin projection. The latter only requires one to specify how to interpolate your DM fields from fine to coarse (which PLEX and DA provide). Why not try these out for your problem first? Are there PETSc examples that do this? > Not that I'm aware off. Most use Galerkin or rediscretize the operator on the coarser levels. Thanks, Dave > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Nov 17 06:54:14 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Nov 2017 07:54:14 -0500 Subject: [petsc-users] Auxiliary fields for multigrid In-Reply-To: References: Message-ID: On Thu, Nov 16, 2017 at 10:25 PM, zakaryah . wrote: > I have equations which depend on some external data. I'm not sure about > the approach for making this compatible with multigrid - i.e. making sure > those external fields are properly refined/coarsened. Are there PETSc > examples that do this? > There are no examples, but there are hooks (DMRefineHookAdd, DMCoarsenHookAdd) for this purpose. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Nov 17 08:25:38 2017 From: jed at jedbrown.org (Jed Brown) Date: Fri, 17 Nov 2017 07:25:38 -0700 Subject: [petsc-users] Auxiliary fields for multigrid In-Reply-To: References: Message-ID: <87bmk1yobh.fsf@jedbrown.org> Dave May writes: > Are there PETSc examples that do this? >> > > Not that I'm aware off. Most use Galerkin or rediscretize the operator on > the coarser levels. src/snes/examples/tutorials/ex48.c From jed at jedbrown.org Sat Nov 18 17:51:59 2017 From: jed at jedbrown.org (Jed Brown) Date: Sat, 18 Nov 2017 16:51:59 -0700 Subject: [petsc-users] questions about vectorization In-Reply-To: References: Message-ID: <871skvxi00.fsf@jedbrown.org> Richard Tran Mills writes: > On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong wrote: > >> >> >> On Nov 13, 2017, at 10:49 PM, Xiangdong wrote: >> >> 1) How about the vectorization of BAIJ format? >> >> >> BAIJ kernels are optimized with manual unrolling, but not with AVX >> intrinsics. So the vectorization relies on the compiler's ability. >> It may or may not get vectorized depending on the compiler's optimization >> decisions. But vectorization is not essential for the performance of most >> BAIJ kernels. >> > > I know that this has come up in previous discussions, but I'm guessing that > the manual unrolling actually impedes the ability of many modern compilers > to optimize the BAIJ calculations. I suppose we ought to have a switch to > enable or disable the use of the unrolled versions? (And, further down the > road, some sort of performance model to tell us what the setting for the > switch should be...) I added a crude test for BAIJ(4), see branch 'jed/matbaij-loop'. Clang-5.0 is a bit better than gcc-7.2 for this problem. GCC produces comparable code and performance with both versions, but Clang produces tighter code (see below) for the current (fully unrolled) code, but it actually executes slower than the loop code. Testing as below, which produces a matrix with 284160 nonzeros (2.4 MB matrix, fits in my L3 cache). I use BCGS instead of GMRES so that the solve can be resident in cache. $ mpich-clang-opt/tests/src/snes/examples/tutorials/ex19 -da_grid_x 60 -da_grid_y 60 -prandtl 1e4 -ksp_type bcgs -dm_mat_type baij -pc_type none -mat_baij_loop 0 -log_view |grep MatMult MatMult 16269 1.0 1.8919e+00 1.0 9.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 78 77 0 0 0 78 77 0 0 0 4763 clang MatMult_SeqBAIJ_4 0.73 ?2f0: movsxd rdi,DWORD PTR [rbp+0x0] 2.44 ? add rbp,0x4 0.24 ? shl rdi,0x5 0.98 ? vbroad ymm1,QWORD PTR [rax+rdi*1] 0.73 ? vbroad ymm2,QWORD PTR [rax+rdi*1+0x8] 2.93 ? 
vbroad ymm3,QWORD PTR [rax+rdi*1+0x10] 0.98 ? vbroad ymm4,QWORD PTR [rax+rdi*1+0x18] 2.44 ? vfmadd ymm1,ymm0,YMMWORD PTR [rsi] 23.47 ? vfmadd ymm1,ymm2,YMMWORD PTR [rsi+0x20] 8.31 ? vfmadd ymm1,ymm3,YMMWORD PTR [rsi+0x40] 0.98 ? vmovap ymm0,ymm1 26.89 ? vfmadd ymm0,ymm4,YMMWORD PTR [rsi+0x60] 0.49 ? sub rsi,0xffffffffffffff80 ? add edx,0xffffffff 0.24 ? ? jne 2f0 $ mpich-clang-opt/tests/src/snes/examples/tutorials/ex19 -da_grid_x 60 -da_grid_y 60 -prandtl 1e4 -ksp_type bcgs -dm_mat_type baij -pc_type none -mat_baij_loop 1 -log_view |grep MatMult MatMult 16269 1.0 1.6305e+00 1.0 9.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 73 77 0 0 0 73 77 0 0 0 5527 1.86 ?130: cdqe ? vmovup ymm2,YMMWORD PTR [rbx+rax*8] 14.60 ? vmovup ymm3,YMMWORD PTR [rbx+rax*8+0x20] 1.24 ? vmovup ymm4,YMMWORD PTR [rbx+rax*8+0x40] 16.77 ? vmovup ymm5,YMMWORD PTR [rbx+rax*8+0x60] 2.17 ? vmovap YMMWORD PTR [rsp+0xc0],ymm5 0.93 ? vmovap YMMWORD PTR [rsp+0xa0],ymm4 0.62 ? vmovap YMMWORD PTR [rsp+0x80],ymm3 1.86 ? vmovap YMMWORD PTR [rsp+0x60],ymm2 0.93 ? mov esi,DWORD PTR [r13+rdi*4+0x0] 0.62 ? shl esi,0x2 0.62 ? movsxd rsi,esi 1.55 ? vbroad ymm2,QWORD PTR [rcx+rsi*8] 2.17 ? vfmadd ymm2,ymm1,YMMWORD PTR [rsp+0x60] 1.24 ? vbroad ymm1,QWORD PTR [rcx+rsi*8+0x8] 10.56 ? vfmadd ymm1,ymm2,YMMWORD PTR [rsp+0x80] 0.62 ? vbroad ymm2,QWORD PTR [rcx+rsi*8+0x10] 13.35 ? vfmadd ymm2,ymm1,YMMWORD PTR [rsp+0xa0] 1.86 ? vbroad ymm1,QWORD PTR [rcx+rsi*8+0x18] 15.53 ? vfmadd ymm1,ymm2,YMMWORD PTR [rsp+0xc0] ? add rdi,0x1 ? add eax,0x10 ? cmp rdi,rdx ? ? jl 130 The code with loops is faster with GCC as well, but the assembly is not as clean in either case. I don't have time to do more comprehensive testing at the moment, but it would be really useful to test with other block sizes, especially 3 (elasticity) and 5 (compressible flow) and with other compilers (especially Intel). If the performance advantage of loops holds, we can eliminate tons of code from PETSc by judicious use of inline functions. From zakaryah at gmail.com Sat Nov 18 23:40:29 2017 From: zakaryah at gmail.com (zakaryah .) Date: Sun, 19 Nov 2017 00:40:29 -0500 Subject: [petsc-users] Auxiliary fields for multigrid In-Reply-To: References: Message-ID: Ok, thanks for the suggestions. ?I'd like to check that I understand the approach. I create one DMDA for the variables to be solved by the SNES. That DMDA will be coarsened and refined by the options I pass to e.g. FAS. In my user context struct, I can have a second DMDA for the auxiliary fields. At each coarsening or refinement of the solution variable DMDA, I add a hook to simultaneously coarsen or refine the auxiliary fields. I suppose I can store the results of those operations in a third DMDA in the user context struct, because I'll need to access them from the SNES Function and Jacobian routines. Does that sound right?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From cpraveen at gmail.com Sun Nov 19 03:16:59 2017 From: cpraveen at gmail.com (Praveen C) Date: Sun, 19 Nov 2017 14:46:59 +0530 Subject: [petsc-users] Compilation error with ifort: invalid return statement in main Message-ID: <2D9D0776-DA57-422E-81D0-BA85D76361EF@gmail.com> Dear all In the main program, if I have a line like call PetscInitialize('param.in', ierr); CHKERRQ(ierr) compiling with mpifort gives main.f95(17): error #6353: A RETURN statement is invalid in the main program. call PetscInitialize('param.in', ierr); if (ierr .ne. 
0) then ; call PetscErrorF(ierr); return; endif ?????????????????????????????????????????????^ It compiles fine with gfortran. What can I do for the ifort case ? Thanks praveen From jroman at dsic.upv.es Sun Nov 19 03:43:46 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sun, 19 Nov 2017 10:43:46 +0100 Subject: [petsc-users] Compilation error with ifort: invalid return statement in main In-Reply-To: <2D9D0776-DA57-422E-81D0-BA85D76361EF@gmail.com> References: <2D9D0776-DA57-422E-81D0-BA85D76361EF@gmail.com> Message-ID: <2B209A10-924E-4EDE-94A9-F7D56A82F241@dsic.upv.es> > El 19 nov 2017, a las 10:16, Praveen C escribi?: > > Dear all > > In the main program, if I have a line like > > call PetscInitialize('param.in', ierr); CHKERRQ(ierr) > > compiling with mpifort gives > > main.f95(17): error #6353: A RETURN statement is invalid in the main program. > call PetscInitialize('param.in', ierr); if (ierr .ne. 0) then ; call PetscErrorF(ierr); return; endif > ?????????????????????????????????????????????^ > > It compiles fine with gfortran. What can I do for the ifort case ? > > Thanks > praveen Use CHKERRA(ierr) in the main program and CHKERRQ(ierr) in the subroutines. On the other hand, you should not use CHKERRA(ierr) after PetscInitialize because it assumes that MPI has been correctly initialized. Jose From knepley at gmail.com Sun Nov 19 06:04:39 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 19 Nov 2017 07:04:39 -0500 Subject: [petsc-users] Auxiliary fields for multigrid In-Reply-To: References: Message-ID: On Sun, Nov 19, 2017 at 12:40 AM, zakaryah . wrote: > Ok, thanks for the suggestions. > ?I'd like to check that I understand the approach. > > I create one DMDA for the variables to be solved by the SNES. That DMDA > will be coarsened and refined by the options I pass to e.g. FAS. In my > user context struct, I can have a second DMDA for the auxiliary fields. At > each coarsening or refinement of the solution variable DMDA, I add a hook > to simultaneously coarsen or refine the auxiliary fields. > That sounds right. I do not understand needing a 3rd DM. Thanks, Matt > I suppose I can store the results of those operations in a third DMDA in > the user context struct, because I'll need to access them from the SNES > Function and Jacobian routines. Does that sound right?? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Nov 19 06:08:57 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 19 Nov 2017 07:08:57 -0500 Subject: [petsc-users] Compilation error with ifort: invalid return statement in main In-Reply-To: <2B209A10-924E-4EDE-94A9-F7D56A82F241@dsic.upv.es> References: <2D9D0776-DA57-422E-81D0-BA85D76361EF@gmail.com> <2B209A10-924E-4EDE-94A9-F7D56A82F241@dsic.upv.es> Message-ID: On Sun, Nov 19, 2017 at 4:43 AM, Jose E. Roman wrote: > > > El 19 nov 2017, a las 10:16, Praveen C escribi?: > > > > Dear all > > > > In the main program, if I have a line like > > > > call PetscInitialize('param.in', ierr); CHKERRQ(ierr) > > > > compiling with mpifort gives > > > > main.f95(17): error #6353: A RETURN statement is invalid in the main > program. > > call PetscInitialize('param.in', ierr); if (ierr .ne. 
0) then ; call > PetscErrorF(ierr); return; endif > > ?????????????????????????????????????????????^ > > > > It compiles fine with gfortran. What can I do for the ifort case ? > > > > Thanks > > praveen > > Use CHKERRA(ierr) in the main program and CHKERRQ(ierr) in the subroutines. > > On the other hand, you should not use CHKERRA(ierr) after PetscInitialize Do you mean after PetscFinalize()? You should not call any PETSc functions after this. Thanks, Matt > because it assumes that MPI has been correctly initialized. > > Jose > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sun Nov 19 06:12:41 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sun, 19 Nov 2017 13:12:41 +0100 Subject: [petsc-users] Compilation error with ifort: invalid return statement in main In-Reply-To: References: <2D9D0776-DA57-422E-81D0-BA85D76361EF@gmail.com> <2B209A10-924E-4EDE-94A9-F7D56A82F241@dsic.upv.es> Message-ID: <44614B17-2A5F-4B7A-9006-631E003CF923@dsic.upv.es> > El 19 nov 2017, a las 13:08, Matthew Knepley escribi?: > > On Sun, Nov 19, 2017 at 4:43 AM, Jose E. Roman wrote: > > > El 19 nov 2017, a las 10:16, Praveen C escribi?: > > > > Dear all > > > > In the main program, if I have a line like > > > > call PetscInitialize('param.in', ierr); CHKERRQ(ierr) > > > > compiling with mpifort gives > > > > main.f95(17): error #6353: A RETURN statement is invalid in the main program. > > call PetscInitialize('param.in', ierr); if (ierr .ne. 0) then ; call PetscErrorF(ierr); return; endif > > ?????????????????????????????????????????????^ > > > > It compiles fine with gfortran. What can I do for the ifort case ? > > > > Thanks > > praveen > > Use CHKERRA(ierr) in the main program and CHKERRQ(ierr) in the subroutines. > > On the other hand, you should not use CHKERRA(ierr) after PetscInitialize > > Do you mean after PetscFinalize()? You should not call any PETSc functions after this. I mean to check the error code returned by PetscInitialize(). In C examples it is done this way ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr; instead of with CHKERRQ. > > Thanks, > > Matt > > because it assumes that MPI has been correctly initialized. > > Jose > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From jed at jedbrown.org Sun Nov 19 08:20:14 2017 From: jed at jedbrown.org (Jed Brown) Date: Sun, 19 Nov 2017 07:20:14 -0700 Subject: [petsc-users] Auxiliary fields for multigrid In-Reply-To: References: Message-ID: <87tvxqwdsx.fsf@jedbrown.org> "zakaryah ." writes: > Ok, thanks for the suggestions. > ?I'd like to check that I understand the approach. > > I create one DMDA for the variables to be solved by the SNES. That DMDA > will be coarsened and refined by the options I pass to e.g. FAS. In my > user context struct, I can have a second DMDA for the auxiliary fields. At > each coarsening or refinement of the solution variable DMDA, I add a hook > to simultaneously coarsen or refine the auxiliary fields. 
I suppose I can > store the results of those operations in a third DMDA in the user context > struct, because I'll need to access them from the SNES Function and > Jacobian routines. Does that sound right?? Basically, please look at the example. From bsmith at mcs.anl.gov Sun Nov 19 09:48:55 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sun, 19 Nov 2017 15:48:55 +0000 Subject: [petsc-users] Compilation error with ifort: invalid return statement in main In-Reply-To: <44614B17-2A5F-4B7A-9006-631E003CF923@dsic.upv.es> References: <2D9D0776-DA57-422E-81D0-BA85D76361EF@gmail.com> <2B209A10-924E-4EDE-94A9-F7D56A82F241@dsic.upv.es> <44614B17-2A5F-4B7A-9006-631E003CF923@dsic.upv.es> Message-ID: In some of the Fortran examples we have code like call PetscInitialize(PETSC_NULL_CHARACTER,ierr) if (ierr .ne. 0) then print*,'Unable to initialize PETSc' stop endif Eventually we should update all the examples with this, or something even better. Barry > On Nov 19, 2017, at 6:12 AM, Jose E. Roman wrote: > > >> El 19 nov 2017, a las 13:08, Matthew Knepley escribi?: >> >> On Sun, Nov 19, 2017 at 4:43 AM, Jose E. Roman wrote: >> >>> El 19 nov 2017, a las 10:16, Praveen C escribi?: >>> >>> Dear all >>> >>> In the main program, if I have a line like >>> >>> call PetscInitialize('param.in', ierr); CHKERRQ(ierr) >>> >>> compiling with mpifort gives >>> >>> main.f95(17): error #6353: A RETURN statement is invalid in the main program. >>> call PetscInitialize('param.in', ierr); if (ierr .ne. 0) then ; call PetscErrorF(ierr); return; endif >>> ?????????????????????????????????????????????^ >>> >>> It compiles fine with gfortran. What can I do for the ifort case ? >>> >>> Thanks >>> praveen >> >> Use CHKERRA(ierr) in the main program and CHKERRQ(ierr) in the subroutines. >> >> On the other hand, you should not use CHKERRA(ierr) after PetscInitialize >> >> Do you mean after PetscFinalize()? You should not call any PETSc functions after this. > > I mean to check the error code returned by PetscInitialize(). > In C examples it is done this way > ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr; > instead of with CHKERRQ. > >> >> Thanks, >> >> Matt >> >> because it assumes that MPI has been correctly initialized. >> >> Jose >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > From cpraveen at gmail.com Sun Nov 19 10:47:40 2017 From: cpraveen at gmail.com (Praveen C) Date: Sun, 19 Nov 2017 22:17:40 +0530 Subject: [petsc-users] Compilation error with ifort: invalid return statement in main In-Reply-To: References: <2D9D0776-DA57-422E-81D0-BA85D76361EF@gmail.com> <2B209A10-924E-4EDE-94A9-F7D56A82F241@dsic.upv.es> <44614B17-2A5F-4B7A-9006-631E003CF923@dsic.upv.es> Message-ID: <275B5095-8B79-45DB-A35C-7A0776A6D664@gmail.com> Thank you for this solution. For now, I am using this call PetscInitialize(PETSC_NULL_CHARACTER,ierr) if (ierr .ne. 0) stop 'Unable to initialize PETSc' and CHKERRA in other places of main. Best praveen > On 19-Nov-2017, at 9:18 PM, Smith, Barry F. wrote: > > In some of the Fortran examples we have code like > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > if (ierr .ne. 0) then > print*,'Unable to initialize PETSc' > stop > endif > > Eventually we should update all the examples with this, or something even better. 
> > Barry -------------- next part -------------- An HTML attachment was scrubbed... URL: From yann.jobic at univ-amu.fr Sun Nov 19 15:59:00 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Sun, 19 Nov 2017 22:59:00 +0100 Subject: [petsc-users] DMPlexCreateFromDAG and orientation Message-ID: <31e57015-438f-bcc4-2f4e-adc5559006c2@univ-amu.fr> Hello, I want to create my custom DMPLEX by using DMPlexCreateFromDAG(). I tried with only one cell. I have : ? type: plex Mesh 'DM_0x3dd7df0_0': orientation is missing cap --> base: [0] Max sizes cone: 8 support: 1 [0]: 1 ----> 0 [0]: 2 ----> 0 [0]: 3 ----> 0 [0]: 4 ----> 0 [0]: 5 ----> 0 [0]: 6 ----> 0 [0]: 7 ----> 0 [0]: 8 ----> 0 base <-- cap: [0]: 0 <---- 1 (0) [0]: 0 <---- 5 (0) [0]: 0 <---- 7 (0) [0]: 0 <---- 3 (0) [0]: 0 <---- 2 (0) [0]: 0 <---- 6 (0) [0]: 0 <---- 8 (0) [0]: 0 <---- 4 (0) coordinates with 1 fields ? field 0 with 3 components Process 0: ? (?? 1) dim? 3 offset?? 0 0. 0. 0. ? (?? 2) dim? 3 offset?? 3 0. 0. 0.01 ? (?? 3) dim? 3 offset?? 6 0. 0.01 0. ? (?? 4) dim? 3 offset?? 9 0. 0.01 0.01 ? (?? 5) dim? 3 offset? 12 0.01 0. 0. ? (?? 6) dim? 3 offset? 15 0.01 0. 0.01 ? (?? 7) dim? 3 offset? 18 0.01 0.01 0. ? (?? 8) dim? 3 offset? 21 0.01 0.01 0.01 Which is not correct when i'm trying to see the DM in visit. I tried one from the example ex4.c, which gives : base <-- cap: [0]: 0 <---- 1 (0) [0]: 0 <---- 2 (0) [0]: 0 <---- 3 (0) [0]: 0 <---- 4 (0) [0]: 0 <---- 5 (0) [0]: 0 <---- 6 (0) [0]: 0 <---- 7 (0) [0]: 0 <---- 8 (0) coordinates with 1 fields ? field 0 with 3 components Process 0: ? (?? 1) dim? 3 offset?? 0 -1. -1. -1. ? (?? 2) dim? 3 offset?? 3 -1. 1. -1. ? (?? 3) dim? 3 offset?? 6 1. 1. -1. ? (?? 4) dim? 3 offset?? 9 1. -1. -1. ? (?? 5) dim? 3 offset? 12 -1. -1. 1. ? (?? 6) dim? 3 offset? 15 1. -1. 1. ? (?? 7) dim? 3 offset? 18 1. 1. 1. ? (?? 8) dim? 3 offset? 21 -1. 1. 1. And this one is perfect. I may have a problem with the "cone orientation", but i really don't understand how to set it. I tried the use DMPlexOrient(), but i still have the problem. However, my connectivity looks good. What i am doing wrong ? This is even more complicated when i have two cells sharing a face. Thanks for the? help ! Regards, Yann From knepley at gmail.com Mon Nov 20 07:47:02 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Nov 2017 08:47:02 -0500 Subject: [petsc-users] DMPlexCreateFromDAG and orientation In-Reply-To: <31e57015-438f-bcc4-2f4e-adc5559006c2@univ-amu.fr> References: <31e57015-438f-bcc4-2f4e-adc5559006c2@univ-amu.fr> Message-ID: On Sun, Nov 19, 2017 at 4:59 PM, Yann Jobic wrote: > Hello, > > I want to create my custom DMPLEX by using DMPlexCreateFromDAG(). I tried > with only one cell. I have : > It looks like a hex cell. Here is how I order hex cells: https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/impls/plex/plexinterpolate.c?at=master&fileviewer=file-view-default#plexinterpolate.c-113 The first four vertices are on the bottom, last four on the top. Those faces start with vertices right above each other, and the faces have outward orientation. You can check your structure by looking at the faces specified above, and see if it matches yours, with outward orientation for all. Eventually me, or someone, will draw pictures of these for the manual. 
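To make the ordering concrete, here is a small sketch (not from the thread) of a single hex built with DMPlexCreateFromCellList(), using Yann's 0.01-cube coordinates reordered as described above; the argument list (plain int/double arrays for cells and coordinates) should be checked against the manual page of the PETSc version in use:

    #include <petscdmplex.h>

    int main(int argc, char **argv)
    {
      DM             dm;
      PetscErrorCode ierr;
      /* bottom face first (outward = -z normal), then top face (outward = +z),
         with the first top vertex directly above the first bottom vertex */
      const int    cells[8]   = {0, 2, 6, 4,    /* bottom, z = 0    */
                                 1, 5, 7, 3};   /* top,    z = 0.01 */
      const double coords[24] = {0.,   0.,   0.,      /* 0 */
                                 0.,   0.,   0.01,    /* 1 */
                                 0.,   0.01, 0.,      /* 2 */
                                 0.,   0.01, 0.01,    /* 3 */
                                 0.01, 0.,   0.,      /* 4 */
                                 0.01, 0.,   0.01,    /* 5 */
                                 0.01, 0.01, 0.,      /* 6 */
                                 0.01, 0.01, 0.01};   /* 7 */

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = DMPlexCreateFromCellList(PETSC_COMM_WORLD, 3, 1, 8, 8, PETSC_TRUE,
                                      cells, 3, coords, &dm);CHKERRQ(ierr);
      ierr = DMView(dm, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

If that reading of the convention is right, the same cone written in Yann's 1-based vertex numbering would be 1, 3, 7, 5, 2, 6, 8, 4.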
Thanks, Matt > type: plex > Mesh 'DM_0x3dd7df0_0': > orientation is missing > cap --> base: > [0] Max sizes cone: 8 support: 1 > [0]: 1 ----> 0 > [0]: 2 ----> 0 > [0]: 3 ----> 0 > [0]: 4 ----> 0 > [0]: 5 ----> 0 > [0]: 6 ----> 0 > [0]: 7 ----> 0 > [0]: 8 ----> 0 > base <-- cap: > [0]: 0 <---- 1 (0) > [0]: 0 <---- 5 (0) > [0]: 0 <---- 7 (0) > [0]: 0 <---- 3 (0) > [0]: 0 <---- 2 (0) > [0]: 0 <---- 6 (0) > [0]: 0 <---- 8 (0) > [0]: 0 <---- 4 (0) > coordinates with 1 fields > field 0 with 3 components > Process 0: > ( 1) dim 3 offset 0 0. 0. 0. > ( 2) dim 3 offset 3 0. 0. 0.01 > ( 3) dim 3 offset 6 0. 0.01 0. > ( 4) dim 3 offset 9 0. 0.01 0.01 > ( 5) dim 3 offset 12 0.01 0. 0. > ( 6) dim 3 offset 15 0.01 0. 0.01 > ( 7) dim 3 offset 18 0.01 0.01 0. > ( 8) dim 3 offset 21 0.01 0.01 0.01 > > > Which is not correct when i'm trying to see the DM in visit. I tried one > from the example ex4.c, which gives : > > base <-- cap: > [0]: 0 <---- 1 (0) > [0]: 0 <---- 2 (0) > [0]: 0 <---- 3 (0) > [0]: 0 <---- 4 (0) > [0]: 0 <---- 5 (0) > [0]: 0 <---- 6 (0) > [0]: 0 <---- 7 (0) > [0]: 0 <---- 8 (0) > coordinates with 1 fields > field 0 with 3 components > Process 0: > ( 1) dim 3 offset 0 -1. -1. -1. > ( 2) dim 3 offset 3 -1. 1. -1. > ( 3) dim 3 offset 6 1. 1. -1. > ( 4) dim 3 offset 9 1. -1. -1. > ( 5) dim 3 offset 12 -1. -1. 1. > ( 6) dim 3 offset 15 1. -1. 1. > ( 7) dim 3 offset 18 1. 1. 1. > ( 8) dim 3 offset 21 -1. 1. 1. > > And this one is perfect. > > I may have a problem with the "cone orientation", but i really don't > understand how to set it. I tried the use DMPlexOrient(), but i still have > the problem. > > However, my connectivity looks good. What i am doing wrong ? This is even > more complicated when i have two cells sharing a face. > > Thanks for the help ! > > Regards, > > Yann > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Mon Nov 20 10:10:24 2017 From: rlmackie862 at gmail.com (Randall Mackie) Date: Mon, 20 Nov 2017 08:10:24 -0800 Subject: [petsc-users] Intel MKL Message-ID: <417EEDC7-133E-4C4B-AFF5-9EB7934C9B0D@gmail.com> Dear PETSc team: On upgrading to version 3.8, we have discovered an inconsistency in the python configuration scripts for using Intel MKL for BLAS/LAPACK. It seems that these options were changed between 3.7 and 3.8: Version 3.8: --with-blaslapack-lib=libsunperf.a --with-blas-lib=libblas.a --with-lapack-lib=liblapack.a --with-blaslapack-dir=/soft/com/packages/intel/13/079/mkl Version 3.7: --with-blas-lapack-lib=libsunperf.a --with-blas-lib=libblas.a --with-lapack-lib=liblapack.a --with-blas-lapack-dir=/soft/com/packages/intel/13/079/mkl So a hyphen was inserted with the lib and dir options. However, we found that at least in the mkl_pardiso.py script it still looks for a hyphen, but in other scripts we checked it doesn?t (although we did not do an exhaustive search). 
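For reference, the 3.8-style spelling of the MKL case in the listing above would be something along the lines of (reusing the MKL path from the listing purely as an illustration):

    ./configure --with-blaslapack-dir=/soft/com/packages/intel/13/079/mkl

with --with-blaslapack-lib=<library> replacing the old --with-blas-lapack-lib form when an explicit library file is given.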
We know that you are not connected to Intel, but maybe someone from Intel reads these messages, as their web pages that explain how to install Petsc are wrong and conform to the pre-3.8 options: https://software.intel.com/en-us/articles/enabling-intel-mkl-in-petsc-applications https://software.intel.com/en-us/articles/mkl-blas-lapack-with-petsc Thanks, Randy Mackie -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuentesdt at gmail.com Mon Nov 20 10:46:25 2017 From: fuentesdt at gmail.com (David Fuentes) Date: Mon, 20 Nov 2017 10:46:25 -0600 Subject: [petsc-users] PetscFEIntegrateBdResidual_Basic Message-ID: *Is there a way to pass the boundary set id to the function pointers for the residual evaluation on the boundary ?* *https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/dt/interface/dtfe.c?at=master&fileviewer=file-view-default#dtfe.c-4245 * *I want to pass the boundary condition/constraint ID (ids): PetscErrorCode PetscDSAddBoundary(PetscDS ds, DMBoundaryConditionType type, const char name[], const char labelname[], PetscInt field, PetscInt numcomps, const PetscInt *comps, void (*bcFunc)(void), PetscInt numids, const PetscInt *ids, void *ctx)* *to the functions for the residual evaluation on the boundary.* *For example, I have two side sets in an exodus file. I want to implement Neumann boundary conditions on side set = 2 and Mixed/Cauchy BC on side set = 3. Or similarly use different* *gmsh BC tags for Neumann/Mixed BC.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 20 11:02:02 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Nov 2017 12:02:02 -0500 Subject: [petsc-users] PetscFEIntegrateBdResidual_Basic In-Reply-To: References: Message-ID: On Mon, Nov 20, 2017 at 11:46 AM, David Fuentes wrote: > *Is there a way to pass the boundary set id to the function pointers for > the residual evaluation on the boundary ?* > > *https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/dt/interface/dtfe.c?at=master&fileviewer=file-view-default#dtfe.c-4245 > * > > *I want to pass the boundary condition/constraint ID (ids): > PetscErrorCode PetscDSAddBoundary(PetscDS ds, DMBoundaryConditionType type, > const char name[], const char labelname[], PetscInt field, PetscInt > numcomps, const PetscInt *comps, void (*bcFunc)(void), PetscInt numids, > const PetscInt *ids, void *ctx)* > > *to the functions for the residual evaluation on the boundary.* > > > *For example, I have two side sets in an exodus file. I want to implement > Neumann boundary conditions on side set = 2 and Mixed/Cauchy BC on side set > = 3. Or similarly use different* > > *gmsh BC tags for Neumann/Mixed BC.* > I am not completely against this, but let me respond with my rationale first. What I thought you would do, is call AddBoundary() twice. Once with the Neumann function and value 2, and once with the Cauchy function and value 3. Does that not work in your situation? Also, I am refectoring this right now because a DS object can only take a single boundary integral point function (which is a pain for inhomogeneous Neumann), so I welcome input. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
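A hedged sketch of the double registration Matt describes, written against the PetscDSAddBoundary() signature quoted in this thread. "Face Sets" is the DMPlex label normally populated from exodus/gmsh side sets, neumann_func and cauchy_func are hypothetical user callbacks, and prob/user/ierr are assumed to exist; as the rest of the thread shows, the weak-form boundary integrand itself is still supplied through PetscDSSetBdResidual(), so treat this purely as an illustration of attaching the two side-set ids:

    const PetscInt neumannId = 2, cauchyId = 3;

    ierr = PetscDSAddBoundary(prob, DM_BC_NATURAL, "neumann", "Face Sets", 0,
                              0, NULL, (void (*)(void)) neumann_func,
                              1, &neumannId, user);CHKERRQ(ierr);
    ierr = PetscDSAddBoundary(prob, DM_BC_NATURAL, "cauchy", "Face Sets", 0,
                              0, NULL, (void (*)(void)) cauchy_func,
                              1, &cauchyId, user);CHKERRQ(ierr);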
URL: From balay at mcs.anl.gov Mon Nov 20 11:15:53 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 20 Nov 2017 11:15:53 -0600 Subject: [petsc-users] Intel MKL In-Reply-To: <417EEDC7-133E-4C4B-AFF5-9EB7934C9B0D@gmail.com> References: <417EEDC7-133E-4C4B-AFF5-9EB7934C9B0D@gmail.com> Message-ID: On Mon, 20 Nov 2017, Randall Mackie wrote: > Dear PETSc team: > > On upgrading to version 3.8, we have discovered an inconsistency in the python configuration scripts for using Intel MKL for BLAS/LAPACK. > It seems that these options were changed between 3.7 and 3.8: > > > Version 3.8: > --with-blaslapack-lib=libsunperf.a > --with-blas-lib=libblas.a --with-lapack-lib=liblapack.a > --with-blaslapack-dir=/soft/com/packages/intel/13/079/mkl > > Version 3.7: > --with-blas-lapack-lib=libsunperf.a > --with-blas-lib=libblas.a --with-lapack-lib=liblapack.a > --with-blas-lapack-dir=/soft/com/packages/intel/13/079/mkl > > So a hyphen was inserted with the lib and dir options. Actually a hyphen was removed in 3.8 > > However, we found that at least in the mkl_pardiso.py script it still looks for a hyphen, but in other scripts we checked it doesn?t (although we did not do an exhaustive search). Hm - 'blas-lapack' is listed in an error message. Pushed a fix to 'maint' now. Satish > > We know that you are not connected to Intel, but maybe someone from Intel reads these messages, as their web pages that explain how to install Petsc are wrong and conform to the pre-3.8 options: > > https://software.intel.com/en-us/articles/enabling-intel-mkl-in-petsc-applications > > https://software.intel.com/en-us/articles/mkl-blas-lapack-with-petsc > > > Thanks, Randy Mackie > > From fuentesdt at gmail.com Mon Nov 20 11:20:24 2017 From: fuentesdt at gmail.com (David Fuentes) Date: Mon, 20 Nov 2017 11:20:24 -0600 Subject: [petsc-users] PetscFEIntegrateBdResidual_Basic In-Reply-To: References: Message-ID: Thanks for the quick reply. Indeed it does work like this. I have added a location dependence on the boundary to differentiate the two. However, when my mesh moves then the BC will be applied incorrectly. 
// PetscFEIntegrateBdResidual_Basic DMPlexComputeBdResidual_Internal
static void f0_bd_u(PetscInt dim, PetscInt Nf, PetscInt NfAux,
                    const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[],
                    const PetscScalar u_t[], const PetscScalar u_x[],
                    const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[],
                    const PetscScalar a_t[], const PetscScalar a_x[],
                    PetscReal t, const PetscReal x[], const PetscReal n[],
                    PetscInt numConstants, const PetscScalar constants[], PetscScalar f0[])
{
  PetscInt     d;
  double       radius  = 0.0;
  const double zthresh = (3. - 4.5/2.)*.01; // [m]

  for (d = 0; d < 2; ++d) radius += x[d]*x[d];
  radius = sqrt(radius);
  if ( radius > .001/2. )
    {f0[0] = constants[2] * u[0];}
  else
    {
      f0[0] = -constants[3] ;
    }
}

On Mon, Nov 20, 2017 at 11:02 AM, Matthew Knepley wrote:

> On Mon, Nov 20, 2017 at 11:46 AM, David Fuentes wrote:
>
>> Is there a way to pass the boundary set id to the function pointers for
>> the residual evaluation on the boundary ?
>>
>> https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/dt/interface/dtfe.c?at=master&fileviewer=file-view-default#dtfe.c-4245
>>
>> I want to pass the boundary condition/constraint ID (ids): PetscErrorCode
>> PetscDSAddBoundary(PetscDS ds, DMBoundaryConditionType type, const char name[],
>> const char labelname[], PetscInt field, PetscInt numcomps, const PetscInt *comps,
>> void (*bcFunc)(void), PetscInt numids, const PetscInt *ids, void *ctx)
>> to the functions for the residual evaluation on the boundary.
>>
>> For example, I have two side sets in an exodus file. I want to implement
>> Neumann boundary conditions on side set = 2 and Mixed/Cauchy BC on side set
>> = 3. Or similarly use different gmsh BC tags for Neumann/Mixed BC.
>>
> I am not completely against this, but let me respond with my rationale
> first. What I thought you would do, is call AddBoundary() twice. Once with
> the Neumann function and value 2, and once with the Cauchy function and
> value 3. Does that not work in your situation?
>
> Also, I am refactoring this right now because a DS object can only take a
> single boundary integral point function (which is a pain for inhomogeneous
> Neumann), so I welcome input.
>
> Thanks,
>
> Matt
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
Thanks, Matt > *// PetscFEIntegrateBdResidual_Basic DMPlexComputeBdResidual_Internal* > > *static* *void** f0_bd_u(PetscInt dim, PetscInt Nf, PetscInt NfAux,* > > *const** PetscInt uOff[], **const** PetscInt > uOff_x[], **const** PetscScalar u[], **const** PetscScalar u_t[], **const** > PetscScalar u_x[],* > > *const** PetscInt aOff[], **const** PetscInt > aOff_x[], **const** PetscScalar a[], **const** PetscScalar a_t[], **const** > PetscScalar a_x[],* > > * PetscReal t, **const** PetscReal x[], **const** > PetscReal n[], PetscInt numConstants, **const** PetscScalar constants[], > PetscScalar f0[])* > > *{ * > > * PetscInt d;* > > *double** radius = **0.0**;* > > *const* *double** zthresh = (**3.** - **4.5**/**2.**)***.01**; **// > [m]* > > *for** (d = **0**; d < **2**; ++d) radius += x[d]*x[d];* > > * radius = sqrt(radius); * > > *if** ( radius > **.001**/**2.** ) * > > * {f0[**0**] = constants[**2**] * u[**0**];}* > > *else* > > * {* > > * f0[**0**] = -constants[**3**] ; * > > * }* > > *} * > > > On Mon, Nov 20, 2017 at 11:02 AM, Matthew Knepley > wrote: > >> On Mon, Nov 20, 2017 at 11:46 AM, David Fuentes >> wrote: >> >>> *Is there a way to pass the boundary set id to the function pointers for >>> the residual evaluation on the boundary ?* >>> >>> *https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/dt/interface/dtfe.c?at=master&fileviewer=file-view-default#dtfe.c-4245 >>> * >>> >>> *I want to pass the boundary condition/constraint ID (ids): >>> PetscErrorCode PetscDSAddBoundary(PetscDS ds, DMBoundaryConditionType type, >>> const char name[], const char labelname[], PetscInt field, PetscInt >>> numcomps, const PetscInt *comps, void (*bcFunc)(void), PetscInt numids, >>> const PetscInt *ids, void *ctx)* >>> >>> *to the functions for the residual evaluation on the boundary.* >>> >>> >>> *For example, I have two side sets in an exodus file. I want to >>> implement Neumann boundary conditions on side set = 2 and Mixed/Cauchy BC >>> on side set = 3. Or similarly use different* >>> >>> *gmsh BC tags for Neumann/Mixed BC.* >>> >> I am not completely against this, but let me respond with my rationale >> first. What I thought you would do, is call AddBoundary() twice. Once with >> the >> Neumann function and value 2, and once with the Cauchy function and value >> 3. Does that not work in your situation? >> >> Also, I am refectoring this right now because a DS object can only take a >> single boundary integral point function (which is a pain for inhomogeneous >> Neumann), >> so I welcome input. >> >> Thanks, >> >> Matt >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuentesdt at gmail.com Mon Nov 20 11:33:07 2017 From: fuentesdt at gmail.com (David Fuentes) Date: Mon, 20 Nov 2017 11:33:07 -0600 Subject: [petsc-users] PetscFEIntegrateBdResidual_Basic In-Reply-To: References: Message-ID: Thanks! will give that a try! df On Mon, Nov 20, 2017 at 11:23 AM, Matthew Knepley wrote: > On Mon, Nov 20, 2017 at 12:20 PM, David Fuentes > wrote: > >> Thanks for the quick reply. 
Indeed it does work like this. I have added a >> location dependence on the boundary to differentiate the two. However, when >> my mesh moves then the BC will be applied incorrectly. >> > > I don't think you should have to do that. You can call AddBoundary twice. > Once with a function with the first branch and marker value 3, and the next > with the second branch and marker value 2. Does > that make sense? > > Thanks, > > Matt > >> *// PetscFEIntegrateBdResidual_Basic DMPlexComputeBdResidual_Internal* >> >> *static* *void** f0_bd_u(PetscInt dim, PetscInt Nf, PetscInt NfAux,* >> >> *const** PetscInt uOff[], **const** PetscInt >> uOff_x[], **const** PetscScalar u[], **const** PetscScalar u_t[], * >> *const** PetscScalar u_x[],* >> >> *const** PetscInt aOff[], **const** PetscInt >> aOff_x[], **const** PetscScalar a[], **const** PetscScalar a_t[], * >> *const** PetscScalar a_x[],* >> >> * PetscReal t, **const** PetscReal x[], **const** >> PetscReal n[], PetscInt numConstants, **const** PetscScalar constants[], >> PetscScalar f0[])* >> >> *{ * >> >> * PetscInt d;* >> >> *double** radius = **0.0**;* >> >> *const* *double** zthresh = (**3.** - **4.5**/**2.**)***.01**; **// >> [m]* >> >> *for** (d = **0**; d < **2**; ++d) radius += x[d]*x[d];* >> >> * radius = sqrt(radius); * >> >> *if** ( radius > **.001**/**2.** ) * >> >> * {f0[**0**] = constants[**2**] * u[**0**];}* >> >> *else* >> >> * {* >> >> * f0[**0**] = -constants[**3**] ; * >> >> * }* >> >> *} * >> >> >> On Mon, Nov 20, 2017 at 11:02 AM, Matthew Knepley >> wrote: >> >>> On Mon, Nov 20, 2017 at 11:46 AM, David Fuentes >>> wrote: >>> >>>> *Is there a way to pass the boundary set id to the function pointers >>>> for the residual evaluation on the boundary ?* >>>> >>>> *https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/dt/interface/dtfe.c?at=master&fileviewer=file-view-default#dtfe.c-4245 >>>> * >>>> >>>> *I want to pass the boundary condition/constraint ID (ids): >>>> PetscErrorCode PetscDSAddBoundary(PetscDS ds, DMBoundaryConditionType type, >>>> const char name[], const char labelname[], PetscInt field, PetscInt >>>> numcomps, const PetscInt *comps, void (*bcFunc)(void), PetscInt numids, >>>> const PetscInt *ids, void *ctx)* >>>> >>>> *to the functions for the residual evaluation on the boundary.* >>>> >>>> >>>> *For example, I have two side sets in an exodus file. I want to >>>> implement Neumann boundary conditions on side set = 2 and Mixed/Cauchy BC >>>> on side set = 3. Or similarly use different* >>>> >>>> *gmsh BC tags for Neumann/Mixed BC.* >>>> >>> I am not completely against this, but let me respond with my rationale >>> first. What I thought you would do, is call AddBoundary() twice. Once with >>> the >>> Neumann function and value 2, and once with the Cauchy function and >>> value 3. Does that not work in your situation? >>> >>> Also, I am refectoring this right now because a DS object can only take >>> a single boundary integral point function (which is a pain for >>> inhomogeneous Neumann), >>> so I welcome input. >>> >>> Thanks, >>> >>> Matt >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Nov 20 11:36:23 2017 From: jed at jedbrown.org (Jed Brown) Date: Mon, 20 Nov 2017 10:36:23 -0700 Subject: [petsc-users] Intel MKL In-Reply-To: <417EEDC7-133E-4C4B-AFF5-9EB7934C9B0D@gmail.com> References: <417EEDC7-133E-4C4B-AFF5-9EB7934C9B0D@gmail.com> Message-ID: <87mv3gvomg.fsf@jedbrown.org> Randall Mackie writes: > Dear PETSc team: > > On upgrading to version 3.8, we have discovered an inconsistency in the python configuration scripts for using Intel MKL for BLAS/LAPACK. > It seems that these options were changed between 3.7 and 3.8: > > > Version 3.8: > --with-blaslapack-lib=libsunperf.a > --with-blas-lib=libblas.a --with-lapack-lib=liblapack.a > --with-blaslapack-dir=/soft/com/packages/intel/13/079/mkl > > Version 3.7: > --with-blas-lapack-lib=libsunperf.a > --with-blas-lib=libblas.a --with-lapack-lib=liblapack.a > --with-blas-lapack-dir=/soft/com/packages/intel/13/079/mkl > > So a hyphen was inserted with the lib and dir options. > > However, we found that at least in the mkl_pardiso.py script it still looks for a hyphen, but in other scripts we checked it doesn?t (although we did not do an exhaustive search). Does it actually "look for a hyphen" or is it just the error message: raise RuntimeError('MKL_CPardiso requires Intel MKL. Please rerun configure using --with-blas-lapack-dir=LOCATION_OF_INTEL_MKL') Configure is supposed to translate the old convention to the new convention. Yes, this error message should also be fixed. > We know that you are not connected to Intel, but maybe someone from Intel reads these messages, as their web pages that explain how to install Petsc are wrong and conform to the pre-3.8 options: > > https://software.intel.com/en-us/articles/enabling-intel-mkl-in-petsc-applications > > https://software.intel.com/en-us/articles/mkl-blas-lapack-with-petsc > > > Thanks, Randy Mackie From yann.jobic at univ-amu.fr Mon Nov 20 15:50:39 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Mon, 20 Nov 2017 22:50:39 +0100 Subject: [petsc-users] DMPlexCreateFromDAG and orientation In-Reply-To: References: <31e57015-438f-bcc4-2f4e-adc5559006c2@univ-amu.fr> Message-ID: <25cd26c0-80dc-380f-feda-4a43b9cbc9f7@univ-amu.fr> Hi, Finally i used DMPlexCreateFromCellList, i don't know why i didn't try it first. I indeed create hex cells (so far). And it works ! Thanks for the help, Yann Le 20/11/2017 ? 14:47, Matthew Knepley a ?crit?: > On Sun, Nov 19, 2017 at 4:59 PM, Yann Jobic > wrote: > > Hello, > > I want to create my custom DMPLEX by using DMPlexCreateFromDAG(). > I tried with only one cell. I have : > > > It looks like a hex cell. Here is how I order hex cells: > > https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/impls/plex/plexinterpolate.c?at=master&fileviewer=file-view-default#plexinterpolate.c-113 > > The first four vertices are on the bottom, last four on the top. Those > faces start with vertices right above each other, and > the faces have outward orientation. You can check your structure by > looking at the faces specified above, and see if it > matches yours, with outward orientation for all. > > Eventually me, or someone, will draw pictures of these for the manual. > > ? Thanks, > > ? ? Matt > > ? 
type: plex > Mesh 'DM_0x3dd7df0_0': > orientation is missing > cap --> base: > [0] Max sizes cone: 8 support: 1 > [0]: 1 ----> 0 > [0]: 2 ----> 0 > [0]: 3 ----> 0 > [0]: 4 ----> 0 > [0]: 5 ----> 0 > [0]: 6 ----> 0 > [0]: 7 ----> 0 > [0]: 8 ----> 0 > base <-- cap: > [0]: 0 <---- 1 (0) > [0]: 0 <---- 5 (0) > [0]: 0 <---- 7 (0) > [0]: 0 <---- 3 (0) > [0]: 0 <---- 2 (0) > [0]: 0 <---- 6 (0) > [0]: 0 <---- 8 (0) > [0]: 0 <---- 4 (0) > coordinates with 1 fields > ? field 0 with 3 components > Process 0: > ? (?? 1) dim? 3 offset?? 0 0. 0. 0. > ? (?? 2) dim? 3 offset?? 3 0. 0. 0.01 > ? (?? 3) dim? 3 offset?? 6 0. 0.01 0. > ? (?? 4) dim? 3 offset?? 9 0. 0.01 0.01 > ? (?? 5) dim? 3 offset? 12 0.01 0. 0. > ? (?? 6) dim? 3 offset? 15 0.01 0. 0.01 > ? (?? 7) dim? 3 offset? 18 0.01 0.01 0. > ? (?? 8) dim? 3 offset? 21 0.01 0.01 0.01 > > > Which is not correct when i'm trying to see the DM in visit. I > tried one from the example ex4.c, which gives : > > base <-- cap: > [0]: 0 <---- 1 (0) > [0]: 0 <---- 2 (0) > [0]: 0 <---- 3 (0) > [0]: 0 <---- 4 (0) > [0]: 0 <---- 5 (0) > [0]: 0 <---- 6 (0) > [0]: 0 <---- 7 (0) > [0]: 0 <---- 8 (0) > coordinates with 1 fields > ? field 0 with 3 components > Process 0: > ? (?? 1) dim? 3 offset?? 0 -1. -1. -1. > ? (?? 2) dim? 3 offset?? 3 -1. 1. -1. > ? (?? 3) dim? 3 offset?? 6 1. 1. -1. > ? (?? 4) dim? 3 offset?? 9 1. -1. -1. > ? (?? 5) dim? 3 offset? 12 -1. -1. 1. > ? (?? 6) dim? 3 offset? 15 1. -1. 1. > ? (?? 7) dim? 3 offset? 18 1. 1. 1. > ? (?? 8) dim? 3 offset? 21 -1. 1. 1. > > And this one is perfect. > > I may have a problem with the "cone orientation", but i really > don't understand how to set it. I tried the use DMPlexOrient(), > but i still have the problem. > > However, my connectivity looks good. What i am doing wrong ? This > is even more complicated when i have two cells sharing a face. > > Thanks for the? help ! > > Regards, > > Yann > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuentesdt at gmail.com Mon Nov 20 18:19:46 2017 From: fuentesdt at gmail.com (David Fuentes) Date: Mon, 20 Nov 2017 18:19:46 -0600 Subject: [petsc-users] PetscFEIntegrateBdResidual_Basic In-Reply-To: References: Message-ID: Sorry, If I call this AddBoundary Twice with different markers/ids, the bcFunc seems to be applied to set values for DM_BC_ESSENTIAL not DM_BC_NATURAL? Can I call ierr = PetscDSSetBdResidual(prob, 0, f0_bd_u, f1_bd_zero) twice to set the f0 functions different for each boundary ? /*@C PetscDSAddBoundary - Add a boundary condition to the model Input Parameters: + ds - The PetscDS object . type - The type of condition, e.g. DM_BC_ESSENTIAL/DM_BC_ESSENTIAL_FIELD (Dirichlet), or DM_BC_NATURAL (Neumann) . name - The BC name . labelname - The label defining constrained points . field - The field to constrain . numcomps - The number of constrained field components . comps - An array of constrained component numbers . bcFunc - A pointwise function giving boundary values . numids - The number of DMLabel ids for constrained points . 
ids - An array of ids for constrained points - ctx - An optional user context for bcFunc Options Database Keys: + -bc_ - Overrides the boundary ids - -bc__comp - Overrides the boundary components Level: developer .seealso: PetscDSGetBoundary() @*/ On Mon, Nov 20, 2017 at 11:33 AM, David Fuentes wrote: > Thanks! will give that a try! > df > > On Mon, Nov 20, 2017 at 11:23 AM, Matthew Knepley > wrote: > >> On Mon, Nov 20, 2017 at 12:20 PM, David Fuentes >> wrote: >> >>> Thanks for the quick reply. Indeed it does work like this. I have added >>> a location dependence on the boundary to differentiate the two. However, >>> when my mesh moves then the BC will be applied incorrectly. >>> >> >> I don't think you should have to do that. You can call AddBoundary twice. >> Once with a function with the first branch and marker value 3, and the next >> with the second branch and marker value 2. Does >> that make sense? >> >> Thanks, >> >> Matt >> >>> *// PetscFEIntegrateBdResidual_Basic DMPlexComputeBdResidual_Internal* >>> >>> *static* *void** f0_bd_u(PetscInt dim, PetscInt Nf, PetscInt NfAux,* >>> >>> *const** PetscInt uOff[], **const** PetscInt >>> uOff_x[], **const** PetscScalar u[], **const** PetscScalar u_t[], * >>> *const** PetscScalar u_x[],* >>> >>> *const** PetscInt aOff[], **const** PetscInt >>> aOff_x[], **const** PetscScalar a[], **const** PetscScalar a_t[], * >>> *const** PetscScalar a_x[],* >>> >>> * PetscReal t, **const** PetscReal x[], **const** >>> PetscReal n[], PetscInt numConstants, **const** PetscScalar >>> constants[], PetscScalar f0[])* >>> >>> *{ * >>> >>> * PetscInt d;* >>> >>> *double** radius = **0.0**;* >>> >>> *const* *double** zthresh = (**3.** - **4.5**/**2.**)***.01**; **// >>> [m]* >>> >>> *for** (d = **0**; d < **2**; ++d) radius += x[d]*x[d];* >>> >>> * radius = sqrt(radius); * >>> >>> *if** ( radius > **.001**/**2.** ) * >>> >>> * {f0[**0**] = constants[**2**] * u[**0**];}* >>> >>> *else* >>> >>> * {* >>> >>> * f0[**0**] = -constants[**3**] ; * >>> >>> * }* >>> >>> *} * >>> >>> >>> On Mon, Nov 20, 2017 at 11:02 AM, Matthew Knepley >>> wrote: >>> >>>> On Mon, Nov 20, 2017 at 11:46 AM, David Fuentes >>>> wrote: >>>> >>>>> *Is there a way to pass the boundary set id to the function pointers >>>>> for the residual evaluation on the boundary ?* >>>>> >>>>> *https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/dt/interface/dtfe.c?at=master&fileviewer=file-view-default#dtfe.c-4245 >>>>> * >>>>> >>>>> *I want to pass the boundary condition/constraint ID (ids): >>>>> PetscErrorCode PetscDSAddBoundary(PetscDS ds, DMBoundaryConditionType type, >>>>> const char name[], const char labelname[], PetscInt field, PetscInt >>>>> numcomps, const PetscInt *comps, void (*bcFunc)(void), PetscInt numids, >>>>> const PetscInt *ids, void *ctx)* >>>>> >>>>> *to the functions for the residual evaluation on the boundary.* >>>>> >>>>> >>>>> *For example, I have two side sets in an exodus file. I want to >>>>> implement Neumann boundary conditions on side set = 2 and Mixed/Cauchy BC >>>>> on side set = 3. Or similarly use different* >>>>> >>>>> *gmsh BC tags for Neumann/Mixed BC.* >>>>> >>>> I am not completely against this, but let me respond with my rationale >>>> first. What I thought you would do, is call AddBoundary() twice. Once with >>>> the >>>> Neumann function and value 2, and once with the Cauchy function and >>>> value 3. Does that not work in your situation? 
>>>> >>>> Also, I am refectoring this right now because a DS object can only take >>>> a single boundary integral point function (which is a pain for >>>> inhomogeneous Neumann), >>>> so I welcome input. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 21 05:32:23 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Nov 2017 06:32:23 -0500 Subject: [petsc-users] PetscFEIntegrateBdResidual_Basic In-Reply-To: References: Message-ID: On Mon, Nov 20, 2017 at 7:19 PM, David Fuentes wrote: > Sorry, If I call this AddBoundary Twice with different markers/ids, the > bcFunc seems to be applied to set values for DM_BC_ESSENTIAL not > DM_BC_NATURAL? > > Can I call ierr = PetscDSSetBdResidual(prob, 0, f0_bd_u, f1_bd_zero) > twice to set the f0 functions different for each boundary ? > Okay, this is the problem I was citing before. AddBoundary() does take natural conditions, but it just ignores the inhomogeneous ones, which are properly integrated in the weak form. However, DS only takes a single pointfunction and integrates over the whole boundary. Thus your fix. I will get this fixed, but since it entails some programming, it might take me a while. Sorry about that. Thanks, Matt > > /*@C > PetscDSAddBoundary - Add a boundary condition to the model > > Input Parameters: > + ds - The PetscDS object > . type - The type of condition, e.g. DM_BC_ESSENTIAL/DM_BC_ESSENTIAL_FIELD > (Dirichlet), or DM_BC_NATURAL (Neumann) > . name - The BC name > . labelname - The label defining constrained points > . field - The field to constrain > . numcomps - The number of constrained field components > . comps - An array of constrained component numbers > . bcFunc - A pointwise function giving boundary values > . numids - The number of DMLabel ids for constrained points > . ids - An array of ids for constrained points > - ctx - An optional user context for bcFunc > > Options Database Keys: > + -bc_ - Overrides the boundary ids > - -bc__comp - Overrides the boundary components > > Level: developer > > .seealso: PetscDSGetBoundary() > @*/ > > On Mon, Nov 20, 2017 at 11:33 AM, David Fuentes > wrote: > >> Thanks! will give that a try! >> df >> >> On Mon, Nov 20, 2017 at 11:23 AM, Matthew Knepley >> wrote: >> >>> On Mon, Nov 20, 2017 at 12:20 PM, David Fuentes >>> wrote: >>> >>>> Thanks for the quick reply. Indeed it does work like this. I have added >>>> a location dependence on the boundary to differentiate the two. However, >>>> when my mesh moves then the BC will be applied incorrectly. >>>> >>> >>> I don't think you should have to do that. You can call AddBoundary >>> twice. Once with a function with the first branch and marker value 3, and >>> the next with the second branch and marker value 2. Does >>> that make sense? 
>>> >>> Thanks, >>> >>> Matt >>> >>>> *// PetscFEIntegrateBdResidual_Basic DMPlexComputeBdResidual_Internal* >>>> >>>> >>>> *static* *void** f0_bd_u(PetscInt dim, PetscInt Nf, PetscInt NfAux,* >>>> >>>> *const** PetscInt uOff[], **const** PetscInt >>>> uOff_x[], **const** PetscScalar u[], **const** PetscScalar u_t[], * >>>> *const** PetscScalar u_x[],* >>>> >>>> *const** PetscInt aOff[], **const** PetscInt >>>> aOff_x[], **const** PetscScalar a[], **const** PetscScalar a_t[], * >>>> *const** PetscScalar a_x[],* >>>> >>>> * PetscReal t, **const** PetscReal x[], **const** >>>> PetscReal n[], PetscInt numConstants, **const** PetscScalar >>>> constants[], PetscScalar f0[])* >>>> >>>> *{ * >>>> >>>> * PetscInt d;* >>>> >>>> *double** radius = **0.0**;* >>>> >>>> *const* *double** zthresh = (**3.** - **4.5**/**2.**)***.01**; **// >>>> [m]* >>>> >>>> *for** (d = **0**; d < **2**; ++d) radius += x[d]*x[d];* >>>> >>>> * radius = sqrt(radius); * >>>> >>>> *if** ( radius > **.001**/**2.** ) * >>>> >>>> * {f0[**0**] = constants[**2**] * u[**0**];}* >>>> >>>> *else* >>>> >>>> * {* >>>> >>>> * f0[**0**] = -constants[**3**] ; * >>>> >>>> * }* >>>> >>>> *} * >>>> >>>> >>>> On Mon, Nov 20, 2017 at 11:02 AM, Matthew Knepley >>>> wrote: >>>> >>>>> On Mon, Nov 20, 2017 at 11:46 AM, David Fuentes >>>>> wrote: >>>>> >>>>>> *Is there a way to pass the boundary set id to the function pointers >>>>>> for the residual evaluation on the boundary ?* >>>>>> >>>>>> *https://bitbucket.org/petsc/petsc/src/d89bd21cf2b5366df29efb6006298d2bc22fb509/src/dm/dt/interface/dtfe.c?at=master&fileviewer=file-view-default#dtfe.c-4245 >>>>>> * >>>>>> >>>>>> *I want to pass the boundary condition/constraint ID (ids): >>>>>> PetscErrorCode PetscDSAddBoundary(PetscDS ds, DMBoundaryConditionType type, >>>>>> const char name[], const char labelname[], PetscInt field, PetscInt >>>>>> numcomps, const PetscInt *comps, void (*bcFunc)(void), PetscInt numids, >>>>>> const PetscInt *ids, void *ctx)* >>>>>> >>>>>> *to the functions for the residual evaluation on the boundary.* >>>>>> >>>>>> >>>>>> *For example, I have two side sets in an exodus file. I want to >>>>>> implement Neumann boundary conditions on side set = 2 and Mixed/Cauchy BC >>>>>> on side set = 3. Or similarly use different* >>>>>> >>>>>> *gmsh BC tags for Neumann/Mixed BC.* >>>>>> >>>>> I am not completely against this, but let me respond with my rationale >>>>> first. What I thought you would do, is call AddBoundary() twice. Once with >>>>> the >>>>> Neumann function and value 2, and once with the Cauchy function and >>>>> value 3. Does that not work in your situation? >>>>> >>>>> Also, I am refectoring this right now because a DS object can only >>>>> take a single boundary integral point function (which is a pain for >>>>> inhomogeneous Neumann), >>>>> so I welcome input. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juan at tf.uni-kiel.de Wed Nov 22 03:48:46 2017 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Wed, 22 Nov 2017 10:48:46 +0100 Subject: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues Message-ID: Hello, we prepared a small example which computes the gradient via the continuous adjoint method of a heating problem with a cost functional. We implemented the text book example and tested the gradient via a Taylor Remainder (which works fine). Now we wanted to solve the optimization problem with TAO and checked the gradient vs. the finite difference gradient and run into problems. Testing hand-coded gradient (hc) against finite difference gradient (fd), if the ratio ||fd - hc|| / ||hc|| is 0 (1.e-8), the hand-coded gradient is probably correct. Run with -tao_test_display to show difference between hand-coded and finite difference gradient. ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = (fd'hc)/||fd||||hc|| = 0.99768 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = 0.00973464 max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference ||fd-hc|| = 0.00243363 ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = (fd'hc)/||fd||||hc|| = 0.997609 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = 0.0253185 max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference ||fd-hc|| = 0.00624562 ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = (fd'hc)/||fd||||hc|| = 0.997338 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| = 0.00585376 max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference ||fd-hc|| = 0.00137836 Despite these differences we achieve convergence with our hand coded gradient, but have to use -tao_ls_type unit. $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_gatol 1e-7 -tao_ls_type unit iter = 0, Function value: 0.000316722, Residual: 0.00126285 iter = 1, Function value: 3.82272e-05, Residual: 0.000438094 iter = 2, Function value: 1.26011e-07, Residual: 8.4194e-08 Tao Object: 1 MPI processes type: blmvm Gradient steps: 0 TaoLineSearch Object: 1 MPI processes type: unit Active Set subset type: subvec convergence tolerances: gatol=1e-07, steptol=0., gttol=0. Residual in Function/Gradient:=8.4194e-08 Objective value=1.26011e-07 total number of iterations=2, (max: 2000) total number of function/gradient evaluations=3, (max: 4000) Solution converged: ||g(X)|| <= gatol $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_fd_gradient iter = 0, Function value: 0.000316722, Residual: 4.87343e-06 iter = 1, Function value: 0.000195676, Residual: 3.83011e-06 iter = 2, Function value: 1.26394e-07, Residual: 1.60262e-09 Tao Object: 1 MPI processes type: blmvm Gradient steps: 0 TaoLineSearch Object: 1 MPI processes type: more-thuente Active Set subset type: subvec convergence tolerances: gatol=1e-08, steptol=0., gttol=0. 
Residual in Function/Gradient:=1.60262e-09 Objective value=1.26394e-07 total number of iterations=2, (max: 2000) total number of function/gradient evaluations=3474, (max: 4000) Solution converged: ||g(X)|| <= gatol We think, that the finite difference gradient should be in line with our hand coded gradient for such a simple example. We appreciate any hints on debugging this issue. It is implemented in python (firedrake) and i can provide the code if this is needed. Regards Julian From emconsta at mcs.anl.gov Wed Nov 22 09:27:59 2017 From: emconsta at mcs.anl.gov (Emil Constantinescu) Date: Wed, 22 Nov 2017 09:27:59 -0600 Subject: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues In-Reply-To: References: Message-ID: <618bc0f6-020f-8d76-ee4e-6c65d81ef4c1@mcs.anl.gov> On 11/22/17 3:48 AM, Julian Andrej wrote: > Hello, > > we prepared a small example which computes the gradient via the > continuous adjoint method of a heating problem with a cost functional. > > We implemented the text book example and tested the gradient via a > Taylor Remainder (which works fine). Now we wanted to solve the > optimization problem with TAO and checked the gradient vs. the finite > difference gradient and run into problems. > > Testing hand-coded gradient (hc) against finite difference gradient > (fd), if the ratio ||fd - hc|| / ||hc|| is > 0 (1.e-8), the hand-coded gradient is probably correct. > Run with -tao_test_display to show difference > between hand-coded and finite difference gradient. > ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = > (fd'hc)/||fd||||hc|| = 0.99768 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = > 0.00973464 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference ||fd-hc|| = > 0.00243363 > ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = > (fd'hc)/||fd||||hc|| = 0.997609 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = > 0.0253185 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference ||fd-hc|| = > 0.00624562 > ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = > (fd'hc)/||fd||||hc|| = 0.997338 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| = > 0.00585376 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference ||fd-hc|| = > 0.00137836 > > Despite these differences we achieve convergence with our hand coded > gradient, but have to use -tao_ls_type unit. Both give similar (assume descent) directions, but seem to be scaled differently. It could be a bad scaling by the mass matrix somewhere in the continuous adjoint. This could be seen if you plot them side by side as a quick diagnostic. Emil > $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_gatol > 1e-7 -tao_ls_type unit > iter =?? 0, Function value: 0.000316722,? Residual: 0.00126285 > iter =?? 1, Function value: 3.82272e-05,? Residual: 0.000438094 > iter =?? 2, Function value: 1.26011e-07,? Residual: 8.4194e-08 > Tao Object: 1 MPI processes > ? type: blmvm > ????? Gradient steps: 0 > ? TaoLineSearch Object: 1 MPI processes > ??? type: unit > ? Active Set subset type: subvec > ? convergence tolerances: gatol=1e-07,?? steptol=0.,?? gttol=0. > ? Residual in Function/Gradient:=8.4194e-08 > ? Objective value=1.26011e-07 > ? total number of iterations=2,????????????????????????? (max: 2000) > ? total number of function/gradient evaluations=3,????? (max: 4000) > ? Solution converged:??? 
||g(X)|| <= gatol > > $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor > -tao_fd_gradient > iter =?? 0, Function value: 0.000316722,? Residual: 4.87343e-06 > iter =?? 1, Function value: 0.000195676,? Residual: 3.83011e-06 > iter =?? 2, Function value: 1.26394e-07,? Residual: 1.60262e-09 > Tao Object: 1 MPI processes > ? type: blmvm > ????? Gradient steps: 0 > ? TaoLineSearch Object: 1 MPI processes > ??? type: more-thuente > ? Active Set subset type: subvec > ? convergence tolerances: gatol=1e-08,?? steptol=0.,?? gttol=0. > ? Residual in Function/Gradient:=1.60262e-09 > ? Objective value=1.26394e-07 > ? total number of iterations=2,????????????????????????? (max: 2000) > ? total number of function/gradient evaluations=3474,????? (max: 4000) > ? Solution converged:??? ||g(X)|| <= gatol > > > We think, that the finite difference gradient should be in line with our > hand coded gradient for such a simple example. > > We appreciate any hints on debugging this issue. It is implemented in > python (firedrake) and i can provide the code if this is needed. > > Regards > Julian From bsmith at mcs.anl.gov Wed Nov 22 09:34:55 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 22 Nov 2017 15:34:55 +0000 Subject: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues In-Reply-To: References: Message-ID: > On Nov 22, 2017, at 3:48 AM, Julian Andrej wrote: > > Hello, > > we prepared a small example which computes the gradient via the continuous adjoint method of a heating problem with a cost functional. Julian, The first thing to note is that the continuous adjoint is not exactly the same as the adjoint for the actual algebraic system you are solving. (It is only, as I understand it possibly the same in the limit with very fine mesh and time step). Thus you would not actually expect these to match with PETSc fd. Now as your refine space/time do the numbers get closer to each other? Note the angle cosine is very close to one which means that they are producing the same search direction, just different lengths. How is the convergence of the solver if you use -tao_fd_gradient do you still need unit. > but have to use -tao_ls_type unit. This is slightly odd, because this line search always just takes the full step, the other ones would normally be better since they are more sophisticated in picking the step size. Please run without the -tao_ls_type unit. and send the output Also does your problem have bound constraints? If not use -tao_type lmvm and send the output. Just saw Emil's email, yes there could easily be a scaling issue with your continuous adjoint. Barry > > We implemented the text book example and tested the gradient via a Taylor Remainder (which works fine). Now we wanted to solve the > optimization problem with TAO and checked the gradient vs. the finite difference gradient and run into problems. > > Testing hand-coded gradient (hc) against finite difference gradient (fd), if the ratio ||fd - hc|| / ||hc|| is > 0 (1.e-8), the hand-coded gradient is probably correct. > Run with -tao_test_display to show difference > between hand-coded and finite difference gradient. 
> ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = (fd'hc)/||fd||||hc|| = 0.99768 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = 0.00973464 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference ||fd-hc|| = 0.00243363 > ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = (fd'hc)/||fd||||hc|| = 0.997609 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = 0.0253185 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference ||fd-hc|| = 0.00624562 > ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = (fd'hc)/||fd||||hc|| = 0.997338 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| = 0.00585376 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference ||fd-hc|| = 0.00137836 > > Despite these differences we achieve convergence with our hand coded gradient, but have to use -tao_ls_type unit. > > $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_gatol 1e-7 -tao_ls_type unit > iter = 0, Function value: 0.000316722, Residual: 0.00126285 > iter = 1, Function value: 3.82272e-05, Residual: 0.000438094 > iter = 2, Function value: 1.26011e-07, Residual: 8.4194e-08 > Tao Object: 1 MPI processes > type: blmvm > Gradient steps: 0 > TaoLineSearch Object: 1 MPI processes > type: unit > Active Set subset type: subvec > convergence tolerances: gatol=1e-07, steptol=0., gttol=0. > Residual in Function/Gradient:=8.4194e-08 > Objective value=1.26011e-07 > total number of iterations=2, (max: 2000) > total number of function/gradient evaluations=3, (max: 4000) > Solution converged: ||g(X)|| <= gatol > > $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_fd_gradient > iter = 0, Function value: 0.000316722, Residual: 4.87343e-06 > iter = 1, Function value: 0.000195676, Residual: 3.83011e-06 > iter = 2, Function value: 1.26394e-07, Residual: 1.60262e-09 > Tao Object: 1 MPI processes > type: blmvm > Gradient steps: 0 > TaoLineSearch Object: 1 MPI processes > type: more-thuente > Active Set subset type: subvec > convergence tolerances: gatol=1e-08, steptol=0., gttol=0. > Residual in Function/Gradient:=1.60262e-09 > Objective value=1.26394e-07 > total number of iterations=2, (max: 2000) > total number of function/gradient evaluations=3474, (max: 4000) > Solution converged: ||g(X)|| <= gatol > > > We think, that the finite difference gradient should be in line with our hand coded gradient for such a simple example. > > We appreciate any hints on debugging this issue. It is implemented in python (firedrake) and i can provide the code if this is needed. > > Regards > Julian From stefano.zampini at gmail.com Wed Nov 22 09:56:02 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 22 Nov 2017 18:56:02 +0300 Subject: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues In-Reply-To: References: Message-ID: Just to add on Emil's answer: being the adjoint ode linear, you may either being not properly scaling the initial condition (if your objective is a final value one) or the adjoint forcing (i.e. the gradient wrt the state of the objective function if you have a cost gradient) 2017-11-22 18:34 GMT+03:00 Smith, Barry F. : > > > > On Nov 22, 2017, at 3:48 AM, Julian Andrej wrote: > > > > Hello, > > > > we prepared a small example which computes the gradient via the > continuous adjoint method of a heating problem with a cost functional. 
> > Julian, > > The first thing to note is that the continuous adjoint is not exactly > the same as the adjoint for the actual algebraic system you are solving. > (It is only, as I understand it possibly the same in the limit with very > fine mesh and time step). Thus you would not actually expect these to match > with PETSc fd. Now as your refine space/time do the numbers get closer to > each other? > > Note the angle cosine is very close to one which means that they are > producing the same search direction, just different lengths. > > How is the convergence of the solver if you use -tao_fd_gradient do you > still need unit. > > > but have to use -tao_ls_type unit. > > This is slightly odd, because this line search always just takes the > full step, the other ones would normally be better since they are more > sophisticated in picking the step size. Please run without the -tao_ls_type > unit. and send the output > > Also does your problem have bound constraints? If not use -tao_type > lmvm and send the output. > > Just saw Emil's email, yes there could easily be a scaling issue with > your continuous adjoint. > > Barry > > > > > > > We implemented the text book example and tested the gradient via a > Taylor Remainder (which works fine). Now we wanted to solve the > > optimization problem with TAO and checked the gradient vs. the finite > difference gradient and run into problems. > > > > Testing hand-coded gradient (hc) against finite difference gradient > (fd), if the ratio ||fd - hc|| / ||hc|| is > > 0 (1.e-8), the hand-coded gradient is probably correct. > > Run with -tao_test_display to show difference > > between hand-coded and finite difference gradient. > > ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = > (fd'hc)/||fd||||hc|| = 0.99768 > > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = > 0.00973464 > > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference ||fd-hc|| = > 0.00243363 > > ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = > (fd'hc)/||fd||||hc|| = 0.997609 > > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = > 0.0253185 > > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference ||fd-hc|| = > 0.00624562 > > ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = > (fd'hc)/||fd||||hc|| = 0.997338 > > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| = > 0.00585376 > > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference ||fd-hc|| = > 0.00137836 > > > > Despite these differences we achieve convergence with our hand coded > gradient, but have to use -tao_ls_type unit. > > > > $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_gatol > 1e-7 -tao_ls_type unit > > iter = 0, Function value: 0.000316722, Residual: 0.00126285 > > iter = 1, Function value: 3.82272e-05, Residual: 0.000438094 > > iter = 2, Function value: 1.26011e-07, Residual: 8.4194e-08 > > Tao Object: 1 MPI processes > > type: blmvm > > Gradient steps: 0 > > TaoLineSearch Object: 1 MPI processes > > type: unit > > Active Set subset type: subvec > > convergence tolerances: gatol=1e-07, steptol=0., gttol=0. 
> > Residual in Function/Gradient:=8.4194e-08 > > Objective value=1.26011e-07 > > total number of iterations=2, (max: 2000) > > total number of function/gradient evaluations=3, (max: 4000) > > Solution converged: ||g(X)|| <= gatol > > > > $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor > -tao_fd_gradient > > iter = 0, Function value: 0.000316722, Residual: 4.87343e-06 > > iter = 1, Function value: 0.000195676, Residual: 3.83011e-06 > > iter = 2, Function value: 1.26394e-07, Residual: 1.60262e-09 > > Tao Object: 1 MPI processes > > type: blmvm > > Gradient steps: 0 > > TaoLineSearch Object: 1 MPI processes > > type: more-thuente > > Active Set subset type: subvec > > convergence tolerances: gatol=1e-08, steptol=0., gttol=0. > > Residual in Function/Gradient:=1.60262e-09 > > Objective value=1.26394e-07 > > total number of iterations=2, (max: 2000) > > total number of function/gradient evaluations=3474, (max: 4000) > > Solution converged: ||g(X)|| <= gatol > > > > > > We think, that the finite difference gradient should be in line with our > hand coded gradient for such a simple example. > > > > We appreciate any hints on debugging this issue. It is implemented in > python (firedrake) and i can provide the code if this is needed. > > > > Regards > > Julian > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Wed Nov 22 10:13:34 2017 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Wed, 22 Nov 2017 11:13:34 -0500 Subject: [petsc-users] How can I retrieve the IS for all Missing Diagonal entries? Message-ID: <57999e41-6c3f-ae21-6f88-ea7348cf505c@giref.ulaval.ca> Hi, I have 2 questions: First, I am looking for a function that is almost like MatMissingDiagonal, but that would return me *all* missing diagonal entries. Does it exists? If not, is there another way of doing this? Second: after searching through Petsc list, I found this that upset me a bit: https://www.mail-archive.com/petsc-users at mcs.anl.gov/msg22867.html so maybe I should modify our code to be fully compliant with this? I have some examples (MUMPS) that are working without diagonal entries but I didn't tried other PCs or KSPs... Thanks, Eric From hongzhang at anl.gov Wed Nov 22 10:20:48 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Wed, 22 Nov 2017 16:20:48 +0000 Subject: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues In-Reply-To: References: Message-ID: <96938CAC-1CC5-47EE-93CE-B3DC07F93219@anl.gov> Hi Julian, If I remember correctly, you have a code that worked fine with discrete adjoint (TSAdjoint). Was it for the same example? If so, how are the differences in the validation output between continuous adjoint and discrete adjoint? Hong (Mr.) > On Nov 22, 2017, at 3:48 AM, Julian Andrej wrote: > > Hello, > > we prepared a small example which computes the gradient via the continuous adjoint method of a heating problem with a cost functional. > > We implemented the text book example and tested the gradient via a Taylor Remainder (which works fine). Now we wanted to solve the > optimization problem with TAO and checked the gradient vs. the finite difference gradient and run into problems. > > Testing hand-coded gradient (hc) against finite difference gradient (fd), if the ratio ||fd - hc|| / ||hc|| is > 0 (1.e-8), the hand-coded gradient is probably correct. > Run with -tao_test_display to show difference > between hand-coded and finite difference gradient. 
> ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = (fd'hc)/||fd||||hc|| = 0.99768 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = 0.00973464 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference ||fd-hc|| = 0.00243363 > ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = (fd'hc)/||fd||||hc|| = 0.997609 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = 0.0253185 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference ||fd-hc|| = 0.00624562 > ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = (fd'hc)/||fd||||hc|| = 0.997338 > 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| = 0.00585376 > max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference ||fd-hc|| = 0.00137836 > > Despite these differences we achieve convergence with our hand coded gradient, but have to use -tao_ls_type unit. > > $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_gatol 1e-7 -tao_ls_type unit > iter = 0, Function value: 0.000316722, Residual: 0.00126285 > iter = 1, Function value: 3.82272e-05, Residual: 0.000438094 > iter = 2, Function value: 1.26011e-07, Residual: 8.4194e-08 > Tao Object: 1 MPI processes > type: blmvm > Gradient steps: 0 > TaoLineSearch Object: 1 MPI processes > type: unit > Active Set subset type: subvec > convergence tolerances: gatol=1e-07, steptol=0., gttol=0. > Residual in Function/Gradient:=8.4194e-08 > Objective value=1.26011e-07 > total number of iterations=2, (max: 2000) > total number of function/gradient evaluations=3, (max: 4000) > Solution converged: ||g(X)|| <= gatol > > $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_fd_gradient > iter = 0, Function value: 0.000316722, Residual: 4.87343e-06 > iter = 1, Function value: 0.000195676, Residual: 3.83011e-06 > iter = 2, Function value: 1.26394e-07, Residual: 1.60262e-09 > Tao Object: 1 MPI processes > type: blmvm > Gradient steps: 0 > TaoLineSearch Object: 1 MPI processes > type: more-thuente > Active Set subset type: subvec > convergence tolerances: gatol=1e-08, steptol=0., gttol=0. > Residual in Function/Gradient:=1.60262e-09 > Objective value=1.26394e-07 > total number of iterations=2, (max: 2000) > total number of function/gradient evaluations=3474, (max: 4000) > Solution converged: ||g(X)|| <= gatol > > > We think, that the finite difference gradient should be in line with our hand coded gradient for such a simple example. > > We appreciate any hints on debugging this issue. It is implemented in python (firedrake) and i can provide the code if this is needed. > > Regards > Julian From knepley at gmail.com Wed Nov 22 10:26:08 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Nov 2017 11:26:08 -0500 Subject: [petsc-users] How can I retrieve the IS for all Missing Diagonal entries? In-Reply-To: <57999e41-6c3f-ae21-6f88-ea7348cf505c@giref.ulaval.ca> References: <57999e41-6c3f-ae21-6f88-ea7348cf505c@giref.ulaval.ca> Message-ID: On Wed, Nov 22, 2017 at 11:13 AM, Eric Chamberland < Eric.Chamberland at giref.ulaval.ca> wrote: > Hi, > > I have 2 questions: > > First, I am looking for a function that is almost like MatMissingDiagonal, > but that would return me *all* missing diagonal entries. > > Does it exists? > No > If not, is there another way of doing this? > Not a nice way, unfortunately. It is fairly dependent on the implementation. You could call GetRow() for every row and check. 
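A minimal sketch of that row-by-row scan, assuming an assembled AIJ matrix; the helper name FindAllMissingDiagonals and the choice to return the offending rows as an IS are illustrative, not an existing PETSc API:

PetscErrorCode FindAllMissingDiagonals(Mat A, IS *missing)
{
  PetscInt        rstart, rend, i, j, ncols, nmiss = 0;
  PetscInt       *rows;
  const PetscInt *cols;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  ierr = PetscMalloc1(rend - rstart, &rows);CHKERRQ(ierr);
  for (i = rstart; i < rend; ++i) {
    PetscBool found = PETSC_FALSE;

    /* MatGetRow() returns global column indices, so the diagonal of row i is column i */
    ierr = MatGetRow(A, i, &ncols, &cols, NULL);CHKERRQ(ierr);
    for (j = 0; j < ncols; ++j) if (cols[j] == i) {found = PETSC_TRUE; break;}
    ierr = MatRestoreRow(A, i, &ncols, &cols, NULL);CHKERRQ(ierr);
    if (!found) rows[nmiss++] = i;  /* row i has no diagonal entry in its nonzero structure */
  }
  ierr = ISCreateGeneral(PetscObjectComm((PetscObject)A), nmiss, rows, PETSC_COPY_VALUES, missing);CHKERRQ(ierr);
  ierr = PetscFree(rows);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Like MatMissingDiagonal(), this only inspects the nonzero structure (an explicitly stored zero on the diagonal counts as present), and calling MatGetRow() on every local row is not cheap for very large matrices.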
> Second: after searching through Petsc list, I found this that upset me a > bit: > > https://www.mail-archive.com/petsc-users at mcs.anl.gov/msg22867.html > > so maybe I should modify our code to be fully compliant with this? I have > some examples (MUMPS) that are working without diagonal entries but I > didn't tried other PCs or KSPs... > We use the diagonal frequently, for instance in the factorization PCs. I am guessing we put in the diagonal when converting to the MUMPS format. Thanks, Matt > Thanks, > > Eric > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 22 10:26:10 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 22 Nov 2017 16:26:10 +0000 Subject: [petsc-users] How can I retrieve the IS for all Missing Diagonal entries? In-Reply-To: <57999e41-6c3f-ae21-6f88-ea7348cf505c@giref.ulaval.ca> References: <57999e41-6c3f-ae21-6f88-ea7348cf505c@giref.ulaval.ca> Message-ID: > On Nov 22, 2017, at 10:13 AM, Eric Chamberland wrote: > > Hi, > > I have 2 questions: > > First, I am looking for a function that is almost like MatMissingDiagonal, but that would return me *all* missing diagonal entries. > > Does it exists? I'm afraid not. > > If not, is there another way of doing this? You would have to copy the appropriate MatMissingDiagonal_SeqAIJ() code and modify for your needs. Barry > > Second: after searching through Petsc list, I found this that upset me a bit: > > https://www.mail-archive.com/petsc-users at mcs.anl.gov/msg22867.html > > so maybe I should modify our code to be fully compliant with this? I have some examples (MUMPS) that are working without diagonal entries but I didn't tried other PCs or KSPs... > > Thanks, > > Eric From yann.jobic at univ-amu.fr Wed Nov 22 11:39:37 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Wed, 22 Nov 2017 18:39:37 +0100 Subject: [petsc-users] 2 Dirichlet conditions for one Element in PetscFE Message-ID: Hello, I've found a strange behavior when looking into a bug for the pressure convergence of a simple Navier-Stokes problem using PetscFE. I followed many examples for labeling boundary faces. I first use DMPlexMarkBoundaryFaces, (label=1 to the faces). I find those faces using DMGetStratumIS and searching 1 as it is the value of the marked boundary faces. Finally i use DMPlexLabelComplete over the new label. I then use : ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "in", "Faces", 0, Ncomp, components, (void (*)(void)) uIn, NWest, west, NULL);CHKERRQ(ierr); in order to impose a dirichlet condition for the faces labeled by the correct value (west=1, south=3,...). However, the function "uIn()" is called in all the Elements containing the boundary faces, and thus impose the values at nodes that are not in the labeled faces. Is it a normal behavior ? I then have to test the position of the node calling uIn, in order to impose the good value. I have this problem for a Poiseuille flow, where at 2 corner Elements i have a zero velocity dirichlet condition (wall) and a In flow velocity one. The pressure is then very high at the corner nodes of those 2 Elements. Do you think my pressure problem comes from there ? 
(The velocity field is correct) Many thanks, Regards, Yann PS : i'm using those runtime options : -vel_petscspace_order 2 -pres_petscspace_order 1 \ -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type full? \ -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol 1.0e-10 -fieldsplit_pressure_pc_type jacobi --- L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e par le logiciel antivirus Avast. https://www.avast.com/antivirus From knepley at gmail.com Wed Nov 22 11:51:37 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Nov 2017 12:51:37 -0500 Subject: [petsc-users] 2 Dirichlet conditions for one Element in PetscFE In-Reply-To: References: Message-ID: On Wed, Nov 22, 2017 at 12:39 PM, Yann Jobic wrote: > Hello, > > I've found a strange behavior when looking into a bug for the pressure > convergence of a simple Navier-Stokes problem using PetscFE. > > I followed many examples for labeling boundary faces. I first use > DMPlexMarkBoundaryFaces, (label=1 to the faces). > I find those faces using DMGetStratumIS and searching 1 as it is the value > of the marked boundary faces. > Finally i use DMPlexLabelComplete over the new label. > I then use : > ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "in", "Faces", 0, > Ncomp, components, (void (*)(void)) uIn, NWest, west, NULL);CHKERRQ(ierr); > in order to impose a dirichlet condition for the faces labeled by the > correct value (west=1, south=3,...). > > However, the function "uIn()" is called in all the Elements containing the > boundary faces, and thus impose the values at nodes that are not in the > labeled faces. > Is it a normal behavior ? I then have to test the position of the node > calling uIn, in order to impose the good value. > I have this problem for a Poiseuille flow, where at 2 corner Elements i > have a zero velocity dirichlet condition (wall) and a In flow velocity one. > I believe I have fixed this in knepley/fix-plex-bc-multiple which should be merged soon. Do you know how to merge that branch and try? Thanks, Matt > The pressure is then very high at the corner nodes of those 2 Elements. > Do you think my pressure problem comes from there ? (The velocity field is > correct) > > Many thanks, > > Regards, > > Yann > > PS : i'm using those runtime options : > -vel_petscspace_order 2 -pres_petscspace_order 1 \ > -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur > -pc_fieldsplit_schur_fact_type full \ > -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol 1.0e-10 > -fieldsplit_pressure_pc_type jacobi > > > --- > L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e par le > logiciel antivirus Avast. > https://www.avast.com/antivirus > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Nov 22 11:53:47 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Nov 2017 12:53:47 -0500 Subject: [petsc-users] 2 Dirichlet conditions for one Element in PetscFE In-Reply-To: References: Message-ID: On Wed, Nov 22, 2017 at 12:39 PM, Yann Jobic wrote: > Hello, > > I've found a strange behavior when looking into a bug for the pressure > convergence of a simple Navier-Stokes problem using PetscFE. > > I followed many examples for labeling boundary faces. 
I first use > DMPlexMarkBoundaryFaces, (label=1 to the faces). > I find those faces using DMGetStratumIS and searching 1 as it is the value > of the marked boundary faces. > Finally i use DMPlexLabelComplete over the new label. > I then use : > ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "in", "Faces", 0, > Ncomp, components, (void (*)(void)) uIn, NWest, west, NULL);CHKERRQ(ierr); > in order to impose a dirichlet condition for the faces labeled by the > correct value (west=1, south=3,...). > > However, the function "uIn()" is called in all the Elements containing the > boundary faces, and thus impose the values at nodes that are not in the > labeled faces. > Is it a normal behavior ? I then have to test the position of the node > calling uIn, in order to impose the good value. > I have this problem for a Poiseuille flow, where at 2 corner Elements i > have a zero velocity dirichlet condition (wall) and a In flow velocity one. > I believe I have fixed this in knepley/fix-plex-bc-multiple which should be merged soon. Do you know how to merge that branch and try? Thanks, Matt > The pressure is then very high at the corner nodes of those 2 Elements. > Do you think my pressure problem comes from there ? (The velocity field is > correct) > > Many thanks, > > Regards, > > Yann > > PS : i'm using those runtime options : > -vel_petscspace_order 2 -pres_petscspace_order 1 \ > -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur > -pc_fieldsplit_schur_fact_type full \ > -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol 1.0e-10 > -fieldsplit_pressure_pc_type jacobi > > > --- > L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e par le > logiciel antivirus Avast. > https://www.avast.com/antivirus > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juan at tf.uni-kiel.de Thu Nov 23 02:29:49 2017 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Thu, 23 Nov 2017 09:29:49 +0100 Subject: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues In-Reply-To: <618bc0f6-020f-8d76-ee4e-6c65d81ef4c1@mcs.anl.gov> References: <618bc0f6-020f-8d76-ee4e-6c65d81ef4c1@mcs.anl.gov> Message-ID: <9a79ebed640b436bca86712451350b3d@tf.uni-kiel.de> On 2017-11-22 16:27, Emil Constantinescu wrote: > On 11/22/17 3:48 AM, Julian Andrej wrote: >> Hello, >> >> we prepared a small example which computes the gradient via the >> continuous adjoint method of a heating problem with a cost functional. >> >> We implemented the text book example and tested the gradient via a >> Taylor Remainder (which works fine). Now we wanted to solve the >> optimization problem with TAO and checked the gradient vs. the finite >> difference gradient and run into problems. >> >> Testing hand-coded gradient (hc) against finite difference gradient >> (fd), if the ratio ||fd - hc|| / ||hc|| is >> 0 (1.e-8), the hand-coded gradient is probably correct. >> Run with -tao_test_display to show difference >> between hand-coded and finite difference gradient. 
>> ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = >> (fd'hc)/||fd||||hc|| = 0.99768 >> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = >> 0.00973464 >> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference ||fd-hc|| >> = 0.00243363 >> ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = >> (fd'hc)/||fd||||hc|| = 0.997609 >> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = >> 0.0253185 >> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference ||fd-hc|| >> = 0.00624562 >> ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = >> (fd'hc)/||fd||||hc|| = 0.997338 >> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| = >> 0.00585376 >> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference ||fd-hc|| >> = 0.00137836 >> >> Despite these differences we achieve convergence with our hand coded >> gradient, but have to use -tao_ls_type unit. > > Both give similar (assume descent) directions, but seem to be scaled > differently. It could be a bad scaling by the mass matrix somewhere in > the continuous adjoint. This could be seen if you plot them side by > side as a quick diagnostic. > I visualized and attached the two gradients. The CADJ is hand coded and the DADJ is from pyadjoint which is the same as the finite difference gradient from TAO. If the attachement gets lost in the mailing list,, here is a direct link [1] [1] https://cloud.tf.uni-kiel.de/index.php/s/nmiNOoI213dx1L1 > Emil > >> $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_gatol >> 1e-7 -tao_ls_type unit >> iter =?? 0, Function value: 0.000316722,? Residual: 0.00126285 >> iter =?? 1, Function value: 3.82272e-05,? Residual: 0.000438094 >> iter =?? 2, Function value: 1.26011e-07,? Residual: 8.4194e-08 >> Tao Object: 1 MPI processes >> ? type: blmvm >> ????? Gradient steps: 0 >> ? TaoLineSearch Object: 1 MPI processes >> ??? type: unit >> ? Active Set subset type: subvec >> ? convergence tolerances: gatol=1e-07,?? steptol=0.,?? gttol=0. >> ? Residual in Function/Gradient:=8.4194e-08 >> ? Objective value=1.26011e-07 >> ? total number of iterations=2,????????????????????????? (max: 2000) >> ? total number of function/gradient evaluations=3,????? (max: 4000) >> ? Solution converged:??? ||g(X)|| <= gatol >> >> $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor >> -tao_fd_gradient >> iter =?? 0, Function value: 0.000316722,? Residual: 4.87343e-06 >> iter =?? 1, Function value: 0.000195676,? Residual: 3.83011e-06 >> iter =?? 2, Function value: 1.26394e-07,? Residual: 1.60262e-09 >> Tao Object: 1 MPI processes >> ? type: blmvm >> ????? Gradient steps: 0 >> ? TaoLineSearch Object: 1 MPI processes >> ??? type: more-thuente >> ? Active Set subset type: subvec >> ? convergence tolerances: gatol=1e-08,?? steptol=0.,?? gttol=0. >> ? Residual in Function/Gradient:=1.60262e-09 >> ? Objective value=1.26394e-07 >> ? total number of iterations=2,????????????????????????? (max: 2000) >> ? total number of function/gradient evaluations=3474,????? (max: >> 4000) >> ? Solution converged:??? ||g(X)|| <= gatol >> >> >> We think, that the finite difference gradient should be in line with >> our hand coded gradient for such a simple example. >> >> We appreciate any hints on debugging this issue. It is implemented in >> python (firedrake) and i can provide the code if this is needed. >> >> Regards >> Julian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gradients.png Type: image/png Size: 121947 bytes Desc: not available URL: From yann.jobic at univ-amu.fr Thu Nov 23 02:39:12 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Thu, 23 Nov 2017 09:39:12 +0100 Subject: [petsc-users] 2 Dirichlet conditions for one Element in PetscFE In-Reply-To: References: Message-ID: Hello, I checked out? the branch knepley/fix-plex-bc-multiple, but i now have a strange problem. I splited my labels as in ex69.c of snes. It may be the right way to do it. In petsc 3.8.2, i have the same behavior as before, the element containing the face is called by PetscDSAddBoundary. In the git version, PetscDSAddBoundary does not call my boundary function at all. The call : ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wallL", "markerLeft",?? 0, Ncomp, components, (void (*)(void)) uIn,? 1, &id, ctx);CHKERRQ(ierr); I checked that my labels are correct : ? markerLeft: 1 strata with value/size (1 (11)) ? markerTop: 1 strata with value/size (1 (11)) ? markerRight: 1 strata with value/size (1 (11)) ? markerBottom: 1 strata with value/size (1 (11)) What am i doing wrong ? Thanks, Regards, Yann Le 22/11/2017 ? 18:51, Matthew Knepley a ?crit?: > On Wed, Nov 22, 2017 at 12:39 PM, Yann Jobic > wrote: > > Hello, > > I've found a strange behavior when looking into a bug for the > pressure convergence of a simple Navier-Stokes problem using PetscFE. > > I followed many examples for labeling boundary faces. I first use > DMPlexMarkBoundaryFaces, (label=1 to the faces). > I find those faces using DMGetStratumIS and searching 1 as it is > the value of the marked boundary faces. > Finally i use DMPlexLabelComplete over the new label. > I then use : > ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "in", "Faces", > 0, Ncomp, components, (void (*)(void)) uIn, NWest, west, > NULL);CHKERRQ(ierr); > in order to impose a dirichlet condition for the faces labeled by > the correct value (west=1, south=3,...). > > However, the function "uIn()" is called in all the Elements > containing the boundary faces, and thus impose the values at nodes > that are not in the labeled faces. > Is it a normal behavior ? I then have to test the position of the > node calling uIn, in order to impose the good value. > I have this problem for a Poiseuille flow, where at 2 corner > Elements i have a zero velocity dirichlet condition (wall) and a > In flow velocity one. > > > I believe I have fixed this in knepley/fix-plex-bc-multiple which > should be merged soon. Do you know how to merge that branch and try? > > ? Thanks, > > ? ? ?Matt > > The pressure is then very high at the corner nodes of those 2 > Elements. > Do you think my pressure problem comes from there ? (The velocity > field is correct) > > Many thanks, > > Regards, > > Yann > > PS : i'm using those runtime options : > -vel_petscspace_order 2 -pres_petscspace_order 1 \ > -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur > -pc_fieldsplit_schur_fact_type full? \ > -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol > 1.0e-10 -fieldsplit_pressure_pc_type jacobi > > > --- > L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e > par le logiciel antivirus Avast. > https://www.avast.com/antivirus > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
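For reference, the ex69-style splitting described here (and implemented in MarkBoundaryFaces in the source attached further below) looks roughly like the following sketch; it assumes the "Faces" label already carries the side values 1 (left), 2 (right), 3 (bottom), 4 (top):

const char     *names[4] = {"markerBottom", "markerRight", "markerTop", "markerLeft"};
PetscInt        ids[4]   = {3, 2, 4, 1};
PetscInt        f;
PetscErrorCode  ierr;

for (f = 0; f < 4; ++f) {
  DMLabel label;
  IS      is;

  ierr = DMGetStratumIS(dm, "Faces", ids[f], &is);CHKERRQ(ierr);
  if (!is) continue;  /* this process owns no faces on that side */
  ierr = DMCreateLabel(dm, names[f]);CHKERRQ(ierr);
  ierr = DMGetLabel(dm, names[f], &label);CHKERRQ(ierr);
  ierr = DMLabelInsertIS(label, is, 1);CHKERRQ(ierr);  /* every face point goes into stratum 1 */
  ierr = ISDestroy(&is);CHKERRQ(ierr);
}

Each PetscDSAddBoundary() call then refers to one of these single-value labels with id = 1, as in the call quoted above.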
URL: From juan at tf.uni-kiel.de Thu Nov 23 04:16:35 2017 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Thu, 23 Nov 2017 11:16:35 +0100 Subject: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues In-Reply-To: <9a79ebed640b436bca86712451350b3d@tf.uni-kiel.de> References: <618bc0f6-020f-8d76-ee4e-6c65d81ef4c1@mcs.anl.gov> <9a79ebed640b436bca86712451350b3d@tf.uni-kiel.de> Message-ID: It was indeed a mass scaling issue. We have to project the CADJ derived gradient to the corresponding FE space again. Testing hand-coded gradient (hc) against finite difference gradient (fd), if the ratio ||fd - hc|| / ||hc|| is 0 (1.e-8), the hand-coded gradient is probably correct. Run with -tao_test_display to show difference between hand-coded and finite difference gradient. ||fd|| 0.000150841, ||hc|| = 0.000150841, angle cosine = (fd'hc)/||fd||||hc|| = 1. 2-norm ||fd-hc||/max(||hc||,||fd||) = 4.48554e-06, difference ||fd-hc|| = 6.76604e-10 max-norm ||fd-hc||/max(||hc||,||fd||) = 4.99792e-06, difference ||fd-hc|| = 1.88044e-10 ||fd|| 0.000386312, ||hc|| = 0.000386312, angle cosine = (fd'hc)/||fd||||hc|| = 1. 2-norm ||fd-hc||/max(||hc||,||fd||) = 1.14682e-05, difference ||fd-hc|| = 4.4303e-09 max-norm ||fd-hc||/max(||hc||,||fd||) = 1.56645e-05, difference ||fd-hc|| = 1.49275e-09 ||fd|| 8.46797e-05, ||hc|| = 8.46797e-05, angle cosine = (fd'hc)/||fd||||hc|| = 1. 2-norm ||fd-hc||/max(||hc||,||fd||) = 2.63488e-06, difference ||fd-hc|| = 2.2312e-10 max-norm ||fd-hc||/max(||hc||,||fd||) = 2.7873e-06, difference ||fd-hc|| = 5.58718e-11 Thank you all for the quick responses and input again! On 2017-11-23 09:29, Julian Andrej wrote: > On 2017-11-22 16:27, Emil Constantinescu wrote: >> On 11/22/17 3:48 AM, Julian Andrej wrote: >>> Hello, >>> >>> we prepared a small example which computes the gradient via the >>> continuous adjoint method of a heating problem with a cost >>> functional. >>> >>> We implemented the text book example and tested the gradient via a >>> Taylor Remainder (which works fine). Now we wanted to solve the >>> optimization problem with TAO and checked the gradient vs. the finite >>> difference gradient and run into problems. >>> >>> Testing hand-coded gradient (hc) against finite difference gradient >>> (fd), if the ratio ||fd - hc|| / ||hc|| is >>> 0 (1.e-8), the hand-coded gradient is probably correct. >>> Run with -tao_test_display to show difference >>> between hand-coded and finite difference gradient. >>> ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = >>> (fd'hc)/||fd||||hc|| = 0.99768 >>> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| >>> = 0.00973464 >>> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference >>> ||fd-hc|| = 0.00243363 >>> ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = >>> (fd'hc)/||fd||||hc|| = 0.997609 >>> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| >>> = 0.0253185 >>> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference >>> ||fd-hc|| = 0.00624562 >>> ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = >>> (fd'hc)/||fd||||hc|| = 0.997338 >>> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| >>> = 0.00585376 >>> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference >>> ||fd-hc|| = 0.00137836 >>> >>> Despite these differences we achieve convergence with our hand coded >>> gradient, but have to use -tao_ls_type unit. >> >> Both give similar (assume descent) directions, but seem to be scaled >> differently. 
It could be a bad scaling by the mass matrix somewhere in >> the continuous adjoint. This could be seen if you plot them side by >> side as a quick diagnostic. >> > > I visualized and attached the two gradients. The CADJ is hand coded and > the DADJ is from pyadjoint which is the same as the finite difference > gradient from TAO. > > If the attachement gets lost in the mailing list,, here is a direct > link [1] > > [1] https://cloud.tf.uni-kiel.de/index.php/s/nmiNOoI213dx1L1 > >> Emil >> >>> $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor >>> -tao_gatol 1e-7 -tao_ls_type unit >>> iter =?? 0, Function value: 0.000316722,? Residual: 0.00126285 >>> iter =?? 1, Function value: 3.82272e-05,? Residual: 0.000438094 >>> iter =?? 2, Function value: 1.26011e-07,? Residual: 8.4194e-08 >>> Tao Object: 1 MPI processes >>> ? type: blmvm >>> ????? Gradient steps: 0 >>> ? TaoLineSearch Object: 1 MPI processes >>> ??? type: unit >>> ? Active Set subset type: subvec >>> ? convergence tolerances: gatol=1e-07,?? steptol=0.,?? gttol=0. >>> ? Residual in Function/Gradient:=8.4194e-08 >>> ? Objective value=1.26011e-07 >>> ? total number of iterations=2,????????????????????????? (max: 2000) >>> ? total number of function/gradient evaluations=3,????? (max: 4000) >>> ? Solution converged:??? ||g(X)|| <= gatol >>> >>> $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor >>> -tao_fd_gradient >>> iter =?? 0, Function value: 0.000316722,? Residual: 4.87343e-06 >>> iter =?? 1, Function value: 0.000195676,? Residual: 3.83011e-06 >>> iter =?? 2, Function value: 1.26394e-07,? Residual: 1.60262e-09 >>> Tao Object: 1 MPI processes >>> ? type: blmvm >>> ????? Gradient steps: 0 >>> ? TaoLineSearch Object: 1 MPI processes >>> ??? type: more-thuente >>> ? Active Set subset type: subvec >>> ? convergence tolerances: gatol=1e-08,?? steptol=0.,?? gttol=0. >>> ? Residual in Function/Gradient:=1.60262e-09 >>> ? Objective value=1.26394e-07 >>> ? total number of iterations=2,????????????????????????? (max: 2000) >>> ? total number of function/gradient evaluations=3474,????? (max: >>> 4000) >>> ? Solution converged:??? ||g(X)|| <= gatol >>> >>> >>> We think, that the finite difference gradient should be in line with >>> our hand coded gradient for such a simple example. >>> >>> We appreciate any hints on debugging this issue. It is implemented in >>> python (firedrake) and i can provide the code if this is needed. >>> >>> Regards >>> Julian From knepley at gmail.com Thu Nov 23 06:45:58 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 23 Nov 2017 07:45:58 -0500 Subject: [petsc-users] 2 Dirichlet conditions for one Element in PetscFE In-Reply-To: References: Message-ID: On Thu, Nov 23, 2017 at 3:39 AM, Yann Jobic wrote: > Hello, > > I checked out the branch knepley/fix-plex-bc-multiple, but i now have a > strange problem. > I splited my labels as in ex69.c of snes. It may be the right way to do it. > In petsc 3.8.2, i have the same behavior as before, the element containing > the face is called by PetscDSAddBoundary. > In the git version, PetscDSAddBoundary does not call my boundary function > at all. 
> The call : > ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wallL", > "markerLeft", 0, Ncomp, components, (void (*)(void)) uIn, 1, &id, > ctx);CHKERRQ(ierr); > > I checked that my labels are correct : > markerLeft: 1 strata with value/size (1 (11)) > markerTop: 1 strata with value/size (1 (11)) > markerRight: 1 strata with value/size (1 (11)) > markerBottom: 1 strata with value/size (1 (11)) > > What am i doing wrong ? > So you call AddBoundary() and then the boundary values are never inserted? The call looks correct. Can you send me an example to check? Obviously this works for my simple examples in the repo. I can't see by looking what might be wrong for you. Thanks, Matt > Thanks, > > Regards, > > Yann > > Le 22/11/2017 ? 18:51, Matthew Knepley a ?crit : > > On Wed, Nov 22, 2017 at 12:39 PM, Yann Jobic > wrote: > >> Hello, >> >> I've found a strange behavior when looking into a bug for the pressure >> convergence of a simple Navier-Stokes problem using PetscFE. >> >> I followed many examples for labeling boundary faces. I first use >> DMPlexMarkBoundaryFaces, (label=1 to the faces). >> I find those faces using DMGetStratumIS and searching 1 as it is the >> value of the marked boundary faces. >> Finally i use DMPlexLabelComplete over the new label. >> I then use : >> ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "in", "Faces", 0, >> Ncomp, components, (void (*)(void)) uIn, NWest, west, NULL);CHKERRQ(ierr); >> in order to impose a dirichlet condition for the faces labeled by the >> correct value (west=1, south=3,...). >> >> However, the function "uIn()" is called in all the Elements containing >> the boundary faces, and thus impose the values at nodes that are not in the >> labeled faces. >> Is it a normal behavior ? I then have to test the position of the node >> calling uIn, in order to impose the good value. >> I have this problem for a Poiseuille flow, where at 2 corner Elements i >> have a zero velocity dirichlet condition (wall) and a In flow velocity one. >> > > I believe I have fixed this in knepley/fix-plex-bc-multiple which should > be merged soon. Do you know how to merge that branch and try? > > Thanks, > > Matt > > >> The pressure is then very high at the corner nodes of those 2 Elements. >> Do you think my pressure problem comes from there ? (The velocity field >> is correct) >> >> Many thanks, >> >> Regards, >> >> Yann >> >> PS : i'm using those runtime options : >> -vel_petscspace_order 2 -pres_petscspace_order 1 \ >> -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur >> -pc_fieldsplit_schur_fact_type full \ >> -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol 1.0e-10 >> -fieldsplit_pressure_pc_type jacobi >> >> >> --- >> L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e par le >> logiciel antivirus Avast. >> https://www.avast.com/antivirus >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
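A quick way to rule out an empty or misnamed label before digging deeper is a check along these lines (purely a diagnostic sketch, not part of the example under discussion); -dm_view prints the same label/stratum summary that is quoted above:

PetscInt       n;
PetscBool      has;
PetscErrorCode ierr;

ierr = DMHasLabel(dm, "markerLeft", &has);CHKERRQ(ierr);
if (!has) {
  ierr = PetscPrintf(PETSC_COMM_WORLD, "label markerLeft is missing\n");CHKERRQ(ierr);
} else {
  /* size of stratum 1, i.e. the value passed to PetscDSAddBoundary() */
  ierr = DMGetStratumSize(dm, "markerLeft", 1, &n);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "markerLeft: stratum 1 has %D points\n", n);CHKERRQ(ierr);
}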
URL: From yann.jobic at univ-amu.fr Thu Nov 23 07:09:52 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Thu, 23 Nov 2017 14:09:52 +0100 Subject: [petsc-users] 2 Dirichlet conditions for one Element in PetscFE In-Reply-To: References: Message-ID: <3ffc9268-4dd1-5a97-8d8f-cdefd882d6b0@univ-amu.fr> Hello, Le 23/11/2017 ? 13:45, Matthew Knepley a ?crit?: > On Thu, Nov 23, 2017 at 3:39 AM, Yann Jobic > wrote: > > Hello, > > I checked out? the branch knepley/fix-plex-bc-multiple, but i now > have a strange problem. > I splited my labels as in ex69.c of snes. It may be the right way > to do it. > In petsc 3.8.2, i have the same behavior as before, the element > containing the face is called by PetscDSAddBoundary. > In the git version, PetscDSAddBoundary does not call my boundary > function at all. > The call : > ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wallL", > "markerLeft",?? 0, Ncomp, components, (void (*)(void)) uIn,? 1, > &id, ctx);CHKERRQ(ierr); > > I checked that my labels are correct : > ? markerLeft: 1 strata with value/size (1 (11)) > ? markerTop: 1 strata with value/size (1 (11)) > ? markerRight: 1 strata with value/size (1 (11)) > ? markerBottom: 1 strata with value/size (1 (11)) > > What am i doing wrong ? > > > So you call AddBoundary() and then the boundary values are never > inserted? Yes exactly. > The call looks correct. Can you send me an example to check? My code is simple. It's ex46.c from ts, but i removed the temporal contributions in order to debug, and add the boundary. Thanks a lot for the help ! Yann > Obviously this works for my simple examples in the repo. I can't see > by looking what might be wrong for you. > > ? Thanks, > > ? ? Matt > > Thanks, > > Regards, > > Yann > > Le 22/11/2017 ? 18:51, Matthew Knepley a ?crit?: >> On Wed, Nov 22, 2017 at 12:39 PM, Yann Jobic >> > wrote: >> >> Hello, >> >> I've found a strange behavior when looking into a bug for the >> pressure convergence of a simple Navier-Stokes problem using >> PetscFE. >> >> I followed many examples for labeling boundary faces. I first >> use DMPlexMarkBoundaryFaces, (label=1 to the faces). >> I find those faces using DMGetStratumIS and searching 1 as it >> is the value of the marked boundary faces. >> Finally i use DMPlexLabelComplete over the new label. >> I then use : >> ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "in", >> "Faces", 0, Ncomp, components, (void (*)(void)) uIn, NWest, >> west, NULL);CHKERRQ(ierr); >> in order to impose a dirichlet condition for the faces >> labeled by the correct value (west=1, south=3,...). >> >> However, the function "uIn()" is called in all the Elements >> containing the boundary faces, and thus impose the values at >> nodes that are not in the labeled faces. >> Is it a normal behavior ? I then have to test the position of >> the node calling uIn, in order to impose the good value. >> I have this problem for a Poiseuille flow, where at 2 corner >> Elements i have a zero velocity dirichlet condition (wall) >> and a In flow velocity one. >> >> >> I believe I have fixed this in knepley/fix-plex-bc-multiple which >> should be merged soon. Do you know how to merge that branch and try? >> >> ? Thanks, >> >> ? ? ?Matt >> >> The pressure is then very high at the corner nodes of those 2 >> Elements. >> Do you think my pressure problem comes from there ? 
(The >> velocity field is correct) >> >> Many thanks, >> >> Regards, >> >> Yann >> >> PS : i'm using those runtime options : >> -vel_petscspace_order 2 -pres_petscspace_order 1 \ >> -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type >> schur -pc_fieldsplit_schur_fact_type full? \ >> -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol >> 1.0e-10 -fieldsplit_pressure_pc_type jacobi >> >> >> --- >> L'absence de virus dans ce courrier ?lectronique a ?t? >> v?rifi?e par le logiciel antivirus Avast. >> https://www.avast.com/antivirus >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- ___________________________ Yann JOBIC HPC engineer IUSTI-CNRS UMR 7343 - Polytech Marseille Technop?le de Ch?teau Gombert 5 rue Enrico Fermi 13453 Marseille cedex 13 Tel : (33) 4 91 10 69 43 Fax : (33) 4 91 10 69 69 --- L'absence de virus dans ce courrier ?lectronique a ?t? v?rifi?e par le logiciel antivirus Avast. https://www.avast.com/antivirus -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- static char help[] = "Navier-Stokes problem in 2d and 3d with finite elements.\n\ We solve the Navier-Stokes in a rectangular\n\ domain, using a parallel unstructured mesh (DMPLEX) to discretize it.\n\ This example supports discretized auxiliary fields (Re) as well as\n\ multilevel nonlinear solvers.\n\n\n"; #include #include #include #include #include /* Navier-Stokes equation: u . 
grad u - \Delta u - grad p = f div u = 0 */ typedef struct { PetscInt dim; PetscInt cells[2]; char filename[2048]; /* The optional mesh file */ PetscErrorCode (**init_zero)(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *u, void *ctx); } AppCtx; #define REYN 1.0 PetscErrorCode init_u(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *u, void *ctx) { u[0] = 0.; u[1] = 0.; return 0; } PetscErrorCode init_p(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *p, void *ctx) { p[0] = 1.; return 0; } PetscErrorCode zero_vector(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *u, void *ctx) { PetscInt d; for (d = 0; d < dim; ++d) u[d] = 0.0; return 0; } PetscErrorCode constant_p(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *p, void *ctx) { *p = 1.0; return 0; } static void f0_bd_u_2d(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, const PetscReal x[], const PetscReal n[], PetscInt numConstants, const PetscScalar constants[], PetscScalar f0[]) { f0[0] = 1; f0[1] = 0; } static void f1_bd_u(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, const PetscReal x[], const PetscReal n[], PetscInt numConstants, const PetscScalar constants[], PetscScalar f1[]) { const PetscInt Ncomp = dim; PetscInt comp, d; for (comp = 0; comp < Ncomp; ++comp) { for (d = 0; d < dim; ++d) { f1[comp*dim+d] = 0.0; } } } static void f0_u(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar f0[]) { const PetscInt Ncomp = dim; PetscInt c, d; for (c = 0; c < Ncomp; ++c) { for (d = 0; d < dim; ++d) { f0[c] += u[d] * u_x[c*dim+d]; } } } static void f1_u(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar f1[]) { const PetscReal Re = REYN; const PetscInt Ncomp = dim; PetscInt comp, d; for (comp = 0; comp < Ncomp; ++comp) { for (d = 0; d < dim; ++d) { f1[comp*dim+d] = 1.0/Re * u_x[comp*dim+d]; } f1[comp*dim+comp] -= u[Ncomp]; } } static void f0_p(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar f0[]) { PetscInt d; for (d = 0, f0[0] 
= 0.0; d < dim; ++d) f0[0] += u_x[d*dim+d]; } static void f1_p(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar f1[]) { PetscInt d; for (d = 0; d < dim; ++d) f1[d] = 0.0; } /* (psi_i, u_j grad_j u_i) ==> (\psi_i, \phi_j grad_j u_i) */ static void g0_uu(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, PetscReal u_tShift, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar g0[]) { PetscInt NcI = dim, NcJ = dim; PetscInt fc, gc; PetscInt d; for (d = 0; d < dim; ++d) { g0[d*dim+d] = 0; } for (fc = 0; fc < NcI; ++fc) { for (gc = 0; gc < NcJ; ++gc) { g0[fc*NcJ+gc] += u_x[fc*NcJ+gc]; } } } /* (psi_i, u_j grad_j u_i) ==> (\psi_i, \u_j grad_j \phi_i) */ static void g1_uu(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, PetscReal u_tShift, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar g1[]) { PetscInt NcI = dim; PetscInt NcJ = dim; PetscInt fc, gc, dg; for (fc = 0; fc < NcI; ++fc) { for (gc = 0; gc < NcJ; ++gc) { for (dg = 0; dg < dim; ++dg) { /* kronecker delta */ if (fc == gc) { g1[(fc*NcJ+gc)*dim+dg] += u[dg]; } } } } } /* < q, \nabla\cdot u > NcompI = 1, NcompJ = dim */ static void g1_pu(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, PetscReal u_tShift, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar g1[]) { PetscInt d; for (d = 0; d < dim; ++d) g1[d*dim+d] = 1.0; /* \frac{\partial\phi^{u_d}}{\partial x_d} */ } /* -< \nabla\cdot v, p > NcompI = dim, NcompJ = 1 */ static void g2_up(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, PetscReal u_tShift, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar g2[]) { PetscInt d; for (d = 0; d < dim; ++d) g2[d*dim+d] = -1.0; /* \frac{\partial\psi^{u_d}}{\partial x_d} */ } /* < \nabla v, \nabla u + {\nabla u}^T > This just gives \nabla u, give the perdiagonal for the transpose */ static void g3_uu(PetscInt dim, PetscInt Nf, PetscInt NfAux, const PetscInt uOff[], const PetscInt uOff_x[], const PetscScalar u[], const PetscScalar u_t[], const PetscScalar u_x[], const PetscInt aOff[], const PetscInt aOff_x[], const PetscScalar a[], const PetscScalar a_t[], const PetscScalar a_x[], PetscReal t, PetscReal 
u_tShift, const PetscReal x[], PetscInt numConstants, const PetscScalar constants[], PetscScalar g3[]) { const PetscReal Re = REYN; const PetscInt Ncomp = dim; PetscInt compI, d; for (compI = 0; compI < Ncomp; ++compI) { for (d = 0; d < dim; ++d) { g3[((compI*Ncomp+compI)*dim+d)*dim+d] = 1.0/Re; } } } static PetscErrorCode ProcessOptions(MPI_Comm comm, AppCtx *options) { PetscInt n; PetscBool flg; PetscErrorCode ierr; PetscFunctionBeginUser; options->dim = 2; options->cells[0] = 2; options->cells[1] = 2; options->filename[0] = '\0'; ierr = PetscOptionsBegin(comm, "", "Navier-Stokes Equation Options", "DMPLEX");CHKERRQ(ierr); n = 2; ierr = PetscOptionsIntArray("-cells", "The initial mesh division", "ex.c", options->cells, &n, NULL);CHKERRQ(ierr); ierr = PetscOptionsString("-f", "Mesh filename to read", "ex.c", options->filename, options->filename, sizeof(options->filename), &flg);CHKERRQ(ierr); ierr = PetscOptionsInt("-dim", "The topological mesh dimension", "ex46.c", options->dim, &options->dim, NULL);CHKERRQ(ierr); ierr = PetscOptionsEnd(); PetscFunctionReturn(0); } /* pris sur ex56.c de snes */ /* "boundary" == 1 => WEST */ /* "boundary" == 2 => EST */ /* "boundary" == 3 => SUD */ /* "boundary" == 4 => NORD */ static PetscErrorCode MarkBoundaryFaces(DM dm) { DMLabel label,label2; PetscInt dim; IS is; PetscErrorCode ierr; PetscFunctionBeginUser; ierr = DMGetDimension(dm, &dim);CHKERRQ(ierr); ierr = DMCreateLabel(dm, "boundary");CHKERRQ(ierr); ierr = DMGetLabel(dm, "boundary", &label);CHKERRQ(ierr); ierr = DMPlexMarkBoundaryFaces(dm, label);CHKERRQ(ierr); ierr = DMGetStratumIS(dm, "boundary", 1, &is);CHKERRQ(ierr); ierr = DMCreateLabel(dm,"Faces");CHKERRQ(ierr); if (is) { PetscInt d, f, Nf; const PetscInt *faces; PetscInt csize; PetscSection cs; Vec coordinates ; DM cdm; ISGetLocalSize(is, &Nf); ISGetIndices(is, &faces); DMGetCoordinatesLocal(dm, &coordinates); DMGetCoordinateDM(dm, &cdm); DMGetDefaultSection(cdm, &cs); for (f = 0; f < Nf; ++f) { PetscReal faceCoord; PetscInt b,v; PetscScalar *coords = NULL; PetscInt Nv; DMPlexVecGetClosure(cdm, cs, coordinates, faces[f], &csize, &coords); Nv = csize/dim; for (d = 0; d < dim; ++d) { faceCoord = 0.0; for (v = 0; v < Nv; ++v) faceCoord += PetscRealPart(coords[v*dim+d]); faceCoord /= Nv; for (b = 0; b < 2; ++b) { if (PetscAbs(faceCoord - b*1.0) < PETSC_SMALL) { DMSetLabelValue(dm, "Faces", faces[f], d*2+b+1); PetscPrintf(PETSC_COMM_WORLD, "face %d facecoords : %g label : %d\n", f,(double)faceCoord,d*2+b+1); } } } DMPlexVecRestoreClosure(cdm, cs, coordinates, faces[f], &csize, &coords); } ISRestoreIndices(is, &faces); } ISDestroy(&is); DMGetLabel(dm, "Faces", &label2); DMPlexLabelComplete(dm, label2); /* Make split labels so that we can have corners in multiple labels */ { const char *names[4] = {"markerBottom", "markerRight", "markerTop", "markerLeft"}; PetscInt ids[4] = {3, 2, 4, 1}, Nf; DMLabel label; IS is; PetscInt f; for (f = 0; f < 4; ++f) { ierr = DMGetStratumIS(dm, "Faces", ids[f], &is);CHKERRQ(ierr); if (!is) continue; ierr = ISGetLocalSize(is, &Nf);CHKERRQ(ierr); ierr = DMCreateLabel(dm, names[f]);CHKERRQ(ierr); ierr = DMGetLabel(dm, names[f], &label);CHKERRQ(ierr); if (is) { ierr = PetscPrintf(PETSC_COMM_WORLD, "Nombre de face %s : %d ou l'on insere 1\n",names[f],Nf); ierr = DMLabelInsertIS(label, is, 1);CHKERRQ(ierr); } ierr = ISDestroy(&is);CHKERRQ(ierr); } } PetscFunctionReturn(0); } static PetscErrorCode CreateMesh(MPI_Comm comm, DM *dm, AppCtx *ctx) { DM pdm = NULL; const PetscInt dim = ctx->dim; const char *filename = 
ctx->filename; PetscBool hasLabel; size_t len; PetscErrorCode ierr; PetscFunctionBeginUser; ierr = PetscStrlen(filename, &len);CHKERRQ(ierr); if (!len) { /* ierr = DMPlexCreateHexBoxMesh(comm, dim, ctx->cells, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, dm);CHKERRQ(ierr); */ ierr = DMPlexCreateBoxMesh (comm, dim, 0, ctx->cells, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, 1, dm);CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) *dm, "Mesh");CHKERRQ(ierr); } else { ierr = DMPlexCreateFromFile(comm, filename, 1, dm);CHKERRQ(ierr); ierr = DMPlexSetRefinementUniform(*dm, PETSC_FALSE);CHKERRQ(ierr); } ierr = PetscObjectSetName((PetscObject) *dm, "Mesh");CHKERRQ(ierr); /* Distribute mesh over processes */ ierr = DMPlexDistribute(*dm, 0, NULL, &pdm);CHKERRQ(ierr); if (pdm) { ierr = DMDestroy(dm);CHKERRQ(ierr); *dm = pdm; } ierr = DMSetFromOptions(*dm);CHKERRQ(ierr); /* If no boundary marker exists, mark the whole boundary, after refine of dm_refine from line options */ ierr = DMHasLabel(*dm, "boundary", &hasLabel);CHKERRQ(ierr); if (!hasLabel) {ierr = MarkBoundaryFaces(*dm);CHKERRQ(ierr);} ierr = DMViewFromOptions(*dm, NULL, "-dm_view");CHKERRQ(ierr); PetscFunctionReturn(0); } PetscErrorCode zero(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *u, void *ctx) { PetscPrintf(PETSC_COMM_WORLD, "UIN : coord : [%g,%g]\n",x[0],x[1]); u[0] = 0.; u[1] = 0.; return 0; } PetscErrorCode uIn(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *u, void *ctx) { PetscPrintf(PETSC_COMM_WORLD, "UIN : coord : [%g,%g]\n",x[0],x[1]); u[0] = 0.005; u[1] = 0.; return 0; } static PetscErrorCode SetupProblem(PetscDS prob, AppCtx *ctx) { const PetscInt id = 1; const PetscInt Ncomp = 2; /* dim = 2 */ const PetscInt components[] = {0,1,2}; PetscErrorCode ierr; PetscFunctionBeginUser; ierr = PetscDSSetResidual(prob, 0, f0_u, f1_u);CHKERRQ(ierr); ierr = PetscDSSetResidual(prob, 1, f0_p, f1_p);CHKERRQ(ierr); ierr = PetscDSSetJacobian(prob, 0, 0, g0_uu, g1_uu, NULL, g3_uu);CHKERRQ(ierr); ierr = PetscDSSetJacobian(prob, 0, 1, NULL, NULL, g2_up, NULL);CHKERRQ(ierr); ierr = PetscDSSetJacobian(prob, 1, 0, NULL, g1_pu, NULL, NULL);CHKERRQ(ierr); /* Neumann */ /*ierr = PetscDSSetBdResidual(prob, 0, f0_bd_u_2d, f1_bd_u);CHKERRQ(ierr);*/ ctx->init_zero[0] = init_u; ctx->init_zero[1] = init_p; ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wallB", "markerBottom", 0, Ncomp, components, (void (*)(void)) zero, 1, &id, ctx);CHKERRQ(ierr); ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wallT", "markerTop", 0, Ncomp, components, (void (*)(void)) zero, 1, &id, ctx);CHKERRQ(ierr); ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wallL", "markerLeft", 0, Ncomp, components, (void (*)(void)) uIn, 1, &id, ctx);CHKERRQ(ierr); /*ierr = PetscDSAddBoundary(prob, DM_BC_NATURAL, "out", "Faces", 0, Ncomp, components, NULL, NEst, est, NULL);CHKERRQ(ierr);*/ PetscFunctionReturn(0); } static PetscErrorCode SetupDiscretization(DM dm, MatNullSpace *nullSpace, AppCtx *ctx) { DM cdm = dm; const PetscInt dim = ctx->dim; PetscDS prob; PetscFE fe[2]; PetscQuadrature q; PetscObject pressure; PetscErrorCode ierr; PetscFunctionBeginUser; /* Create finite element */ ierr = PetscFECreateDefault(dm, dim, dim, 0, "vel_", PETSC_DEFAULT, &fe[0]);CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) fe[0], "velocity");CHKERRQ(ierr); ierr = PetscFEGetQuadrature(fe[0], &q);CHKERRQ(ierr); ierr = PetscFECreateDefault(dm, dim, 1, 0, "pres_", PETSC_DEFAULT, &fe[1]);CHKERRQ(ierr); ierr = 
PetscFESetQuadrature(fe[1], q);CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) fe[1], "pressure");CHKERRQ(ierr); /* Set discretization and boundary conditions for each mesh */ ierr = DMGetDS(dm, &prob);CHKERRQ(ierr); ierr = PetscDSSetDiscretization(prob, 0, (PetscObject) fe[0]);CHKERRQ(ierr); ierr = PetscDSSetDiscretization(prob, 1, (PetscObject) fe[1]);CHKERRQ(ierr); ierr = SetupProblem(prob, ctx);CHKERRQ(ierr); ierr = DMGetField(cdm, 1, &pressure);CHKERRQ(ierr); ierr = MatNullSpaceCreate(PetscObjectComm(pressure), PETSC_TRUE, 0, NULL, nullSpace);CHKERRQ(ierr); ierr = PetscObjectCompose(pressure, "nullspace", (PetscObject) *nullSpace);CHKERRQ(ierr); ierr = PetscFEDestroy(&fe[0]);CHKERRQ(ierr); ierr = PetscFEDestroy(&fe[1]);CHKERRQ(ierr); PetscFunctionReturn(0); } PetscErrorCode CreatePressureNullSpace(DM dm, AppCtx *user, Vec *v, MatNullSpace *nullSpace) { Vec vec; PetscErrorCode (*funcs[2])(PetscInt dim, PetscReal time, const PetscReal x[], PetscInt Nf, PetscScalar *u, void* ctx) = {zero_vector, constant_p}; PetscErrorCode ierr; PetscFunctionBeginUser; ierr = DMGetGlobalVector(dm, &vec);CHKERRQ(ierr); ierr = DMProjectFunction(dm, 0.0, funcs, NULL, INSERT_ALL_VALUES, vec);CHKERRQ(ierr); ierr = VecNormalize(vec, NULL);CHKERRQ(ierr); ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)dm), PETSC_FALSE, 1, &vec, nullSpace);CHKERRQ(ierr); if (v) { ierr = DMCreateGlobalVector(dm, v);CHKERRQ(ierr); ierr = VecCopy(vec, *v);CHKERRQ(ierr); } ierr = DMRestoreGlobalVector(dm, &vec);CHKERRQ(ierr); /* New style for field null spaces */ { PetscObject pressure; MatNullSpace nullSpacePres; ierr = DMGetField(dm, 1, &pressure);CHKERRQ(ierr); ierr = MatNullSpaceCreate(PetscObjectComm(pressure), PETSC_TRUE, 0, NULL, &nullSpacePres);CHKERRQ(ierr); ierr = PetscObjectCompose(pressure, "nullspace", (PetscObject) nullSpacePres);CHKERRQ(ierr); ierr = MatNullSpaceDestroy(&nullSpacePres);CHKERRQ(ierr); } PetscFunctionReturn(0); } int main(int argc, char **argv) { AppCtx ctx; DM dm; SNES snes; Mat A,J; PetscInt its; Vec u, r; MatNullSpace nullSpace; /* necessary for pressure */ PetscErrorCode ierr; ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr); ierr = ProcessOptions(PETSC_COMM_WORLD, &ctx);CHKERRQ(ierr); ierr = CreateMesh(PETSC_COMM_WORLD, &dm, &ctx);CHKERRQ(ierr); ierr = DMSetApplicationContext(dm, &ctx);CHKERRQ(ierr); ierr = PetscMalloc1(2, &ctx.init_zero);CHKERRQ(ierr); ierr = SetupDiscretization(dm, &nullSpace, &ctx);CHKERRQ(ierr); ierr = DMPlexCreateClosureIndex(dm, NULL);CHKERRQ(ierr); ierr = DMCreateGlobalVector(dm, &u);CHKERRQ(ierr); ierr = VecDuplicate(u, &r);CHKERRQ(ierr); ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr); ierr = SNESSetDM(snes, dm);CHKERRQ(ierr); ierr = DMSetApplicationContext(dm, &ctx);CHKERRQ(ierr); ierr = DMCreateMatrix(dm, &J);CHKERRQ(ierr); A = J; /* ierr = CreatePressureNullSpace(dm, &ctx, NULL, &nullSpace);CHKERRQ(ierr);*/ /* ierr = MatSetNullSpace(A, nullSpace);CHKERRQ(ierr);*/ ierr = DMPlexSetSNESLocalFEM(dm,&ctx,&ctx,&ctx);CHKERRQ(ierr); ierr = SNESSetJacobian(snes, A, J, NULL, NULL);CHKERRQ(ierr); ierr = SNESSetFromOptions(snes);CHKERRQ(ierr); ierr = DMProjectFunction(dm, 0.0, ctx.init_zero, NULL, INSERT_ALL_VALUES, u);CHKERRQ(ierr); ierr = VecViewFromOptions(u, NULL, "-sol_init");CHKERRQ(ierr); ierr = SNESSolve(snes, NULL, u);CHKERRQ(ierr); ierr = SNESGetIterationNumber(snes, &its);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Number of SNES iterations = %D\n", its);CHKERRQ(ierr); ierr = VecViewFromOptions(u, NULL, 
"-sol_vec_view");CHKERRQ(ierr); ierr = VecDestroy(&u);CHKERRQ(ierr); ierr = VecDestroy(&r);CHKERRQ(ierr); if (A != J) {ierr = MatDestroy(&A);CHKERRQ(ierr);} ierr = MatDestroy(&J);CHKERRQ(ierr); ierr = VecDestroy(&u);CHKERRQ(ierr); ierr = VecDestroy(&r);CHKERRQ(ierr); ierr = SNESDestroy(&snes);CHKERRQ(ierr); ierr = DMDestroy(&dm);CHKERRQ(ierr); ierr = PetscFree(ctx.init_zero);CHKERRQ(ierr); ierr = PetscFinalize(); return ierr; } /*TEST mpirun -np 1 ./ns_stat -cells 10,10 -vel_petscspace_order 2 -pres_petscspace_order 1 -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type full -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol 1.0e-10 -fieldsplit_pressure_pc_type jacobi -ksp_monitor_short -ksp_converged_reason -snes_monitor_short -snes_converged_reason -sol_vec_view vtk:sol_ns.vtu:VTK_VTU mpirun -np 1 ./ns_stat -cells 10,10 -vel_petscspace_order 2 -pres_petscspace_order 1 -ksp_type fgmres -ksp_gmres_restart 10 -ksp_rtol 1.0e-9 -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_factorization_type full -fieldsplit_pressure_ksp_rtol 1e-10 -fieldsplit_velocity_ksp_type gmres -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_pc_type jacobi -snes_error_if_not_converged -ksp_error_if_not_converged -snes_view -sol_vec_view vtk:sol_ns.vtu:VTK_VTU bidon : mpirun -np 1 ./ns_stat -cells 10,10 -vel_petscspace_order 2 -pres_petscspace_order 1 -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type full -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol 1.0e-10 -fieldsplit_pressure_pc_type jacobi -ksp_monitor_short -ksp_type fgmres mpirun -np 1 ./ns_stat -cells 10,10 -vel_petscspace_order 2 -pres_petscspace_order 1 -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type lower -fieldsplit_0_ksp_type gmres -fieldsplit_0_pc_type bjacobi -fieldsplit_1_pc_type jacobi -fieldsplit_1_inner_ksp_type preonly -fieldsplit_1_inner_pc_type jacobi -fieldsplit_1_upper_ksp_type preonly -fieldsplit_1_upper_pc_type jacobi -ksp_converged_reason -snes_monitor_short -snes_converged_reason -sol_vec_view vtk:sol_ns.vtu:VTK_VTU TEST*/ From Eric.Chamberland at giref.ulaval.ca Thu Nov 23 13:44:17 2017 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Thu, 23 Nov 2017 14:44:17 -0500 Subject: [petsc-users] How can I retrieve the IS for all Missing Diagonal entries? In-Reply-To: References: <57999e41-6c3f-ae21-6f88-ea7348cf505c@giref.ulaval.ca> Message-ID: Thank you for your answers. First, I was wrong when telling that having empty rows is working for LU solver: after verification, it is the exact opposite: it does not work for MUMPS. We have a nightly test that solve that kind of matrix with a gmres+jacobi combination and it works well... Second, since LU solvers are interesting to use as the solvers for the coarsest level in a PCGAMG for example, we still have to give a non-zero for all the diagonal... So my solution will be to add it anyway... :) Thanks again, Eric On 22/11/17 11:26 AM, Matthew Knepley wrote: > On Wed, Nov 22, 2017 at 11:13 AM, Eric Chamberland > > wrote: > > Hi, > > I have 2 questions: > > First, I am looking for a function that is almost like > MatMissingDiagonal, but that would return me *all* missing diagonal > entries. > > Does it exists? > > > No > > If not, is there another way of doing this? > > > Not a nice way, unfortunately. It is fairly dependent on the > implementation. You could call GetRow() for every row and check. 
> > Second: after searching through Petsc list, I found this that upset > me a bit: > > https://www.mail-archive.com/petsc-users at mcs.anl.gov/msg22867.html > > > so maybe I should modify our code to be fully compliant with this? > I have some examples (MUMPS) that are working without diagonal > entries but I didn't tried other PCs or KSPs... > > > We use the diagonal frequently, for instance in the factorization PCs. I > am guessing we put in the diagonal when converting to the MUMPS format. > > Thanks, > > Matt > > Thanks, > > Eric > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From s.lanthaler at gmail.com Thu Nov 23 16:43:11 2017 From: s.lanthaler at gmail.com (Samuel Lanthaler) Date: Thu, 23 Nov 2017 23:43:11 +0100 Subject: [petsc-users] MatZeroRowsColumns Message-ID: Hi there, I'm new to PETSc and have been trying to do some basic manipulation of matrices in fortran, but don't seem to be able to set a row/column to zero using the MatZeroRowsColumns command: The following is a small example program:| | |! initialize PETSc|| ||? CALL PetscInitialize(PETSC_NULL_CHARACTER,ierr)|| || ||? ! Set up a new matrix|| ||? m = 3|| ||? CALL MatCreate(PETSC_COMM_WORLD,matA,ierr); CHKERRQ(ierr);|| ||? CALL MatSetType(matA,MATMPIAIJ,ierr); CHKERRQ(ierr);|| ||? CALL MatSetSizes(matA,PETSC_DECIDE,PETSC_DECIDE,m,m,ierr); CHKERRQ(ierr);|| ||? CALL MatMPIAIJSetPreallocation(matA,3,PETSC_NULL_INTEGER,3,PETSC_NULL_INTEGER,ierr); CHKERRQ(ierr);|| || ||? ! set values of matrix|| ||? vals(1,:) = (/1.,2.,3./)|| ||? vals(2,:) = (/4.,5.,6./)|| ||? vals(3,:) = (/7.,8.,9./)|| ||? ! || ||? idxm = (/0,1,2/)|| ||? idxn = (/0,1,2/)|| ||? !|| ||? CALL MatSetValues(matA,3,idxm,3,idxn,vals,INSERT_VALUES,ierr); CHKERRQ(ierr);|| |||| ||? ! assemble matrix|| ||? CALL MatAssemblyBegin(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr);|| ||? CALL MatAssemblyEnd(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr);|| |||| ||? ! set one row/column to zero, put 6.0d0 on diagonal|| ||? idone(1) = 2|| ||? val = 6.0d0|| ||? CALL MatZeroRowsColumns(matA,1,idone,val,ierr); CHKERRQ(ierr);||| |! finalize PETSc|| ||? CALL PetscFinalize(PETSC_NULL_CHARACTER,ierr)|| || |When running the program, I get the following error message: |[0]PETSC ERROR: ------------------------------------------------------------------------|| ||[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range|| ||[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger|| ||[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind|| ||[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors|| ||[0]PETSC ERROR: likely location of problem given in stack below|| ||[0]PETSC ERROR: ---------------------? Stack Frames ------------------------------------|| ||[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,|| ||[0]PETSC ERROR:?????? INSTEAD the line number of the start of the function|| ||[0]PETSC ERROR:?????? 
is given.|| ||[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------|| ||[0]PETSC ERROR: Signal received|| ||[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.|| ||[0]PETSC ERROR: Petsc Release Version 3.8.2, Nov, 09, 2017 || ||[0]PETSC ERROR: ./test on a arch-complex-debug named sam-ThinkPad-T450s by sam Thu Nov 23 23:28:15 2017|| ||[0]PETSC ERROR: Configure options PETSC_DIR=/home/sam/Progs/petsc-3.8.2 PETSC_ARCH=arch-complex-debug --with-scalar-type=complex|| ||[0]PETSC ERROR: #1 User provided function() line 0 in? unknown file|| ||--------------------------------------------------------------------------|| ||MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD || ||with errorcode 59.| Would someone be so kind as to tell me what I'm doing wrong? Thank you! Best regards, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Nov 23 18:22:54 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 23 Nov 2017 19:22:54 -0500 Subject: [petsc-users] MatZeroRowsColumns In-Reply-To: References: Message-ID: I suspect this has to do with Fortran declarations. First, make sure you are using the latest release. Second, look at http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/UsingFortran.html Make sure you have the right modules. Thanks, Matt On Thu, Nov 23, 2017 at 5:43 PM, Samuel Lanthaler wrote: > Hi there, > > I'm new to PETSc and have been trying to do some basic manipulation of > matrices in fortran, but don't seem to be able to set a row/column to zero > using the MatZeroRowsColumns command: The following is a small example > program: > > ! initialize PETSc > CALL PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > ! Set up a new matrix > m = 3 > CALL MatCreate(PETSC_COMM_WORLD,matA,ierr); CHKERRQ(ierr); > CALL MatSetType(matA,MATMPIAIJ,ierr); CHKERRQ(ierr); > CALL MatSetSizes(matA,PETSC_DECIDE,PETSC_DECIDE,m,m,ierr); > CHKERRQ(ierr); > CALL MatMPIAIJSetPreallocation(matA,3,PETSC_NULL_INTEGER,3,PETSC_NULL_INTEGER,ierr); > CHKERRQ(ierr); > > ! set values of matrix > vals(1,:) = (/1.,2.,3./) > vals(2,:) = (/4.,5.,6./) > vals(3,:) = (/7.,8.,9./) > ! > idxm = (/0,1,2/) > idxn = (/0,1,2/) > ! > CALL MatSetValues(matA,3,idxm,3,idxn,vals,INSERT_VALUES,ierr); > CHKERRQ(ierr); > > ! assemble matrix > CALL MatAssemblyBegin(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); > CALL MatAssemblyEnd(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); > > ! set one row/column to zero, put 6.0d0 on diagonal > idone(1) = 2 > val = 6.0d0 > CALL MatZeroRowsColumns(matA,1,idone,val,ierr); CHKERRQ(ierr); > > ! 
finalize PETSc > CALL PetscFinalize(PETSC_NULL_CHARACTER,ierr) > > When running the program, I get the following error message: > > [0]PETSC ERROR: ------------------------------ > ------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.8.2, Nov, 09, 2017 > [0]PETSC ERROR: ./test on a arch-complex-debug named sam-ThinkPad-T450s by > sam Thu Nov 23 23:28:15 2017 > [0]PETSC ERROR: Configure options PETSC_DIR=/home/sam/Progs/petsc-3.8.2 > PETSC_ARCH=arch-complex-debug --with-scalar-type=complex > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > > Would someone be so kind as to tell me what I'm doing wrong? Thank you! > Best regards, > Sam > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 23 18:56:39 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 24 Nov 2017 00:56:39 +0000 Subject: [petsc-users] MatZeroRowsColumns In-Reply-To: References: Message-ID: MatZeroRowsColumns() as two vector arguments you are missing > On Nov 23, 2017, at 2:43 PM, Samuel Lanthaler wrote: > > Hi there, > I'm new to PETSc and have been trying to do some basic manipulation of matrices in fortran, but don't seem to be able to set a row/column to zero using the MatZeroRowsColumns command: The following is a small example program: > > ! initialize PETSc > CALL PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > ! Set up a new matrix > m = 3 > CALL MatCreate(PETSC_COMM_WORLD,matA,ierr); CHKERRQ(ierr); > CALL MatSetType(matA,MATMPIAIJ,ierr); CHKERRQ(ierr); > CALL MatSetSizes(matA,PETSC_DECIDE,PETSC_DECIDE,m,m,ierr); CHKERRQ(ierr); > CALL MatMPIAIJSetPreallocation(matA,3,PETSC_NULL_INTEGER,3,PETSC_NULL_INTEGER,ierr); CHKERRQ(ierr); > > ! set values of matrix > vals(1,:) = (/1.,2.,3./) > vals(2,:) = (/4.,5.,6./) > vals(3,:) = (/7.,8.,9./) > ! > idxm = (/0,1,2/) > idxn = (/0,1,2/) > ! > CALL MatSetValues(matA,3,idxm,3,idxn,vals,INSERT_VALUES,ierr); CHKERRQ(ierr); > > ! assemble matrix > CALL MatAssemblyBegin(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); > CALL MatAssemblyEnd(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); > > ! 
set one row/column to zero, put 6.0d0 on diagonal > idone(1) = 2 > val = 6.0d0 > CALL MatZeroRowsColumns(matA,1,idone,val,ierr); CHKERRQ(ierr); > > ! finalize PETSc > CALL PetscFinalize(PETSC_NULL_CHARACTER,ierr) > > When running the program, I get the following error message: > > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.8.2, Nov, 09, 2017 > [0]PETSC ERROR: ./test on a arch-complex-debug named sam-ThinkPad-T450s by sam Thu Nov 23 23:28:15 2017 > [0]PETSC ERROR: Configure options PETSC_DIR=/home/sam/Progs/petsc-3.8.2 PETSC_ARCH=arch-complex-debug --with-scalar-type=complex > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > Would someone be so kind as to tell me what I'm doing wrong? Thank you! > > Best regards, > Sam From yann.jobic at univ-amu.fr Fri Nov 24 04:34:13 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Fri, 24 Nov 2017 11:34:13 +0100 Subject: [petsc-users] 2 Dirichlet conditions for one Element in PetscFE In-Reply-To: References: Message-ID: Le 23/11/2017 ? 13:45, Matthew Knepley a ?crit?: > On Thu, Nov 23, 2017 at 3:39 AM, Yann Jobic > wrote: > > Hello, > > I checked out? the branch knepley/fix-plex-bc-multiple, but i now > have a strange problem. > I splited my labels as in ex69.c of snes. It may be the right way > to do it. > In petsc 3.8.2, i have the same behavior as before, the element > containing the face is called by PetscDSAddBoundary. > In the git version, PetscDSAddBoundary does not call my boundary > function at all. > The call : > ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wallL", > "markerLeft",?? 0, Ncomp, components, (void (*)(void)) uIn,? 1, > &id, ctx);CHKERRQ(ierr); > > I checked that my labels are correct : > ? markerLeft: 1 strata with value/size (1 (11)) > ? markerTop: 1 strata with value/size (1 (11)) > ? markerRight: 1 strata with value/size (1 (11)) > ? markerBottom: 1 strata with value/size (1 (11)) > > What am i doing wrong ? > > > So you call AddBoundary() and then the boundary values are never > inserted? The call looks correct. Can you send me an example to check? > Obviously this works for my simple examples in the repo. I can't see > by looking what might be wrong for you. Hello, I may have the wrong git directory. 
I've got the message when i'm compiling : The version of PETSc you are using is out-of-date, we recommend updating to the new release ?Available Version: 3.8.2?? Installed Version: 3.8 As i'm not familiar with git, I've done : git clone https://bitbucket.org/petsc/petsc git checkout knepley/fix-plex-bc-multiple With this petsc version, ex46 and ex47 are not working (from TS), and after calling PetscDSAddBoundary, it seems that the boundary values are never inserted, as in my own code. Do i have the right petsc development version ? Thanks, Regards, Yann PS : In my old code, i've tried to mix ex56 and ex69 for marking boundary values. I first set labels for "Faces", and using this label, i create new separated ones (North, South, ...). It worked in sequential, but in parallel, the program hang at DMPlexCreateClosureIndex. So i just kept "Faces", as in ex56, and it worked in parallel. > > ? Thanks, > > ? ? Matt > > Thanks, > > Regards, > > Yann > > Le 22/11/2017 ? 18:51, Matthew Knepley a ?crit?: >> On Wed, Nov 22, 2017 at 12:39 PM, Yann Jobic >> > wrote: >> >> Hello, >> >> I've found a strange behavior when looking into a bug for the >> pressure convergence of a simple Navier-Stokes problem using >> PetscFE. >> >> I followed many examples for labeling boundary faces. I first >> use DMPlexMarkBoundaryFaces, (label=1 to the faces). >> I find those faces using DMGetStratumIS and searching 1 as it >> is the value of the marked boundary faces. >> Finally i use DMPlexLabelComplete over the new label. >> I then use : >> ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "in", >> "Faces", 0, Ncomp, components, (void (*)(void)) uIn, NWest, >> west, NULL);CHKERRQ(ierr); >> in order to impose a dirichlet condition for the faces >> labeled by the correct value (west=1, south=3,...). >> >> However, the function "uIn()" is called in all the Elements >> containing the boundary faces, and thus impose the values at >> nodes that are not in the labeled faces. >> Is it a normal behavior ? I then have to test the position of >> the node calling uIn, in order to impose the good value. >> I have this problem for a Poiseuille flow, where at 2 corner >> Elements i have a zero velocity dirichlet condition (wall) >> and a In flow velocity one. >> >> >> I believe I have fixed this in knepley/fix-plex-bc-multiple which >> should be merged soon. Do you know how to merge that branch and try? >> >> ? Thanks, >> >> ? ? ?Matt >> >> The pressure is then very high at the corner nodes of those 2 >> Elements. >> Do you think my pressure problem comes from there ? (The >> velocity field is correct) >> >> Many thanks, >> >> Regards, >> >> Yann >> >> PS : i'm using those runtime options : >> -vel_petscspace_order 2 -pres_petscspace_order 1 \ >> -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type >> schur -pc_fieldsplit_schur_fact_type full? \ >> -fieldsplit_velocity_pc_type lu -fieldsplit_pressure_ksp_rtol >> 1.0e-10 -fieldsplit_pressure_pc_type jacobi >> >> >> --- >> L'absence de virus dans ce courrier ?lectronique a ?t? >> v?rifi?e par le logiciel antivirus Avast. >> https://www.avast.com/antivirus >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.lanthaler at gmail.com Fri Nov 24 05:42:33 2017 From: s.lanthaler at gmail.com (Samuel Lanthaler) Date: Fri, 24 Nov 2017 12:42:33 +0100 Subject: [petsc-users] MatZeroRowsColumns In-Reply-To: References: Message-ID: <5A1805A9.3070101@gmail.com> Ah, great. I didn't understand that "optional" arguments are not optional in the fortran sense of the word. After setting up two vectors to pass to the routine, the program works. Thank you very much! One additional question: In the documentation, it is written that I can pass PETSC_NULL for these two vectors, if I don't actually need them. That probably only works in C, but not in fortran. Is there a corresponding argument I can pass to the routine in fortran? It seems, that PETSC_NULL_REAL doesn't work; is there some other PETSC_NULL_XXX that should be passed in this case from fortran? If not, then I'll just stick to creating vectors and distroying them afterwards, which is fine for me as well. Cheers, Sam On 11/24/2017 01:56 AM, Smith, Barry F. wrote: > MatZeroRowsColumns() as two vector arguments you are missing > > >> On Nov 23, 2017, at 2:43 PM, Samuel Lanthaler wrote: >> >> Hi there, >> I'm new to PETSc and have been trying to do some basic manipulation of matrices in fortran, but don't seem to be able to set a row/column to zero using the MatZeroRowsColumns command: The following is a small example program: >> >> ! initialize PETSc >> CALL PetscInitialize(PETSC_NULL_CHARACTER,ierr) >> >> ! Set up a new matrix >> m = 3 >> CALL MatCreate(PETSC_COMM_WORLD,matA,ierr); CHKERRQ(ierr); >> CALL MatSetType(matA,MATMPIAIJ,ierr); CHKERRQ(ierr); >> CALL MatSetSizes(matA,PETSC_DECIDE,PETSC_DECIDE,m,m,ierr); CHKERRQ(ierr); >> CALL MatMPIAIJSetPreallocation(matA,3,PETSC_NULL_INTEGER,3,PETSC_NULL_INTEGER,ierr); CHKERRQ(ierr); >> >> ! set values of matrix >> vals(1,:) = (/1.,2.,3./) >> vals(2,:) = (/4.,5.,6./) >> vals(3,:) = (/7.,8.,9./) >> ! >> idxm = (/0,1,2/) >> idxn = (/0,1,2/) >> ! >> CALL MatSetValues(matA,3,idxm,3,idxn,vals,INSERT_VALUES,ierr); CHKERRQ(ierr); >> >> ! assemble matrix >> CALL MatAssemblyBegin(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); >> CALL MatAssemblyEnd(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); >> >> ! set one row/column to zero, put 6.0d0 on diagonal >> idone(1) = 2 >> val = 6.0d0 >> CALL MatZeroRowsColumns(matA,1,idone,val,ierr); CHKERRQ(ierr); >> >> ! finalize PETSc >> CALL PetscFinalize(PETSC_NULL_CHARACTER,ierr) >> >> When running the program, I get the following error message: >> >> [0]PETSC ERROR: ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >> [0]PETSC ERROR: likely location of problem given in stack below >> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >> [0]PETSC ERROR: INSTEAD the line number of the start of the function >> [0]PETSC ERROR: is given. 
>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Signal received >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.8.2, Nov, 09, 2017 >> [0]PETSC ERROR: ./test on a arch-complex-debug named sam-ThinkPad-T450s by sam Thu Nov 23 23:28:15 2017 >> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sam/Progs/petsc-3.8.2 PETSC_ARCH=arch-complex-debug --with-scalar-type=complex >> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >> with errorcode 59. >> Would someone be so kind as to tell me what I'm doing wrong? Thank you! >> >> Best regards, >> Sam From jed at jedbrown.org Fri Nov 24 07:57:08 2017 From: jed at jedbrown.org (Jed Brown) Date: Fri, 24 Nov 2017 06:57:08 -0700 Subject: [petsc-users] MatZeroRowsColumns In-Reply-To: <5A1805A9.3070101@gmail.com> References: <5A1805A9.3070101@gmail.com> Message-ID: <87k1yfsrt7.fsf@jedbrown.org> Samuel Lanthaler writes: > Ah, great. I didn't understand that "optional" arguments are not > optional in the fortran sense of the word. After setting up two vectors > to pass to the routine, the program works. Thank you very much! > > One additional question: In the documentation, it is written that I can > pass PETSC_NULL for these two vectors, if I don't actually need them. > That probably only works in C, but not in fortran. Is there a > corresponding argument I can pass to the routine in fortran? It seems, > that PETSC_NULL_REAL doesn't work; is there some other PETSC_NULL_XXX > that should be passed in this case from fortran? PETSC_NULL_OBJECT > If not, then I'll just stick to creating vectors and distroying them > afterwards, which is fine for me as well. > > Cheers, > Sam > > > On 11/24/2017 01:56 AM, Smith, Barry F. wrote: >> MatZeroRowsColumns() as two vector arguments you are missing >> >> >>> On Nov 23, 2017, at 2:43 PM, Samuel Lanthaler wrote: >>> >>> Hi there, >>> I'm new to PETSc and have been trying to do some basic manipulation of matrices in fortran, but don't seem to be able to set a row/column to zero using the MatZeroRowsColumns command: The following is a small example program: >>> >>> ! initialize PETSc >>> CALL PetscInitialize(PETSC_NULL_CHARACTER,ierr) >>> >>> ! Set up a new matrix >>> m = 3 >>> CALL MatCreate(PETSC_COMM_WORLD,matA,ierr); CHKERRQ(ierr); >>> CALL MatSetType(matA,MATMPIAIJ,ierr); CHKERRQ(ierr); >>> CALL MatSetSizes(matA,PETSC_DECIDE,PETSC_DECIDE,m,m,ierr); CHKERRQ(ierr); >>> CALL MatMPIAIJSetPreallocation(matA,3,PETSC_NULL_INTEGER,3,PETSC_NULL_INTEGER,ierr); CHKERRQ(ierr); >>> >>> ! set values of matrix >>> vals(1,:) = (/1.,2.,3./) >>> vals(2,:) = (/4.,5.,6./) >>> vals(3,:) = (/7.,8.,9./) >>> ! >>> idxm = (/0,1,2/) >>> idxn = (/0,1,2/) >>> ! >>> CALL MatSetValues(matA,3,idxm,3,idxn,vals,INSERT_VALUES,ierr); CHKERRQ(ierr); >>> >>> ! assemble matrix >>> CALL MatAssemblyBegin(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); >>> CALL MatAssemblyEnd(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); >>> >>> ! set one row/column to zero, put 6.0d0 on diagonal >>> idone(1) = 2 >>> val = 6.0d0 >>> CALL MatZeroRowsColumns(matA,1,idone,val,ierr); CHKERRQ(ierr); >>> >>> ! 
finalize PETSc >>> CALL PetscFinalize(PETSC_NULL_CHARACTER,ierr) >>> >>> When running the program, I get the following error message: >>> >>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>> [0]PETSC ERROR: likely location of problem given in stack below >>> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>> [0]PETSC ERROR: INSTEAD the line number of the start of the function >>> [0]PETSC ERROR: is given. >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Signal received >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [0]PETSC ERROR: Petsc Release Version 3.8.2, Nov, 09, 2017 >>> [0]PETSC ERROR: ./test on a arch-complex-debug named sam-ThinkPad-T450s by sam Thu Nov 23 23:28:15 2017 >>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sam/Progs/petsc-3.8.2 PETSC_ARCH=arch-complex-debug --with-scalar-type=complex >>> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >>> -------------------------------------------------------------------------- >>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >>> with errorcode 59. >>> Would someone be so kind as to tell me what I'm doing wrong? Thank you! >>> >>> Best regards, >>> Sam From knepley at gmail.com Fri Nov 24 08:18:15 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 24 Nov 2017 09:18:15 -0500 Subject: [petsc-users] MatZeroRowsColumns In-Reply-To: <5A1805A9.3070101@gmail.com> References: <5A1805A9.3070101@gmail.com> Message-ID: On Fri, Nov 24, 2017 at 6:42 AM, Samuel Lanthaler wrote: > Ah, great. I didn't understand that "optional" arguments are not optional > in the fortran sense of the word. After setting up two vectors to pass to > the routine, the program works. Thank you very much! > > One additional question: In the documentation, it is written that I can > pass PETSC_NULL for these two vectors, if I don't actually need them. That > probably only works in C, but not in fortran. Is there a corresponding > argument I can pass to the routine in fortran? It seems, that > PETSC_NULL_REAL doesn't work; is there some other PETSC_NULL_XXX that > should be passed in this case from fortran? If not, then I'll just stick to > creating vectors and distroying them afterwards, which is fine for me as > well. > You can use PETSC_NULL_VEC Thanks, Matt > Cheers, > Sam > > > On 11/24/2017 01:56 AM, Smith, Barry F. wrote: > >> MatZeroRowsColumns() as two vector arguments you are missing >> >> >> On Nov 23, 2017, at 2:43 PM, Samuel Lanthaler >>> wrote: >>> >>> Hi there, >>> I'm new to PETSc and have been trying to do some basic manipulation of >>> matrices in fortran, but don't seem to be able to set a row/column to zero >>> using the MatZeroRowsColumns command: The following is a small example >>> program: >>> >>> ! initialize PETSc >>> CALL PetscInitialize(PETSC_NULL_CHARACTER,ierr) >>> >>> ! 
Set up a new matrix >>> m = 3 >>> CALL MatCreate(PETSC_COMM_WORLD,matA,ierr); CHKERRQ(ierr); >>> CALL MatSetType(matA,MATMPIAIJ,ierr); CHKERRQ(ierr); >>> CALL MatSetSizes(matA,PETSC_DECIDE,PETSC_DECIDE,m,m,ierr); >>> CHKERRQ(ierr); >>> CALL MatMPIAIJSetPreallocation(matA,3,PETSC_NULL_INTEGER,3,PETSC_NULL_INTEGER,ierr); >>> CHKERRQ(ierr); >>> >>> ! set values of matrix >>> vals(1,:) = (/1.,2.,3./) >>> vals(2,:) = (/4.,5.,6./) >>> vals(3,:) = (/7.,8.,9./) >>> ! >>> idxm = (/0,1,2/) >>> idxn = (/0,1,2/) >>> ! >>> CALL MatSetValues(matA,3,idxm,3,idxn,vals,INSERT_VALUES,ierr); >>> CHKERRQ(ierr); >>> ! assemble matrix >>> CALL MatAssemblyBegin(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); >>> CALL MatAssemblyEnd(matA,MAT_FINAL_ASSEMBLY,ierr); CHKERRQ(ierr); >>> ! set one row/column to zero, put 6.0d0 on diagonal >>> idone(1) = 2 >>> val = 6.0d0 >>> CALL MatZeroRowsColumns(matA,1,idone,val,ierr); CHKERRQ(ierr); >>> >>> ! finalize PETSc >>> CALL PetscFinalize(PETSC_NULL_CHARACTER,ierr) >>> >>> When running the program, I get the following error message: >>> >>> [0]PETSC ERROR: ------------------------------ >>> ------------------------------------------ >>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >>> probably memory access out of range >>> [0]PETSC ERROR: Try option -start_in_debugger or >>> -on_error_attach_debugger >>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/d >>> ocumentation/faq.html#valgrind >>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >>> OS X to find memory corruption errors >>> [0]PETSC ERROR: likely location of problem given in stack below >>> [0]PETSC ERROR: --------------------- Stack Frames >>> ------------------------------------ >>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not >>> available, >>> [0]PETSC ERROR: INSTEAD the line number of the start of the >>> function >>> [0]PETSC ERROR: is given. >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: Signal received >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. >>> [0]PETSC ERROR: Petsc Release Version 3.8.2, Nov, 09, 2017 >>> [0]PETSC ERROR: ./test on a arch-complex-debug named sam-ThinkPad-T450s >>> by sam Thu Nov 23 23:28:15 2017 >>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sam/Progs/petsc-3.8.2 >>> PETSC_ARCH=arch-complex-debug --with-scalar-type=complex >>> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >>> ------------------------------------------------------------ >>> -------------- >>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >>> with errorcode 59. >>> Would someone be so kind as to tell me what I'm doing wrong? Thank you! >>> >>> Best regards, >>> Sam >>> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.lanthaler at gmail.com Fri Nov 24 08:22:02 2017 From: s.lanthaler at gmail.com (Samuel Lanthaler) Date: Fri, 24 Nov 2017 15:22:02 +0100 Subject: [petsc-users] MatZeroRowsColumns In-Reply-To: References: <5A1805A9.3070101@gmail.com> Message-ID: <5A182B0A.8040602@gmail.com> Perfect, PETSC_NULL_VEC works in fortran. Thank you very much for your help! 
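For the archive, the working call from the snippet above then reads roughly as follows (a sketch reusing matA, idone and val from the original post; PETSC_NULL_VEC stands in for the optional solution and right-hand-side vectors in PETSc 3.8):

! zero row/column 2 and put 6.0d0 on the diagonal, with no x or b vectors to fix up
CALL MatZeroRowsColumns(matA, 1, idone, val, PETSC_NULL_VEC, PETSC_NULL_VEC, ierr); CHKERRQ(ierr);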
:-) Cheers, Sam On 11/24/2017 03:18 PM, Matthew Knepley wrote: > On Fri, Nov 24, 2017 at 6:42 AM, Samuel Lanthaler > > wrote: > > Ah, great. I didn't understand that "optional" arguments are not > optional in the fortran sense of the word. After setting up two > vectors to pass to the routine, the program works. Thank you very > much! > > One additional question: In the documentation, it is written that > I can pass PETSC_NULL for these two vectors, if I don't actually > need them. That probably only works in C, but not in fortran. Is > there a corresponding argument I can pass to the routine in > fortran? It seems, that PETSC_NULL_REAL doesn't work; is there > some other PETSC_NULL_XXX that should be passed in this case from > fortran? If not, then I'll just stick to creating vectors and > distroying them afterwards, which is fine for me as well. > > > You can use PETSC_NULL_VEC > > Thanks, > > Matt > > Cheers, > Sam > > > On 11/24/2017 01:56 AM, Smith, Barry F. wrote: > > MatZeroRowsColumns() as two vector arguments you are missing > > > On Nov 23, 2017, at 2:43 PM, Samuel Lanthaler > > wrote: > > Hi there, > I'm new to PETSc and have been trying to do some basic > manipulation of matrices in fortran, but don't seem to be > able to set a row/column to zero using the > MatZeroRowsColumns command: The following is a small > example program: > > ! initialize PETSc > CALL PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > ! Set up a new matrix > m = 3 > CALL MatCreate(PETSC_COMM_WORLD,matA,ierr); CHKERRQ(ierr); > CALL MatSetType(matA,MATMPIAIJ,ierr); CHKERRQ(ierr); > CALL > MatSetSizes(matA,PETSC_DECIDE,PETSC_DECIDE,m,m,ierr); > CHKERRQ(ierr); > CALL > MatMPIAIJSetPreallocation(matA,3,PETSC_NULL_INTEGER,3,PETSC_NULL_INTEGER,ierr); > CHKERRQ(ierr); > > ! set values of matrix > vals(1,:) = (/1.,2.,3./) > vals(2,:) = (/4.,5.,6./) > vals(3,:) = (/7.,8.,9./) > ! > idxm = (/0,1,2/) > idxn = (/0,1,2/) > ! > CALL > MatSetValues(matA,3,idxm,3,idxn,vals,INSERT_VALUES,ierr); > CHKERRQ(ierr); > ! assemble matrix > CALL MatAssemblyBegin(matA,MAT_FINAL_ASSEMBLY,ierr); > CHKERRQ(ierr); > CALL MatAssemblyEnd(matA,MAT_FINAL_ASSEMBLY,ierr); > CHKERRQ(ierr); > ! set one row/column to zero, put 6.0d0 on diagonal > idone(1) = 2 > val = 6.0d0 > CALL MatZeroRowsColumns(matA,1,idone,val,ierr); > CHKERRQ(ierr); > > ! finalize PETSc > CALL PetscFinalize(PETSC_NULL_CHARACTER,ierr) > > When running the program, I get the following error message: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation > Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux > and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack > below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack > are not available, > [0]PETSC ERROR: INSTEAD the line number of the start > of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html > for > trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.8.2, Nov, 09, 2017 > [0]PETSC ERROR: ./test on a arch-complex-debug named > sam-ThinkPad-T450s by sam Thu Nov 23 23:28:15 2017 > [0]PETSC ERROR: Configure options > PETSC_DIR=/home/sam/Progs/petsc-3.8.2 > PETSC_ARCH=arch-complex-debug --with-scalar-type=complex > [0]PETSC ERROR: #1 User provided function() line 0 in > unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > Would someone be so kind as to tell me what I'm doing > wrong? Thank you! > > Best regards, > Sam > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Nov 24 08:51:18 2017 From: jed at jedbrown.org (Jed Brown) Date: Fri, 24 Nov 2017 07:51:18 -0700 Subject: [petsc-users] MatZeroRowsColumns In-Reply-To: References: <5A1805A9.3070101@gmail.com> Message-ID: <87h8tjspax.fsf@jedbrown.org> Matthew Knepley writes: > On Fri, Nov 24, 2017 at 6:42 AM, Samuel Lanthaler > wrote: > >> Ah, great. I didn't understand that "optional" arguments are not optional >> in the fortran sense of the word. After setting up two vectors to pass to >> the routine, the program works. Thank you very much! >> >> One additional question: In the documentation, it is written that I can >> pass PETSC_NULL for these two vectors, if I don't actually need them. That >> probably only works in C, but not in fortran. Is there a corresponding >> argument I can pass to the routine in fortran? It seems, that >> PETSC_NULL_REAL doesn't work; is there some other PETSC_NULL_XXX that >> should be passed in this case from fortran? If not, then I'll just stick to >> creating vectors and distroying them afterwards, which is fine for me as >> well. >> > > You can use PETSC_NULL_VEC Matt is right; PETSC_NULL_OBJECT is gone in petsc-3.8. From yann.jobic at univ-amu.fr Fri Nov 24 09:13:34 2017 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Fri, 24 Nov 2017 16:13:34 +0100 Subject: [petsc-users] 2 Dirichlet conditions for one Element in PetscFE In-Reply-To: References: Message-ID: <5be041b7-ce78-dd5a-e3e4-0468dfa87c7d@univ-amu.fr> Hello, I tried the "next" branch? from the git repository. The function PetscDSAddBoundary correctly set the boundary values, but there is still the bug of the boundary applied to the whole element. I'll dig a little in DMPlexVecSetFieldClosure_Internal() of knepley/fix-plex-bc-multiple where the possible bug should be. I now use a simple Poisson FE test case in order to check the boundaries. I hope these details helps a little... Regards, Yann Le 24/11/2017 ? 11:34, Yann Jobic a ?crit?: > Le 23/11/2017 ? 13:45, Matthew Knepley a ?crit?: >> On Thu, Nov 23, 2017 at 3:39 AM, Yann Jobic > > wrote: >> >> Hello, >> >> I checked out? the branch knepley/fix-plex-bc-multiple, but i now >> have a strange problem. >> I splited my labels as in ex69.c of snes. It may be the right way >> to do it. >> In petsc 3.8.2, i have the same behavior as before, the element >> containing the face is called by PetscDSAddBoundary. >> In the git version, PetscDSAddBoundary does not call my boundary >> function at all. >> The call : >> ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "wallL", >> "markerLeft",?? 
0, Ncomp, components, (void (*)(void)) uIn,? 1, >> &id, ctx);CHKERRQ(ierr); >> >> I checked that my labels are correct : >> ? markerLeft: 1 strata with value/size (1 (11)) >> ? markerTop: 1 strata with value/size (1 (11)) >> ? markerRight: 1 strata with value/size (1 (11)) >> ? markerBottom: 1 strata with value/size (1 (11)) >> >> What am i doing wrong ? >> >> >> So you call AddBoundary() and then the boundary values are never >> inserted? The call looks correct. Can you send me an example to check? >> Obviously this works for my simple examples in the repo. I can't see >> by looking what might be wrong for you. > Hello, > I may have the wrong git directory. > I've got the message when i'm compiling : > The version of PETSc you are using is out-of-date, we recommend > updating to the new release > ?Available Version: 3.8.2?? Installed Version: 3.8 > > As i'm not familiar with git, I've done : > git clone https://bitbucket.org/petsc/petsc > git checkout knepley/fix-plex-bc-multiple > > With this petsc version, ex46 and ex47 are not working (from TS), and > after calling PetscDSAddBoundary, it seems that the boundary values > are never inserted, as in my own code. > > Do i have the right petsc development version ? > > Thanks, > > Regards, > > Yann > > PS : In my old code, i've tried to mix ex56 and ex69 for marking > boundary values. > I first set labels for "Faces", and using this label, i create new > separated ones (North, South, ...). > It worked in sequential, but in parallel, the program hang at > DMPlexCreateClosureIndex. > So i just kept "Faces", as in ex56, and it worked in parallel. > >> >> ? Thanks, >> >> ? ? Matt >> >> Thanks, >> >> Regards, >> >> Yann >> >> Le 22/11/2017 ? 18:51, Matthew Knepley a ?crit?: >>> On Wed, Nov 22, 2017 at 12:39 PM, Yann Jobic >>> > wrote: >>> >>> Hello, >>> >>> I've found a strange behavior when looking into a bug for >>> the pressure convergence of a simple Navier-Stokes problem >>> using PetscFE. >>> >>> I followed many examples for labeling boundary faces. I >>> first use DMPlexMarkBoundaryFaces, (label=1 to the faces). >>> I find those faces using DMGetStratumIS and searching 1 as >>> it is the value of the marked boundary faces. >>> Finally i use DMPlexLabelComplete over the new label. >>> I then use : >>> ? ierr = PetscDSAddBoundary(prob, DM_BC_ESSENTIAL, "in", >>> "Faces", 0, Ncomp, components, (void (*)(void)) uIn, NWest, >>> west, NULL);CHKERRQ(ierr); >>> in order to impose a dirichlet condition for the faces >>> labeled by the correct value (west=1, south=3,...). >>> >>> However, the function "uIn()" is called in all the Elements >>> containing the boundary faces, and thus impose the values at >>> nodes that are not in the labeled faces. >>> Is it a normal behavior ? I then have to test the position >>> of the node calling uIn, in order to impose the good value. >>> I have this problem for a Poiseuille flow, where at 2 corner >>> Elements i have a zero velocity dirichlet condition (wall) >>> and a In flow velocity one. >>> >>> >>> I believe I have fixed this in knepley/fix-plex-bc-multiple >>> which should be merged soon. Do you know how to merge that >>> branch and try? >>> >>> ? Thanks, >>> >>> ? ? ?Matt >>> >>> The pressure is then very high at the corner nodes of those >>> 2 Elements. >>> Do you think my pressure problem comes from there ? 
(The >>> velocity field is correct) >>> >>> Many thanks, >>> >>> Regards, >>> >>> Yann >>> >>> PS : i'm using those runtime options : >>> -vel_petscspace_order 2 -pres_petscspace_order 1 \ >>> -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type >>> schur -pc_fieldsplit_schur_fact_type full? \ >>> -fieldsplit_velocity_pc_type lu >>> -fieldsplit_pressure_ksp_rtol 1.0e-10 >>> -fieldsplit_pressure_pc_type jacobi >>> >>> >>> --- >>> L'absence de virus dans ce courrier ?lectronique a ?t? >>> v?rifi?e par le logiciel antivirus Avast. >>> https://www.avast.com/antivirus >>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to >>> which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.stone at opengosim.com Fri Nov 24 10:03:56 2017 From: daniel.stone at opengosim.com (Daniel Stone) Date: Fri, 24 Nov 2017 16:03:56 +0000 Subject: [petsc-users] crash with PCASM in parallel Message-ID: Hello, I'm getting a memory exception crash every time I try to run the ASM preconditioner in parallel, can anyone help? I'm using a debugger so I can give most of the stack: PCApply_ASM (asm.c:line 485) KSPSolve (itfunc.c:line 599) KSPSetUp (itfunc.c:line 379) PCSetUp (precon.c: 924) PCSetUp_ILU (ilu.c:line 162) MatDestroy (matrix.c:line 1168) MatDestroy_MPIBAIJ_MatGetSubMatrices (baijov.c:line 609) The problem line is then in MatDestroy_MPIBAIJ_MatGetSubMatrices, in the file baijov.c, line 609: if (!submatj->id) { At this point submatj has no value, address 0x0, and so the attempt to access submatj->id causes the memory error. We can see in the lines just above 609 where submatj is supposed to come from, it should basically be an attribute of C->data, where C is the input matrix. Does anyone have any ideas where to start with getting this to work? I can provide a lot more information from the debugger if need. Many thanks in advance, Daniel Stone -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 24 10:08:42 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 24 Nov 2017 16:08:42 +0000 Subject: [petsc-users] crash with PCASM in parallel In-Reply-To: References: Message-ID: First run under valgrind. https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind If that doesn't help send the exact output from the debugger (cut and paste) and the exact version of PETSc you are using. Also out put from -ksp_view_pre Barry > On Nov 24, 2017, at 8:03 AM, Daniel Stone wrote: > > Hello, > > I'm getting a memory exception crash every time I try to run the ASM preconditioner in parallel, can anyone help? 
> > I'm using a debugger so I can give most of the stack: > > PCApply_ASM (asm.c:line 485) > KSPSolve (itfunc.c:line 599) > KSPSetUp (itfunc.c:line 379) > PCSetUp (precon.c: 924) > PCSetUp_ILU (ilu.c:line 162) > MatDestroy (matrix.c:line 1168) > MatDestroy_MPIBAIJ_MatGetSubMatrices (baijov.c:line 609) > > > The problem line is then in MatDestroy_MPIBAIJ_MatGetSubMatrices, > in the file baijov.c, line 609: > > if (!submatj->id) { > > At this point submatj has no value, address 0x0, and so the attempt to access submatj->id > causes the memory error. We can see in the lines just above 609 where submatj is supposed to > come from, it should basically be an attribute of C->data, where C is the input matrix. > > Does anyone have any ideas where to start with getting this to work? I can provide a lot more information > from the debugger if need. > > Many thanks in advance, > > Daniel Stone > From edoardo.alinovi at gmail.com Fri Nov 24 14:47:02 2017 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Fri, 24 Nov 2017 21:47:02 +0100 Subject: [petsc-users] Re-use the matrix in petsc Message-ID: Dear petsc users, I am new to petsc, but I am really enjoying it. I am developing a CFD code in fortran and I have a (newby) question for you. I have a subroutine that assembles the matrices arising from the finite volume method. They are sparse and the coefficients change every time step, but their sparsity remains unchanged (i.e. non-zeros entries are always in the same position). In the subroutine, which is called every time step, I am basically following the manual and I do something like: - call MatCreate(PETSC_COMM_WORLD,A,ierr) - call MatSetSizes(A,lm,lm,M,M,ierr) - call MatSetType(A, myType,ierr) - call MatMPIAIJSetPreallocation(A,d_nz,PETSC_NULL_INTEGER, o_nz, PETSC_NULL_INTEGER,ierr) call MatSetUp(A,ierr) - call MatSetValues - call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr) - call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr) - Solve the system - call MatDestroy Here is my question: As you can note, every time step, I allocate the matrix, fill it, use it and the destroy it. Is there a way to avoid the matrix preallocation every time? I just would like to change matrix the entries and avoid the preallocation. Thank you for the help that you can give me, Edoardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Nov 24 14:56:05 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 24 Nov 2017 15:56:05 -0500 Subject: [petsc-users] Re-use the matrix in petsc In-Reply-To: References: Message-ID: On Fri, Nov 24, 2017 at 3:47 PM, Edoardo alinovi wrote: > Dear petsc users, > > I am new to petsc, but I am really enjoying it. > > I am developing a CFD code in fortran and I have a (newby) question for > you. > > I have a subroutine that assembles the matrices arising from the finite > volume method. They are sparse and the coefficients change every time step, > but their sparsity remains unchanged (i.e. non-zeros entries are always in > the same position). 
> > In the subroutine, which is called every time step, I am basically > following the manual and I do something like: > > - call MatCreate(PETSC_COMM_WORLD,A,ierr) > - call MatSetSizes(A,lm,lm,M,M,ierr) > - call MatSetType(A, myType,ierr) > - call MatMPIAIJSetPreallocation(A,d_nz,PETSC_NULL_INTEGER, o_nz, > PETSC_NULL_INTEGER,ierr) > call MatSetUp(A,ierr) > > - call MatSetValues > - call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr) > - call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr) > > - Solve the system > > - call MatDestroy > > > Here is my question: > > As you can note, every time step, I allocate the matrix, fill it, use it > and the destroy it. Is there a way to avoid the matrix preallocation every > time? I just would like to change matrix the entries and avoid the > preallocation. > Make the first 5 calls outside your timestep loop. At each step, (after the first timestep call MatZeroEntries()), MatSetValues() for all your values, and then matAssemblyBegin/End(). After all your timesteps, call MatDestroy(). If you use a factorization preconditioner, it will detect that you only change values, not structure. Thanks, Matt > Thank you for the help that you can give me, > > Edoardo > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at unito.it Fri Nov 24 15:21:55 2017 From: matteo.semplice at unito.it (Matteo Semplice) Date: Fri, 24 Nov 2017 22:21:55 +0100 Subject: [petsc-users] preallocation after DMCreateMatrix? Message-ID: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> Hi. The manual for DMCreateMatrix says "Notes: This properly preallocates the number of nonzeros in the sparse matrix so you do not need to do it yourself", so I got the impression that one does not need to call the preallocation routine for the matrix and indeed in most examples listed in the manual page for DMCreateMatrix this is not done or (KSP tutorial ex4) it is called declaring 0 entries per row. However, if read in a mesh in a DMPlex ore create a DMDA and then call DMCreateMatrix, the resulting matrix errors out when I call MatSetValues. I have currently followed the suggestion of the error message and call MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE), but I'd like to fix this properly. How should I interpret that note in the manual? Thanks, ??? Matteo From bsmith at mcs.anl.gov Fri Nov 24 17:21:42 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 24 Nov 2017 23:21:42 +0000 Subject: [petsc-users] preallocation after DMCreateMatrix? In-Reply-To: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> References: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> Message-ID: <904C6941-777C-408C-98ED-A7F3524DE27E@mcs.anl.gov> > On Nov 24, 2017, at 1:21 PM, Matteo Semplice wrote: > > Hi. > > The manual for DMCreateMatrix says "Notes: This properly preallocates the number of nonzeros in the sparse matrix so you do not need to do it yourself", so I got the impression that one does not need to call the preallocation routine for the matrix and indeed in most examples listed in the manual page for DMCreateMatrix this is not done or (KSP tutorial ex4) it is called declaring 0 entries per row. 
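For reference, the preallocation done by DMCreateMatrix() follows the stencil width and boundary type declared on the DM, so the usual DMDA pattern looks roughly like this minimal sketch (not code from this thread; 2D grid, one unknown per node, star stencil of width 1, illustrative sizes and Laplacian-like values, error checking omitted):

#include <petscdm.h>
#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM          da;
  Mat         A;
  MatStencil  row, col[5];
  PetscScalar v[5];
  PetscInt    i, j, n, xs, ys, xm, ym;
  PetscInt    M = 64, N = 64;            /* illustrative grid size */

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* 2D DMDA, 1 unknown per node, star stencil of width 1 */
  DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
               DMDA_STENCIL_STAR, M, N, PETSC_DECIDE, PETSC_DECIDE,
               1, 1, NULL, NULL, &da);
  DMSetFromOptions(da);
  DMSetUp(da);                           /* needed in PETSc 3.8 and later */

  /* matrix comes preallocated according to the stencil declared above */
  DMCreateMatrix(da, &A);

  /* insert only couplings that lie within that stencil */
  DMDAGetCorners(da, &xs, &ys, NULL, &xm, &ym, NULL);
  for (j = ys; j < ys + ym; j++) {
    for (i = xs; i < xs + xm; i++) {
      row.i = i; row.j = j;
      n = 0;
      col[n].i = i;   col[n].j = j;   v[n++] =  4.0;
      if (i > 0)     { col[n].i = i - 1; col[n].j = j;     v[n++] = -1.0; }
      if (i < M - 1) { col[n].i = i + 1; col[n].j = j;     v[n++] = -1.0; }
      if (j > 0)     { col[n].i = i;     col[n].j = j - 1; v[n++] = -1.0; }
      if (j < N - 1) { col[n].i = i;     col[n].j = j + 1; v[n++] = -1.0; }
      MatSetValuesStencil(A, 1, &row, n, col, v, INSERT_VALUES);
    }
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatDestroy(&A);
  DMDestroy(&da);
  PetscFinalize();
  return 0;
}

Inserting a coupling that lies outside the declared stencil is what triggers the new-nonzero-allocation error.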
> > However, if read in a mesh in a DMPlex ore create a DMDA and then call DMCreateMatrix, the resulting matrix errors out when I call MatSetValues. I have currently followed the suggestion of the error message and call MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE), but I'd like to fix this properly. With DMDA for sure it preallocates properly respecting the stencil width and periodicity you define. If it requires additional entries than you are putting in values outside the stencil width. For example if normally you have a stencil width of 1 and set it for the DMDA but you put in matrix entries for a stencil width of 2 at the boundary it will error out. Barry > > How should I interpret that note in the manual? > > Thanks, > > Matteo > From knepley at gmail.com Fri Nov 24 19:05:39 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 24 Nov 2017 20:05:39 -0500 Subject: [petsc-users] preallocation after DMCreateMatrix? In-Reply-To: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> References: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> Message-ID: On Fri, Nov 24, 2017 at 4:21 PM, Matteo Semplice wrote: > Hi. > > The manual for DMCreateMatrix says "Notes: This properly preallocates the > number of nonzeros in the sparse matrix so you do not need to do it > yourself", so I got the impression that one does not need to call the > preallocation routine for the matrix and indeed in most examples listed in > the manual page for DMCreateMatrix this is not done or (KSP tutorial ex4) > it is called declaring 0 entries per row. > > However, if read in a mesh in a DMPlex ore create a DMDA and then call > DMCreateMatrix, the resulting matrix errors out when I call MatSetValues. I > have currently followed the suggestion of the error message and call > MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE), but I'd > like to fix this properly. > It sounds like your nonzero pattern does not obey the topology. What nonzero pattern are you trying to input? Thanks, Matt > How should I interpret that note in the manual? > > Thanks, > > Matteo > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.stone at opengosim.com Sat Nov 25 11:09:59 2017 From: daniel.stone at opengosim.com (Daniel Stone) Date: Sat, 25 Nov 2017 17:09:59 +0000 Subject: [petsc-users] crash with PCASM in parallel In-Reply-To: References: Message-ID: Thanks for the quick response. I tried Valgrind. 
Apart from a couple of other warnings in other parts of my code, now fixed, it shows the same stack I described: ==22498== Invalid read of size 4 ==22498== at 0x55A5BFF: MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:609) ==22498== by 0x538A206: MatDestroy (matrix.c:1168) ==22498== by 0x5F21F2F: PCSetUp_ILU (ilu.c:162) ==22498== by 0x604898A: PCSetUp (precon.c:924) ==22498== by 0x6189005: KSPSetUp (itfunc.c:379) ==22498== by 0x618AB57: KSPSolve (itfunc.c:599) ==22498== by 0x5FD4816: PCApply_ASM (asm.c:485) ==22498== by 0x604204C: PCApply (precon.c:458) ==22498== by 0x6055C76: pcapply_ (preconf.c:223) ==22498== by 0x42F500: __cpr_linsolver_MOD_cprapply (cpr_linsolver.F90:419) ==22498== by 0x5F42431: ourshellapply (zshellpcf.c:41) ==22498== by 0x5F3697A: PCApply_Shell (shellpc.c:115) ==22498== by 0x604204C: PCApply (precon.c:458) ==22498== by 0x61B74E7: KSP_PCApply (kspimpl.h:251) ==22498== by 0x61B83C3: KSPInitialResidual (itres.c:67) ==22498== by 0x6104EF9: KSPSolve_BCGS (bcgs.c:44) ==22498== by 0x618B77E: KSPSolve (itfunc.c:656) ==22498== by 0x62BB02D: SNESSolve_NEWTONLS (ls.c:224) ==22498== by 0x6245706: SNESSolve (snes.c:3967) ==22498== by 0x6265A58: snessolve_ (zsnesf.c:167) ==22498== Address 0x0 is not stack'd, malloc'd or (recently) free'd ==22498== PETSc version: this is from include/petscversion.h: #define PETSC_VERSION_RELEASE 0 #define PETSC_VERSION_MAJOR 3 #define PETSC_VERSION_MINOR 7 #define PETSC_VERSION_SUBMINOR 5 #define PETSC_VERSION_PATCH 0 #define PETSC_RELEASE_DATE "Apr, 25, 2016" #define PETSC_VERSION_DATE "unknown" This is the recommended version of PETSc for using with PFLOTRAN: http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install Exact debugger output: It's a graphical debugger so there isn't much to copy/paste. The exact message is: Memory error detected in MatDestroy?_MPIBAIJ?_MatGetSubmatrices ?(baijov.c:609)?: null pointer dereference or unaligned memory access. I can provide screenshots if that would help. -ksp_view_pre: I tried this, it doesn't seem to give information about the KSPs in question. To be clear, this is part of an attempt to implement the two stage CPR-AMG preconditioner in PFLOTRAN, so the KSP and PC objects involved are: KSP: linear solver inside a SNES, inside PFLOTRAN (BCGS), which has a PC: PC: shell, the CPR implementation, which calls two more preconditioners, T1 and T2, in sequence: T1: another shell, which calls a KSP (GMRES), which has a PC which is HYPRE BOOMERAMG T2: ASM, this is the problematic one. -ksp_view_pre doesn't seem to give us any information about the ASM preconditioner object or it's ILU sub-KSPs; presumably it crashes before getting there. We do get a lot of output about T1, for example: KSP Object: T1 24 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using DEFAULT norm type for convergence test PC Object: 24 MPI processes type: hypre PC has not been set up so information may be incomplete HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0. HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0. 
HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1. HYPRE BoomerAMG: Outer relax weight (all) 1. HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Not using more complex smoothers. HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 24 MPI processes type: mpiaij rows=1122000, cols=1122000 total: nonzeros=7780000, allocated nonzeros=7780000 total number of mallocs used during MatSetValues calls =0 Thanks, Daniel Stone On Fri, Nov 24, 2017 at 4:08 PM, Smith, Barry F. wrote: > > First run under valgrind. https://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > > If that doesn't help send the exact output from the debugger (cut and > paste) and the exact version of PETSc you are using. > Also out put from -ksp_view_pre > > Barry > > > On Nov 24, 2017, at 8:03 AM, Daniel Stone > wrote: > > > > Hello, > > > > I'm getting a memory exception crash every time I try to run the ASM > preconditioner in parallel, can anyone help? > > > > I'm using a debugger so I can give most of the stack: > > > > PCApply_ASM (asm.c:line 485) > > KSPSolve (itfunc.c:line 599) > > KSPSetUp (itfunc.c:line 379) > > PCSetUp (precon.c: 924) > > PCSetUp_ILU (ilu.c:line 162) > > MatDestroy (matrix.c:line 1168) > > MatDestroy_MPIBAIJ_MatGetSubMatrices (baijov.c:line 609) > > > > > > The problem line is then in MatDestroy_MPIBAIJ_MatGetSubMatrices, > > in the file baijov.c, line 609: > > > > if (!submatj->id) { > > > > At this point submatj has no value, address 0x0, and so the attempt to > access submatj->id > > causes the memory error. We can see in the lines just above 609 where > submatj is supposed to > > come from, it should basically be an attribute of C->data, where C is > the input matrix. > > > > Does anyone have any ideas where to start with getting this to work? I > can provide a lot more information > > from the debugger if need. > > > > Many thanks in advance, > > > > Daniel Stone > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Nov 25 16:03:09 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sat, 25 Nov 2017 22:03:09 +0000 Subject: [petsc-users] crash with PCASM in parallel In-Reply-To: References: Message-ID: I cannot find that routine ~/Src/petsc ((v3.6.4)) arch-basic $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices ~/Src/petsc ((v3.6.4)) arch-basic $ git checkout v3.7.5 Previous HEAD position was 401b1b531b... Increase patchlevel to 3.6.4 HEAD is now at b827f1350a... Increase patchlevel to 3.7.5 ~/Src/petsc ((v3.7.5)) arch-basic $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices ~/Src/petsc ((v3.7.5)) arch-basic $ git checkout v3.7.0 error: pathspec 'v3.7.0' did not match any file(s) known to git. ~/Src/petsc ((v3.7.5)) arch-basic $ git checkout v3.7 Previous HEAD position was b827f1350a... Increase patchlevel to 3.7.5 HEAD is now at ae618e6989... 
release: set v3.7 strings ~/Src/petsc ((v3.7)) arch-basic $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices ~/Src/petsc ((v3.7)) arch-basic $ git checkout v3.8 Previous HEAD position was ae618e6989... release: set v3.7 strings HEAD is now at 0e50f9e530... release: set v3.8 strings ~/Src/petsc ((v3.8)) arch-basic $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices ~/Src/petsc ((v3.8)) arch-basic Are you using a the PETSc git repository and some particular branch or commit in it? > On Nov 25, 2017, at 11:09 AM, Daniel Stone wrote: > > Thanks for the quick response. > > I tried Valgrind. Apart from a couple of other warnings in other parts of my code, now fixed, it shows the same stack I described: > ==22498== Invalid read of size 4 > ==22498== at 0x55A5BFF: MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:609) > ==22498== by 0x538A206: MatDestroy (matrix.c:1168) > ==22498== by 0x5F21F2F: PCSetUp_ILU (ilu.c:162) > ==22498== by 0x604898A: PCSetUp (precon.c:924) > ==22498== by 0x6189005: KSPSetUp (itfunc.c:379) > ==22498== by 0x618AB57: KSPSolve (itfunc.c:599) > ==22498== by 0x5FD4816: PCApply_ASM (asm.c:485) > ==22498== by 0x604204C: PCApply (precon.c:458) > ==22498== by 0x6055C76: pcapply_ (preconf.c:223) > ==22498== by 0x42F500: __cpr_linsolver_MOD_cprapply (cpr_linsolver.F90:419) > ==22498== by 0x5F42431: ourshellapply (zshellpcf.c:41) > ==22498== by 0x5F3697A: PCApply_Shell (shellpc.c:115) > ==22498== by 0x604204C: PCApply (precon.c:458) > ==22498== by 0x61B74E7: KSP_PCApply (kspimpl.h:251) > ==22498== by 0x61B83C3: KSPInitialResidual (itres.c:67) > ==22498== by 0x6104EF9: KSPSolve_BCGS (bcgs.c:44) > ==22498== by 0x618B77E: KSPSolve (itfunc.c:656) > ==22498== by 0x62BB02D: SNESSolve_NEWTONLS (ls.c:224) > ==22498== by 0x6245706: SNESSolve (snes.c:3967) > ==22498== by 0x6265A58: snessolve_ (zsnesf.c:167) > ==22498== Address 0x0 is not stack'd, malloc'd or (recently) free'd > ==22498== > > PETSc version: this is from include/petscversion.h: > #define PETSC_VERSION_RELEASE 0 > #define PETSC_VERSION_MAJOR 3 > #define PETSC_VERSION_MINOR 7 > #define PETSC_VERSION_SUBMINOR 5 > #define PETSC_VERSION_PATCH 0 > #define PETSC_RELEASE_DATE "Apr, 25, 2016" > #define PETSC_VERSION_DATE "unknown" > > This is the recommended version of PETSc for using with PFLOTRAN: > http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install > > Exact debugger output: > It's a graphical debugger so there isn't much to copy/paste. > The exact message is: > > Memory error detected in MatDestroy?_MPIBAIJ?_MatGetSubmatrices ?(baijov.c:609)?: > > null pointer dereference or unaligned memory access. > > I can provide screenshots if that would help. > > -ksp_view_pre: > I tried this, it doesn't seem to give information about the KSPs in question. To be clear, this is > part of an attempt to implement the two stage CPR-AMG preconditioner in PFLOTRAN, so the > KSP and PC objects involved are: > > KSP: linear solver inside a SNES, inside PFLOTRAN (BCGS), > which has a PC: > PC: shell, the CPR implementation, which calls two more preconditioners, T1 and T2, in sequence: > T1: another shell, which calls a KSP (GMRES), which has a PC which is HYPRE BOOMERAMG > T2: ASM, this is the problematic one. > > -ksp_view_pre doesn't seem to give us any information about the ASM preconditioner object > or it's ILU sub-KSPs; presumably it crashes before getting there. 
We do get a lot of output about > T1, for example: > > KSP Object: T1 24 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 24 MPI processes > type: hypre > PC has not been set up so information may be incomplete > HYPRE BoomerAMG preconditioning > HYPRE BoomerAMG: Cycle type V > HYPRE BoomerAMG: Maximum number of levels 25 > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0. > HYPRE BoomerAMG: Threshold for strong coupling 0.25 > HYPRE BoomerAMG: Interpolation truncation factor 0. > HYPRE BoomerAMG: Interpolation: max elements per row 0 > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > HYPRE BoomerAMG: Maximum row sums 0.9 > HYPRE BoomerAMG: Sweeps down 1 > HYPRE BoomerAMG: Sweeps up 1 > HYPRE BoomerAMG: Sweeps on coarse 1 > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > HYPRE BoomerAMG: Relax weight (all) 1. > HYPRE BoomerAMG: Outer relax weight (all) 1. > HYPRE BoomerAMG: Using CF-relaxation > HYPRE BoomerAMG: Not using more complex smoothers. > HYPRE BoomerAMG: Measure type local > HYPRE BoomerAMG: Coarsen type Falgout > HYPRE BoomerAMG: Interpolation type classical > linear system matrix = precond matrix: > Mat Object: 24 MPI processes > type: mpiaij > rows=1122000, cols=1122000 > total: nonzeros=7780000, allocated nonzeros=7780000 > total number of mallocs used during MatSetValues calls =0 > > Thanks, > > Daniel Stone > > > On Fri, Nov 24, 2017 at 4:08 PM, Smith, Barry F. wrote: > > First run under valgrind. https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > If that doesn't help send the exact output from the debugger (cut and paste) and the exact version of PETSc you are using. > Also out put from -ksp_view_pre > > Barry > > > On Nov 24, 2017, at 8:03 AM, Daniel Stone wrote: > > > > Hello, > > > > I'm getting a memory exception crash every time I try to run the ASM preconditioner in parallel, can anyone help? > > > > I'm using a debugger so I can give most of the stack: > > > > PCApply_ASM (asm.c:line 485) > > KSPSolve (itfunc.c:line 599) > > KSPSetUp (itfunc.c:line 379) > > PCSetUp (precon.c: 924) > > PCSetUp_ILU (ilu.c:line 162) > > MatDestroy (matrix.c:line 1168) > > MatDestroy_MPIBAIJ_MatGetSubMatrices (baijov.c:line 609) > > > > > > The problem line is then in MatDestroy_MPIBAIJ_MatGetSubMatrices, > > in the file baijov.c, line 609: > > > > if (!submatj->id) { > > > > At this point submatj has no value, address 0x0, and so the attempt to access submatj->id > > causes the memory error. We can see in the lines just above 609 where submatj is supposed to > > come from, it should basically be an attribute of C->data, where C is the input matrix. > > > > Does anyone have any ideas where to start with getting this to work? I can provide a lot more information > > from the debugger if need. 
> > > > Many thanks in advance, > > > > Daniel Stone > > > > From daniel.stone at opengosim.com Sat Nov 25 16:44:43 2017 From: daniel.stone at opengosim.com (Daniel Stone) Date: Sat, 25 Nov 2017 22:44:43 +0000 Subject: [petsc-users] crash with PCASM in parallel In-Reply-To: References: Message-ID: The PFLOTRAN wiki seems to want us to install a very old version of PETSc: http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install The relevant part is: " Install PETSc 3.1. Clone petsc and check out the supported version: git clone https://bitbucket.org/petsc/petsc petsc cd petsc git checkout xsdk-0.2.0 NOTE:PFLOTRAN currently uses a snapshot of PETSc ?maint? (release) branch. The only supported snapshot/version is specified by the changeset-id above. The supported version will change periodically as we need bug fixes or new features and changes will be announced on the mailing lists. The supported version of petsc is used on the buildbot automated testing system. " Doing git checkout xsdk-0.2.0 causes petsc/src/mat/impls/baij/mpi/baijov.c to be replaced with a version that contains the subroutine. I'm fairly certain this version is ancient. I've had trouble before with just installing the most recent version of PETSc and then trying to run PFLOTRAN on top of it. I've asked my employer where we stand with supported PETSc versions. Also plan to try compiling PFLOTRAN on top of a more recent PETSc, to see if it works this time. On Sat, Nov 25, 2017 at 10:03 PM, Smith, Barry F. wrote: > > I cannot find that routine > > ~/Src/petsc ((v3.6.4)) arch-basic > $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices > ~/Src/petsc ((v3.6.4)) arch-basic > $ git checkout v3.7.5 > Previous HEAD position was 401b1b531b... Increase patchlevel to 3.6.4 > HEAD is now at b827f1350a... Increase patchlevel to 3.7.5 > ~/Src/petsc ((v3.7.5)) arch-basic > $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices > ~/Src/petsc ((v3.7.5)) arch-basic > $ git checkout v3.7.0 > error: pathspec 'v3.7.0' did not match any file(s) known to git. > ~/Src/petsc ((v3.7.5)) arch-basic > $ git checkout v3.7 > Previous HEAD position was b827f1350a... Increase patchlevel to 3.7.5 > HEAD is now at ae618e6989... release: set v3.7 strings > ~/Src/petsc ((v3.7)) arch-basic > $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices > ~/Src/petsc ((v3.7)) arch-basic > $ git checkout v3.8 > Previous HEAD position was ae618e6989... release: set v3.7 strings > HEAD is now at 0e50f9e530... release: set v3.8 strings > ~/Src/petsc ((v3.8)) arch-basic > $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices > ~/Src/petsc ((v3.8)) arch-basic > > > > Are you using a the PETSc git repository and some particular branch or > commit in it? > > > > On Nov 25, 2017, at 11:09 AM, Daniel Stone > wrote: > > > > Thanks for the quick response. > > > > I tried Valgrind. 
Apart from a couple of other warnings in other parts > of my code, now fixed, it shows the same stack I described: > > ==22498== Invalid read of size 4 > > ==22498== at 0x55A5BFF: MatDestroy_MPIBAIJ_MatGetSubmatrices > (baijov.c:609) > > ==22498== by 0x538A206: MatDestroy (matrix.c:1168) > > ==22498== by 0x5F21F2F: PCSetUp_ILU (ilu.c:162) > > ==22498== by 0x604898A: PCSetUp (precon.c:924) > > ==22498== by 0x6189005: KSPSetUp (itfunc.c:379) > > ==22498== by 0x618AB57: KSPSolve (itfunc.c:599) > > ==22498== by 0x5FD4816: PCApply_ASM (asm.c:485) > > ==22498== by 0x604204C: PCApply (precon.c:458) > > ==22498== by 0x6055C76: pcapply_ (preconf.c:223) > > ==22498== by 0x42F500: __cpr_linsolver_MOD_cprapply > (cpr_linsolver.F90:419) > > ==22498== by 0x5F42431: ourshellapply (zshellpcf.c:41) > > ==22498== by 0x5F3697A: PCApply_Shell (shellpc.c:115) > > ==22498== by 0x604204C: PCApply (precon.c:458) > > ==22498== by 0x61B74E7: KSP_PCApply (kspimpl.h:251) > > ==22498== by 0x61B83C3: KSPInitialResidual (itres.c:67) > > ==22498== by 0x6104EF9: KSPSolve_BCGS (bcgs.c:44) > > ==22498== by 0x618B77E: KSPSolve (itfunc.c:656) > > ==22498== by 0x62BB02D: SNESSolve_NEWTONLS (ls.c:224) > > ==22498== by 0x6245706: SNESSolve (snes.c:3967) > > ==22498== by 0x6265A58: snessolve_ (zsnesf.c:167) > > ==22498== Address 0x0 is not stack'd, malloc'd or (recently) free'd > > ==22498== > > > > PETSc version: this is from include/petscversion.h: > > #define PETSC_VERSION_RELEASE 0 > > #define PETSC_VERSION_MAJOR 3 > > #define PETSC_VERSION_MINOR 7 > > #define PETSC_VERSION_SUBMINOR 5 > > #define PETSC_VERSION_PATCH 0 > > #define PETSC_RELEASE_DATE "Apr, 25, 2016" > > #define PETSC_VERSION_DATE "unknown" > > > > This is the recommended version of PETSc for using with PFLOTRAN: > > http://documentation.pflotran.org/user_guide/how_to/ > installation/linux.html#linux-install > > > > Exact debugger output: > > It's a graphical debugger so there isn't much to copy/paste. > > The exact message is: > > > > Memory error detected in MatDestroy?_MPIBAIJ?_MatGetSubmatrices > ?(baijov.c:609)?: > > > > null pointer dereference or unaligned memory access. > > > > I can provide screenshots if that would help. > > > > -ksp_view_pre: > > I tried this, it doesn't seem to give information about the KSPs in > question. To be clear, this is > > part of an attempt to implement the two stage CPR-AMG preconditioner in > PFLOTRAN, so the > > KSP and PC objects involved are: > > > > KSP: linear solver inside a SNES, inside PFLOTRAN (BCGS), > > which has a PC: > > PC: shell, the CPR implementation, which calls two more > preconditioners, T1 and T2, in sequence: > > T1: another shell, which calls a KSP (GMRES), which has a PC which > is HYPRE BOOMERAMG > > T2: ASM, this is the problematic one. > > > > -ksp_view_pre doesn't seem to give us any information about the ASM > preconditioner object > > or it's ILU sub-KSPs; presumably it crashes before getting there. We do > get a lot of output about > > T1, for example: > > > > KSP Object: T1 24 MPI processes > > type: gmres > > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using DEFAULT norm type for convergence test > > PC Object: 24 MPI processes > > type: hypre > > PC has not been set up so information may be incomplete > > HYPRE BoomerAMG preconditioning > > HYPRE BoomerAMG: Cycle type V > > HYPRE BoomerAMG: Maximum number of levels 25 > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0. > > HYPRE BoomerAMG: Threshold for strong coupling 0.25 > > HYPRE BoomerAMG: Interpolation truncation factor 0. > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > HYPRE BoomerAMG: Maximum row sums 0.9 > > HYPRE BoomerAMG: Sweeps down 1 > > HYPRE BoomerAMG: Sweeps up 1 > > HYPRE BoomerAMG: Sweeps on coarse 1 > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > HYPRE BoomerAMG: Relax weight (all) 1. > > HYPRE BoomerAMG: Outer relax weight (all) 1. > > HYPRE BoomerAMG: Using CF-relaxation > > HYPRE BoomerAMG: Not using more complex smoothers. > > HYPRE BoomerAMG: Measure type local > > HYPRE BoomerAMG: Coarsen type Falgout > > HYPRE BoomerAMG: Interpolation type classical > > linear system matrix = precond matrix: > > Mat Object: 24 MPI processes > > type: mpiaij > > rows=1122000, cols=1122000 > > total: nonzeros=7780000, allocated nonzeros=7780000 > > total number of mallocs used during MatSetValues calls =0 > > > > Thanks, > > > > Daniel Stone > > > > > > On Fri, Nov 24, 2017 at 4:08 PM, Smith, Barry F. > wrote: > > > > First run under valgrind. https://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > > > > If that doesn't help send the exact output from the debugger (cut and > paste) and the exact version of PETSc you are using. > > Also out put from -ksp_view_pre > > > > Barry > > > > > On Nov 24, 2017, at 8:03 AM, Daniel Stone > wrote: > > > > > > Hello, > > > > > > I'm getting a memory exception crash every time I try to run the ASM > preconditioner in parallel, can anyone help? > > > > > > I'm using a debugger so I can give most of the stack: > > > > > > PCApply_ASM (asm.c:line 485) > > > KSPSolve (itfunc.c:line 599) > > > KSPSetUp (itfunc.c:line 379) > > > PCSetUp (precon.c: 924) > > > PCSetUp_ILU (ilu.c:line 162) > > > MatDestroy (matrix.c:line 1168) > > > MatDestroy_MPIBAIJ_MatGetSubMatrices (baijov.c:line 609) > > > > > > > > > The problem line is then in MatDestroy_MPIBAIJ_MatGetSubMatrices, > > > in the file baijov.c, line 609: > > > > > > if (!submatj->id) { > > > > > > At this point submatj has no value, address 0x0, and so the attempt to > access submatj->id > > > causes the memory error. We can see in the lines just above 609 where > submatj is supposed to > > > come from, it should basically be an attribute of C->data, where C is > the input matrix. > > > > > > Does anyone have any ideas where to start with getting this to work? I > can provide a lot more information > > > from the debugger if need. > > > > > > Many thanks in advance, > > > > > > Daniel Stone > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stormweiner at berkeley.edu Sat Nov 25 19:25:58 2017 From: stormweiner at berkeley.edu (Storm Weiner) Date: Sat, 25 Nov 2017 17:25:58 -0800 Subject: [petsc-users] storing many petsc objects in a single file Message-ID: Hey there, For simulations, its useful to store the history as a series of state vectors. For simulations with many time-steps it can get annoying to store each state vector as a separate file. It would be useful if there were some way to manage a database of petsc vectors. To save the current time-step, append the state vector to the database. To restart a simulation, load the corresponding state vector out of the database. Is there a standard way to do this in PETSc? Thanks, Storm -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhbaghaei at mail.sjtu.edu.cn Sat Nov 25 19:31:31 2017 From: mhbaghaei at mail.sjtu.edu.cn (Mohammad Hassan Baghaei) Date: Sun, 26 Nov 2017 09:31:31 +0800 (CST) Subject: [petsc-users] tensor-product mesh Message-ID: <000401d36656$2ebf5cb0$8c3e1610$@mail.sjtu.edu.cn> Hi I am going to solve the PDEs on a domain of a circle, which I need to use the polar coordinates. As dm is supposed for Cartesian coordinate system. Is it possible to define a new coordinate system in that or I need to use dmplex for that. Thanks for your great time answering me. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sun Nov 26 02:27:45 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sun, 26 Nov 2017 09:27:45 +0100 Subject: [petsc-users] storing many petsc objects in a single file In-Reply-To: References: Message-ID: > El 26 nov 2017, a las 2:25, Storm Weiner escribi?: > > Hey there, > > For simulations, its useful to store the history as a series of state vectors. For simulations with many time-steps it can get annoying to store each state vector as a separate file. It would be useful if there were some way to manage a database of petsc vectors. To save the current time-step, append the state vector to the database. To restart a simulation, load the corresponding state vector out of the database. > > Is there a standard way to do this in PETSc? > > Thanks, > Storm In the command line, you can use the ?append? option for the viewer. For instance in the MFN solver in SLEPc you can do this: $ ./ex23 -mfn_view_solution binary:vectors.bin::append It will save one vector in each call to MFNSolve(), and all vectors will be stored in the same file ?vectors.bin?. Alternatively, in the source code you can use PetscViewerBinaryOpen() to open the viewer, then save as many vectors as you want with VecView(), and finally close the file with PetscViewerDestroy(). Use VecLoad() to load the vectors. Jose From bsmith at mcs.anl.gov Sun Nov 26 07:49:12 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sun, 26 Nov 2017 13:49:12 +0000 Subject: [petsc-users] storing many petsc objects in a single file In-Reply-To: References: Message-ID: Storm, Specifically for TS there is an abstract object called TSTrajectory which is a way to store histories of simulations (it is used by TSAdjoint but also useable for other purposes). It has several ways to store histories and more can be added. One draw back to saving everything in PETSc binary in one file is that we don't have simple support for random access of a particular vector. You can also store to HDF5 format and some others that may be useful for you: Barry > On Nov 26, 2017, at 2:27 AM, Jose E. 
Roman wrote: > > > >> El 26 nov 2017, a las 2:25, Storm Weiner escribi?: >> >> Hey there, >> >> For simulations, its useful to store the history as a series of state vectors. For simulations with many time-steps it can get annoying to store each state vector as a separate file. It would be useful if there were some way to manage a database of petsc vectors. To save the current time-step, append the state vector to the database. To restart a simulation, load the corresponding state vector out of the database. >> >> Is there a standard way to do this in PETSc? >> >> Thanks, >> Storm > > In the command line, you can use the ?append? option for the viewer. For instance in the MFN solver in SLEPc you can do this: > $ ./ex23 -mfn_view_solution binary:vectors.bin::append > It will save one vector in each call to MFNSolve(), and all vectors will be stored in the same file ?vectors.bin?. > > Alternatively, in the source code you can use PetscViewerBinaryOpen() to open the viewer, then save as many vectors as you want with VecView(), and finally close the file with PetscViewerDestroy(). Use VecLoad() to load the vectors. > > Jose > From jed at jedbrown.org Sun Nov 26 22:12:36 2017 From: jed at jedbrown.org (Jed Brown) Date: Sun, 26 Nov 2017 21:12:36 -0700 Subject: [petsc-users] tensor-product mesh In-Reply-To: <000401d36656$2ebf5cb0$8c3e1610$@mail.sjtu.edu.cn> References: <000401d36656$2ebf5cb0$8c3e1610$@mail.sjtu.edu.cn> Message-ID: <871skks6kr.fsf@jedbrown.org> You could use DMDA but would need to write some dummy equations for all the points replicated in the center, and that might slow your solver. Mohammad Hassan Baghaei writes: > Hi > > I am going to solve the PDEs on a domain of a circle, which I need to use > the polar coordinates. As dm is supposed for Cartesian coordinate system. Is > it possible to define a new coordinate system in that or I need to use > dmplex for that. Thanks for your great time answering me. From gmulas at oa-cagliari.inaf.it Mon Nov 27 03:12:25 2017 From: gmulas at oa-cagliari.inaf.it (Giacomo Mulas) Date: Mon, 27 Nov 2017 10:12:25 +0100 (CET) Subject: [petsc-users] consistence of PETSC/SLEPC with MPI, BLACS, SCALAPACK calls... Message-ID: Hello. I am using, within the same C code, both SLEPC/PETSC and Scalapack/Blacs. On the big parallel machine on which I do production runs, I compiled SLEPC/PETSC with the "--known-64-bit-blas-indices" and "--with-64-bit-indices" options, linking them with the ilp64 version of the Intel MKL libraries, while on the workstation on which I do the development I use the standard libraries provided by the (debian, in my case) packaging system. For Slepc/Petsc themselves I just use the PETSC data types and this automagically defines integers of the appropriate size on both machines. However, when using BLACS, Scalapack and MPI directly in the same code, I will obviously need to use consistent function definitions for them as well. Do I need to set up some complicated independent #ifdef machinery for this or are there some appropriate PETSC data types that I can use that will ensure this consistency? Of course I am including slepc/petsc include files, so all PETSC data types are defined according to the local PETSC/SLEPC options. Can some PETSC developer give me some hint on how to make my MPI, BLACS, SCALAPACK (and PBLAS etc.) calls clean and consistent with this? Perhaps even referring to some examples in the PETSC source code that I can read and take as a reference for this. 
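To make the question more concrete, the kind of mixed code I have in mind looks roughly like the sketch below. The BLACSINT typedef, the hand-written prototypes and the little helper function are only placeholders I made up for illustration; the integer declarations are exactly the part I do not know how to make portable between the two builds:

#include <petscsys.h>

/* The BLACSINT typedef and these prototypes are just my guesses, for
   illustration only; choosing the right integer type here is the question.
   With the Debian lp64 libraries "int" should match, while the ilp64 MKL
   libraries expect 64-bit integers. */
typedef int BLACSINT;

extern void     Cblacs_get(BLACSINT icontxt, BLACSINT what, BLACSINT *val);
extern void     Cblacs_gridinit(BLACSINT *icontxt, char *layout,
                                BLACSINT nprow, BLACSINT npcol);
extern void     Cblacs_gridinfo(BLACSINT icontxt, BLACSINT *nprow,
                                BLACSINT *npcol, BLACSINT *myrow,
                                BLACSINT *mycol);
extern BLACSINT numroc_(BLACSINT *n, BLACSINT *nb, BLACSINT *iproc,
                        BLACSINT *isrcproc, BLACSINT *nprocs);

/* made-up helper: how many local rows does this process own? */
BLACSINT local_rows(PetscInt n, PetscInt nb)
{
  BLACSINT ictxt, nprows, npcols, myrow, mycol;
  BLACSINT bn, bnb, izero = 0;

  Cblacs_get(-1, 0, &ictxt);              /* default system context        */
  Cblacs_gridinit(&ictxt, "Row", 1, 1);   /* 1x1 grid, just to illustrate  */
  Cblacs_gridinfo(ictxt, &nprows, &npcols, &myrow, &mycol);

  /* n and nb arrive from the PETSc side as PetscInt (64 bit on the cluster
     build): is a plain cast the intended way to pass them on, or is there
     a PETSc type/conversion meant for this? */
  bn  = (BLACSINT)n;
  bnb = (BLASINT)nb;
  return numroc_(&bn, &bnb, &myrow, &izero, &nprows);
}

With the standard Debian libraries on the workstation plain int presumably matches, but I do not see how the same declarations can be right against the ilp64 MKL build without some #ifdef machinery, hence the question.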
Thanks in advance Giacomo -- _________________________________________________________________ Giacomo Mulas _________________________________________________________________ INAF - Osservatorio Astronomico di Cagliari via della scienza 5 - 09047 Selargius (CA) tel. +39 070 71180255 mob. : +39 329 6603810 _________________________________________________________________ "When the storms are raging around you, stay right where you are" (Freddy Mercury) _________________________________________________________________ From jroman at dsic.upv.es Mon Nov 27 03:33:55 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 27 Nov 2017 10:33:55 +0100 Subject: [petsc-users] consistence of PETSC/SLEPC with MPI, BLACS, SCALAPACK calls... In-Reply-To: References: Message-ID: <39CC65C5-03D0-4C5A-A008-C13380DF16AA@dsic.upv.es> You have PetscInt, PetscBLASInt and PetscMPIInt. Presumably ScaLAPACK uses the same integer length as BLAS, so you should use PetscBLASInt variables for the arguments of ScaLAPACK subroutines. See also the documentation for PetscBLASIntCast() http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscBLASInt.html#PetscBLASInt http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscBLASIntCast.html Similarly for MPI calls with PetscMPIInt. Jose > El 27 nov 2017, a las 10:12, Giacomo Mulas escribi?: > > Hello. > > I am using, within the same C code, both SLEPC/PETSC and Scalapack/Blacs. On the big parallel machine on which I do production runs, I compiled > SLEPC/PETSC with the "--known-64-bit-blas-indices" and > "--with-64-bit-indices" options, linking them with the ilp64 version of the > Intel MKL libraries, while on the workstation on which I do the development > I use the standard libraries provided by the (debian, in my case) packaging > system. For Slepc/Petsc themselves I just use the PETSC data types and this > automagically defines integers of the appropriate size on both machines. > > However, when using BLACS, Scalapack and MPI directly in the same code, I > will obviously need to use consistent function definitions for them as well. Do I need to set up some complicated independent #ifdef machinery for this > or are there some appropriate PETSC data types that I can use that will > ensure this consistency? Of course I am including slepc/petsc include > files, so all PETSC data types are defined according to the local > PETSC/SLEPC options. Can some PETSC developer give me some hint on how to > make my MPI, BLACS, SCALAPACK (and PBLAS etc.) calls clean and consistent > with this? Perhaps even referring to some examples in the PETSC source code > that I can read and take as a reference for this. > > Thanks in advance > Giacomo > > -- > _________________________________________________________________ > > Giacomo Mulas > _________________________________________________________________ > > INAF - Osservatorio Astronomico di Cagliari > via della scienza 5 - 09047 Selargius (CA) > > tel. +39 070 71180255 > mob. : +39 329 6603810 > _________________________________________________________________ > > "When the storms are raging around you, stay right where you are" > (Freddy Mercury) > _________________________________________________________________ From gmulas at oa-cagliari.inaf.it Mon Nov 27 03:41:01 2017 From: gmulas at oa-cagliari.inaf.it (Giacomo Mulas) Date: Mon, 27 Nov 2017 10:41:01 +0100 (CET) Subject: [petsc-users] consistence of PETSC/SLEPC with MPI, BLACS, SCALAPACK calls... 
In-Reply-To: <39CC65C5-03D0-4C5A-A008-C13380DF16AA@dsic.upv.es> References: <39CC65C5-03D0-4C5A-A008-C13380DF16AA@dsic.upv.es> Message-ID: On Mon, 27 Nov 2017, Jose E. Roman wrote: > You have PetscInt, PetscBLASInt and PetscMPIInt. > > Presumably ScaLAPACK uses the same integer length as BLAS, so you should use PetscBLASInt variables for the arguments of ScaLAPACK subroutines. See also the documentation for PetscBLASIntCast() > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscBLASInt.html#PetscBLASInt > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscBLASIntCast.html > > Similarly for MPI calls with PetscMPIInt. Thanks! Giacomo -- _________________________________________________________________ Giacomo Mulas _________________________________________________________________ INAF - Osservatorio Astronomico di Cagliari via della scienza 5 - 09047 Selargius (CA) tel. +39 070 71180255 mob. : +39 329 6603810 _________________________________________________________________ "When the storms are raging around you, stay right where you are" (Freddy Mercury) _________________________________________________________________ From gmulas at oa-cagliari.inaf.it Mon Nov 27 04:01:25 2017 From: gmulas at oa-cagliari.inaf.it (Giacomo Mulas) Date: Mon, 27 Nov 2017 11:01:25 +0100 (CET) Subject: [petsc-users] consistence of PETSC/SLEPC with MPI, BLACS, SCALAPACK calls... In-Reply-To: <39CC65C5-03D0-4C5A-A008-C13380DF16AA@dsic.upv.es> References: <39CC65C5-03D0-4C5A-A008-C13380DF16AA@dsic.upv.es> Message-ID: On Mon, 27 Nov 2017, Jose E. Roman wrote: > You have PetscInt, PetscBLASInt and PetscMPIInt. I will try to work my way through it. In most cases it looks very clear. There are some borderline cases in which things are not so clearly cut though. I.e. if I call Cblacs_get( -1, 0, &ictxt ) to get a context, I would guess that the context should be PetscMPIInt even if I get it via a blacs call. Similarly for Cblacs_gridinit() or Cblacs_gridinfo() they all deal with strictly MPI stuff so I bet they should all get and return PetscMPIInt variables. On the other hand, if I use lnr = numroc_(&n, &nbr, &myrow, &ione, &nprows); to get the number of rows (in this case) locally allocated to a distributed blacs array, I would bet that lnr, n, nbr, ione should be PetscBlasInt and myrow, nprows should be PetscMPIInt. Would anyone proficient with both blacs/scalapack and petsc care to confirm or correct if wrong? Possibly just pointing me to where to look to find the answers without bothering him/her further? Thanks Giacomo > >> El 27 nov 2017, a las 10:12, Giacomo Mulas escribi?: >> >> Hello. >> >> I am using, within the same C code, both SLEPC/PETSC and Scalapack/Blacs. On the big parallel machine on which I do production runs, I compiled >> SLEPC/PETSC with the "--known-64-bit-blas-indices" and >> "--with-64-bit-indices" options, linking them with the ilp64 version of the >> Intel MKL libraries, while on the workstation on which I do the development >> I use the standard libraries provided by the (debian, in my case) packaging >> system. For Slepc/Petsc themselves I just use the PETSC data types and this >> automagically defines integers of the appropriate size on both machines. >> >> However, when using BLACS, Scalapack and MPI directly in the same code, I >> will obviously need to use consistent function definitions for them as well. 
Do I need to set up some complicated independent #ifdef machinery for this >> or are there some appropriate PETSC data types that I can use that will >> ensure this consistency? Of course I am including slepc/petsc include >> files, so all PETSC data types are defined according to the local >> PETSC/SLEPC options. Can some PETSC developer give me some hint on how to >> make my MPI, BLACS, SCALAPACK (and PBLAS etc.) calls clean and consistent >> with this? Perhaps even referring to some examples in the PETSC source code >> that I can read and take as a reference for this. >> >> Thanks in advance >> Giacomo >> >> -- >> _________________________________________________________________ >> >> Giacomo Mulas >> _________________________________________________________________ >> >> INAF - Osservatorio Astronomico di Cagliari >> via della scienza 5 - 09047 Selargius (CA) >> >> tel. +39 070 71180255 >> mob. : +39 329 6603810 >> _________________________________________________________________ >> >> "When the storms are raging around you, stay right where you are" >> (Freddy Mercury) >> _________________________________________________________________ > > -- _________________________________________________________________ Giacomo Mulas _________________________________________________________________ INAF - Osservatorio Astronomico di Cagliari via della scienza 5 - 09047 Selargius (CA) tel. +39 070 71180255 mob. : +39 329 6603810 _________________________________________________________________ "When the storms are raging around you, stay right where you are" (Freddy Mercury) _________________________________________________________________ From edoardo.alinovi at gmail.com Mon Nov 27 04:54:40 2017 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Mon, 27 Nov 2017 11:54:40 +0100 Subject: [petsc-users] Petcs preconditioners Message-ID: Dear users, Since this mailing list has already solved one of my problem, I would like to take advantage of your knowlege one more time (thank you so much!). I am a new user (Fortran) and, as you can imagine, I have a lot of questions. Actually, some of them are about preconditioners. In particular I would like to ask: - I'm dealing with matrices that change the value of the coeffs in time, but maintain the same zero pattern. Is it possible to create a ksp (linear solver + preconditioner) object before the time loop and pass it to the solver subroutine, avoiding the construction/destruction every iteration? In this case, is the preconditioner updated each iteration? - Is the ILU suitable for parallel matrix format (MPIAIJ)? Is there in petsc the DILU (diagonal ILU) preconditioner? - I would like to use a multigrid preconditioner. Is it better to use GAMG or boomerAMG in your opinion? In the manual there are a lot of options that create a mess in my head... Would you so kind to post a standard example of settings for GAMG/boomerAMG? Every hint will be very appreciated. Thank you very much and sorry for the newby questions, Edoardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon Nov 27 05:01:49 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 27 Nov 2017 14:01:49 +0300 Subject: [petsc-users] Petcs preconditioners In-Reply-To: References: Message-ID: > > > - I'm dealing with matrices that change the value of the coeffs in time, > but maintain the same zero pattern. 
Is it possible to create a ksp (linear > solver + preconditioner) object before the time loop and pass it to the > solver subroutine, avoiding the construction/destruction every iteration? > In this case, is the preconditioner updated each iteration? > > Just call KSPSetOperators evry time the matrix get updated. PETSc preconditioners are aware of any nonzero pattern / matrix value changes > - Is the ILU suitable for parallel matrix format (MPIAIJ)? Is there in > petsc the DILU (diagonal ILU) preconditioner? > Petsc does not have a parallel ILU. Default is block Jacoby, with ILU(0) on the diagonal blocks. Is this DILU? > > - I would like to use a multigrid preconditioner. Is it better to use > GAMG or boomerAMG in your opinion? In the manual there are a lot of options > that create a mess in my head... Would you so kind to post a standard > example of settings for GAMG/boomerAMG? > > What is the problem you are solving? For now, you can just select boomerAMG (-pc_type hypre) or GAMG a(-pc_type gamg) at command line and you can start experimenting with their options. Depdending on the structure of your problem, you may need extra customizations (e.g MatSetNearNullSpace for passing rigid body modes for elasticity) > Every hint will be very appreciated. > > Thank you very much and sorry for the newby questions, > > Edoardo > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From lukas.drinkt.thee at gmail.com Mon Nov 27 05:59:18 2017 From: lukas.drinkt.thee at gmail.com (Lukas van de Wiel) Date: Mon, 27 Nov 2017 12:59:18 +0100 Subject: [petsc-users] oddities with MPI Message-ID: Good day! configuring PETSc 3.8.2 gives a spot of a bother with the openMPI bit and I wondered if other PETSc users experienced the same issue. Although PETSc does not appear to be to blame for this, this mailing list seemed a logical place to mention it in the hope that somebody may already have found a trivial solution. Otherwise I will just dig deeper: My standard configure script: ./configure \ COPTFLAGS='-O3 -march=native -mtune=native' \ CXXOPTFLAGS='-O3 -march=native -mtune=native' \ FOPTFLAGS='-O3 -march=native -mtune=native' \ --with-debugging=0 \ --with-x=0 \ --with-ssl=0 \ --with-shared-libraries=0 \ --download-metis \ --download-parmetis \ --download-fblaslapack \ --download-scalapack \ --download-openmpi \ --download-mumps \ --download-hypre \ --download-ptscotch which works perfectly on our compute cluster, which runs Slackware Linux 13.37.0, from 2011... and mpif90 wrapping to GCC 4.5.2. I know... don't ask... However, on more modern machines, such a Debian 9 machine I test with, and a Mac OS X machine from one of our users, all with contemporary GCC versions, give issues when compiling the openmpi part: /usr/bin/ld: ../../../oshmem/.libs/liboshmem.a(memheap_base_static.o): undefined reference to symbol '_end' //usr/lib/x86_64-linux-gnu/libnl-route-3.so.200: error adding symbols: DSO missing from command line (see attached the full configure.log from one of the Debian 9 machines) I would expect that our archaic Slackware cluster has trouble compiling recent PETSc version, and the newer machines would experience less problems, but exactly the opposite is true. Has anybody run into similar issues? Best wishes Lukas -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 1726781 bytes Desc: not available URL: From lukas.drinkt.thee at gmail.com Mon Nov 27 06:32:51 2017 From: lukas.drinkt.thee at gmail.com (Lukas van de Wiel) Date: Mon, 27 Nov 2017 13:32:51 +0100 Subject: [petsc-users] oddities with MPI In-Reply-To: References: Message-ID: Hi guys, apparently the newer systems need libnl installed, which has 'end' and potentially other things needed by openmpi. sudo apt-get install libnl-3-dev will fix it in Debian. Best wishes Lukas On 11/27/17, Lukas van de Wiel wrote: > Good day! > > configuring PETSc 3.8.2 gives a spot of a bother with the openMPI bit > and I wondered if other PETSc users experienced the same issue. > Although PETSc does not appear to be to blame for this, this mailing > list seemed a logical place to mention it in the hope that somebody > may already have found a trivial solution. Otherwise I will just dig > deeper: > > My standard configure script: > > > ./configure \ > COPTFLAGS='-O3 -march=native -mtune=native' \ > CXXOPTFLAGS='-O3 -march=native -mtune=native' \ > FOPTFLAGS='-O3 -march=native -mtune=native' \ > --with-debugging=0 \ > --with-x=0 \ > --with-ssl=0 \ > --with-shared-libraries=0 \ > --download-metis \ > --download-parmetis \ > --download-fblaslapack \ > --download-scalapack \ > --download-openmpi \ > --download-mumps \ > --download-hypre \ > --download-ptscotch > > which works perfectly on our compute cluster, which runs Slackware > Linux 13.37.0, from 2011... and mpif90 wrapping to GCC 4.5.2. I > know... don't ask... > > However, on more modern machines, such a Debian 9 machine I test with, > and a Mac OS X machine from one of our users, all with contemporary > GCC versions, give issues when compiling the openmpi part: > > /usr/bin/ld: ../../../oshmem/.libs/liboshmem.a(memheap_base_static.o): > undefined reference to symbol '_end' > //usr/lib/x86_64-linux-gnu/libnl-route-3.so.200: error adding symbols: > DSO missing from command line > > (see attached the full configure.log from one of the Debian 9 machines) > > I would expect that our archaic Slackware cluster has trouble > compiling recent PETSc version, and the newer machines would > experience less problems, but exactly the opposite is true. > > Has anybody run into similar issues? > > Best wishes > > Lukas > From bsmith at mcs.anl.gov Mon Nov 27 08:52:42 2017 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 27 Nov 2017 14:52:42 +0000 Subject: [petsc-users] Petcs preconditioners In-Reply-To: References: Message-ID: > On Nov 27, 2017, at 5:01 AM, Stefano Zampini wrote: > > > > > - I'm dealing with matrices that change the value of the coeffs in time, but maintain the same zero pattern. Is it possible to create a ksp (linear solver + preconditioner) object before the time loop and pass it to the solver subroutine, avoiding the construction/destruction every iteration? In this case, is the preconditioner updated each iteration? > > > Just call KSPSetOperators evry time the matrix get updated. PETSc preconditioners are aware of any nonzero pattern / matrix value changes Actually you do not even need to call KSPSetOperators() again. Just set the values in the matrix, call MatAssemblyBegin/End() and KSP will automatically build a new preconditioner only when needed. You can call KSPSetReusePreconditioner() and PETSc will keep the same preconditioner even if you change the matrix, this is useful if the matrix changes only slightly. > > > - Is the ILU suitable for parallel matrix format (MPIAIJ)? 
Is there in petsc the DILU (diagonal ILU) preconditioner? No DILU > > Petsc does not have a parallel ILU. Default is block Jacoby, with ILU(0) on the diagonal blocks. Is this DILU? > > - I would like to use a multigrid preconditioner. Is it better to use GAMG or boomerAMG in your opinion? In the manual there are a lot of options that create a mess in my head... Would you so kind to post a standard example of settings for GAMG/boomerAMG? > > > What is the problem you are solving? For now, you can just select boomerAMG (-pc_type hypre) or GAMG a(-pc_type gamg) at command line and you can start experimenting with their options. Depdending on the structure of your problem, you may need extra customizations (e.g MatSetNearNullSpace for passing rigid body modes for elasticity) > > Every hint will be very appreciated. > > Thank you very much and sorry for the newby questions, > > Edoardo > > > > -- > Stefano From balay at mcs.anl.gov Mon Nov 27 12:09:43 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 27 Nov 2017 12:09:43 -0600 Subject: [petsc-users] consistence of PETSC/SLEPC with MPI, BLACS, SCALAPACK calls... In-Reply-To: References: <39CC65C5-03D0-4C5A-A008-C13380DF16AA@dsic.upv.es> Message-ID: These questions are more pertinant to MKL - i.e what interface does MKL ilp64 blacs library provide for Cblacs_get() etc.. The following url has some info - and some references to ilp64 MPI https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/474149 [PETSc is not tested with ilp64 MPI] Satish On Mon, 27 Nov 2017, Giacomo Mulas wrote: > On Mon, 27 Nov 2017, Jose E. Roman wrote: > > > You have PetscInt, PetscBLASInt and PetscMPIInt. > > I will try to work my way through it. In most cases it looks very clear. > There are some borderline cases in which things are not so clearly cut > though. > I.e. if I call Cblacs_get( -1, 0, &ictxt ) to get a context, I would guess > that the context should be PetscMPIInt even if I get it via a blacs call. > Similarly for Cblacs_gridinit() or Cblacs_gridinfo() they all deal with > strictly MPI stuff so I bet they should all get and return PetscMPIInt > variables. On the other hand, if I use lnr = numroc_(&n, &nbr, &myrow, &ione, > &nprows); > to get the number of rows (in this case) locally allocated to a distributed > blacs array, I would bet that lnr, n, nbr, ione should be PetscBlasInt and > myrow, nprows should be PetscMPIInt. Would anyone proficient with both > blacs/scalapack and petsc care to confirm > or correct if wrong? Possibly just pointing me to where to look to find the > answers without bothering him/her further? > > Thanks > Giacomo > > > > > > El 27 nov 2017, a las 10:12, Giacomo Mulas > > > escribi?: > > > > > > Hello. > > > > > > I am using, within the same C code, both SLEPC/PETSC and Scalapack/Blacs. > > > On the big parallel machine on which I do production runs, I compiled > > > SLEPC/PETSC with the "--known-64-bit-blas-indices" and > > > "--with-64-bit-indices" options, linking them with the ilp64 version of > > > the > > > Intel MKL libraries, while on the workstation on which I do the > > > development > > > I use the standard libraries provided by the (debian, in my case) > > > packaging > > > system. For Slepc/Petsc themselves I just use the PETSC data types and > > > this > > > automagically defines integers of the appropriate size on both machines. 
> > > > > > However, when using BLACS, Scalapack and MPI directly in the same code, I > > > will obviously need to use consistent function definitions for them as > > > well. Do I need to set up some complicated independent #ifdef machinery > > > for this > > > or are there some appropriate PETSC data types that I can use that will > > > ensure this consistency? Of course I am including slepc/petsc include > > > files, so all PETSC data types are defined according to the local > > > PETSC/SLEPC options. Can some PETSC developer give me some hint on how to > > > make my MPI, BLACS, SCALAPACK (and PBLAS etc.) calls clean and consistent > > > with this? Perhaps even referring to some examples in the PETSC source > > > code > > > that I can read and take as a reference for this. > > > > > > Thanks in advance > > > Giacomo > > > > > > -- > > > _________________________________________________________________ > > > > > > Giacomo Mulas > > > _________________________________________________________________ > > > > > > INAF - Osservatorio Astronomico di Cagliari > > > via della scienza 5 - 09047 Selargius (CA) > > > > > > tel. +39 070 71180255 > > > mob. : +39 329 6603810 > > > _________________________________________________________________ > > > > > > "When the storms are raging around you, stay right where you are" > > > (Freddy Mercury) > > > _________________________________________________________________ > > > > > > From stormweiner at berkeley.edu Mon Nov 27 14:46:32 2017 From: stormweiner at berkeley.edu (Storm Weiner) Date: Mon, 27 Nov 2017 12:46:32 -0800 Subject: [petsc-users] storing many petsc objects in a single file In-Reply-To: References: Message-ID: Thanks for the advice. Barry, did you mean to add more information after that ":"? HDF5 sounds like a great option for my application, but I don't see much information about how it interfaces with PETSc. All I can find are the doc pages for a few viewer routines. Do you have a link to a more detailed description? Specifically, I'd like to know if PETSc has automatically configured HDF5 datatypes and how to access them. And if there is a standard way to make compound data types derived from PETSc datatypes. In short, how much do I need to muck around in HDF5 myself, and how much can I let PETSc take care of? -Storm On Nov 26, 2017 5:49 AM, "Smith, Barry F." wrote: > > Storm, > > Specifically for TS there is an abstract object called TSTrajectory > which is a way to store histories of simulations (it is used by TSAdjoint > but also useable for other purposes). It has several ways to store > histories and more can be added. > > One draw back to saving everything in PETSc binary in one file is that > we don't have simple support for random access of a particular vector. > > You can also store to HDF5 format and some others that may be useful > for you: > > > Barry > > > > On Nov 26, 2017, at 2:27 AM, Jose E. Roman wrote: > > > > > > > >> El 26 nov 2017, a las 2:25, Storm Weiner > escribi?: > >> > >> Hey there, > >> > >> For simulations, its useful to store the history as a series of state > vectors. For simulations with many time-steps it can get annoying to store > each state vector as a separate file. It would be useful if there were > some way to manage a database of petsc vectors. To save the current > time-step, append the state vector to the database. To restart a > simulation, load the corresponding state vector out of the database. > >> > >> Is there a standard way to do this in PETSc? 
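A small sketch of the one-file approach being discussed here, not from the thread itself: a Vec u is written once per time step and read back in the same order on restart (the file names, nsteps and restart_step are invented). The HDF5 variant that comes up below keys vectors by group and object name instead of by position and needs a PETSc build with HDF5 plus #include <petscviewerhdf5.h>.

    PetscViewer viewer;
    PetscInt    step, nsteps = 100, restart_step = 42;   /* invented values */

    /* write: each VecView() on the open viewer appends one vector */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "history.bin", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
    for (step = 0; step < nsteps; ++step) {
      /* ... advance the solution in u ... */
      ierr = VecView(u, viewer);CHKERRQ(ierr);
    }
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

    /* read: successive VecLoad() calls return the vectors in write order, so a
       restart means reading forward to the wanted step (no random access)      */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "history.bin", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
    for (step = 0; step <= restart_step; ++step) {
      ierr = VecLoad(u, viewer);CHKERRQ(ierr);
    }
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

    /* HDF5 variant: one named dataset per saved vector (group/dataset names invented) */
    ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "history.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
    ierr = PetscViewerHDF5PushGroup(viewer, "/timesteps");CHKERRQ(ierr);
    ierr = PetscObjectSetName((PetscObject)u, "u_step_42");CHKERRQ(ierr);
    ierr = VecView(u, viewer);CHKERRQ(ierr);
    ierr = PetscViewerHDF5PopGroup(viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);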
> >> > >> Thanks, > >> Storm > > > > In the command line, you can use the ?append? option for the viewer. For > instance in the MFN solver in SLEPc you can do this: > > $ ./ex23 -mfn_view_solution binary:vectors.bin::append > > It will save one vector in each call to MFNSolve(), and all vectors will > be stored in the same file ?vectors.bin?. > > > > Alternatively, in the source code you can use PetscViewerBinaryOpen() to > open the viewer, then save as many vectors as you want with VecView(), and > finally close the file with PetscViewerDestroy(). Use VecLoad() to load the > vectors. > > > > Jose > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 27 20:25:13 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Nov 2017 20:25:13 -0600 Subject: [petsc-users] storing many petsc objects in a single file In-Reply-To: References: Message-ID: On Mon, Nov 27, 2017 at 2:46 PM, Storm Weiner wrote: > Thanks for the advice. > > Barry, did you mean to add more information after that ":"? > > HDF5 sounds like a great option for my application, but I don't see much > information about how it interfaces with PETSc. All I can find are the > doc pages for a few viewer routines. Do you have a link to a more detailed > description? > > Specifically, I'd like to know if PETSc has automatically configured HDF5 > datatypes and how to access them. And if there is a standard way to make > compound data types derived from PETSc datatypes. In short, how much do > I need to muck around in HDF5 myself, and how much can I let PETSc take > care of? > You should be able to use HDF5 exactly as you would use the PETSc binary viewer and the filesystem. Instead of giving a directory and filename, you give a groupname http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Viewer/PetscViewerHDF5PushGroup.html and the filename is the object name. Then just call View(). We also allow you to write metadata about the object http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Viewer/PetscViewerHDF5WriteAttribute.html Matt > -Storm > > On Nov 26, 2017 5:49 AM, "Smith, Barry F." wrote: > >> >> Storm, >> >> Specifically for TS there is an abstract object called TSTrajectory >> which is a way to store histories of simulations (it is used by TSAdjoint >> but also useable for other purposes). It has several ways to store >> histories and more can be added. >> >> One draw back to saving everything in PETSc binary in one file is that >> we don't have simple support for random access of a particular vector. >> >> You can also store to HDF5 format and some others that may be useful >> for you: >> >> >> Barry >> >> >> > On Nov 26, 2017, at 2:27 AM, Jose E. Roman wrote: >> > >> > >> > >> >> El 26 nov 2017, a las 2:25, Storm Weiner >> escribi?: >> >> >> >> Hey there, >> >> >> >> For simulations, its useful to store the history as a series of state >> vectors. For simulations with many time-steps it can get annoying to store >> each state vector as a separate file. It would be useful if there were >> some way to manage a database of petsc vectors. To save the current >> time-step, append the state vector to the database. To restart a >> simulation, load the corresponding state vector out of the database. >> >> >> >> Is there a standard way to do this in PETSc? >> >> >> >> Thanks, >> >> Storm >> > >> > In the command line, you can use the ?append? option for the viewer. 
>> For instance in the MFN solver in SLEPc you can do this: >> > $ ./ex23 -mfn_view_solution binary:vectors.bin::append >> > It will save one vector in each call to MFNSolve(), and all vectors >> will be stored in the same file ?vectors.bin?. >> > >> > Alternatively, in the source code you can use PetscViewerBinaryOpen() >> to open the viewer, then save as many vectors as you want with VecView(), >> and finally close the file with PetscViewerDestroy(). Use VecLoad() to load >> the vectors. >> > >> > Jose >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at unito.it Wed Nov 29 02:26:35 2017 From: matteo.semplice at unito.it (Matteo Semplice) Date: Wed, 29 Nov 2017 09:26:35 +0100 Subject: [petsc-users] preallocation after DMCreateMatrix? In-Reply-To: References: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> Message-ID: On 25/11/2017 02:05, Matthew Knepley wrote: > On Fri, Nov 24, 2017 at 4:21 PM, Matteo Semplice > > wrote: > > Hi. > > The manual for DMCreateMatrix says "Notes: This properly > preallocates the number of nonzeros in the sparse matrix so you do > not need to do it yourself", so I got the impression that one does > not need to call the preallocation routine for the matrix and > indeed in most examples listed in the manual page for > DMCreateMatrix this is not done or (KSP tutorial ex4) it is called > declaring 0 entries per row. > > However, if read in a mesh in a DMPlex ore create a DMDA and then > call DMCreateMatrix, the resulting matrix errors out when I call > MatSetValues. I have currently followed the suggestion of the > error message and call MatSetOption(A, > MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE), but I'd like to fix > this properly. > > > It sounds like your nonzero pattern does not obey the topology. What > nonzero pattern are you trying to input? Hi. ??? The problem with the DMDA was a bug in our code. Sorry for the noise. On the other hand, with the DMPLex, I still experience problems. It's a FV code and I reduced it to the case of the simple case of a laplacian operator: I need non-diagonal entries at (i,j) if cell i and cell j have a common face. Here below is my code that reads in a grid of 4 cells (unit square divided by the diagonals), create a section with 1 dof per cell, creates a matrix and assembles it. ? ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "square1.msh", PETSC_TRUE, &dm);CHKERRQ(ierr); ? ierr = DMPlexSetAdjacencyUseCone(dm, PETSC_TRUE); CHKERRQ(ierr); ? ierr = DMPlexSetAdjacencyUseClosure(dm, PETSC_FALSE); CHKERRQ(ierr); ? ierr = DMPlexComputeGeometryFVM(dm, &ctx.cellgeom, &ctx.facegeom); CHKERRQ(ierr); ? PetscInt nFields = 1, numComp[1]; ? ierr = DMGetDimension(dm, &ctx.dim);CHKERRQ(ierr); ? /* u has 1 dof on each cell */ ? numComp[0] = 1; ? PetscInt *numDofs; ? ierr = PetscCalloc1(nFields*(ctx.dim+1), &numDofs); CHKERRQ(ierr); ? numDofs[0*(ctx.dim+1)+ctx.dim] = 1; ? /* Create a PetscSection with this data layout */ ? PetscSection?? section; ? DMPlexCreateSection(dm, ctx.dim, nFields, numComp, numDofs, 0, NULL, NULL, NULL, NULL, §ion); ? ierr = PetscFree(numDofs); CHKERRQ(ierr); ? PetscSectionSetFieldName(section, 0, "u"); ? /* Tell the DM to use this data layout */ ? DMSetDefaultSection(dm, section); ? PetscSectionDestroy(§ion); ? 
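  /* The adjacency choice made above (DMPlexSetAdjacencyUseCone = TRUE,
     UseClosure = FALSE) together with this cell-based PetscSection is what
     DMCreateMatrix() below is expected to use when it preallocates the
     nonzero pattern; that expectation is what the rest of this thread is
     about. */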
/* Print out the list of cells */ ? printCells(dm,ctx); ? //Assemble the system ? Mat A; ? PetscPrintf(PETSC_COMM_WORLD,"Creating matrix\n"); ? ierr = DMCreateMatrix(dm,&A);CHKERRQ(ierr); ? ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE);CHKERRQ(ierr); ? //ierr = MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE);CHKERRQ(ierr); ? PetscPrintf(PETSC_COMM_WORLD,"Assembling matrix\n"); ? ierr = assembleMatrix(dm,ctx,A);CHKERRQ(ierr); An error is generated when the assembleMatrix function tried to insert an element at position (0,1) of the matrix. When running with the -mat_view option I get the following output. matteo:~/software/petscMplex$ ./mplexMatrix -mat_view 0] Le celle sono i nodi da 0 a 3 nel DMPlex 0]? cell 0 has centroid (0.166667, 0.500000) and volume 0.250000 0]?????? has 3 faces: 9 10 11 0]? cell 1 has centroid (0.500000, 0.166667) and volume 0.250000 0]?????? has 3 faces: 12 13 9 0]? cell 2 has centroid (0.500000, 0.833333) and volume 0.250000 0]?????? has 3 faces: 10 14 15 0]? cell 3 has centroid (0.833333, 0.500000) and volume 0.250000 0]?????? has 3 faces: 14 13 16 Creating matrix Mat Object: 1 MPI processes ? type: seqaij row 0: (0, 0.) row 1: (1, 0.) row 2: (2, 0.) row 3: (3, 0.) Assembling matrix [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: New nonzero at (0,1) caused a malloc Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [0]PETSC ERROR: ./mplexMatrix on a x86_64-linux-gnu-real named signalkuppe by matteo Wed Nov 29 09:10:12 2017 [0]PETSC ERROR: Configure options --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --with-silent-rules=0 --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --with-maintainer-mode=0 --with-dependency-tracking=0 --with-debugging=0 --shared-library-extension=_real --with-clanguage=C++ --with-shared-libraries --with-pic=1 --useThreads=0 --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-blacs=1 --with-blacs-lib="-lblacsCinit-openmpi -lblacs-openmpi" --with-scalapack=1 --with-scalapack-lib=-lscalapack-openmpi --with-mumps=1 --with-mumps-include="[]" --with-mumps-lib="-ldmumps -lzmumps -lsmumps -lcmumps -lmumps_common -lpord" --with-suitesparse=1 --with-suitesparse-include=/usr/include/suitesparse --with-suitesparse-lib="-lumfpack -lamd -lcholmod -lklu" --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=-lspooles --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch --with-ptscotch-lib="-lptesmumps -lptscotch -lptscotcherr" --with-fftw=1 --with-fftw-include="[]" --with-fftw-lib="-lfftw3 -lfftw3_mpi" --with-superlu=1 --with-superlu-include=/usr/include/superlu --with-superlu-lib=-lsuperlu --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi --CXX_LINKER_FLAGS=-Wl,--no-as-needed --with-hypre=1 --with-hypre-include=/usr/include/hypre --with-hypre-lib="-lHYPRE_IJ_mv -lHYPRE_parcsr_ls -lHYPRE_sstruct_ls -lHYPRE_sstruct_mv -lHYPRE_struct_ls -lHYPRE_struct_mv -lHYPRE_utilities" --prefix=/usr/lib/petscdir/3.7.5/x86_64-linux-gnu-real 
PETSC_DIR=/build/petsc-XG7COe/petsc-3.7.5+dfsg1 --PETSC_ARCH=x86_64-linux-gnu-real CFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-XG7COe/petsc-3.7.5+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" CXXFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-XG7COe/petsc-3.7.5+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" FCFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-XG7COe/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" FFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-XG7COe/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" CPPFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-z,relro -fPIC" MAKEFLAGS=w [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() line 485 in /build/petsc-XG7COe/petsc-3.7.5+dfsg1/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: #2 MatSetValues() line 1190 in /build/petsc-XG7COe/petsc-3.7.5+dfsg1/src/mat/interface/matrix.c [0]PETSC ERROR: #3 assembleMatrix() line 150 in /home/matteo/software/petscMplex/mplexMatrix.cpp [0]PETSC ERROR: #4 main() line 202 in /home/matteo/software/petscMplex/mplexMatrix.cpp [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -mat_view [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 63. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- I guess that the matrix is created with only diagonal entries... ? (The same happens with 3.8.0 from debian experimental) Thanks, ??? Matteo -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Nov 29 05:46:44 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 Nov 2017 05:46:44 -0600 Subject: [petsc-users] preallocation after DMCreateMatrix? In-Reply-To: References: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> Message-ID: On Wed, Nov 29, 2017 at 2:26 AM, Matteo Semplice wrote: > On 25/11/2017 02:05, Matthew Knepley wrote: > > On Fri, Nov 24, 2017 at 4:21 PM, Matteo Semplice > wrote: > >> Hi. >> >> The manual for DMCreateMatrix says "Notes: This properly preallocates the >> number of nonzeros in the sparse matrix so you do not need to do it >> yourself", so I got the impression that one does not need to call the >> preallocation routine for the matrix and indeed in most examples listed in >> the manual page for DMCreateMatrix this is not done or (KSP tutorial ex4) >> it is called declaring 0 entries per row. >> >> However, if read in a mesh in a DMPlex ore create a DMDA and then call >> DMCreateMatrix, the resulting matrix errors out when I call MatSetValues. I >> have currently followed the suggestion of the error message and call >> MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE), but I'd >> like to fix this properly. >> > > It sounds like your nonzero pattern does not obey the topology. What > nonzero pattern are you trying to input? > > > Hi. > > The problem with the DMDA was a bug in our code. Sorry for the noise. > > On the other hand, with the DMPLex, I still experience problems. It's a FV > code and I reduced it to the case of the simple case of a laplacian > operator: I need non-diagonal entries at (i,j) if cell i and cell j have a > common face. 
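For readers following the thread, a serial sketch of the kind of face loop that produces exactly this cell-to-cell coupling (this is not Matteo's assembleMatrix(), which is not shown in the archive): it assumes one dof per cell, that matrix row i corresponds to cell i as in this single-process example, and it uses unit weights in place of the real finite-volume geometric factors.

    PetscInt fStart, fEnd, f;

    ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd);CHKERRQ(ierr);  /* height 1 = faces */
    for (f = fStart; f < fEnd; ++f) {
      const PetscInt *cells;
      PetscInt        ncells;
      ierr = DMPlexGetSupportSize(dm, f, &ncells);CHKERRQ(ierr);
      ierr = DMPlexGetSupport(dm, f, &cells);CHKERRQ(ierr);
      if (ncells == 2) {   /* interior face: couple the two cells that share it */
        ierr = MatSetValue(A, cells[0], cells[1], -1.0, ADD_VALUES);CHKERRQ(ierr);
        ierr = MatSetValue(A, cells[1], cells[0], -1.0, ADD_VALUES);CHKERRQ(ierr);
        ierr = MatSetValue(A, cells[0], cells[0],  1.0, ADD_VALUES);CHKERRQ(ierr);
        ierr = MatSetValue(A, cells[1], cells[1],  1.0, ADD_VALUES);CHKERRQ(ierr);
      }
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

The off-diagonal MatSetValue() calls are the ones that trigger the "new nonzero caused a malloc" error when the preallocated pattern does not include the face neighbours.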
> Here below is my code that reads in a grid of 4 cells (unit square divided > by the diagonals), create a section with 1 dof per cell, creates a matrix > and assembles it. > > ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "square1.msh", > PETSC_TRUE, &dm);CHKERRQ(ierr); > Can you show me the mesh? ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr); and then run with -dm_view ::ascii_info_detail > ierr = DMPlexSetAdjacencyUseCone(dm, PETSC_TRUE); CHKERRQ(ierr); > ierr = DMPlexSetAdjacencyUseClosure(dm, PETSC_FALSE); CHKERRQ(ierr); > This looks like the right FVM adjacency, but your matrix is diagonal it appears below. TS ex11 has an identical call, but produces the correct matrix, which is why I want to look at your mesh. Thanks, Matt > > ierr = DMPlexComputeGeometryFVM(dm, &ctx.cellgeom, &ctx.facegeom); > CHKERRQ(ierr); > PetscInt nFields = 1, numComp[1]; > ierr = DMGetDimension(dm, &ctx.dim);CHKERRQ(ierr); > /* u has 1 dof on each cell */ > numComp[0] = 1; > PetscInt *numDofs; > ierr = PetscCalloc1(nFields*(ctx.dim+1), &numDofs); CHKERRQ(ierr); > numDofs[0*(ctx.dim+1)+ctx.dim] = 1; > /* Create a PetscSection with this data layout */ > PetscSection section; > DMPlexCreateSection(dm, ctx.dim, nFields, numComp, numDofs, 0, NULL, > NULL, NULL, NULL, §ion); > ierr = PetscFree(numDofs); CHKERRQ(ierr); > PetscSectionSetFieldName(section, 0, "u"); > /* Tell the DM to use this data layout */ > DMSetDefaultSection(dm, section); > PetscSectionDestroy(§ion); > /* Print out the list of cells */ > printCells(dm,ctx); > //Assemble the system > Mat A; > PetscPrintf(PETSC_COMM_WORLD,"Creating matrix\n"); > ierr = DMCreateMatrix(dm,&A);CHKERRQ(ierr); > ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE);CHKERRQ(ierr); > //ierr = MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, > PETSC_FALSE);CHKERRQ(ierr); > PetscPrintf(PETSC_COMM_WORLD,"Assembling matrix\n"); > ierr = assembleMatrix(dm,ctx,A);CHKERRQ(ierr); > > > An error is generated when the assembleMatrix function tried to insert an > element at position (0,1) of the matrix. When running with the -mat_view > option I get the following output. > > matteo:~/software/petscMplex$ ./mplexMatrix -mat_view > 0] Le celle sono i nodi da 0 a 3 nel DMPlex > 0] cell 0 has centroid (0.166667, 0.500000) and volume 0.250000 > 0] has 3 faces: 9 10 11 > 0] cell 1 has centroid (0.500000, 0.166667) and volume 0.250000 > 0] has 3 faces: 12 13 9 > 0] cell 2 has centroid (0.500000, 0.833333) and volume 0.250000 > 0] has 3 faces: 10 14 15 > 0] cell 3 has centroid (0.833333, 0.500000) and volume 0.250000 > 0] has 3 faces: 14 13 16 > Creating matrix > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 0.) > row 1: (1, 0.) > row 2: (2, 0.) > row 3: (3, 0.) > Assembling matrix > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: New nonzero at (0,1) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > off this check > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 > [0]PETSC ERROR: ./mplexMatrix on a x86_64-linux-gnu-real named signalkuppe > by matteo Wed Nov 29 09:10:12 2017 > [0]PETSC ERROR: Configure options --build=x86_64-linux-gnu --prefix=/usr > --includedir=${prefix}/include --mandir=${prefix}/share/man > --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var > --with-silent-rules=0 --libdir=${prefix}/lib/x86_64-linux-gnu > --libexecdir=${prefix}/lib/x86_64-linux-gnu --with-maintainer-mode=0 > --with-dependency-tracking=0 --with-debugging=0 --shared-library-extension=_real > --with-clanguage=C++ --with-shared-libraries --with-pic=1 --useThreads=0 > --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi > --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-blacs=1 > --with-blacs-lib="-lblacsCinit-openmpi -lblacs-openmpi" > --with-scalapack=1 --with-scalapack-lib=-lscalapack-openmpi > --with-mumps=1 --with-mumps-include="[]" --with-mumps-lib="-ldmumps > -lzmumps -lsmumps -lcmumps -lmumps_common -lpord" --with-suitesparse=1 > --with-suitesparse-include=/usr/include/suitesparse > --with-suitesparse-lib="-lumfpack -lamd -lcholmod -lklu" --with-spooles=1 > --with-spooles-include=/usr/include/spooles --with-spooles-lib=-lspooles > --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch > --with-ptscotch-lib="-lptesmumps -lptscotch -lptscotcherr" --with-fftw=1 > --with-fftw-include="[]" --with-fftw-lib="-lfftw3 -lfftw3_mpi" > --with-superlu=1 --with-superlu-include=/usr/include/superlu > --with-superlu-lib=-lsuperlu --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi > --CXX_LINKER_FLAGS=-Wl,--no-as-needed --with-hypre=1 > --with-hypre-include=/usr/include/hypre --with-hypre-lib="-lHYPRE_IJ_mv > -lHYPRE_parcsr_ls -lHYPRE_sstruct_ls -lHYPRE_sstruct_mv -lHYPRE_struct_ls > -lHYPRE_struct_mv -lHYPRE_utilities" --prefix=/usr/lib/petscdir/3.7.5/x86_64-linux-gnu-real > PETSC_DIR=/build/petsc-XG7COe/petsc-3.7.5+dfsg1 > --PETSC_ARCH=x86_64-linux-gnu-real CFLAGS="-g -O2 > -fdebug-prefix-map=/build/petsc-XG7COe/petsc-3.7.5+dfsg1=. > -fstack-protector-strong -Wformat -Werror=format-security -fPIC" > CXXFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-XG7COe/petsc-3.7.5+dfsg1=. > -fstack-protector-strong -Wformat -Werror=format-security -fPIC" > FCFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-XG7COe/petsc-3.7.5+dfsg1=. > -fstack-protector-strong -fPIC" FFLAGS="-g -O2 -fdebug-prefix-map=/build/ > petsc-XG7COe/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" > CPPFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-z,relro -fPIC" > MAKEFLAGS=w > [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() line 485 in > /build/petsc-XG7COe/petsc-3.7.5+dfsg1/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: #2 MatSetValues() line 1190 in > /build/petsc-XG7COe/petsc-3.7.5+dfsg1/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 assembleMatrix() line 150 in /home/matteo/software/ > petscMplex/mplexMatrix.cpp > [0]PETSC ERROR: #4 main() line 202 in /home/matteo/software/ > petscMplex/mplexMatrix.cpp > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -mat_view > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 63. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. 
> You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > > I guess that the matrix is created with only diagonal entries... ? > > (The same happens with 3.8.0 from debian experimental) > > Thanks, > Matteo > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at unito.it Wed Nov 29 06:36:39 2017 From: matteo.semplice at unito.it (Matteo Semplice) Date: Wed, 29 Nov 2017 13:36:39 +0100 Subject: [petsc-users] preallocation after DMCreateMatrix? In-Reply-To: References: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> Message-ID: <0b480f6d-d642-e444-e24f-2f5e94743956@unito.it> On 29/11/2017 12:46, Matthew Knepley wrote: > On Wed, Nov 29, 2017 at 2:26 AM, Matteo Semplice > > wrote: > > On 25/11/2017 02:05, Matthew Knepley wrote: >> On Fri, Nov 24, 2017 at 4:21 PM, Matteo Semplice >> > wrote: >> >> Hi. >> >> The manual for DMCreateMatrix says "Notes: This properly >> preallocates the number of nonzeros in the sparse matrix so >> you do not need to do it yourself", so I got the impression >> that one does not need to call the preallocation routine for >> the matrix and indeed in most examples listed in the manual >> page for DMCreateMatrix this is not done or (KSP tutorial >> ex4) it is called declaring 0 entries per row. >> >> However, if read in a mesh in a DMPlex ore create a DMDA and >> then call DMCreateMatrix, the resulting matrix errors out >> when I call MatSetValues. I have currently followed the >> suggestion of the error message and call MatSetOption(A, >> MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE), but I'd like to >> fix this properly. >> >> >> It sounds like your nonzero pattern does not obey the topology. >> What nonzero pattern are you trying to input? > > Hi. > > ??? The problem with the DMDA was a bug in our code. Sorry for the > noise. > > On the other hand, with the DMPLex, I still experience problems. > It's a FV code and I reduced it to the case of the simple case of > a laplacian operator: I need non-diagonal entries at (i,j) if cell > i and cell j have a common face. > Here below is my code that reads in a grid of 4 cells (unit square > divided by the diagonals), create a section with 1 dof per cell, > creates a matrix and assembles it. > > > ? ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "square1.msh", > PETSC_TRUE, &dm);CHKERRQ(ierr); > > > Can you show me the mesh? > > ? ? ? ? ?ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr); > > and then run with -dm_view ::ascii_info_detail > > > ? ierr = DMPlexSetAdjacencyUseCone(dm, PETSC_TRUE); CHKERRQ(ierr); > ? ierr = DMPlexSetAdjacencyUseClosure(dm, PETSC_FALSE); CHKERRQ(ierr); > > > This looks like the right FVM adjacency, but your matrix is diagonal > it appears below. TS ex11 > has an identical call, but produces the correct matrix, which is why I > want to look at your mesh. I have put the DMViewFromOptions call after the SetAdjacency calls and this is the output: matteo at signalkuppe:~/software/petscMplex$ ./mplexMatrix -dm_view ::ascii_info_detail DM Object: 1 MPI processes ? 
type: plex Mesh 'DM_0x557496ad8380_0': orientation is missing cap --> base: [0] Max sizes cone: 3 support: 4 [0]: 4 ----> 9 [0]: 4 ----> 11 [0]: 4 ----> 12 [0]: 5 ----> 10 [0]: 5 ----> 11 [0]: 5 ----> 15 [0]: 6 ----> 14 [0]: 6 ----> 15 [0]: 6 ----> 16 [0]: 7 ----> 12 [0]: 7 ----> 13 [0]: 7 ----> 16 [0]: 8 ----> 9 [0]: 8 ----> 10 [0]: 8 ----> 13 [0]: 8 ----> 14 [0]: 9 ----> 0 [0]: 9 ----> 1 [0]: 10 ----> 0 [0]: 10 ----> 2 [0]: 11 ----> 0 [0]: 12 ----> 1 [0]: 13 ----> 1 [0]: 13 ----> 3 [0]: 14 ----> 2 [0]: 14 ----> 3 [0]: 15 ----> 2 [0]: 16 ----> 3 base <-- cap: [0]: 0 <---- 9 (0) [0]: 0 <---- 10 (0) [0]: 0 <---- 11 (0) [0]: 1 <---- 12 (0) [0]: 1 <---- 13 (0) [0]: 1 <---- 9 (-2) [0]: 2 <---- 10 (-2) [0]: 2 <---- 14 (0) [0]: 2 <---- 15 (0) [0]: 3 <---- 14 (-2) [0]: 3 <---- 13 (-2) [0]: 3 <---- 16 (0) [0]: 9 <---- 4 (0) [0]: 9 <---- 8 (0) [0]: 10 <---- 8 (0) [0]: 10 <---- 5 (0) [0]: 11 <---- 5 (0) [0]: 11 <---- 4 (0) [0]: 12 <---- 4 (0) [0]: 12 <---- 7 (0) [0]: 13 <---- 7 (0) [0]: 13 <---- 8 (0) [0]: 14 <---- 8 (0) [0]: 14 <---- 6 (0) [0]: 15 <---- 6 (0) [0]: 15 <---- 5 (0) [0]: 16 <---- 7 (0) [0]: 16 <---- 6 (0) coordinates with 1 fields ? field 0 with 2 components Process 0: ? (?? 4) dim? 2 offset?? 0 0. 0. ? (?? 5) dim? 2 offset?? 2 0. 1. ? (?? 6) dim? 2 offset?? 4 1. 1. ? (?? 7) dim? 2 offset?? 6 1. 0. ? (?? 8) dim? 2 offset?? 8 0.5 0.5 For the records, the mesh is loaded from the (gmsh generated) file ==== square1.msh ===== $MeshFormat 2.2 0 8 $EndMeshFormat $Nodes 5 1 0 0 0 2 0 1 0 3 1 1 0 4 1 0 0 5 0.5 0.5 0 $EndNodes $Elements 12 1 15 2 0 1 1 2 15 2 0 2 2 3 15 2 0 3 3 4 15 2 0 4 4 5 1 2 0 1 4 3 6 1 2 0 2 3 2 7 1 2 0 3 2 1 8 1 2 0 4 1 4 9 2 2 0 6 1 5 2 10 2 2 0 6 1 4 5 11 2 2 0 6 2 5 3 12 2 2 0 6 3 5 4 $EndElements =================== Thanks, ??? Matteo -------------- next part -------------- An HTML attachment was scrubbed... URL: From hbuesing at eonerc.rwth-aachen.de Thu Nov 30 06:05:25 2017 From: hbuesing at eonerc.rwth-aachen.de (Buesing, Henrik) Date: Thu, 30 Nov 2017 12:05:25 +0000 Subject: [petsc-users] Newton methods that converge all the time In-Reply-To: <128FB4CE-01C0-4E1C-A423-01AA215ACB87@mcs.anl.gov> References: <128FB4CE-01C0-4E1C-A423-01AA215ACB87@mcs.anl.gov> Message-ID: Dear Barry, I am using a pressure-enthalpy formulation, which is valid across all phase states, i.e. no variable switching. Nevertheless, I have 1) a truncate function defined with SNESLineSearchSetPreCheck, which keeps pressure and enthalpy values in physical bounds. 2) I have if statements in my FormFunction and FormJacobian. These test the current enthalpy vs. saturated water and gas enthalpies and determine the state. I could discard the SNESLineSearchSetPreCheck. Would this be better for Newton's method? Thank you! Henrik -- Dipl.-Math. Henrik B?sing Institute for Applied Geophysics and Geothermal Energy E.ON Energy Research Center RWTH Aachen University ------------------------------------------------------ Mathieustr. 10 | Tel +49 (0)241 80 49907 52074 Aachen, Germany | Fax +49 (0)241 80 49889 ------------------------------------------------------ http://www.eonerc.rwth-aachen.de/GGE hbuesing at eonerc.rwth-aachen.de ------------------------------------------------------ > -----Urspr?ngliche Nachricht----- > Von: Smith, Barry F. [mailto:bsmith at mcs.anl.gov] > Gesendet: 10 November 2017 05:09 > An: Buesing, Henrik > Cc: petsc-users > Betreff: Re: [petsc-users] Newton methods that converge all the time > > > Henrik, > > Please describe in some detail how you are handling phase change. 
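A hedged sketch of the variable-bounds alternative to the precheck truncation described above (it is suggested via SNESVISetVariableBounds() in the reply that follows). The values pmin/pmax/hmin/hmax are invented placeholders, and whether a variational-inequality solver is really appropriate depends on whether the bounds are genuine constraints or only safeguards, so this is a starting point rather than a recommendation.

    Vec xl, xu;

    ierr = VecDuplicate(x, &xl);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &xu);CHKERRQ(ierr);
    ierr = VecSet(xl, PETSC_NINFINITY);CHKERRQ(ierr);
    ierr = VecSet(xu, PETSC_INFINITY);CHKERRQ(ierr);
    /* overwrite the pressure and enthalpy components with pmin/pmax and
       hmin/hmax, e.g. via VecStrideSet() or DMDAVecGetArrayDOF()            */
    ierr = SNESSetType(snes, SNESVINEWTONRSLS);CHKERRQ(ierr);
    ierr = SNESVISetVariableBounds(snes, xl, xu);CHKERRQ(ierr);
    ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);
    ierr = VecDestroy(&xl);CHKERRQ(ierr);
    ierr = VecDestroy(&xu);CHKERRQ(ierr);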
If > you have if () tests of any sort in your FormFunction() or > FormJacobian() this can kill Newton's method. If you are using "variable > switching" this WILL kill Newtons' method. Are you monkeying with phase > definitions in TSPostStep or with SNESLineSearchSetPostCheck(). This > will also kill Newton's method. > > Barry > > > > On Nov 7, 2017, at 3:19 AM, Buesing, Henrik aachen.de> wrote: > > > > Dear all, > > > > I am solving a system of nonlinear, transient PDEs. I am using > Newton?s method in every time step to solve the nonlinear algebraic > equations. Of course, Newton?s method only converges if the initial > guess is sufficiently close to the solution. > > > > This is often not the case and Newton?s method diverges. Then, I > reduce the time step and try again. This can become prohibitively > costly, if the time steps get very small. I am thus looking for variants > of Newton?s method, which have a bigger convergence radius or ideally > converge all the time. > > > > I tried out the pseudo-timestepping described in > http://www.mcs.anl.gov/petsc/petsc- > current/src/ts/examples/tutorials/ex1f.F.html. > > > > However, this does converge even worse. I am seeing breakdown when I > have phase changes (e.g. liquid to two-phase). > > > > I was under the impression that pseudo-timestepping should converge > better. Thus, my question: > > > > Am I doing something wrong or is it possible that Newton?s method > converges and pseudo-timestepping does not? > > > > Thank you for any insight on this. > > > > Henrik > > > > > > > > > > -- > > Dipl.-Math. Henrik B?sing > > Institute for Applied Geophysics and Geothermal Energy E.ON Energy > > Research Center RWTH Aachen University > > ------------------------------------------------------ > > Mathieustr. 10 | Tel +49 (0)241 80 49907 > > 52074 Aachen, Germany | Fax +49 (0)241 80 49889 > > ------------------------------------------------------ > > http://www.eonerc.rwth-aachen.de/GGE > > hbuesing at eonerc.rwth-aachen.de > > ------------------------------------------------------ From knepley at gmail.com Thu Nov 30 06:10:35 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 Nov 2017 06:10:35 -0600 Subject: [petsc-users] Newton methods that converge all the time In-Reply-To: References: <128FB4CE-01C0-4E1C-A423-01AA215ACB87@mcs.anl.gov> Message-ID: On Thu, Nov 30, 2017 at 6:05 AM, Buesing, Henrik < hbuesing at eonerc.rwth-aachen.de> wrote: > Dear Barry, > > I am using a pressure-enthalpy formulation, which is valid across all > phase states, i.e. no variable switching. Nevertheless, I have > > 1) a truncate function defined with SNESLineSearchSetPreCheck, which keeps > pressure and enthalpy values in physical bounds. > You should be able to replace this using http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESVISetVariableBounds.html and use the SNESVI solver. > 2) I have if statements in my FormFunction and FormJacobian. These test > the current enthalpy vs. saturated water and gas enthalpies and determine > the state. > It sounds like the residual function you are using could be non-smooth here. This could give you problems if the solution is near the switch, however sometimes it will still converge but linearly instead of quadratically. Thanks, Matt > I could discard the SNESLineSearchSetPreCheck. Would this be better for > Newton's method? > > Thank you! > > Henrik > > > -- > Dipl.-Math. 
Henrik B?sing > Institute for Applied Geophysics and Geothermal Energy > E.ON Energy Research Center > RWTH Aachen University > ------------------------------------------------------ > Mathieustr. 10 | Tel +49 (0)241 80 49907 > 52074 Aachen, Germany | Fax +49 (0)241 80 49889 > ------------------------------------------------------ > http://www.eonerc.rwth-aachen.de/GGE > hbuesing at eonerc.rwth-aachen.de > ------------------------------------------------------ > > > -----Urspr?ngliche Nachricht----- > > Von: Smith, Barry F. [mailto:bsmith at mcs.anl.gov] > > Gesendet: 10 November 2017 05:09 > > An: Buesing, Henrik > > Cc: petsc-users > > Betreff: Re: [petsc-users] Newton methods that converge all the time > > > > > > Henrik, > > > > Please describe in some detail how you are handling phase change. If > > you have if () tests of any sort in your FormFunction() or > > FormJacobian() this can kill Newton's method. If you are using "variable > > switching" this WILL kill Newtons' method. Are you monkeying with phase > > definitions in TSPostStep or with SNESLineSearchSetPostCheck(). This > > will also kill Newton's method. > > > > Barry > > > > > > > On Nov 7, 2017, at 3:19 AM, Buesing, Henrik > aachen.de> wrote: > > > > > > Dear all, > > > > > > I am solving a system of nonlinear, transient PDEs. I am using > > Newton?s method in every time step to solve the nonlinear algebraic > > equations. Of course, Newton?s method only converges if the initial > > guess is sufficiently close to the solution. > > > > > > This is often not the case and Newton?s method diverges. Then, I > > reduce the time step and try again. This can become prohibitively > > costly, if the time steps get very small. I am thus looking for variants > > of Newton?s method, which have a bigger convergence radius or ideally > > converge all the time. > > > > > > I tried out the pseudo-timestepping described in > > http://www.mcs.anl.gov/petsc/petsc- > > current/src/ts/examples/tutorials/ex1f.F.html. > > > > > > However, this does converge even worse. I am seeing breakdown when I > > have phase changes (e.g. liquid to two-phase). > > > > > > I was under the impression that pseudo-timestepping should converge > > better. Thus, my question: > > > > > > Am I doing something wrong or is it possible that Newton?s method > > converges and pseudo-timestepping does not? > > > > > > Thank you for any insight on this. > > > > > > Henrik > > > > > > > > > > > > > > > -- > > > Dipl.-Math. Henrik B?sing > > > Institute for Applied Geophysics and Geothermal Energy E.ON Energy > > > Research Center RWTH Aachen University > > > ------------------------------------------------------ > > > Mathieustr. 10 | Tel +49 (0)241 80 49907 > > > 52074 Aachen, Germany | Fax +49 (0)241 80 49889 > > > ------------------------------------------------------ > > > http://www.eonerc.rwth-aachen.de/GGE > > > hbuesing at eonerc.rwth-aachen.de > > > ------------------------------------------------------ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From giacomo.mulas84 at gmail.com Thu Nov 30 06:30:26 2017 From: giacomo.mulas84 at gmail.com (Giacomo Mulas) Date: Thu, 30 Nov 2017 13:30:26 +0100 (CET) Subject: [petsc-users] consistence of PETSC/SLEPC with MPI, BLACS, SCALAPACK calls... 
In-Reply-To: References: <39CC65C5-03D0-4C5A-A008-C13380DF16AA@dsic.upv.es> Message-ID: On Mon, 27 Nov 2017, Satish Balay wrote: > These questions are more pertinant to MKL - i.e what interface does > MKL ilp64 blacs library provide for Cblacs_get() etc.. I don't know if it's more pertinent, but indeed it is pertinent also to MKL. The pertinence with SLEPC/PETSC is that my code _also_ uses SLEPC/PETSC, and I run it both on an architecture in which SLEPC/PETSC is compiled against the ilp64 mkl and on architectures in which SLEPC/PETSC is compiled against stock standard scalapack/blacs/lapack/blas libs. > > The following url has some info - and some references to ilp64 MPI > https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/474149 thanks for the link. > [PETSc is not tested with ilp64 MPI] well, the version of the code that does not yet use scalapack but only slepc appears to work fine with it. Bye Giacomo -- _________________________________________________________________ Giacomo Mulas _________________________________________________________________ INAF - Osservatorio Astronomico di Cagliari via della scienza 5 - 09047 Selargius (CA) tel. +39 070 71180255 mob. : +39 329 6603810 _________________________________________________________________ "When the storms are raging around you, stay right where you are" (Freddy Mercury) _________________________________________________________________ From knepley at gmail.com Thu Nov 30 06:46:38 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 Nov 2017 06:46:38 -0600 Subject: [petsc-users] preallocation after DMCreateMatrix? In-Reply-To: <0b480f6d-d642-e444-e24f-2f5e94743956@unito.it> References: <33f753de-e783-03a0-711a-510a88389cb7@unito.it> <0b480f6d-d642-e444-e24f-2f5e94743956@unito.it> Message-ID: Thanks for finding this bug. I guess no one has been making matrices with FVM. I will fix this internally, but here is a workaround which makes the code go for now. Thanks, Matt On Wed, Nov 29, 2017 at 6:36 AM, Matteo Semplice wrote: > > > On 29/11/2017 12:46, Matthew Knepley wrote: > > On Wed, Nov 29, 2017 at 2:26 AM, Matteo Semplice > wrote: > >> On 25/11/2017 02:05, Matthew Knepley wrote: >> >> On Fri, Nov 24, 2017 at 4:21 PM, Matteo Semplice < >> matteo.semplice at unito.it> wrote: >> >>> Hi. >>> >>> The manual for DMCreateMatrix says "Notes: This properly preallocates >>> the number of nonzeros in the sparse matrix so you do not need to do it >>> yourself", so I got the impression that one does not need to call the >>> preallocation routine for the matrix and indeed in most examples listed in >>> the manual page for DMCreateMatrix this is not done or (KSP tutorial ex4) >>> it is called declaring 0 entries per row. >>> >>> However, if read in a mesh in a DMPlex ore create a DMDA and then call >>> DMCreateMatrix, the resulting matrix errors out when I call MatSetValues. I >>> have currently followed the suggestion of the error message and call >>> MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE), but I'd >>> like to fix this properly. >>> >> >> It sounds like your nonzero pattern does not obey the topology. What >> nonzero pattern are you trying to input? >> >> >> Hi. >> >> The problem with the DMDA was a bug in our code. Sorry for the noise. >> >> On the other hand, with the DMPLex, I still experience problems. It's a >> FV code and I reduced it to the case of the simple case of a laplacian >> operator: I need non-diagonal entries at (i,j) if cell i and cell j have a >> common face. 
>> Here below is my code that reads in a grid of 4 cells (unit square >> divided by the diagonals), create a section with 1 dof per cell, creates a >> matrix and assembles it. >> > >> ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "square1.msh", >> PETSC_TRUE, &dm);CHKERRQ(ierr); >> > > Can you show me the mesh? > > ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr); > > and then run with -dm_view ::ascii_info_detail > > >> ierr = DMPlexSetAdjacencyUseCone(dm, PETSC_TRUE); CHKERRQ(ierr); >> ierr = DMPlexSetAdjacencyUseClosure(dm, PETSC_FALSE); CHKERRQ(ierr); >> > > This looks like the right FVM adjacency, but your matrix is diagonal it > appears below. TS ex11 > has an identical call, but produces the correct matrix, which is why I > want to look at your mesh. > > > I have put the DMViewFromOptions call after the SetAdjacency calls and > this is the output: > > matteo at signalkuppe:~/software/petscMplex$ ./mplexMatrix -dm_view > ::ascii_info_detail > DM Object: 1 MPI processes > type: plex > Mesh 'DM_0x557496ad8380_0': > orientation is missing > cap --> base: > [0] Max sizes cone: 3 support: 4 > [0]: 4 ----> 9 > [0]: 4 ----> 11 > [0]: 4 ----> 12 > [0]: 5 ----> 10 > [0]: 5 ----> 11 > [0]: 5 ----> 15 > [0]: 6 ----> 14 > [0]: 6 ----> 15 > [0]: 6 ----> 16 > [0]: 7 ----> 12 > [0]: 7 ----> 13 > [0]: 7 ----> 16 > [0]: 8 ----> 9 > [0]: 8 ----> 10 > [0]: 8 ----> 13 > [0]: 8 ----> 14 > [0]: 9 ----> 0 > [0]: 9 ----> 1 > [0]: 10 ----> 0 > [0]: 10 ----> 2 > [0]: 11 ----> 0 > [0]: 12 ----> 1 > [0]: 13 ----> 1 > [0]: 13 ----> 3 > [0]: 14 ----> 2 > [0]: 14 ----> 3 > [0]: 15 ----> 2 > [0]: 16 ----> 3 > base <-- cap: > [0]: 0 <---- 9 (0) > [0]: 0 <---- 10 (0) > [0]: 0 <---- 11 (0) > [0]: 1 <---- 12 (0) > [0]: 1 <---- 13 (0) > [0]: 1 <---- 9 (-2) > [0]: 2 <---- 10 (-2) > [0]: 2 <---- 14 (0) > [0]: 2 <---- 15 (0) > [0]: 3 <---- 14 (-2) > [0]: 3 <---- 13 (-2) > [0]: 3 <---- 16 (0) > [0]: 9 <---- 4 (0) > [0]: 9 <---- 8 (0) > [0]: 10 <---- 8 (0) > [0]: 10 <---- 5 (0) > [0]: 11 <---- 5 (0) > [0]: 11 <---- 4 (0) > [0]: 12 <---- 4 (0) > [0]: 12 <---- 7 (0) > [0]: 13 <---- 7 (0) > [0]: 13 <---- 8 (0) > [0]: 14 <---- 8 (0) > [0]: 14 <---- 6 (0) > [0]: 15 <---- 6 (0) > [0]: 15 <---- 5 (0) > [0]: 16 <---- 7 (0) > [0]: 16 <---- 6 (0) > coordinates with 1 fields > field 0 with 2 components > Process 0: > ( 4) dim 2 offset 0 0. 0. > ( 5) dim 2 offset 2 0. 1. > ( 6) dim 2 offset 4 1. 1. > ( 7) dim 2 offset 6 1. 0. > ( 8) dim 2 offset 8 0.5 0.5 > > For the records, the mesh is loaded from the (gmsh generated) file > > ==== square1.msh ===== > $MeshFormat > 2.2 0 8 > $EndMeshFormat > $Nodes > 5 > 1 0 0 0 > 2 0 1 0 > 3 1 1 0 > 4 1 0 0 > 5 0.5 0.5 0 > $EndNodes > $Elements > 12 > 1 15 2 0 1 1 > 2 15 2 0 2 2 > 3 15 2 0 3 3 > 4 15 2 0 4 4 > 5 1 2 0 1 4 3 > 6 1 2 0 2 3 2 > 7 1 2 0 3 2 1 > 8 1 2 0 4 1 4 > 9 2 2 0 6 1 5 2 > 10 2 2 0 6 1 4 5 > 11 2 2 0 6 2 5 3 > 12 2 2 0 6 3 5 4 > $EndElements > =================== > > Thanks, > Matteo > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: testfvm.c Type: text/x-csrc Size: 2063 bytes Desc: not available URL: From hbuesing at eonerc.rwth-aachen.de Thu Nov 30 07:41:19 2017 From: hbuesing at eonerc.rwth-aachen.de (Buesing, Henrik) Date: Thu, 30 Nov 2017 13:41:19 +0000 Subject: [petsc-users] Newton methods that converge all the time In-Reply-To: References: Message-ID: Dear Matt, I wanted to try out NGMRES. I am using options [1] from your tutorial slides. This works fine with SNES ex19, but fails with error [2] when I use my code. When I use my code with SNES options ?-snes_type newtonls -snes_linesearch_type l2? it runs fine. Is this me doing something wrong in FormJacobian or what could this be? Is there a SNES example with MatSetValuesStencil I could try with NGMRES? Thank you! Henrik [1] -snes_type ngmres -npc_snes_max_it 1 -snes_converged_reason -npc_snes_type fas -npc_fas_coarse_snes_converged_reason -npc_fas_levels_snes_type newtonls -npc_fas_levels_snes_max_it 6 -npc_fas_levels_snes_linesearch_type basic -npc_fas_levels_snes_max_linear_solve_fail 30 -npc_fas_levels_ksp_max_it 20 -npc_fas_levels_snes_converged_reason -npc_fas_coarse_snes_linesearch_type basic [2] [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Null argument, when expecting valid pointer [0]PETSC ERROR: Null Object: Parameter # 1 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.8.2, unknown [0]PETSC ERROR: /rwthfs/rz/cluster/work/hb111949/Descramble/1D_Model/steam/JUBE/ngmres/debug/shem_fw64gnu_steam.x on a arch-linux2-c-debug named linuxbmc0004.rz.RWTH-Aachen.DE by hb111949 Thu Nov 30 14:35:56 2017 [0]PETSC ERROR: Configure options --download-fblaslapack --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --download-hypre --download-superlu_dist --download-suitesparse --download-scalapack --download-blacs --download-hdf5 --download-parmetis --download-metis --with-debugging=1 --download-mumps [0]PETSC ERROR: #1 ISLocalToGlobalMappingApply() line 639 in /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/vec/is/utils/isltog.c [0]PETSC ERROR: #2 MatSetValuesLocal() line 2139 in /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #3 MatSetValuesStencil() line 1550 in /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/mat/interface/matrix.c -- Dipl.-Math. Henrik B?sing Institute for Applied Geophysics and Geothermal Energy E.ON Energy Research Center RWTH Aachen University ------------------------------------------------------ Mathieustr. 10 | Tel +49 (0)241 80 49907 52074 Aachen, Germany | Fax +49 (0)241 80 49889 ------------------------------------------------------ http://www.eonerc.rwth-aachen.de/GGE hbuesing at eonerc.rwth-aachen.de ------------------------------------------------------ Von: Matthew Knepley [mailto:knepley at gmail.com] Gesendet: 07 November 2017 12:54 An: Buesing, Henrik Cc: petsc-users Betreff: Re: [petsc-users] Newton methods that converge all the time On Tue, Nov 7, 2017 at 4:19 AM, Buesing, Henrik > wrote: Dear all, I am solving a system of nonlinear, transient PDEs. I am using Newton?s method in every time step to solve the nonlinear algebraic equations. Of course, Newton?s method only converges if the initial guess is sufficiently close to the solution. This is often not the case and Newton?s method diverges. Then, I reduce the time step and try again. This can become prohibitively costly, if the time steps get very small. 
I am thus looking for variants of Newton?s method, which have a bigger convergence radius or ideally converge all the time. I tried out the pseudo-timestepping described in http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex1f.F.html. However, this does converge even worse. I am seeing breakdown when I have phase changes (e.g. liquid to two-phase). I was under the impression that pseudo-timestepping should converge better. Thus, my question: Am I doing something wrong or is it possible that Newton?s method converges and pseudo-timestepping does not? Thank you for any insight on this. Hi Hendrik, I would try using NGMRES as a nonlinear preconditioner. I have an example in my tutorial slides for using it with SNES ex19. I hope this will work because I suspect that around the phase boundary Newton directions are noisy, since sometimes you step into the other phase. NGMRES takes a few directions (you set the m) and then picks the best one. Hopefully this helps, Matt Henrik -- Dipl.-Math. Henrik B?sing Institute for Applied Geophysics and Geothermal Energy E.ON Energy Research Center RWTH Aachen University ------------------------------------------------------ Mathieustr. 10 | Tel +49 (0)241 80 49907 52074 Aachen, Germany | Fax +49 (0)241 80 49889 ------------------------------------------------------ http://www.eonerc.rwth-aachen.de/GGE hbuesing at eonerc.rwth-aachen.de ------------------------------------------------------ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Nov 30 07:49:05 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 Nov 2017 07:49:05 -0600 Subject: [petsc-users] Newton methods that converge all the time In-Reply-To: References: Message-ID: On Thu, Nov 30, 2017 at 7:41 AM, Buesing, Henrik < hbuesing at eonerc.rwth-aachen.de> wrote: > Dear Matt, > > > > I wanted to try out NGMRES. I am using options [1] from your tutorial > slides. This works fine with SNES ex19, but fails with error [2] when I use > my code. When I use my code with SNES options ?-snes_type newtonls > -snes_linesearch_type l2? it runs fine. > > > > Is this me doing something wrong in FormJacobian or what could this be? Is > there a SNES example with MatSetValuesStencil I could try with NGMRES? > 1) There should be more to the stack frame. I assume you call MatSetValuesStencil(). Do you use PetscFunctionBegin/Return() in that function? It would add to the stack 2) I am guessing that you are using a DMDA. There should be an ISL2G there automatically. It does not make sense to me that its gone, and ex19 works that way. The first thing to do to track it down is to get the full stack. Then I think we probably need an example small enough to run here to track through everything, unless you are good with teh debugger. Thanks, Matt > Thank you! 
> Henrik > > > > > > > > [1] > > > > -snes_type ngmres -npc_snes_max_it 1 -snes_converged_reason -npc_snes_type > fas -npc_fas_coarse_snes_converged_reason -npc_fas_levels_snes_type > newtonls -npc_fas_levels_snes_max_it 6 -npc_fas_levels_snes_linesearch_type > basic -npc_fas_levels_snes_max_linear_solve_fail 30 > -npc_fas_levels_ksp_max_it 20 -npc_fas_levels_snes_converged_reason > -npc_fas_coarse_snes_linesearch_type basic > > > > [2] > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Null argument, when expecting valid pointer > > [0]PETSC ERROR: Null Object: Parameter # 1 > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.8.2, unknown > > [0]PETSC ERROR: /rwthfs/rz/cluster/work/hb111949/Descramble/1D_Model/ > steam/JUBE/ngmres/debug/shem_fw64gnu_steam.x on a arch-linux2-c-debug > named linuxbmc0004.rz.RWTH-Aachen.DE by hb111949 Thu Nov 30 14:35:56 2017 > > [0]PETSC ERROR: Configure options --download-fblaslapack --with-cc=mpicc > --with-fc=mpif90 --with-cxx=mpicxx --download-hypre --download-superlu_dist > --download-suitesparse --download-scalapack --download-blacs > --download-hdf5 --download-parmetis --download-metis --with-debugging=1 > --download-mumps > > [0]PETSC ERROR: #1 ISLocalToGlobalMappingApply() line 639 in > /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/vec/is/utils/isltog.c > > [0]PETSC ERROR: #2 MatSetValuesLocal() line 2139 in > /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: #3 MatSetValuesStencil() line 1550 in > /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/mat/interface/matrix.c > > > > > > -- > > Dipl.-Math. Henrik B?sing > > Institute for Applied Geophysics and Geothermal Energy > > E.ON Energy Research Center > > RWTH Aachen University > > ------------------------------------------------------ > > Mathieustr. 10 > > | Tel +49 (0)241 80 49907 <+49%20241%208049907> > > 52074 Aachen, Germany | Fax +49 (0)241 80 49889 > <+49%20241%208049889> > > ------------------------------------------------------ > > http://www.eonerc.rwth-aachen.de/GGE > > hbuesing at eonerc.rwth-aachen.de > > ------------------------------------------------------ > > > > *Von:* Matthew Knepley [mailto:knepley at gmail.com] > *Gesendet:* 07 November 2017 12:54 > *An:* Buesing, Henrik > *Cc:* petsc-users > *Betreff:* Re: [petsc-users] Newton methods that converge all the time > > > > On Tue, Nov 7, 2017 at 4:19 AM, Buesing, Henrik < > hbuesing at eonerc.rwth-aachen.de> wrote: > > Dear all, > > > > I am solving a system of nonlinear, transient PDEs. I am using Newton?s > method in every time step to solve the nonlinear algebraic equations. Of > course, Newton?s method only converges if the initial guess is sufficiently > close to the solution. > > This is often not the case and Newton?s method diverges. Then, I reduce > the time step and try again. This can become prohibitively costly, if the > time steps get very small. I am thus looking for variants of Newton?s > method, which have a bigger convergence radius or ideally converge all the > time. > > > > I tried out the pseudo-timestepping described in > http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/ > tutorials/ex1f.F.html. > > > > However, this does converge even worse. I am seeing breakdown when I have > phase changes (e.g. liquid to two-phase). 
> > > > I was under the impression that pseudo-timestepping should converge > better. Thus, my question: > > Am I doing something wrong or is it possible that Newton?s method > converges and pseudo-timestepping does not? > > Thank you for any insight on this. > > > > Hi Hendrik, > > > > I would try using NGMRES as a nonlinear preconditioner. I have an example > in my tutorial slides for using it with SNES ex19. > > I hope this will work because I suspect that around the phase boundary > Newton directions are noisy, since sometimes you > > step into the other phase. NGMRES takes a few directions (you set the m) > and then picks the best one. > > > > Hopefully this helps, > > > > Matt > > > > Henrik > > > > > > -- > > Dipl.-Math. Henrik B?sing > > Institute for Applied Geophysics and Geothermal Energy > > E.ON Energy Research Center > > RWTH Aachen University > > ------------------------------------------------------ > > Mathieustr. 10 > > | Tel +49 (0)241 80 49907 <+49%20241%208049907> > > 52074 Aachen, Germany | Fax +49 (0)241 80 49889 > <+49%20241%208049889> > > ------------------------------------------------------ > > http://www.eonerc.rwth-aachen.de/GGE > > hbuesing at eonerc.rwth-aachen.de > > ------------------------------------------------------ > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hbuesing at eonerc.rwth-aachen.de Thu Nov 30 08:22:50 2017 From: hbuesing at eonerc.rwth-aachen.de (Buesing, Henrik) Date: Thu, 30 Nov 2017 14:22:50 +0000 Subject: [petsc-users] Newton methods that converge all the time In-Reply-To: References: Message-ID: 1) There should be more to the stack frame. I assume you call MatSetValuesStencil(). Do you use PetscFunctionBegin/Return() in that function? It would add to the stack [Buesing, Henrik] Yes, I call MatSetValuesStencil(). This is the full error. I abort the program after setting MatSetValuesStencil() once. If I do not do this, the program runs on, but the Jacobian is assembled wrong. The program terminates reaching minimum time step size. 2) I am guessing that you are using a DMDA. There should be an ISL2G there automatically. It does not make sense to me that its gone, and ex19 works that way. The first thing to do to track it down is to get the full stack. Then I think we probably need an example small enough to run here to track through everything, unless you are good with teh debugger. [Buesing, Henrik] Yes, I am using a DMDA. The program does not crash abnormally, just does not assemble the Jacobian in a correct way. How would I get the full stack? Henrik Thanks, Matt Thank you! 
Henrik

[1]

-snes_type ngmres -npc_snes_max_it 1 -snes_converged_reason -npc_snes_type fas
-npc_fas_coarse_snes_converged_reason -npc_fas_levels_snes_type newtonls
-npc_fas_levels_snes_max_it 6 -npc_fas_levels_snes_linesearch_type basic
-npc_fas_levels_snes_max_linear_solve_fail 30 -npc_fas_levels_ksp_max_it 20
-npc_fas_levels_snes_converged_reason -npc_fas_coarse_snes_linesearch_type basic

[2]

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Null argument, when expecting valid pointer
[0]PETSC ERROR: Null Object: Parameter # 1
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.8.2, unknown
[0]PETSC ERROR: /rwthfs/rz/cluster/work/hb111949/Descramble/1D_Model/steam/JUBE/ngmres/debug/shem_fw64gnu_steam.x on a arch-linux2-c-debug named linuxbmc0004.rz.RWTH-Aachen.DE by hb111949 Thu Nov 30 14:35:56 2017
[0]PETSC ERROR: Configure options --download-fblaslapack --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --download-hypre --download-superlu_dist --download-suitesparse --download-scalapack --download-blacs --download-hdf5 --download-parmetis --download-metis --with-debugging=1 --download-mumps
[0]PETSC ERROR: #1 ISLocalToGlobalMappingApply() line 639 in /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/vec/is/utils/isltog.c
[0]PETSC ERROR: #2 MatSetValuesLocal() line 2139 in /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: #3 MatSetValuesStencil() line 1550 in /rwthfs/rz/cluster/home/hb111949/Code/petsc/src/mat/interface/matrix.c

--
Dipl.-Math. Henrik Büsing
Institute for Applied Geophysics and Geothermal Energy
E.ON Energy Research Center
RWTH Aachen University
------------------------------------------------------
Mathieustr. 10        | Tel +49 (0)241 80 49907
52074 Aachen, Germany | Fax +49 (0)241 80 49889
------------------------------------------------------
http://www.eonerc.rwth-aachen.de/GGE
hbuesing at eonerc.rwth-aachen.de
------------------------------------------------------

From: Matthew Knepley [mailto:knepley at gmail.com]
Sent: 07 November 2017 12:54
To: Buesing, Henrik
Cc: petsc-users
Subject: Re: [petsc-users] Newton methods that converge all the time

On Tue, Nov 7, 2017 at 4:19 AM, Buesing, Henrik wrote:

Dear all,

I am solving a system of nonlinear, transient PDEs. I am using Newton's
method in every time step to solve the nonlinear algebraic equations. Of
course, Newton's method only converges if the initial guess is sufficiently
close to the solution.

This is often not the case and Newton's method diverges. Then, I reduce the
time step and try again. This can become prohibitively costly if the time
steps get very small. I am thus looking for variants of Newton's method
which have a bigger convergence radius or, ideally, converge all the time.

I tried out the pseudo-timestepping described in
http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex1f.F.html.

However, this converges even worse. I am seeing breakdown when I have phase
changes (e.g. liquid to two-phase).

I was under the impression that pseudo-timestepping should converge better.
Thus, my question: Am I doing something wrong, or is it possible that
Newton's method converges and pseudo-timestepping does not?

Thank you for any insight on this.

Hi Henrik,

I would try using NGMRES as a nonlinear preconditioner. I have an example in
my tutorial slides for using it with SNES ex19. I hope this will work because
I suspect that around the phase boundary Newton directions are noisy, since
sometimes you step into the other phase. NGMRES takes a few directions (you
set the m) and then picks the best one.

Hopefully this helps,

   Matt
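For readers who want to wire up this composition in code rather than only through the options in [1], a minimal sketch is below. It is not code from this thread and not the ex19 tutorial example: the 1-D Bratu-type residual, the grid size, and all names are illustrative assumptions; only the SNESNGMRES/SNESFAS composition mirrors "-snes_type ngmres -npc_snes_type fas" above. PETSc 3.8-era error handling (ierr/CHKERRQ) is used, and the Jacobian is left to finite-difference coloring on the DMDA.

#include <petscsnes.h>
#include <petscdmda.h>

typedef struct { PetscReal lambda; } AppCtx;

/* Residual of a 1-D Bratu-type problem u'' + lambda*exp(u) = 0, u(0)=u(1)=0.
   The DMDA passes a ghosted local array, so u[i-1] and u[i+1] are valid. */
static PetscErrorCode FormFunctionLocal(DMDALocalInfo *info, PetscScalar *u,
                                        PetscScalar *f, void *ptr)
{
  AppCtx    *ctx = (AppCtx *)ptr;
  PetscReal  h   = 1.0 / (PetscReal)(info->mx - 1);
  PetscInt   i;

  PetscFunctionBeginUser;
  for (i = info->xs; i < info->xs + info->xm; i++) {
    if (i == 0 || i == info->mx - 1) f[i] = u[i];   /* Dirichlet boundaries */
    else f[i] = (u[i-1] - 2.0*u[i] + u[i+1])/(h*h) + ctx->lambda*PetscExpScalar(u[i]);
  }
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  DM             da;
  SNES           snes, npc;
  Vec            u;
  AppCtx         ctx;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ctx.lambda = 1.0;

  ierr = DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 65, 1, 1, NULL, &da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);

  ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
  ierr = SNESSetDM(snes, da);CHKERRQ(ierr);
  ierr = DMDASNESSetFunctionLocal(da, INSERT_VALUES,
           (PetscErrorCode (*)(DMDALocalInfo *, void *, void *, void *))FormFunctionLocal,
           &ctx);CHKERRQ(ierr);

  /* Programmatic equivalent of "-snes_type ngmres -npc_snes_type fas";
     the remaining -npc_fas_* options from [1] can stay on the command line. */
  ierr = SNESSetType(snes, SNESNGMRES);CHKERRQ(ierr);
  ierr = SNESGetNPC(snes, &npc);CHKERRQ(ierr);
  ierr = SNESSetType(npc, SNESFAS);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);

  ierr = DMCreateGlobalVector(da, &u);CHKERRQ(ierr);
  ierr = VecSet(u, 0.0);CHKERRQ(ierr);
  ierr = SNESSolve(snes, NULL, u);CHKERRQ(ierr);   /* Jacobian by FD coloring from the DMDA */

  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}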
From knepley at gmail.com  Thu Nov 30 08:35:46 2017
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 30 Nov 2017 08:35:46 -0600
Subject: [petsc-users] Newton methods that converge all the time
In-Reply-To:
References:
Message-ID:

On Thu, Nov 30, 2017 at 8:22 AM, Buesing, Henrik <
hbuesing at eonerc.rwth-aachen.de> wrote:

> [...]
>
> [Buesing, Henrik] Yes, I am using a DMDA. The program does not crash
> abnormally, it just does not assemble the Jacobian in a correct way. How
> would I get the full stack?
>

If you can run in the debugger, like gdb, using -start_in_debugger, just
type 'where' when you hit the error.

  Thanks,

     Matt

> [...]
From hbuesing at eonerc.rwth-aachen.de  Thu Nov 30 08:52:25 2017
From: hbuesing at eonerc.rwth-aachen.de (Buesing, Henrik)
Date: Thu, 30 Nov 2017 14:52:25 +0000
Subject: [petsc-users] Newton methods that converge all the time
In-Reply-To:
References:
Message-ID:

There should be an ISL2G there automatically. It does not make sense to me
that it's gone,

[Buesing, Henrik] You are right! I was not calling DMGlobalToLocal. I was
operating on the global arrays. Thus, it was not there. Works now!

Thank you!
Henrik

PS: I am using Fortran, so no PetscFunctionBegin/Return() there. And yes, I
am not good with the debugger, as you already figured.

[...]
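For anyone hitting the same null ISLocalToGlobalMapping error, a sketch of the usual DMDA Jacobian calling sequence is below, in C rather than Fortran (the Fortran interface broadly mirrors it). The 1-D, one-dof stencil and all names are illustrative assumptions, not code from this thread. The points it illustrates: the global state is scattered to a ghosted local vector with DMGlobalToLocalBegin/End before it is read; the matrix handed to MatSetValuesStencil should come from the DM (DMCreateMatrix, or implicitly via SNESSetDM) so the stencil-to-global mapping exists; and, in C, wrapping the routine in PetscFunctionBeginUser/PetscFunctionReturn is what makes it appear in PETSc's error stack, which is the point raised above. A routine like this would be registered with SNESSetJacobian(snes, J, J, FormJacobian, &ctx).

#include <petscsnes.h>
#include <petscdmda.h>

typedef struct { PetscReal lambda; } AppCtx;

/* Jacobian of f[i] = (u[i-1] - 2 u[i] + u[i+1])/h^2 + lambda*exp(u[i]). */
static PetscErrorCode FormJacobian(SNES snes, Vec X, Mat J, Mat P, void *ptr)
{
  AppCtx        *ctx = (AppCtx *)ptr;
  DM             da;
  DMDALocalInfo  info;
  Vec            Xloc;
  PetscScalar   *u, v[3];
  MatStencil     row, col[3];
  PetscReal      h;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;                      /* adds this routine to the error stack */
  ierr = SNESGetDM(snes, &da);CHKERRQ(ierr);
  ierr = DMDAGetLocalInfo(da, &info);CHKERRQ(ierr);
  h    = 1.0 / (PetscReal)(info.mx - 1);

  /* Scatter the global state into a ghosted local vector before reading it. */
  ierr = DMGetLocalVector(da, &Xloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
  ierr = DMDAVecGetArrayRead(da, Xloc, &u);CHKERRQ(ierr);

  for (i = info.xs; i < info.xs + info.xm; i++) {
    row.i = i;
    if (i == 0 || i == info.mx - 1) {          /* Dirichlet rows */
      v[0] = 1.0;
      ierr = MatSetValuesStencil(P, 1, &row, 1, &row, v, INSERT_VALUES);CHKERRQ(ierr);
    } else {
      col[0].i = i - 1; v[0] = 1.0/(h*h);
      col[1].i = i;     v[1] = -2.0/(h*h) + ctx->lambda*PetscExpScalar(u[i]);
      col[2].i = i + 1; v[2] = 1.0/(h*h);
      ierr = MatSetValuesStencil(P, 1, &row, 3, col, v, INSERT_VALUES);CHKERRQ(ierr);
    }
  }

  ierr = DMDAVecRestoreArrayRead(da, Xloc, &u);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da, &Xloc);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  if (J != P) {
    ierr = MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}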
From balay at mcs.anl.gov  Thu Nov 30 09:59:43 2017
From: balay at mcs.anl.gov (Satish Balay)
Date: Thu, 30 Nov 2017 09:59:43 -0600
Subject: [petsc-users] consistence of PETSC/SLEPC with MPI, BLACS, SCALAPACK calls...
In-Reply-To:
References: <39CC65C5-03D0-4C5A-A008-C13380DF16AA@dsic.upv.es>
Message-ID:

On Thu, 30 Nov 2017, Giacomo Mulas wrote:

> On Mon, 27 Nov 2017, Satish Balay wrote:
>
> > These questions are more pertinent to MKL - i.e. what interface does the
> > MKL ilp64 blacs library provide for Cblacs_get() etc.
>
> I don't know if it's more pertinent, but indeed it is pertinent also to MKL.

Well, PETSc does not use scalapack/blacs. It is primarily a dependency for
MUMPS [which does not support 64-bit indices anyway]. And you are calling
[mkl] blacs directly from your code. Hence it is primarily pertinent to MKL -
and its usage from your code.

Sure - you are using PETSc datatypes - but that's secondary to this issue.
You have to figure out what datatypes ilp64 mkl blacs expects [and what MPI
datatypes it uses] - and that's what you would have to pass in when calling
blacs functions.

> The pertinence with SLEPC/PETSC is that my code _also_ uses SLEPC/PETSC, and
> I run it both on an architecture in which SLEPC/PETSC is compiled against
> the ilp64 mkl and on architectures in which SLEPC/PETSC is compiled against
> stock standard scalapack/blacs/lapack/blas libs.
>
> > The following url has some info - and some references to ilp64 MPI
> > https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/474149
>
> thanks for the link.
>
> > [PETSc is not tested with ilp64 MPI]
>
> well, the version of the code that does not yet use scalapack but only slepc
> appears to work fine with it.

Yes - this is expected to work - as we support using ilp64 blas/lapack from
PETSc.

Satish

From a.croucher at auckland.ac.nz  Thu Nov 30 14:19:31 2017
From: a.croucher at auckland.ac.nz (Adrian Croucher)
Date: Fri, 1 Dec 2017 09:19:31 +1300
Subject: [petsc-users] Newton methods that converge all the time
Message-ID: <209cb0a6-b6ac-a264-1b6a-a15e98949438@auckland.ac.nz>

> Please describe in some detail how you are handling phase change. If
> you have if () tests of any sort in your FormFunction() or
> FormJacobian() this can kill Newton's method. If you are using "variable
> switching" this WILL kill Newton's method. Are you monkeying with phase
> definitions in TSPostStep or with SNESLineSearchSetPostCheck()? This
> will also kill Newton's method.

I'm doing variable switching (in a geothermal flow application) with
Newton's method (in SNESLineSearchSetPostCheck()) and it generally works
fine.
For pure water (no other components present) my variables are pressure and
temperature for single-phase (liquid or vapour), and pressure and vapour
saturation for two-phase. You have to be pretty careful how you do the
switching though.

- Adrian

--
Dr Adrian Croucher
Senior Research Fellow
Department of Engineering Science
University of Auckland, New Zealand
email: a.croucher at auckland.ac.nz
tel: +64 (0)9 923 4611
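As a rough illustration of where such variable switching hooks into PETSc, here is a schematic post-check in C. Everything problem-specific is a placeholder: the two-dof [pressure, temperature-or-saturation] layout, the per-cell phase array, and the constant standing in for the saturation temperature are assumptions for the sketch, not code from this thread; only the SNESLineSearchSetPostCheck callback signature and the changed_w flag are the actual API.

#include <petscsnes.h>

#define TSAT_PLACEHOLDER 500.0  /* stand-in for Tsat(p); real code would evaluate the saturation curve */

typedef struct {
  PetscInt *phase;  /* per-cell flag: 0 = liquid, 1 = vapour, 2 = two-phase (placeholder layout) */
} SwitchCtx;

/* Post-check called after each line search: W holds the proposed new iterate.
   If the second dof of a two-phase cell (vapour saturation) leaves [0,1], switch
   that cell's primary variable to temperature and flag the iterate as modified. */
static PetscErrorCode PhaseSwitchPostCheck(SNESLineSearch ls, Vec X, Vec Y, Vec W,
                                           PetscBool *changed_y, PetscBool *changed_w,
                                           void *ptr)
{
  SwitchCtx     *ctx = (SwitchCtx *)ptr;
  PetscScalar   *w;
  PetscInt       i, n;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  *changed_y = PETSC_FALSE;
  *changed_w = PETSC_FALSE;
  ierr = VecGetLocalSize(W, &n);CHKERRQ(ierr);
  ierr = VecGetArray(W, &w);CHKERRQ(ierr);
  for (i = 0; i < n; i += 2) {                /* dofs per cell: w[i] = p, w[i+1] = T or S_v */
    if (ctx->phase[i/2] == 2) {
      if (w[i+1] < 0.0) {                     /* all vapour condensed: back to single-phase liquid */
        ctx->phase[i/2] = 0;
        w[i+1]          = TSAT_PLACEHOLDER - 1.0;
        *changed_w      = PETSC_TRUE;
      } else if (w[i+1] > 1.0) {              /* all liquid boiled off: single-phase vapour */
        ctx->phase[i/2] = 1;
        w[i+1]          = TSAT_PLACEHOLDER + 1.0;
        *changed_w      = PETSC_TRUE;
      }
    }
    /* single-phase cells would be checked against Tsat(p) here to switch the other way */
  }
  ierr = VecRestoreArray(W, &w);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Registration is then just SNESGetLineSearch(snes, &ls) followed by SNESLineSearchSetPostCheck(ls, PhaseSwitchPostCheck, &ctx), and the residual and Jacobian routines have to interpret the second dof according to the current phase flag.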