[petsc-users] petsc4py sparse matrix construction time

Satish Balay balay at mcs.anl.gov
Wed Nov 1 08:34:16 CDT 2017


Should add:

To make sure your matrix assembly is efficient, you should run the
code with the option -info and make sure there are no mallocs during
assembly.

I'm not sure how petsc4py processes options. One way is to set such
options with the PETSC_OPTIONS environment variable and then run the code.
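For example (a sketch; the script name is illustrative), either set the
option in the environment:

  PETSC_OPTIONS="-info" python assemble.py

or forward the command line to PETSc before importing it, so that
"python assemble.py -info" works:

  import sys
  import petsc4py
  petsc4py.init(sys.argv)   # command-line options now reach PETSc
  from petsc4py import PETSc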

Satish

On Wed, 1 Nov 2017, Matthew Knepley wrote:

> On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firat <ccetinbas at anl.gov>
> wrote:
> 
> > Hi,
> >
> > Thanks a lot. Based on both of your suggestions, I modified the code to use
> > Mat.createAIJ() with the csr option. The computation time decreased
> > significantly with this method. Still, if there is a better option,
> > please let me know after seeing the modified code below.
> >
> > In a first trial with a 1000x1000 matrix containing 96019 non-zeros, the
> > computation time did not scale with the number of cores: single core
> > python @ 0.0035s, single core petsc @ 0.0024s, 2 cores petsc @ 0.0036s,
> > 4 cores petsc @ 0.0032s, 8 cores petsc @ 0.0030s.
> >
> > Then I tried a larger 181797x181797 matrix with more non-zeros and got
> > the following results: single core python @ 0.021s, single core petsc @
> > 0.031s, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014s, 8 cores petsc @
> > 0.009s, 16 cores petsc @ 0.0087s.
> >
> > I think the optimum number of cores is highly dependent on the matrix size
> > and the number of non-zeros. In the real code the matrix size (and so the
> > number of non-zero elements) will grow at every iteration, starting with
> > very small matrices and growing to very big ones. Is it possible to set the
> > number of processes dynamically from the code?
> >
> 
> I am not sure how you are interpreting these measurements. Normally, I
> would say
> 
>   1) Everything timed below is "parallel overhead". This is not intended to
> scale with P, instead it will look like a constant, as you observe
> 
>   2) The time to compute the matrix entries should far outstrip the time
> below to figure out the nonzero structure. Is this true?
> 
>   3) Solve time is often larger than matrix calculation. Is it?
> 
> Thus, when deciding on parallelism, you need to look at the largest costs
> and how they scale with P.
> 
>   Thanks,
> 
>      Matt
> 
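One way to see how those costs scale with P (a sketch; the stage names are
illustrative) is to wrap the phases in PETSc log stages and run with
-log_view:

  from petsc4py import PETSc

  stage_assembly = PETSc.Log.Stage('Assembly')
  stage_solve = PETSc.Log.Stage('Solve')

  stage_assembly.push()
  # ... build and assemble the matrix and vectors ...
  stage_assembly.pop()

  stage_solve.push()
  # ... ksp.solve(b, x) ...
  stage_solve.pop()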
> > Another question is about the data types: mpi4py only lets me transfer
> > float type data, and petsc4py only lets me use int32 type indices. Besides
> > converting the data back and forth, is there any solution for this?
> >
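On the data types: mpi4py's Scatterv works with NumPy buffers of any dtype
(the MPI datatype is taken from the arrays), so the int32 index arrays can
be scattered directly without the float round-trip. A minimal sketch,
assuming 2 ranks and illustrative sizes:

  import numpy as np
  from mpi4py import MPI

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()

  if rank == 0:
      row = np.arange(8, dtype=np.int32)   # hypothetical global row indices
  else:
      row = None                           # send buffer is ignored off the root
  counts = [4, 4]                          # send counts, assuming 2 ranks
  row_lcl = np.empty(counts[rank], dtype=np.int32)
  comm.Scatterv([row, counts], row_lcl, root=0)   # stays int32 end to end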
> > The modified code for matrix creation part:
> >
> > comm = MPI.COMM_WORLD
> > rank = comm.Get_rank()
> > size = comm.Get_size()
> >
> > if rank==0:
> >     row = np.loadtxt('row1000.out').astype(dtype='int32')
> >     col = np.loadtxt('col1000.out').astype(dtype='int32')
> >     val = np.loadtxt('val1000.out')   # matrix values stay float (PetscScalar)
> >     m = 1000    # 1000  x 1000 matrix
> >     if size>1:
> >         rbc = np.bincount(row)*1.0
> >         ieq = int(np.floor(m/size))
> >         a = [ieq]*size
> >         ix = int(np.mod(m,size))
> >         if ix>0:
> >             for i in range(0,ix):
> >                 a[i]= a[i]+1
> >         a = np.array([0]+a).cumsum()
> >         b = np.zeros(a.shape[0]-1)
> >         for i in range(0,a.shape[0]-1):
> >             b[i]=rbc[a[i]:a[i+1]].sum()   # b holds the send counts for Scatterv
> >         row = row.astype(dtype=float)
> >         col = col.astype(dtype=float)
> >         val = val.astype(dtype=float)
> > else:
> >     row=None
> >     col=None
> >     val=None
> >     indptr=None
> >     b=None
> >     m=None
> >
> > if size>1:
> >     ml = comm.bcast(m,root=0)
> >     bl = comm.bcast(b,root=0)
> >     row_lcl = np.zeros(int(bl[rank]))
> >     col_lcl = row_lcl.copy()
> >     val_lcl = row_lcl.copy()
> >
> >     comm.Scatterv([row,b],row_lcl)
> >     comm.Scatterv([col,b],col_lcl)
> >     comm.Scatterv([val,b],val_lcl)
> >     comm.Barrier()
> >
> >     row_lcl = row_lcl.astype(dtype='int32')
> >     col_lcl = col_lcl.astype(dtype='int32')
> >     # val_lcl stays float; only the index arrays need to be int32
> >
> >     indptr = np.bincount(row_lcl)
> >     indptr = indptr[indptr>0]
> >     indptr = np.insert(indptr,0,0).cumsum()
> >     indptr = indptr.astype(dtype='int32')
> >     comm.Barrier()
> >
> >     pA = PETSc.Mat().createAIJ(size=(ml,ml),csr=(indptr, col_lcl, val_lcl))   # parallel matrix from the local CSR
> >
> > else:
> >     indptr = np.bincount(row)
> >     indptr = np.insert(indptr,0,0).cumsum()
> >     indptr = indptr.astype(dtype='int32')
> >     st=time.time()
> >     pA = PETSc.Mat().createAIJ(size=(m,m),csr=(indptr, col, val))
> >     print('dt:',time.time()-st)
> >
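A further note on the createAIJ() call above: size=(ml,ml) only fixes the
global sizes, so PETSc chooses the local row distribution itself. Passing
the local row count taken from the scattered CSR makes the match explicit
(a sketch, assuming indptr/col_lcl/val_lcl describe exactly this rank's
rows):

  nloc = indptr.shape[0] - 1   # rows owned by this rank, from the local CSR
  pA = PETSc.Mat().createAIJ(size=((nloc, ml), (PETSc.DECIDE, ml)),
                             csr=(indptr, col_lcl, val_lcl))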
> >
> > Regards,
> >
> > Firat
> >
> >
> > -----Original Message-----
> > From: Smith, Barry F.
> > Sent: Tuesday, October 31, 2017 10:18 AM
> > To: Matthew Knepley
> > Cc: Cetinbas, Cankur Firat; petsc-users at mcs.anl.gov; Ahluwalia, Rajesh K.
> > Subject: Re: [petsc-users] petsc4py sparse matrix construction time
> >
> >
> >    You also need to make sure that most matrix entries are generated on
> > the process that they will belong on.
> >
> >   Barry
> >
> > > On Oct 30, 2017, at 8:01 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > >
> > > On Mon, Oct 30, 2017 at 8:06 PM, Cetinbas, Cankur Firat <
> > ccetinbas at anl.gov> wrote:
> > > Hello,
> > >
> > >
> > >
> > > I am a beginner both in PETSc and mpi4py. I have been working on
> > > parallelizing our water transport code (where we solve a linear system of
> > > equations), and I started with the toy code below.
> > >
> > >
> > >
> > > The toy code reads the right hand side (rh), row, column, and value vectors
> > > to construct the sparse coefficient matrix, and then scatters them to
> > > construct the parallel PETSc coefficient matrix and right hand side vector.
> > >
> > >
> > >
> > > The sparse matrix generation time is extremely high in comparison to
> > > sps.csr_matrix((val, (row, col)), shape=(n,n)) in Python. For instance,
> > > Python generates the 181197x181197 sparse matrix in 0.06 seconds, while this
> > > code takes 1.19 s on 32 cores, 6.98 s on 16 cores, and 29.5 s on 8 cores. I
> > > was wondering whether I am making a mistake in generating the sparse matrix.
> > > Is there a more efficient way?
> > >
> > >
> > > It looks like you do not preallocate the matrix. There is a chapter on
> > > this in the manual.
> > >
> > >    Matt
> > >
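For the toy code below, that would mean preallocating before the A[i,j]
loop, roughly like this sketch (the per-row nonzero estimates are
illustrative):

  A = PETSc.Mat()
  A.create()
  A.setSizes([m_lcl, m_lcl])
  A.setType('aij')
  A.setPreallocationNNZ((10, 5))  # ~10 nnz/row in the diagonal block, ~5 off-diagonal (estimates)
  A.setUp()
  # ... fill the entries as before, then assemble ...

Passing the csr arrays to createAIJ(), as in the modified code earlier in
the thread, preallocates implicitly.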
> > > Thanks for your help in advance.
> > >
> > >
> > >
> > > Regards,
> > >
> > >
> > >
> > > Firat
> > >
> > >
> > >
> > > from petsc4py import PETSc
> > > from mpi4py import MPI
> > > import numpy as np
> > > import time
> > >
> > > comm = MPI.COMM_WORLD
> > > rank = comm.Get_rank()
> > > size = comm.Get_size()
> > >
> > > if rank==0:
> > >     # proc 0 loads the tomo image and does fast calculations to append the row, col, val, rh lists
> > >     # in the real code these vectors will be available on proc 0; no txt files are read
> > >     row = np.loadtxt('row.out')  # indices of non-zero rows
> > >     col = np.loadtxt('col.out')  # indices of non-zero columns
> > >     val = np.loadtxt('vs.out')   # values in the sparse matrix
> > >     rh = np.loadtxt('RHS.out')   # right hand side vector
> > >     n = row.shape[0]  # 1045699
> > >     m = rh.shape[0]   # 181197, square sparse matrix size
> > > else:
> > >     n = None
> > >     m = None
> > >     row = None
> > >     col = None
> > >     val = None
> > >     rh = None
> > >     rh_ind = None
> > >
> > > m_lcl = comm.bcast(m,root=0)
> > > n_lcl = comm.bcast(n,root=0)
> > > neq = n_lcl//size
> > > meq = m_lcl//size
> > > nx = np.mod(n_lcl,size)
> > > mx = np.mod(m_lcl,size)
> > > row_lcl = np.zeros(neq)
> > > col_lcl = np.zeros(neq)
> > > val_lcl = np.zeros(neq)
> > > rh_lcl = np.zeros(meq)
> > > a = [neq]*size    # send counts for Scatterv
> > > am = [meq]*size   # send counts for Scatterv
> > >
> > > if nx>0:
> > >     for i in range(0,nx):
> > >         if rank==i:
> > >             row_lcl = np.zeros(neq+1)
> > >             col_lcl = np.zeros(neq+1)
> > >             val_lcl = np.zeros(neq+1)
> > >         a[i] = a[i]+1
> > > if mx>0:
> > >     for ii in range(0,mx):
> > >         if rank==ii:
> > >             rh_lcl = np.zeros(meq+1)
> > >         am[ii] = am[ii]+1
> > >
> > > comm.Scatterv([row,a],row_lcl)
> > > comm.Scatterv([col,a],col_lcl)
> > > comm.Scatterv([val,a],val_lcl)
> > > comm.Scatterv([rh,am],rh_lcl)
> > > comm.Barrier()
> > >
> > > A = PETSc.Mat()
> > > A.create()
> > > A.setSizes([m_lcl,m_lcl])
> > > A.setType('aij')
> > > A.setUp()
> > > lr = row_lcl.shape[0]
> > > for i in range(0,lr):
> > >     A[row_lcl[i],col_lcl[i]] = val_lcl[i]
> > > A.assemblyBegin()
> > > A.assemblyEnd()
> > >
> > > if size>1:  # to get the range for the scattered vectors
> > >     ami = [0]
> > >     ami = np.array([0]+am).cumsum()
> > >     for kk in range(0,size):
> > >         if rank==kk:
> > >             Is = ami[kk]
> > >             Ie = ami[kk+1]
> > > else:
> > >     Is=0; Ie=m_lcl
> > >
> > > b = PETSc.Vec()
> > > b.create()
> > > b.setSizes(m_lcl)
> > > b.setFromOptions()
> > > b.setUp()
> > > b.setValues(list(range(Is,Ie)),rh_lcl)
> > > b.assemblyBegin()
> > > b.assemblyEnd()
> > >
> > > # solution vector
> > > x = b.duplicate()
> > > x.assemblyBegin()
> > > x.assemblyEnd()
> > >
> > > # create linear solver
> > > ksp = PETSc.KSP()
> > > ksp.create()
> > > ksp.setOperators(A)
> > > ksp.setType('cg')
> > > #ksp.getPC().setType('icc')  # only sequential
> > > ksp.getPC().setType('jacobi')
> > > print('solving with:', ksp.getType())
> > >
> > > # solve
> > > st=time.time()
> > > ksp.solve(b,x)
> > > et=time.time()
> > > print(et-st)
> > >
> > > if size>1:
> > >     # gather the solution on proc 0
> > >     if rank==0:
> > >         xGthr = np.zeros(m)
> > >     else:
> > >         xGthr = None
> > >     comm.Gatherv(x,[xGthr,am])
> > >
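A small note on the final Gatherv: gathering the local NumPy array of the
Vec makes the send buffer explicit (a sketch, reusing the am counts from
above):

  x_lcl = x.getArray()   # local part of the solution as a NumPy array
  comm.Gatherv(x_lcl, [xGthr, am], root=0)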
> > > --
> > > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which their
> > experiments lead.
> > > -- Norbert Wiener
> > >
> > > https://www.cse.buffalo.edu/~knepley/
> >
> >
> 
> 
> 


