[petsc-users] PETSC matrix assembling super slow

Mark Adams mfadams at lbl.gov
Tue Feb 5 14:22:03 CST 2019


On Mon, Feb 4, 2019 at 4:17 PM Yaxiong Chen <chen2018 at purdue.edu> wrote:

> Hi Mark,
>
>      Will the parameters passed to MatMPIAIJSetPreallocation influence the
> following part?
>       do i=mystart,nelem,nproc
>             call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
>             ne=size(idx)
>             idx=idx-1           !-1 since PETSC index starts from zero
>             if (allocated(auxRHSe))  call VecSetValues(bvec,ne,idx,auxRHSe,ADD_VALUES,ierr)
>             call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
>       end do
>
> I found this part impedes my assembling process. In my case, the total
> DOF is 20800. I estimated the upper bound on the number of nonzero entries in
> each row as 594.
>

Then you want to set f9 to 594
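
For example, a minimal sketch (assuming Amat is an MPIAIJ matrix and 594 really
is an upper bound on the nonzeros per row; the 297 for the off-diagonal block is
just the "about half of f9" rule of thumb, not a measured value):

      PetscInt :: f9, f6
      f9 = 594      ! upper bound on nonzeros per row in the diagonal block
      f6 = 297      ! rough guess for the off-diagonal block (only matters in parallel)
      call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)

On one process the off-diagonal block has no columns, so only f9 matters there.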


> So I set f9 to be 20206 and f6 to be 10103 in the following command:
>
> call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)
>
>
> Running the code in sequential mode with -info, I got the following
> information, and I can get the result even though the assembling process is
> kind of slow (several times slower than using MUMPS).
>
> [0] MatAssemblyBegin_MPIBAIJ(): Stash has 0 entries,uses 0 mallocs.
>
> [0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.
>
> [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 20800, block size 1;
> storage space: 13847 unneeded, 1030038 used
>
> [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is
> 62659
>
> [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 70
>
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 20800) < 0.6. Do not use CompressedRow routines.
>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472
> 140509341618528
>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472
> 140509341618528
>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472
> 140509341618528
>
> [0] VecScatterCreate_Seq(): Special case: sequential copy
>
> [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 0, block size 1;
> storage space: 104000 unneeded, 0 used
>

This looks like your matrix was created with 0 columns.
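
If it helps to confirm what sizes the matrix actually ended up with, a quick
check (just a sketch) is to print the global dimensions after MatSetUp:

      PetscInt :: nrows, ncols
      call MatGetSize(Amat,nrows,ncols,ierr)
      print*,'global matrix size',nrows,ncols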


> [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 0
>
> [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 0
>
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 20800)/(num_localrows 20800) > 0.6. Use CompressedRow routines.
>
>  aseembel time   282.54871800000001
>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462888960
> 140509341442256
>
> [0] PCSetUp(): Setting up PC for first time
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> However, when I use the parallel mode, I got the following information:
>
> [0] MatAssemblyBegin_MPIBAIJ(): Stash has 707965 entries,uses 6 mallocs.
>
> [0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.
>
> it seems it never gets through the call
> MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr).
>
> Is there anything I am doing wrong?
>
> Thanks
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>
>
> ------------------------------
> *From:* Mark Adams <mfadams at lbl.gov>
> *Sent:* Tuesday, January 29, 2019 10:02 PM
> *To:* Yaxiong Chen; PETSc users list
> *Subject:* Re: [petsc-users] PETSC matrix assembling super slow
>
> Optimized is a configuration flag, not a version.
>
> You need to figure out the number of non-zeros per row of your global
> matrix, or a bound on it, and supply that to MatMPIAIJSetPreallocation.
> Otherwise it has to allocate and copy memory often.
>
> You could increase your f9 on a serial run and see what runs best, and
> then move to parallel with a value for f6 of about 1/2 of f9.
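>
> If a single bound is still too loose, an exact per-row count can be supplied
> instead. Below is a rough sketch using the variable names from the code
> further down (rank, info, nelem, Amat, ptSystem) and assuming getElementalMAT
> is cheap enough to call once per element just for idx; the row split is
> recomputed the same way PETSC_DECIDE would choose it. Every rank visits all
> elements but only counts rows it owns, so the counts are upper bounds (entries
> shared by elements are counted more than once), which is all preallocation
> needs.
>
>      PetscInt :: nloc, rstart, rend, row
>      PetscInt, allocatable :: d_nnz(:), o_nnz(:)
>
>      ! row distribution PETSc would pick for PETSC_DECIDE local sizes
>      nloc = PETSC_DECIDE
>      call PetscSplitOwnership(PETSC_COMM_WORLD,nloc,info(1),ierr)
>      call MPI_Exscan(nloc,rstart,1,MPIU_INTEGER,MPI_SUM,PETSC_COMM_WORLD,ierr)
>      if (rank == 0) rstart = 0
>      rend = rstart + nloc
>
>      allocate(d_nnz(nloc), o_nnz(nloc))
>      d_nnz = 0; o_nnz = 0
>      do i = 1, nelem
>         call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)   ! only idx is used here
>         do j = 1, size(idx)
>            row = idx(j) - 1
>            if (row >= rstart .and. row < rend) then
>               do k = 1, size(idx)
>                  if (idx(k)-1 >= rstart .and. idx(k)-1 < rend) then
>                     d_nnz(row-rstart+1) = d_nnz(row-rstart+1) + 1
>                  else
>                     o_nnz(row-rstart+1) = o_nnz(row-rstart+1) + 1
>                  end if
>               end do
>            end if
>         end do
>      end do
>      ! a row cannot have more entries in a block than that block has columns
>      d_nnz = min(d_nnz, nloc)
>      o_nnz = min(o_nnz, info(1) - nloc)
>      call MatMPIAIJSetPreallocation(Amat,PETSC_DEFAULT_INTEGER,d_nnz, &
>                                     PETSC_DEFAULT_INTEGER,o_nnz,ierr)
>
> The counting pass duplicates a little work across ranks, but it needs no
> communication and should eliminate the mallocs reported during MatSetValues.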
>
> On Tue, Jan 29, 2019 at 9:13 PM Yaxiong Chen <chen2018 at purdue.edu> wrote:
>
> Thanks Mark,
>
>     I use PETSc 3.9.4; is this the optimized version you mentioned?
>
>     Actually, f9 and f6 are from the PETSc example. I don't know how to set
> the values correctly, so I commented them out. The size of my elemental matrix
> may vary: for a 2D problem the elemental matrix can be 24*24 or 32*32 or some
> other size, and the indices are not contiguous. In this case the elemental
> matrices may interleave with each other in the global matrix, and I may have
> thousands of elemental matrices to assemble. Is preallocation suitable for
> this?
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>
>
> ------------------------------
> *From:* Mark Adams <mfadams at lbl.gov>
> *Sent:* Tuesday, January 29, 2019 8:25 PM
> *To:* Yaxiong Chen
> *Cc:* Song Gao; petsc-users at mcs.anl.gov
> *Subject:* Re: [petsc-users] PETSC matrix assembling super slow
>
> Slow assembly is often caused by not preallocating correctly. I am guessing
> that you are using a Q1 element and f9==9, and thus the preallocation should
> be OK if this is a scalar problem on a regular grid, and f6==6 should be OK
> for the off-processor allocation, if my assumptions are correct.
>
> You can run with -info, which will tell you how many allocations were done
> in assembly. Make sure that number is small (e.g., 0).
>
> I see you use F90 array syntax ('idx-1'). Compilers can sometimes do crazy
> things with seemingly simple code. You could just do this manually if you
> can't find anything else.
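>
> Something along these lines (a sketch; idx0 is just an illustrative name for
> a reusable zero-based copy of the index array):
>
>      PetscInt, allocatable :: idx0(:)
>
>      do i=mystart,nelem,nproc
>            call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
>            ne=size(idx)
>            if (allocated(idx0)) deallocate(idx0)
>            allocate(idx0(ne))
>            do j = 1, ne
>               idx0(j) = idx(j) - 1    ! PETSc indices are 0-based
>            end do
>            if (allocated(auxRHSe)) call VecSetValues(bvec,ne,idx0,auxRHSe,ADD_VALUES,ierr)
>            call MatSetValues(Amat,ne,idx0,ne,idx0,Ae,ADD_VALUES,ierr)
>      end do
>
> That way no hidden array temporary is created inside each call.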
>
> And I trust you are running an optimized version of PETSc.
>
>
> On Tue, Jan 29, 2019 at 6:22 PM Yaxiong Chen via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hi Song,
>
> I don't quite understand how I can use this command. I don't partition the
> global matrix. If I add my elemental matrix to the global system, it will be
> like this. In my parallel part, I use each core to generate elemental matrices
> in turn. In this case, I guess each core will be assigned space for the global
> matrix, which is finally assembled. But according to the manual, it seems each
> core stores only a part of the global matrix. Is the local
> submatrix in MatMPIAIJSetPreallocation(Mat B, PetscInt d_nz, const PetscInt d_nnz[],
> PetscInt o_nz, const PetscInt o_nnz[])
> <https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html#MatMPIAIJSetPreallocation>
> the same as my elemental matrix?
>
>
>
> Thanks
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>
>
> ------------------------------
> *From:* Song Gao <song.gao2 at mail.mcgill.ca>
> *Sent:* Tuesday, January 29, 2019 1:22 PM
> *To:* Yaxiong Chen
> *Cc:* Matthew Knepley; petsc-users at mcs.anl.gov
> *Subject:* Re: [petsc-users] PETSC matrix assembling super slow
>
> I think you would prefer to preallocate the matrix.
>
> Uncomment this line:
>
> ! call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)
>
>
>
> On Tue, Jan 29, 2019 at 12:40 PM, Yaxiong Chen via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hello,
>
>
> I have a 2D system which is assembled from elemental matrices. Ae is my
> elemental matrix, auxRHSe(:) and RHSe(:) are the corresponding right-hand
> sides, and idx holds the global indices. My code is as follows; however, the
> assembling rate is super slow (marked red in the code). I am not sure whether
> the assembling approach is right or not, since for each element the idx values
> are not contiguous. Do you have any idea of a better way to assemble the matrix?
>
>
> Thanks
>
>
> block
>     PetscErrorCode ierr
>      PetscMPIInt rank,nproc, mystart
>      PetscInt  nelem
>      integer,allocatable  ::info(:)
>      real(wp), allocatable :: Ae(:,:), auxRHSe(:),RHSe(:)
>      integer, allocatable      :: idx(:)
>      PetscScalar, pointer :: xx_v(:)
>
>      PC               prec
>      PetscScalar ::   val
>      Vec              xvec,bvec,uvec
>      Mat              Amat
>      KSP              ksp
>      PetscViewer viewer
>      PetscInt  geq,j,k,ne,M,Istart,Iend
>      PetscBool  flg
>      KSPConvergedReason reason
>      Vec         dummyVec, dummyVecs(1)
>      MatNullSpace nullspace
>
>      call PetscInitialize( PETSC_NULL_CHARACTER, ierr )
>
>      if (ierr .ne. 0) then
>      print*,'Unable to initialize PETSc'
>      stop
>      endif
>      call MPI_Comm_size(PETSC_COMM_WORLD,nproc,ierr)
>      call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
>      mystart=rank+1
>
>  !   Parameter set
>      info=ptsystem%getInitInfo()
>      nelem = info(Info_EleMatNum)+info(Info_FixedDOFNum)+info(Info_NumConstrain)
>      print*,'nelem',nelem
>  !-------------------------------------
>  !  Create Matrix
>      call MatCreate(PETSC_COMM_WORLD,Amat,ierr)
>      call MatSetSizes( Amat,PETSC_DECIDE, PETSC_DECIDE, info(1), info(1), ierr )
>      call MatSetType( Amat, MATMPIBAIJ, ierr )
>      ! call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)
>      call MatSetFromOptions( Amat, ierr )
>      call MatSetUp( Amat, ierr )
>      call MatGetOwnershipRange( Amat, Istart, Iend, ierr )
>
>       xvec = tVec(0)
>       call MatCreateVecs( Amat, PETSC_NULL_VEC, xvec, ierr )
>       call VecSetFromOptions( xvec, ierr )
>       call VecDuplicate( xvec, bvec, ierr )
>       call VecDuplicate( xvec, uvec, ierr )
>
>    t1 = MPI_WTime();
>
>       do i=mystart,nelem,nproc
>             call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
>             ne=size(idx)
>             if (allocated(auxRHSe))  call VecSetValues(bvec,ne,idx-1,auxRHSe,ADD_VALUES,ierr)
>             call MatSetValues(Amat,ne,idx-1,ne,idx-1,Ae,ADD_VALUES,ierr)
>       end do
>
>       nelem   = info(Info_EleRHSNum)
>       mystart = rank+1
>
>       do i = mystart, nelem, nproc
>         call ptSystem%getElementalRHS(i, RHSe, idx)
>         print*,'idx',idx
>         ne=size(idx)
>         if (allocated(RHSe))  call VecSetValues(bvec,ne,idx-1,RHSe,ADD_VALUES,ierr)
>       end do
>       call MatAssemblyBegin(Amat,MAT_FINAL_ASSEMBLY,ierr)
>       call MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr)
>       !  this part is slow; the loop above finishes but here it may get stuck
>       call VecAssemblyBegin(bvec,ierr)
>                                 ! For a 2500 DOF system, assembling only takes over 2 seconds
>       call VecAssemblyEnd(bvec,ierr)
>                                 ! But for a 10000 DOF system, it gets stuck
>      t2 = MPI_WTime();
>       print*,'assembeling time',t2-t1
>       ! Solve
>       call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
> !  Set operators. Here the matrix that defines the linear system
> !  also serves as the preconditioning matrix.
>       call KSPSetOperators(ksp,Amat,Amat,ierr)
>       call KSPSetFromOptions(ksp,ierr)
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> !                      Solve the linear system
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>     call KSPSetType(ksp,KSPCG,ierr)
>     call KSPGetPC(ksp,prec,ierr)
>    ! call KSPSetPCSide(ksp,PC_SYMMETRIC,ierr)
>     call PCSetType(prec,PCJACOBI,ierr)
>      call KSPSolve(ksp,bvec,xvec,ierr)
>      call PetscFinalize(ierr)
>
> end block
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>