[petsc-users] MatMult

Matthew Knepley knepley at gmail.com
Tue May 29 10:56:09 CDT 2012


On Tue, May 29, 2012 at 3:52 PM, Benjamin Sanderse <B.Sanderse at cwi.nl> wrote:

> Hello all,
>
> I have a simple question about using MatMult (or MatMultAdd) in parallel.
>
> I am performing the matrix-vector multiplication
>
> z = A*x + y
>
> in my code by using
>
> call MatMultAdd(A,x,y,z,ierr); CHKERRQ(ierr)
>
> A is a sparse matrix, type MPIAIJ, and x, y, and z have been obtained using
>
> call MatGetVecs(A,x,y,ierr); CHKERRQ(ierr)
> call MatGetVecs(A,PETSC_NULL_OBJECT,z,ierr); CHKERRQ(ierr)
>
> x, y, and z are parallel (mpi) Vecs.
>
> The problem is that in the sequential case MatMultAdd is MUCH faster
> than in the parallel case (at least a factor of 100 difference).
>

With any performance question, always always always send the output of
-log_summary to petsc-maint at mcs.anl.gov.

   Matt
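For readers following along, the operation in question, z = A*x + y, can be checked outside PETSc. Below is a minimal pure-Python sketch of a CSR-style MatMultAdd, using hypothetical data that mirrors the shape reported by -mat_view_info in the logs further down (a 1000 x 900 matrix with two nonzeros per row); it is an illustration of the arithmetic, not PETSc's actual implementation:

```python
# Minimal CSR-style sketch of z = A*x + y (the operation MatMultAdd performs).
# Hypothetical data: a 1000 x 900 matrix with two nonzeros per row, chosen to
# mirror the shape shown by -mat_view_info in this thread.

def mat_mult_add(indptr, indices, data, x, y):
    """Return z = A*x + y for a CSR matrix given by (indptr, indices, data)."""
    n_rows = len(indptr) - 1
    z = list(y)                       # start from y, as MatMultAdd does
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        z[i] += acc
    return z

# Build the hypothetical matrix: A[i, i % 900] = 1 and A[i, (i + 1) % 900] = -1.
n_rows, n_cols = 1000, 900
indptr, indices, data = [0], [], []
for i in range(n_rows):
    indices += [i % n_cols, (i + 1) % n_cols]
    data += [1.0, -1.0]
    indptr.append(len(indices))

x = [1.0] * n_cols                    # right vector: length = number of columns
y = [2.0] * n_rows                    # left vector:  length = number of rows
z = mat_mult_add(indptr, indices, data, x, y)
# With x all ones, each row contributes 1 - 1 = 0, so z equals y.
print(z[0], len(z))
```

Note the vector lengths: the right-hand vector x has the column dimension (900, split 450/450 over two processes) and y, z have the row dimension (1000, split 500/500), which matches the partitioning Benjamin lists below.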


> As an example, here is the output with some properties of A when using
> -mat_view_info and -info:
>
> 2 processors:
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688
> -2080374781
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374782
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374782
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374780
> [0] MatStashScatterBegin_Private(): No of messages: 0
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 450; storage space: 100
> unneeded,900 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 450; storage space: 100
> unneeded,900 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [0] Mat_CheckInode(): Found 500 nodes out of 500 rows. Not using Inode
> routines
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [1] Mat_CheckInode(): Found 500 nodes out of 500 rows. Not using Inode
> routines
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374782
> [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374782
> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
> [0] VecScatterCreate(): General case: MPI to Seq
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 0; storage space: 0
> unneeded,0 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 0; storage space: 0
> unneeded,0 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
> Matrix Object: 2 MPI processes
>  type: mpiaij
>  rows=1000, cols=900
>  total: nonzeros=1800, allocated nonzeros=2000
>  total number of mallocs used during MatSetValues calls =0
>    not using I-node (on process 0) routines
>
> 1 processor:
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688
> -2080374783
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1000 X 900; storage space: 200
> unneeded,1800 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [0] Mat_CheckInode(): Found 1000 nodes out of 1000 rows. Not using Inode
> routines
> Matrix Object: 1 MPI processes
>  type: seqaij
>  rows=1000, cols=900
>  total: nonzeros=1800, allocated nonzeros=2000
>  total number of mallocs used during MatSetValues calls =0
>    not using I-node routines
>
> When I look at the partitioning of the vectors, I have the following for
> the parallel case:
> x:
>          0         450
>         450         900
> y:
>           0         500
>         500        1000
> z:
>           0         500
>         500        1000
>
> This seems OK to me.
>
> I am certainly missing something about performing this matrix-vector
> multiplication efficiently. Any ideas?
>
> Best regards,
>
> Benjamin
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener