[petsc-users] MatMult

Benjamin Sanderse B.Sanderse at cwi.nl
Wed May 30 02:23:43 CDT 2012


Sorry for forgetting -log_summary. Attached are -log_summary outputs for 1 and 2 processors, for both a problem with about 1000 unknowns and one with 125000 unknowns. The summary covers a run of the entire code, which involves many MatMults; I hope it still provides insight into what is going on.
As you can see, there is an extraordinary number of MatGetRow calls - I am working to change this - but they should not affect the speed of the MatMults. Any thoughts?

Benjamin
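
For reference, a minimal C sketch of the MatGetRow/MatRestoreRow access pattern mentioned above; the code discussed in this thread is Fortran, and the 1000 x 900 matrix and its placeholder entries below are made up. The points it illustrates are that MatGetRow on an MPIAIJ matrix can only be called for locally owned rows, that every call must be paired with MatRestoreRow, and that this row-by-row access is generally not cheap on a parallel matrix.

/* Sketch only, not from the original code: loop over the locally owned rows
 * of an MPIAIJ matrix with MatGetRow/MatRestoreRow. The matrix (1000 x 900,
 * at most 2 nonzeros per row, placeholder values) mimics the one discussed
 * in this thread. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat               A;
  PetscInt          i, j, ncols, rstart, rend;
  const PetscInt    *cols;
  const PetscScalar *vals;
  PetscScalar       localsum = 0.0;
  PetscErrorCode    ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 1000, 900);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 2, NULL, 2, NULL);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, 2, NULL);CHKERRQ(ierr);

  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    PetscInt    col = i % 900;          /* placeholder sparsity pattern */
    PetscScalar v   = 1.0;
    ierr = MatSetValues(A, 1, &i, 1, &col, &v, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Row-by-row access: only locally owned rows, and each MatGetRow must be
   * matched by MatRestoreRow before the next row is requested. */
  for (i = rstart; i < rend; i++) {
    ierr = MatGetRow(A, i, &ncols, &cols, &vals);CHKERRQ(ierr);
    for (j = 0; j < ncols; j++) localsum += vals[j];
    ierr = MatRestoreRow(A, i, &ncols, &cols, &vals);CHKERRQ(ierr);
  }
  ierr = PetscPrintf(PETSC_COMM_SELF, "local sum of entries = %g\n",
                     (double)PetscRealPart(localsum));CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}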

----- Original Message -----
From: "Jed Brown" <jedbrown at mcs.anl.gov>
To: "PETSc users list" <petsc-users at mcs.anl.gov>
Sent: Tuesday, May 29, 2012 5:56:51 PM
Subject: Re: [petsc-users] MatMult

On Tue, May 29, 2012 at 10:52 AM, Benjamin Sanderse <B.Sanderse at cwi.nl> wrote:

> Hello all,
>
> I have a simple question about using MatMult (or MatMultAdd) in parallel.
>
> I am performing the matrix-vector multiplication
>
> z = A*x + y
>
> in my code by using
>
> call MatMultAdd(A,x,y,z,ierr); CHKERRQ(ierr)
>
> A is a sparse matrix, type MPIAIJ, and x, y, and z have been obtained using
>
> call MatGetVecs(A,x,y,ierr); CHKERRQ(ierr)
> call MatGetVecs(A,PETSC_NULL_OBJECT,z,ierr); CHKERRQ(ierr)
>
> x, y, and z are vecs of type mpi.
>
> The problem is that in the sequential case the MatMultAdd is MUCH faster
> than in the parallel case (at least a factor of 100 difference).
>

1. Send output of -log_summary

2. This matrix is tiny (1000 by 900) and very sparse (at most 2 nonzeros per
row), so you should not expect speedup from running in parallel.
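
For reference, a self-contained C sketch of the kind of measurement that would isolate this; all names and the matrix contents below are made up, and the code in this thread is Fortran. It builds a placeholder 1000 x 900 MPIAIJ matrix with 2 nonzeros per row, obtains compatible vectors with MatGetVecs, and wraps repeated MatMultAdd in its own logging stage so that -log_summary reports that section separately from the rest of the run.

/* Sketch only, not the original code: placeholder 1000 x 900 MPIAIJ matrix,
 * vectors from MatGetVecs, and repeated MatMultAdd inside a separate
 * logging stage for -log_summary. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y, z;
  PetscInt       i, rstart, rend;
  PetscLogStage  stage;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  /* placeholder 1000 x 900 MPIAIJ matrix, 2 nonzeros per row */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 1000, 900);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 2, NULL, 2, NULL);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, 2, NULL);CHKERRQ(ierr);

  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    PetscInt    cols[2];
    PetscScalar vals[2];
    cols[0] = i % 900;       vals[0] =  1.0;   /* made-up sparsity pattern */
    cols[1] = (i + 1) % 900; vals[1] = -1.0;
    ierr = MatSetValues(A, 1, &i, 2, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* x follows the column layout of A; y and z follow its row layout */
  ierr = MatGetVecs(A, &x, &y);CHKERRQ(ierr);
  ierr = MatGetVecs(A, NULL, &z);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = VecSet(y, 2.0);CHKERRQ(ierr);

  /* separate logging stage so -log_summary reports this loop on its own */
  ierr = PetscLogStageRegister("MatMultAdd loop", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  for (i = 0; i < 1000; i++) {
    ierr = MatMultAdd(A, x, y, z);CHKERRQ(ierr);   /* z = A*x + y */
  }
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = VecDestroy(&z);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

With only about 1800 nonzeros in total, the MatMultAdd stage in such a run is expected to be dominated by communication latency rather than sped up by adding processes, consistent with point 2 above.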


>
> As an example, here is the output with some properties of A when using
> -mat_view_info and -info:
>
> 2 processors:
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374781
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
> [0] MatStashScatterBegin_Private(): No of messages: 0
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 450; storage space: 100 unneeded,900 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 450; storage space: 100 unneeded,900 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [0] Mat_CheckInode(): Found 500 nodes out of 500 rows. Not using Inode routines
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [1] Mat_CheckInode(): Found 500 nodes out of 500 rows. Not using Inode routines
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
> [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
> [0] VecScatterCreate(): General case: MPI to Seq
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 0; storage space: 0 unneeded,0 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 0; storage space: 0 unneeded,0 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
> Matrix Object: 2 MPI processes
>  type: mpiaij
>  rows=1000, cols=900
>  total: nonzeros=1800, allocated nonzeros=2000
>  total number of mallocs used during MatSetValues calls =0
>    not using I-node (on process 0) routines
>
> 1 processor:
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1000 X 900; storage space: 200 unneeded,1800 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [0] Mat_CheckInode(): Found 1000 nodes out of 1000 rows. Not using Inode routines
> Matrix Object: 1 MPI processes
>  type: seqaij
>  rows=1000, cols=900
>  total: nonzeros=1800, allocated nonzeros=2000
>  total number of mallocs used during MatSetValues calls =0
>    not using I-node routines
>
> When I look at the partitioning of the vectors, I have the following for
> the parallel case:
> x:
>          0         450
>         450         900
> y:
>           0         500
>         500        1000
> z:
>           0         500
>         500        1000
>
> This seems OK to me.
>
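As an aside, these ranges are what MatGetVecs is expected to produce: the right vector x follows the column layout of A (900 entries), while y and z, obtained as left vectors, follow its row layout (1000 entries). Below is a small C fragment, not from the original code, that prints exactly these numbers; it is only a helper and assumes A, x, y, z have already been created as described above.

/* Hypothetical helper, not part of the original code: print the row/column
 * ownership of A and the ownership ranges of x, y, z on each process.
 * Assumes A is the assembled 1000 x 900 MPIAIJ matrix and x, y, z were
 * obtained from MatGetVecs as described in this thread. */
#include <petscmat.h>

PetscErrorCode PrintLayouts(Mat A, Vec x, Vec y, Vec z)
{
  PetscMPIInt    rank;
  PetscInt       rstart, rend, cstart, cend, xs, xe, ys, ye, zs, ze;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);       /* row layout: y and z */
  ierr = MatGetOwnershipRangeColumn(A, &cstart, &cend);CHKERRQ(ierr); /* column layout: x */
  ierr = VecGetOwnershipRange(x, &xs, &xe);CHKERRQ(ierr);
  ierr = VecGetOwnershipRange(y, &ys, &ye);CHKERRQ(ierr);
  ierr = VecGetOwnershipRange(z, &zs, &ze);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF,
                     "[%d] A rows [%d,%d) cols [%d,%d)  x [%d,%d)  y [%d,%d)  z [%d,%d)\n",
                     (int)rank, (int)rstart, (int)rend, (int)cstart, (int)cend,
                     (int)xs, (int)xe, (int)ys, (int)ye, (int)zs, (int)ze);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With 2 processes and the sizes above, this would print x as [0,450) / [450,900) and y, z as [0,500) / [500,1000), matching the listing.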
> Certainly I am missing something in performing this matrix-vector
> multiplication efficiently. Any ideas?
>
> Best regards,
>
> Benjamin
>
-------------- next part --------------
Four non-text attachments were scrubbed by the archive:
Name: out_large_n1  Type: application/octet-stream  Size: 11031 bytes  URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20120530/d3a76f7f/attachment-0004.obj>
Name: out_large_n2  Type: application/octet-stream  Size: 11908 bytes  URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20120530/d3a76f7f/attachment-0005.obj>
Name: out_small_n1  Type: application/octet-stream  Size: 11030 bytes  URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20120530/d3a76f7f/attachment-0006.obj>
Name: out_small_n2  Type: application/octet-stream  Size: 11908 bytes  URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20120530/d3a76f7f/attachment-0007.obj>

