[petsc-users] MatMult
Benjamin Sanderse
B.Sanderse at cwi.nl
Tue May 29 10:52:15 CDT 2012
Hello all,
I have a simple question about using MatMult (or MatMultAdd) in parallel.
I am performing the matrix-vector multiplication
z = A*x + y
in my code by using
call MatMultAdd(A,x,y,z,ierr); CHKERRQ(ierr)
A is a sparse matrix, type MPIAIJ, and x, y, and z have been obtained using
call MatGetVecs(A,x,y,ierr); CHKERRQ(ierr)
call MatGetVecs(A,PETSC_NULL_OBJECT,z,ierr); CHKERRQ(ierr)
x, y, and z are Vecs of type mpi (VECMPI).
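For completeness, a self-contained sketch of this setup looks roughly as follows (this is not my actual code: the preallocation numbers and the MatSetValues loop are only illustrative, and the finclude paths follow the PETSc 3.x Fortran convention, so they may differ for other versions; error checks with CHKERRQ are omitted for brevity):

! Sketch: build a 1000 x 900 MPIAIJ matrix with at most 2 nonzeros per row
! (as in the -info output below), get compatible vectors from it, and
! compute z = A*x + y with MatMultAdd. Free-form source (e.g. a .F90 file)
! so the preprocessor handles the includes.
program multadd_sketch
  implicit none
#include "finclude/petscsys.h"
#include "finclude/petscvec.h"
#include "finclude/petscmat.h"
  Mat            A
  Vec            x, y, z
  PetscErrorCode ierr
  PetscInt       m, n, i, istart, iend, col, nz, ione
  PetscScalar    val

  call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
  m = 1000
  n = 900
  nz = 2
  ione = 1
  val = 1.0d0

  ! A: MPIAIJ, preallocated for ~2 nonzeros per row
  call MatCreate(PETSC_COMM_WORLD,A,ierr)
  call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m,n,ierr)
  call MatSetType(A,MATMPIAIJ,ierr)
  call MatMPIAIJSetPreallocation(A,nz,PETSC_NULL_INTEGER,nz,PETSC_NULL_INTEGER,ierr)

  ! illustrative entries only: two nonzeros in each locally owned row
  call MatGetOwnershipRange(A,istart,iend,ierr)
  do i = istart, iend-1
     col = mod(i,n)
     call MatSetValues(A,ione,i,ione,col,val,INSERT_VALUES,ierr)
     col = mod(i+1,n)
     call MatSetValues(A,ione,i,ione,col,val,INSERT_VALUES,ierr)
  end do
  call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)
  call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)

  ! x is compatible with the columns of A, y and z with the rows
  call MatGetVecs(A,x,y,ierr)
  call MatGetVecs(A,PETSC_NULL_OBJECT,z,ierr)
  call VecSet(x,val,ierr)
  call VecSet(y,val,ierr)

  ! z = A*x + y
  call MatMultAdd(A,x,y,z,ierr)

  call MatDestroy(A,ierr)
  call VecDestroy(x,ierr)
  call VecDestroy(y,ierr)
  call VecDestroy(z,ierr)
  call PetscFinalize(ierr)
end program multadd_sketch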
The problem is that MatMultAdd is much faster in the sequential case than in the parallel case (at least a factor of 100 difference).
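One way to isolate the timing of this operation is to wrap the repeated product in a PETSc log stage and look at -log_summary; roughly like this (the stage name and the repetition count are only illustrative, and PetscLogStage comes from finclude/petsclog.h):

! Sketch: repeat the product inside a registered log stage so that
! -log_summary reports its time and message counts separately.
PetscLogStage stage
PetscInt      i

call PetscLogStageRegister('MatMultAdd',stage,ierr)
call PetscLogStagePush(stage,ierr)
do i = 1, 1000
   call MatMultAdd(A,x,y,z,ierr)
end do
call PetscLogStagePop(ierr)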
As an example, here is the output showing some properties of A when running with -mat_view_info and -info:
2 processors:
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374781
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
[0] MatStashScatterBegin_Private(): No of messages: 0
[1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 450; storage space: 100 unneeded,900 used
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 450; storage space: 100 unneeded,900 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
[0] Mat_CheckInode(): Found 500 nodes out of 500 rows. Not using Inode routines
[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
[1] Mat_CheckInode(): Found 500 nodes out of 500 rows. Not using Inode routines
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
[0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
[0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
[0] VecScatterCreate(): General case: MPI to Seq
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 0; storage space: 0 unneeded,0 used
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 0; storage space: 0 unneeded,0 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
Matrix Object: 2 MPI processes
type: mpiaij
rows=1000, cols=900
total: nonzeros=1800, allocated nonzeros=2000
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
1 processor:
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1000 X 900; storage space: 200 unneeded,1800 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
[0] Mat_CheckInode(): Found 1000 nodes out of 1000 rows. Not using Inode routines
Matrix Object: 1 MPI processes
type: seqaij
rows=1000, cols=900
total: nonzeros=1800, allocated nonzeros=2000
total number of mallocs used during MatSetValues calls =0
not using I-node routines
When I look at the partitioning of the vectors, I have the following for the parallel case:
x: rank 0: [0, 450), rank 1: [450, 900)
y: rank 0: [0, 500), rank 1: [500, 1000)
z: rank 0: [0, 500), rank 1: [500, 1000)
This seems OK to me.
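These ranges can be checked directly with the ownership-range queries, roughly as follows (the write statements are only illustrative):

! Sketch: print the locally owned range of a vector and of the rows of A.
PetscInt    istart, iend
PetscMPIInt rank

call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
call VecGetOwnershipRange(x,istart,iend,ierr)
write(*,*) 'rank', rank, ': x owns [', istart, ',', iend, ')'
call MatGetOwnershipRange(A,istart,iend,ierr)
write(*,*) 'rank', rank, ': A rows [', istart, ',', iend, ')'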
I am certainly missing something needed to perform this matrix-vector multiplication efficiently. Any ideas?
Best regards,
Benjamin