[petsc-users] generate entries on 'wrong' process

Satish Balay balay at mcs.anl.gov
Wed Jan 18 11:07:21 CST 2012


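You can do two things.

1. Allocate sufficient stash space to avoid mallocs.
You can do this with the following runtime command-line options,
each of which takes the initial stash size as its argument:
-vecstash_initial_size <size>
-matstash_initial_size <size>
The "Stash has ... entries" lines in your -info output (up to
16386048 entries for the matrix stash) should give a reasonable
starting point for the sizes.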

2. Flush the stashed values in stages instead of doing a single
large communication at the end:

<add values to matrix>
MatAssemblyBegin/End(MAT_FLUSH_ASSEMBLY)
<add values to matrix>
MatAssemblyBegin/End(MAT_FLUSH_ASSEMBLY)
...
...

<add values to matrix>
MatAssemblyBegin/End(MAT_FINAL_ASSEMBLY)
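
One detail to watch with the staged approach: MatAssemblyBegin/End are
collective, so every process has to make the same number of flush calls
even when the processes own different numbers of elements. A minimal
sketch of one way to arrange that, in plain C without PETSc, is to fix
the number of stages on all processes and split the local element loop
into that many ranges. The names stage_range, print_schedule, nlocal and
nstages are made up for illustration, and the PETSc calls are only named
in comments and output, not actually invoked.

```c
#include <stdio.h>

/* Half-open range [*begin, *end) of local elements assembled in stage
 * `stage` (0-based) when `nlocal` elements are split into `nstages`
 * nearly equal stages. A process with fewer elements than stages gets
 * some empty ranges but still takes part in every stage. */
void stage_range(long nlocal, int nstages, int stage, long *begin, long *end)
{
    *begin = nlocal * stage / nstages;
    *end   = nlocal * (stage + 1) / nstages;
}

/* Print the schedule one process would follow. In a PETSc code the
 * printf would be replaced by the MatSetValues() calls for elements
 * begin..end-1, followed by MatAssemblyBegin/End with the named type. */
void print_schedule(long nlocal, int nstages)
{
    for (int s = 0; s < nstages; s++) {
        long begin, end;
        stage_range(nlocal, nstages, s, &begin, &end);
        printf("stage %d: elements [%ld, %ld), then %s\n", s, begin, end,
               s == nstages - 1 ? "MAT_FINAL_ASSEMBLY" : "MAT_FLUSH_ASSEMBLY");
    }
}
```

Calling print_schedule(10, 4) walks through four stages that together
cover all ten elements, with a flush assembly after each of the first
three and the final assembly after the last. The number of stages is a
tuning knob: more stages mean smaller stashes and messages, but also
more synchronization points.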

Satish


On Wed, 18 Jan 2012, Wen Jiang wrote:

> Hi,
> 
> I am working on FEM codes with a spline-based element type. In the 3D case,
> one element has 64 nodes and every two neighboring elements share 48 nodes.
> Thus, regardless of how I partition the mesh, a very large number of
> entries still have to be written on the 'wrong' processor. My code is
> running on a cluster, and the processes are sending between 550 and 620
> million packets per second across the network. My code seems to be I/O-bound
> at the moment and just gets stuck at the matrix assembly stage. A -info file
> is attached. Do I have other options for making my code less
> I/O-intensive?
> 
> Thanks in advance.
> 
> [0] VecAssemblyBegin_MPI(): Stash has 210720 entries, uses 12 mallocs.
> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
> [5] MatAssemblyBegin_MPIAIJ(): Stash has 4806656 entries, uses 8 mallocs.
> [6] MatAssemblyBegin_MPIAIJ(): Stash has 5727744 entries, uses 9 mallocs.
> [4] MatAssemblyBegin_MPIAIJ(): Stash has 5964288 entries, uses 9 mallocs.
> [7] MatAssemblyBegin_MPIAIJ(): Stash has 7408128 entries, uses 9 mallocs.
> [3] MatAssemblyBegin_MPIAIJ(): Stash has 8123904 entries, uses 9 mallocs.
> [2] MatAssemblyBegin_MPIAIJ(): Stash has 11544576 entries, uses 10 mallocs.
> [0] MatStashScatterBegin_Private(): No of messages: 1
> [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 107888648
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 13486080 entries, uses 10 mallocs.
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 16386048 entries, uses 10 mallocs.
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2514537 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [0] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0 unneeded,2525390 used
> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [5] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode routines
> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used
> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [3] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [1] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used
> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [4] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2525733 used
> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [2] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> 


