Time for MatAssembly

Satish Balay <balay at mcs.anl.gov>
Tue May 19 11:51:13 CDT 2009


On Tue, 19 May 2009, Satish Balay wrote:

> On Tue, 19 May 2009, tribur at vision.ee.ethz.ch wrote:
> 
> > Distinguished PETSc experts,
> > 
> > Suppose processor k has set N entries of a parallel matrix using
> > MatSetValues. Half of the entries are in matrix rows belonging to this
> > processor, but the other half are in rows owned by other processors.
> > 
> > My question:
> > 
> > Which case makes MatAssemblyBegin+MatAssemblyEnd take longer: when the
> > rows containing the second half of the entries all belong to one single
> > other processor, e.g. processor k+1, or when these rows are distributed
> > across several, say 4, other processors? Is there a significant difference?
> 
> Obviously there will be a difference - but how big it is will depend
> upon the network/MPI behavior.
> 
> It is a single large message to one other process vs. multiple smaller
> messages to several other processes.
> 
> Wrt the PETSc part - you might have to make sure enough memory is
> allocated for these communication buffers. If the default is too small -
> then there could be multiple malloc/copy cycles that slow things down.
> 
> Run with '-info' and look for "stash". The number of mallocs reported
> there should be 0 for efficient matrix assembly. [The stash size can be
> changed with the command line option -matstash_initial_size]
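
For reference, here is a minimal sketch of presetting the stash size from
code rather than from the command line. The matrix dimensions and the
100000-entry estimate are made-up numbers; MatStashSetInitialSize() is the
programmatic counterpart of -matstash_initial_size.

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 1000, 1000);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);

    /* Reserve room for ~100000 off-process entries up front so the stash
       does not have to malloc/copy repeatedly during MatSetValues().
       The second size argument is for the block-stash used by
       MatSetValuesBlocked(); 0 leaves it at its default. */
    ierr = MatStashSetInitialSize(A, 100000, 0);CHKERRQ(ierr);

    /* ... MatSetValues() calls, including off-process rows ... */

    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

You can then verify with something like 'mpiexec -n 4 ./app -info | grep -i
stash' that the reported number of mallocs is 0.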

Another note: If you have a lot of data movement during MatAssembly -
you can call MatAssemblyBegin/End(MAT_FLUSH_ASSEMBLY) periodically - to
flush out the currently accumulated off-process data - and then continue
with more MatSetValues() calls.

It might help on some network/MPI types [we don't know for sure..] - see
the sketch below.
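
A minimal sketch of that pattern, assuming A is an already-created and
set-up parallel matrix [the batching into 4 chunks is purely illustrative]:

  #include <petscmat.h>

  PetscErrorCode SetEntriesInBatches(Mat A)
  {
    PetscErrorCode ierr;
    PetscInt       batch, nbatch = 4;

    for (batch = 0; batch < nbatch; batch++) {
      /* ... MatSetValues() calls for this batch, including
         off-process rows ... */

      /* Ship the accumulated off-process entries now instead of letting
         them pile up in the stash; more MatSetValues() may follow. */
      ierr = MatAssemblyBegin(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
    }

    /* Exactly one final assembly after all values are set - only then
       is the matrix usable by MatMult() etc. */
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    return 0;
  }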

Satish

