Slow MatSetValues

Barry Smith bsmith at
Thu May 29 16:49:42 CDT 2008

   Partition the elements across the processes,

    then partition the nodes across processes (try to make sure that
each node is on the same process as at least one of its elements),
and create

       1) three parallel vectors with the number of locally owned nodes
on each process
           call these vectors off and on and owner; fill the on vector
with a 1 in each location, fill the vector owner with the owning rank
for each node
       2) three sequential vectors on each process with the total  
number of nodes of all the elements of that process (this is the  
locally owned plus ghosted nodes)
            call these vectors ghostedoff and ghostedon and ghostedowner
       3) a VecScatter from the "locally owned plus ghosted nodes" to
the "locally owned nodes"
       [you need these anyways for the numerical part of the code when
you evaluate your nonlinear functions (or right hand side for linear
problems)]

   scatter the owner vector to the ghostedowner vector
   now on each process loop over the locally owned ELEMENTS
       for each node1 in that element
            for each node2 in that element (excluding the node1 in the  
outer loop)
                  if node1 and node2 share an edge (face in 3d) and  
that edge (face in 3d) is not a boundary edge (face in 3d)  set t = .5  
(this prevents double counting of these couplings)
                  else set t = 1.0
                  if node1 and node2 are both owned by the same
process** add t into ghostedon at both the node1 location and the
node2 location
                  if node1 and node2 are owned by different processes  
add t into ghostedoff at both the node1 and node2 location

    Do a VecScatter add from the ghostedoff and ghostedon into the off  
and on.

    The off and on vectors now contain exactly the preallocation
counts needed on each process.

    The amount of work required is proportional to the number of
elements times the (number of nodes on an element)^2; the amount of
memory needed is roughly three global vectors and three local vectors.
This is much less work and memory than is needed in the numerical part
of the code, hence it is very efficient. In fact it is likely much
cheaper than a single nonlinear function evaluation.


** two nodes are owned by the same process if ghostedowner of node1  
matches ghostedowner of node2
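
As a rough illustration of the counting loop described above, here is a
minimal serial sketch in plain C. It is not PETSc code: the VecScatter
steps are replaced by looping over all elements in one address space, so
on and off play the role of the vectors after the scatter-add. The names
count_preallocation, elements_sharing_edge and NODES_PER_ELEM are
hypothetical, the mesh is assumed to consist of 2D triangles (so every
pair of nodes in an element shares an edge), and the boundary test is a
brute-force search that a real code would replace with its own mesh
data. The sketch visits each unordered node pair once per element, which
is one reading of the loop above, so that adding t at both nodes counts
each coupling once per row.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 2D triangles, so every node pair in an element shares an edge. */
#define NODES_PER_ELEM 3

/* Brute force: count how many elements contain both node a and node b.
   More than one means the edge (a,b) is interior, not a boundary edge. */
static int elements_sharing_edge(const int (*elems)[NODES_PER_ELEM],
                                 size_t nelems, int a, int b)
{
    int count = 0;
    for (size_t e = 0; e < nelems; e++) {
        int has_a = 0, has_b = 0;
        for (int n = 0; n < NODES_PER_ELEM; n++) {
            if (elems[e][n] == a) has_a = 1;
            if (elems[e][n] == b) has_b = 1;
        }
        if (has_a && has_b) count++;
    }
    return count;
}

/* Serial stand-in for the parallel procedure: owner[i] is the rank that
   owns node i; on[i] and off[i] receive the row counts for node i. */
void count_preallocation(const int (*elems)[NODES_PER_ELEM], size_t nelems,
                         const int *owner, size_t nnodes,
                         double *on, double *off)
{
    /* "fill the on vector with a 1 in each location": the diagonal entry. */
    for (size_t i = 0; i < nnodes; i++) {
        on[i] = 1.0;
        off[i] = 0.0;
    }

    for (size_t e = 0; e < nelems; e++) {
        for (int p = 0; p < NODES_PER_ELEM; p++) {
            for (int q = p + 1; q < NODES_PER_ELEM; q++) {
                int node1 = elems[e][p];
                int node2 = elems[e][q];
                assert(node1 >= 0 && (size_t)node1 < nnodes);
                assert(node2 >= 0 && (size_t)node2 < nnodes);

                /* Interior edges are seen from two elements, so each
                   contributes 0.5; boundary edges are seen once. */
                double t = elements_sharing_edge(elems, nelems, node1, node2) > 1
                               ? 0.5 : 1.0;

                /* Same owner -> diagonal block, otherwise off-diagonal. */
                double *target = (owner[node1] == owner[node2]) ? on : off;
                target[node1] += t;
                target[node2] += t;
            }
        }
    }
}
```

For two triangles {0,1,2} and {1,3,2}, with nodes 0 and 1 owned by rank 0
and nodes 2 and 3 owned by rank 1, this gives on = {2,2,2,2} and
off = {1,2,2,1}. After conversion to integers, these are the kind of
per-row counts that a routine such as MatMPIAIJSetPreallocation takes.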

On May 29, 2008, at 3:50 PM, Billy Araújo wrote:

> Hi,
> I just want to share my experience with FE assembly.
> I think the problem of preallocation in finite element matrices is  
> that you don't know how many elements are connected to a given node,  
> there can be 5, 20 elements or more. You can build a structure with  
> the number of nodes connected to a node and then preallocate the  
> matrix but this is not very efficient.
> I know UMFPACK has a method of forming triplets with the matrix  
> information and then it has routines to add duplicate entries and  
> compress the data in a compressed matrix format, although I have
> never used UMFPACK with PETSc. I also don't know if there are
> similar functions in PETSc optimized for FE matrix assembly.
> Regards,
> Billy.
> -----Original message-----
> From: owner-petsc-users at on behalf of Barry Smith
> Sent: Wed 28-05-2008 16:03
> To: petsc-users at
> Subject: Re: Slow MatSetValues
> Also, slightly less important, collapse the 4 MatSetValues() below
> into a single call that does the little two by two block
>     Barry
> On May 28, 2008, at 9:07 AM, Lars Rindorf wrote:
> > Hi everybody
> >
> > I have a problem with MatSetValues, since the building of my matrix
> > takes much longer (35 s) than its solution (0.2 s). When the number
> > of degrees of freedom is increased, then the problem worsens. The
> > rate at which the elements of the (sparse) matrix are set also seems
> > to decrease with the number of elements already set. That is, it
> > becomes slower near the end.
> >
> > The structure of my program is something like:
> >
> > for element in finite elements
> >     for dof in element
> >         for equations in FEM formulation
> >             ierr = MatSetValues(M->M,1,&i,1,&j,&tmp,ADD_VALUES);
> >             ierr = MatSetValues(M->M,1,&k,1,&l,&tmp,ADD_VALUES);
> >             ierr = MatSetValues(M->M,1,&i,1,&l,&tmp,ADD_VALUES);
> >             ierr = MatSetValues(M->M,1,&k,1,&j,&tmp,ADD_VALUES);
> >
> >
> > where i,j,k,l are appropriate integers and tmp is a double value to
> > be added.
> >
> > The code has worked fine with a previous version of PETSc (not
> > compiled by me). The version of PETSc that I use is slightly newer
> > (I think), 2.3.3 vs ~2.3.
> >
> > Is it some kind of dynamic allocation problem? I have tried using
> > MatSetValuesBlocked, but this is only slightly faster. If I monitor
> > the program's CPU and memory consumption then the CPU is 100 % used
> > and the memory consumption is only 20-30 MB.
> >
> > My computer runs Red Hat Linux on a quad-core Xeon processor. I
> > use Intel's MKL BLAS and LAPACK.
> >
> > What should I do to speed up PETSc?
> >
> > Kind regards
> > Lars
> > _____________________________
> >
> >
> > Lars Rindorf
> > M.Sc., Ph.D.
> >
> >
> >
> > Danish Technological Institute
> > Gregersensvej
> >
> > 2630 Taastrup
> >
> > Denmark
> > Phone +45 72 20 20 00
> >
> >
