# Slow MatSetValues

Barry Smith bsmith at mcs.anl.gov
Thu May 29 16:49:42 CDT 2008

```   Partition the elements across the processes,

then partition the nodes across processes (try to make sure that
each node is on the same process as at least one of its elements),

create
   1) three parallel vectors with the number of locally owned nodes
      on each process;
      call these vectors off, on, and owner; fill the on vector
      with a 1 in each location, and fill the owner vector with the
      rank of the owning process in each location
   2) three sequential vectors on each process with the total
      number of nodes of all the elements of that process (this is the
      locally owned plus ghosted nodes);
      call these vectors ghostedoff, ghostedon, and ghostedowner
   3) a VecScatter from the "locally owned plus ghosted nodes" to
      the "locally owned nodes"
      [you need these anyway for the numerical part of the code when
      you evaluate your nonlinear functions (or right-hand side for linear
      problems)]

scatter the owner vector to the ghostedowner vector
now on each process loop over the locally owned ELEMENTS
   for each node1 in that element
      for each node2 in that element (excluding the node1 in the
      outer loop)
         if node1 and node2 share an edge (face in 3d) and
         that edge (face in 3d) is not a boundary edge (face in 3d) set t = .5
         (this prevents double counting of these couplings)
         else set t = 1.0
         if node1 and node2 are both owned by the same
         process** add t into ghostedon at both the node1 location and the
         node2 location
         if node1 and node2 are owned by different processes
         add t into ghostedoff at both the node1 and node2 locations

Do a VecScatter add from the ghostedoff and ghostedon into the off
and on.

The off and on vectors now contain exactly the preallocation needed
on each process.

The amount of work required is proportional to the number of
elements times the (number of nodes on an element)^2; the amount of
memory needed is roughly three global vectors and three local vectors.
This is much less work and memory than needed in the numerical part of
the code, hence it is very efficient. In fact it is likely much cheaper
than a single nonlinear function evaluation.

Barry

** two nodes are owned by the same process if ghostedowner of node1
matches ghostedowner of node2

On May 29, 2008, at 3:50 PM, Billy Araújo wrote:

>
> Hi,
>
> I just want to share my experience with FE assembly.
> I think the problem of preallocation in finite element matrices is
> that you don't know how many elements are connected to a given node,
> there can be 5, 20 elements or more. You can build a structure with
> the number of nodes connected to a node and then preallocate the
> matrix but this is not very efficient.
>
> I know UMFPACK has a method of forming triplets with the matrix
> information and then it has routines to add duplicate entries and
> compress the data in a compressed matrix format, although I have
> never used UMFPACK with PETSc. I also don't know if there are
> similar functions in PETSc optimized for FE matrix assembly.
>
> Regards,
>
> Billy.
>
>
>
> -----Original message-----
> From: owner-petsc-users at mcs.anl.gov on behalf of Barry Smith
> To: petsc-users at mcs.anl.gov
> Subject: Re: Slow MatSetValues
>
>
> http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manual.pdf#sec_matsparse
> http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html
>
> Also, slightly less important: collapse the 4 MatSetValues() below
> into a single call that sets the little two-by-two block
>
>     Barry
>
> On May 28, 2008, at 9:07 AM, Lars Rindorf wrote:
>
> > Hi everybody
> >
> > I have a problem with MatSetValues, since building my matrix
> > takes much longer (35 s) than solving it (0.2 s). When the number
> > of degrees of freedom is increased, the problem worsens. The
> > rate at which the entries of the (sparse) matrix are set also seems
> > to decrease with the number of entries already set. That is, it
> > becomes slower near the end.
> >
> > The structure of my program is something like:
> >
> > for element in finite elements
> >     for dof in element
> >         for equations in FEM formulation
> >
> >
> > where i,j,k,l are appropriate integers and tmp is a double value to
> >
> > The code worked fine with a previous version of PETSc (not
> > compiled by me). The version of PETSc that I use now is slightly
> > newer (I think), 2.3.3 vs ~2.3.
> >
> > Is it some kind of dynamic allocation problem? I have tried using
> > MatSetValuesBlocked, but this is only slightly faster. If I monitor
> > the program's CPU and memory consumption, the CPU is 100% used
> > and the memory consumption is only 20-30 MB.
> >
> > My computer runs Red Hat Linux on a quad-core Xeon processor. I
> > use Intel's MKL BLAS and LAPACK.
> >
> > What should I do to speed up PETSc?
> >
> > Kind regards
> > Lars
> > _____________________________
> >
> >
> > Lars Rindorf
> > M.Sc., Ph.D.
> >
> > http://www.dti.dk
> >
> > Danish Technological Institute
> > Gregersensvej
> >
> > 2630 Taastrup
> >
> > Denmark
> > Phone +45 72 20 20 00
> >
> >
>
>

```
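
The counting loop that Barry describes at the top of this message can be tried out serially before writing any MPI code. The following is only a sketch under several assumptions, not PETSc code: it uses plain Python lists in place of the parallel and ghosted vectors, it accumulates directly into the global `on` and `off` lists where the message uses a VecScatter add, and it handles the 2D case only (an interior edge is one used by more than one element). The names `count_preallocation` and `shares_edge` are hypothetical. It also takes one particular reading of the loop: each unordered pair of nodes in an element is visited once, and `t` is added at both nodes.

```python
from collections import Counter
from itertools import combinations


def count_preallocation(elements, owner, shares_edge):
    """Return (on, off) per-node counts, following the loop in the message.

    elements    : list of tuples of global node numbers, one per element
    owner       : owner[n] is the rank that owns node n
    shares_edge : hypothetical callback (element, a, b) -> True when nodes
                  a and b are joined by an edge of that element
    """
    # Count how many elements use each edge; a count of 1 is a boundary edge.
    edge_use = Counter()
    for elem in elements:
        for a, b in combinations(elem, 2):
            if shares_edge(elem, a, b):
                edge_use[frozenset((a, b))] += 1

    n = len(owner)
    on = [1.0] * n    # the initial 1 accounts for the diagonal entry
    off = [0.0] * n
    for elem in elements:
        # Assumption: visit each unordered pair of nodes once per element.
        for a, b in combinations(elem, 2):
            interior = edge_use[frozenset((a, b))] > 1
            t = 0.5 if interior else 1.0   # 0.5 avoids double counting
            target = on if owner[a] == owner[b] else off
            target[a] += t
            target[b] += t
    return on, off


# Two triangles sharing the edge (1, 2); nodes 0 and 1 live on rank 0,
# nodes 2 and 3 on rank 1. In a triangle every pair of nodes shares an edge.
elements = [(0, 1, 2), (1, 3, 2)]
owner = [0, 0, 1, 1]
on, off = count_preallocation(elements, owner, lambda elem, a, b: True)
print(on)   # [2.0, 2.0, 2.0, 2.0]
print(off)  # [1.0, 2.0, 2.0, 1.0]
```

For this small mesh the counts agree with what you get by listing the couplings by hand: node 1, for example, couples to itself and node 0 on its own rank and to nodes 2 and 3 on the other rank. In a parallel code the slices of `on` and `off` for the locally owned rows would play the role of the `d_nnz` and `o_nnz` arguments described on the MatCreateMPIAIJ manual page linked in the thread.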