[petsc-users] Preallocation (Unstructured FE)

Gong Ding gdiso at ustc.edu
Sun May 1 02:40:48 CDT 2011


I guess you are using an AIJ matrix.
Inexact preallocation is a killer for performance. The preallocation of the
AIJ (or CRS) format is organized by row, so any row whose actual number of
column entries exceeds its preallocated count forces the whole matrix
storage to be rebuilt!
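
For reference, a minimal sketch (in C, names hypothetical) of exact per-row
preallocation for a sequential AIJ matrix. nnz[i] must be at least the true
number of nonzeros in row i, otherwise MatSetValues() triggers the expensive
reallocation described above:

  #include <petscmat.h>

  /* n: rows/cols; nnz[n]: exact (or over-) estimate of nonzeros per row */
  PetscErrorCode create_preallocated(PetscInt n, const PetscInt nnz[], Mat *A)
  {
    PetscErrorCode ierr;
    ierr = MatCreate(PETSC_COMM_SELF, A); CHKERRQ(ierr);
    ierr = MatSetSizes(*A, n, n, n, n); CHKERRQ(ierr);
    ierr = MatSetType(*A, MATSEQAIJ); CHKERRQ(ierr);
    /* the scalar nz argument is ignored when the nnz[] array is given */
    ierr = MatSeqAIJSetPreallocation(*A, 0, nnz); CHKERRQ(ierr);
    return 0;
  }

Overestimating a row's count only wastes a little memory; underestimating it
is what costs the reallocation.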

I have also suffered a lot from this preallocation mechanism.
As a result, I am developing a FAIJ matrix type. It uses a hash table to hold
any matrix entry that does not have a preallocated location. The exact nonzero
pattern is then computed at MatAssemblyEnd, and the entries in the hash table
are inserted into their proper locations.
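
The stash idea as a minimal standalone sketch (hypothetical code, not the
actual FAIJ implementation): entries that miss their preallocated slot go
into a (row, col) -> value hash table with ADD_VALUES-style accumulation,
to be flushed into the matrix once the exact pattern is known:

  #include <stdlib.h>

  typedef struct { long row, col; double val; int used; } Entry;
  typedef struct { Entry *slots; size_t cap; } Stash;  /* keep cap > count */

  static size_t stash_slot(const Stash *s, long r, long c) {
    size_t i = ((size_t)r * 2654435761u ^ (size_t)c * 40503u) % s->cap;
    while (s->slots[i].used &&
           (s->slots[i].row != r || s->slots[i].col != c))
      i = (i + 1) % s->cap;                            /* linear probing */
    return i;
  }

  /* insert-or-accumulate, mirroring MatSetValues with ADD_VALUES */
  static void stash_add(Stash *s, long r, long c, double v) {
    size_t i = stash_slot(s, r, c);
    if (s->slots[i].used) s->slots[i].val += v;
    else { Entry e = { r, c, v, 1 }; s->slots[i] = e; }
  }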

The serial version now tests OK, but the parallel version is still buggy.
I will submit it to the PETSc group when it is finished.


 
> I'm having some performance issues with preallocation in a fully
> unstructured FE code. It would be very helpful if those using FE codes
> could comment.
>
> For a problem of size 100K nodes and 600K tet elements (on 1 cpu):
>
> 1. If I calculate the _exact_ number of non-zeros per row (using a
> running list in Fortran) by looping over nodes & elements, the code
> takes 17 mins (to calculate nnz's per row, assemble and solve).
>
> 2. If I don't use a running list and simply take the average of the max
> number of nodes a node might be connected to (again by looping over
> nodes & elements, but without a running list), then it takes 8 mins.
>
> 3. If I just magically guess the right value calculated in 2 and use
> that as the average nnz per row, then it takes only 25 secs.
>
> Basically, in all cases assembly and solve are very fast (a few
> seconds), but the nnz calculation itself (in 1 and 2) takes a long
> time. How can this be cut down? Is there a heuristic way to estimate
> the number (as done in 3), even if it slightly overestimates the
> nnz's per row, or are there efficient ways to do step 1 or 2? Right
> now I have do i=1,num_nodes; do j=1,num_elements ... which is
> obviously slow for a large number of nodes/elements.
>
> Thanks in advance
> Tabrez
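
On the nested-loop question above: the exact nnz count does not need a loop
over nodes with an inner loop over elements. A single pass over the elements,
accumulating a de-duplicated neighbor list per node, is O(num_elements) up to
the small per-node neighbor count. A minimal C sketch under assumed
conventions (4-node tets in elem[e][0..3], one dof per node, zero-based node
numbers; all names hypothetical):

  #include <stdlib.h>

  void count_nnz(int num_nodes, int num_elems, int (*elem)[4], int *nnz)
  {
    int *deg  = calloc(num_nodes, sizeof *deg);  /* neighbors found so far  */
    int *cap  = calloc(num_nodes, sizeof *cap);  /* allocated list capacity */
    int **nbr = calloc(num_nodes, sizeof *nbr);  /* per-node neighbor lists */

    for (int e = 0; e < num_elems; e++)
      for (int a = 0; a < 4; a++) {
        int i = elem[e][a];
        for (int b = 0; b < 4; b++) {            /* j == i keeps the diagonal */
          int j = elem[e][b], seen = 0;
          for (int k = 0; k < deg[i]; k++)       /* dedupe: lists stay short */
            if (nbr[i][k] == j) { seen = 1; break; }
          if (!seen) {
            if (deg[i] == cap[i]) {
              cap[i] = cap[i] ? 2 * cap[i] : 8;
              nbr[i] = realloc(nbr[i], cap[i] * sizeof **nbr);
            }
            nbr[i][deg[i]++] = j;
          }
        }
      }

    for (int i = 0; i < num_nodes; i++) { nnz[i] = deg[i]; free(nbr[i]); }
    free(nbr); free(deg); free(cap);
  }

For the 100K-node / 600K-tet case this touches each element once instead of
once per node, removing the O(num_nodes * num_elements) cost entirely; the
same pass can also record the column indices themselves for exact
preallocation.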

