[petsc-users] Matrix assembly too slow

Sat Apr 13 13:59:02 CDT 2013

Hello,

I am trying to use PETSc in my code. My numerical scheme is BEM and
requires a dense matrix. I use the mpidense matrix type, and each matrix
entry is populated incrementally. This results in many calls to
MatSetValue for every entry of the matrix. However, I don't need to get
values from the matrix until all the calculations are done. Moreover, when
the matrix is created I use PETSC_DECIDE for the local rows and columns and
I also do preallocation using MatMPIDenseSetPreallocation.

Each process writes to specific rows of the matrix, and assuming that
mpidense matrices are distributed row-wise to the processes, each process
should write more or less to its own rows. Moreover, to avoid a bottleneck
at the final matrix assembly, I do a MatAssemblyBegin-End with
MAT_FLUSH_ASSEMBLY every time the stash size reaches a critical value (of
1 million).
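
To make the row-ownership assumption above concrete, here is a minimal
sketch in plain C (no PETSc calls) of the even row split that PETSC_DECIDE
is generally understood to produce: N / size rows per process, with the
remainder going to the lowest ranks. The helper names local_rows, first_row
and owner_of_row are hypothetical, and the split rule is an assumption that
should be confirmed against MatGetOwnershipRange on the real matrix.

```c
#include <assert.h>

/* Rows owned by `rank` when N global rows are split over `size` ranks:
   an even split, with the remainder given to the lowest ranks.
   ASSUMPTION: this mirrors the PETSC_DECIDE layout; verify it with
   MatGetOwnershipRange() on the actual matrix. */
long local_rows(long N, int size, int rank)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return N / size + ((N % size) > rank ? 1 : 0);
}

/* First global row owned by `rank` (sum of the sizes of lower ranks). */
long first_row(long N, int size, int rank)
{
    long start = 0;
    for (int r = 0; r < rank; ++r)
        start += local_rows(N, size, r);
    return start;
}

/* Rank that owns global row `row`, or -1 if the row is out of range. */
int owner_of_row(long N, int size, long row)
{
    if (row < 0 || row >= N)
        return -1;
    for (int r = 0; r < size; ++r) {
        long start = first_row(N, size, r);
        if (row < start + local_rows(N, size, r))
            return r;
    }
    return -1;
}
```

For example, with a hypothetical run on 8 processes, calling
owner_of_row(28356, 8, i) for each row i that a process writes shows
whether that entry stays local or has to go through the stash.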

However, when all operations are done and MatAssemblyBegin-End is called
with MAT_FINAL_ASSEMBLY, the whole program gets stuck there. It doesn't
crash, but it doesn't get through the assembly either. When I run 'top',
the processes seem to be in sleep status. I have tried waiting for many
hours, but without any progress. Even though the remaining items in the
stash are fewer than 1 million, which had an acceptable time cost for
MAT_FLUSH_ASSEMBLY, it seems as if MAT_FINAL_ASSEMBLY just cannot deal with
them. I would expect this to take a few seconds, but definitely not
hours...

The matrix dimensions are 28356 x 28356. For smaller problems, i.e. ~9000
rows and columns, there is no significant delay.
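
For scale, the storage implied by these dimensions can be worked out with
a short sketch in plain C. It assumes 8-byte real scalars (the scalar type
is not stated above), and dense_bytes is a hypothetical helper name.

```c
#include <assert.h>

/* Total bytes needed to store a dense n x n matrix.
   ASSUMPTION: scalar_bytes = 8 corresponds to real double-precision
   scalars; a complex build would use 16 instead. */
unsigned long long dense_bytes(long long n, int scalar_bytes)
{
    assert(n >= 0 && scalar_bytes > 0);
    return (unsigned long long)n * (unsigned long long)n
         * (unsigned long long)scalar_bytes;
}
```

Under that assumption, dense_bytes(28356, 8) gives 6,432,501,888 bytes
(about 6.4 GB in total across all processes), compared with 648,000,000
bytes for a 9000 x 9000 matrix.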

My questions are the following:

1) I know that the general advice is to fill the matrix in large blocks,
but I am trying to avoid it for now. I would expect that doing
MatAssemblyBegin-End with MAT_FLUSH_ASSEMBLY every now and then would
reduce the load during the final assembly. Is my assumption wrong?

2) How is MatAssemblyBegin-End different when called with
MAT_FINAL_ASSEMBLY instead of MAT_FLUSH_ASSEMBLY?

3) If this is the expected behavior, and it takes so long for a 28000 x
28000 linear system, it would be impossible to scale up to millions of
dofs. It seems hard to believe that the cost of communicating the matrix
with MatAssemblyBegin-End is much bigger than, or even comparable to, the
cost of actually calculating the values with numerical integration.

4) Unfortunately I am not experienced in debugging parallel programs. Is
there a way to see if the processes are blocked waiting for each other?

I apologize for the long email and thank you for taking the time to read it.

Best Regards,
Kassiopik