[petsc-users] Matrix assembly too slow

Matthew Knepley knepley at gmail.com
Sat Apr 13 14:23:59 CDT 2013


On Sat, Apr 13, 2013 at 1:59 PM, Kassiopi Kassiopi2 <kassiopik at gmail.com> wrote:

> Hello,
>
> I am trying to use PETSc in my code. My numerical scheme is BEM and
> requires a dense matrix. I use the mpidense matrix type, and each matrix
> entry is populated incrementally. This results in many calls to
> MatSetValue, one for every entry of the matrix. However, I don't need to
> read values back from the matrix until all the calculations are done.
> When the matrix is created I use PETSC_DECIDE for the local rows and
> columns, and I also preallocate with MatMPIDenseSetPreallocation.
>
> Each process writes to specific rows of the matrix, and assuming that
> mpidense matrices are distributed row-wise across the processes, each
> process should write more or less to its own rows. Moreover, to avoid a
> bottleneck at the final matrix assembly, I do a MatAssemblyBegin/End with
> MAT_FLUSH_ASSEMBLY every time the stash size reaches a critical value
> (1 million entries).
>

How many off-process values are you writing? This seems like tremendous
overkill.
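
If every value really is on-process, a quick way to guarantee it is to ask
PETSc for the ownership range up front and write only those rows; values
written to locally owned rows never enter the stash at all. A minimal
sketch (the matrix size, the constant value, and the ADD_VALUES mode are
placeholders standing in for the real BEM assembly):

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      PetscInt       N = 28356, rstart, rend, i, j;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
      ierr = MatSetType(A, MATMPIDENSE);CHKERRQ(ierr);
      ierr = MatMPIDenseSetPreallocation(A, NULL);CHKERRQ(ierr); /* NULL: PETSc allocates */

      /* rows [rstart, rend) are stored on this process; values written
         here go straight into local memory, never into the stash */
      ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
      for (i = rstart; i < rend; i++) {
        for (j = 0; j < N; j++) {
          ierr = MatSetValue(A, i, j, 1.0, ADD_VALUES);CHKERRQ(ierr); /* placeholder value */
        }
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return 0;
    }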


> However, when all operations are done and MatAssemblyBegin/End is called
> with MAT_FINAL_ASSEMBLY, the whole program gets stuck there. It doesn't
> crash, but it doesn't get through the assembly either. When I run 'top',
> the processes appear to be sleeping. I have tried waiting for many hours,
> but without any progress. Even though the remaining items in the stash
> number fewer than 1 million, which had an acceptable time cost under
> MAT_FLUSH_ASSEMBLY, it seems as if MAT_FINAL_ASSEMBLY just cannot get
> through them. I would expect this to take a few seconds, definitely not
> hours...
>

This sounds like the transfer is spilling to virtual (disk) memory, which
would explain why it does not happen at smaller sizes.
Flush the assembly more frequently.
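
One caution when flushing: MatAssemblyBegin/End are collective, so every
process must call them the same number of times or the run will deadlock.
Trigger the flush on something all ranks agree on, such as a fixed number
of rounds, rather than on the local stash size. A rough sketch, meant to
replace the insertion loop in a program like the one above, where nrounds
is a hypothetical count identical on all ranks:

    PetscInt k, nrounds = 100;  /* same value on every rank */
    for (k = 0; k < nrounds; k++) {
      /* ... each rank computes and MatSetValue()s its share for round k ... */
      ierr = MatAssemblyBegin(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr); /* collective */
      ierr = MatAssemblyEnd(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);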


> The matrix dimensions are 28356 x 28356. For smaller problems, i.e. ~9000
> rows and columns, there is no significant delay.
>
> My questions are the following:
>
> 1) I know that the general advice is to fill the matrix in large blocks,
> but I am trying to avoid that for now. I would expect that doing
> MatAssemblyBegin/End with MAT_FLUSH_ASSEMBLY every now and then would
> reduce the load during the final assembly. Is my assumption wrong?
>

It is not often enough.


> 2) How is MatAssemblyBegin/End different when called with
> MAT_FINAL_ASSEMBLY instead of MAT_FLUSH_ASSEMBLY?
>

Lots of things are set up for sparse matrices, but it should not be
different for MPIDENSE.


> 3) If this is the expected behavior, and it takes this long for a 28000 x
> 28000 linear system, it would be impossible to scale up to millions of
> dofs. It seems hard to believe that the cost of communicating the matrix
> through MatAssemblyBegin/End is much bigger than, or even comparable to,
> the cost of actually calculating the values with numerical integration.
>

That intuition is exactly wrong. On modern hardware, you can do on the
order of 1000 floating-point operations for each memory reference.
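
(As a rough back-of-envelope: a core sustaining ~10 Gflop/s against a
~100 ns DRAM latency retires 10^10 x 10^-7 = 1000 flops in the time of one
uncached access.)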


> 4) Unfortunately I am not experienced in debugging parallel programs. Is
> there a way to see if the processes are blocked waiting for each other?
>

gdb should be easy to use. Run with -start_in_debugger, hit C-c when the
run seems to hang, and type 'where' to get the stack trace.
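
For example (the binary name here is a placeholder):

    mpiexec -n 2 ./yourbem -start_in_debugger

PETSc attaches a debugger to each process (by default an xterm running
gdb per rank); when the run hangs, C-c in each window and 'where' shows
which call every rank is stuck in.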

   Matt


> I apologize for the long email and thank you for taking the time to read
> it.
>
> Best Regards,
> Kassiopik
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener