<div dir="ltr">Hello, <div><br></div><div style>I am trying to use PETSc in my code. My numerical scheme is BEM, which requires a dense matrix. I use the mpidense matrix type, and each matrix entry is populated incrementally, which results in many calls to MatSetValue for every entry of the matrix. However, I don't need to read values back from the matrix until all the calculations are done. When the matrix is created, I use PETSC_DECIDE for the local rows and columns, and I preallocate with MatMPIDenseSetPreallocation.</div>

<div style><br></div><div style>Each process writes to specific rows of the matrix and, assuming that mpidense matrices are distributed row-wise across the processes, each process should write mostly to its own rows. To avoid a bottleneck at the final matrix assembly, I call MatAssemblyBegin/MatAssemblyEnd with MAT_FLUSH_ASSEMBLY every time the stash size reaches a critical value (1 million entries).</div>
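<div style><br></div><div style>To make the row-wise assumption concrete, here is a minimal sketch in plain C (no PETSc calls) of the contiguous split that, as far as I understand, PETSC_DECIDE produces: every process gets N / size rows and the first N % size processes get one extra. The helper names ownership_range and owner_of_row are hypothetical and only for illustration; in the real code the range should be queried with MatGetOwnershipRange rather than recomputed.</div>

```c
#include <assert.h>

/* Hypothetical helper: half-open row range [start, end) owned by `rank`
   when N rows are split contiguously over `size` processes.
   Assumption: every rank gets N / size rows and the first N % size
   ranks get one extra row. */
static void ownership_range(long N, int size, int rank, long *start, long *end)
{
    assert(size > 0 && size <= N && rank >= 0 && rank < size);
    long base = N / size;   /* rows every rank receives */
    long rem  = N % size;   /* number of ranks that receive one extra row */
    *start = rank * base + (rank < rem ? rank : rem);
    *end   = *start + base + (rank < rem ? 1 : 0);
}

/* Hypothetical helper: rank that owns global row `row` under the same split. */
static int owner_of_row(long N, int size, long row)
{
    assert(size > 0 && size <= N && row >= 0 && row < N);
    long base = N / size;
    long rem  = N % size;
    long cut  = rem * (base + 1);   /* rows held by the ranks with an extra row */
    if (row < cut)
        return (int)(row / (base + 1));
    return (int)(rem + (row - cut) / base);
}
```

<div style><br></div><div style>With the 28356 rows mentioned below and, say, 8 processes (a count picked only for illustration), ownership_range gives rank 0 the rows 0 to 3544 and rank 7 the rows 24812 to 28355. Any entry whose row falls outside the calling process's range goes into the stash and has to be communicated during assembly.</div>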

<div style><br></div><div style>However, when all operations are done and MatAssemblyBegin/MatAssemblyEnd is called with MAT_FINAL_ASSEMBLY, the whole program gets stuck there. It doesn't crash, but it doesn't get through the assembly either. When I run 'top', the processes appear to be in the sleep state. I have tried waiting for many hours without any progress. Even though fewer than 1 million items remain in the stash, an amount that had an acceptable time cost for MAT_FLUSH_ASSEMBLY, it seems as if MAT_FINAL_ASSEMBLY just cannot deal with it. I would expect this to take a few seconds, definitely not hours...</div>

<div style><br></div><div style>The matrix dimensions are 28356 x 28356. For smaller problems, i.e. ~9000 rows and columns, there is no significant delay.</div><div style><br></div><div style>My questions are the following:</div>

<div style><br></div><div style>1) I know that the general advice is to fill the matrix in large blocks, but I am trying to avoid that for now. I would expect that calling MatAssemblyBegin/MatAssemblyEnd with MAT_FLUSH_ASSEMBLY every now and then would reduce the load during the final assembly. Is my assumption wrong? </div>

<div style><br>2) How does MatAssemblyBegin/MatAssemblyEnd behave differently when called with MAT_FINAL_ASSEMBLY instead of MAT_FLUSH_ASSEMBLY?</div><div style><br></div><div style>3) If this is the expected behavior, and it takes this long for a 28000 x 28000 linear system, it would be impossible to scale up to millions of dofs. It seems hard to believe that the cost of communicating the matrix with MatAssemblyBegin/MatAssemblyEnd is much bigger than, or even comparable to, the cost of actually calculating the values with numerical integration. <br>

</div><div style><br></div><div style>4) Unfortunately I am not experienced in debugging parallel programs. Is there a way to see if the processes are blocked waiting for each other? </div><div style><br></div><div style>

I apologize for the long email and thank you for taking the time to read it.</div><div style><br></div><div style>Best Regards,</div><div style>Kassiopik</div></div>