[petsc-users] MatAssemblyEnd taking too long
Manav Bhatia
bhatiamanav at gmail.com
Thu Aug 20 08:54:37 CDT 2020
Stefano,
I will report the results of these checks shortly. To your second question: yes, this is the first matrix assembly in the code.
-Manav
> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zampini at gmail.com> wrote:
>
> Manav
>
> Can you add an MPI_Barrier before
>
> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
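>
> Something like this right before that call, for instance (just a sketch; the barrier runs on the matrix communicator):
>
>   ierr = MPI_Barrier(PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>   ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);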
>
> Also, in order to assess where the issue is, we need to see the values (per rank) of
>
> ((Mat_SeqAIJ*)aij->B->data)->nonew
> mat->was_assembled
> aij->donotstash
> mat->nooffprocentries
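>
> A sketch of how those could be printed from inside MatAssemblyEnd_MPIAIJ (where mat and aij are already in scope; the exact format strings and casts are just illustrative):
>
>   PetscMPIInt rank;
>   ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)mat),&rank);CHKERRQ(ierr);
>   ierr = PetscSynchronizedPrintf(PetscObjectComm((PetscObject)mat),
>            "[%d] nonew %D was_assembled %d donotstash %d nooffprocentries %d\n",
>            rank,((Mat_SeqAIJ*)aij->B->data)->nonew,
>            (int)mat->was_assembled,(int)aij->donotstash,(int)mat->nooffprocentries);CHKERRQ(ierr);
>   ierr = PetscSynchronizedFlush(PetscObjectComm((PetscObject)mat),PETSC_STDOUT);CHKERRQ(ierr);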
>
> Another question: is this the first matrix assembly of the code?
> If you change to pc_none, do you get the same issue?
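>
> For example (assuming the preconditioner is picked up from the options database; ./app is just a placeholder for your executable):
>
>   mpiexec -n 8 ./app -pc_type none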
>
>> On Aug 20, 2020, at 3:10 PM, Manav Bhatia <bhatiamanav at gmail.com <mailto:bhatiamanav at gmail.com>> wrote:
>>
>>
>>
>>> On Aug 19, 2020, at 9:39 PM, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> wrote:
>>>
>>> Jed is more knowledgeable about the communication, but I have a simple question about the FEM method. Normally, the way
>>> we divide unknowns is that the only unknowns which might have entries computed off-process are those on the partition boundary.
>>> However, it sounds like you have a huge number of communicated values. Is it possible that the division of rows in your matrix does
>>> not match the division of the cells you compute element matrices for?
>>
>>
>> I hope that is not the case. I am using libMesh to manage the mesh and the creation of the sparsity pattern, and it uses ParMETIS to create the partitions.
>> libMesh ensures that off-process entries occur only at the partition boundary (unless an extra set of DoFs is marked for coupling).
>>
>> I also printed and examined the n_nz and n_oz values on each rank, and they do not raise any flags.
>>
>> I will try to dig in a bit further to make sure everything checks out.
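>>
>> One check I plan to run (just a sketch; dof_map stands for the libMesh DofMap of the system, and rank/mat are assumed to already be in scope) is to compare the PETSc row-ownership range with libMesh's local dof range on every rank:
>>
>>   PetscInt rstart,rend;
>>   ierr = MatGetOwnershipRange(mat,&rstart,&rend);CHKERRQ(ierr);
>>   ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
>>            "[%d] Mat rows [%D,%D)  libMesh dofs [%u,%u)\n",
>>            rank,rstart,rend,dof_map.first_dof(rank),dof_map.end_dof(rank));CHKERRQ(ierr);
>>   ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD,PETSC_STDOUT);CHKERRQ(ierr);
>>
>> Running with -info should also report, per rank, how many entries go into the stash during MatAssemblyBegin, which would show directly how much off-process data is being communicated.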
>>
>> Looking at the screenshots I had shared yesterday, all processes are in this function:
>>
>> PetscErrorCode MatAssemblyEnd_MPIAIJ(Mat mat,MatAssemblyType mode)
>> {
>>   Mat_MPIAIJ     *aij = (Mat_MPIAIJ*)mat->data;
>>   Mat_SeqAIJ     *a   = (Mat_SeqAIJ*)aij->A->data;
>>   PetscErrorCode ierr;
>>   PetscMPIInt    n;
>>   PetscInt       i,j,rstart,ncols,flg;
>>   PetscInt       *row,*col;
>>   PetscBool      other_disassembled;
>>   PetscScalar    *val;
>>
>>   /* do not use 'b = (Mat_SeqAIJ*)aij->B->data' as B can be reset in disassembly */
>>
>>   PetscFunctionBegin;
>>   if (!aij->donotstash && !mat->nooffprocentries) {
>>     while (1) {
>>       ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr);
>>       if (!flg) break;
>>
>>       for (i=0; i<n; ) {
>>         /* Now identify the consecutive vals belonging to the same row */
>>         for (j=i,rstart=row[j]; j<n; j++) {
>>           if (row[j] != rstart) break;
>>         }
>>         if (j < n) ncols = j-i;
>>         else       ncols = n-i;
>>         /* Now assemble all these values with a single function call */
>>         ierr = MatSetValues_MPIAIJ(mat,1,row+i,ncols,col+i,val+i,mat->insertmode);CHKERRQ(ierr);
>>
>>         i = j;
>>       }
>>     }
>>     ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr);
>>   }
>>   ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>>   ierr = MatAssemblyEnd(aij->A,mode);CHKERRQ(ierr);
>>
>>   /* determine if any processor has disassembled, if so we must
>>      also disassemble ourselfs, in order that we may reassemble. */
>>   /*
>>      if nonzero structure of submatrix B cannot change then we know that
>>      no processor disassembled thus we can skip this stuff
>>   */
>>   if (!((Mat_SeqAIJ*)aij->B->data)->nonew) {
>>     ierr = MPIU_Allreduce(&mat->was_assembled,&other_disassembled,1,MPIU_BOOL,MPI_PROD,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>>     if (mat->was_assembled && !other_disassembled) {
>>       ierr = MatDisAssemble_MPIAIJ(mat);CHKERRQ(ierr);
>>     }
>>   }
>>   if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) {
>>     ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr);
>>   }
>>   ierr = MatSetOption(aij->B,MAT_USE_INODES,PETSC_FALSE);CHKERRQ(ierr);
>>   ierr = MatAssemblyBegin(aij->B,mode);CHKERRQ(ierr);
>>   ierr = MatAssemblyEnd(aij->B,mode);CHKERRQ(ierr);
>>
>>   ierr = PetscFree2(aij->rowvalues,aij->rowindices);CHKERRQ(ierr);
>>
>>   aij->rowvalues = 0;
>>
>>   ierr = VecDestroy(&aij->diag);CHKERRQ(ierr);
>>   if (a->inode.size) mat->ops->multdiagonalblock = MatMultDiagonalBlock_MPIAIJ;
>>
>>   /* if no new nonzero locations are allowed in matrix then only set the matrix state the first time through */
>>   if ((!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) || !((Mat_SeqAIJ*)(aij->A->data))->nonew) {
>>     PetscObjectState state = aij->A->nonzerostate + aij->B->nonzerostate;
>>     ierr = MPIU_Allreduce(&state,&mat->nonzerostate,1,MPIU_INT64,MPI_SUM,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>>   }
>>   PetscFunctionReturn(0);
>> }
>>
>>
>> I noticed that of the 8 MPI processes, 2 were stuck at
>> ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr);
>>
>> The other two were stuck at
>> ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr);
>>
>> And the remaining four were under
>> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr);
>>
>> Is it expected for processes to be at different stages in this function?
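>>
>> In the meantime, to get a per-rank timing of the assembly I plan to wrap the two calls with a barrier and timers, roughly like this (just a sketch; mat and rank are assumed to be in scope):
>>
>>   PetscLogDouble t0,t1;
>>   ierr = MPI_Barrier(PETSC_COMM_WORLD);CHKERRQ(ierr);
>>   ierr = PetscTime(&t0);CHKERRQ(ierr);
>>   ierr = MatAssemblyBegin(mat,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>   ierr = MatAssemblyEnd(mat,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>   ierr = PetscTime(&t1);CHKERRQ(ierr);
>>   ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,"[%d] assembly time: %g s\n",rank,(double)(t1-t0));CHKERRQ(ierr);
>>   ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD,PETSC_STDOUT);CHKERRQ(ierr);
>>
>> The MatAssemblyBegin/MatAssemblyEnd entries in -log_view should show the same max/min imbalance across ranks.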
>>
>> -Manav
>>
>>
>