[petsc-users] parallelize matrix assembly process

Barry Smith bsmith at petsc.dev
Tue Dec 13 08:55:46 CST 2022


"MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 73239"

The preallocation is VERY wrong. This is why the computation is so slow; this number should be zero. 
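
For reference, a minimal preallocation sketch for an AIJ matrix follows; it is only a sketch, it assumes the per-row counts d_nnz[]/o_nnz[] have already been computed from the element connectivity, and all names are illustrative rather than taken from the original code (error checking omitted). The quoted -info output reports at most 81 nonzeros in any row, so even a crude constant bound would eliminate the mallocs, at the cost of some extra memory.

    #include <petscmat.h>

    Mat       A;
    PetscInt  m;              /* number of locally owned rows                      */
    PetscInt *d_nnz, *o_nnz;  /* per-row nonzero counts in the diagonal and
                                 off-diagonal blocks, filled from the connectivity */

    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, m, m, PETSC_DETERMINE, PETSC_DETERMINE);
    MatSetType(A, MATAIJ);                             /* seqaij on 1 rank, mpiaij otherwise */
    MatSeqAIJSetPreallocation(A, 0, d_nnz);            /* used in the 1-process case         */
    MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);  /* used in the parallel case          */
    /* crude alternative: MatMPIAIJSetPreallocation(A, 81, NULL, 81, NULL); */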



> On Dec 12, 2022, at 10:20 PM, 김성익 <ksi2443 at gmail.com> wrote:
> 
> Following your comments, I checked by running with '-info'.
> 
> As you suspected, most elements are being computed on the wrong MPI rank,
> and there are a lot of stashed entries.
> 
> 
> 
> Should I partition the domain at the problem-definition stage,
> or is proper preallocation sufficient?
> 
> 
> 
> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 139687279637472 94370404729840 max tags = 2147483647
> 
> [1] <sys> PetscCommDuplicate(): Duplicating a communicator 139620736898016 94891084133376 max tags = 2147483647
> 
> [0] <mat> MatSetUp(): Warning not preallocating matrix storage
> 
> [1] <sys> PetscCommDuplicate(): Duplicating a communicator 139620736897504 94891083133744 max tags = 2147483647
> 
> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 139687279636960 94370403730224 max tags = 2147483647
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736898016 94891084133376
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279637472 94370404729840
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736898016 94891084133376
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279637472 94370404729840
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736898016 94891084133376
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279637472 94370404729840
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279637472 94370404729840
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736898016 94891084133376
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736898016 94891084133376
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279637472 94370404729840
> 
>  TIME0 : 0.000000
> 
>  TIME0 : 0.000000
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Stash has 661 entries, uses 8 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Stash has 661 entries, uses 5 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> 
> [0] <mat> MatAssemblyBegin_MPIAIJ(): Stash has 460416 entries, uses 5 mallocs.
> 
> [1] <mat> MatAssemblyBegin_MPIAIJ(): Stash has 461184 entries, uses 5 mallocs.
> 
> [0] <mat> MatAssemblyEnd_SeqAIJ(): Matrix size: 13892 X 13892; storage space: 180684 unneeded,987406 used
> 
> [0] <mat> MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 73242
> 
> [0] <mat> MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [0] <mat> MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 13892) < 0.6. Do not use CompressedRow routines.
> 
> [0] <mat> MatSeqAIJCheckInode(): Found 4631 nodes of 13892. Limit used: 5. Using Inode routines
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Matrix size: 13891 X 13891; storage space: 180715 unneeded,987325 used
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 73239
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [1] <mat> MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 13891) < 0.6. Do not use CompressedRow routines.
> 
> [1] <mat> MatSeqAIJCheckInode(): Found 4631 nodes of 13891. Limit used: 5. Using Inode routines
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <mat> MatAssemblyEnd_SeqAIJ(): Matrix size: 13892 X 1390; storage space: 72491 unneeded,34049 used
> 
> [0] <mat> MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 2472
> 
> [0] <mat> MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 40
> 
> [0] <mat> MatCheckCompressedRow(): Found the ratio (num_zerorows 12501)/(num_localrows 13892) > 0.6. Use CompressedRow routines.
> 
> Assemble Time : 174.079366sec
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Matrix size: 13891 X 1391; storage space: 72441 unneeded,34049 used
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 2469
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 41
> 
> [1] <mat> MatCheckCompressedRow(): Found the ratio (num_zerorows 12501)/(num_localrows 13891) > 0.6. Use CompressedRow routines.
> 
> Assemble Time : 174.141234sec
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Stash has 13891 entries, uses 8 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Matrix size: 13891 X 13891; storage space: 0 unneeded,987325 used
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [1] <mat> MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [1] <mat> MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 13891) < 0.6. Do not use CompressedRow routines.
> 
> [0] <pc> PCSetUp(): Setting up PC for first time
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <pc> PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <pc> PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <pc> PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
> 
> Solving Time : 5.085394sec
> 
> [0] <ksp> KSPConvergedDefault(): Linear solver has converged. Residual norm 1.258030470407e-17 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 2.579617304779e-03 at iteration 1
> 
> Solving Time : 5.089733sec
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Stash has 661 entries, uses 5 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> 
> [0] <mat> MatAssemblyBegin_MPIAIJ(): Stash has 460416 entries, uses 0 mallocs.
> 
> [1] <mat> MatAssemblyBegin_MPIAIJ(): Stash has 461184 entries, uses 0 mallocs.
> 
> Assemble Time : 5.242508sec
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> Assemble Time : 5.240863sec
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Stash has 13891 entries, uses 0 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> 
>  
>      TIME : 1.000000,     TIME_STEP : 1.000000,      ITER : 2,     RESIDUAL : 2.761615e-03
> 
>  
>      TIME : 1.000000,     TIME_STEP : 1.000000,      ITER : 2,     RESIDUAL : 2.761615e-03
> 
> [0] <pc> PCSetUp(): Setting up PC with same nonzero pattern
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <pc> PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
> 
> [0] <pc> PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
> 
> [0] <ksp> KSPConvergedDefault(): Linear solver has converged. Residual norm 1.539725065974e-19 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 8.015104666105e-06 at iteration 1
> 
> Solving Time : 4.662785sec
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> Solving Time : 4.664515sec
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Stash has 661 entries, uses 5 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> 
> [1] <mat> MatAssemblyBegin_MPIAIJ(): Stash has 461184 entries, uses 0 mallocs.
> 
> [0] <mat> MatAssemblyBegin_MPIAIJ(): Stash has 460416 entries, uses 0 mallocs.
> 
> Assemble Time : 5.238257sec
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> [1] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139620736897504 94891083133744
> 
> Assemble Time : 5.236535sec
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 139687279636960 94370403730224
> 
>  
>      TIME : 1.000000,     TIME_STEP : 1.000000,      ITER : 3,     RESIDUAL : 3.705062e-08
> 
>  TIME0 : 1.000000
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Stash has 13891 entries, uses 0 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> 
>  
>      TIME : 1.000000,     TIME_STEP : 1.000000,      ITER : 3,     RESIDUAL : 3.705062e-08
> 
>  TIME0 : 1.000000
> 
> [1] <sys> PetscFinalize(): PetscFinalize() called
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Stash has 661 entries, uses 5 mallocs.
> 
> [0] <vec> VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> 
> [0] <sys> PetscFinalize(): PetscFinalize() called
> 
> 
> On Tue, Dec 13, 2022 at 12:50 AM, Barry Smith <bsmith at petsc.dev> wrote:
>> 
>>    The problem is possibly that most elements are being computed on the "wrong" MPI rank, so almost all the matrix entries have to be "stashed" when computed and then sent to the owning MPI rank.  Please send ALL the output of a parallel run with -info so we can see how much communication is done in the matrix assembly.
>> 
>>   Barry
>> 
>> 
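
(For completeness: output like the -info log quoted earlier in this message is produced simply by adding the option to the run, e.g., with a hypothetical executable name,

    mpiexec -n 2 ./fem_solver -info

and in recent PETSc versions the output can be restricted to particular classes, e.g. -info :mat,vec; see the -info manual page.)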
>> > On Dec 12, 2022, at 6:16 AM, 김성익 <ksi2443 at gmail.com> wrote:
>> > 
>> > Hello,
>> > 
>> > 
>> > I need some keywords or examples for parallelizing the matrix assembly process.
>> > 
>> > My current state is as follows.
>> > - Finite element analysis code for structural mechanics.
>> > - Problem size: 3D solid hexahedral elements (125,000 elements), 397,953 degrees of freedom.
>> > - Matrix type: seqaij, preallocated with MatSeqAIJSetPreallocation.
>> > - Matrix assembly time using 1 core: 120 sec
>> >     for (int i = 0; i < 125000; i++) {
>> >       /* element matrix calculation */
>> >     }
>> >     MatAssemblyBegin
>> >     MatAssemblyEnd
>> > - Matrix assembly time using 8 cores: 70,234 sec
>> >     int start, end;
>> >     VecGetOwnershipRange(element_vec, &start, &end);
>> >     for (int i = start; i < end; i++) {
>> >       /* element matrix calculation */
>> >     }
>> >     MatAssemblyBegin
>> >     MatAssemblyEnd
>> > 
>> > 
>> > As you can see, the parallel case takes far longer than the sequential case.
>> > How can I speed this up?
>> > Can you point me to keywords or examples for parallelizing matrix assembly in finite element analysis?
>> > 
>> > Thanks,
>> > Hyung Kim
>> > 
>> 
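
A common shape for the parallel element loop, as a sketch only: it assumes the mesh has been partitioned so that each rank loops over elements whose degrees of freedom it mostly owns, and num_local_elements, idx, and Ke are illustrative names rather than identifiers from the original code (24 = 8 nodes x 3 dof per node for the hexahedral elements described above).

    PetscInt rstart, rend;
    MatGetOwnershipRange(A, &rstart, &rend);   /* global rows [rstart, rend) live on this rank */

    for (PetscInt e = 0; e < num_local_elements; e++) {
      PetscInt    idx[24];                     /* global dof indices of this element */
      PetscScalar Ke[24 * 24];                 /* element stiffness matrix, row-major */
      /* ... compute Ke and idx; ideally most entries of idx fall in [rstart, rend) ... */
      MatSetValues(A, 24, idx, 24, idx, Ke, ADD_VALUES);
    }
    /* assemble once, after the whole loop, not per element */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

With the entries generated mostly on the owning rank and the matrix preallocated as sketched earlier, the stash sizes and the "Number of mallocs during MatSetValues()" counts reported by -info should drop to (near) zero.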
