[petsc-users] Why does my PETSc program get stuck and hang when assembling the matrix?
Barry Smith
bsmith at mcs.anl.gov
Sun Feb 26 22:15:58 CST 2017
> On Feb 26, 2017, at 10:04 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
>
> My problem is a solid mechanics problem using the finite element method to discretize the model (a 30m x 30m x 30m soil domain with a building structure on top).
>
> I am not manually deciding which MPI process computes which matrix entries, because I know PETSc can automatically communicate between the processes.
> I just ask each MPI process to generate a certain number of matrix entries, regardless of which process will finally store them.
The standard way to handle this for finite elements is to partition the elements among the processes and then partition the nodes (rows of the degrees of freedom) subservient to the partitioning of the elements. Otherwise most of the matrix (or vector) entries must be communicated and this is not scalable.
So how are you partitioning the elements (for matrix stiffness computations) and the nodes between processes?
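
For concreteness, here is a minimal sketch of that element-partitioned pattern, assuming a toy 1D chain of linear elements with one degree of freedom per node; the names (Nel, Ke) are illustrative and error checking is omitted. Each rank assembles only its own elements, so nearly every entry lands in a locally owned row and only the few entries at partition boundaries go through the stash:

/* Assemble a 1D chain of Nel linear elements in parallel.
   The elements are partitioned among the ranks; each rank computes
   only the element stiffness matrices for the elements it owns. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A;
  PetscInt    Nel = 1000, N, e, estart, eend, rstart, rend;
  PetscMPIInt rank, size;

  PetscInitialize(&argc, &argv, NULL, NULL);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  MPI_Comm_size(PETSC_COMM_WORLD, &size);

  N = Nel + 1;                                    /* one dof per node */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
  MatSetFromOptions(A);
  MatMPIAIJSetPreallocation(A, 3, NULL, 2, NULL); /* tridiagonal here */
  MatSeqAIJSetPreallocation(A, 3, NULL);          /* uniprocessor case */

  /* Block-partition the ELEMENTS among the ranks. */
  estart = rank * (Nel / size) + PetscMin(rank, Nel % size);
  eend   = estart + Nel / size + (rank < Nel % size ? 1 : 0);

  for (e = estart; e < eend; e++) {
    PetscInt    idx[2] = {e, e + 1};              /* the element's two nodes */
    PetscScalar Ke[4]  = {1.0, -1.0, -1.0, 1.0};  /* toy element stiffness */
    /* Rows owned by a neighboring rank occur only at partition
       boundaries, so the stash traffic stays tiny. */
    MatSetValues(A, 2, idx, 2, idx, Ke, ADD_VALUES);
  }

  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatGetOwnershipRange(A, &rstart, &rend);
  PetscPrintf(PETSC_COMM_SELF, "[%d] owns rows %d..%d\n", rank, (int)rstart, (int)(rend - 1));

  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

Run it with -info and you should see that each process's stash stays small, in contrast to the hundreds of megabytes per message in your log below.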
>
> Actually, I constructed another matrix of the same size but with far fewer entries, and the code worked. However, it gets stuck when I generate more matrix entries.
>
> Thank you very much! Any suggestion is highly appreciated.
>
> BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines."? I know the compressed row format is commonly used for sparse matrices, so why not use the compressed row routines here?
This is not important.
>
>
> Thanks,
>
>
> Fangbo Wang
>
>
>
> On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> How are you generating the matrix entries in parallel? In general you can generate any matrix entries on any MPI process and they will be automatically transferred to the MPI process that owns them. BUT if a huge number of matrix entries are computed on one process and need to be communicated to another process, this may cause gridlock with MPI. Based on the huge size of the messages from process 12, it looks like this is what is happening in your code.
>
> Ideally most matrix entries are generated on the process where they are stored, and hence this gridlock does not happen.
>
> What type of discretization are you using? Finite difference, finite element, finite volume, spectral, something else? How are you deciding which MPI process should compute which matrix entries? Once we understand this, we may be able to suggest a better way to compute the entries.
>
> Barry
>
> Under normal circumstances 1.3 million unknowns is not a large parallel matrix; there may be special features of your matrix that are making this difficult.
>
>
>
> > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
> >
> > Hi,
> >
> > I construct a big matrix, 1.3 million by 1.3 million, which uses approximately 100 GB of memory. I have a computer with 500 GB of memory.
> >
> > I run the PETSc program and it gets stuck when finally assembling the matrix. The program is using only around 200 GB of memory, yet it just hangs there. Here is the output when it gets stuck.
> > [... previous output not shown ...]
> > [12] MatStashScatterBegin_Ref(): No of messages: 15
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes
> > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.
> > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.
> > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.
> > [13] MatStashScatterBegin_Ref(): No of messages: 15
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes
> > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.
> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used
> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines
> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used
> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines
> >
> > stuck here!!!!
> >
> >
> > Anyone have ideas on this? Thank you very much!
> >
> >
> >
> > Fangbo Wang
> >
> >
> >
> > --
> > Fangbo Wang, PhD student
> > Stochastic Geomechanics Research Group
> > Department of Civil, Structural and Environmental Engineering
> > University at Buffalo
> > Email: fangbowa at buffalo.edu
>
>
>
>
> --
> Fangbo Wang, PhD student
> Stochastic Geomechanics Research Group
> Department of Civil, Structural and Environmental Engineering
> University at Buffalo
> Email: fangbowa at buffalo.edu