[petsc-users] Why does my PETSc program get stuck and hang when assembling the matrix?

Fangbo Wang fangbowa at buffalo.edu
Sun Feb 26 22:04:46 CST 2017


My problem is a solid mechanics problem using the finite element method to
discretize the model (a 30 m x 30 m x 30 m soil domain with a building
structure on top).

I am not manually deciding which MPI process computes which matrix entries,
because I know PETSc can automatically communicate the entries between
processes. I am just asking each MPI process to generate a certain number of
matrix entries regardless of which process will finally store them.
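
Here is a minimal sketch of the pattern I am using (not my actual element
assembly; the 1.3 million global size and the dummy row/values are just
placeholders):

    #include <petscmat.h>

    int main(int argc,char **argv)
    {
      Mat            A;
      PetscErrorCode ierr;
      PetscInt       row = 0, cols[3] = {0,1,2};
      PetscScalar    vals[3] = {1.0,1.0,1.0};

      ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,1300000,1300000);CHKERRQ(ierr);
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);
      ierr = MatSetUp(A);CHKERRQ(ierr);

      /* every rank inserts its entries without checking row ownership;
         entries for rows owned by other ranks go into the PETSc stash */
      ierr = MatSetValues(A,1,&row,3,cols,vals,ADD_VALUES);CHKERRQ(ierr);

      /* the stashed off-process entries are shipped to the owning ranks here */
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }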

Actually, I constructed another matrix of the same size but generated far
fewer entries, and the code worked. However, it gets stuck when I generate
more matrix entries.

Thank you very much! Any suggestion is highly appreciated.

BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the ratio
(num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow
routines."? I know the compressed row format is commonly used for sparse
matrices, so why doesn't it use the compressed row routines here?


Thanks,


Fangbo Wang



On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   How are you generating the matrix entries in parallel? In general you
> can generate any matrix entries on any MPI process and they will be
> automatically transferred to the MPI process that owns them. BUT if a huge
> number of matrix entries are computed on one process and need to be
> communicated to another process, this may cause gridlock with MPI. Based on
> the huge size of the messages from process 12, it looks like this is what
> is happening in your code.
>
>  Ideally most matrix entries are generated on the process where they are
> stored, and hence this gridlock does not happen.
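>
>  A sketch of what that looks like (assuming the Mat A has already been
> created and sized; the diagonal entry is only a placeholder, in a finite
> element code you would instead partition the elements so that most rows an
> element touches are owned by the inserting process):
>
>     PetscErrorCode ierr;
>     PetscInt       rstart,rend,i,col;
>     PetscScalar    v;
>
>     ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
>     for (i = rstart; i < rend; i++) {   /* only the rows this process owns */
>       col = i;  v = 1.0;                /* placeholder entry */
>       ierr = MatSetValues(A,1,&i,1,&col,&v,INSERT_VALUES);CHKERRQ(ierr);
>     }
>     /* little or nothing needs to be stashed and communicated now */
>     ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>     ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);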
>
> What type of discretization are you using? Finite differences, finite
> element, finite volume, spectral, something else? How are you deciding
> which MPI process should compute which matrix entries? Once we understand
> this we may be able to suggest a better way to compute the entries.
>
>   Barry
>
> Under normal circumstances 1.3 million unknowns is not a large parallel
> matrix; there may be special features of your matrix that are making this
> difficult.
>
>
>
> > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
> >
> > Hi,
> >
> > I construct a big matrix which is 1.3 million by 1.3 million and uses
> > approximately 100 GB of memory. I have a computer with 500 GB of memory.
> >
> > I run the PETSc program and it gets stuck when finally assembling the
> > matrix. The program is using only around 200 GB of memory. However, the
> > program just gets stuck there. Here is the output message when it gets
> > stuck.
> > .
> > .
> > previous outputs not shown here
> > .
> > [12] MatStashScatterBegin_Ref(): No of messages: 15
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes
> > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes
> > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.
> > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.
> > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.
> > [13] MatStashScatterBegin_Ref(): No of messages: 15
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes
> > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes
> > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.
> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used
> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines
> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used
> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines
> >
> > stuck here!!!!
> >
> >
> > Any one have ideas on this? Thank you very much!
> >
> >
> >
> > Fangbo Wang
> >
> >
> >
> > --
> > Fangbo Wang, PhD student
> > Stochastic Geomechanics Research Group
> > Department of Civil, Structural and Environmental Engineering
> > University at Buffalo
> > Email: fangbowa at buffalo.edu
>
>


-- 
Fangbo Wang, PhD student
Stochastic Geomechanics Research Group
Department of Civil, Structural and Environmental Engineering
University at Buffalo
Email: fangbowa at buffalo.edu