[petsc-users] Why my petsc program get stuck and hang there when assembling the matrix?

Barry Smith bsmith at mcs.anl.gov
Sun Feb 26 21:42:02 CST 2017


  How are you generating the matrix entries in parallel? In general you can generate any matrix entry on any MPI process and it will be transferred automatically to the MPI process that owns it. BUT if a huge number of matrix entries are computed on one process and need to be communicated to another process, this may cause gridlock in MPI. Based on the huge size of the messages from process 12, it looks like this is what is happening in your code.
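
To see how much data each process is queuing for the others, the Mesg_to sizes in the log can be summed per sender. This is only a rough sketch, assuming the log lines have exactly the MatStashScatterBegin_Ref format quoted below; the function name stash_bytes_by_sender is made up for illustration and is not part of PETSc.

```python
import re
from collections import defaultdict

# Matches log lines such as:
#   [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
# A leading "> " quote marker is tolerated because search() is used.
LINE_RE = re.compile(
    r"\[(\d+)\]\s+MatStashScatterBegin_Ref\(\):\s+"
    r"Mesg_to:\s+(\d+):\s+size:\s+(\d+)\s+bytes"
)


def stash_bytes_by_sender(log_text):
    """Return {sending_rank: total bytes queued for other ranks}.

    Hypothetical helper: it only parses text, it does not call PETSc.
    """
    totals = defaultdict(int)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            sender, _dest, size = (int(g) for g in match.groups())
            totals[sender] += size
    return dict(totals)
```

Applied to the fifteen messages from process 12 quoted below, the sizes add up to roughly 4.66e9 bytes leaving that one process during assembly.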

 Ideally most matrix entries are generated on the process where they are stored, and hence this gridlock does not happen.
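
One way to check this is to compare the rows a process generates against the rows it owns. In a real PETSc code the owned range comes from MatGetOwnershipRange(); the sketch below does not call PETSc at all and instead assumes the default contiguous split (the first N % size processes get one extra row). The helper names are made up for illustration.

```python
def ownership_ranges(n_rows, n_ranks):
    """Contiguous [start, end) row ranges, one per rank.

    Assumption: this mimics the default split PETSc uses when local
    sizes are left to PETSC_DECIDE; a code that sets local sizes
    explicitly will have a different layout.
    """
    ranges = []
    start = 0
    for rank in range(n_ranks):
        local = n_rows // n_ranks + (1 if rank < n_rows % n_ranks else 0)
        ranges.append((start, start + local))
        start += local
    return ranges


def off_process_fraction(rows_generated, rank, ranges):
    """Fraction of the generated row indices that this rank does not own."""
    if not rows_generated:
        return 0.0
    lo, hi = ranges[rank]
    off = sum(1 for row in rows_generated if not (lo <= row < hi))
    return off / len(rows_generated)
```

If this fraction is close to 1 on some process, nearly everything it computes has to be stashed and shipped elsewhere, which is the situation to avoid.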

What type of discretization are you using? Finite differences, finite element, finite volume, spectral, something else? How are you deciding which MPI process should compute which matrix entries? Once we understand this we may be able to suggest a better way to compute the entries.

  Barry

Under normal circumstances 1.3 million unknowns is not a large parallel matrix; there may be special features of your matrix that are making this difficult.



> On Feb 26, 2017, at 9:30 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
> 
> Hi, 
> 
> I construct a big matrix, 1.3 million by 1.3 million, which uses approximately 100GB of memory. I have a computer with 500GB of memory.
> 
> I run the PETSc program and it gets stuck when finally assembling the matrix. The program is using only around 200GB of memory. However, the program just gets stuck there. Here is the output message when it gets stuck.
> .
> .
> previous outputs not shown here
> .
> [12] MatStashScatterBegin_Ref(): No of messages: 15 
> [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes
> [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes
> [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.
> [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.
> [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.
> [13] MatStashScatterBegin_Ref(): No of messages: 15 
> [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes
> [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes
> [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.
> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used
> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines
> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used
> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines
> 
> stuck here!!!!
> 
> 
> Does anyone have ideas on this? Thank you very much!
> 
> 
> 
> Fangbo Wang
> 
> 
> 
> -- 
> Fangbo Wang, PhD student
> Stochastic Geomechanics Research Group
> Department of Civil, Structural and Environmental Engineering
> University at Buffalo
> Email: fangbowa at buffalo.edu


