[petsc-users] Why my petsc program get stuck and hang there when assembling the matrix?

Mark Adams mfadams at lbl.gov
Mon Feb 27 08:37:50 CST 2017


Another approach that might be simple, if you have the metadata for the
entire mesh locally, is to set up a list of the elements that your local
matrix block-rows/vertices touch: go over all the elements and test whether
any of an element's vertices i satisfy i >= start && i < end; if so, append
that element to the list. Then compute and assemble only those elements and
tell PETSc to ignore off-processor entries (MAT_IGNORE_OFF_PROC_ENTRIES).
No communication, some redundant local work, and a little setup code and
cost.
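
A rough sketch of that idea in C (error checking omitted; the connectivity
array, the 3-dof-per-node row numbering, and ComputeElementMatrix() are
illustrative assumptions, not part of your code or of the PETSc API):

#include <petscmat.h>

/* user-supplied (hypothetical): fills the element's global dof indices idx[]
   and its dense stiffness block Ke[]                                        */
extern void ComputeElementMatrix(PetscInt e, const PetscInt *elem_nodes,
                                 PetscInt *idx, PetscScalar *Ke);

/* Assemble only the elements that touch rows this process owns and let PETSc
   drop everything else.  Assumes elem_nodes[e*nen + k] is the global node
   number of vertex k of element e, and that each node owns 3 contiguous rows
   3*node .. 3*node + 2, all living on the same process.                     */
void AssembleLocalElements(Mat A, PetscInt nelems, PetscInt nen,
                           const PetscInt *elem_nodes)
{
  PetscInt     rstart, rend, ndof = 3 * nen;
  PetscInt    *idx;
  PetscScalar *Ke;

  MatGetOwnershipRange(A, &rstart, &rend);
  /* entries destined for rows owned by other processes are silently dropped;
     those processes recompute the same elements, so nothing is stashed      */
  MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);

  PetscMalloc2(ndof, &idx, ndof * ndof, &Ke);
  for (PetscInt e = 0; e < nelems; e++) {
    PetscBool touches = PETSC_FALSE;
    for (PetscInt k = 0; k < nen && !touches; k++) {
      PetscInt row = 3 * elem_nodes[e * nen + k]; /* first dof of vertex k */
      if (row >= rstart && row < rend) touches = PETSC_TRUE;
    }
    if (!touches) continue;               /* none of its rows are ours */
    ComputeElementMatrix(e, elem_nodes, idx, Ke);
    MatSetValues(A, ndof, idx, ndof, idx, Ke, ADD_VALUES);
  }
  PetscFree2(idx, Ke);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
}

Elements whose vertices straddle a process boundary get computed by every
process that owns some of their rows, which is the redundant local work
mentioned above.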

On Sun, Feb 26, 2017 at 11:37 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:

> I got my finite element mesh from the commercial finite element software
> ABAQUS. I simply draw the geometry of the model in the graphical interface
> and assign element types and material properties to the different parts of
> the model; ABAQUS then automatically outputs the element and node
> information of the model.
>
> Suppose I have 1000 elements in my model and 10 MPI processes:
> #1 to #100 local element matrices will be computed in MPI process 0;
> #101 to #200 local element matrices will be computed in MPI process 1;
> #201 to #300 local element matrices will be computed in MPI process 2;
> ..........
> #901 to #1000 local element matrices will be computed in MPI process 9;
>
>
> However, many of the global matrix indices I generate belong to rows owned
> by other processors, due to the degree-of-freedom ordering in the finite
> element model.
>
> This is what I did according to my understanding of finite elements and
> what I have seen.
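>
> For concreteness, a minimal sketch of this assembly pattern (the element
> ranges and the hypothetical ElementStiffness() routine are illustrative;
> K, idx, Ke, ndof and rank are assumed to be set up elsewhere):
>
>   /* each of the 10 ranks computes a contiguous block of 100 elements */
>   PetscInt estart = 100 * rank, eend = 100 * (rank + 1);
>   for (PetscInt e = estart; e < eend; e++) {
>     ElementStiffness(e, idx, Ke);  /* global dof indices + dense block */
>     /* the rows in idx[] follow the mesh numbering, so many of them belong
>        to other processes; PETSc stashes those entries until assembly    */
>     MatSetValues(K, ndof, idx, ndof, idx, Ke, ADD_VALUES);
>   }
>   MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY); /* stashed entries are sent here */
>   MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY);
>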
> Do you have some nice libraries or packages that can be easily used in a
> scientific computing environment?
>
> Thank you very much!
>
>
>
> Fangbo Wang
>
>
>
>
> On Sun, Feb 26, 2017 at 11:15 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>> > On Feb 26, 2017, at 10:04 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
>> >
>> > My problem is a solid mechanics problem using the finite element method
>> to discretize the model (a 30m x 30m x 30m soil domain with a building
>> structure on top).
>> >
>> > I am not manually deciding which MPI process computes which matrix
>> entries, because I know PETSc can automatically communicate between these
>> processors.
>> > I am just asking each MPI process to generate a certain number of matrix
>> entries regardless of which process will finally store them.
>>
>>   The standard way to handle this for finite elements is to partition the
>> elements among the processes and then partition the nodes (rows of the
>> degrees of freedom) subservient to the partitioning of the elements.
>> Otherwise most of the matrix (or vector) entries must be communicated and
>> this is not scalable.
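>>
>> For example, a rough sketch of that idea (the nnodes/nelems sizes, the
>> elem_owner/elem_nodes/node_owner arrays, the 3 dofs per node, and the dof
>> renumbering step are illustrative assumptions, not part of PETSc):
>>
>>   /* one simple rule: give each node to the lowest rank among the
>>      processes that own an element touching it                        */
>>   for (PetscInt n = 0; n < nnodes; n++) node_owner[n] = -1;
>>   for (PetscInt e = 0; e < nelems; e++)
>>     for (PetscInt k = 0; k < nen; k++) {
>>       PetscInt n = elem_nodes[e * nen + k];
>>       if (node_owner[n] < 0 || elem_owner[e] < node_owner[n])
>>         node_owner[n] = elem_owner[e];
>>     }
>>
>>   PetscInt nlocal_nodes = 0;
>>   for (PetscInt n = 0; n < nnodes; n++)
>>     if (node_owner[n] == rank) nlocal_nodes++;
>>
>>   /* renumber the dofs so each process's nodes are contiguous (not shown),
>>      then size the local row block of the matrix to match               */
>>   MatCreate(PETSC_COMM_WORLD, &A);
>>   MatSetSizes(A, 3 * nlocal_nodes, 3 * nlocal_nodes,
>>               PETSC_DETERMINE, PETSC_DETERMINE);
>>   MatSetType(A, MATMPIAIJ);
>>   MatSetUp(A);
>>
>> With the dofs numbered this way almost all MatSetValues() calls hit locally
>> owned rows and very little needs to be stashed and communicated.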
>>
>>    So how are you partitioning the elements (for matrix stiffness
>> computations) and the nodes between processes?
>> >
>> > Actually, I constructed another matrix with the same size but generating
>> many fewer entries, and the code worked. However, it gets stuck when I
>> generate more matrix entries.
>> >
>> > Thank you very much! Any suggestion is highly appreciated.
>> >
>> > BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the
>> ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use
>> CompressedRow routines."? I know the compressed row format is commonly used
>> for sparse matrices, so why aren't compressed row routines used here?
>>
>>   This is not important.
>>
>> >
>> >
>> > Thanks,
>> >
>> >
>> > Fangbo Wang
>> >
>> >
>> >
>> > On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> >
>> >   How are you generating the matrix entries in parallel? In general you
>> can generate any matrix entries on any MPI process and they will be
>> automatically transferred to the MPI process that owns them. BUT if a huge
>> number of matrix entries are computed on one process and need to be
>> communicated to another process, this may cause gridlock with MPI. Based on
>> the huge size of the messages from process 12, it looks like this is what
>> is happening in your code.
>> >
>> >  Ideally most matrix entries are generated on the process where they are
>> stored, and hence this gridlock does not happen.
>> >
>> > What type of discretization are you using? Finite differences, finite
>> element, finite volume, spectral, something else? How are you deciding
>> which MPI process should compute which matrix entries? Once we understand
>> this we may be able to suggest a better way to compute the entries.
>> >
>> >   Barry
>> >
>> > Under normal circumstances 1.3 million unknowns is not a large
>> parallel matrix; there may be special features of your matrix that are
>> making this difficult.
>> >
>> >
>> >
>> > > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <fangbowa at buffalo.edu>
>> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I construct a big matrix which is 1.3 million by 1.3 million and uses
>> approximately 100 GB of memory. I have a computer with 500 GB of memory.
>> > >
>> > > I run the PETSc program and it gets stuck when finally assembling the
>> matrix. The program is using only around 200 GB of memory. However, the
>> program just gets stuck there. Here is the output message when it gets stuck.
>> > > .
>> > > .
>> > > previous outputs not shown here
>> > > .
>> > > [12] MatStashScatterBegin_Ref(): No of messages: 15
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes
>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes
>> > > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14
>> mallocs.
>> > > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14
>> mallocs.
>> > > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14
>> mallocs.
>> > > [13] MatStashScatterBegin_Ref(): No of messages: 15
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes
>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes
>> > > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14
>> mallocs.
>> > > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage
>> space: 89786788 unneeded,7025212 used
>> > > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>> is 0
>> > > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
>> > > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
>> > > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5.
>> Using Inode routines
>> > > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage
>> space: 89841924 unneeded,6970076 used
>> > > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>> is 0
>> > > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
>> > > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
>> > > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5.
>> Using Inode routines
>> > >
>> > > stuck here!!!!
>> > >
>> > >
>> > > Anyone have ideas on this? Thank you very much!
>> > >
>> > >
>> > >
>> > > Fangbo Wang
>> > >
>> > >
>> > >
>> > > --
>> > > Fangbo Wang, PhD student
>> > > Stochastic Geomechanics Research Group
>> > > Department of Civil, Structural and Environmental Engineering
>> > > University at Buffalo
>> > > Email: fangbowa at buffalo.edu
>> >
>> >
>> >
>> >
>> > --
>> > Fangbo Wang, PhD student
>> > Stochastic Geomechanics Research Group
>> > Department of Civil, Structural and Environmental Engineering
>> > University at Buffalo
>> > Email: fangbowa at buffalo.edu
>>
>>
>
>
> --
> Fangbo Wang, PhD student
> Stochastic Geomechanics Research Group
> Department of Civil, Structural and Environmental Engineering
> University at Buffalo
> Email: fangbowa at buffalo.edu
>