[petsc-users] Why my petsc program get stuck and hang there when assembling the matrix?

Lukas van de Wiel lukas.drinkt.thee at gmail.com
Mon Feb 27 09:06:09 CST 2017


Spreading the elements over the processors purely by element number is not
automatically a safe method; it depends on the mesh. Especially with
irregular meshes, such as those created by Triangle or Gmsh, such a
distribution will not reduce the amount of communication and may even
increase it.

There are mature and well-tested partitioning tools available that can
divide your mesh into regional partitions. We use Metis/ParMetis. I
believe PETSc uses PTScotch. This is an extra step, but it will reduce
the communication volume considerably.
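
On the PETSc side, the MatPartitioning interface wraps these packages. A
rough, untested sketch (assuming PETSc was configured with ParMETIS and that
the element-to-element adjacency ia/ja, in CSR form and built from shared
nodes or faces, already exists):

#include <petscmat.h>

/* Sketch: partition a distributed element graph with PETSc's MatPartitioning.
   nlocal elements live on this rank, nglobal in total; ia/ja is the local
   element-to-element adjacency in CSR form. */
PetscErrorCode PartitionElements(MPI_Comm comm, PetscInt nlocal, PetscInt nglobal,
                                 PetscInt *ia, PetscInt *ja, IS *part_is)
{
  Mat             adj;
  MatPartitioning part;
  PetscErrorCode  ierr;

  /* Wrap the CSR arrays in an MPIAdj matrix; PETSc frees ia/ja with the matrix. */
  ierr = MatCreateMPIAdj(comm, nlocal, nglobal, ia, ja, NULL, &adj);CHKERRQ(ierr);
  ierr = MatPartitioningCreate(comm, &part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part, adj);CHKERRQ(ierr);
  ierr = MatPartitioningSetType(part, MATPARTITIONINGPARMETIS);CHKERRQ(ierr); /* or MATPARTITIONINGPTSCOTCH */
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);
  /* The resulting IS gives, for each local element, the rank it is assigned to. */
  ierr = MatPartitioningApply(part, part_is);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);
  ierr = MatDestroy(&adj);CHKERRQ(ierr);
  return 0;
}

The node (degree-of-freedom) partition then follows from the element
partition, as Barry describes further down in the thread.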

Cheers
Lukas

On 2/27/17, Mark Adams <mfadams at lbl.gov> wrote:
> Another approach that might be simple, if you have the metadata for the
> entire mesh locally, is to build a list of the elements that your local
> matrix block-rows/vertices touch, by going over all the elements and
> testing whether any of their vertices i satisfy (i >= start && i < end);
> if so, append that element to the list. Then just compute and assemble
> those elements and tell PETSc to ignore off-processor entries. No
> communication, some redundant local work, some setup code and cost.
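
A rough, untested sketch of the approach described above (GlobalDof(),
ComputeElementMatrix(), num_elements, and the NEN/NDOF sizes are placeholders,
not names from PETSc or from the original code):

#include <petscmat.h>

#define NEN  8            /* nodes per element (assumption: 8-node brick)   */
#define NDOF 3            /* degrees of freedom per node (assumption)       */
#define EDIM (NEN*NDOF)

extern PetscInt GlobalDof(PetscInt e, PetscInt k);              /* hypothetical: FE connectivity lookup   */
extern void     ComputeElementMatrix(PetscInt e, PetscScalar*); /* hypothetical: element stiffness routine */

/* Each rank loops over ALL elements but assembles only those that touch one of
   its own rows, and tells PETSc to silently drop the remaining off-process
   entries instead of stashing and communicating them. */
PetscErrorCode AssembleLocalOnly(Mat A, PetscInt num_elements)
{
  PetscErrorCode ierr;
  PetscInt       rstart, rend, e, k, idx[EDIM];
  PetscScalar    Ke[EDIM*EDIM];
  PetscBool      mine;

  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  ierr = MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);
  for (e = 0; e < num_elements; e++) {
    mine = PETSC_FALSE;
    for (k = 0; k < EDIM; k++) {
      idx[k] = GlobalDof(e, k);
      if (idx[k] >= rstart && idx[k] < rend) mine = PETSC_TRUE;
    }
    if (!mine) continue;                                 /* element touches no local row: skip */
    ComputeElementMatrix(e, Ke);
    ierr = MatSetValues(A, EDIM, idx, EDIM, idx, Ke, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  return 0;
}

With MAT_IGNORE_OFF_PROC_ENTRIES set, entries whose rows are owned elsewhere
are simply dropped, so elements on a partition boundary are computed
redundantly on each neighboring rank but nothing is stashed or communicated
at assembly time.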
>
> On Sun, Feb 26, 2017 at 11:37 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
>
>> I got my finite element mesh from the commercial finite element software
>> ABAQUS. I simply draw the geometry of the model in the graphical interface
>> and assign element types and material properties to different parts of the
>> model, and ABAQUS automatically outputs the element and node information of
>> the model.
>>
>> Suppose I have 1000 elements in my model and 10 MPI processes,
>> #1 to #100 local element matrices will be computed in MPI process 0;
>> #101 to #200 local element matrices will be computed in MPI process 1;
>> #201 to #300 local element matrices will be computed in MPI process 2;
>> ..........
>> #901 to #1000 local element matrices will be computed in MPI process 9;
>>
>>
>> However, I might get a lot of global matrix indices whose entries need to
>> be sent to other processors, due to the degree-of-freedom ordering in the
>> finite element model.
>>
>> This is what I did according to my understanding of finite elements and
>> what I have seen. Do you have some nice libraries or packages that can
>> easily be used in a scientific computing environment?
>>
>> Thank you very much!
>>
>>
>>
>> Fangbo Wang
>>
>>
>>
>>
>> On Sun, Feb 26, 2017 at 11:15 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>>
>>> > On Feb 26, 2017, at 10:04 PM, Fangbo Wang <fangbowa at buffalo.edu>
>>> > wrote:
>>> >
>>> > My problem is a solid mechanics problem using the finite element method
>>> > to discretize the model (a 30 m x 30 m x 30 m soil domain with a building
>>> > structure on top).
>>> >
>>> > I am not manually deciding which MPI process computes which matrix
>>> > entries, because I know PETSc can automatically communicate between these
>>> > processes.
>>> > I am just having each MPI process generate a certain number of matrix
>>> > entries regardless of which process will finally store them.
>>>
>>>   The standard way to handle this for finite elements is to partition
>>> the
>>> elements among the processes and then partition the nodes (rows of the
>>> degrees of freedom) subservient to the partitioning of the elements.
>>> Otherwise most of the matrix (or vector) entries must be communicated
>>> and
>>> this is not scalable.
>>>
>>>    So how are you partitioning the elements (for matrix stiffness
>>> computations) and the nodes between processes?
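
A minimal sketch of the PETSc side of this: once the nodes/degrees of freedom
are partitioned, the local row count from that partition is what you pass as
the matrix's local sizes, together with a preallocation estimate, so almost
every MatSetValues() call lands in locally owned rows (nlocal_dofs and the
nonzero counts below are assumptions; the -info output later in the thread
reports at most 81 nonzeros per row):

Mat            A;
PetscErrorCode ierr;

/* nlocal_dofs: number of degrees of freedom (rows) owned by this rank, i.e.
   the dofs of the nodes assigned to it by the partitioner (assumed known). */
ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
ierr = MatSetSizes(A, nlocal_dofs, nlocal_dofs, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
/* Rough preallocation: 81 nonzeros per row on the diagonal block; the
   off-diagonal estimate of 20 is a guess for partition-boundary rows. */
ierr = MatMPIAIJSetPreallocation(A, 81, NULL, 20, NULL);CHKERRQ(ierr);
ierr = MatSeqAIJSetPreallocation(A, 81, NULL);CHKERRQ(ierr);

With the row ownership aligned to the node partition, the stash traffic
reported by MatStashScatterBegin_Ref() should shrink to just the
partition-boundary contributions.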
>>> >
>>> > Actually, I constructed another matrix of the same size but generated
>>> > many fewer entries, and the code worked. However, it gets stuck when I
>>> > generate more matrix entries.
>>> >
>>> > Thank you very much! Any suggestion is highly appreciated.
>>> >
>>> > BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the
>>> > ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use
>>> > CompressedRow routines."? I know the compressed row format is commonly
>>> > used for sparse matrices, so why aren't the compressed row routines used
>>> > here?
>>>
>>>   This is not important.
>>>
>>> >
>>> >
>>> > Thanks,
>>> >
>>> >
>>> > Fangbo Wang
>>> >
>>> >
>>> >
>>> > On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> >
>>> >   How are you generating the matrix entries in parallel? In general you
>>> > can generate any matrix entries on any MPI process and they will be
>>> > automatically transferred to the MPI process that owns them. BUT if a
>>> > huge number of matrix entries are computed on one process and need to be
>>> > communicated to another process, this may cause gridlock with MPI. Based
>>> > on the huge size of the messages from process 12, it looks like this is
>>> > what is happening in your code.
>>> >
>>> >  Ideally most matrix entries are generated on the process where they are
>>> > stored, and hence this gridlock does not happen.
>>> >
>>> > What type of discretization are you using? Finite differences, finite
>>> > element, finite volume, spectral, something else? How are you deciding
>>> > which MPI process should compute which matrix entries? Once we understand
>>> > this we may be able to suggest a better way to compute the entries.
>>> >
>>> >   Barry
>>> >
>>> > Under normal circumstances 1.3 million unknowns is not a large parallel
>>> > matrix; there may be special features of your matrix that are making this
>>> > difficult.
>>> >
>>> >
>>> >
>>> > > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
>>> > >
>>> > > Hi,
>>> > >
>>> > > I constructed a big matrix, 1.3 million by 1.3 million, which uses
>>> > > approximately 100 GB of memory. I have a computer with 500 GB of memory.
>>> > >
>>> > > I run the PETSc program and it gets stuck when finally assembling the
>>> > > matrix. The program is using only around 200 GB of memory; however, it
>>> > > just gets stuck there. Here is the output message when it gets stuck.
>>> > > .
>>> > > .
>>> > > previous outputs not shown here
>>> > > .
>>> > > [12] MatStashScatterBegin_Ref(): No of messages: 15
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes
>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes
>>> > > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.
>>> > > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.
>>> > > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.
>>> > > [13] MatStashScatterBegin_Ref(): No of messages: 15
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes
>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes
>>> > > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.
>>> > > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used
>>> > > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> > > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
>>> > > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
>>> > > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines
>>> > > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used
>>> > > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> > > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
>>> > > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
>>> > > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines
>>> > >
>>> > > stuck here!!!!
>>> > >
>>> > >
>>> > > Anyone have ideas on this? Thank you very much!
>>> > >
>>> > >
>>> > >
>>> > > Fangbo Wang
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Fangbo Wang, PhD student
>>> > > Stochastic Geomechanics Research Group
>>> > > Department of Civil, Structural and Environmental Engineering
>>> > > University at Buffalo
>>> > > Email: fangbowa at buffalo.edu
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Fangbo Wang, PhD student
>>> > Stochastic Geomechanics Research Group
>>> > Department of Civil, Structural and Environmental Engineering
>>> > University at Buffalo
>>> > Email: fangbowa at buffalo.edu
>>>
>>>
>>
>>
>> --
>> Fangbo Wang, PhD student
>> Stochastic Geomechanics Research Group
>> Department of Civil, Structural and Environmental Engineering
>> University at Buffalo
>> Email: fangbowa at buffalo.edu
>>
>

