[petsc-users] Why my petsc program get stuck and hang there when assembling the matrix?

Matthew Knepley knepley at gmail.com
Mon Feb 27 09:10:40 CST 2017


On Mon, Feb 27, 2017 at 9:06 AM, Lukas van de Wiel <lukas.drinkt.thee at gmail.com> wrote:

> Spreading the elements over the processors by sheer number is not
> automatically a safe method; it depends on the mesh. Especially with
> irregular meshes, such as those created by Triangle or Gmsh, such a
> distribution will not reduce the amount of communication, and may even
> increase it.
>
> There are mature and well-tested partitioning tools available that can
> divide your mesh into regional partitions. We use Metis/ParMetis. I
> believe PETSc uses PTScotch.


We have interfaces to Chaco, Metis, ParMetis, Party, and PTScotch.
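
For example, a minimal sketch of driving one of these partitioners through
the MatPartitioning interface (assuming the dual graph of your mesh is
already stored in a MATMPIADJ matrix adj; error checking omitted):

   MatPartitioning part;
   IS              is;   /* new owner rank for each local graph vertex */

   MatPartitioningCreate(PETSC_COMM_WORLD, &part);
   MatPartitioningSetAdjacency(part, adj);
   MatPartitioningSetType(part, MATPARTITIONINGPARMETIS); /* or PTScotch, Chaco, ... */
   MatPartitioningApply(part, &is);
   /* redistribute elements according to 'is', then clean up */
   ISDestroy(&is);
   MatPartitioningDestroy(&part);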

   Matt


> This is an extra step, but it will reduce
> the communication volume considerably.
>
> Cheers
> Lukas
>
> On 2/27/17, Mark Adams <mfadams at lbl.gov> wrote:
> > Another approach that might be simple, if you have the metadata for the
> > entire mesh locally: set up a list of the elements that your local matrix
> > block-rows/vertices touch by going over all the elements and testing
> > whether any of their vertices i satisfy (i >= start && i < end); if so,
> > append the element to the list. Then compute and assemble just those
> > elements and tell PETSc to ignore off-processor entries. No
> > communication, at the cost of some redundant local work and some setup
> > code, as sketched below.
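> >
> > A rough sketch of that approach (dof_of() and compute_element() are
> > placeholders for your own FEM code, not PETSc calls):
> >
> >   PetscInt rstart, rend;
> >   MatGetOwnershipRange(A, &rstart, &rend);
> >   MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE); /* drop non-local entries */
> >   for (PetscInt e = 0; e < nelem; e++) {     /* every rank loops over ALL elements */
> >     PetscBool mine = PETSC_FALSE;
> >     for (PetscInt v = 0; v < nv; v++) {
> >       PetscInt i = dof_of(e, v);             /* global dof of vertex v of element e */
> >       if (i >= rstart && i < rend) { mine = PETSC_TRUE; break; }
> >     }
> >     if (!mine) continue;                     /* element touches no local rows: skip */
> >     compute_element(e, idx, vals);           /* element indices and stiffness values */
> >     MatSetValues(A, nv, idx, nv, idx, vals, ADD_VALUES);
> >   }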
> >
> > On Sun, Feb 26, 2017 at 11:37 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
> >
> >> I got my finite element mesh from the commercial finite element software
> >> ABAQUS. I simply draw the geometry of the model in the graphical
> >> interface and assign element types and material properties to different
> >> parts of the model; ABAQUS then automatically outputs the element and
> >> node information of the model.
> >>
> >> Suppose I have 1000 elements in my model and 10 MPI processes:
> >> element matrices #1 to #100 will be computed on MPI process 0;
> >> #101 to #200 will be computed on MPI process 1;
> >> #201 to #300 will be computed on MPI process 2;
> >> ..........
> >> #901 to #1000 will be computed on MPI process 9.
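> >>
> >> In code, this by-number split is just (a sketch, assuming the number of
> >> processes divides nelem evenly):
> >>
> >>   PetscMPIInt rank, size;
> >>   MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
> >>   MPI_Comm_size(PETSC_COMM_WORLD, &size);
> >>   PetscInt nloc   = nelem / size;   /* here 1000/10 = 100 elements each */
> >>   PetscInt efirst = rank * nloc;    /* first element computed on this rank */
> >>   PetscInt elast  = efirst + nloc;  /* one past the last local element */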
> >>
> >>
> >> However, I might get a lot of global matrix indices that I need to send
> >> to other processors, due to the degree-of-freedom ordering in the finite
> >> element model.
> >>
> >> This is what I did according to my understanding of finite elements and
> >> what I have seen.
> >> Do you have some nice libraries or packages that can be easily used in a
> >> scientific computing environment?
> >>
> >> Thank you very much!
> >>
> >>
> >>
> >> Fangbo Wang
> >>
> >>
> >>
> >>
> >> On Sun, Feb 26, 2017 at 11:15 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >>
> >>>
> >>> > On Feb 26, 2017, at 10:04 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
> >>> >
> >>> > My problem is a solid mechanics problem using the finite element
> >>> > method to discretize the model (a 30m x 30m x 30m soil domain with a
> >>> > building structure on top).
> >>> >
> >>> > I am not manually deciding which MPI process computes which matrix
> >>> > entries, because I know PETSc can automatically communicate between
> >>> > these processes. I am just asking each MPI process to generate a
> >>> > certain number of matrix entries, regardless of which process will
> >>> > finally store them.
> >>>
> >>>   The standard way to handle this for finite elements is to partition
> >>> the elements among the processes and then partition the nodes (rows of
> >>> the degrees of freedom) subservient to the partitioning of the
> >>> elements. Otherwise most of the matrix (or vector) entries must be
> >>> communicated, and this is not scalable.
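> >>>
> >>> A minimal sketch of one such subservient node partitioning (owner[],
> >>> elem_rank[], and nodes_of() are illustrative placeholders, not PETSc
> >>> calls):
> >>>
> >>>   for (PetscInt n = 0; n < nnodes; n++) owner[n] = -1;
> >>>   for (PetscInt e = 0; e < nelem; e++) {
> >>>     for (PetscInt v = 0; v < nv; v++) {
> >>>       PetscInt n = nodes_of(e, v);   /* global node v of element e */
> >>>       /* give each node to the lowest rank among the elements touching it */
> >>>       if (owner[n] < 0 || elem_rank[e] < owner[n]) owner[n] = elem_rank[e];
> >>>     }
> >>>   }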
> >>>
> >>>    So how are you partitioning the elements (for matrix stiffness
> >>> computations) and the nodes between processes?
> >>> >
> >>> > Actually, I constructed another matrix of the same size but with far
> >>> > fewer entries, and the code worked. However, it gets stuck when I
> >>> > generate more matrix entries.
> >>> >
> >>> > Thank you very much! Any suggestion is highly appreciated.
> >>> >
> >>> > BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the
> >>> > ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use
> >>> > CompressedRow routines."? I know the compressed row format is commonly
> >>> > used for sparse matrices, so why not use the compressed row routines
> >>> > here?
> >>>
> >>>   This is not important.
> >>>
> >>> >
> >>> >
> >>> > Thanks,
> >>> >
> >>> >
> >>> > Fangbo Wang
> >>> >
> >>> >
> >>> >
> >>> > On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >>> >
> >>> >   How are you generating the matrix entries in parallel? In general
> >>> > you can generate any matrix entries on any MPI process and they will
> >>> > be automatically transferred to the MPI process that owns them. BUT if
> >>> > a huge number of matrix entries is computed on one process and needs
> >>> > to be communicated to another process, this may cause gridlock with
> >>> > MPI. Based on the huge size of the messages from process 12, it looks
> >>> > like this is what is happening in your code.
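> >>> >
> >>> > For reference, the usual assembly pattern is a sketch like this (the
> >>> > m, n, idx[], and vals[] values come from your element computation):
> >>> >
> >>> >   /* entries may be set from any rank; off-process values are stashed */
> >>> >   MatSetValues(A, m, idx, n, idx, vals, ADD_VALUES);
> >>> >   /* ... more MatSetValues calls ... */
> >>> >
> >>> >   /* the stashed entries are communicated to their owning ranks here */
> >>> >   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
> >>> >   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);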
> >>> >
> >>> >  Ideally most matrix entries are generated on the process where they
> >>> > are stored, and hence this gridlock does not happen.
> >>> >
> >>> > What type of discretization are you using? Finite differences, finite
> >>> > elements, finite volumes, spectral, something else? How are you
> >>> > deciding which MPI process should compute which matrix entries? Once
> >>> > we understand this we may be able to suggest a better way to compute
> >>> > the entries.
> >>> >
> >>> >   Barry
> >>> >
> >>> > Under normal circumstances 1.3 million unknowns is not a large
> >>> > parallel matrix; there may be special features of your matrix that are
> >>> > making this difficult.
> >>> >
> >>> >
> >>> >
> >>> > > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
> >>> > >
> >>> > > Hi,
> >>> > >
> >>> > > I construct a big matrix which is 1.3 million by 1.3 million and
> >>> > > uses approximately 100 GB of memory. I have a computer with 500 GB
> >>> > > of memory.
> >>> > >
> >>> > > I run the PETSc program and it gets stuck when finally assembling
> >>> > > the matrix, even though it is only using around 200 GB of memory.
> >>> > > Here is the output message when it gets stuck.
> >>> > > .
> >>> > > .
> >>> > > previous outputs not shown here
> >>> > > .
> >>> > > [12] MatStashScatterBegin_Ref(): No of messages: 15
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes
> >>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes
> >>> > > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.
> >>> > > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.
> >>> > > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.
> >>> > > [13] MatStashScatterBegin_Ref(): No of messages: 15
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes
> >>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes
> >>> > > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.
> >>> > > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used
> >>> > > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >>> > > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >>> > > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> >>> > > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines
> >>> > > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used
> >>> > > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >>> > > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >>> > > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.
> >>> > > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines
> >>> > >
> >>> > > stuck here!!!!
> >>> > >
> >>> > >
> >>> > > Any one have ideas on this? Thank you very much!
> >>> > >
> >>> > >
> >>> > >
> >>> > > Fangbo Wang
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Fangbo Wang, PhD student
> >>> > > Stochastic Geomechanics Research Group
> >>> > > Department of Civil, Structural and Environmental Engineering
> >>> > > University at Buffalo
> >>> > > Email: fangbowa at buffalo.edu
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Fangbo Wang, PhD student
> >>> > Stochastic Geomechanics Research Group
> >>> > Department of Civil, Structural and Environmental Engineering
> >>> > University at Buffalo
> >>> > Email: fangbowa at buffalo.edu
> >>>
> >>>
> >>
> >>
> >> --
> >> Fangbo Wang, PhD student
> >> Stochastic Geomechanics Research Group
> >> Department of Civil, Structural and Environmental Engineering
> >> University at Buffalo
> >> Email: fangbowa at buffalo.edu
> >>
> >
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener