<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Feb 27, 2017 at 9:06 AM, Lukas van de Wiel <span dir="ltr"><<a href="mailto:lukas.drinkt.thee@gmail.com" target="_blank">lukas.drinkt.thee@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Spreading the elements over the processors by sheer number is not<br>

automatically a safe method, depending on the mesh. Especially with<br>

irregular meshes, such as those created by Triangle or Gmsh, such a<br>

distribution will not reduce the amount of communication and may even<br>

increase it.<br>

<br>

There are mature and well-tested partitioning tools available that can<br>

divide your mesh into regional partitions. We use Metis/ParMetis. I<br>

believe PETSc uses PTScotch.</blockquote><div><br></div><div>We have interfaces to Chaco, Metis, ParMetis, Party, and PTScotch</div><div><br></div><div>   Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> This is an extra step, but it will reduce<br>

the communication volume considerably.<br>

<br>

Cheers<br>

Lukas<br>

<br>

On 2/27/17, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>

> Another approach that might be simple, if you have the metadata for the<br>

> entire mesh locally, is to set up a list of the elements that your local matrix<br>

> block-rows/vertices touch by going over all the elements and testing whether any<br>

> of an element's vertices i is local: if (i >= start && i < end) list.append(element). Just<br>

> compute and assemble those elements and tell PETSc to<br>

> ignore-off-processor-entries. No communication, but redundant local work and some<br>

> setup code and cost.<br>
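<br>
A minimal sketch of the approach Mark describes, in plain Python rather than PETSc and assuming one degree of freedom per vertex: each process scans every element, keeps those with at least one vertex in its own row range [start, end), assembles them redundantly, and drops contributions to rows it does not own. The names elements_touching_rows and assemble_local_rows and the toy mesh are hypothetical; in PETSc the dropping step corresponds to MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE).<br>
<pre>
```python
def elements_touching_rows(elements, start, end):
    """Indices of elements with at least one vertex in the local range [start, end)."""
    return [e for e, verts in enumerate(elements)
            if any(start <= v < end for v in verts)]


def assemble_local_rows(elements, element_matrix, start, end):
    """Assemble only rows [start, end); contributions to other rows are dropped."""
    rows = {}
    for e in elements_touching_rows(elements, start, end):
        verts = elements[e]
        ke = element_matrix(e)  # dense element matrix, computed redundantly
        for a, i in enumerate(verts):
            if not (start <= i < end):
                continue  # row owned by another process: ignore it
            for b, j in enumerate(verts):
                rows[(i, j)] = rows.get((i, j), 0.0) + ke[a][b]
    return rows


# Toy 1D mesh (assumption): four two-node elements on five vertices.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]


def unit_stiffness(e):
    return [[1.0, -1.0], [-1.0, 1.0]]


# The process that owns rows 2 and 3.
touching = elements_touching_rows(elements, 2, 4)
local = assemble_local_rows(elements, unit_stiffness, 2, 4)
```
</pre>
Only rows 2 and 3 receive entries here, so nothing has to be stashed and sent to another process; the price is that elements straddling a partition boundary are computed on more than one process.<br>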

><br>

> On Sun, Feb 26, 2017 at 11:37 PM, Fangbo Wang <<a href="mailto:fangbowa@buffalo.edu">fangbowa@buffalo.edu</a>> wrote:<br>

><br>

>> I got my finite element mesh from the commercial finite element software<br>

>> ABAQUS. I simply draw the geometry of the model in the graphical interface<br>

>> and assign element types and material properties to different parts of the<br>

>> model; ABAQUS then automatically outputs the element and node information of<br>

>> the model.<br>

>><br>

>> Suppose I have 1000 elements in my model and 10 MPI processes,<br>

>> #1 to #100 local element matrices will be computed in MPI process 0;<br>

>> #101 to #200 local element matrices will be computed in MPI process 1;<br>

>> #201 to #300 local element matrices will be computed in MPI process 2;<br>

>> ..........<br>

>> #901 to #1000 local element matrices will be computed in MPI process 9;<br>

>><br>

>><br>

>> However, I might get a lot of global matrix indices which I need to send<br>

>> to other processors due to the degree of freedom ordering in the finite<br>

>> element model.<br>

>><br>

>> This is what I did, based on my understanding of finite elements and<br>

>> what I have seen.<br>

>> Do you know of any good libraries or packages that can be easily used in a<br>

>> scientific computing environment?<br>

>><br>

>> Thank you very much!<br>

>><br>

>><br>

>><br>

>> Fangbo Wang<br>

>><br>

>><br>

>><br>

>><br>

>> On Sun, Feb 26, 2017 at 11:15 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>

>><br>

>>><br>

>>> > On Feb 26, 2017, at 10:04 PM, Fangbo Wang <<a href="mailto:fangbowa@buffalo.edu">fangbowa@buffalo.edu</a>><br>

>>> > wrote:<br>

>>> ><br>

>>> > My problem is a solid mechanics problem using the finite element method to<br>

>>> > discretize the model (a 30 m x 30 m x 30 m soil domain with a building structure<br>

>>> > on top).<br>

>>> ><br>

>>> > I am not manually deciding which MPI process computes which matrix<br>

>>> > entries, because I know PETSc can automatically communicate between these<br>

>>> > processors.<br>

>>> > I am just asking each MPI process to generate a certain number of matrix<br>

>>> > entries regardless of which process will finally store them.<br>

>>><br>

>>>   The standard way to handle this for finite elements is to partition the<br>

>>> elements among the processes and then partition the nodes (rows of the<br>

>>> degrees of freedom) subservient to the partitioning of the elements.<br>

>>> Otherwise most of the matrix (or vector) entries must be communicated, and<br>

>>> this is not scalable.<br>
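<br>
One way to picture "nodes subservient to the elements" is the sketch below. It assumes the element partition already exists (hard-coded here; in practice it would come from a partitioner such as Metis or ParMetis) and uses a lowest-rank-wins rule for shared nodes, which is an illustrative choice and not something stated in this thread. The function names are hypothetical. Nodes are then renumbered so that each process owns a contiguous range, since PETSc's parallel AIJ matrices give each process a contiguous block of rows.<br>
<pre>
```python
def assign_node_owners(elements, element_rank):
    """Give each node to the lowest rank among the elements that touch it."""
    owner = {}
    for verts, rank in zip(elements, element_rank):
        for v in verts:
            owner[v] = min(owner.get(v, rank), rank)
    return owner


def renumber_contiguously(owner, nranks):
    """New global numbering: rank 0's nodes first, then rank 1's, and so on."""
    new_id = {}
    for rank in range(nranks):
        for v in sorted(n for n, r in owner.items() if r == rank):
            new_id[v] = len(new_id)
    return new_id


# Toy chain of two-node elements with scrambled node ids (assumption).
elements = [(0, 4), (4, 1), (1, 3), (3, 2)]
element_rank = [0, 0, 1, 1]  # stand-in for a partitioner's output

owner = assign_node_owners(elements, element_rank)
new_id = renumber_contiguously(owner, 2)
```
</pre>
With this numbering, rank 0 owns new rows 0 to 2 and rank 1 owns rows 3 and 4, so only the element that crosses the partition boundary contributes to rows owned by another process.<br>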

>>><br>

>>>    So how are you partitioning the elements (for stiffness matrix<br>

>>> computations) and the nodes between processes?<br>

>>> ><br>

>>> > Actually, I constructed another matrix of the same size but with<br>

>>> > far fewer entries, and the code worked. However, it gets stuck when I<br>

>>> > generate more matrix entries.<br>

>>> ><br>

>>> > Thank you very much! Any suggestion is highly appreciated.<br>

>>> ><br>

>>> > BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the<br>

>>> > ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use<br>

>>> > CompressedRow routines."? I know compressed row format is commonly used for<br>

>>> > sparse matrices, so why not use compressed row routines here?<br>

>>><br>

>>>   This is not important.<br>

>>><br>

>>> ><br>

>>> ><br>

>>> > Thanks,<br>

>>> ><br>

>>> ><br>

>>> > Fangbo Wang<br>

>>> ><br>

>>> ><br>

>>> ><br>

>>> > On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>><br>

>>> wrote:<br>

>>> ><br>

>>> >   How are you generating the matrix entries in parallel? In general you<br>

>>> > can generate any matrix entries on any MPI process and they will be<br>

>>> > automatically transferred to the MPI process that owns the entries.<br>

>>> > BUT if a huge number of matrix entries are computed on one<br>

>>> > process and need to be communicated to another process, this may cause<br>

>>> > gridlock with MPI. Based on the huge size of the messages from process 12, it<br>

>>> > looks like this is what is happening in your code.<br>
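<br>
To put a number on "huge", the message sizes that process 12 reports in the log quoted further down in this thread can simply be added up. The short Python check below uses only the figures from that log; the reading of 16 bytes per entry as one 8-byte value plus two 4-byte indices is an assumption that happens to fit the numbers, not something the log states.<br>
<pre>
```python
# Sizes in bytes of the 15 messages process 12 sends, copied from the log.
sizes_from_12 = [
    271636416, 328581552, 163649328, 95512224, 317711616,
    170971776, 254000064, 163146720, 345150048, 163411584,
    428874816, 739711296, 435247344, 435136752, 346167552,
]
stash_entries_12 = 291181818  # "[12] ... Stash has 291181818 entries"

total_bytes = sum(sizes_from_12)
bytes_per_entry = total_bytes // stash_entries_12

print(total_bytes)                          # 4658909088
print(total_bytes % stash_entries_12 == 0)  # True
print(bytes_per_entry)                      # 16
```
</pre>
So process 12 alone is trying to send roughly 4.7 GB of stashed entries to the other 15 processes, while the local blocks that finished assembling report only about 7 million nonzeros in use on each process.<br>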

>>> ><br>

>>> >  Ideally most matrix entries are generated on the process where they are<br>

>>> > stored, and hence this gridlock does not happen.<br>

>>> ><br>

>>> > What type of discretization are you using? Finite differences, finite<br>

>>> element, finite volume, spectral, something else? How are you deciding<br>

>>> which MPI process should compute which matrix entries? Once we<br>

>>> understand<br>

>>> this we may be able to suggest a better way to compute the entries.<br>

>>> ><br>

>>> >   Barry<br>

>>> ><br>

>>> > Under normal circumstances 1.3 million unknowns is not a large<br>

>>> > parallel matrix; there may be special features of your matrix that are<br>

>>> > making this difficult.<br>

>>> ><br>

>>> ><br>

>>> ><br>

>>> > > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <<a href="mailto:fangbowa@buffalo.edu">fangbowa@buffalo.edu</a>><br>

>>> wrote:<br>

>>> > ><br>

>>> > > Hi,<br>

>>> > ><br>

>>> > > I construct a big matrix, 1.3 million by 1.3 million, which<br>

>>> > > uses approximately 100GB of memory. I have a computer with 500GB of memory.<br>

>>> > ><br>

>>> > > I run the PETSc program and it gets stuck when finally assembling the<br>

>>> > > matrix. The program is using only around 200GB of memory. However, the program<br>

>>> > > just gets stuck there. Here is the output message when it gets stuck.<br>

>>> > > .<br>

>>> > > .<br>

>>> > > previous outputs not shown here<br>

>>> > > .<br>

>>> > > [12] MatStashScatterBegin_Ref(): No of messages: 15<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes<br>

>>> > > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes<br>

>>> > > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.<br>

>>> > > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.<br>

>>> > > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.<br>

>>> > > [13] MatStashScatterBegin_Ref(): No of messages: 15<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes<br>

>>> > > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes<br>

>>> > > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.<br>

>>> > > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used<br>

>>> > > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>

>>> > > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81<br>

>>> > > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.<br>

>>> > > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines<br>

>>> > > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used<br>

>>> > > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>

>>> > > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81<br>

>>> > > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.<br>

>>> > > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines<br>

>>> > ><br>

>>> > > stuck here!!!!<br>

>>> > ><br>

>>> > ><br>

>>> > > Does anyone have ideas on this? Thank you very much!<br>

>>> > ><br>

>>> > ><br>

>>> > ><br>

>>> > > Fangbo Wang<br>

>>> > ><br>

>>> > ><br>

>>> > ><br>

>>> > > --<br>

>>> > > Fangbo Wang, PhD student<br>

>>> > > Stochastic Geomechanics Research Group<br>

>>> > > Department of Civil, Structural and Environmental Engineering<br>

>>> > > University at Buffalo<br>

>>> > > Email: <a href="mailto:fangbowa@buffalo.edu">fangbowa@buffalo.edu</a><br>

>>> ><br>

>>> ><br>

>>> ><br>

>>> ><br>


>>><br>

>>><br>

>><br>

<span class="HOEnZb"><font color="#888888">>><br>


>><br>

><br>

</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div>

</div></div>