<div dir="ltr">Another approach that might be simple, if you have the metadata for the entire mesh locally, is to build a list of the elements that touch your local matrix block-rows/vertices: go over all the elements and, for each element e, test whether any of its vertices i satisfies: if (i >= start && i < end) list.append(e). Then compute and assemble just those elements and tell PETSc to ignore off-process entries (MatSetOption() with MAT_IGNORE_OFF_PROC_ENTRIES). This needs no communication, at the price of some redundant local work plus some setup code and cost.</div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 26, 2017 at 11:37 PM, Fangbo Wang <span dir="ltr"><<a href="mailto:fangbowa@buffalo.edu" target="_blank">fangbowa@buffalo.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I got my finite element mesh from the commercial finite element software ABAQUS. I simply draw the geometry of the model in the graphical interface and assign element types and material properties to different parts of the model; ABAQUS then automatically outputs the element and node information of the model.<div><br></div><div>Suppose I have 1000 elements in my model and 10 MPI processes: </div><div>#1 to #100 local element matrices will be computed in MPI process 0;</div><div>#101 to #200 local element matrices will be computed in MPI process 1;</div><div>#201 to #300 local element matrices will be computed in MPI process 2;<br></div><div>..........</div><div>#901 to #1000 local element matrices will be computed in MPI process 9;<br></div><div><br></div><div><br></div><div>However, I might get a lot of global matrix indices which I need to send to other processes due to the degree-of-freedom ordering in the finite element model.</div><div><br></div><div>This is what I did according to my understanding of finite elements and what I have seen. 
</div><div>Do you have some nice libraries or packages that can be easily used in a scientific computing environment?</div><span class=""><div><br></div><div>Thank you very much!</div><div><br></div><div><br></div><div><br></div><div>Fangbo Wang</div><div><br></div><div><br></div><div><br></div></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 26, 2017 at 11:15 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><br>

> On Feb 26, 2017, at 10:04 PM, Fangbo Wang <<a href="mailto:fangbowa@buffalo.edu" target="_blank">fangbowa@buffalo.edu</a>> wrote:<br>

><br>

> My problem is a solid mechanics problem using the finite element method to discretize the model (a 30 m x 30 m x 30 m soil domain with a building structure on top).<br>

><br>

> I am not manually deciding which MPI process computes which matrix entries, because I know PETSc can automatically communicate between these processes.<br>

> I am just asking each MPI process to generate a certain number of matrix entries regardless of which process will finally store them.<br>

<br>

</span>  The standard way to handle this for finite elements is to partition the elements among the processes and then partition the nodes (rows of the degrees of freedom) subservient to the partitioning of the elements. Otherwise most of the matrix (or vector) entries must be communicated and this is not scalable.<br>
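<br>
  A minimal sketch of that idea, in plain Python rather than PETSc: the elements are assumed to be already assigned to processes, and each node is then given to one of the processes whose elements touch it. The names `assign_node_owners` and `count_off_process_rows`, the tiny 1D mesh, and the lowest-rank tie-break are all hypothetical choices made for illustration.<br>

```python
def assign_node_owners(elements, elem_owner):
    """Give each node to the lowest-ranked process owning an element that touches it.

    elements   : list of tuples of node ids (hypothetical connectivity)
    elem_owner : process rank that computes each element
    The lowest-rank tie-break is an assumption of this sketch.
    """
    node_owner = {}
    for nodes, rank in zip(elements, elem_owner):
        for n in nodes:
            node_owner[n] = min(rank, node_owner.get(n, rank))
    return node_owner


def count_off_process_rows(elements, elem_owner, node_owner):
    """Count element-row contributions that would have to be sent to another process."""
    return sum(
        1
        for nodes, rank in zip(elements, elem_owner)
        for n in nodes
        if node_owner[n] != rank
    )


# Hypothetical 1D mesh: four two-node elements on nodes 0..4, split over 2 processes.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
elem_owner = [0, 0, 1, 1]
node_owner = assign_node_owners(elements, elem_owner)
off_process = count_off_process_rows(elements, elem_owner, node_owner)
```

  With this assignment only the contributions on the interface between the two element groups are off-process. Since PETSc gives each process a contiguous range of rows, a real code would also renumber the nodes to match this ownership; that step is left out here.<br>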

<br>

   So how are you partitioning the elements (for matrix stiffness computations) and the nodes between processes?<br>

<span>><br>

> Actually, I constructed another matrix of the same size but with far fewer entries, and the code worked. However, it gets stuck when I generate more matrix entries.<br>

><br>

> Thank you very much! Any suggestions are highly appreciated.<br>

><br>

> BTW, what is the meaning of "[4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines."? I know compressed row format is commonly used for sparse matrices, so why not use compressed row routines here?<br>

<br>

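</span>  This is not important; it is only an informational message. The compressed-row routines skip rows that are entirely zero, and since none of your local rows are empty (ratio 0 < 0.6) they would not help here.<br>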

<div class="m_5018623940652783117HOEnZb"><div class="m_5018623940652783117h5"><br>

><br>

><br>

> Thanks,<br>

><br>

><br>

> Fangbo Wang<br>

><br>

><br>

><br>

> On Sun, Feb 26, 2017 at 10:42 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

><br>

>   How are you generating the matrix entries in parallel? In general you can generate any matrix entries on any MPI process and they will be automatically transferred to the MPI process that owns them. BUT if a huge number of matrix entries are computed on one process and need to be communicated to another process, this may cause gridlock with MPI. Based on the huge size of the messages from process 12, it looks like this is what is happening in your code.<br>

><br>

>  Ideally most matrix entries are generated on the process where they are stored, and hence this gridlock does not happen.<br>

><br>

> What type of discretization are you using? Finite differences, finite element, finite volume, spectral, something else? How are you deciding which MPI process should compute which matrix entries? Once we understand this we may be able to suggest a better way to compute the entries.<br>

><br>

>   Barry<br>

><br>

> Under normal circumstances 1.3 million unknowns is not a large parallel matrix; there may be special features of your matrix that are making this difficult.<br>

><br>

><br>

><br>

> > On Feb 26, 2017, at 9:30 PM, Fangbo Wang <<a href="mailto:fangbowa@buffalo.edu" target="_blank">fangbowa@buffalo.edu</a>> wrote:<br>

> ><br>

> > Hi,<br>

> ><br>

> > I construct a big matrix, 1.3 million by 1.3 million, which uses approximately 100GB of memory. I have a computer with 500GB of memory.<br>

> ><br>

> > I run the PETSc program and it gets stuck when finally assembling the matrix. The program is using only around 200GB of memory. However, the program just gets stuck there. Here is the output message when it gets stuck.<br>

> > .<br>

> > .<br>

> > previous outputs not shown here<br>

> > .<br>

> > [12] MatStashScatterBegin_Ref(): No of messages: 15<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 328581552 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 163649328 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 95512224 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 317711616 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 170971776 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 254000064 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 163146720 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 345150048 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 163411584 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 739711296 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 13: size: 435247344 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 435136752 bytes<br>

> > [12] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 346167552 bytes<br>

> > [14] MatAssemblyBegin_MPIAIJ(): Stash has 263158893 entries, uses 14 mallocs.<br>

> > [8] MatAssemblyBegin_MPIAIJ(): Stash has 286768572 entries, uses 14 mallocs.<br>

> > [12] MatAssemblyBegin_MPIAIJ(): Stash has 291181818 entries, uses 14 mallocs.<br>

> > [13] MatStashScatterBegin_Ref(): No of messages: 15<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 271636416 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 1: size: 271636416 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 2: size: 220594464 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 3: size: 51041952 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 4: size: 276201408 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 5: size: 256952256 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 6: size: 198489024 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 7: size: 218657760 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 8: size: 219686880 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 9: size: 288874752 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 10: size: 428874816 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 11: size: 172579968 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 12: size: 639835680 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 14: size: 270060144 bytes<br>

> > [13] MatStashScatterBegin_Ref(): Mesg_to: 15: size: 511244160 bytes<br>

> > [13] MatAssemblyBegin_MPIAIJ(): Stash has 268522881 entries, uses 14 mallocs.<br>

> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89786788 unneeded,7025212 used<br>

> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>

> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81<br>

> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.<br>

> > [5] MatSeqAIJCheckInode(): Found 32271 nodes of 96812. Limit used: 5. Using Inode routines<br>

> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 96812 X 96812; storage space: 89841924 unneeded,6970076 used<br>

> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>

> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81<br>

> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 96812) < 0.6. Do not use CompressedRow routines.<br>

> > [4] MatSeqAIJCheckInode(): Found 32272 nodes of 96812. Limit used: 5. Using Inode routines<br>

> ><br>

> > stuck here!!!!<br>

> ><br>

> ><br>

> > Does anyone have ideas on this? Thank you very much!<br>

> ><br>

> ><br>

> ><br>

> > Fangbo Wang<br>

> ><br>

> ><br>

> ><br>

> > --<br>

> > Fangbo Wang, PhD student<br>

> > Stochastic Geomechanics Research Group<br>

> > Department of Civil, Structural and Environmental Engineering<br>

> > University at Buffalo<br>

> > Email: <a href="mailto:fangbowa@buffalo.edu" target="_blank">fangbowa@buffalo.edu</a><br>

><br>

><br>

><br>

><br>


<br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="m_5018623940652783117gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><font size="1" color="#9900ff">Fangbo Wang, PhD student</font></div><div><font size="1" color="#9900ff">Stochastic Geomechanics Research Group</font></div><div><span style="color:rgb(153,0,255);font-size:x-small">Department of Civil, Structural and Environmental Engineering</span></div><div><font size="1" color="#9900ff">University at Buffalo</font></div><div><font size="1" color="#9900ff">Email: <u><a href="mailto:fangbowa@buffalo.edu" target="_blank">fangbowa@buffalo.edu</a></u></font></div></div></div></div></div>

</div>

</div></div></blockquote></div><br></div>
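<div dir="ltr">The approach described at the top of this message (each process assembles every element that touches one of its own rows and drops the rest) could be sketched roughly as below. This is plain Python, not PETSc code: one degree of freedom per vertex is assumed, and `elements_touching_rows`, `local_entries`, the 1D mesh and the two-node element matrix are hypothetical names and data used only for illustration.</div>

```python
def elements_touching_rows(elements, start, end):
    """Return indices of elements with at least one vertex in [start, end)."""
    owned = range(start, end)
    return [e for e, verts in enumerate(elements) if any(v in owned for v in verts)]


def local_entries(elements, element_matrix, start, end):
    """Sum element contributions for locally owned rows only.

    Rows outside [start, end) are skipped, which mimics ignoring
    off-process entries instead of sending them to their owner.
    """
    owned = range(start, end)
    rows = {}
    for e in elements_touching_rows(elements, start, end):
        verts = elements[e]
        ke = element_matrix(e)  # redundant work: neighbours compute this element too
        for a, i in enumerate(verts):
            if i not in owned:
                continue
            for b, j in enumerate(verts):
                rows[(i, j)] = rows.get((i, j), 0.0) + ke[a][b]
    return rows


# Hypothetical 1D chain: four two-node elements on vertices 0..4.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
unit_stiffness = lambda e: [[1.0, -1.0], [-1.0, 1.0]]

# A process that owns rows 2 and 3 (start=2, end=4).
touching = elements_touching_rows(elements, 2, 4)
rows = local_entries(elements, unit_stiffness, 2, 4)
```

<div dir="ltr">In a PETSc code, start and end would come from MatGetOwnershipRange(), and MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE) would tell the matrix to drop values set in rows owned by other processes, so the explicit row check above would not be needed.</div>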