<font><span style="font-family:arial,helvetica,sans-serif;color:rgb(31,73,125)"><font color="#000000">Hi,<br><br>I am working on an FEM code that uses spline-based elements. In the 3D case, each element has 64 nodes, and every two neighboring elements share 48 of them. So regardless of how I partition the mesh, a very large number of entries still have to be written on the 'wrong' processor. The code runs on a cluster, and the processes are sending between 550 and 620 million packets per second across the network. The code seems to be IO-bound at the moment and simply gets stuck at the matrix assembly stage. A -info output file is attached. </font></span><span style="font-family:arial,helvetica,sans-serif"> Do I have other options for making the code less IO-intensive? <br>
<br>Thanks in advance. <br></span></font><br>[0] VecAssemblyBegin_MPI(): Stash has 210720 entries, uses 12 mallocs.<br>[0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.<br>[5] MatAssemblyBegin_MPIAIJ(): Stash has 4806656 entries, uses 8 mallocs.<br>
[6] MatAssemblyBegin_MPIAIJ(): Stash has 5727744 entries, uses 9 mallocs.<br>[4] MatAssemblyBegin_MPIAIJ(): Stash has 5964288 entries, uses 9 mallocs.<br>[7] MatAssemblyBegin_MPIAIJ(): Stash has 7408128 entries, uses 9 mallocs.<br>
[3] MatAssemblyBegin_MPIAIJ(): Stash has 8123904 entries, uses 9 mallocs.<br>[2] MatAssemblyBegin_MPIAIJ(): Stash has 11544576 entries, uses 10 mallocs.<br>[0] MatStashScatterBegin_Private(): No of messages: 1 <br>[0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 107888648 <br>
[0] MatAssemblyBegin_MPIAIJ(): Stash has 13486080 entries, uses 10 mallocs.<br>[1] MatAssemblyBegin_MPIAIJ(): Stash has 16386048 entries, uses 10 mallocs.<br>[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2514537 used<br>
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>[0] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines<br>
[5] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0 unneeded,2525390 used<br>[5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>
[5] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode routines<br>[3] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used<br>[3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
[3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>[3] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines<br>[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used<br>
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>[1] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines<br>
[4] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used<br>[4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>
[4] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines<br>[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2525733 used<br>[2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
[2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>[2] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines<br>
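To put numbers on how much of the assembly goes through the stash, the -info output above can be summarized per rank. The following is only a rough sketch: the function name `summarize_info` and the regular expressions are my own and simply match the two line formats seen in this log. Note that the comparison is not apples-to-apples, since the stash count may include repeated contributions to the same matrix location, while "used" is the number of stored nonzeros reported by the first MatAssemblyEnd_SeqAIJ line for that rank.

```python
import re

# Patterns for two kinds of -info lines that appear in the log above.
# They are written against this log only, not a general -info grammar.
STASH_RE = re.compile(
    r"\[(\d+)\] MatAssemblyBegin_MPIAIJ\(\): Stash has (\d+) entries"
)
USED_RE = re.compile(
    r"\[(\d+)\] MatAssemblyEnd_SeqAIJ\(\): Matrix size: \d+ X \d+; "
    r"storage space: \d+ unneeded,\s*(\d+) used"
)


def summarize_info(text):
    """Return {rank: {"stash": n, "used": m}} parsed from -info output."""
    summary = {}
    for rank, count in STASH_RE.findall(text):
        summary.setdefault(int(rank), {})["stash"] = int(count)
    for rank, count in USED_RE.findall(text):
        # Keep only the first MatAssemblyEnd_SeqAIJ line seen for each rank.
        summary.setdefault(int(rank), {}).setdefault("used", int(count))
    return summary


# Four lines copied from the log above (ranks 0 and 1).
sample = """\
[0] MatAssemblyBegin_MPIAIJ(): Stash has 13486080 entries, uses 10 mallocs.
[1] MatAssemblyBegin_MPIAIJ(): Stash has 16386048 entries, uses 10 mallocs.
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2514537 used
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used
"""

for rank, row in sorted(summarize_info(sample).items()):
    ratio = row["stash"] / row["used"]
    print(f"rank {rank}: stash={row['stash']} used={row['used']} ratio={ratio:.2f}")
```

On these lines, each rank stashes several times more entries than it stores locally, which is consistent with the assembly being dominated by off-process communication.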