<div dir="ltr">Hello Mark,<br><br>Following your comments,<br>I ran with '-info' and the output is shown below.<br><img src="cid:ii_lbkwgimt0" alt="image.png" width="535" height="75" style="margin-right: 0px;"><br><div>The global matrix seems to be preallocated well enough.</div><div>And, as I said in my earlier email, when I run this code with MPI, the assembly takes about 70,000 sec.</div><div>In this case, what is the problem?</div><div><br></div><div>Also, my loop already iterates over elements.</div><div>element_vec is just an array of 1 to 125,000, used to get the proper element index range in each process.</div><div>The example code is just a simple schematic of my code.</div><div><br></div><div>Thanks,</div><div>Hyung Kim</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Dec 12, 2022 at 10:24 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Hyung,<div><br></div><div>First, verify that you are preallocating correctly.</div><div>Run with '-info' and grep on "alloc" in the large output that you get.</div><div>You will see lines like "number of mallocs in assembly: 0". You want 0.</div><div>Do this with one processor and then with 8.</div><div><br></div><div>I don't understand your loop. You are iterating over vertices. You want to iterate over elements. 
</div><div><br></div><div>Mark</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Dec 12, 2022 at 6:16 AM 김성익 <<a href="mailto:ksi2443@gmail.com" target="_blank">ksi2443@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello,<div><br></div><div><br></div><div>I am looking for keywords or examples for parallelizing the matrix assembly process.</div><div><br></div><div>My current setup is as follows.</div><div>- Finite element analysis code for structural mechanics.</div><div>- Problem size: 3D solid hexahedral elements (number of elements: 125,000), number of degrees of freedom: 397,953</div><div>- Matrix type: seqaij, preallocated using MatSeqAIJSetPreallocation</div><div>- Matrix assembly time using 1 core: 120 sec<br>   for (int i=0; i<125000; i++) {</div><div>    ~~ element matrix calculation}</div><div>   matassemblybegin</div><div>   matassemblyend</div><div>- Matrix assembly time using 8 cores: 70,234 sec</div><div>  int start, end;</div><div>  VecGetOwnershipRange( element_vec, &start, &end);</div><div>  for (int i=start; i<end; i++){</div><div>   ~~ element matrix calculation</div><div>   matassemblybegin</div><div>   matassemblyend</div><div><br></div><div><br></div><div>As you can see, the parallel case takes far more time than the sequential case.</div><div>How can I speed this up?</div><div>Could you point me to keywords or examples for parallelizing matrix assembly in finite element analysis?</div><div><br></div><div>Thanks,</div><div>Hyung Kim</div><div><br></div></div>

</blockquote></div>

</blockquote></div>
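<div><br></div><div>For readers following the thread: the parallel loop above depends on each process receiving a contiguous range of element indices. The sketch below shows one way such a range could be computed in plain C. The names owned_range and OwnedRange are hypothetical helpers for illustration and are not part of PETSc; the layout that VecGetOwnershipRange actually reports depends on how the vector was created, so treat this as an assumption about a typical block partition rather than a description of the library.</div>

```c
/* Hypothetical helper (not a PETSc function): contiguous block partition of
 * n_global items over `size` ranks. The first (n_global % size) ranks each
 * receive one extra item, so the ranges cover every index exactly once. */
typedef struct {
    long start; /* first owned index (inclusive) */
    long end;   /* one past the last owned index */
} OwnedRange;

OwnedRange owned_range(long n_global, int size, int rank)
{
    long base  = n_global / size; /* items every rank receives */
    long extra = n_global % size; /* leftover items, one per low rank */
    OwnedRange r;

    /* Ranks before this one hold `base` items each, plus one extra item
     * for each of those ranks that is below `extra`. */
    r.start = rank * base + (rank < extra ? rank : extra);
    r.end   = r.start + base + (rank < extra ? 1 : 0);
    return r;
}
```

<div>With the numbers from this thread, owned_range(125000, 8, rank) gives every one of the 8 processes 15,625 elements. Each process would then loop over its own range only and call the assembly routines once after the loop finishes.</div>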