Ian:

> Thanks for your suggestion. I have not implemented it yet, as the redistribution might involve code changes to other parts of the code, and I am not sure whether that is worth it. If the computational cost is mainly dominated by the distribution of the dense matrix and the efficiency would not gain much, we might just avoid introducing this change. Currently, I printed out the number of nonzeros owned by each processor: the one with the most owns 60000 nonzeros and the one with the fewest owns 10000, where the dimension of the matrix is 350000 x 350000. Do you have suggestions on the best approach?

The load balance also depends on many other factors, e.g., the connectivity of the matrix elements.
I would start from the existing code, run it with '-log_view', and see the performance.
Hong

On 1 Jul 2019, at 21:56, hong@aspiritech.org wrote:

> Ian:
> The PETSc implementation of C = A*B requires that C have the same row ownership as A.
> I believe the distribution will be dominated by the dense matrices B and C, not the sparse matrix A. Have you implemented C = A*B and logged the performance?
> Hong

Ian's original question:

Hi,

I am trying to do a matrix multiplication C = A*B, where A is a sparse MPIAIJ matrix and B and C are created as dense MPIDENSE matrices.

In matrix A, the nonzeros are not distributed evenly across the processors, meaning that with the default setting, where each processor owns a similar number of rows, the number of nonzeros owned by each processor differs significantly. So I want to use a different number of local rows on each processor. In this case, do the MPIDENSE matrices B and C need to have the same row layout as A?
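
To show what I mean by a different number of local rows per processor, here is a minimal sketch (toy sizes, a hand-picked 6/4 row split for two MPI ranks, and a throwaway tridiagonal fill, just for illustration; error checking omitted):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A;
  PetscInt    M = 10, N = 10, m_local, rstart, rend, i;
  PetscMPIInt rank;

  PetscInitialize(&argc, &argv, NULL, NULL);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  m_local = (rank == 0) ? 6 : 4;               /* uneven local row counts, chosen by hand here */

  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetType(A, MATMPIAIJ);
  MatSetSizes(A, m_local, PETSC_DECIDE, M, N); /* fix the local rows, let PETSc pick local cols */
  MatMPIAIJSetPreallocation(A, 3, NULL, 3, NULL);

  MatGetOwnershipRange(A, &rstart, &rend);
  for (i = rstart; i < rend; i++) {            /* toy tridiagonal fill, just to have some entries */
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < N - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  /* ... create B, compute C = A*B, etc. ... */

  MatDestroy(&A);
  PetscFinalize();
  return 0;
}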

I mean, is something like the following doable (on P0, A owns 3 rows while B and C own 2 rows)?

        A                  B           C
P0   o o o o |           o o         o o
     o o o o |           o o         o o
     o o o o |     *     -----   =   -----
     ---------           o o         o o
P1   o o o o |           o o         o o

In this case, the entries can be evenly distributed for B and C, which would be more memory-efficient.

But I am not sure whether this would make the communication more complicated and thus slow down the overall wall time. What would you recommend?
a) Let the rows of A and B both be evenly distributed.
b) Let A have a different row layout, and let B and C be evenly distributed.
c) Let A have a different row layout, and let B and C follow A.

Or maybe there is a better way that I did not think of.
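
To make the options concrete, here is roughly how I imagine setting up B and computing C so that the layouts stay compatible with A (continuing from the sketch of A above; K, the number of columns of B, is a made-up placeholder, and I am assuming that B's local row count has to match A's local column count; error checking again omitted):

Mat      B, C;
PetscInt m_local_A, n_local_A, K = 3;          /* K = number of columns of B (placeholder) */

MatGetLocalSize(A, &m_local_A, &n_local_A);

MatCreate(PETSC_COMM_WORLD, &B);
MatSetType(B, MATMPIDENSE);
MatSetSizes(B, n_local_A, PETSC_DECIDE, N, K); /* rows of B follow A's column layout */
MatMPIDenseSetPreallocation(B, NULL);          /* let PETSc allocate the dense storage */
/* fill B with MatSetValues(...) here; assembly is required either way */
MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);

MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C); /* C comes out with A's local row count */

MatDestroy(&C);
MatDestroy(&B);

I am not sure whether this is the intended way to control the layouts, hence the question.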

Thanks a lot for your help,
Ian