[petsc-users] Efficiency of different choice of local rows for MPIAIJ * MPIDENSE

hong at aspiritech.org hong at aspiritech.org
Tue Jul 2 09:09:21 CDT 2019


Ian:

> Thanks for your suggestion. I have not implemented it yet, as the
> re-distribution might involve code changes to other parts of the code,
> and I am not sure whether that is worth it. If the computation cost is
> dominated mainly by the distribution of the dense matrix, so that the
> efficiency would not gain much, we might just avoid introducing this
> change. Currently, I printed out the nnzs owned by each processor: the
> most loaded one owns 60000 nnzs and the least loaded one owns 10000 nnzs,
> where the dimension of the matrix is 350000*350000. Do you have
> suggestions on the best approach?
>

Load balance also depends on many other factors, e.g., the connectivity of
the matrix elements.
I would start from the existing code, run it with '-log_view', and look at
the performance.
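For example (the executable name and process count here are placeholders):

  mpiexec -n 8 ./your_app -log_view

The event table in the log summary reports time, message count, and message
volume per operation (e.g., the MatMatMult symbolic and numeric stages),
which should show whether distributing the dense matrices dominates.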
Hong

> On Jul 1, 2019, at 21:56, hong at aspiritech.org wrote:
>
> Ian:
> The PETSc implementation of C = A*B requires that C have the same row
> ownership as A. I believe the distribution will be dominated by the dense
> matrices B and C, not the sparse matrix A. Have you implemented C = A*B
> and logged the performance?
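> For reference, the product itself is one call; a minimal sketch (error
> checking abbreviated, A and B assumed created and assembled elsewhere):
>
>   Mat C;
>   ierr = MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr);
>   /* later products with the same nonzero pattern can reuse C */
>   ierr = MatMatMult(A, B, MAT_REUSE_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr);
>
> The C created by MAT_INITIAL_MATRIX gets the row layout of A and the
> column layout of B.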
> Hong
>
>> Hi,
>>
>> I have recently been trying to do the matrix multiplication C = A*B, where
>> A is a sparse MPIAIJ matrix, and B and C are created as dense MPIDENSE
>> matrices.
>>
>> In matrix A, the nonzeros are not distributed evenly across the
>> processors: if the default setting is used, so that each processor owns a
>> similar number of rows, the number of nonzeros owned by each processor
>> will differ significantly. So I want to use a different number of local
>> rows on each processor. In this case, do the MPIDENSE matrices B and C
>> need to have the same row layout as A?
>>
>> I mean, is something like the following doable (on P0, A owns 3 rows and
>> B, C own 2 rows)?
>>
>>
>>             A                 B         C
>> P0   o o o o | o           o o       o o
>>      o o o o | o           o o       o o
>>      o o o o | o     *    -----  =  -----
>>     -------------          o o       o o
>> P1   o o o o | o           o o       o o
>>
>> In this case, the entries of B and C can be evenly distributed, which is
>> more memory efficient.
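>> (In code, I would set such a layout up along these lines; a minimal
>> sketch assuming 2 MPI ranks, with the toy sizes from the diagram above:
>>
>>   Mat         A, B;
>>   PetscMPIInt rank;
>>   PetscInt    mA, mB = 2;        /* B and C own 2 local rows each */
>>   ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
>>   mA   = (rank == 0) ? 3 : 1;    /* uneven local rows for A */
>>   ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
>>   /* A's local column sizes chosen compatible with B's row layout */
>>   ierr = MatSetSizes(A, mA, mB, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
>>   ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);
>>   ierr = MatSetUp(A);CHKERRQ(ierr);
>>   ierr = MatCreateDense(PETSC_COMM_WORLD, mB, PETSC_DECIDE,
>>                         PETSC_DETERMINE, 2, NULL, &B);CHKERRQ(ierr);
>>
>> so A ends up 4x4 with rows split 3/1, and B ends up 4x2 with rows split
>> 2/2.)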
>>
>> But I am not sure whether this would make the communication more
>> complicated and thus slow down the overall wall time. Which would you
>> recommend?
>> a) Let the rows of A and B both be evenly distributed
>> b) Let A have a different row layout, and B, C evenly distributed
>> c) Let A have a different row layout, and B, C follow A
>>
>> Or maybe there is a better way that I did not think of.
>>
>> Thanks a lot for your help,
>> Ian
>
>
>