[petsc-users] Various Questions Regarding PETSC
Mohammed Mostafa
mo7ammedmostafa at gmail.com
Thu Jul 18 02:01:56 CDT 2019
Hello everyone,
Since I have already established a baseline to compare the cost of inserting
values into the PETSc matrix against, and based on the hint about the number
of values inserted into the matrix each time,
> 2) Can you tell me how many values are inserted?
I took a look at the source code of "MatSetValues_MPIAIJ" and found that it
seems to be designed for FEM-style assembly of the global matrix from element
matrices (from what I remember from undergrad), since setting multiple rows
in a single call requires that all of those rows share the same column
indices.
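For reference, the stock interface inserts a logically dense m x n block, so
all m rows in one call share the same n column indices. A tiny illustration
(made-up indices and values; "InsertDenseBlock" is just an illustrative name,
and A is assumed to be an already created and preallocated MPIAIJ matrix):

#include <petscmat.h>

/* Illustration only: MatSetValues() takes one list of column indices that is
 * shared by all m rows, i.e. it inserts a logically dense m x n block (the
 * element-matrix use case). */
static PetscErrorCode InsertDenseBlock(Mat A)
{
  PetscInt       rows[2] = {10, 11};       /* 2 rows ...                */
  PetscInt       cols[3] = {10, 11, 12};   /* ... sharing the 3 columns */
  PetscScalar    vals[6] = {1, 2, 3,       /* row-major 2 x 3 block     */
                            4, 5, 6};
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatSetValues(A, 2, rows, 3, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

That block layout is exactly what does not match my case, where each row has
its own column indices.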
So, to increase the number of values inserted per call, I made a copy of the
function "MatSetValues_MPIAIJ" in "src/mat/impls/aij/mpi/mpiaij.c
<https://github.com/petsc/petsc/blob/a8158fb5a3692ffc34efd85261445929d2a1359c/src/mat/impls/aij/mpi/mpiaij.c>"
and named it "MatSetValues2_MPIAIJ".
I made some minor changes so that multiple rows can be inserted regardless of
whether they share the same column indices.
What I do now is buffer the data for several rows and then insert them all
together, and I figured I would see how the performance turns out.
I tried different numbers of buffered rows, i.e. *nrow_buffer = [2, 5, 10,
20, 50, 100]*.
So now, instead of calling "MatSetValues" once for every row of the matrix, I
call *"MatSetValues2_MPIAIJ"* once every nrow_buffer rows, which should allow
for some performance improvement; a sketch of the caller side is shown right
below.
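Roughly, the caller side looks like the following sketch. The prototype of
MatSetValues2_MPIAIJ here is only an assumption of how such a batched
interface could look (packed column/value buffers plus a per-row non-zero
count); the actual modified routine in mpiaij.c is not reproduced here, and
NROW_BUF, MAX_NNZ and FillBuffered are illustrative names. The elided
computation is the same one as in my earlier email.

#include <petscmat.h>

#define NROW_BUF 20   /* number of rows buffered before each flush */
#define MAX_NNZ   4   /* at most 4 non-zeros per row in my matrix  */

/* Hypothetical prototype of the batched setter; the real argument list of my
 * modified routine may differ. */
extern PetscErrorCode MatSetValues2_MPIAIJ(Mat, PetscInt, const PetscInt[],
                                           const PetscInt[], const PetscInt[],
                                           const PetscScalar[], InsertMode);

static PetscErrorCode FillBuffered(Mat A, PetscInt nRows)
{
  PetscInt       rows[NROW_BUF], ncols[NROW_BUF];
  PetscInt       colbuf[NROW_BUF * MAX_NNZ];
  PetscScalar    valbuf[NROW_BUF * MAX_NNZ];
  PetscInt       i, k, nbuf = 0, off = 0;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  for (i = 0; i < nRows; i++) {
    PetscInt    cell_global_index = 0, nnz_per_row = 0;
    PetscInt    j_index[MAX_NNZ];
    PetscScalar coefvalues[MAX_NNZ];
    /* ... same computation as before to fill cell_global_index, nnz_per_row,
       j_index and coefvalues for this row ... */

    rows[nbuf]  = cell_global_index;            /* pack this row into buffers */
    ncols[nbuf] = nnz_per_row;
    for (k = 0; k < nnz_per_row; k++) {
      colbuf[off + k] = j_index[k];
      valbuf[off + k] = coefvalues[k];
    }
    off += nnz_per_row;
    nbuf++;

    if (nbuf == NROW_BUF || i == nRows - 1) {   /* flush every NROW_BUF rows */
      ierr = MatSetValues2_MPIAIJ(A, nbuf, rows, ncols, colbuf, valbuf,
                                  INSERT_VALUES);CHKERRQ(ierr);
      nbuf = 0;
      off  = 0;
    }
  }
  PetscFunctionReturn(0);
}

The only real change compared with the per-row loop is that the call into
PETSc (and its per-call overhead) happens once every NROW_BUF rows instead of
once per row.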
The results are as follows.
First, remember the earlier timings:
1- computation and insertion into the PETSc matrix
> FillPetscMat_with_MatSetValues 100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23 0 0 0 0 *96* 0 0 0 0 0
2- computation and insertion into the Eigen matrix
> FilEigenMat                    100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18 0 0 0 0 88 0 0 0 0 0
Now, with the buffered insertion:
nrow_buffer = 2
> FillPetscMat_with_MatSetValues2 100 1.0 3.3321e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20 0 0 0 0 95 0 0 0 0 0
nrow_buffer = 5
> FillPetscMat_with_MatSetValues2 100 1.0 2.8842e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17 0 0 0 0 94 0 0 0 0 0
nrow_buffer = 10
> FillPetscMat_with_MatSetValues2 100 1.0 2.7669e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17 0 0 0 0 93 0 0 0 0 0
nrow_buffer = 20
> FillPetscMat_with_MatSetValues2 100 1.0 2.6834e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 93 0 0 0 0 0
nrow_buffer = 50
> FillPetscMat_with_MatSetValues2 100 1.0 2.6862e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17 0 0 0 0 93 0 0 0 0 0
nrow_buffer = 100
> FillPetscMat_with_MatSetValues2 100 1.0 2.6170e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 93 0 0 0 0 0
*As expected,* increasing the number of buffered rows reduces the insertion
overhead, until it basically stagnates somewhere between nrow_buffer = 20 and 50.
The modifications I made to *MatSetValues_MPIAIJ* are very small, but the
effect is significant (a drop in insertion cost of about 33%), and insertion
is now even faster than the Eigen baseline with my naive usage.
For now I am quite satisfied with the outcome. There is probably still some
room for improvement, but this is enough for now.
Thanks,
Kamra
On Thu, Jul 18, 2019 at 12:34 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com>
wrote:
> Regarding the first point
>>
>> 1) Are you timing only the insertion of values, or computation and
>> insertion?
>
> I am timing both the computation and the insertion of values, but as I
> said, I timed three scenarios:
> 1- computation only, no insertion
> Computation_no_insertion      100 1.0 1.6747e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2 0 0 0 0 22 0 0 0 0 0
> 2- computation and insertion into the PETSc matrix
>
>> FillPetscMat_with_MatSetValues 100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23 0 0 0 0 *96* 0 0 0 0 0
>>
> 3- computation and insertion into the Eigen matrix
>
>> FilEigenMat                    100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18 0 0 0 0 88 0 0 0 0 0
>
> I timed each scenario 100 times to get reasonably accurate timings.
>
> As for the second point:
>
>> 2) Can you tell me how many values are inserted?
>
> There are nearly 186062 rows per process (with 6 processes in total, the
> matrix global size is 1116376).
> Most rows (about 99.35%) have 4 non-zeros per row, and the remaining 0.35%
> have 2 or 3 non-zeros per row;
> the total number of off-diagonal non-zeros (o_nnz) is 648.
> So I insert nearly 4 values at a time, 186062 times, i.e. ~744248 values per MPI process.
>
>
> Thanks,
> Kamra
>
> On Wed, Jul 17, 2019 at 11:59 PM Matthew Knepley <knepley at gmail.com>
> wrote:
>
>> On Wed, Jul 17, 2019 at 8:51 AM Mohammed Mostafa <
>> mo7ammedmostafa at gmail.com> wrote:
>>
>>> Sorry for the confusion.
>>> First, I fully acknowledge that setting matrix non-zeros, or copying in
>>> general, is not cheap, and that the memory access pattern can play an
>>> important role.
>>> So, to establish a baseline to compare with, I tried setting the same
>>> matrix in an Eigen sparse matrix; the timings are as follows:
>>> FillPetscMat_with_MatSetValues 100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23 0 0 0 0 *96* 0 0 0 0 0
>>> FilEigenMat                    100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18 0 0 0 0 88 0 0 0 0 0
>>>
>>
>> Great. This helps. Two things would help me narrow down what is happening.
>>
>> 1) Are you timing only the insertion of values, or computation and
>> insertion?
>>
>> 2) Can you tell me how many values are inserted?
>>
>> Thanks,
>>
>> Matt
>>
>>
>>> I used the same code but simply filled a different matrix, something like:
>>>
>>> for (i = 0; i < nRows; i++)
>>> {
>>>   // ... some code to compute cell_global_index, nnz_per_row, j_index and
>>>   // coefvalues for this row ...
>>>
>>>   // Method 1: insert the row into the PETSc matrix
>>>   MatSetValues(A, 1, &cell_global_index, nnz_per_row, j_index, coefvalues,
>>>                INSERT_VALUES);
>>>
>>>   // Method 2: insert the row into the Eigen sparse matrix
>>>   for (int k = 0; k < nnz_per_row; k++)
>>>     EigenMat.coeffRef(i, j_index[k]) = coefvalues[k];
>>> }
>>> Please note that only one of the two methods is used at a time.
>>> I also separately timed the code section that computes <j_index, coefvalues>
>>> by simply disabling both Method 1 and Method 2, and I found its cost to be
>>> trivial in comparison to when either one of the methods is used.
>>> I used Eigen out of convenience, since I already use it for some vector and
>>> tensor arithmetic elsewhere in the code, and it may not be the best choice.
>>> Since in a PETSc matrix we technically fill two matrices, diagonal and
>>> off-diagonal, I expected some difference, but is that normal or am I
>>> missing something?
>>> Maybe there is some setting or MatOption I should be using; so far, this is
>>> what I have been using:
>>>
>>> MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, local_size, local_size,
>>>                           PETSC_DETERMINE, PETSC_DETERMINE, ptr, j, v, &A);
>>> MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);
>>> MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);
>>> MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);
>>> MatSetOption(A, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_TRUE);
>>> MatSetOption(A, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);
>>> MatSetOption(A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);
>>> MatSetUp(A);
>>> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>>
>>> Thanks,
>>> Kamra
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>