[petsc-users] Various Questions Regarding PETSC

Smith, Barry F. bsmith at mcs.anl.gov
Thu Jul 18 19:39:10 CDT 2019


   There are a lot of moving parts here, so I'm going to ask a few questions to make sure I understand the situation.

1)  Times
     a) You run the code that generates the matrix entries with the calls to MatSetValues() simply commented out, and it takes
          0.16747 seconds.
      b) When you include the MatSetValues() calls it takes 3.8820 secs.
      c) When you instead insert into an Eigen matrix (which is also parallel?)
          it takes 2.8727 secs.
      d) When you instead put all the values into a CSR structure in your code and then call MatUpdateMPIAIJWithArrays()
           (sketched just after this list) it takes 2.8727 secs + 0.20326 secs.
      e) When you modify MatSetValues_MPIAIJ() into MatSetValues2_MPIAIJ(), which allows multiple rows that do not need matching
           columns, it takes 2.6170 secs.
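      (For reference, (d) can look roughly like the sketch below; rowptr/colind/vals stand in for whatever local CSR arrays your code builds, and the matrix must keep exactly the nonzero pattern it was created with.)

#include <petscmat.h>

/* Refill an already-created MPIAIJ matrix from local CSR arrays.
   rowptr/colind/vals must describe the same nonzero pattern the matrix
   was created with (e.g. by MatCreateMPIAIJWithArrays()). */
static PetscErrorCode RefillFromCSR(Mat A, const PetscInt rowptr[],
                                    const PetscInt colind[], const PetscScalar vals[])
{
  PetscErrorCode ierr;
  PetscInt       m, n, M, N;

  PetscFunctionBegin;
  ierr = MatGetLocalSize(A, &m, &n);CHKERRQ(ierr);
  ierr = MatGetSize(A, &M, &N);CHKERRQ(ierr);
  ierr = MatUpdateMPIAIJWithArrays(A, m, n, M, N, rowptr, colind, vals);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}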

1e) When you use MatSetValues2_MPIAIJ(), are you calling MatSetValues2_MPIAIJ() directly, or are you still calling MatSetValues() and
       having it call MatSetValues2_MPIAIJ()? That is, did you replace the setvalues() pointer in the function table?
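     (For context, "replacing the pointer" would look roughly like the sketch below; it assumes your MatSetValues2_MPIAIJ() kept the standard setvalues() calling sequence, which it may not have.)

#include <petsc/private/matimpl.h>   /* private header: exposes mat->ops */

/* The modified routine from your copy of mpiaij.c; this extern declaration assumes it
   kept the standard setvalues() calling sequence, which may not match your changes. */
extern PetscErrorCode MatSetValues2_MPIAIJ(Mat, PetscInt, const PetscInt[], PetscInt,
                                           const PetscInt[], const PetscScalar[], InsertMode);

/* After this call, MatSetValues(A, ...) dispatches to the modified kernel. */
static PetscErrorCode UseModifiedSetValues(Mat A)
{
  PetscFunctionBegin;
  A->ops->setvalues = MatSetValues2_MPIAIJ;
  PetscFunctionReturn(0);
}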

1*) For cases b, c, and e, are you calling MatSetValues*() multiple times for the same row, or exactly once for a given row?

2) You have done the initial matrix preallocation with MatCreateMPIAIJWithArrays() and so have perfect preallocation; that is, when you run with
    -info and grep for malloc, does it report that 0 mallocs are performed?
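    (If it is more convenient than grepping the -info output, the same count can be read in code with MatGetInfo(); a small sketch:)

#include <petscmat.h>

/* Print the number of mallocs incurred during MatSetValues() on this process;
   with perfect preallocation it should be 0 everywhere. */
static PetscErrorCode ReportSetValuesMallocs(Mat A)
{
  PetscErrorCode ierr;
  MatInfo        info;

  PetscFunctionBegin;
  ierr = MatGetInfo(A, MAT_LOCAL, &info);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF, "mallocs during MatSetValues(): %g\n",
                     (double)info.mallocs);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}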

3) Are you using ADD_VALUES or INSERT_VALUES?

4) In each MatSetValues() call are you setting values into all the nonzero locations of that row, or only a subset of the nonzero locations in that row?
    Are the column indices you pass in for that row monotonically increasing or not?

5) Did you ./configure with --with-debugging=0?

 With this information I think we will understand the situation better and may have suggestions on how to proceed.

  Thanks, and thanks for your patience as you work to get the MatSetValues() time under control.

  Barry





> On Jul 18, 2019, at 2:01 AM, Mohammed Mostafa via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Hello everyone,
> Since I have already established a baseline to compare against the cost of inserting values into the PETSc matrix,
> and following up on the hint in your question about the number of values inserted into the matrix each time,
>  2) Can you tell me how many values are inserted? 
> I took a look at the source code of "MatSetValues_MPIAIJ" and found that it seems to be designed for
> FEM assembly of the global matrix from element matrices (from what I remember from undergrad), since it requires, when setting multiple rows, that the column indices be the same for all of them.
> 
> So, to increase the number of values inserted per call, I needed to modify the implementation of MatSetValues_MPIAIJ to allow more values to be inserted at once.
> I made a copy of the function "MatSetValues_MPIAIJ" in "src/mat/impls/aij/mpi/mpiaij.c" and named it "MatSetValues2_MPIAIJ",
> with some minor changes that allow inserting multiple rows regardless of whether they have the same column indices.
> 
> So what I do now is buffer the data for multiple rows and then insert them all together, to see how the performance changes.
> I tried different numbers of buffered rows, i.e. nrow_buffer = [2, 5, 10, 20, 50, 100].
> So now, instead of calling "MatSetValues" for every row in the matrix, I call "MatSetValues2_MPIAIJ" once every nrow_buffer rows, which should allow for some performance improvement.
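> Roughly, the calling pattern now looks like the sketch below (the exact argument list of "MatSetValues2_MPIAIJ" is simplified here, and NROW_BUFFER / MAX_NNZ are just illustrative constants):
> 
> enum { NROW_BUFFER = 50, MAX_NNZ = 4 };          // illustrative sizes only
> 
> PetscInt    rowbuf[NROW_BUFFER];                 // global row indices
> PetscInt    ncolbuf[NROW_BUFFER];                // nonzeros per buffered row
> PetscInt    colbuf[NROW_BUFFER * MAX_NNZ];       // concatenated column indices
> PetscScalar valbuf[NROW_BUFFER * MAX_NNZ];       // concatenated values
> PetscInt    nbuf = 0, off = 0;
> 
> for (int i = 0; i < nRows; i++) {
>     // ... compute j_index, coefvalues, nnz_per_row for row i ...
>     rowbuf[nbuf]  = cell_global_index;
>     ncolbuf[nbuf] = nnz_per_row;
>     for (int k = 0; k < nnz_per_row; k++) {
>         colbuf[off + k] = j_index[k];
>         valbuf[off + k] = coefvalues[k];
>     }
>     off += nnz_per_row;
>     nbuf++;
>     if (nbuf == NROW_BUFFER || i == nRows - 1) { // flush the buffer with one call
>         MatSetValues2_MPIAIJ(A, nbuf, rowbuf, ncolbuf, colbuf, valbuf, INSERT_VALUES);
>         nbuf = 0;
>         off  = 0;
>     }
> }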
> The results are as follows:
> First, remember the numbers from before:
> 1- computation and insertion into the PETSc matrix
> FillPetscMat_with_MatSetValues              100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0  96  0  0  0  0     0 
> 2- computation and insertion into the Eigen matrix
> FilEigenMat                                               100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  88  0  0  0  0     0    
> 
> Now
> nrow_buffer = 2
> FillPetscMat_with_MatSetValues2                  100 1.0 3.3321e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20  0  0  0  0  95  0  0  0  0     0
> nrow_buffer = 5
> FillPetscMat_with_MatSetValues2                  100 1.0 2.8842e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17  0  0  0  0  94  0  0  0  0     0
> nrow_buffer = 10
> FillPetscMat_with_MatSetValues2                  100 1.0 2.7669e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17  0  0  0  0  93  0  0  0  0     0
> nrow_buffer = 20
> FillPetscMat_with_MatSetValues2                  100 1.0 2.6834e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16  0  0  0  0  93  0  0  0  0     0
> nrow_buffer = 50
> FillPetscMat_with_MatSetValues2                  100 1.0 2.6862e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17  0  0  0  0  93  0  0  0  0     0
> nrow_buffer = 100
> FillPetscMat_with_MatSetValues2                  100 1.0 2.6170e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16  0  0  0  0  93  0  0  0  0     0
> 
> As expected, increasing the number of rows inserted per call reduces the overhead, until it basically stagnates somewhere between 20 and 50 rows.
> The modifications I made to MatSetValues_MPIAIJ are very small, but the effect is significant (a drop in insertion cost of about 33%), and it is now even faster at insertion than Eigen (the baseline) with my naive usage.
> For now I am quite satisfied with the outcome; there is probably some room for improvement, but this is enough for now.
>  
> Thanks,
> Kamra
> 
> On Thu, Jul 18, 2019 at 12:34 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com> wrote:
> Regarding the first point
> 1) Are you timing only the insertion of values, or computation and insertion?  
> I am timing both the computation and the insertion of values, but as I said, I timed three scenarios:
> 1- computation only, no insertion
> Computation_no_insertion                            100 1.0 1.6747e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  22  0  0  0  0     0
> 2- computation and insertion into the PETSc matrix
> FillPetscMat_with_MatSetValues              100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0  96  0  0  0  0     0
> 3- computation and insertion into the Eigen matrix
> FilEigenMat                                               100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  88  0  0  0  0     0
> I timed 100 repetitions to get reasonably accurate timings.
> 
> As for the second point:
>  2) Can you tell me how many values are inserted? 
> There are nearly 186062 rows per process (with 6 processes in total; the matrix global size is 1116376).
> Most rows (about 99.35%) have 4 non-zeros per row, and in the remaining 0.35% there are 2 or 3 non-zeros per row.
> The total number of off-diagonal non-zeros (onnz) is 648.
> So I insert roughly 4 values per call, 186062 times, i.e. about 744248 values per MPI process.
> 
> 
> Thanks,
> Kamra
> 
> On Wed, Jul 17, 2019 at 11:59 PM Matthew Knepley <knepley at gmail.com> wrote:
> On Wed, Jul 17, 2019 at 8:51 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com> wrote:
> Sorry for the confusion.
> First, I fully acknowledge that setting matrix non-zeros, or copying in general, is not cheap and that the memory access pattern can play an important role.
> So, to establish a baseline to compare with, I tried filling the same matrix in an Eigen sparse matrix instead; the timings are as follows:
> FillPetscMat_with_MatSetValues              100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0  96  0  0  0  0     0 
> FilEigenMat                                               100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  88  0  0  0  0     0  
> 
> Great. This helps. Two things would help me narrow down what is happening.
> 
>   1) Are you timing only the insertion of values, or computation and insertion?
> 
>   2) Can you tell me how many values are inserted?
> 
>   Thanks,
> 
>     Matt
>  
> I used the same code but simply filled a different matrix, something like:
> 
> for (int i = 0; i < nRows; i++)
> {
>     // ... some code to compute j_index, coefvalues and nnz_per_row for this row ...
> 
>     // Method 1: insert the row into the PETSc matrix
>     MatSetValues(A, 1, &cell_global_index, nnz_per_row, j_index, coefvalues, INSERT_VALUES);
> 
>     // Method 2: insert the same row into the Eigen sparse matrix
>     for (int k = 0; k < nnz_per_row; k++)
>         EigenMat.coeffRef(i, j_index[k]) = coefvalues[k];
> }
> Please note that only one of the two methods is used at a time. I also separately timed the code section that computes <j_index, coefvalues> by simply disabling both Method 1 and Method 2,
> and I found its cost to be trivial in comparison to when either one of the methods is used.
> I used Eigen out of convenience, since I use it for some vector and tensor arithmetic elsewhere in the code, and it may not be the best choice.
> Since with the PETSc matrix we technically fill two matrices, the diagonal and the off-diagonal block, I expected some difference, but is this much difference normal or am I missing something?
> Maybe there is some setting or MatOption I should be using; so far this is what I have been using:
> 
> MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, local_size, local_size, PETSC_DETERMINE,
>                           PETSC_DETERMINE, ptr, j, v, &A);
> MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES,       PETSC_TRUE);  // each process only sets rows it owns
> MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES,   PETSC_TRUE);  // drop any off-process entries instead of communicating them
> MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE); // error if an insertion would need a new allocation
> MatSetOption(A, MAT_NEW_NONZERO_LOCATION_ERR,  PETSC_TRUE);  // error if a value lands outside the preallocated pattern
> MatSetOption(A, MAT_NEW_NONZERO_LOCATIONS,     PETSC_FALSE); // do not allow new nonzero locations
> MatSetOption(A, MAT_KEEP_NONZERO_PATTERN,      PETSC_TRUE);  // keep the nonzero pattern (e.g. when zeroing rows)
> MatSetUp(A);
> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
> 
> Thanks,
> Kamra
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/


