[petsc-users] Various Questions Regarding PETSC

Smith, Barry F. bsmith at mcs.anl.gov
Fri Jul 19 18:59:45 CDT 2019


  Thanks, and special thanks for answering every question. 

  In those runs, did you zero the matrix before calling MatSetValues() initially?


> MatSetValues()
> FillPetscMat                                     3.6594e+00
> MatSetValues2_MPIAIJ() with nrow_buffer = 1
> FillPetscMat                                     3.3920e+00
> nrow_buffer = 2
> FillPetscMat_with_MatSetValues2                  3.3321e+00
> nrow_buffer = 5
> FillPetscMat_with_MatSetValues2                  2.8842e+00
> nrow_buffer = 10
> FillPetscMat_with_MatSetValues2                  2.7669e+00
> nrow_buffer = 20
> FillPetscMat_with_MatSetValues2                  2.6834e+00
> nrow_buffer = 50
> FillPetscMat_with_MatSetValues2                  2.6862e+00
> nrow_buffer = 100
> FillPetscMat_with_MatSetValues2                  2.6170e+00
> 

The call to MatSetValues() does a little bit of checking and then another call to MatSetValues_MPIAIJ(), so it is not surprising that going directly
to MatSetValues2_MPIAIJ() saves you a bit, but it is a larger savings than I would expect.
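
Roughly speaking (this is a paraphrase, not the literal PETSc source), the wrapper only adds argument validation, event logging, and a jump through the function table before landing in the type-specific routine:

  PetscErrorCode MatSetValues(Mat mat, PetscInt m, const PetscInt idxm[], PetscInt n, const PetscInt idxn[], const PetscScalar v[], InsertMode addv)
  {
    /* validate the Mat header, the arguments, and the preallocation ... */
    PetscLogEventBegin(MAT_SetValues, mat, 0, 0, 0);
    (*mat->ops->setvalues)(mat, m, idxm, n, idxn, v, addv);  /* -> MatSetValues_MPIAIJ() for your matrix type */
    PetscLogEventEnd(MAT_SetValues, mat, 0, 0, 0);
    return 0;
  }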

I am greatly puzzled by the dramatic savings you get as you pass more rows to MatSetValues2(). As far as I can see, all you are saving is a function call and not much of anything else, and that would NOT explain the huge time savings (function calls are extremely cheap compared to .5 seconds). The multi-row MatSetValues2() still has to do the same processing as one call per row, so why is it so much faster?

What kind of processor is this?

Would you be able to run the code under gprof, VTune, Instruments, or some other profiling package that gives line-by-line information about where time is being spent? In particular I'd like to see the results for MatSetValues(), and for MatSetValues2() with 1 row, with 2 rows, and with 4 rows.



Thanks

   Barry




> On Jul 19, 2019, at 1:34 AM, Mohammed Mostafa <mo7ammedmostafa at gmail.com> wrote:
> 
> 1)  Times
>      a) You run the code that generates the matrix entries and simply comment out the calls to MatSetValues() and it takes 
>           .16747 seconds.
> Yes 
>  
>       b) When you include the MatSetValues() calls it takes 3.8820 seconds.
> Yes 
>       c)  When you instead insert into an eigen matrix (which is also parallel?)
> No, the Eigen matrix is just a matrix local to each process and it doesn't support MPI (at least the way I use it); I just use it to establish a baseline for the insertion cost in a CSR matrix (since I don't set off-processor rows anyway).
> And in both cases it is a local insertion until MatAssembly, where communication and inter-processor copies should happen.
> After I insert into Eigen, I copy the Eigen matrix into the PETSc matrix using MatUpdateMPIAIJWithArrays(), which costs me an additional 0.20326 secs; that PETSc matrix is what I intend to use for the linear system solution with KSP/PC.
>           it takes 2.8727 secs.
>       d)  When you in your code put all the values into a CSR matrix and then call MatUpdateMPIAIJWithArrays() it takes
>            2.8727 secs + .20326 secs.
> Yes 
>       e) When you modify MatSetValues_MPIAIJ into MatSetValues2_MPIAIJ(), which allows multiple rows that do not need matching
>            columns, it takes 2.6170 secs.
> Yes 
> 1e) When you use MatSetValues2_MPIAIJ() are you directly calling MatSetValues2_MPIAIJ(), or are you still calling MatSetValues() and
>        having it call MatSetValues2_MPIAIJ()? That is, did you replace the setvalues() pointer in the function table?
> Yes, I am calling MatSetValues2_MPIAIJ() directly, not MatSetValues(); I just wanted to confirm the behavior first and see whether I get any performance gain.
> I wanted to be as minimally intrusive to the code as possible right now to avoid causing problems elsewhere since I have just started learning how it works.
> 
> 1*) For the cases b,c, and e are calling MatSetValues*() multiple times for the same row or exactly once for a given row?
> I am only calling MatSetValues exactly once per row, and each time I set 4 nonzero values, although in the future I may set as many as 7 non-zero values;
> however, with MatSetValues2_MPIAIJ() I call it once every nrow_buffer rows.
> The call looks like this:
> PetscLogEventBegin(FillPetscMatEvent, 0, 0, 0, 0);
> auto mat = &PhiEqnSolver.mat;
> int buffer_size = 100, n_inner, Rows_left, nj_per_row_i;
> int n_outer = int(ceil(double(nCells) / double(buffer_size)));
> cell = 0;
> Rows_left = nCells;
> for (int i_outer = 0; i_outer < n_outer; i_outer++) {
>     n_inner = min(buffer_size, Rows_left);
>     k = 0;
>     for (i = 0; i < n_inner; i++) {
>         cell_gindex = cellGIndex[cell];
>         i_index[i] = cell_gindex;
>         // set the system RHS = f * vol
>         Eqn_rhs_ptr[cell] += cellVol[cell] * rhsFieldptr[cell];
>         istart = cellIFace[cell];
>         iend   = cellIFace[cell + 1];
>         diagCOEF = 0.0;
>         nj_per_row_i = 0;
>         for (fptr = istart; fptr < iend; fptr++) {
>             face  = cellFace[fptr];
>             nghbr = faceNeighbor[face];
>             COEF  = Coef[face]; // flux coef computed on the faces of each cell in the domain
>             if (nghbr < 0) {
>                 bc_type = faceBCType[face];
>                 if (bc_type == PROCESSOR) {
>                     neighbor_cell = facePhysTag[face];
>                     j_index[k]    = neighbor_cell;
>                     coefvalues[k] = COEF;
>                     diagCOEF     -= COEF;
>                     k++;
>                     nj_per_row_i++;
>                 } else {
>                     diagCOEF -= COEF;
>                 }
>             } else {
>                 owner         = faceOwner[face];
>                 neighbor_cell = owner + nghbr - cell;
>                 j_index[k]    = cellGIndex[neighbor_cell];
>                 coefvalues[k] = COEF;
>                 diagCOEF     -= COEF;
>                 k++;
>                 nj_per_row_i++;
>             }
>         }
>         j_index[k]    = cell_gindex;
>         coefvalues[k] = diagCOEF;
>         k++;
>         nj_per_row_i++;
>         nj_perRow[i] = nj_per_row_i;
>         cell++;
>         Rows_left--;
>     }
>     MatSetValues2_MPIAIJ(*A, n_inner, i_index, nj_perRow, j_index, coefvalues, INSERT_VALUES);
> }
> PetscLogEventEnd(FillPetscMatEvent, 0, 0, 0, 0);
> 
> 2) You have done the initial matrix preallocation with MatCreateMPIAIJWithArrays(); and so have perfect preallocation, that is when you run with
>     -info and grep for malloc it says 0 mallocs are performed?
> Yes, no mallocs and no unused non-zeros.
> See the attached file <info_log_petsc_setValues.txt>.
> 
>  3) Are you using ADD_VALUES or INSERT_VALUES?
> 
> here are the times for nrow_buffer = 50
> With INSERT_VALUES:
> FillCSTMat_with_MatSetValues     100 1.0 2.5512e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16  0  0  0  0  93  0  0  0  0     0
> With ADD_VALUES:
> FillPetscMat     100 1.0 2.6654e+00 1.0 7.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 17 53  0  0  0  90100  0  0  0   167
> Additionally, I have to zero the matrix, which costs:
> MatZeroEntries       100 1.0 9.5259e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   3  0  0  0  0     0
> INSERT_VALUES is a bit faster than ADD_VALUES; the difference I saw can be attributed to the cost of the computation (the additions).
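> For reference, the ADD_VALUES path follows this reassembly pattern (a simplified sketch with placeholder coefficients, not my real fill loop above):
> 
> #include <petscmat.h>
> 
> // Because ADD_VALUES accumulates into whatever is already stored, the reused
> // matrix must be zeroed first, which is the extra MatZeroEntries cost shown
> // above. With INSERT_VALUES the old values are simply overwritten, so the
> // MatZeroEntries call is unnecessary.
> void refill_with_add(Mat A, PetscInt rstart, PetscInt rend)
> {
>     MatZeroEntries(A);                  // only needed when using ADD_VALUES
>     for (PetscInt row = rstart; row < rend; row++) {
>         PetscInt    col = row;          // placeholder: a unit diagonal entry
>         PetscScalar val = 1.0;
>         MatSetValues(A, 1, &row, 1, &col, &val, ADD_VALUES);
>     }
>     MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>     MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
> }
> 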
> 4) In each MatSetValues() call are you setting values into all nonzero locations of that row, or only a subset of nonzero locations in that row? 
>     Are the column indices for that row you pass in monotonically increasing or not?
> Yes, I am setting new values for all non-zero locations in that row, not a subset of them.
> No, the column indices are not monotonically increasing; as you can see from the code above, the column indices come from the mesh connectivity.
> I suppose I could sort them, build an ordering map for each row, store it, and use it to rearrange the values before inserting them.
> But would it really matter that much? From a memory-access point of view, I suppose it could matter.
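> A rough sketch of what I have in mind for the sorting (not something I have implemented yet):
> 
> #include <petscsys.h>
> #include <algorithm>
> #include <numeric>
> #include <vector>
> 
> // Sort one row's column indices into increasing order and apply the same
> // permutation to the coefficients before calling MatSetValues().
> void sort_row(int n, PetscInt *j_index, PetscScalar *coefvalues)
> {
>     std::vector<int> perm(n);
>     std::iota(perm.begin(), perm.end(), 0);
>     std::sort(perm.begin(), perm.end(),
>               [&](int a, int b) { return j_index[a] < j_index[b]; });
> 
>     std::vector<PetscInt>    js(n);
>     std::vector<PetscScalar> vs(n);
>     for (int k = 0; k < n; k++) {
>         js[k] = j_index[perm[k]];
>         vs[k] = coefvalues[perm[k]];
>     }
>     std::copy(js.begin(), js.end(), j_index);
>     std::copy(vs.begin(), vs.end(), coefvalues);
> }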
> 
> 5) Did you ./configure with --with-debugging=0?
> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-metis --download-hypre
> 
> As for the second message:
> 1) Could you please send MatSetValues2_MPIAIJ()?
> Yes, of course.
> As I said, I made very small changes, and now I am trying to dig deeper into the code to see if there is more to do.
> Should I attach the modified <mpiaij.c and petscmat.h> to this message thread or send them directly to you?
> 2) What time do you get if you use MatSetValues2_MPIAIJ() with an nrow_buffer = 1 ?
> Honestly, I hadn't tried that before.
> First I replaced the call in the code above with MatSetValues() and set nrow_buffer = 1, so it should be equivalent to before; here is the time:
> FillPetscMat     100 1.0 3.6594e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 22  0  0  0  0  96  0  0  0  0     0
> Now using MatSetValues2_MPIAIJ() with nrow_buffer = 1:
> FillPetscMat     100 1.0 3.3920e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  96  0  0  0  0     0
> 
> For some reason, MatSetValues2_MPIAIJ is a bit faster.
> Sorry for the long reply.
> 
> Thanks,
> Kamra
> 
> 
> On Fri, Jul 19, 2019 at 9:48 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>  
>   1) Could you please send MatSetValues2_MPIAIJ()?
> 
>    2) What time do you get if you use MatSetValues2_MPIAIJ() with an nrow_buffer = 1 ?
> 
>    Thanks
> 
>    Barry
> 
> 
> > On Jul 18, 2019, at 2:01 AM, Mohammed Mostafa via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > 
> > Hello everyone,
> > Since I have already established a baseline for comparing the cost of inserting values into the PETSc matrix,
> > and based on the hint about the number of values inserted into the matrix each time,
> >  2) Can you tell me how many values are inserted? 
> > I took a look at the source code for "MatSetValues_MPIAIJ" and found that it seems to be designed for
> > FEM assembly of the global matrix from element matrices (from what I remember from undergrad), since setting multiple rows requires that they all share the same column indices.
> > 
> > So, to increase the number of values inserted per call, I needed to modify the implementation of MatSetValues_MPIAIJ to allow more values to be inserted.
> > I made a copy of the function "MatSetValues_MPIAIJ" in "src/mat/impls/aij/mpi/mpiaij.c" and named it "MatSetValues2_MPIAIJ".
> > I made some minor changes to allow inserting multiple rows regardless of whether they have the same column indices.
> > 
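> > The signature I ended up with looks roughly like this (a sketch from memory, not part of PETSc itself):
> > 
> > // Unlike MatSetValues(), each of the nrows rows carries its own column count in
> > // nnz_per_row[], and the column indices / values of all rows are packed back to
> > // back in cols[] and vals[].
> > PetscErrorCode MatSetValues2_MPIAIJ(Mat A, PetscInt nrows, const PetscInt rows[],
> >                                     const PetscInt nnz_per_row[], const PetscInt cols[],
> >                                     const PetscScalar vals[], InsertMode addv);
> > 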
> > So what I do now is store the data for multiple rows and then insert them all together, and I figured I would see how the performance turns out.
> > I tried different numbers of rows to be inserted, i.e. nrow_buffer = [2, 5, 10, 20, 50, 100].
> > So now, instead of calling "MatSetValues" for every row in the matrix, I call "MatSetValues2_MPIAIJ" every (nrow_buffer)th row, which should allow for some performance improvement;
> > the results are as follows
> > First, remember the timings from before:
> > 1-computation and insertion into petsc matrix
> > FillPetscMat_with_MatSetValues              100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0  96  0  0  0  0     0 
> > 2-computation and insertion into eigen matrix 
> > FilEigenMat                                               100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  88  0  0  0  0     0    
> > 
> > Now
> > nrow_buffer = 2
> > FillPetscMat_with_MatSetValues2                  100 1.0 3.3321e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20  0  0  0  0  95  0  0  0  0     0
> > nrow_buffer = 5
> > FillPetscMat_with_MatSetValues2                  100 1.0 2.8842e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17  0  0  0  0  94  0  0  0  0     0
> > nrow_buffer = 10
> > FillPetscMat_with_MatSetValues2                  100 1.0 2.7669e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17  0  0  0  0  93  0  0  0  0     0
> > nrow_buffer = 20
> > FillPetscMat_with_MatSetValues2                  100 1.0 2.6834e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16  0  0  0  0  93  0  0  0  0     0
> > nrow_buffer = 50
> > FillPetscMat_with_MatSetValues2                  100 1.0 2.6862e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17  0  0  0  0  93  0  0  0  0     0
> > nrow_buffer = 100
> > FillPetscMat_with_MatSetValues2                  100 1.0 2.6170e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16  0  0  0  0  93  0  0  0  0     0
> > 
> > As is to be expected, the overhead drops as the number of rows inserted per call increases, until it basically stagnates somewhere between 20 and 50.
> > The modifications I made to MatSetValues_MPIAIJ are very small but the effect is significant (a drop in insertion cost of about 33%), and it is now even faster at insertion than the Eigen baseline with my naive usage.
> > For now I am quite satisfied with the outcome; there is probably some room for improvement, but for now this is enough.
> >  
> > Thanks,
> > Kamra
> > 
> > On Thu, Jul 18, 2019 at 12:34 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com> wrote:
> > Regarding the first point
> > 1) Are you timing only the insertion of values, or computation and insertion?  
> > I am timing both the computation and the insertion of values, but as I said, I timed three scenarios:
> > 1-computation only and no insertion
> > Computation_no_insertion                            100 1.0 1.6747e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  22  0  0  0  0     0
> > 2-computation and insertion into petsc matrix
> > FillPetscMat_with_MatSetValues              100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0  96  0  0  0  0     0 
> > 3-computation and insertion into eigen matrix 
> > FilEigenMat                                               100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  88  0  0  0  0     0    
> > I ran each case 100 times to get reasonably accurate timings.
> > 
> > as for the second point
> >  2) Can you tell me how many values are inserted? 
> > There are a total of nearly 186062 rows per process (with 6 processes in total, the matrix global size is 1116376).
> > Most rows (about 99.35%) have 4 non-zeros per row, and the remaining 0.35% have 2 or 3 non-zeros per row.
> > The total number of off-diagonal non-zeros (o_nnz) is 648.
> > So I insert nearly 4 values 186062 times, i.e. roughly 744248 values per MPI process.
> > 
> > 
> > Thanks,
> > Kamra
> > 
> > On Wed, Jul 17, 2019 at 11:59 PM Matthew Knepley <knepley at gmail.com> wrote:
> > On Wed, Jul 17, 2019 at 8:51 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com> wrote:
> > Sorry for the confusion 
> > First, I fully acknowledge that setting matrix non-zeros, or copying in general, is not cheap and that the memory access pattern can play an important role.
> > So, to establish a baseline to compare with, I tried setting the same matrix in an Eigen sparse matrix; the timings are as follows:
> > FillPetscMat_with_MatSetValues              100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0  96  0  0  0  0     0 
> > FilEigenMat                                               100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  88  0  0  0  0     0  
> > 
> > Great. This helps. Two things would help me narrow down what is happening.
> > 
> >   1) Are you timing only the insertion of values, or computation and insertion?
> > 
> >   2) Can you tell me how many values are inserted?
> > 
> >   Thanks,
> > 
> >     Matt
> >  
> > I used the same code but simply filled a different matrix, something like:
> > 
> > for (i = 0; i < nRows; i++)
> > {
> >     // ... some code to get j_index, coefvalues ...
> > 
> >     // Method 1
> >     MatSetValues(A, 1, &cell_global_index, nnz_per_row, j_index, coefvalues, INSERT_VALUES);
> > 
> >     // Method 2
> >     for (int k = 0; k < nnz_per_row; k++)
> >         EigenMat.coeffRef(i, j_index[k]) = coefvalues[k];
> > }
> > Please note that only one of the two methods is used at a time. Also, I separately timed the code section used to compute <j_index, coefvalues> by simply disabling both Method 1 and Method 2;
> > I found its cost to be trivial in comparison to when either of the methods is used.
> > I used Eigen out of convenience, since I use it for some vector and tensor arithmetic elsewhere in the code; it may not be the best choice.
> > Since for a PETSc matrix we technically fill two matrices, diagonal and off-diagonal, I expected some difference, but is that normal or am I missing something?
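> > My understanding of that split, as a rough conceptual sketch (assumed behavior, not the actual PETSc source), is that every inserted column index is routed to one of the two blocks:
> > 
> > #include <petscmat.h>
> > 
> > // Columns inside the local column ownership range [cstart, cend) belong to the
> > // "diagonal" block, everything else to the "off-diagonal" block. Counting this
> > // split per row is also how the d_nnz/o_nnz preallocation counts are built.
> > void count_row_split(PetscInt ncols, const PetscInt cols[],
> >                      PetscInt cstart, PetscInt cend,
> >                      PetscInt *ndiag, PetscInt *noffdiag)
> > {
> >     *ndiag = *noffdiag = 0;
> >     for (PetscInt k = 0; k < ncols; k++) {
> >         if (cols[k] >= cstart && cols[k] < cend) (*ndiag)++;
> >         else                                     (*noffdiag)++;
> >     }
> > }
> > 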
> > Maybe there is some setting or MatOption I should be using; so far this is what I have been using:
> > 
> > MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, local_size, local_size, PETSC_DETERMINE, PETSC_DETERMINE, ptr, j, v, A);
> >  MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE);
> > MatSetOption(A,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE);
> > MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);
> > MatSetOption(A,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE);
> > MatSetOption(A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
> > MatSetOption(A,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE);
> > MatSetUp(A);
> > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
> > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);  
> > 
> > Thanks,
> > Kamra
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/
> 
> <info_log_petsc_setValues.txt>


