[petsc-users] Various Questions Regarding PETSC

Matthew Knepley knepley at gmail.com
Wed Jul 17 07:28:21 CDT 2019


On Wed, Jul 17, 2019 at 7:21 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com>
wrote:

> Hello,
> I didn't say that communication or assembly takes 90%; only MatSetValues,
> the setting of values in the PETSc matrix, takes 90% of the time, as shown here.
>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                           Count      Time (sec)     Flop                              --- Global ---  --- Stage ----   Total
>>                                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>> VecAssemblyBegin                100 1.0 1.1079e-03 9.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAssemblyEnd                  100 1.0 1.5783e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyBegin                100 1.0 1.4114e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyEnd                  100 1.0 4.8351e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> AssembleMats                    100 1.0 2.9211e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> FillMat_with_MatSetValues       100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0 *96*  0  0  0  0     0
>> callScheme                      100 1.0 1.4874e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4  0  0  0  0     0
>
> As you can see, no messages are exchanged; the time sink comes from the copying
> and the repeated function-call overhead, since MatSetValues is called once for
> every row in the matrix.
>

That is not clear since I cannot see what your timer is capturing. It could
include computation of values, or load imbalance.
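One way to make that unambiguous is to wrap only the MatSetValues call in its
own user-defined log event, so the value computation and any load imbalance are
excluded from the number. A minimal sketch (the event name, loop bounds, and
array names here are illustrative, and error checking is omitted):

  PetscLogEvent EV_INSERT_ONLY;
  PetscLogEventRegister("InsertOnly", MAT_CLASSID, &EV_INSERT_ONLY);

  for (PetscInt cell = 0; cell < n_local_cells; ++cell) {
    /* ... compute nnz_per_row, j_index[], coefvalues[] for this row (outside the event) ... */
    PetscLogEventBegin(EV_INSERT_ONLY, 0, 0, 0, 0);   /* time only the insertion */
    MatSetValues(A, 1, &cell_global_index, nnz_per_row, j_index, coefvalues, INSERT_VALUES);
    PetscLogEventEnd(EV_INSERT_ONLY, 0, 0, 0, 0);
  }

With that split, -log_view would show how much of the 96% is really the
insertion itself.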


> There is no linear solver being used in the code that I am profiling right
> now.
> For now, I am focusing on profiling construction of the Matrix and the
> only time sink is the overhead from MatSetValues
>

There are two problems I see with the analysis. First, there is no model
(even a very simple one) telling you how much time you should expect this to
take, so all proposals for optimizations or bottlenecks are just guesses.
Second, if there is no solve, there does not seem to be anything else going
on, so why would you not expect it to take all the time? The idea would be to
evaluate it in the scenario in which you expect it to be used.
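As a very rough illustration of such a model (the matrix sizes come from the
-info output quoted further below; the ~10 GB/s aggregate memory bandwidth is
only an assumed, illustrative number): about 6 x 186k rows with at most 4
nonzeros per row is roughly 4.5M entries, or about 50 MB per fill at 12 bytes
per entry (8-byte value plus 4-byte column index). Streaming that once at
~10 GB/s is on the order of 5 ms, while the measured cost is 3.88 s / 100 =
~39 ms per fill, so the question becomes how much of that gap is per-call
overhead, coefficient computation, or load imbalance.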

  Thanks,

     Matt



> Regards,
> Kamra
>
> On Wed, Jul 17, 2019 at 8:42 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Wed, Jul 17, 2019 at 2:49 AM Mohammed Mostafa <
>> mo7ammedmostafa at gmail.com> wrote:
>>
>>> Hello everyone,
>>> Based on the advice you have provided so far, here is what I have tried.
>>> First I build a temporary sparse matrix in CSR format with my own
>>> algorithm, and then I create the MPIAIJ matrix as follows:
>>>
>>> MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, local_size, local_size,
>>>                           PETSC_DETERMINE, PETSC_DETERMINE, ptr, j, v, &A);
>>> MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);
>>> MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);
>>> MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);
>>> MatSetOption(A, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_TRUE);
>>> MatSetOption(A, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);
>>> MatSetOption(A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);
>>> MatSetUp(A);
>>> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>>
>>> and the RHS vector:
>>> VecCreateMPI(PETSC_COMM_WORLD, this->local_size, PETSC_DETERMINE, &RHS);
>>> VecSetOption(RHS, VEC_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);
>>> VecAssemblyBegin(RHS);
>>> VecAssemblyEnd(RHS);
>>>
>>> Next, I compute my finite-volume face fluxes and store them in a
>>> face-centered C array, which is accessed repeatedly during matrix
>>> construction to add the contributions of the different terms in the PDE.
>>> After that I assemble each row using the cell-face connectivity and set
>>> the nonzero entries of the row corresponding to each cell using one of
>>> two approaches:
>>> (1)
>>> MatSetValues(A, 1, &cell_global_index, nnz_per_row, j_index, coefvalues,
>>> INSERT_VALUES);
>>> where j_index holds the global column indices in row "cell_global_index"
>>> and coefvalues holds the nonzero values of that row (a short sketch of this
>>> loop is given below)
>>>
>>> (2)
>>> Fill each row in my own CSR arrays first, then dump the whole matrix into
>>> the PETSc Mat using
>>> MatUpdateMPIAIJWithArrays(A, local_size, local_size, PETSC_DETERMINE,
>>> PETSC_DETERMINE, ptr, j, v); (from the master branch)
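For concreteness, a minimal sketch of pattern (1) as described above (loop
bounds and buffer sizes such as MAX_NNZ_PER_ROW are illustrative, and error
checking is omitted):

  PetscInt    j_index[MAX_NNZ_PER_ROW];     /* global column indices of one row */
  PetscScalar coefvalues[MAX_NNZ_PER_ROW];  /* coefficients of that row         */

  for (PetscInt cell = 0; cell < n_local_cells; ++cell) {
    PetscInt cell_global_index = cell_global[cell];
    PetscInt nnz_per_row       = 0;
    /* ... fill j_index[] / coefvalues[] from the face fluxes of this cell ... */
    MatSetValues(A, 1, &cell_global_index, nnz_per_row, j_index, coefvalues, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);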
>>> I ran some numerical tests and found the following:
>>> #1# Setting up the matrix in that way dropped the MatAssembly cost by
>>> two orders of magnitude.
>>> #2# I applied similar options to the right-hand-side vector and got
>>> similar results, as shown next.
>>>
>>> This stage log is for running the routine that constructs the matrix 100
>>> times to get consistent timings
>>> Approach (1)
>>> VecAssemblyBegin              100 1.0 1.1079e-03 9.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyEnd                100 1.0 1.5783e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyBegin              100 1.0 1.4114e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd                100 1.0 4.8351e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> AssembleMats                  100 1.0 2.9211e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> FillMat_with_MatSetValues     100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0  96  0  0  0  0     0
>>> callScheme                    100 1.0 1.4874e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4  0  0  0  0     0
>>>
>>>
>>> Approach (2)
>>> VecAssemblyBegin              100 1.0 1.2069e-03 7.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyEnd                100 1.0 2.1696e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyBegin              200 1.0 2.0931e-03 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd                200 1.0 1.0748e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> AssembleMats                  100 1.0 2.5523e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> FillCSRMat_with_MatSetValues  100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  88  0  0  0  0     0
>>> FillMat_with_MatSetValues     100 1.0 2.0326e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   6  0  0  0  0     0
>>> callScheme                    100 1.0 1.8507e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   5  0  0  0  0     0
>>>
>>>
>>> Now, as you can see, the first approach is more expensive by nearly 15%;
>>> however, approach (2) carries the additional memory cost of the temporary CSR.
>>> I believe I could use the CSR matrix directly to store the scheme fluxes, but
>>> I don't think it is built for repeated access, since that involves a
>>> binary-search cost of O(log(nnz_per_row)).
>>>
>>> Based on these results, I believe that MatSetValues is very expensive, so
>>> is there a stripped-down version of the function that is suitable for my
>>> application? Right now it occupies more than 90% of the execution time.
>>>
>>
>> 1) First, it should be easy to quickly isolate communication time. If you
>> run on 1 process, is assembly 90% of your time?
>>
>> 2) How many MatVecs are you comparing against to get your 90% figure? The
>> MatMult and KSPSolve events are not shown above,
>>     so we cannot see this 90%.
>>
>>     Assembled matrices are only useful for solves, where you need to compute
>>     a preconditioner, or for low-order methods that do many MatMults. If you
>>     do only a few MatMult operations per assembly, have high order, or are not
>>     solving anything, you are better off with an unassembled (matrix-free)
>>     application of the operator.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Thanks,
>>> Kamra
>>>
>>> On Sun, Jul 14, 2019 at 4:17 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> OK, so I am seeing the whole matrix creation, including your flux calcs
>>>> and your intermediate data structure, as taking 0.1 sec (10/100). That is
>>>> about 7% of the solve time (but this looks like it could use some
>>>> attention) or about 25 Mat-vecs (the standard work unit of an iterative
>>>> solver).
>>>>
>>>> Now I see 10% of the matrix-creation time going to MatAssemblyEnd,
>>>> which is annoying because there is no communication.
>>>>
>>>> I don't see any big problems here, except for the solver maybe, if this
>>>> is a nice friendly Laplacian.
>>>>
>>>> * I would try PETSc's Set Value routines directly.
>>>> * You might as well try
>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetOption.html
>>>>
>>>> with
>>>>
>>>>  MAT_NO_OFF_PROC_ENTRIES - you know each process will only set values
>>>> for its own rows, will generate an error if any process sets values for
>>>> another process. This avoids all reductions in the MatAssembly routines and
>>>> thus improves performance for very large process counts.
>>>>
>>>> This should eliminate the MatAssemblyEnd cost.
>>>>
>>>>
>>>>
>>>> On Sat, Jul 13, 2019 at 2:43 PM Mohammed Mostafa <
>>>> mo7ammedmostafa at gmail.com> wrote:
>>>>
>>>>> Sorry about that
>>>>> I wanted to see if the assembly cost would drop in subsequent time
>>>>> steps, but it was taking too long to run, so I set it to solve only once,
>>>>> since I was only interested in profiling the matrix builder.
>>>>> Again sorry for that
>>>>> Kamra
>>>>>
>>>>> On Sun, Jul 14, 2019 at 3:33 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> Ok, I only see one call to KSPSolve.
>>>>>>
>>>>>> On Sat, Jul 13, 2019 at 2:08 PM Mohammed Mostafa <
>>>>>> mo7ammedmostafa at gmail.com> wrote:
>>>>>>
>>>>>>> This log is for 100 time-steps, not a single time step
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Jul 14, 2019 at 3:01 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>> You call the assembly stuff a lot (200). BuildTwoSidedF is a global
>>>>>>>> thing and is taking a lot of time. You should just call these once per time
>>>>>>>> step (it looks like you are just doing one time step).
>>>>>>>>
>>>>>>>>
>>>>>>>> --- Event Stage 1: Matrix Construction
>>>>>>>>
>>>>>>>> BuildTwoSidedF       400 1.0 6.5222e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
>>>>>>>> VecSet                 1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>> VecAssemblyBegin     200 1.0 6.2633e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
>>>>>>>> VecAssemblyEnd       200 1.0 6.7163e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>> VecScatterBegin      200 1.0 5.9373e-03 2.2 0.00e+00 0.0 3.6e+03 2.1e+03 0.0e+00  0  0 79  2  0   0  0 99100  0     0
>>>>>>>> VecScatterEnd        200 1.0 2.7236e-0223.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>> MatAssemblyBegin     200 1.0 3.2747e-02 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>> MatAssemblyEnd       200 1.0 9.0972e-01 1.0 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  4  0  1  0  6   9  0  1  0100     0
>>>>>>>> AssembleMats         200 1.0 1.5568e+00 1.2 0.00e+00 0.0 3.6e+03 2.1e+03 8.0e+00  6  0 79  2  6  14  0100100100     0
>>>>>>>> myMatSetValues       200 1.0 2.5367e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  25  0  0  0  0     0
>>>>>>>> setNativeMat         100 1.0 2.8223e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  28  0  0  0  0     0
>>>>>>>> setNativeMatII       100 1.0 3.2174e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  31  0  0  0  0     0
>>>>>>>> callScheme           100 1.0 2.0700e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 12, 2019 at 11:56 PM Mohammed Mostafa via petsc-users <
>>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>>> Hello Matt,
>>>>>>>>> Attached is the dumped entire log output using -log_view and -info.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Kamra
>>>>>>>>>
>>>>>>>>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>>>>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello all,
>>>>>>>>>>> I have a few questions regarding PETSc,
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please send the entire output of a run with all the logging
>>>>>>>>>> turned on, using -log_view and -info.
>>>>>>>>>>
>>>>>>>>>>   Thanks,
>>>>>>>>>>
>>>>>>>>>>     Matt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Question 1:
>>>>>>>>>>> For the profiling, is it possible to show only the user-defined
>>>>>>>>>>> log events in the breakdown of each stage in -log_view?
>>>>>>>>>>> I tried deactivating all the class IDs (MAT, VEC, KSP, PC):
>>>>>>>>>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>>>>>>>>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>>>>>>>>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>>>>>>>>>> PetscLogEventExcludeClass(PC_CLASSID);
>>>>>>>>>>> which should "deactivate event logging for a PETSc object class
>>>>>>>>>>> in every stage" according to the manual; however, I still see
>>>>>>>>>>> those events in the stage breakdown:
>>>>>>>>>>> --- Event Stage 1: Matrix Construction
>>>>>>>>>>>
>>>>>>>>>>> BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>>>>>>>> VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>>>>> VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>>>>>>>> VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>>>>> VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>>>>>>>>>>> VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>>>>> MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>>>>> MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>>>>>>>>>>> AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0   # USER EVENT
>>>>>>>>>>> myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
>>>>>>>>>>> setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
>>>>>>>>>>> setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
>>>>>>>>>>> callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
>>>>>>>>>>>
>>>>>>>>>>> Also, is it possible to clear the logs so that I can write a
>>>>>>>>>>> separate profiling output file for each time step (since I am
>>>>>>>>>>> solving a transient problem and I want to know how the performance
>>>>>>>>>>> changes as time goes by)?
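A minimal sketch of one way to get a per-time-step breakdown without clearing
the logs is to push a separate log stage for every step (the stage name and
the step loop here are illustrative):

  char          stagename[64];
  PetscLogStage stage;

  for (PetscInt step = 0; step < nsteps; ++step) {
    PetscSNPrintf(stagename, sizeof(stagename), "TimeStep %D", step);
    PetscLogStageRegister(stagename, &stage);  /* one stage per time step */
    PetscLogStagePush(stage);
    /* ... build the matrix and RHS, solve ... */
    PetscLogStagePop();
  }

-log_view then reports each time step as its own stage in a single output file.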
>>>>>>>>>>>
>>>>>>>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>>>>>> Question 2:
>>>>>>>>>>> Regarding MatSetValues:
>>>>>>>>>>> Right now I am writing a finite volume code. Due to an algorithmic
>>>>>>>>>>> requirement, I have to write the matrix into a local native format
>>>>>>>>>>> (array of arrays) and then loop through the rows and use MatSetValues
>>>>>>>>>>> to set the elements in "Mat A":
>>>>>>>>>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>>>>>>>> but it is very slow and it is killing my performance,
>>>>>>>>>>> even though the matrix was properly created using
>>>>>>>>>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>>>>>>>>> PETSC_DETERMINE, PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>>>>>>>>> with d_nnz and o_nnz properly assigned, so no mallocs occur during
>>>>>>>>>>> MatSetValues, and all inserted values are local, so there are no
>>>>>>>>>>> off-processor values.
>>>>>>>>>>> So my question is: is it possible to set multiple rows at once,
>>>>>>>>>>> hopefully all of them? I checked the manual, and MatSetValues can
>>>>>>>>>>> only set a dense block of rows, and row-by-row insertion seems
>>>>>>>>>>> expensive. Or is it perhaps possible to copy all rows to the
>>>>>>>>>>> underlying matrix data directly? As I mentioned, all values are
>>>>>>>>>>> local and there are no off-processor values (the stash is 0, see
>>>>>>>>>>> the -info output below):
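For reference, MatSetValues does accept several rows in one call, but it then
interprets the values as a dense nrows x ncols block, so batching rows only
pays off when the batched rows share the same column set. A minimal sketch
under that assumption (NROWS_BATCH, NCOLS, and the array names are
illustrative):

  PetscInt    batch_rows[NROWS_BATCH];      /* global row indices                 */
  PetscInt    batch_cols[NCOLS];            /* column indices shared by the rows  */
  PetscScalar vals[NROWS_BATCH * NCOLS];    /* dense, row-major value block       */
  /* ... fill batch_rows, batch_cols, vals ... */
  MatSetValues(A, NROWS_BATCH, batch_rows, NCOLS, batch_cols, vals, INSERT_VALUES);

For general unstructured rows with different column sets, filling a CSR and
handing it over with MatCreateMPIAIJWithArrays / MatUpdateMPIAIJWithArrays, as
in the approach (2) discussed above, avoids the per-row call entirely.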
>>>>>>>>>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0
>>>>>>>>>>> mallocs.
>>>>>>>>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses
>>>>>>>>>>> 0 mallocs.
>>>>>>>>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0
>>>>>>>>>>> mallocs.
>>>>>>>>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0
>>>>>>>>>>> mallocs.
>>>>>>>>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0
>>>>>>>>>>> mallocs.
>>>>>>>>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0
>>>>>>>>>>> mallocs.
>>>>>>>>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0
>>>>>>>>>>> mallocs.
>>>>>>>>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0
>>>>>>>>>>> mallocs.
>>>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064;
>>>>>>>>>>> storage space: 0 unneeded,743028 used
>>>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062;
>>>>>>>>>>> storage space: 0 unneeded,742972 used
>>>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063;
>>>>>>>>>>> storage space: 0 unneeded,743093 used
>>>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062;
>>>>>>>>>>> storage space: 0 unneeded,743036 used
>>>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062;
>>>>>>>>>>> storage space: 0 unneeded,742938 used
>>>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063;
>>>>>>>>>>> storage space: 0 unneeded,743049 used
>>>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage
>>>>>>>>>>> space: 0 unneeded,685 used
>>>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage
>>>>>>>>>>> space: 0 unneeded,649 used
>>>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage
>>>>>>>>>>> space: 0 unneeded,1011 used
>>>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage
>>>>>>>>>>> space: 0 unneeded,1137 used
>>>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage
>>>>>>>>>>> space: 0 unneeded,658 used
>>>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage
>>>>>>>>>>> space: 0 unneeded,648 used
>>>>>>>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during
>>>>>>>>>>> MatSetValues() is 0
>>>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>>>>>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>>>>>> Question 3:
>>>>>>>>>>> If all inserted matrix and vector data are local, what part of
>>>>>>>>>>> the Vec/Mat assembly consumes the time? MatSetValues and MatAssembly
>>>>>>>>>>> consume more time than my matrix builder does, and not only for the
>>>>>>>>>>> first MAT_FINAL_ASSEMBLY.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For context, the matrix above is nearly 1M x 1M, partitioned over
>>>>>>>>>>> six processes, and it was NOT built using a DM.
>>>>>>>>>>>
>>>>>>>>>>> Finally the configure options are:
>>>>>>>>>>>
>>>>>>>>>>> Configure options:
>>>>>>>>>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3
>>>>>>>>>>> -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native"
>>>>>>>>>>> FOPTFLAGS="-O3 -march=native -mtune=native" --with-cc=mpicc
>>>>>>>>>>> --with-cxx=mpicxx --with-fc=mpif90 --download-metis --download-hypre
>>>>>>>>>>>
>>>>>>>>>>> Sorry for such a long question, and thanks in advance.
>>>>>>>>>>> Thanks
>>>>>>>>>>> M. Kamra
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>>>> experiments lead.
>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>
>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>>>
>>>>>>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>

