[petsc-users] Various Questions Regarding PETSC

Matthew Knepley knepley at gmail.com
Wed Jul 17 06:42:08 CDT 2019


On Wed, Jul 17, 2019 at 2:49 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com>
wrote:

> Hello everyone,
> Based on the advice you have provided me so far, here is what I have
> tried.
> First I create a temporary sparse matrix in CSR format based on my
> algorithm, and then I create the MPIAIJ matrix as follows:
>
> MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, local_size, local_size,
>                           PETSC_DETERMINE, PETSC_DETERMINE, ptr, j, v, &A);
> MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);
> MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);
> MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);
> MatSetOption(A, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_TRUE);
> MatSetOption(A, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);
> MatSetOption(A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);
> MatSetUp(A);
> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>
> and the RHS vector:
> VecCreateMPI(PETSC_COMM_WORLD, this->local_size, PETSC_DETERMINE, &RHS);
> VecSetOption(RHS, VEC_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);
> VecAssemblyBegin(RHS);
> VecAssemblyEnd(RHS);
>
> Next, I calculate my finite-volume face fluxes and store them in a
> face-centered C array, which is accessed repeatedly during matrix
> construction to add the contributions of the different terms in the PDE.
> After that I assemble each row using the cell-face connectivity, setting
> the nonzero entries for each cell (which corresponds to a row) using one
> of two approaches:
> (1)
> MatSetValues(A, 1, &cell_global_index, nnz_per_row, j_index, coefvalues,
>              INSERT_VALUES);
> where j_index holds the global column indices in row "cell_global_index"
> and coefvalues holds the nonzero values of that row.
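>
> For illustration, a minimal sketch of this per-row loop (assuming the
> temporary CSR arrays ptr/j/v described above and contiguous row
> ownership; this is not my exact code):
>
> PetscInt rstart, rend;
> MatGetOwnershipRange(A, &rstart, &rend);
> for (PetscInt i = 0; i < local_size; i++) {
>   PetscInt row = rstart + i;              /* global row index */
>   PetscInt nnz = ptr[i + 1] - ptr[i];     /* nonzeros in this row */
>   MatSetValues(A, 1, &row, nnz, &j[ptr[i]], &v[ptr[i]], INSERT_VALUES);
> }
> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);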
>
> (2)
> Set each row in the CSR arrays first, then dump the whole matrix into the
> PETSc Mat using MatUpdateMPIAIJWithArrays() from the master branch:
> MatUpdateMPIAIJWithArrays(A, local_size, local_size, PETSC_DETERMINE,
> PETSC_DETERMINE, ptr, j, v);
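>
> Sketched usage over time steps (assuming the sparsity pattern is fixed:
> create the Mat once, then refill v and push the whole CSR each step):
>
> /* once, at setup */
> MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, local_size, local_size,
>                           PETSC_DETERMINE, PETSC_DETERMINE, ptr, j, v, &A);
> /* every time step, after refilling v[] */
> MatUpdateMPIAIJWithArrays(A, local_size, local_size,
>                           PETSC_DETERMINE, PETSC_DETERMINE, ptr, j, v);
>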
> I ran some numerical tests and found the following:
> #1# Setting up the matrix in that way dropped the MatAssembly cost by
> two orders of magnitude.
> #2# I applied similar options to the right-hand-side vector and got
> similar results, as shown next.
>
> This stage log is for running the matrix-construction routine 100 times
> to get consistent timings.
> Approach (1)
> VecAssemblyBegin     100 1.0 1.1079e-03 9.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd       100 1.0 1.5783e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin     100 1.0 1.4114e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd       100 1.0 4.8351e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> AssembleMats         100 1.0 2.9211e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> FillMat_with_MatSetValues     100 1.0 3.8820e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23  0  0  0  0  96  0  0  0  0     0
> callScheme           100 1.0 1.4874e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4  0  0  0  0     0
>
>
> Approach (2)
> VecAssemblyBegin     100 1.0 1.2069e-03 7.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd       100 1.0 2.1696e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin     200 1.0 2.0931e-03 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd       200 1.0 1.0748e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> AssembleMats         100 1.0 2.5523e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> FillCSRMat_with_MatSetValues     100 1.0 2.8727e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  88  0  0  0  0     0
> FillMat_with_MatSetValues     100 1.0 2.0326e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   6  0  0  0  0     0
> callScheme           100 1.0 1.8507e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   5  0  0  0  0     0
>
>
> Now as you can see, the first approach is more expensive by nearly 15%;
> however, approach (2) carries the additional memory cost of the temporary
> CSR arrays.
> I believe I could use the CSR matrix directly to store the scheme fluxes,
> but I don't think it is built for repeated access, since each insertion
> involves a binary-search cost of O(log(nnz_per_row)).
>
> Based on these results, I believe that MatSetValues is very expensive, so
> is there a stripped-down version of the function that is suitable for my
> application? Right now it occupies more than 90% of the execution time.
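>
> A minimal sketch of the kind of direct fill I have in mind (assuming the
> nonzero pattern never changes; MatMPIAIJGetSeqAIJ and MatSeqAIJGetArray
> expose the local blocks):
>
> Mat Ad, Ao;                  /* local diagonal and off-diagonal blocks */
> const PetscInt *colmap;
> PetscScalar *av;
> MatMPIAIJGetSeqAIJ(A, &Ad, &Ao, &colmap);
> MatSeqAIJGetArray(Ad, &av);  /* raw CSR value array of the diagonal block */
> /* ... overwrite av[] in the matrix's own CSR ordering ... */
> MatSeqAIJRestoreArray(Ad, &av);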
>

1) First, it should be easy to quickly isolate communication time. If you
run on 1 process, is assembly 90% of your time?

2) How many MatVecs are you comparing against to get your 90% figure? The
MatMult and KSPSolve events are not shown above,
    so we cannot see this 90%.

    Assembled matrices are only useful for solves, where you need to
compute a preconditioner, or for low-order methods that do many MatMults.
If you do only a few MatMult operations per assembly, have high order, or
are not solving anything, you are better off with an unassembled
application (e.g., matrix-free, as sketched below).
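
    For instance, a minimal matrix-free sketch (MyMult is a hypothetical
user callback that applies your discretization to a vector):

    extern PetscErrorCode MyMult(Mat A, Vec x, Vec y); /* user-provided action */
    Mat S;
    MatCreateShell(PETSC_COMM_WORLD, local_size, local_size,
                   PETSC_DETERMINE, PETSC_DETERMINE, NULL, &S);
    MatShellSetOperation(S, MATOP_MULT, (void (*)(void))MyMult);
    /* S can now be handed to KSPSetOperators() like any assembled Mat */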

  Thanks,

     Matt


> Thanks,
> Kamra
>
> On Sun, Jul 14, 2019 at 4:17 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> OK, so I am seeing the whole matrix creation, including your flux calcs
>> and your intermediate data structure, as taking 0.1 sec (10/100). That is
>> about 7% of the solve time (but this looks like it could use some
>> attention) or about 25 Mat-vecs (the standard work unit of an iterative
>> solver).
>>
>> Now I see 10% of the matrix creation time going to MatAssemblyEnd, which
>> is annoying because there is no communication.
>>
>> I don't see any big problems here, except for the solver maybe, if this
>> is a nice friendly Laplacian.
>>
>> * I would try PETSc's Set Value routines directly.
>> * You might as well try
>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetOption.html
>>
>> with
>>
>>  MAT_NO_OFF_PROC_ENTRIES - you know each process will only set values for
>> its own rows, will generate an error if any process sets values for another
>> process. This avoids all reductions in the MatAssembly routines and thus
>> improves performance for very large process counts.
>>
>> This should eliminate the MatAssemblyEnd cost.
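>>
>> For instance, a minimal sketch (assuming A is your assembled Mat):
>>
>> MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE); /* before assembly */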
>>
>>
>>
>> On Sat, Jul 13, 2019 at 2:43 PM Mohammed Mostafa <
>> mo7ammedmostafa at gmail.com> wrote:
>>
>>> Sorry about that.
>>> I wanted to see if the assembly cost would drop in subsequent time
>>> steps, but it was taking too long to run, so I set it to solve only once
>>> since I was only interested in profiling the matrix builder.
>>> Again, sorry for that.
>>> Kamra
>>>
>>> On Sun, Jul 14, 2019 at 3:33 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> OK, I only see one call to KSPSolve.
>>>>
>>>> On Sat, Jul 13, 2019 at 2:08 PM Mohammed Mostafa <
>>>> mo7ammedmostafa at gmail.com> wrote:
>>>>
>>>>> This log is for 100 time-steps, not a single time step
>>>>>
>>>>>
>>>>> On Sun, Jul 14, 2019 at 3:01 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> You call the assembly stuff a lot (200 times). BuildTwoSidedF is a
>>>>>> global operation and is taking a lot of time. You should call these
>>>>>> only once per time step (it looks like you are doing just one time
>>>>>> step).
>>>>>>
>>>>>>
>>>>>> --- Event Stage 1: Matrix Construction
>>>>>>
>>>>>> BuildTwoSidedF       400 1.0 6.5222e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
>>>>>> VecSet                 1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> VecAssemblyBegin     200 1.0 6.2633e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
>>>>>> VecAssemblyEnd       200 1.0 6.7163e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> VecScatterBegin      200 1.0 5.9373e-03 2.2 0.00e+00 0.0 3.6e+03 2.1e+03 0.0e+00  0  0 79  2  0   0  0 99100  0     0
>>>>>> VecScatterEnd        200 1.0 2.7236e-0223.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> MatAssemblyBegin     200 1.0 3.2747e-02 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> MatAssemblyEnd       200 1.0 9.0972e-01 1.0 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  4  0  1  0  6   9  0  1  0100     0
>>>>>> AssembleMats         200 1.0 1.5568e+00 1.2 0.00e+00 0.0 3.6e+03 2.1e+03 8.0e+00  6  0 79  2  6  14  0100100100     0
>>>>>> myMatSetValues       200 1.0 2.5367e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  25  0  0  0  0     0
>>>>>> setNativeMat         100 1.0 2.8223e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  28  0  0  0  0     0
>>>>>> setNativeMatII       100 1.0 3.2174e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  31  0  0  0  0     0
>>>>>> callScheme           100 1.0 2.0700e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 12, 2019 at 11:56 PM Mohammed Mostafa via petsc-users <
>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Hello Matt,
>>>>>>> Attached is the dumped entire log output using -log_view and -info.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kamra
>>>>>>>
>>>>>>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>> I have a few questions regarding PETSc.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Please send the entire output of a run with all the logging turned
>>>>>>>> on, using -log_view and -info.
>>>>>>>>
>>>>>>>>   Thanks,
>>>>>>>>
>>>>>>>>     Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>> Question 1:
>>>>>>>>> For profiling, is it possible to show only the user-defined log
>>>>>>>>> events in the breakdown of each stage in -log_view?
>>>>>>>>> I tried deactivating all the class IDs (MAT, VEC, KSP, PC):
>>>>>>>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>>>>>>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>>>>>>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>>>>>>>> PetscLogEventExcludeClass(PC_CLASSID);
>>>>>>>>> which should "deactivate event logging for a PETSc object class in
>>>>>>>>> every stage" according to the manual; however, I still see them in
>>>>>>>>> the stage breakdown:
>>>>>>>>> --- Event Stage 1: Matrix Construction
>>>>>>>>>
>>>>>>>>> BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>>>>>> VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>>> VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>>>>>> VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>>> VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>>>>>>>>> VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>>> MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>>>>> MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>>>>>>>>> AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0  # USER EVENT
>>>>>>>>> myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0  # USER EVENT
>>>>>>>>> setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0  # USER EVENT
>>>>>>>>> setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0  # USER EVENT
>>>>>>>>> callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0  # USER EVENT
>>>>>>>>>
>>>>>>>>> Also, is it possible to clear the logs so that I can write a
>>>>>>>>> separate profiling output file for each timestep (since I am
>>>>>>>>> solving a transient problem and I want to know how the performance
>>>>>>>>> changes over time)?
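>>>>>>>>>
>>>>>>>>> What I imagine is something like per-step log stages (a sketch
>>>>>>>>> only; the stage name here is made up):
>>>>>>>>>
>>>>>>>>> PetscLogStage stage;
>>>>>>>>> PetscLogStageRegister("TimeStep", &stage); /* once, at setup */
>>>>>>>>> PetscLogStagePush(stage);                  /* top of each step */
>>>>>>>>> /* ... build the matrix and solve ... */
>>>>>>>>> PetscLogStagePop();                        /* end of each step */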
>>>>>>>>>
>>>>>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> Question 2:
>>>>>>>>> Regarding MatSetValues:
>>>>>>>>> Right now I am writing a finite volume code. Due to an algorithmic
>>>>>>>>> requirement I have to write the matrix into a local native format
>>>>>>>>> (array of arrays) and then loop through the rows, using
>>>>>>>>> MatSetValues to set the elements in "Mat A":
>>>>>>>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>>>>>> but it is very slow and it is killing my performance, although the
>>>>>>>>> matrix was properly created using
>>>>>>>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>>>>>>> PETSC_DETERMINE, PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>>>>>>> with d_nnz and o_nnz properly assigned, so no mallocs occur during
>>>>>>>>> MatSetValues, and all inserted values are local, so there are no
>>>>>>>>> off-processor values.
>>>>>>>>> So my question is: is it possible to set multiple rows at once
>>>>>>>>> (hopefully all of them)? I checked the manual, and MatSetValues
>>>>>>>>> can only set a dense matrix block; setting row by row seems
>>>>>>>>> expensive.
>>>>>>>>> Or perhaps is it possible to copy all rows directly into the
>>>>>>>>> underlying matrix data? As I mentioned, all values are local and
>>>>>>>>> there are no off-processor values (the stash is 0):
>>>>>>>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>>>>>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
>>>>>>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage space: 0 unneeded,743028 used
>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742972 used
>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743093 used
>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,743036 used
>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742938 used
>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743049 used
>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space: 0 unneeded,685 used
>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space: 0 unneeded,649 used
>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space: 0 unneeded,1011 used
>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space: 0 unneeded,1137 used
>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space: 0 unneeded,658 used
>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space: 0 unneeded,648 used
>>>>>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> Question 3:
>>>>>>>>> If all inserted matrix and vector data are local, what part of the
>>>>>>>>> vec/mat assembly consumes the time? MatSetValues and MatAssembly
>>>>>>>>> consume more time than the matrix builder itself.
>>>>>>>>> Also, this is not just for the first MAT_FINAL_ASSEMBLY.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For context, the matrix in the above is nearly 1M x 1M, partitioned
>>>>>>>>> over six processes, and it was NOT built using a DM.
>>>>>>>>>
>>>>>>>>> Finally the configure options are:
>>>>>>>>>
>>>>>>>>> Configure options:
>>>>>>>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>>>>>>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>>>>>>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>>>>>>>> --with-fc=mpif90 --download-metis --download-hypre
>>>>>>>>>
>>>>>>>>> Sorry for such a long question, and thanks in advance.
>>>>>>>>> Thanks,
>>>>>>>>> M. Kamra
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>> experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>>
>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>
>>>>>>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>

