[petsc-users] KSP solve time and iterations

Valerio Barnabei valerio.barnabei at gmail.com
Fri Jan 18 15:58:27 CST 2019


>
> The only reason I can think of for slower code is that MatSetValues for
> MATIS involves indirect memory access when mapping from global to local.
>
> Are you preallocating the matrix properly? I mean, are you calling
> MatISSetPreallocation with the same input used for
> MatMPIAIJSetPreallocation?
>

You're right: I'm preallocating a different number of nonzeros. I had
forgotten how much preallocation influences performance.
My doubt is: if I created my MATIS matrix with MatCreateIS(PETSC_COMM_WORLD, 1,
PETSC_DECIDE, PETSC_DECIDE, ndofs, ndofs, map, map, &K), how can I preallocate
the *_nnz arrays for the local submatrix? Do you suggest creating the MATIS
matrix by passing the local size instead of the global size?
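
For reference, a minimal sketch of what I have in mind (here ndofs, map, d_nnz
and o_nnz stand for my actual global size, local-to-global map and per-row
nonzero estimates, i.e. the same input I pass to MatMPIAIJSetPreallocation in
the MATAIJ version):

ierr=MatCreateIS(PETSC_COMM_WORLD,1,PETSC_DECIDE,PETSC_DECIDE,ndofs,ndofs,map,map,&K);CHKERRQ(ierr);
/* preallocate with the same d_nnz/o_nnz used for MatMPIAIJSetPreallocation */
ierr=MatISSetPreallocation(K,0,d_nnz,0,o_nnz);CHKERRQ(ierr);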

> Same loop, same entries. Only local indices in place of the global ones
>

And the performance is supposed to be the same, right?

A couple more questions:
- I tried PCGAMG, and the convergence is really fast. Thank you for your
suggestion. I'm planning to add an advection term to my equation; which
preconditioner would you suggest in that case?
- I'm also trying to use PCBDDC (as you suggested in an earlier mail), but I
got errors, basically because I'm not sure about the options I'm supposed to
use. If you think this PC could suit my case (considering that I will add an
advection term soon), can I get any help to tweak it and make it work?
Attached you'll find the error log of a run with -pc_type bddc. Is there any
specific requirement (that I could have completely missed) on the domain
decomposition strategy for using this particular PC? My solver setup is
sketched below for reference.
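
A minimal sketch of my setup (essentially the KSP calls from my first mail,
with the preconditioner now chosen from the command line; K is the MATIS
matrix, F the load vector and sol the solution vector):

ierr=KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr=KSPSetOperators(ksp,K,K);CHKERRQ(ierr);
ierr=KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
ierr=KSPSetFromOptions(ksp);CHKERRQ(ierr); /* picks up -pc_type bddc */
/* equivalently, in code: KSPGetPC(ksp,&pc); PCSetType(pc,PCBDDC); */
ierr=KSPSolve(ksp,F,sol);CHKERRQ(ierr);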

Thank you for your help and your time.

Valerio



On Fri, Jan 18, 2019 at 10:18 PM Stefano Zampini <
stefano.zampini at gmail.com> wrote:

>
>
> On Jan 18, 2019, at 6:47 PM, Valerio Barnabei <valerio.barnabei at gmail.com>
> wrote:
>
> Hello (again),
> I'm trying to figure out how to change my program in order to use the MATIS
> type and PCBDDC.
> I managed to build my MATIS matrix successfully, but my code is now
> slower. I think the main issue is that I'm using MatSetValues inside the
> element loop, passing the e-th local element matrix as values.
>
>
> The only reason I can think of for slower code is that MatSetValues for
> MATIS involves indirect memory access when mapping from global to local.
> Are you preallocating the matrix properly? I mean, are you calling
> MatISSetPreallocation with the same input used for
> MatMPIAIJSetPreallocation?
>
> In a sort of pseudocode, where B is the shape functions matrix and D is a
> “diffusion” matrix:
>
>
> for e in localElements {
>     for qp in quadraturePoints {
>         for n in localNodes {
>             for m in localNodes {
>                 ke[n*localNodes + m] += weights[qp]*detJ[qp]*B[n]^T*D*B[m];
>             }
>         }
>     }
>     for n in localNodes {
>         for m in localNodes {
>             // ge is a simple array that lets me easily take Dirichlet
>             // conditions into account, for my simple case
>             fe[n] -= ke[n*localNodes + m]*ge[e][m];
>         }
>     }
>     for n in localNodes {
>         // LM is the DOF array (rows are local elements, columns are local
>         // nodes, values are global dofs, i.e. row/col indices for the
>         // global assembly of K; negative values are prescribed DOFs)
>         idx[n] = LM[e][n];
>     }
>     MatSetValues(K, localNodes, (PetscInt*)&idx[0], localNodes,
>                  (PetscInt*)&idx[0], (PetscScalar*)&ke[0], ADD_VALUES);
>     // same for VecSetValues
> }
> Just to clarify: the array I used to create the map is a single array per
> process, without the negative values of the prescribed DOFs; the array I'm
> passing to MatSetValues is, each time, the portion of that map array
> corresponding to the current element. In addition, the code works: it gives
> the same result as an older version where I used MATAIJ, but the element
> assembly loop is slower, and so is the solve (compared with PCNONE on both
> versions, same number of iterations to converge, same norm_2 of the
> solution vector).
> I guess calling MatSetValues outside the loop, with the same large array I
> used to create the map for the MATIS matrix, would be better, but I cannot
> see a way to do that with this elemental assembly logic.
>
>
> MatSetValues (and MatSetValuesLocal) expects 2d data for the entries.
> Calling these once per element is the best you can do.
>
> Stefano suggested using MatSetValuesLocal: in that case, what is idx
> supposed to be? (I'm using global indices now.)
>
>
> You should use the corresponding local indices. I mean, if your local ->
> global map is (0,1,2) -> (31,44,77), then, in place of 44, you should use 1
> as idx for MatSetValuesLocal
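>
> As a minimal sketch (idx and ke here are just the element-local index and
> value arrays from your loop):
>
> /* with the local-to-global map (0,1,2) -> (31,44,77) already attached
>    to K, pass the local indices directly */
> PetscInt    idx[3] = {0, 1, 2};   /* instead of {31, 44, 77} */
> PetscScalar ke[9]  = {0};         /* filled by the quadrature loop */
> ierr = MatSetValuesLocal(K, 3, idx, 3, idx, ke, ADD_VALUES);CHKERRQ(ierr);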
>
> If I use MatSetValuesLocal, can I stick with this logic of adding values
> inside the element loop?
>
>
> Same loop, same entries. Only local indices in place of the global ones
>
> Is there any other way you would suggest to assign values in this case? Any
> suggestion would be really appreciated.
>
> I'm not sure I explained my issue properly; I can give any further
> information if needed.
>
> Valerio
>
>
>
> On Fri, Jan 18, 2019 at 3:55 PM Valerio Barnabei <
> valerio.barnabei at gmail.com> wrote:
>
>> The mail was still incomplete. Sorry, I will write it again.
>> Apologies.
>>
>> On Fri, Jan 18, 2019 at 3:53 PM Valerio Barnabei <
>> valerio.barnabei at gmail.com> wrote:
>>
>>> Hello,
>>> I'm trying to figure out how to change my program in order to use the MATIS
>>> type and PCBDDC.
>>> I managed to build my MATIS matrix successfully, but my code is now
>>> slower. I think the main issue is that I'm using MatSetValues inside the
>>> element loop, passing the e-th local element matrix as values. In a
>>> sort of pseudocode:
>>>
>>> for e in localElements {
>>>     for (int qp=0; qp<ngp_tot; qp++)
>>>         for (int A=0; A<Nnodesloc; A++) {
>>>             for (int B=0; B<Nnodesloc; B++) {
>>>                 // BaT dot temp (NB: ba_n0 = bat_0n)
>>>                 double prod = temp[0][A][qp]*BaT[B][0][qp]
>>>                             + temp[1][A][qp]*BaT[B][1][qp];
>>>                 // ke[A][B] += weights[qp]*detJ[qp]*prod;
>>>                 ke[A*Nnodesloc+B] += weights[qp]*detJ[qp]*prod;
>>>
>>>
>>> ?
>>> Actually I have lots of doubts:
>>> - when using ISLocalToGlobalMappingCreate(), what's the blocksize?
>>>
>>> On Mon, Jan 14, 2019 at 3:08 PM Valerio Barnabei <
>>> valerio.barnabei at gmail.com> wrote:
>>>
>>>> Thank you Stefano for your quick answer. I will try to change the PC as
>>>> you suggest, and I will read up on the MATIS type. I will let you know if
>>>> I still have doubts.
>>>>
>>>> Valerio
>>>>
>>>> On Sun, Jan 13, 2019 at 4:08 PM Stefano Zampini <
>>>> stefano.zampini at gmail.com> wrote:
>>>>
>>>>> What problem are you trying to solve?
>>>>> For standard elliptic problems (e.g. Poisson or elasticity) you should
>>>>> use a preconditioner that scales, for example PCGAMG, or BoomerAMG from
>>>>> HYPRE (PCHYPRE, if you configured PETSc with support for HYPRE via
>>>>> --download-hypre).
>>>>> Alternatively, since you are using FEM with local assembly, you can
>>>>> use PCBDDC, but this requires a few extra calls:
>>>>>
>>>>> - call MatSetType(K,MATIS)
>>>>> - provide an ISLocalToGlobalMapping of local-to-global degrees of
>>>>> freedom before you set the preallocation, via MatSetLocalToGlobalMapping
>>>>> (if you provide this call, you can use MatSetValuesLocal for both AIJ
>>>>> and MATIS types); a minimal sketch follows below
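>>>>>
>>>>> A sketch of that sequence, assuming nlocal_dofs and local_dofs describe
>>>>> your local-to-global DOF numbering (blocksize 1 for a scalar problem):
>>>>>
>>>>> ISLocalToGlobalMapping l2g;
>>>>> ierr=ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD,1,nlocal_dofs,
>>>>>        local_dofs,PETSC_COPY_VALUES,&l2g);CHKERRQ(ierr);
>>>>> ierr=MatSetType(K,MATIS);CHKERRQ(ierr);
>>>>> ierr=MatSetLocalToGlobalMapping(K,l2g,l2g);CHKERRQ(ierr);
>>>>> /* then preallocate, and use MatSetValuesLocal in the element loop */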
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Jan 13, 2019 at 5:59 PM Valerio Barnabei via
>>>>> petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>>
>>>>>> Hello everybody,
>>>>>>
>>>>>> I have some doubts about the parallel solution of a simple FEM KSP
>>>>>> problem. I searched the mailing list, but the information I found is not
>>>>>> helping me.
>>>>>> I need to control the parallel assembly of the main matrix manually,
>>>>>> but I would like to use PETSc for the solution of the final linear system
>>>>>> K*U=F.
>>>>>> The algorithm I am using loads an already decomposed FEM domain, and
>>>>>> each process contributes to the global matrix assembly using its portion
>>>>>> of the domain decomposition (made in pre-processing). In the element loop,
>>>>>> for each element I load the element submatrix into the global matrix via
>>>>>> MatSetValue. Then, outside the element loop, I write
>>>>>> /*
>>>>>> ...matrix preallocation
>>>>>> ...calculate element local stiffness matrix (local in a FEM sense and
>>>>>> local in a domain decomposition sense)
>>>>>> MatSetValue (ADD)
>>>>>> */
>>>>>> and then
>>>>>> ierr=MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>> ierr=MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>> ierr=VecAssemblyBegin(F);CHKERRQ(ierr);
>>>>>> ierr=VecAssemblyEnd(F);CHKERRQ(ierr);
>>>>>>
>>>>>> This part scales perfectly and is sufficiently fast.
>>>>>> My K matrix is a sort of banded matrix: my preallocation is
>>>>>> not perfect, but it works at this stage of my work.
>>>>>> Finally, I solve the system with the KSP logic:
>>>>>>
>>>>>> ierr=KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>>>>>> ierr=KSPSetOperators(ksp,K,K);CHKERRQ(ierr);
>>>>>> ierr=KSPGetPC(ksp,&pc);CHKERRQ(ierr);
>>>>>> ierr=PCSetType(pc,PCBJACOBI);CHKERRQ(ierr); //or PCNONE, or PCJACOBI,
>>>>>> the behaviour is almost identical, the iteration number lowers but is still
>>>>>> really high
>>>>>>
>>>>>> ierr=KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
>>>>>> ierr=KSPSetFromOptions(ksp);CHKERRQ(ierr);
>>>>>> ierr=KSPSolve(ksp,F,sol);
>>>>>>
>>>>>> When I run the code I notice some strange behaviour. The numerical
>>>>>> solution appears right, and the code scales in a simple strong-scaling
>>>>>> test. However, the KSP log and the log_view output (see below, logs for
>>>>>> a 160k quad linear element uniform 2D mesh) show a huge number of
>>>>>> iterations, and the time spent in the "solve" function is very high.
>>>>>> Furthermore, the more I increase the problem size while keeping a
>>>>>> constant load per process (weak scaling), the more iterations are needed
>>>>>> for convergence. In general, when increasing the problem size (e.g. 16M
>>>>>> elements), convergence is not achieved before the maximum iteration
>>>>>> count is reached.
>>>>>>
>>>>>> Question 1: Am I doing it right, or am I mishandling the MPI
>>>>>> communication? (I'm aware that my domain decomposition does not match
>>>>>> the PETSc matrix decomposition; I expect that to hurt my algorithm a bit.)
>>>>>> Question 2: Why does the solver take so much time and so many iterations?
>>>>>> The problem is really simple, and the solution appears correct when
>>>>>> plotted. Am I making the system ill-conditioned?
>>>>>> Thanks in advance for your help. I can add any further information if
>>>>>> needed.
>>>>>>
>>>>>> Valerio
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Stefano
>>>>>
>>>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log.out
Type: application/octet-stream
Size: 122246 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190118/02a8c712/attachment-0001.obj>

