[petsc-users] performance issue solving multiple linear systems of the same size with different preconditioning methods

Алексей Рязанов amryazanov at gmail.com
Mon Aug 22 06:12:28 CDT 2011


Thank you for your answer!

I found it too problematic to learn the PreLoad procedures; I tried and
failed. But of course it was no problem to change the

PetscLogStagePush(StageNum1);

KSPSolve(dKSP, dvec_origRHS, dvec_Solution);

PetscLogStagePop();

part of my 2nd program to the following, to "make sure everything has been
loaded" - with the extra warm-up solve outside the stage brackets, of course:

KSPSolve(dKSP, dvec_origRHS, dvec_Solution);

PetscLogStagePush(StageNum1);

KSPSolve(dKSP, dvec_origRHS, dvec_Solution);

PetscLogStagePop();
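For reference, the PreLoad macros I gave up on are meant to be used roughly
like this. This is only a sketch based on my reading of the manual pages,
not code I got working; as I understand it, the body runs twice and only
the second pass counts toward the -log_summary statistics (in later PETSc
releases the macros are spelled PetscPreLoadBegin()/PetscPreLoadStage()/
PetscPreLoadEnd()):

PreLoadBegin(PETSC_TRUE, "Load");   /* assembly/setup that gets warmed up */
  /* ... matrix and RHS assembly ... */
PreLoadStage("Solve");              /* the part I actually want timed */
  KSPSolve(dKSP, dvec_origRHS, dvec_Solution);
PreLoadEnd();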

to "make sure everything has been loaded". Of course out of the
Stage-brackets. It was my epic fail. In this "preloaded mode" Stage
statistics looks much worse. Here it is:
30x30x30 mesh

1st program:
NONE        248 its : 2.4834e+00  92.6%  1.2134e+09  99.8%  1.490e+03  98.8%  1.423e+04  95.5%  1.246e+03  80.4%
JACOBI      249 its : 2.5627e+00  92.9%  1.2318e+09  99.8%  1.496e+03  98.8%  1.423e+04  95.5%  1.252e+03  80.4%
SOR         153 its : 2.7673e+00  93.2%  1.1769e+09  99.8%  9.200e+02  98.1%  1.412e+04  92.9%  7.710e+02  78.7%
ASM-JACOBI  249 its : 2.9196e+00  93.3%  1.2336e+09  99.8%  2.508e+03  99.3%  1.470e+04  97.4%  2.266e+03  88.0%
ASM-SOR     135 its : 2.7965e+00  93.2%  1.0915e+09  99.7%  1.368e+03  98.7%  1.494e+04  95.3%  1.238e+03  86.4%
ASM-ILU      47 its : 8.3341e-01  79.4%  3.8922e+08  99.2%  4.880e+02  96.4%  1.588e+04  88.8%  4.510e+02  80.5%

2nd program:
NONE        248 its : 2.6047e+00   7.0%  1.2134e+09  13.0%  1.490e+03  13.4%  1.929e+03  13.3%  1.246e+03  10.2%
JACOBI      249 its : 3.0119e+00   8.1%  1.2318e+09  13.2%  1.496e+03  13.5%  1.937e+03  13.3%  1.251e+03  10.3%
SOR         153 its : 3.5001e+00   9.4%  1.1769e+09  12.6%  9.200e+02   8.3%  1.191e+03   8.2%  7.700e+02   6.3%
ASM-JACOBI  249 its : 5.4581e+00  14.6%  1.2336e+09  13.2%  2.494e+03  22.4%  3.230e+03  22.2%  2.250e+03  18.5%
ASM-SOR     135 its : 5.3384e+00  14.3%  1.0915e+09  11.7%  1.354e+03  12.2%  1.753e+03  12.0%  1.222e+03  10.0%
ASM-ILU      47 its : 1.9040e+00   5.1%  3.8922e+08   4.2%  4.740e+02   4.3%  6.138e+02   4.2%  4.350e+02   3.6%

2nd program with "preloading":
NONE        248 its : 3.0820e+00   2.2%  1.2134e+09   6.5%  1.490e+03   6.7%  9.668e+02   6.7%  1.245e+03   5.1%
JACOBI      249 its : 4.5530e+00   3.3%  1.2318e+09   6.6%  1.496e+03   6.7%  9.707e+02   6.7%  1.250e+03   5.2%
SOR         153 its : 5.6799e+00   4.1%  1.1769e+09   6.3%  9.200e+02   4.1%  5.970e+02   4.1%  7.700e+02   3.2%
ASM-JACOBI  249 its : 1.5616e+01  11.2%  1.2336e+09   6.6%  2.494e+03  11.2%  1.618e+03  11.2%  2.248e+03   9.3%
ASM-SOR     135 its : 1.2508e+01   9.0%  1.0914e+09   5.9%  1.354e+03   6.1%  8.786e+02   6.1%  1.222e+03   5.0%
ASM-ILU      47 its : 4.0419e+00   2.9%  3.7906e+08   2.0%  4.740e+02   2.1%  3.076e+02   2.1%  4.300e+02   1.8%

I also thought this might all be due to the small number of mesh points per
direction, so I tried a finer 45x45x45 mesh, but got the same picture:

1st program:
NONE        727 its : 3.1323e+01  97.9%  1.1996e+10  99.9%  4.364e+03  99.6%  3.227e+04  98.4%  3.641e+03  82.3%
JACOBI      729 its : 3.2667e+01  98.1%  1.2162e+10  99.9%  4.376e+03  99.6%  3.227e+04  98.4%  3.652e+03  82.3%
SOR         495 its : 3.9066e+01  98.5%  1.2901e+10  99.9%  2.972e+03  99.4%  3.220e+04  97.7%  2.481e+03  81.8%
ASM-JACOBI  729 its : 3.5595e+01  98.0%  1.2174e+10  99.9%  7.308e+03  99.8%  3.263e+04  99.1%  6.586e+03  89.3%
ASM-SOR     468 its : 4.1062e+01  98.4%  1.2608e+10  99.9%  4.698e+03  99.6%  3.276e+04  98.6%  4.235e+03  88.9%
ASM-ILU     158 its : 1.0881e+01  94.1%  4.2649e+09  99.8%  1.598e+03  98.9%  3.344e+04  96.0%  1.450e+03  86.8%

2nd program:
NONE        727 its : 3.2766e+01   8.9%  1.1996e+10  18.1%  4.364e+03  17.2%  5.585e+03  17.2%  3.641e+03  14.3%
JACOBI      729 its : 4.1030e+01  11.2%  1.2162e+10  18.4%  4.376e+03  17.3%  5.601e+03  17.2%  3.651e+03  14.4%
SOR         495 its : 5.6807e+01  15.5%  1.2901e+10  19.5%  2.972e+03  11.7%  3.804e+03  11.7%  2.480e+03   9.8%
ASM-JACOBI  729 its : 9.9106e+01  27.0%  1.2174e+10  18.4%  7.294e+03  28.8%  9.335e+03  28.7%  6.570e+03  25.9%
ASM-SOR     468 its : 9.8369e+01  28.3%  1.2608e+10  19.1%  4.684e+03  18.5%  5.995e+03  18.4%  4.219e+03  16.6%
ASM-ILU     158 its : 3.1968e+01   8.7%  4.2649e+09   6.4%  1.584e+03   6.3%  2.027e+03   6.2%  1.434e+03   5.6%

By the way, for more accurate results I now use the finer mesh (45x45x45).

I've also checked the -ksp_view output. As far as I can see it is pretty
much the same in both cases. I'm attaching the -ksp_view -log_summary
output from both programs. One more thing: I always get a tiny PETSc crash
at the very end of a run when I use the -log_summary option. I think it may
be caused by the Russian localisation of Ubuntu or something like that. It
isn't really a problem - the run completes and the output is fine, it just
often crashes at the very end.

I use my own convergence test because I can't see the point of estimating
the preconditioned residual, so I build the true residual:

PetscErrorCode MyKSPConverged(KSP ksp, PetscInt it, PetscReal rnorm,
                              KSPConvergedReason *reason, void *ctx)
{
  DA        da      = (DA) ctx;  /* the DA is passed in as the test context */
  PetscReal epsilon = 1.e-5;
  PetscInt  maxits  = 1500;
  PetscReal true_norm;
  Vec       t, V;

  DAGetGlobalVector(da, &t);     /* work vector */
  DAGetGlobalVector(da, &V);     /* will hold the true residual */

  /* Build the true (unpreconditioned) residual b - A*x; rnorm, the
     preconditioned residual norm, is ignored. Passing V as the third
     argument makes sure the residual lands in our vector, so that the
     DARestoreGlobalVector() calls below restore the right objects. */
  KSPBuildResidual(ksp, t, V, &V);
  VecNorm(V, NORM_2, &true_norm);
  /* PetscPrintf(PETSC_COMM_WORLD, "truenorm %d %20.18f\n", it, true_norm); */

  DARestoreGlobalVector(da, &t);
  DARestoreGlobalVector(da, &V);

  *reason = KSP_CONVERGED_ITERATING;  /* 0: keep iterating */

  if (true_norm <= epsilon) {
    *reason = KSP_CONVERGED_ATOL;
    PetscPrintf(PETSC_COMM_WORLD, "RAMmonitor: KSP_Converged(): Linear solver has converged. Residual norm %e is less than absolute tolerance %e at Iteration %d\n", true_norm, epsilon, it);
  }

  if (it >= maxits) {                 /* give up at the iteration limit */
    *reason = KSP_CONVERGED_ITS;
    PetscPrintf(PETSC_COMM_WORLD, "RAMmonitor: Iteration %d > limit %d\n", it, maxits);
  }

  return 0;
}
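
I hook it up like this (just a sketch; dKSP and da are the objects from my
program, and PETSC_NULL is the optional context-destroy routine):

KSPSetConvergenceTest(dKSP, MyKSPConverged, (void*)da, PETSC_NULL);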


AND THE MAIN PART: I have noticed that when I comment out part of my 2nd
program, the rest of it starts producing timing results as good as the 1st
program's. In detail:

Structure of my 1st prog:

0) INIT ALL

1) KSPSetFromOptions(dKSP);

    SOLVE:

    PetscLogStagePush(StageNum1);

    ierr = KSPSolve(dKSP, dvec_origRHS, dvec_Solution);

    PetscLogStagePop();

RUN IT WITH: -log_summary -ksp_view -ksp_type <ksp> -pc_type <pc>
-sub_ksp_type <subksp> -sub_pc_type <subpc>
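
For instance, for the ASM-Jacobi case that is (the executable name and
process count here are just placeholders):

mpiexec -n 4 ./prog1 -ksp_type cgs -pc_type asm -sub_ksp_type preonly -sub_pc_type jacobi -ksp_view -log_summary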


Structure of my 2nd prog:

0) INIT ALL

1) PCSetType(dPC, PCNONE);

   SOLVE

2) PCSetType(dPC, PCJACOBI);

   SOLVE

3) PCSetType(dPC, PCSOR);

   SOLVE

4) PCSetType(dPC, PCASM);

   KSPSetUp(dKSP);

   PCSetUp(dPC);

   PCASMGetSubKSP(dPC, &n_local, &first_local, &ASMSubKSP);

   for (i=0; i<n_local; i++)

   {

      KSPGetPC(ASMSubKSP[i], &(SubPC[i]));

      PCSetType(SubPC[i], PCJACOBI);

   }

   SOLVE

5) SET SubPC SOR like 4)

   SOLVE

6) SET SubPC ILU like 4) (steps 4-6 could share one helper; see the sketch below)

   SOLVE

RUN WITH: -log_summary -ksp_type cgs -ksp_view
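
Since steps 4-6 differ only in the sub-PC type, they could be factored into
one helper. This is only a sketch (SetASMSubPCType is a made-up name; the
calls are exactly the ones I use above):

static PetscErrorCode SetASMSubPCType(KSP dKSP, const char *subtype)
{
  PC       dPC, subpc;
  KSP      *subksp;
  PetscInt n_local, first_local, i;

  KSPGetPC(dKSP, &dPC);
  PCSetType(dPC, PCASM);
  KSPSetUp(dKSP);    /* set up before asking the ASM PC for its sub-KSPs */
  PCSetUp(dPC);
  PCASMGetSubKSP(dPC, &n_local, &first_local, &subksp);
  for (i = 0; i < n_local; i++) {
    KSPGetPC(subksp[i], &subpc);
    PCSetType(subpc, subtype);   /* PCJACOBI, PCSOR or PCILU */
  }
  return 0;
}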

So!
When I delete the 4-5-6 part of the 2nd program, 1-2-3 works great, with
results exactly like the 1st program's.
When I delete the 1-2-3 part, 4-5-6 works great, again with results exactly
like the 1st program's.
The whole program (1-2-3-4-5-6) performs badly.



2011/8/22 Jed Brown <jedbrown at mcs.anl.gov>

> On Sun, Aug 21, 2011 at 16:45, Алексей Рязанов <ram at ibrae.ac.ru> wrote:
>
>> Hello!
>>
>> Could you please help me solve my performance problem?
>> I have two programs.
>>
>> In the 1st I solve one system with one method and one preconditioner and get
>> some performance numbers.
>> I run it 9 times with 9 different preconditioners.
>>
>> In the 2nd I solve the same system with the same method but with 9
>> different preconditioners consecutively, one after another.
>> I run it once and also get performance info.
>> In the 2nd case I get 2-5 times worse results, depending on the method used.
>>
>> Each KSPSolve call is placed in its own stage, of course, so I can
>> compare times, flops, messages and so on.
>> I can see the difference but can't explain or eliminate it.
>>
>> For example, for -ksp_type cgs -pc_type asm -sub_pc_type jacobi
>> -sub_ksp_type preonly:
>>
>> Summary of Stages:        ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                           Avg     %Total     Avg     %Total     counts  %Total    Avg     %Total         counts  %Total
>> one stage from the 2nd:   5.5145e+00  14.9%  1.2336e+09  13.2%  2.494e+03  22.4%  3.230e+03  22.2%      2.250e+03  18.5%
>> the same stage from 1st:  2.7541e+00  93.1%  1.2336e+09  99.8%  2.508e+03  99.3%  1.470e+04  97.4%      2.266e+03  88.0%
>>
>>
>> My programs are practically identical except for the part that defines the
>> preconditioners and the number of KSPSolve calls.
>> They use the same matrices, assemble them the same way, and use the same
>> right-hand sides and convergence tests.
>> Actually the 2nd one was made from the 1st.
>>
>> In the 1st I use KSPSetFromOptions(dKSP); and then just set the -ksp_type,
>> -pc_type, -sub_pc_type, -sub_ksp_type options on the command line.
>>
>> In the 2nd I use, for a non-block PC:
>>
>>   KSPGetPC(dKSP, &dPC);
>>
>>   PCSetType(dPC, PCJACOBI);
>>
>> and for a block PC:
>>
>>   PCSetType(dPC, PCASM);
>>
>>   KSPSetUp(dKSP);
>>
>>   PCSetUp(dPC);
>>
>>   PCASMGetSubKSP(dPC, &n_local, &first_local, &ASMSubKSP);
>>
>>   for (i=0; i<n_local; i++)
>>
>>   {
>>
>>     KSPGetPC(ASMSubKSP[i], &(SubPC[i]));
>>
>>     PCSetType(SubPC[i], PCJACOBI);
>>
>>   }
>>
>>
>> I'm sure there is a mistake somewhere, because the 1st program compares
>> the Jacobi and ASM-Jacobi preconditioners on my problem with the same KSP
>> and tells me that ASM-Jacobi is better, while the 2nd shows the opposite.
>>
>
> This could be a preload issue. You can use the PreLoadBegin()/PreLoadEnd()
> macros if you like, or otherwise solve a system first to make sure
> everything has been loaded. If the results are still confusing, run with
> -ksp_view -log_summary and send the output.
>
> There is no reason for ASM-Jacobi (with -sub_ksp_type preonly, which is
> default) to be better than Jacobi since it does the same algorithm with more
> communication.
>
Attachments:
- KSPview 1st program.out (11780 bytes): <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110822/0ff2be67/attachment-0003.obj>
- KSPview 2nd program 123456.out (30805 bytes): <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110822/0ff2be67/attachment-0004.obj>
- KSPview 2nd program 456.out (22794 bytes): <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110822/0ff2be67/attachment-0005.obj>

