[petsc-users] performance issue solving multiple linear systems of the same size with different preconditioning methods
Алексей Рязанов
ram at ibrae.ac.ru
Tue Aug 23 02:37:59 CDT 2011
Thank you for your answer!
I tried and failed to learn to use the PreLoad procedures. But of course there
was no problem to change the
PetscLogStagePush(StageNum1);
KSPSolve(dKSP, dvec_origRHS, dvec_Solution);
PetscLogStagePop();
part of my 2nd program to
KSPSolve(dKSP, dvec_origRHS, dvec_Solution);
PetscLogStagePush(StageNum1);
KSPSolve(dKSP, dvec_origRHS, dvec_Solution);
PetscLogStagePop();
to "make sure everything has been loaded". Of course out of the
Stage-brackets. It was my epic fail. In this "preloaded mode" Stage
statistics looks much worse. Here it is:
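For the record, this is roughly what I was trying to do with the PreLoad
macros instead (a sketch only, not code I got working; the stage name "Solve"
is my own choice and I assume the usual ierr declaration):

PreLoadBegin(PETSC_TRUE, "Solve");
  /* As far as I understand, this block runs twice when preloading is on:
     the first pass only warms everything up (loads libraries, fills caches)
     and the second pass is the one whose timings should be trusted. */
  ierr = KSPSolve(dKSP, dvec_origRHS, dvec_Solution); CHKERRQ(ierr);
PreLoadEnd();

Here are the numbers: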
30x30x30 mesh
1st program (each column: stage value and its % of the whole run):
                         Time              Flops            Messages        Msg Lengths      Reductions
NONE        248 its:  2.4834e+00 92.6%  1.2134e+09 99.8%  1.490e+03 98.8%  1.423e+04 95.5%  1.246e+03 80.4%
JACOBI      249 its:  2.5627e+00 92.9%  1.2318e+09 99.8%  1.496e+03 98.8%  1.423e+04 95.5%  1.252e+03 80.4%
SOR         153 its:  2.7673e+00 93.2%  1.1769e+09 99.8%  9.200e+02 98.1%  1.412e+04 92.9%  7.710e+02 78.7%
ASM-JACOBI  249 its:  2.9196e+00 93.3%  1.2336e+09 99.8%  2.508e+03 99.3%  1.470e+04 97.4%  2.266e+03 88.0%
ASM-SOR     135 its:  2.7965e+00 93.2%  1.0915e+09 99.7%  1.368e+03 98.7%  1.494e+04 95.3%  1.238e+03 86.4%
ASM-ILU      47 its:  8.3341e-01 79.4%  3.8922e+08 99.2%  4.880e+02 96.4%  1.588e+04 88.8%  4.510e+02 80.5%
2nd program:
                         Time              Flops            Messages        Msg Lengths      Reductions
NONE        248 its:  2.6047e+00  7.0%  1.2134e+09 13.0%  1.490e+03 13.4%  1.929e+03 13.3%  1.246e+03 10.2%
JACOBI      249 its:  3.0119e+00  8.1%  1.2318e+09 13.2%  1.496e+03 13.5%  1.937e+03 13.3%  1.251e+03 10.3%
SOR         153 its:  3.5001e+00  9.4%  1.1769e+09 12.6%  9.200e+02  8.3%  1.191e+03  8.2%  7.700e+02  6.3%
ASM-JACOBI  249 its:  5.4581e+00 14.6%  1.2336e+09 13.2%  2.494e+03 22.4%  3.230e+03 22.2%  2.250e+03 18.5%
ASM-SOR     135 its:  5.3384e+00 14.3%  1.0915e+09 11.7%  1.354e+03 12.2%  1.753e+03 12.0%  1.222e+03 10.0%
ASM-ILU      47 its:  1.9040e+00  5.1%  3.8922e+08  4.2%  4.740e+02  4.3%  6.138e+02  4.2%  4.350e+02  3.6%
2nd program with "preloading":
                         Time              Flops            Messages        Msg Lengths      Reductions
NONE        248 its:  3.0820e+00  2.2%  1.2134e+09  6.5%  1.490e+03  6.7%  9.668e+02  6.7%  1.245e+03  5.1%
JACOBI      249 its:  4.5530e+00  3.3%  1.2318e+09  6.6%  1.496e+03  6.7%  9.707e+02  6.7%  1.250e+03  5.2%
SOR         153 its:  5.6799e+00  4.1%  1.1769e+09  6.3%  9.200e+02  4.1%  5.970e+02  4.1%  7.700e+02  3.2%
ASM-JACOBI  249 its:  1.5616e+01 11.2%  1.2336e+09  6.6%  2.494e+03 11.2%  1.618e+03 11.2%  2.248e+03  9.3%
ASM-SOR     135 its:  1.2508e+01  9.0%  1.0914e+09  5.9%  1.354e+03  6.1%  8.786e+02  6.1%  1.222e+03  5.0%
ASM-ILU      47 its:  4.0419e+00  2.9%  3.7906e+08  2.0%  4.740e+02  2.1%  3.076e+02  2.1%  4.300e+02  1.8%
I also thought it all might be due to the small number of mesh points per
direction and used a finer mesh, but got the same results:
45x45x45 mesh
1st program:
                         Time              Flops            Messages        Msg Lengths      Reductions
NONE        727 its:  3.1323e+01 97.9%  1.1996e+10 99.9%  4.364e+03 99.6%  3.227e+04 98.4%  3.641e+03 82.3%
JACOBI      729 its:  3.2667e+01 98.1%  1.2162e+10 99.9%  4.376e+03 99.6%  3.227e+04 98.4%  3.652e+03 82.3%
SOR         495 its:  3.9066e+01 98.5%  1.2901e+10 99.9%  2.972e+03 99.4%  3.220e+04 97.7%  2.481e+03 81.8%
ASM-JACOBI  729 its:  3.5595e+01 98.0%  1.2174e+10 99.9%  7.308e+03 99.8%  3.263e+04 99.1%  6.586e+03 89.3%
ASM-SOR     468 its:  4.1062e+01 98.4%  1.2608e+10 99.9%  4.698e+03 99.6%  3.276e+04 98.6%  4.235e+03 88.9%
ASM-ILU     158 its:  1.0881e+01 94.1%  4.2649e+09 99.8%  1.598e+03 98.9%  3.344e+04 96.0%  1.450e+03 86.8%
2nd program:
                         Time              Flops            Messages        Msg Lengths      Reductions
NONE        727 its:  3.2766e+01  8.9%  1.1996e+10 18.1%  4.364e+03 17.2%  5.585e+03 17.2%  3.641e+03 14.3%
JACOBI      729 its:  4.1030e+01 11.2%  1.2162e+10 18.4%  4.376e+03 17.3%  5.601e+03 17.2%  3.651e+03 14.4%
SOR         495 its:  5.6807e+01 15.5%  1.2901e+10 19.5%  2.972e+03 11.7%  3.804e+03 11.7%  2.480e+03  9.8%
ASM-JACOBI  729 its:  9.9106e+01 27.0%  1.2174e+10 18.4%  7.294e+03 28.8%  9.335e+03 28.7%  6.570e+03 25.9%
ASM-SOR     468 its:  9.8369e+01 28.3%  1.2608e+10 19.1%  4.684e+03 18.5%  5.995e+03 18.4%  4.219e+03 16.6%
ASM-ILU     158 its:  3.1968e+01  8.7%  4.2649e+09  6.4%  1.584e+03  6.3%  2.027e+03  6.2%  1.434e+03  5.6%
By the way, for more accurate results I now use the last one (45x45x45).
I've also checked the -ksp_view output; as far as I can see it is pretty much
the same in both cases. I'm attaching the -ksp_view -log_summary results from
both programs. On a side note: I always get a tiny PETSc crash at the end of
the run when I use the -log_summary option. I think it may be caused by the
Russian localisation of Ubuntu or something like that. It doesn't really
bother me: the program often crashes at the very end, but otherwise works
properly.
I use my own convergence test because I can't see the point of estimating the
preconditioned residual, so I build the true residual:
PetscErrorCode MyKSPConverged(KSP ksp, PetscInt it, PetscReal rnorm,
                              KSPConvergedReason *reason, void *ctx)
{
  DA        da      = (DA)ctx;   /* the DA is passed in as the context */
  PetscReal true_norm;
  PetscReal epsilon = 1.e-5;
  PetscInt  maxits  = 1500;
  Vec       t, V;

  DAGetGlobalVector(da, &t);
  DAGetGlobalVector(da, &V);
  KSPBuildResidual(ksp, t, PETSC_NULL, &V);   /* true residual b - Ax */
  VecNorm(V, NORM_2, &true_norm);
  /* PetscPrintf(PETSC_COMM_WORLD, "truenorm %d %20.18f\n", it, true_norm); */
  DARestoreGlobalVector(da, &t);
  DARestoreGlobalVector(da, &V);

  *reason = 0;   /* KSP_CONVERGED_ITERATING: keep iterating */
  if (true_norm <= epsilon) {
    *reason = KSP_CONVERGED_ATOL;
    PetscPrintf(PETSC_COMM_WORLD, "RAMmonitor: KSP_Converged(): Linear solver has converged. Residual norm %e is less than absolute tolerance %e at iteration %d\n", true_norm, epsilon, it);
  }
  if (it >= maxits) {
    *reason = KSP_CONVERGED_ITS;
    PetscPrintf(PETSC_COMM_WORLD, "RAMmonitor: Iteration %d > limit %d\n", it, maxits);
  }
  return 0;
}
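For reference, I attach this test to the solver roughly like this (a sketch;
"dDA" stands for my DA object, and I pass PETSC_NULL as the context-destroy
routine because the DA is destroyed elsewhere):

/* register MyKSPConverged as the convergence test, with the DA as context */
KSPSetConvergenceTest(dKSP, MyKSPConverged, (void*)dDA, PETSC_NULL);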
AND THE MAIN PART: I have noticed that when I comment out part of my 2nd
program, the remaining part starts to show timing results as good as the 1st
program's. In detail:
Structure of my 1st prog:
0) INIT ALL
1) KSPSetFromOptions(dKSP);
   SOLVE:
   PetscLogStagePush(StageNum1);
   ierr = KSPSolve(dKSP, dvec_origRHS, dvec_Solution);
   PetscLogStagePop();
RUN IT WITH: -log_summary -ksp_type KSP -pc_type PC -sub_ksp_type subKSP -sub_pc_type subPC -ksp_view
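So, for example, the ASM-Jacobi case from the tables above is launched like
this (the binary name "prog1" and the process count are just for
illustration):

mpiexec -n 4 ./prog1 -log_summary -ksp_type cgs -pc_type asm -sub_pc_type jacobi -sub_ksp_type preonly -ksp_view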
Structure of my 2nd prog:
0) INIT ALL
1) PCSetType(dPC, PCNONE);
   SOLVE
2) PCSetType(dPC, PCJACOBI);
   SOLVE
3) PCSetType(dPC, PCSOR);
   SOLVE
4) PCSetType(dPC, PCASM);
   KSPSetUp(dKSP);
   PCSetUp(dPC);
   PCASMGetSubKSP(dPC, &n_local, &first_local, &ASMSubKSP);
   for (i = 0; i < n_local; i++) {
     KSPGetPC(ASMSubKSP[i], &(SubPC[i]));
     PCSetType(SubPC[i], PCJACOBI);
   }
   SOLVE
5) SET the SubPCs to PCSOR, as in 4)
   SOLVE
6) SET the SubPCs to PCILU, as in 4)
   SOLVE
RUN WITH: -log_summary -ksp_type cgs -ksp_view
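Each SOLVE above is wrapped in its own logging stage, as in the 1st program,
created roughly like this (a sketch; the stage name is just an example):

PetscLogStage stage;
PetscLogStageRegister("ASM-JACOBI solve", &stage);   /* one stage per preconditioner */
PetscLogStagePush(stage);
KSPSolve(dKSP, dvec_origRHS, dvec_Solution);
PetscLogStagePop();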
So:
When I delete the 4-5-6 part of the 2nd program, 1-2-3 works great, with
results exactly like the 1st program's.
When I delete the 1-2-3 part, 4-5-6 works great, again with results exactly
like the 1st program's.
The whole program (1-2-3-4-5-6) works badly.
2011/8/22 Jed Brown <jedbrown at mcs.anl.gov>
> On Sun, Aug 21, 2011 at 16:45, Алексей Рязанов <ram at ibrae.ac.ru> wrote:
>
>> Hello!
>>
>> Could you please help me to solve my performance problem.
>> I have two programs.
>>
>> In 1st I solve one system with one method and one preconditioner and get
>> some performance numbers.
>> I run it 9 times with 9 different preconditioners.
>>
>> In 2nd I solve the same system with the same one method but with 9
>> different preconditioners consecutively one after another.
>> I run it once and also get some performance info.
>> In the 2nd case I get 2-5 times worse results, depending on the method used.
>>
>> Each KSPSolve call is placed in its own stage, of course, so I can
>> compare times, flops, messages and so on.
>> I can see the difference but can't explain or eliminate it.
>>
>> For example, for -ksp_type cgs -pc_type asm -sub_pc_type jacobi
>> -sub_ksp_type preonly:
>>
>> Summary of Stages (each column: value and % of total):
>>                                  Time              Flops            Messages        Msg Lengths      Reductions
>> one stage from the 2nd:       5.5145e+00 14.9%  1.2336e+09 13.2%  2.494e+03 22.4%  3.230e+03 22.2%  2.250e+03 18.5%
>> the same stage from the 1st:  2.7541e+00 93.1%  1.2336e+09 99.8%  2.508e+03 99.3%  1.470e+04 97.4%  2.266e+03 88.0%
>>
>>
>> My programs are practically equivalent except for the part that defines the
>> preconditioners and the number of KSPSolve calls.
>> I mean they use the same matrices, assemble them the same way, and use the
>> same right-hand sides and convergence monitors.
>> Actually the 2nd one was made from the 1st.
>>
>> In the 1st I use KSPSetFromOptions(dKSP); and then just set the -ksp_type
>> -pc_type -sub_pc_type -sub_ksp_type keys from the command line.
>>
>> In the 2nd I use, for a non-block PC:
>>
>> KSPGetPC(dKSP, &dPC);
>> PCSetType(dPC, PCJACOBI);
>>
>> and for a block PC:
>>
>> PCSetType(dPC, PCASM);
>> KSPSetUp(dKSP);
>> PCSetUp(dPC);
>> PCASMGetSubKSP(dPC, &n_local, &first_local, &ASMSubKSP);
>> for (i = 0; i < n_local; i++) {
>>   KSPGetPC(ASMSubKSP[i], &(SubPC[i]));
>>   PCSetType(SubPC[i], PCJACOBI);
>> }
>>
>>
>> I'm sure there is a mistake somewhere, because the 1st program compares
>> Jacobi and ASM-Jacobi preconditioners on my problem with the same KSP and
>> tells me that ASM-Jacobi is better, while the 2nd shows the opposite.
>>
>
> This could be a preload issue. You can use the PreLoadBegin()/PreLoadEnd()
> macros if you like, or otherwise solve a system first to make sure
> everything has been loaded. If the results are still confusing, run with
> -ksp_view -log_summary and send the output.
>
> There is no reason for ASM-Jacobi (with -sub_ksp_type preonly, which is
> default) to be better than Jacobi since it does the same algorithm with more
> communication.
>
--
Best regards,
Alexey Ryazanov
______________________________________
Nuclear Safety Institute of Russian Academy of Sciences
-------------- next part --------------
Attachments (scrubbed by the list archive):
  KSPview 2nd program 456.out (application/octet-stream, 22794 bytes):
  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110823/529f91bd/attachment-0003.obj>
  KSPview 2nd program 123456.out (application/octet-stream, 30805 bytes):
  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110823/529f91bd/attachment-0004.obj>
  KSPview 1st program.out (application/octet-stream, 11780 bytes):
  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110823/529f91bd/attachment-0005.obj>