[petsc-users] approaches to reduce computing time

Sun Nov 10 13:20:18 CST 2013

Roc Wang <pengxwang at hotmail.com> writes:

> Hi all,
>
>    I am trying to minimize the computing time needed to solve a large sparse matrix problem. The problem dimensions are m=321, n=321, and p=321. I am trying to reduce the computing time from two directions: (1) finding a preconditioner that reduces the number of iterations, and (2) requesting more cores.
>
> ----For the first direction, I tried several options:
>  1 default KSP and PC,
>  2 -ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp  -ksp_pc_type jacobi, 
>  3 -ksp_type lgmres  -ksp_gmres_restart 40 -ksp_lgmres_augment 10,
>  4 -ksp_type lgmres  -ksp_gmres_restart 50 -ksp_lgmres_augment 10,
>  5 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm (PCASM)
>
> The iteration counts and timings with 128 cores requested are as follows:
> case#   iter   timing (s)
> 1       1436      816
> 2          3    12658
> 3       1069      669.64
> 4        872      768.12
> 5        927      513.14
>
> It can be seen that changing -ksp_gmres_restart and -ksp_lgmres_augment can reduce the number of iterations but not the timing (compare cases 3 and 4). Second, PCASM helps a lot. Although case 2 needs far fewer iterations, its timing increases very much. Is it because more operations are needed in the PC?
>
> My questions here are: 1. Which direction should I take to select
> -ksp_gmres_restart and -ksp_lgmres_augment? For example, is a larger
> restart with a larger augment better, or a larger restart with a
> smaller augment?

Look at the -log_summary.  By increasing the restart, the work in
KSPGMRESOrthog will increase linearly, but the number of iterations
might decrease enough to compensate.  There is no general rule here
since it depends on the relative expense of operations for your problem
on your machine.
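
As a rough illustration of that trade-off, the sketch below divides the
total time by the iteration count for each case in the 128-core table
quoted above. This is only an estimate: the reported timings presumably
include setup as well as the solve, and the per-event breakdown in
-log_summary is the authoritative source. The names `cases` and
`seconds_per_iteration` are made up for this example.

```python
# Iteration counts and timings (s) copied from the 128-core table in the question.
# Assumption: each timing is the total run time, so setup cost is mixed in.
cases = {
    1: (1436, 816.0),
    2: (3, 12658.0),
    3: (1069, 669.64),
    4: (872, 768.12),
    5: (927, 513.14),
}


def seconds_per_iteration(iterations, seconds):
    """Average cost of one outer iteration, in seconds."""
    return seconds / iterations


per_iter = {case: seconds_per_iteration(*data) for case, data in cases.items()}

for case, cost in per_iter.items():
    print(f"case {case}: {cost:.3f} s/iter")
```

Comparing cases 3 and 4, the restart of 50 takes fewer iterations but
each one costs more on average, which is the kind of compensation (or
lack of it) to look for in the log.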

> ----For the second method, I tried -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm with different numbers of cores. I found that the speedup ratio increases slowly once more than 32 to 64 cores are requested. I searched the mailing list archives and found that I am very likely running into the memory bandwidth bottleneck (http://www.mail-archive.com/petsc-users@mcs.anl.gov/msg19152.html):
>
> # of cores   iter     timing
>     1         923   19541.83
>     4         929    5897.06
>     8         932    4854.72
>    16         924    1494.33
>    32         924    1480.88
>    64         928     686.89
>   128         927     627.33
>   256         926     552.93
> 256                 926       552.93

The bandwidth issue has more to do with using multiple cores within a
node rather than between nodes.  Likely the above is a load balancing
problem or bad communication.
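
To make the scaling in the quoted table easier to read, here is a small
sketch that computes speedup and parallel efficiency relative to the
single-core run. It uses only the numbers from the table; the timings
are assumed to be in seconds, and the names `timings` and `scaling` are
made up for this example. It does not diagnose the cause, which still
needs -log_summary.

```python
# (cores, timing) pairs copied from the table in the question.
# Assumption: timings are wall-clock seconds for the whole run.
timings = [
    (1, 19541.83),
    (4, 5897.06),
    (8, 4854.72),
    (16, 1494.33),
    (32, 1480.88),
    (64, 686.89),
    (128, 627.33),
    (256, 552.93),
]


def scaling(data):
    """Return (cores, speedup, efficiency) relative to the first entry."""
    base_cores, base_time = data[0]
    rows = []
    for cores, seconds in data:
        speedup = base_time / seconds
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


rows = scaling(timings)

for cores, speedup, efficiency in rows:
    print(f"{cores:4d} cores: speedup {speedup:6.2f}, efficiency {efficiency:5.1%}")
```

The steps are uneven: the time barely changes from 16 to 32 cores and
drops sharply from 8 to 16, so the curve is not a smooth saturation.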

> My question here is: Is there any other PC that can help with both reducing iterations and increasing scalability? Thanks.

Always send -log_summary with questions like this, but algebraic multigrid is a good place to start.