[petsc-dev] sor smoothers

Mon Aug 12 18:19:30 CDT 2013

  Jed,

   I don't understand your computations. I run

./ex29 -ksp_type chebyshev -ksp_max_it 2 -ksp_monitor -pc_type eisenstat -log_summary -ksp_view -ksp_chebyshev_eigenvalues 1.0,2.0 -ksp_norm_type none

85	  ierr =   MatSOR(eis->A,eis->b[pc->presolvedone-1],eis->omega,(MatSORType)(SOR_ZERO_INITIAL_GUESS | SOR_LOCAL_FORWARD_SWEEP),0.0,1,1,b);CHKERRQ(ierr);
Breakpoint 2, MatSOR (mat=0x7f91ac1a7e70, b=0x7f91ac1cf470, omega=1, flag=SOR_EISENSTAT, shift=0, its=1, lits=1, x=0x7f91ac1db070) at matrix.c:3681
Breakpoint 2, MatSOR (mat=0x7f91ac1a7e70, b=0x7f91ac1d5270, omega=1, flag=SOR_EISENSTAT, shift=0, its=1, lits=1, 
102	  ierr = MatSOR(eis->A,eis->b[pc->presolvedone],eis->omega,(MatSORType)(SOR_ZERO_INITIAL_GUESS | SOR_LOCAL_BACKWARD_SWEEP),0.0,1,1,x);CHKERRQ(ierr);

so get .5 + 1 + 1 + .5 = 3 work units (where a work unit is a matmult).  

I than run snes ex19 with -da_refine 6 so that the matrix ops swamp the vector ops and get 

MatMult                2 1.0 1.6258e-02 1.0 1.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 12  0  0  0   0 12  0  0  0   735
MatSOR                 4 1.0 4.9209e-02 1.0 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 17  0  0  0   0 17  0  0  0   363

note that the MatMult() time and flop counts are included in the SOR time.

Now I run with 

./ex29 -ksp_type chebyshev -ksp_max_it 2 -ksp_monitor -pc_type sor  -log_summary -ksp_view -ksp_chebyshev_eigenvalues 1.0,2.0 -ksp_norm_type none 

#1  0x000000010232ca57 in PCApply_SOR (pc=0x7ffb4697a470, x=0x7ffb469dc470, y=0x7ffb469d0870) at sor.c:35
35	  ierr = MatSOR(pc->pmat,x,jac->omega,(MatSORType)flag,jac->fshift,jac->its,jac->lits,y);CHKERRQ(ierr);
(gdb) p flag
$1 = 28

Breakpoint 2, MatMult (mat=0x7ffb469a9270, x=0x7ffb469d0870, y=0x7ffb469dc470) at matrix.c:2212
Breakpoint 1, MatSOR (mat=0x7ffb469a9270, b=0x7ffb469dc470, omega=1, flag=28, shift=0, its=1, lits=1, x=0x7ffb469d6670) at 
Breakpoint 2, MatMult (mat=0x7ffb469a9270, x=0x7ffb469d6670, y=0x7ffb469dc470) at matrix.c:2212
Breakpoint 1, MatSOR (mat=0x7ffb469a9270, b=0x7ffb469dc470, omega=1, flag=28, shift=0, its=1, lits=1, x=0x7ffb469a1670) at 

so get (.5 + .5) + 1 + (.5 + .5) + 1 + (.5 + .5) = 5 work units  

Note that for the first iteration with a zero initial guess the down AND UP sweep together can be done in one work unit; the values in the downsweep are saved and used to save work in the up sweep.

MatMult                2 1.0 1.1801e-02 1.0 1.16e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0   981
MatSOR                 3 1.0 4.6818e-02 1.0 1.78e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 16  0  0  0   0 16  0  0  0   380

Thus we see that we save all of the MatMult time which is 2 units of the 5 units needed with SOR in terms of flops computed so 40% of the work but only 20% of the time.

On the post-smooth of the multigrid there is a nonzero initial guess eisenstat does 

  if (nonzero) {
    ierr = VecCopy(x,eis->b[pc->presolvedone-1]);CHKERRQ(ierr);
    ierr = MatSOR(eis->A,eis->b[pc->presolvedone-1],eis->omega,SOR_APPLY_UPPER,0.0,1,1,x);CHKERRQ(ierr);

so an extra .5 work unit 

while Chebychev does the matrix vector product to get the initial residual so 

Eisenstat is 3 units + .5 unit + 1 unit = 4.5 units
SOR           5  units               + 1 unit = 6 units 

so for combined pre and post smooth Eisenstat/SOR = 7.5/11 work units

Sending to petsc-dev so this get archived.

Barry

I also realized that the trick of applying the symmetric down then up in only .5 work units + .5 units by saving the intermediate sums could probably by used in for future iterations as well (the   while (its--) { case in MatSOR_SeqAIJ()) this would improve the speed of plain old SOR(3) etc a great deal. At the expensive of even more convoluted code :-).

Note also the Eisentat introduces several extra flops per grid point in vector operations so for the 5 point stencil the work units are a poor measure of actual flops performed.

On Aug 12, 2013, at 3:13 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> "Mark F. Adams" <mfadams at lbl.gov> writes:
>> In correction scheme MG (not FAS) pre smoothing has no initial
>> solution but post smoothing does.  So if we are doing one symmetric GS
>> (two GS iterations) as smoothers then Eisenstat would save us half the
>> work on the first (of 4) G-S iteration and so about 1/8 reduction in
>> work.  Is that correct?
> 
> Cheby(2)+SOR with -ksp_norm_type none currently does
> 
>  (2 units left precond SSOR) + 2*(2 units SSOR + 1 MatMult) = 8 units down-smooth
> 
> and 9 units for nonzero initial guess (need a MatMult up-front).  I
> think Mark is complaining about the half work unit that can be saved in
> the left-precond SSOR starting with zero initial guess.
> 
> With Eisenstat, we have
> 
>  (.5 unit pre-solve with zero guess) + 2*(2 units SSOR) + (1 unit post-solve) = 5.5 units down
> 
> and 7 units in the up-smoother due to a MatMult and the pre-solve being
> a full sweep.  Adding these up, I think I'm seeing a minimum of 16.5
> work units for SOR pre+post-smoothing as compared to 12.5 work units for
> Eisenstat.
> 
> Does this sound right?