[petsc-users] Call to PETSc functions way higher when using lower number of processors

Wed Jul 1 18:34:10 CDT 2015

  It looks like you are using the default preconditioner of block Jacobi with one block per process and ILU on each block.  

  "Normally" with block Jacobi, as you use more blocks the convergence rate gets worse in a reasonably monotonic way. But if the particular decomposition into blocks "drops" important parts of the operator, the convergence can be very different with slight changes in the blocks. It looks like your problem has this kind of structure, in spades; or there is something wrong with your parallel construction of the matrix entries, resulting in very different (and wrong) linear systems for different numbers of processors.
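
  To make the "dropping" idea concrete, here is a toy sketch in plain Python (not PETSc, and not your matrix). It assumes a small chain-like matrix with one very strong coupling and contiguous, nearly equal blocks; the helper names chain_matrix, block_ranges and dropped_weight are made up for this illustration. It only measures how much off-block coupling a given block count discards, which is the part block Jacobi ignores.

```python
def chain_matrix(n, weak, strong, strong_at):
    """Tridiagonal toy matrix: weak coupling everywhere except one strong
    coupling between rows strong_at and strong_at + 1 (hypothetical example)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
    for i in range(n - 1):
        c = strong if i == strong_at else weak
        A[i][i + 1] = A[i + 1][i] = -c
    return A


def block_ranges(n, nblocks):
    """Split rows 0..n-1 into nblocks contiguous, nearly equal ranges
    (an assumed layout, chosen here only for illustration)."""
    base, extra = divmod(n, nblocks)
    ranges, start = [], 0
    for b in range(nblocks):
        size = base + (1 if b < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def dropped_weight(A, nblocks):
    """Sum of |a_ij| over entries that fall outside the diagonal blocks,
    i.e. the coupling a block Jacobi preconditioner does not see."""
    n = len(A)
    owner = [0] * n
    for b, (lo, hi) in enumerate(block_ranges(n, nblocks)):
        for i in range(lo, hi):
            owner[i] = b
    return sum(abs(A[i][j])
               for i in range(n) for j in range(n)
               if owner[i] != owner[j])


A = chain_matrix(12, weak=0.01, strong=100.0, strong_at=5)
for nblocks in (1, 2, 3, 4):
    print(nblocks, dropped_weight(A, nblocks))
```

  In this toy case 2 and 4 blocks cut straight through the strong coupling (dropped weight around 200), while 3 blocks keep it inside one block (dropped weight around 0.04). So more blocks is not automatically worse; what matters is where the block boundaries land.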

  I suggest you run the following experiment: run with ONE process but use -pc_type bjacobi -sub_pc_type ilu -pc_bjacobi_blocks <blocks>, with <blocks> ranging from 1 up to 24, and record the number of iterations needed for each (don't worry about the time it takes; this is done to understand the convergence). Send the table of

Blocks      Iterations
1                 a1
2                 a2
....
24               a24 

and from this you'll be able to see if your matrix does indeed have the special "sensitivity" to the blocks. Till then no speculation.
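
  If it helps, the sweep can be scripted. The following is only a sketch in Python under some assumptions: "./my_app" is a placeholder for your executable, the program is assumed to pick up PETSc options from the command line and to run on one process when launched directly, and -ksp_converged_reason is added so that each solve prints a line ending in "iterations <n>", which the script parses. The function names are made up for this example.

```python
import re
import subprocess

# Matches the iteration count in lines such as
# "Linear solve converged due to CONVERGED_RTOL iterations 12"
# (assumed output format of -ksp_converged_reason).
_ITERATIONS = re.compile(r"iterations (\d+)")


def solver_options(blocks):
    """PETSc options for block Jacobi with ILU on each of `blocks` blocks."""
    return ["-pc_type", "bjacobi",
            "-sub_pc_type", "ilu",
            "-pc_bjacobi_blocks", str(blocks),
            "-ksp_converged_reason"]


def parse_iterations(output):
    """Return the iteration count of every linear solve reported in output."""
    return [int(n) for n in _ITERATIONS.findall(output)]


def sweep(executable, max_blocks=24):
    """Run the program once per block count and collect iteration counts."""
    rows = []
    for blocks in range(1, max_blocks + 1):
        result = subprocess.run([executable] + solver_options(blocks),
                                capture_output=True, text=True, check=True)
        rows.append((blocks, parse_iterations(result.stdout)))
    return rows


def format_table(rows):
    """Format the rows as a Blocks / Iterations table (one entry per solve)."""
    lines = ["Blocks      Iterations"]
    for blocks, iterations in rows:
        lines.append(f"{blocks:<12}{' '.join(map(str, iterations))}")
    return "\n".join(lines)
```

  Calling print(format_table(sweep("./my_app"))) would then print one row per block count, with one iteration count for each of the linear solves in the run.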

  Barry

> On Jul 1, 2015, at 4:29 PM, Jose A. Abell M. <jaabell at ucdavis.edu> wrote:
> 
> Dear PETSc-users,
> 
> I'm running the same dummy simulation (which involves solving a 10000 x 10000 linear system of equations 10 times) using 12 and 18 processors on an SMP machine. With 18 processors I spend 3.5s on PETSc calls, with 12 I spend ~260s.
> 
> Again, the matrix is the same, the only difference is the number of processors, which would affect the ordering of the matrix rows and columns as the domain gets partitioned differently.
> 
> When looking at the performance log I see:
> 
> For 12 processors:
> 
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatSolve          103340 1.0 8.6910e+01 1.2 7.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 31 34  0  0  0  31 34  0  0  0 10113
> 
> and for 18 processors:
> 
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatSolve             108 1.0 6.9855e-02 1.4 5.25e+07 1.1 0.0e+00 0.0e+00 0.0e+00  2 32  0  0  0   2 32  0  0  0 13136
> 
> 
> 
> The MatSolve count is so large in the slow case. It is similar for other operations like MatMult and all the vector-oriented operations. I've included the complete logs for these cases.
> 
> What is the main driver behind the number of calls to these functions being so high? Is it only the matrix ordering to blame or maybe there is something else I'm missing?
> 
> Regards and thanks!
> 
> 
> --
> 
> José Abell 
> PhD Candidate
> Computational Geomechanics Group
> Dept. of Civil and Environmental Engineering
> UC Davis
> 
> <petsc_log_slow.txt><petsc_log_fast.txt>