[petsc-users] Question about ksp ex3.c

Matthew Knepley knepley at gmail.com
Thu Sep 29 07:44:53 CDT 2011


On Thu, Sep 29, 2011 at 11:28 AM, Matija Kecman <matijakecman at gmail.com> wrote:

> Thanks for your response, Jed! I've been doing some other
> investigations using this example. I made some small modifications:
>
> 1. Added preallocation as Jed Brown suggested in a previous email
> (http://lists.mcs.anl.gov/pipermail/petsc-users/2011-June/009054.html).
> 2. Added a small VTK viewer.
> 3. Set the initial guess to zero.
> 4. Changed the entries in the element stiffness matrix to the following:
>
>   Ke[ 0] =  2./3.; Ke[ 1] = -1./6.; Ke[ 2] = -1./3.; Ke[ 3] = -1./6.;
>   Ke[ 4] = -1./6.; Ke[ 5] =  2./3.; Ke[ 6] = -1./6.; Ke[ 7] = -1./3.;
>   Ke[ 8] = -1./3.; Ke[ 9] = -1./6.; Ke[10] =  2./3.; Ke[11] = -1./6.;
>   Ke[12] = -1./6.; Ke[13] = -1./3.; Ke[14] = -1./6.; Ke[15] =  2./3.;
>
> I computed these by evaluating $K^e_{ij} = \int_{\Omega_e} \nabla
> \psi^e_i \cdot \nabla \psi^e_j \, \mathrm{d}\Omega$, where the shape
> functions $\psi^e$ are those of a bilinear quadrilateral finite
> element and $\Omega_e$ is the element domain. This is different from
> what was originally in the code, and I am not sure where the original
> entries come from. This isn't important, so you can ignore it if you
> like; I get the same solution using both matrices.
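>
> For reference, here is a minimal stand-alone sketch (not taken from
> ex3.c; it assumes a unit-square element with nodes ordered
> counter-clockwise as (0,0), (1,0), (1,1), (0,1)) that reproduces these
> entries by 2x2 Gauss quadrature:
>
>     #include <stdio.h>
>     #include <math.h>
>
>     int main(void)
>     {
>       double Ke[16] = {0.0};            /* element stiffness matrix */
>       double gp[2], x, y, dNdx[4], dNdy[4];
>       int    q, r, i, j;
>
>       gp[0] = 0.5 - 0.5/sqrt(3.0);      /* 1-D Gauss points on [0,1] */
>       gp[1] = 0.5 + 0.5/sqrt(3.0);
>
>       for (q = 0; q < 2; q++) {
>         for (r = 0; r < 2; r++) {
>           x = gp[q]; y = gp[r];
>           /* gradients of the four bilinear shape functions at (x,y) */
>           dNdx[0] = -(1-y); dNdx[1] =  (1-y); dNdx[2] = y; dNdx[3] = -y;
>           dNdy[0] = -(1-x); dNdy[1] = -x;     dNdy[2] = x; dNdy[3] =  (1-x);
>           for (i = 0; i < 4; i++)
>             for (j = 0; j < 4; j++)     /* each 2x2 weight is 1/4 */
>               Ke[4*i+j] += 0.25*(dNdx[i]*dNdx[j] + dNdy[i]*dNdy[j]);
>         }
>       }
>       for (i = 0; i < 4; i++)
>         printf("% .6f % .6f % .6f % .6f\n",
>                Ke[4*i], Ke[4*i+1], Ke[4*i+2], Ke[4*i+3]);
>       return 0;
>     }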
>
> ---
>
> I am running on a cluster of two compute nodes, each of which looks as
> follows: two quad-core Intel Xeon 5345 processors, 16GB memory. The
> nodes are connected by a Mellanox InfiniScale 2400 interconnect. I ran
> using a machine file which places (up to) the first 8 processes on
> node1 and the second group of 8 processes on node2. My timing results
> are shown in the table below; each test uses Bi-CGStab with no
> preconditioning (-ksp_type bcgs -pc_type none) on a computational grid
> of 800 x 800 cells, i.e. 641601 DOFs. I have attached my modified
> source code (you could look at my changes using diff) and the
> -log_summary output for each of the tests.
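>
> In shorthand, each run was launched along these lines (the machine
> file name "hosts" and <N> are placeholders):
>
>     mpirun -np <N> -machinefile hosts ./ex3 -m 800 \
>         -ksp_type bcgs -pc_type none -log_summary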
>

The way I read these numbers is that there is bandwidth for about 3 cores on
this machine, and a non-negligible synchronization penalty:

               1 proc   2 proc   4 proc   8 proc   (MFlop/s)
VecAXPBYCZ        496      857     1064     1070
VecDot            451      724     1089      736
MatMult           434      638      701      703

The bandwidth tops out between 2 and 4 cores (the 5345 should have 10.6 GB/s,
but you should run streams, as Barry says, to see what is achievable). There
is an obvious penalty for VecDot relative to VecAXPBYCZ, which is the
synchronization penalty; it also seems to affect MatMult. Maybe Jed can
explain that.
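
To see what is actually achievable, something along these lines (a rough
stand-alone triad-style check, not the official STREAM benchmark or PETSc's
streams makefile target; the array size is just illustrative) can be run at
increasing -np to see where the aggregate bandwidth saturates:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
      const size_t n = 20*1000*1000;   /* three ~160 MB arrays per process */
      double *a, *b, *c, t0, t1, gbytes;
      size_t i;
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      a = (double*)malloc(n*sizeof(double));
      b = (double*)malloc(n*sizeof(double));
      c = (double*)malloc(n*sizeof(double));
      for (i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < n; i++) a[i] = b[i] + 3.0*c[i];   /* triad */
      MPI_Barrier(MPI_COMM_WORLD);
      t1 = MPI_Wtime();

      /* 3 doubles (24 bytes) moved per element, summed over all ranks */
      gbytes = 24.0*(double)n*(double)size/1.0e9;
      if (!rank) printf("%d processes: ~%.2f GB/s aggregate (a[0]=%g)\n",
                        size, gbytes/(t1 - t0), a[0]);

      free(a); free(b); free(c);
      MPI_Finalize();
      return 0;
    }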

    Matt

> # processes | time for KSPSolve() (s) | iterations to convergence | norm of error
>           1 | 64.008                  | 692                       | 0.00433961
>           2 | 36.2767                 | 626                       | 0.00611835
>           4 | 35.9989                 | 760                       | 0.00311053
>           8 | 30.5215                 | 664                       | 0.00599148
>          16 | 14.1164                 | 710                       | 0.00792162
>
> Why is the scaling so poor? I have read the FAQ
> (http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers);
> am I experiencing the problem described there? I think my machine has
> a bandwidth of 2GB/s per process, as suggested. Also, how can you tell
> from the -log_summary output whether a computation is memory bound?
>
> Many thanks,
>
> Matija
>
> On Tue, Sep 20, 2011 at 11:44 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> >
> > On Tue, Sep 20, 2011 at 11:45, Matija Kecman <matijakecman at gmail.com>
> wrote:
> >>
> >> $ mpirun -np 1 ./ex3 -ksp_type gmres -pc_type none -m 100
> >> Norm of error 0.570146 Iterations 0
> >
> > This uses a nonzero initial guess so the initial residual norm is
> compared to the right hand side.
> > $ ./ex3 -ksp_type gmres -ksp_monitor -m 100 -pc_type none
> -ksp_converged_reason -info |grep Converged
> > [0] KSPDefaultConverged(): user has provided nonzero initial guess,
> computing 2-norm of preconditioned RHS
> > [0] KSPDefaultConverged(): Linear solver has converged. Residual norm
> 1.113646413065e-04 is less than relative tolerance 1.000000000000e-05 times
> initial right hand side norm 1.291007358616e+01 at iteration 0
> > You can use the true residual; it just costs something, so it's not
> enabled by default:
> > $ ./ex3 -ksp_type gmres -ksp_monitor -m 100 -pc_type none
> -ksp_converged_reason -ksp_converged_use_initial_residual_norm
> > [many iterations]
> > Linear solve converged due to CONVERGED_RTOL iterations 1393
> > Norm of error 0.000664957 Iterations 1393
>
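
For concreteness, Jed's point can be restated in toy form (the names below
are illustrative, not the PETSc source): the default test compares the
current residual norm against rtol times the initial norm, and with a
nonzero initial guess that initial norm is the preconditioned RHS norm, so
the numbers in the -info output above already satisfy it at iteration 0:

    #include <stdio.h>

    /* toy restatement of the default relative-tolerance convergence test */
    static int converged(double rnorm, double rnorm0, double rtol)
    {
      return rnorm <= rtol*rnorm0;
    }

    int main(void)
    {
      double rnorm0 = 1.291007358616e+01; /* RHS norm from the -info output */
      double r0     = 1.113646413065e-04; /* residual of the initial guess  */
      printf("converged at iteration 0? %s\n",
             converged(r0, rnorm0, 1.0e-5) ? "yes" : "no");
      return 0;
    }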



-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener