[petsc-users] Question about ksp ex3.c

Matija Kecman matijakecman at gmail.com
Thu Sep 29 07:49:00 CDT 2011


Thanks Barry. I'm using PETSc 3.1.0 Patch 8; is there something
similar I can do that would work for this version?
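
In the meantime I suppose I can get a rough number from a hand-rolled
STREAM-style triad like the sketch below (just an illustration, not the
PETSc-provided benchmark; running one copy per core, e.g. with mpirun,
shows how the aggregate bandwidth scales with the number of cores):

/* rough STREAM-style triad to estimate memory bandwidth -- a sketch only */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    20000000   /* three 160 MB arrays, well outside any cache */
#define REPS 10

int main(void)
{
  double *a = malloc(N * sizeof(double));
  double *b = malloc(N * sizeof(double));
  double *c = malloc(N * sizeof(double));
  int     i, r;

  if (!a || !b || !c) { fprintf(stderr, "allocation failed\n"); return 1; }
  for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  clock_t t0 = clock();
  for (r = 0; r < REPS; r++)
    for (i = 0; i < N; i++) a[i] = b[i] + 3.0 * c[i];        /* triad kernel */
  double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;      /* ~wall time for this busy loop */

  /* each triad iteration moves 3 doubles (2 loads + 1 store) = 24 bytes */
  printf("approx bandwidth: %.2f GB/s (a[0] = %g)\n",
         (double)REPS * 24.0 * N / sec / 1e9, a[0]);
  free(a); free(b); free(c);
  return 0;
}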

On Thu, Sep 29, 2011 at 1:42 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>  If you are using petsc-3.2 you can also run "make streams" in the $PETSC_DIR and that will run the streams benchmark giving you a pretty good idea of how the memory bandwidth scales with the number of cores. If the streams benchmark does not scale with the cores then no iterative solver will scale with more cores.
>
>   Barry
>
> On Sep 29, 2011, at 6:28 AM, Matija Kecman wrote:
>
>> Thanks for your response, Jed! I've been doing some other
>> investigations using this example. I made some small modifications:
>>
>> 1. Added preallocation as Jed Brown suggested in a previous email
>> (http://lists.mcs.anl.gov/pipermail/petsc-users/2011-June/009054.html);
>> a sketch of this change follows below the list.
>> 2. Added a small VTK viewer.
>> 3. Set the initial guess to zero.
>> 4. Changed the entries in the element stiffness matrix to the following:
>>
>>   Ke[ 0] =  2./3.; Ke[ 1] = -1./6.; Ke[ 2] = -1./3.; Ke[ 3] = -1./6.;
>>   Ke[ 4] = -1./6.; Ke[ 5] =  2./3.; Ke[ 6] = -1./6.; Ke[ 7] = -1./3.;
>>   Ke[ 8] = -1./3.; Ke[ 9] = -1./6.; Ke[10] =  2./3.; Ke[11] = -1./6.;
>>   Ke[12] = -1./6.; Ke[13] = -1./3.; Ke[14] = -1./6.; Ke[15] =  2./3.;
>>
>> I computed these by evaluating $K^e_{ij} = \int_{\Omega_e} \nabla
>> \psi^e_i \cdot \nabla \psi^e_j \, \mathrm{d}\Omega$ with the bilinear
>> shape functions $\psi^e$ on a quadrilateral element $\Omega_e$. This is
>> different from what was originally in the code, and I'm not sure where
>> the original entries come from. This isn't important, so you can ignore
>> it if you like; I get the same solution using both matrices.
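>>
>> Writing the first row out explicitly as a check (a sketch, assuming the
>> reference element $\Omega_e = [0,1]^2$ with nodes numbered
>> counter-clockwise so that nodes 1 and 3 sit at opposite corners, i.e.
>> $\psi_1 = (1-x)(1-y)$, $\psi_2 = x(1-y)$, $\psi_3 = xy$, $\psi_4 = (1-x)y$):
>>
>> \begin{align*}
>> K^e_{11} &= \int_0^1\!\!\int_0^1 \big[(1-y)^2 + (1-x)^2\big]\,dx\,dy = \tfrac{1}{3} + \tfrac{1}{3} = \tfrac{2}{3},\\
>> K^e_{12} &= \int_0^1\!\!\int_0^1 \big[-(1-y)^2 + x(1-x)\big]\,dx\,dy = -\tfrac{1}{3} + \tfrac{1}{6} = -\tfrac{1}{6},\\
>> K^e_{13} &= \int_0^1\!\!\int_0^1 \big[-y(1-y) - x(1-x)\big]\,dx\,dy = -\tfrac{1}{6} - \tfrac{1}{6} = -\tfrac{1}{3}.
>> \end{align*}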
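>>
>> Regarding modification 1, the preallocation amounts to something along
>> these lines (a minimal standalone sketch; the 9/5 nonzero counts are my
>> guesses for the bilinear-quadrilateral stencil, not exact per-row counts):
>>
>> #include "petscmat.h"
>>
>> int main(int argc, char **argv)
>> {
>>   Mat            A;
>>   PetscInt       m = 800, N = (m + 1) * (m + 1);   /* 801 x 801 = 641601 rows */
>>   PetscErrorCode ierr;
>>
>>   PetscInitialize(&argc, &argv, (char *)0, PETSC_NULL);
>>   ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
>>   ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
>>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>>   /* bilinear quads couple each node to at most 8 neighbours, so at most
>>      9 nonzeros per row; the 9/5 on-/off-diagonal split is only a guess */
>>   ierr = MatSeqAIJSetPreallocation(A, 9, PETSC_NULL);CHKERRQ(ierr);
>>   ierr = MatMPIAIJSetPreallocation(A, 9, PETSC_NULL, 5, PETSC_NULL);CHKERRQ(ierr);
>>   /* ... element loop with MatSetValues() and MatAssemblyBegin/End as in ex3.c ... */
>>   ierr = MatDestroy(A);CHKERRQ(ierr);   /* petsc-3.1 takes the Mat; newer versions take &A */
>>   PetscFinalize();
>>   return 0;
>> }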
>>
>> ---
>>
>> I am running on a cluster of two compute nodes, each with two
>> quad-core Intel Xeon 5345 processors and 16 GB of memory, connected
>> by a Mellanox InfiniScale 2400 interconnect. I ran with a machine file
>> which places (up to) the first 8 processes on node1 and the second
>> group of 8 processes on node2. My timing results are shown in the
>> table below. Each test uses Bi-CGStab with no preconditioning
>> (-ksp_type bcgs -pc_type none) on a computational grid of 800 x 800
>> cells, i.e. 801 x 801 nodes = 641601 DOFs. I have attached my modified
>> source code (you can see my changes with diff) and the -log_summary
>> output for each of the tests.
>>
>> processes | KSPSolve() time (s) | iterations to convergence | norm of error
>>         1 |              64.008 |                       692 |    0.00433961
>>         2 |             36.2767 |                       626 |    0.00611835
>>         4 |             35.9989 |                       760 |    0.00311053
>>         8 |             30.5215 |                       664 |    0.00599148
>>        16 |             14.1164 |                       710 |    0.00792162
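>>
>> For reference, a launch along these lines matches the placement
>> described above (the host file syntax is MPICH-style and only
>> illustrative; other MPI implementations use a different format):
>>
>> $ cat machinefile
>> node1:8
>> node2:8
>> $ mpirun -np 16 -machinefile machinefile ./ex3 -m 800 -ksp_type bcgs -pc_type none -log_summary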
>>
>> Why is the scaling so poor? I have read the FAQ
>> (http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers);
>> am I experiencing the problem described there? I think my machine has
>> the 2 GB/s of memory bandwidth per process that the FAQ suggests is
>> needed. Also, how can you tell whether a computation is memory-bound
>> by looking at the -log_summary output?
>>
>> Many thanks,
>>
>> Matija
>>
>> On Tue, Sep 20, 2011 at 11:44 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>>
>>> On Tue, Sep 20, 2011 at 11:45, Matija Kecman <matijakecman at gmail.com> wrote:
>>>>
>>>> $ mpirun -np 1 ./ex3 -ksp_type gmres -pc_type none -m 100
>>>> Norm of error 0.570146 Iterations 0
>>>
>>> This uses a nonzero initial guess, so the initial residual norm is compared to the right hand side.
>>> $ ./ex3 -ksp_type gmres -ksp_monitor -m 100 -pc_type none -ksp_converged_reason -info |grep Converged
>>> [0] KSPDefaultConverged(): user has provided nonzero initial guess, computing 2-norm of preconditioned RHS
>>> [0] KSPDefaultConverged(): Linear solver has converged. Residual norm 1.113646413065e-04 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 1.291007358616e+01 at iteration 0
>>> You can use the true residual; it just costs something, so it's not enabled by default:
>>> $ ./ex3 -ksp_type gmres -ksp_monitor -m 100 -pc_type none -ksp_converged_reason -ksp_converged_use_initial_residual_norm
>>> [many iterations]
>>> Linear solve converged due to CONVERGED_RTOL iterations 1393
>>> Norm of error 0.000664957 Iterations 1393
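>>> If you instead want the solve to start from zero, something like this
>>> before KSPSolve() does it (a sketch; ksp, b, and x set up as in ex3.c,
>>> and the flag argument is PetscTruth in petsc-3.1, PetscBool later):
>>>
>>>   ierr = VecSet(x, 0.0);CHKERRQ(ierr);                              /* zero initial guess */
>>>   ierr = KSPSetInitialGuessNonzero(ksp, PETSC_FALSE);CHKERRQ(ierr); /* default test now relative to b */
>>>   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);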
>> <ex3_log_summary_1process><ex3_log_summary_2process><ex3_log_summary_4process><ex3_log_summary_8process><ex3_log_summary_16process><ex3_modified.c>
>
>

