[petsc-users] pedagogical example

Barry Smith bsmith at mcs.anl.gov
Sun Apr 12 13:47:35 CDT 2015


> On Apr 12, 2015, at 12:48 PM, Gideon Simpson <gideon.simpson at gmail.com> wrote:
> 
> I was hoping to demonstrate in my class the computational gain with petsc/mpi in solving a simple problem, like discretized poisson or heat, as the number of processes increases.  Can anyone recommend any of the petsc examples for this purpose?  Perhaps I’m just using poorly chosen KSP/PC pairs, but I haven’t been able to observe gain.  I’m planning to demo this on a commodity intel cluster with infiniband.  

   Gideon,

     I would use src/ksp/ksp/examples/tutorials/ex45 to get across three concepts

   1)  algorithmic complexity (1 process).  Run it with several levels of refinement (say -da_refine 4, depending on how much memory you have) with 
        a) -pc_type jacobi  -ksp_type bcgs   (an algorithm with poor computational complexity, but very parallel)
        b) -pc_type mg  -ksp_type bcgs       (an algorithm with good computational complexity; it also parallelizes well, though not as well as Jacobi)
       then run it again with one more level of refinement (say -da_refine 5) and see how much more time each method takes.
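
   For concreteness, a possible set of single-process runs (the -da_refine values are only suggestions, and -log_summary is assumed here for timing output; newer PETSc versions call it -log_view):

        ./ex45 -da_refine 4 -pc_type jacobi -ksp_type bcgs -log_summary
        ./ex45 -da_refine 4 -pc_type mg     -ksp_type bcgs -log_summary
        ./ex45 -da_refine 5 -pc_type jacobi -ksp_type bcgs -log_summary
        ./ex45 -da_refine 5 -pc_type mg     -ksp_type bcgs -log_summary

   Compare how much the time for each preconditioner grows going from -da_refine 4 to -da_refine 5: the multigrid time should grow roughly in proportion to the problem size, while the Jacobi time grows noticeably faster.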

   2) scaling (2 processes).  Run as in 1) but on two processes and note that the "poorer" algorithm, Jacobi, gives better "speedup" than mg.
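
   A sketch of the corresponding runs, assuming mpiexec launches onto the cores you intend (same solver flags as above):

        mpiexec -n 1 ./ex45 -da_refine 4 -pc_type jacobi -ksp_type bcgs -log_summary
        mpiexec -n 2 ./ex45 -da_refine 4 -pc_type jacobi -ksp_type bcgs -log_summary
        mpiexec -n 1 ./ex45 -da_refine 4 -pc_type mg     -ksp_type bcgs -log_summary
        mpiexec -n 2 ./ex45 -da_refine 4 -pc_type mg     -ksp_type bcgs -log_summary

   The ratio of the 1-process time to the 2-process time gives the speedup for each method; expect that ratio to look better for Jacobi even though multigrid remains faster in absolute time.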

   3) understanding the limitations of your machine (see http://www.mcs.anl.gov/petsc/documentation/faq.html#computers): the total memory bandwidth of all the cores you use determines the performance of the PETSc solvers. Run the streams benchmark (now included with PETSc in the src/benchmarks/streams directory) to see its speedup for different numbers of cores and different placements of those cores on and across nodes, and then run the PETSc example to see its speedups. Note that you will likely have to do something smarter than mpiexec -n n ./ex45 ... to launch the program, since you need control over which nodes MPI places the processes on; for example, does it spread the processes one per node or first pack them onto one node? (Check the documentation for your mpiexec to see how to control this.) You will find that different placement choices lead to very different performance, and this can be related back to the streams benchmark and the available memory bandwidth.
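
   As a concrete sketch (the streams makefile target and the process-placement flags below are assumptions; they differ between PETSc versions and MPI implementations, so check your own documentation):

        # build and run the streams benchmark with varying numbers of processes
        cd $PETSC_DIR/src/benchmarks/streams
        make MPIVersion
        mpiexec -n 1 ./MPIVersion
        mpiexec -n 2 ./MPIVersion
        mpiexec -n 4 ./MPIVersion

        # run the example with the same process counts, controlling placement;
        # -ppn is MPICH/Hydra syntax, Open MPI uses --npernode or --map-by instead
        mpiexec -n 2 -ppn 2 ./ex45 -da_refine 5 -pc_type mg -ksp_type bcgs -log_summary   # packed on one node
        mpiexec -n 2 -ppn 1 ./ex45 -da_refine 5 -pc_type mg -ksp_type bcgs -log_summary   # spread across two nodes

   If the reported streams bandwidth barely increases as you add cores on a single node, do not expect the PETSc solves on those same cores to speed up much either.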

  Barry



> 
> -gideon
> 


