Load Balancing and KSPSolve

Barry Smith bsmith at mcs.anl.gov
Tue Nov 20 12:43:33 CST 2007


   Tim,

     This is an unrelated comment, but it may help you with scaling to
many processes. Since the matrix is so SMALL, it will be hard to get
good scaling on the linear solves for a large number of processes. But
since you need MANY right-hand sides, you might consider having
different groups of processes (MPI_Comms) handle collections of
right-hand sides. For example, if you have 64 processes you might use
4 MPI_Comms each of size 16, or even 8 MPI_Comms each of size 8.
Coding this is easy: simply use MPI to generate the appropriate
communicator (one for each subset of processes) and then create the
Mat, the KSP, etc. on that communicator instead of MPI_COMM_WORLD.
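
The grouping described above can be sketched in a few lines. This is
only an illustration of the bookkeeping, not PETSc or MPI code: the
helper names group_color and columns_for_group are hypothetical, and
the sizes (64 processes, groups of 16, 1176 right-hand sides) are taken
from this thread purely as sample values. In a real program each
process would presumably pass its color to MPI_Comm_split to obtain the
sub-communicator, and then create its Mat and KSP on that communicator.

```python
# Sketch only: plain-Python bookkeeping for splitting processes into
# groups and sharing out right-hand sides. No MPI or PETSc calls here.

def group_color(rank, group_size):
    """Group index for a world rank (hypothetical helper).

    The result is the kind of value one would pass as the 'color'
    argument of MPI_Comm_split, so that ranks with the same color
    end up in the same sub-communicator.
    """
    return rank // group_size


def columns_for_group(color, n_groups, n_rhs):
    """Contiguous block of right-hand-side indices for one group.

    Any remainder is spread over the first few groups so that block
    sizes differ by at most one (an assumed, simple distribution).
    """
    base, extra = divmod(n_rhs, n_groups)
    start = color * base + min(color, extra)
    stop = start + base + (1 if color < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    world_size = 64   # sample value from the example above
    group_size = 16   # 4 groups of 16 processes
    n_rhs = 1176      # number of identity columns to solve for
    n_groups = world_size // group_size

    for color in range(n_groups):
        ranks = [r for r in range(world_size)
                 if group_color(r, group_size) == color]
        cols = columns_for_group(color, n_groups, n_rhs)
        print(f"group {color}: ranks {ranks[0]}..{ranks[-1]}, "
              f"columns {cols.start}..{cols.stop - 1}")
```

Each group then loops only over its own block of columns, so the groups
work on different right-hand sides at the same time while each
individual solve runs on fewer processes.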


    Barry



On Nov 20, 2007, at 11:45 AM, Tim Stitt wrote:

> Hi all (again),
>
> I finally got some data back from the KSP PETSc code that I put  
> together to solve this sparse inverse matrix problem I was looking  
> into. Ideally I am aiming for an O(N) (time complexity) approach to  
> getting the first 'k' columns of the inverse of a sparse matrix.
>
> To recap the method: I have my solver which uses KSPSolve in a loop  
> that iterates over the first k columns of an identity matrix B and  
> computes the corresponding x vector.
>
> I am just a bit curious about some of the timings I am  
> obtaining...which I hope someone can explain. Here are the timings I  
> obtained for a global sparse matrix (4704 x 4704) and solving for  
> the first 1176 columns in the identity using P processes  
> (processors) on our cluster.
>
> (Timings are given in seconds for each process performing work in  
> the loop and were obtained by encapsulating the loop with the  
> cpu_time() Fortran intrinsic. The MUMPS package was requested for  
> factorisation/solving, although similar timings were obtained for  
> both the native solver and SuperLU.)
>
> P=1  [30.92]
> P=2  [15.47, 15.54]
> P=4  [4.68, 5.49, 4.67, 5.07]
> P=8  [2.36, 4.23, 2.81, 2.54, 3.42, 2.22, 1.41, 3.15]
> P=16 [1.04, 0.45, 1.08, 0.27, 0.87, 0.93, 1.1, 1.06, 0.29, 0.34,  
> 0.73, 0.25, 0.43, 1.09, 1.08, 1.1]
>
> Firstly, I notice very good scalability up to 16 processes...is this  
> expected (by those people who use these solvers regularly)?
>
> Also I notice that the timings per process vary as we scale up. Is  
> this a load-balancing problem related to more non-zero values being  
> on a given processor than others? Once again is this expected?
>
> Please excuse my ignorance of matters relating to these solvers and  
> their operation...as it really isn't my field of expertise.
>
> Regards,
>
> Tim.
>
> -- 
> Dr. Timothy Stitt <timothy_dot_stitt_at_ichec.ie>
> HPC Application Consultant - ICHEC (www.ichec.ie)
>
> Dublin Institute for Advanced Studies
> 5 Merrion Square - Dublin 2 - Ireland
>
> +353-1-6621333 (tel) / +353-1-6621477 (fax)
>



