[petsc-users] LU factorization and solution of independent matrices does not scale, why?

Thu Dec 20 14:16:52 CST 2012

In my multilevel FETI-DP code, I have localized coarse matrices, which 
are defined on only a subset of all MPI tasks, typically between 4 and 
64 tasks. The MatAIJ and the KSP objects are both defined on an MPI 
communicator, which is a subset of MPI::COMM_WORLD. The LU factorization 
of the matrices is computed with either MUMPS or superlu_dist, but both 
show a scaling behavior that puzzles me: when the overall problem 
size is increased, the solve with the LU factorization of the local 
matrices does not scale! But why not? I just increase the number of 
local matrices, but all of them are independent of each other. An 
example: I use 64 cores, and each coarse matrix is spanned by 4 cores, so 
there are 16 MPI communicators with 16 coarse space matrices. The 
problem requires 192 solves with the coarse space systems, and these 
take 0.09 seconds in total. Now I increase the number of cores to 256, 
but let each local coarse space be defined again on only 4 cores. Again, 
192 solves with these coarse spaces are required, but now they take 
0.24 seconds. The same for 1024 cores, and we are at 1.7 seconds for the 
local coarse space solver!
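
The figures above can be tabulated with a short sketch in plain Python (no MPI or PETSc involved). It only restates the numbers from this post: 4 cores per coarse matrix, 192 solves, and the three measured times. The rank-to-communicator mapping `rank // 4` and the names `color_of` and `summarize` are assumptions made for illustration; the post does not say how the sub-communicators are actually formed.

```python
# Numbers quoted in the post: total cores -> seconds for all 192 coarse solves.
GROUP_SIZE = 4      # cores per coarse-matrix communicator (from the post)
NUM_SOLVES = 192    # coarse solves per run (from the post)
timings = {64: 0.09, 256: 0.24, 1024: 1.7}


def color_of(rank, group_size=GROUP_SIZE):
    """Hypothetical color for a communicator split: consecutive ranks share a group."""
    return rank // group_size


def summarize(timings):
    """Return (cores, communicators, ms per solve, slowdown vs. smallest run)."""
    base_time = timings[min(timings)]
    rows = []
    for cores in sorted(timings):
        groups = cores // GROUP_SIZE
        per_solve_ms = timings[cores] / NUM_SOLVES * 1e3
        slowdown = timings[cores] / base_time
        rows.append((cores, groups, per_solve_ms, slowdown))
    return rows


for cores, groups, per_solve_ms, slowdown in summarize(timings):
    print(f"{cores:5d} cores  {groups:4d} communicators  "
          f"{per_solve_ms:7.3f} ms/solve  {slowdown:5.1f}x")
```

With perfectly independent coarse solves, the slowdown column would stay near 1 as the number of communicators grows from 16 to 64 to 256. The measured times instead grow by a factor of roughly 19 between 64 and 1024 cores.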

For me, this is a total mystery! Any idea how to explain, debug, and 
eventually resolve this problem?

Thomas