[petsc-users] LU factorization and solution of independent matrices does not scale, why?

Thu Dec 20 14:19:59 CST 2012

In my multilevel FETI-DP code, I have localized coarse matrices, which
are defined on only a subset of all MPI tasks, typically between 4 and
64 tasks. The MatAIJ and the KSP objects are both defined on an MPI
communicator, which is a subset of MPI::COMM_WORLD. The LU
factorization of the matrices is computed with either MUMPS or
superlu_dist, but both show a scaling behavior that puzzles me:
When the overall problem size is increased, the solve with the LU
factorization of the local matrices does not scale! But why not? I
just increase the number of local matrices, but all of them are
independent of each other. An example: I use 64 cores, and each coarse
matrix is spanned by 4 cores, so there are 16 MPI communicators with 16
coarse space matrices. The problem requires 192 solves with the
coarse space systems, and together these take 0.09 seconds. Now I
increase the number of cores to 256, but let each local coarse space be
defined again on only 4 cores. Again, 192 solves with these coarse
spaces are required, but now this takes 0.24 seconds. The same for
1024 cores, and we are at 1.7 seconds for the local coarse space solver!
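
To put numbers on the setup described above, here is a minimal sketch in plain Python (no MPI or PETSc involved). The timings are the ones quoted in the post; the helper names `color` and `num_groups` are hypothetical, and the contiguous grouping of ranks (ranks 0-3 in the first group, 4-7 in the second, and so on) is an assumption, since the post does not say how the sub-communicators are built.

```python
# Sketch only: arithmetic on the figures quoted in the post, no MPI calls.
GROUP_SIZE = 4  # cores spanning one coarse matrix (from the post)

# total cores -> seconds for the 192 coarse solves (from the post)
timings = {64: 0.09, 256: 0.24, 1024: 1.7}


def color(world_rank, group_size=GROUP_SIZE):
    """Group index of a rank, ASSUMING contiguous grouping of ranks.

    This is the kind of value one could pass as the 'color' argument
    when splitting a world communicator into independent groups.
    """
    return world_rank // group_size


def num_groups(world_size, group_size=GROUP_SIZE):
    """Number of independent coarse matrices for a given core count."""
    return world_size // group_size


base = timings[64]
for cores, seconds in timings.items():
    # If the groups were truly independent, the slowdown factor
    # would be expected to stay close to 1.
    print(
        f"{cores:5d} cores: {num_groups(cores):4d} coarse matrices, "
        f"{seconds:.2f} s, slowdown x{seconds / base:.1f}"
    )
```

The work per group is unchanged in all three runs, so any slowdown factor well above 1 points at something shared between the groups rather than at the local solves themselves.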

For me, this is a total mystery! Any idea how to explain, debug, and
eventually resolve this problem?

Thomas