[petsc-users] LU factorization and solution of independent matrices does not scale, why?
Matthew Knepley
knepley at gmail.com
Thu Dec 20 19:19:45 CST 2012
On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski
<Thomas.Witkowski at tu-dresden.de> wrote:
> I cannot use the information from log_summary, as I have three different LU
> factorizations and solves (local matrices and two hierarchies of coarse
> grids). Therefore, I use the following workaround to get the timing of the
> solve I'm interested in:
You misunderstand how to use logging. Just put each of these
factorizations and solves in its own stage. A stage represents a part
of the code over which events are aggregated, so log_summary reports
each stage separately.
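
In PETSc the relevant calls are PetscLogStageRegister(),
PetscLogStagePush() and PetscLogStagePop(). To make the aggregation
model concrete, here is a rough, library-free C++ sketch. It is not
PETSc code: StageLog and its methods are hypothetical names used only
for illustration, and PETSc's own accounting (for nested stages in
particular) may differ.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Library-free illustration of stage-style aggregation.
// StageLog is a hypothetical class, not a PETSc type.
class StageLog {
public:
  // Begin a stage: remember its name and its start time.
  void push(const std::string& name) {
    open_.push_back(Open{name, Clock::now()});
  }

  // End the innermost open stage and add its elapsed wall time
  // to the running total kept for that stage name.
  void pop() {
    assert(!open_.empty() && "pop() without matching push()");
    const Open top = open_.back();
    open_.pop_back();
    Total& t = totals_[top.name];
    t.seconds +=
        std::chrono::duration<double>(Clock::now() - top.start).count();
    t.calls += 1;
  }

  // Accumulated wall time of a stage in seconds (0 if it never ran).
  double seconds(const std::string& name) const {
    const auto it = totals_.find(name);
    return it == totals_.end() ? 0.0 : it->second.seconds;
  }

  // Number of completed push/pop pairs recorded for a stage.
  int calls(const std::string& name) const {
    const auto it = totals_.find(name);
    return it == totals_.end() ? 0 : it->second.calls;
  }

private:
  using Clock = std::chrono::steady_clock;
  struct Open { std::string name; Clock::time_point start; };
  struct Total { double seconds = 0.0; int calls = 0; };
  std::vector<Open> open_;               // stages currently in progress
  std::map<std::string, Total> totals_;  // per-stage aggregates
};
```

With something like this, each of the three solves would be wrapped in
its own push("...")/pop() pair, and the totals for the local coarse
solve could be read off without mixing in the other two.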
Matt
> MPI::COMM_WORLD.Barrier();
> wtime = MPI::Wtime();
> KSPSolve(*(data->ksp_schur_primal_local), tmp_primal, tmp_primal);
> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime);
>
> The factorization is done explicitly beforehand with "KSPSetUp", so I can
> measure the time for the LU factorization. It also does not scale! For 64
> cores, it takes 0.05 seconds, for 1024 cores 1.2 seconds. In all
> calculations, the local coarse space matrices defined on four cores have
> exactly the same number of rows and exactly the same number of nonzero
> entries. So, from my point of view, the time should be absolutely constant.
>
> Thomas
>
> Quoting Barry Smith <bsmith at mcs.anl.gov>:
>
>
>>
>> Are you timing ONLY the time to factor and solve the subproblems? Or
>> also the time to get the data to the collection of 4 cores at a time?
>>
>> If you are only using LU for these problems and not elsewhere in the
>> code, you can get the factorization and solve time from MatLUFactor() and
>> MatSolve(), or you can use stages to put this calculation in its own stage
>> and use the MatLUFactor() and MatSolve() time from that stage.
>> Also look at the load balancing column for the factorization and solve
>> stage: is it well balanced?
>>
>> Barry
>>
>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski
>> <thomas.witkowski at tu-dresden.de> wrote:
>>
>>> In my multilevel FETI-DP code, I have localized coarse matrices, which
>>> are defined on only a subset of all MPI tasks, typically between 4 and 64
>>> tasks. The MatAIJ and the KSP objects are both defined on an MPI
>>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of
>>> the matrices is computed with either MUMPS or superlu_dist, but both show
>>> a scaling behavior that puzzles me: when the overall problem size is
>>> increased, the solve with the LU factorization of the local matrices does
>>> not scale! But why not? I just increase the number of local matrices, but
>>> all of them are independent of each other. For example: I use 64 cores,
>>> each coarse matrix is spanned by 4 cores, so there are 16 MPI communicators
>>> with 16 coarse space matrices. The problem requires 192 solves with the
>>> coarse space systems, and together these take 0.09 seconds. Now I increase
>>> the number of cores to 256, but let the local coarse space be defined again
>>> on only 4 cores. Again, 192 solves with these coarse spaces are
>>> required, but now this takes 0.24 seconds. The same for 1024 cores, and we
>>> are at 1.7 seconds for the local coarse space solver!
>>>
>>> For me, this is a total mystery! Any idea how to explain, debug, and
>>> eventually resolve this problem?
>>>
>>> Thomas
>>
>>
>>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener