[petsc-users] Which preconditioners are scalable?

Sebastian Steiger steiger at purdue.edu
Fri Mar 11 14:35:08 CST 2011


Hi Barry and Matt

>    What is N? Is that the number of processes? 
Yes

>    What does the notation 5'912'016 mean?
It means that many bytes were allocated; the apostrophes are just
thousands separators I introduced for readability.


>   Are the numbers in your table from a particular process? Or are they summed over all processes?
Only process 0.


>    The intention is that the ASM is memory scalable, so that if, for example, you double the
> number of processes and double the total number of nonzeros in the matrix (probably by doubling
> the total number of rows and columns in the matrix), each process should require essentially the
> same amount of memory. But what happens in practice for a particular problem will, to some
> degree, depend on the amount of coupling between processes in the matrix (hence how much bigger
> the local overlapped matrix is than the original matrix on that process) and on how the domain
> is sliced up. But even with a "bad" slicing I would not expect the amount of local memory needed
> to double. I think you need to determine more completely what all this memory is being used for.

Doubling the total number of rows and nonzeros is what I think I'm
doing; every row has about 40 nonzeros in this example. The coupling /
slicing should be fine, since I am using pretty much the same system for
another calculation where I compute interior eigenstates in a matrix
with the same sparsity. There I do not use ASM, and I can scale up to
80'000 cores without memory problems (after my workaround that avoids
AOCreateMapping; see my report earlier this week).

Also, when I turn off ASM and use no preconditioning at all, or when I
use the Jacobi preconditioner, the memory stays constant at about
30 MB/core. But then the convergence deteriorates...
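
For concreteness, the three configurations above differ only in the
preconditioner options, roughly along these lines (standard PETSc option
names; the overlap value and sub-solver shown are only placeholders, and
my other solver options are omitted):

  -pc_type asm -pc_asm_overlap 1 -sub_pc_type ilu   # ASM runs
  -pc_type jacobi                                   # Jacobi, ~30 MB/core
  -pc_type none                                     # no preconditioning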



Matt:

> We have run ASM on 224,000 processors of the XT5 at ORNL, so
> something else is going on here. The best thing to do here is send us
> -log_summary. For attachments, we usually recommend
> petsc-maint at mcs.anl.gov.


My data also comes from the XT5, but it is important for me to know that
there are cases where ASM scales to 224'000 processors. I will post more
complete profiling information to petsc-maint at mcs.anl.gov in a couple of
minutes.
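
For reference, that profiling is gathered simply by adding the logging
options to the run, something like the line below (the executable name
and core count are placeholders; as far as I understand, -malloc_log is
what activates the statistics reported by PetscMallocDumpLog):

  aprun -n 600 ./my_app -log_summary -malloc_log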


Best
Sebastian


> 
>   Barry
> 
> 
> 
> 
> On Mar 11, 2011, at 9:52 AM, Sebastian Steiger wrote:
> 
>> Hello PETSc developers
>>
>> I'm doing some scaling benchmarks and I found that the parallel asm
>> preconditioner, my favorite preconditioner, has a limit in the number of
>> cores it can handle.
>>
>> I am doing a numerical experiment where I scale up the size of my matrix
>> by roughly the same factor as the number of CPUs employed. When I look
>> at which function used how much memory using PETSc's routine
>> PetscMallocDumpLog, I see the following:
>>
>> Function name                                N=300        N=600  increase
>> =====================================================================
>> MatGetSubMatrices_MPIAIJ_Local        75'912'016  134'516'928      1.77
>> MatIncreaseOverlap_MPIAIJ_Once       168'288'288  346'870'832      2.06
>> MatIncreaseOverlap_MPIAIJ_Receive      2'918'960    5'658'160      1.94
>>
>> The matrix sizes are 6'899'904 and 14'224'896, respectively. Above
>> N~5000 CPUs I am running out of memory.
>>
>> Here's my question now: Is the asm preconditioner limited from the
>> algorithm point of view, or is it the implementation? I thought that
>> 'only' the local matrices, plus some constant overlap with neighbors,
>> are solved, so that memory consumption should stay constant when I scale
>> up with a constant number of rows per process.
>>
>> Best
>> Sebastian
>>
> 
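
PS: For completeness, here is a minimal sketch of how per-function
allocation totals like the ones in the table above can be printed with
PetscMallocDumpLog; the wrapper function and variable names are
hypothetical, and the program must be started with -malloc_log so that
the log actually contains data:

#include "petscksp.h"

/* Solve the system, then print the per-function allocation totals
   collected by PETSc's malloc logging (enabled with -malloc_log). */
static PetscErrorCode SolveAndDumpMallocLog(KSP ksp, Vec b, Vec x)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscMallocDumpLog(stdout);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}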


