[petsc-users] Which preconditioners are scalable?

Sebastian Steiger steiger at purdue.edu
Fri Mar 11 15:00:32 CST 2011


> Good. Then either (1) the matrix is strange in that using a level overlap of 1 behaves differently for larger problems (which seems unlikely) or (2) there is a memory bug in the code that scarfs up much more memory than it should.
I've just posted detailed data to petsc-maint.


>  This requires deeper analysis of the implementation. Do we have arrays that grow in size with the number of processes in the code?
I'm pretty sure I do not have such arrays on my side. For simulations
employing iterative solvers without ASM, memory consumption is small
and roughly constant when I scale up.
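
As a rough illustration (a sketch, not the actual benchmark code), the
per-process footprint can be monitored while scaling up with
PetscMemoryGetCurrentUsage() and PetscMallocGetCurrentUsage(); the
function name below is just a placeholder:

  #include <petscsys.h>

  /* Sketch: report the largest per-process memory footprint so one can
     check that it stays roughly constant while scaling up. */
  static PetscErrorCode ReportMemory(MPI_Comm comm, const char *stage)
  {
    PetscLogDouble rss, mal, maxrss, maxmal;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);  /* resident set size */
    ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr);  /* bytes from PetscMalloc */
    /* PetscLogDouble is a double, so MPI_DOUBLE is the matching MPI type */
    ierr = MPI_Allreduce(&rss,&maxrss,1,MPI_DOUBLE,MPI_MAX,comm);CHKERRQ(ierr);
    ierr = MPI_Allreduce(&mal,&maxmal,1,MPI_DOUBLE,MPI_MAX,comm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm,"[%s] max rss %.1f MB, max PetscMalloc %.1f MB\n",
                       stage,maxrss/1048576.0,maxmal/1048576.0);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }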


>   Are you doing the basic ASM with one block per process?
Yes.
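
For concreteness, "basic ASM" here means roughly the following sketch
(the function name is illustrative, and the subdomain solver is left to
the options database); at run time this corresponds to
-pc_type asm -pc_asm_overlap 1 together with e.g. -sub_pc_type ilu:

  #include <petscksp.h>

  /* Sketch: select ASM with the default one subdomain per process and an
     overlap of 1; subdomain solvers are set via -sub_ksp_type/-sub_pc_type. */
  static PetscErrorCode UseBasicASM(KSP ksp)
  {
    PC             pc;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCASM);CHKERRQ(ierr);    /* additive Schwarz */
    ierr = PCASMSetOverlap(pc,1);CHKERRQ(ierr);  /* level-1 overlap with neighbors */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* allow run-time overrides */
    PetscFunctionReturn(0);
  }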


Best
Sebastian



> 
> 
>   Barry
> 
> 
> On Mar 11, 2011, at 2:35 PM, Sebastian Steiger wrote:
> 
>> Hi Barry and Matt
>>
>>>   What is N? Is that the number of processes? 
>> Yes
>>
>>>   What does the notation 5'912'016 mean?
>> The 's are just thousands separators that I introduced for readability;
>> e.g. 5'658'160 means 5658160 bytes were allocated.
>>
>>
>>>  Are the numbers in your table from a particular process? Or are they summed over all processes?
>> Only process 0.
>>
>>
>>>   The intention is that the ASM is memory scalable, so that if, for
>>> example, you double the number of processes and double the total number
>>> of nonzeros in the matrix (probably by doubling the total number of rows
>>> and columns in the matrix), each process should require essentially the
>>> same amount of memory.  But what happens in practice for a particular
>>> problem will, to some degree, depend on the amount of coupling between
>>> processes in the matrix (hence how much bigger the local overlapped
>>> matrix is than the original matrix on that process) and depend on how
>>> the domain is sliced up.  But even with a "bad" slicing I would not
>>> expect the amount of local memory needed to double.  I think you need to
>>> determine more completely what all this memory is being used for.
>>
>> Doubling the total number of rows and nonzeros is what I think I'm
>> doing; every row has about 40 nonzeros in this example. The coupling /
>> slicing should be fine, since I use pretty much the same system for
>> another calculation where I compute interior eigenstates of a matrix
>> with the same sparsity. There I do not use ASM, and I can now scale up
>> to 80000 cores without memory problems (after my workaround for
>> AOCreateMapping, see my report earlier this week).
>>
>> Also, when I turn off ASM and use no preconditioning at all, or when I
>> use the Jacobi preconditioner, memory stays constant at about
>> 30 MB/core. But then the convergence deteriorates...
>>
>>
>>
>> Matt:
>>
>>> We have run ASM on 224,000 processors of the XT5 at ORNL, so
>>> something else is going on here. The best thing to do here is send us
>>> -log_summary. For attachments, we usually recommend
>>> petsc-maint at mcs.anl.gov.
>>
>>
>> My data also comes from the XT5, but it's important for me to know that
>> there are cases where it scales to 224000 processors. I will post more
>> complete profiling information to petsc-maint at mcs.anl.gov in a couple of
>> minutes.
>>
>>
>> Best
>> Sebastian
>>
>>>
>>>  Barry
>>>
>>>
>>>
>>>
>>> On Mar 11, 2011, at 9:52 AM, Sebastian Steiger wrote:
>>>
>>>> Hello PETSc developers
>>>>
>>>> I'm doing some scaling benchmarks, and I found that the parallel ASM
>>>> preconditioner, my favorite preconditioner, has a limit in the number
>>>> of cores it can handle.
>>>>
>>>> I am doing a numerical experiment where I scale up the size of my matrix
>>>> by roughly the same factor as the number of CPUs employed. When I look
>>>> at how much memory each function allocated, using PETSc's routine
>>>> PetscMallocDumpLog (a usage sketch is appended below the quoted
>>>> thread), I see the following:
>>>>
>>>> Function name                       N=300 [bytes]  N=600 [bytes]  increase
>>>> =========================================================================
>>>> MatGetSubMatrices_MPIAIJ_Local         75'912'016    134'516'928      1.77
>>>> MatIncreaseOverlap_MPIAIJ_Once        168'288'288    346'870'832      2.06
>>>> MatIncreaseOverlap_MPIAIJ_Receive       2'918'960      5'658'160      1.94
>>>>
>>>> The matrix sizes are 6'899'904 and 14'224'896, respectively. Above
>>>> N~5000 CPUs I am running out of memory.
>>>>
>>>> Here's my question now: Is the ASM preconditioner limited by the
>>>> algorithm, or by the implementation? I thought that 'only' the local
>>>> matrices, plus some constant overlap with neighbors, are solved, so that
>>>> memory consumption should stay constant when I scale up with a constant
>>>> number of rows per process.
>>>>
>>>> Best
>>>> Sebastian
>>>>
>>>
>>
> 
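
For reference, the per-function table quoted above comes from PETSc's
malloc logging; a minimal, self-contained sketch of how such a log can
be produced (the solver setup is only a placeholder, and the call names
are those of the PETSc release of that time):

  #include <petscsys.h>

  int main(int argc,char **argv)
  {
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
    /* ... assemble the matrix, set up the KSP with -pc_type asm, solve ... */
    /* Dump the per-function allocation log; requires running with -malloc_log */
    ierr = PetscMallocDumpLog(stdout);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }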


