[petsc-users] Which preconditioners are scalable?

Barry Smith bsmith at mcs.anl.gov
Fri Mar 11 14:51:40 CST 2011


  Good. Then either (1) the matrix is strange in that using a level overlap of 1 behaves differently for larger problems (which seems unlikely) or (2) there is a memory bug in the code that scarfs up much more memory than it should.

   This requires deeper analysis of the implementation. Do we have arrays that grow in size with the number of processes in the code?

  Are you doing the basic ASM with one block per process?
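  (For reference, a minimal sketch of that basic setup in C, assuming a
KSP object ksp whose operators are already set:

      PC             pc;
      PetscErrorCode ierr;

      ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
      ierr = PCSetType(pc,PCASM);CHKERRQ(ierr);     /* additive Schwarz   */
      ierr = PCASMSetOverlap(pc,1);CHKERRQ(ierr);   /* level overlap of 1 */

   PCASM defaults to one subdomain (block) per process, so no explicit
   PCASMSetLocalSubdomains() call is needed; the same setup can be
   selected at run time with -pc_type asm -pc_asm_overlap 1.)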


  Barry


On Mar 11, 2011, at 2:35 PM, Sebastian Steiger wrote:

> Hi Barry and Matt
> 
>>   What is N? Is that the number of processes? 
> Yes
> 
>>   What does the notation 5'912'016 mean?
> It means there were 5658160 bytes allocated; I inserted the 's as
> thousands separators for readability.
> 
> 
>>  Are the numbers in your table from a particular process? Or are they summed over all processes?
> Only process 0.
> 
> 
>>   The intention is that the ASM is memory scalable so that if for example you double the number
>> of processes and double the total number of nonzeros in the matrix (probably by doubling the total
>> number of rows and columns in the matrix) each process should require essentially the same amount
>> of memory.  But what happens in practice for a particular problem will, to some degree, depend on
>> the amount of coupling between processes in the matrix (hence how much bigger the local overlapped
>> matrix is than the original matrix on that process) and depend on how the domain is sliced up.
>> But even with a "bad" slicing I would not expect the amount of local memory needed to double.
>> I think you need to determine more completely what all this memory is being used for.
> 
> Doubling the total number of rows and nonzeros is what I think I'm
> doing; every row has about 40 nonzeros in this example. The coupling /
> slicing should be fine, since I am using pretty much the same system for
> another calculation where I compute interior eigenstates in a matrix
> with the same sparsity. There I do not use ASM and I can now scale up to
> 80000 cores without memory problems (after implementing a workaround
> that avoids AOCreateMapping, see my report earlier this week).
> 
> Also, when I turn off ASM and use no preconditioning at all, or when I
> use the Jacobi preconditioner, memory stays constant at about
> 30 MB/core. But then the convergence deteriorates...
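
  [Presumably those runs differ only in the run-time preconditioner
   selection, something like

      -pc_type asm -pc_asm_overlap 1     (memory per core grows with N)
      -pc_type jacobi                    (stays near 30 MB/core)
      -pc_type none                      (stays near 30 MB/core)

   so the extra memory would be coming entirely from the ASM setup.]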
> 
> 
> 
> Matt:
> 
>> We have run ASM on 224,000 processors of the XT5 at ORNL, so
>> something else is going on here. The best thing to do is to send us
>> the output of -log_summary. For attachments, we usually recommend
>> petsc-maint at mcs.anl.gov.
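
  [That is, append -log_summary to the run, for example

      aprun -n 600 ./myapp -pc_type asm -pc_asm_overlap 1 -log_summary

   which prints timing and memory statistics at PetscFinalize(). The
   executable name and launcher arguments here are only placeholders.]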
> 
> 
> My data also comes from the XT5, but it's important for me to know that
> there are cases where it scales to 224000 processors. I will post more
> complete profiling information to petsc-maint at mcs.anl.gov in a couple of
> minutes.
> 
> 
> Best
> Sebastian
> 
>> 
>>  Barry
>> 
>> On Mar 11, 2011, at 9:52 AM, Sebastian Steiger wrote:
>> 
>>> Hello PETSc developers
>>> 
>>> I'm doing some scaling benchmarks and I found that the parallel asm
>>> preconditioner, my favorite preconditioner, has a limit in the number of
>>> cores it can handle.
>>> 
>>> I am doing a numerical experiment where I scale up the size of my matrix
>>> by roughly the same factor as the number of CPUs employed. When I look
>>> at how much memory each function allocated, using PETSc's routine
>>> PetscMallocDumpLog, I see the following:
>>> 
>>> Function name                        N=300         N=600     increase
>>> ======================================================================
>>> MatGetSubMatrices_MPIAIJ_Local    75'912'016   134'516'928    1.77
>>> MatIncreaseOverlap_MPIAIJ_Once    168'288'288  346'870'832    2.06
>>> MatIncreaseOverlap_MPIAIJ_Receive  2'918'960     5'658'160    1.94
>>> 
>>> The matrix sizes are 6'899'904 and 14'224'896, respectively. Above
>>> N~5000 CPUs I am running out of memory.
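
  [Presumably this table comes from running with the -malloc_log option
   and calling something like

      ierr = PetscMallocDumpLog(stdout);CHKERRQ(ierr);

   near the end of the run; the numbers would then be the bytes
   allocated by each function on the process shown.]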
>>> 
>>> Here's my question now: Is the asm preconditioner limited by the
>>> algorithm, or by the implementation? I thought that 'only' the local
>>> matrices, plus some constant overlap with neighbors, are solved, so
>>> that per-process memory consumption should stay constant when I scale
>>> up with a constant number of rows per process.
>>> 
>>> Best
>>> Sebastian
>>> 
>> 
> 


