[petsc-dev] [petsc-maint #66751] Re: [petsc-users] Which preconditioners are scalable?

Barry Smith bsmith at mcs.anl.gov
Wed Mar 30 18:50:45 CDT 2011


   Sebastian,

    Don't ask me, ask US!   I know Matt and Jed did a couple of things but don't know what is left to be done. 

   Everyone please chip in on what needs to be done and whether they can contribute. I think Hong finished the AO, so she might be interested in tackling the rest of what needs to be resolved here, and Satish can help since he did much of the original ctable stuff.

    Barry



On Mar 30, 2011, at 9:32 AM, Sebastian Steiger wrote:

> Barry,
> 
> Have you by any chance had the time to look into this issue?
> 
> Sebastian
> 
> 
> 
> On 03/12/2011 07:48 PM, Barry Smith wrote:
>> 
>> On Mar 12, 2011, at 6:42 PM, Matthew Knepley wrote:
>> 
>>> On Sat, Mar 12, 2011 at 6:36 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> 
>>> 
>>> I thought all that code is there and turned on. I have PETSC_USE_CTABLE defined. Why doesn't he?
>> 
>>    Only some things in the parallel matrices have a memory-scalable version, like putting values into an assembled matrix, so that code is scattered with PETSC_USE_CTABLE.
>> 
>>    Other code, like the get-overlap and get-submatrices routines, was never made memory scalable using ctable. That is the code that is screwing him. I had completely forgotten/not thought about the non-memory-scalable issues in these routines for 15 years.
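>> 
>>    Roughly, the trade-off is the one sketched below (a simplified sketch from memory; the actual Mat_MPIAIJ code in petsc-dev has more to it than this):
>> 
>>      /* how an MPIAIJ matrix maps a global column number to a local index
>>         in its off-process part, depending on how PETSc was configured   */
>>      #if defined(PETSC_USE_CTABLE)
>>        PetscTable colmap;   /* hash table keyed by global column number; memory
>>                                proportional to the columns this process touches */
>>      #else
>>        PetscInt   *colmap;  /* dense array of length N (global number of columns);
>>                                O(N) storage on every process, which is what kills
>>                                the memory scaling                                 */
>>      #endif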
>> 
>> 
>>     Barry
>> 
>>> 
>>>   Matt
>>> 
>>>  Sebastian,
>>> 
>>>     Now you see the awful truth. Satish and I wrote this code 15 years ago, and using the array colmap was OK then. I got old and the machines slowly got larger, making what used to be OK no longer scalable.
>>> 
>>> 
>>>  Barry
>>> 
>>>> 
>>>> I don't know the meaning of these variables, but the latter two look
>>>> suspicious to me :-)
>>>> 
>>>> 
>>>> 
>>>>> We'll get this figured out and fixed even if it burns all the
>>>> bandwidth between purdue and ANL.
>>>> 
>>>> I don't know what that means, but I'm glad you are willing to help me.
>>>> 
>>>> 
>>>> 
>>>> Best
>>>> Sebastian
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 03/12/2011 03:23 PM, Barry Smith wrote:
>>>>> 
>>>>> On Mar 12, 2011, at 2:12 PM, Sebastian Steiger wrote:
>>>>> 
>>>>>> Barry,
>>>>>> 
>>>>>> I have checked out the development version and was able to compile it -
>>>>>> however, there is a major problem: my code also uses SLEPc, and that
>>>>>> isn't compatible anymore with the development version of PETSc
>>>>>> (seems like you guys changed EXTERN -> extern, and also
>>>>>> INSTALLDIR -> DESTDIR in petscvariables - I suppose there are more changes
>>>>>> like this).
>>>>>> 
>>>>>> Hence I cannot link my application against petsc-dev. However, I
>>>>>> compared src/mat/impls/aij/mpi/mpiov.c in petsc-3.1-p4 (which I am using)
>>>>>> and petsc-dev, and I presume you inserted
>>>>>> 
>>>>>> ierr = PetscInfo2(C,"Number of outgoing messages %D Total message length
>>>>>> %D\n",nrqs,msz);CHKERRQ(ierr);
>>>>>> 
>>>>>> I have inserted this line manually into petsc-3.1-p4 and am now
>>>>>> recompiling it. Then I rerun the stuff using -info.
>>>>>> 
>>>>> OK
>>>>> 
>>>>>> Best
>>>>>> Sebastian
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 03/11/2011 11:00 PM, Barry Smith wrote:
>>>>>>> 
>>>>>>> Sebastian
>>>>>>> 
>>>>>>> I've gone through all the mallocs in MatGetSubMatrices_MPIAIJ_Local() and don't see why there should be a problem with "normal" matrices.
>>>>>>> 
>>>>>>> I also checked in your data the memory used by the local part of the original matrix and then by the overlapping matrix; that memory scaled, so the problem is not that the memory of the matrices themselves grows, which makes it more likely your matrices are "normal".
>>>>>>> 
>>>>>>>  I have added a PetscInfo() call to MatGetSubMatrices_MPIAIJ_Local() to provide useful information to further debug the problem. So if you could hg pull, recompile in src/mat/impls/aij/mpi, and relink the program, then run the 300 and 600 cases with the option -info and send us the two resulting outputs. From this information I can decide what needs to be done next.
>>>>>>> 
>>>>>>> Once we track down the issue in MatGetSubMatrices_MPIAIJ_Local() I think it is likely the same cause is there for the get overlap, and that will be easy to find.
>>>>>>> 
>>>>>>> Barry
>>>>>>> 
>>>>>>> Unfortunately you have to be using petsc-dev to do this.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mar 11, 2011, at 2:55 PM, Sebastian Steiger wrote:
>>>>>>> 
>>>>>>>> Matt, Barry,
>>>>>>>> 
>>>>>>>> Attached you find some detailed function-resolved information on memory,
>>>>>>>> time and flops for my simulation that uses the ASM preconditioner, for
>>>>>>>> 300, 600 and 1200 cores.
>>>>>>>> 
>>>>>>>> For this run I have set the ASM overlap to be 0. Seems to me that the
>>>>>>>> memory consumption is dominated by MatGetSubMatrices_MPIAIJ_Local() for
>>>>>>>> this case.
>>>>>>>> 
>>>>>>>> Also look for the following lines (here for N=300):
>>>>>>>> 
>>>>>>>> [StrainVFF] total problem size (DOFs): 6899904
>>>>>>>> [StrainVFF] p0 stores 8008 atoms
>>>>>>>> Avg line fill in Hessian: 46.259490509491 on same process,
>>>>>>>> 3.4547952047952 on other processes.
>>>>>>>> p0 number of ghost points: 6204
>>>>>>>> 
>>>>>>>> The fact that all these numbers stay roughly constant except for the
>>>>>>>> total problem size tells me that my problem should be scalable.
>>>>>>>> 
>>>>>>>> My simulation actually employs the PETSc Newton solver, which needs 4
>>>>>>>> iterations to reach convergence. So every run solves Ax=b four times.
>>>>>>>> 
>>>>>>>> I appreciate you looking into the memory scalability issue.
>>>>>>>> 
>>>>>>>> Best
>>>>>>>> Sebastian
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 03/11/2011 11:56 AM, Matthew Knepley wrote:
>>>>>>>>> On Fri, Mar 11, 2011 at 9:52 AM, Sebastian Steiger <steiger at purdue.edu> wrote:
>>>>>>>>> 
>>>>>>>>> Hello PETSc developers
>>>>>>>>> 
>>>>>>>>> I'm doing some scaling benchmarks and I found that the parallel asm
>>>>>>>>> preconditioner, my favorite preconditioner, has a limit in the number of
>>>>>>>>> cores it can handle.
>>>>>>>>> 
>>>>>>>>> I am doing a numerical experiment where I scale up the size of my matrix
>>>>>>>>> by roughly the same factor as the number of CPUs employed. When I look
>>>>>>>>> at how much memory each function used, via PETSc's routine
>>>>>>>>> PetscMallocDumpLog, I see the following:
>>>>>>>>> 
>>>>>>>>> Function name                        N=300         N=600     increase
>>>>>>>>> ======================================================================
>>>>>>>>> MatGetSubMatrices_MPIAIJ_Local    75'912'016   134'516'928    1.77
>>>>>>>>> MatIncreaseOverlap_MPIAIJ_Once    168'288'288  346'870'832    2.06
>>>>>>>>> MatIncreaseOverlap_MPIAIJ_Receive  2'918'960     5'658'160    1.94
>>>>>>>>> 
>>>>>>>>> The matrix sizes are 6'899'904 and 14'224'896, respectively. Above
>>>>>>>>> N~5000 CPUs I am running out of memory.
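>>>>>>>>>
>>>>>>>>> (For reference, I collect these numbers with PetscMallocDumpLog roughly as
>>>>>>>>> in the sketch below; this is going by the petsc-3.1 docs, and my actual
>>>>>>>>> code differs in the details.)
>>>>>>>>>
>>>>>>>>> #include <petscsys.h>
>>>>>>>>>
>>>>>>>>> int main(int argc,char **argv)
>>>>>>>>> {
>>>>>>>>>   PetscErrorCode ierr;
>>>>>>>>>
>>>>>>>>>   ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
>>>>>>>>>   /* ... assemble the matrix and solve as usual ... */
>>>>>>>>>   /* run with -malloc_log so the per-function totals are recorded, then
>>>>>>>>>      dump the table (like the one above) before finalizing             */
>>>>>>>>>   ierr = PetscMallocDumpLog(stdout);CHKERRQ(ierr);
>>>>>>>>>   ierr = PetscFinalize();
>>>>>>>>>   return 0;
>>>>>>>>> }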
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We have run ASM on 224,000 processors of the XT5 at ORNL, so something else
>>>>>>>>> is going on here. The best thing to do is to send us the -log_summary
>>>>>>>>> output. For attachments, we usually recommend petsc-maint at mcs.anl.gov.
>>>>>>>>> 
>>>>>>>>> Matt
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Here's my question now: Is the asm preconditioner limited from the
>>>>>>>>> algorithm point of view, or is it the implementation? I thought that
>>>>>>>>> 'only' the local matrices, plus some constant overlap with neighbors,
>>>>>>>>> are solved, so that the memory consumption per process should stay constant
>>>>>>>>> when I scale up with a constant number of rows per process.
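>>>>>>>>>
>>>>>>>>> (To be concrete, by "constant overlap" I mean what PCASMSetOverlap controls;
>>>>>>>>> the sketch below is the kind of setup I have in mind, not my exact code, and
>>>>>>>>> it assumes A, b, x and ierr are the usual assembled matrix, vectors and
>>>>>>>>> error code.)
>>>>>>>>>
>>>>>>>>> KSP ksp;
>>>>>>>>> PC  pc;
>>>>>>>>>
>>>>>>>>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>>>>>>>>> ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
>>>>>>>>> ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
>>>>>>>>> ierr = PCSetType(pc,PCASM);CHKERRQ(ierr);
>>>>>>>>> ierr = PCASMSetOverlap(pc,1);CHKERRQ(ierr);  /* one layer of overlap with the
>>>>>>>>>                                                 neighbors; same as the option
>>>>>>>>>                                                 -pc_asm_overlap 1             */
>>>>>>>>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* e.g. -sub_pc_type ilu         */
>>>>>>>>> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);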
>>>>>>>>> 
>>>>>>>>> Best
>>>>>>>>> Sebastian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>> experiments is infinitely more interesting than any results to which
>>>>>>>>> their experiments lead.
>>>>>>>>> -- Norbert Wiener
>>>>>>>> 
>>>>>>>> 
>>>>>>>> <qd_300.log><qd_600.log><qd_1200.log>
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>> 
> 



