[petsc-users] On the edge of 2^31 unknowns

hong at aspiritech.org hong at aspiritech.org
Wed Jun 22 09:36:26 CDT 2016


Eric :

> thanks... but I just realized that in petsc 3.5.4 (which I used for
> this computation) the default *was* the scalable version... :/
>
We used it as the default in the beginning, then switched to 'nonscalable' as
the default. We fixed some bugs in the scalable version in the latest release.
More testing would help, though.

>
> So I prepared a patch (see attachment) to help debug, but it will be based
> on 3.7.2.  We may use it to extract information to do a better bug report.
> Do you think there is enough PetscInfo added to resolve this issue?
>

Your patch adds a set of PetscInfo() calls. We use PetscInfo to report
optimization information, e.g.,
line 321 of mpimatmatmult.c:
 PetscInfo3(Cmpi,"Reallocs %D; Fill ratio: given %g needed %g....);
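
As a rough illustration of what the "needed" fill ratio in that message
measures, here is a minimal sketch in C. It assumes the definition given in
the MatMatMult() documentation, fill = nnz(C) / (nnz(A) + nnz(B)); the helper
name is hypothetical and the code is not taken from mpimatmatmult.c:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not PETSc code): fill ratio needed for C = A*B,
   assuming fill = nnz(C) / (nnz(A) + nnz(B)). */
static double fill_needed(long long nnzA, long long nnzB, long long nnzC)
{
    assert(nnzA + nnzB > 0); /* avoid division by zero */
    return (double)nnzC / (double)(nnzA + nnzB);
}
```

If the needed value is larger than the given one, the symbolic product had to
reallocate, which is the kind of tuning hint this PetscInfo line is for.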

You want to display the following information for debugging:
am, pN, lnnz, lMem, apnz_max,...

When a malloc() fails, PETSc indicates the line of code where it happened,
so the user would know which parameter caused the problem. Adding so many
PetscInfo() calls for these parameters is not good for code maintenance, in my
opinion.
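
The general idea of reporting the call site of a failed allocation can be
sketched in plain C as follows. This is only an illustration: the names
checked_malloc and CHECKED_MALLOC are hypothetical, and this is not how PETSc
itself implements its error reporting:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper, not PETSc's implementation: on failure, print the
   requested size and the call site, then return NULL to the caller. */
static void *checked_malloc(size_t nbytes, const char *file, int line)
{
    assert(nbytes > 0); /* malloc(0) may legally return NULL; exclude it */
    void *p = malloc(nbytes);
    if (p == NULL) {
        fprintf(stderr, "malloc of %zu bytes failed at %s:%d\n",
                nbytes, file, line);
    }
    return p;
}

/* Expands to the file and line of the place where the macro is used. */
#define CHECKED_MALLOC(n) checked_malloc((n), __FILE__, __LINE__)
```

With the file and line in hand, the size expression at that line shows which
of the parameters (am, pN, lnnz, ...) produced the oversized request.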

For your problem with 1.7 billion variables, you should use the 'scalable'
version.
Let us know if it fails.
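
To make the scale concrete, here is a back-of-the-envelope sketch in C. It
computes how much room is left below the 32-bit signed limit 2^31 - 1, and
the size of one per-process work array whose length is the global number of
columns of B (assuming that is what the O(BN) in the code comment quoted
below refers to). The helper names and the 4-byte entry size are assumptions
for illustration, not PETSc code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: how many more unknowns fit before a 32-bit signed
   index overflows. */
static int64_t int32_headroom(int64_t n)
{
    return (int64_t)INT32_MAX - n;
}

/* Hypothetical helper: bytes for one dense work array of length ncols,
   held by each process. */
static int64_t dense_work_bytes(int64_t ncols, int64_t bytes_per_entry)
{
    assert(ncols >= 0 && bytes_per_entry > 0);
    return ncols * bytes_per_entry;
}
```

Under these assumptions, 1.7 billion columns with 4-byte entries come to
6.8e9 bytes per process for that one array, and only about 447 million
unknowns remain before a 32-bit index overflows.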

Hong

>
> On 21/06/16 09:36 AM, hong at aspiritech.org wrote:
>
>> Eric:
>> The nonscalable implementation is robust, and faster for small to medium
>> size problems, thus we set it as the default. You can switch with the option
>> '-matmatmult_via scalable', which requires an estimate of the nonzeros of A*B.
>> The estimate was buggy and not well tested. If you encounter any problem,
>> let us know.
>>
>> Hong
>>
>> On Mon, Jun 20, 2016 at 10:49 PM, Eric Chamberland
>> <Eric.Chamberland at giref.ulaval.ca
>> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>>
>>
>>
>>     On 2016-06-20 23:37, Barry Smith wrote:
>>
>>             On Jun 20, 2016, at 10:32 PM, Eric Chamberland
>>             <Eric.Chamberland at giref.ulaval.ca
>>             <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>>
>>             ok, but what about -matmatmult_via scalable?
>>
>>             Both should work. It just may be one is faster or slower
>>         than the other depending on the problem size.
>>
>>     ok, digging further, I found this when running git blame on the code:
>>
>>     0fc8cf34 (Hong Zhang       2013-06-27 14:04:58 -0500  696) /* same
>>     as MatMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable(), except using
>>     LLCondensed to avoid O(BN) memory requirement */
>>
>>     But the commit comment says:
>>     ...
>>          rename MatMatMultSymbolic_MPIAIJ_MPIAIJ ->
>>     MatMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable (non-default)
>>
>>     But it *is* the default...  since another commit:
>>
>>     commit 0d3441ae8a080c728abf17e90308c510e39e951b
>>     Author: Hong Zhang <hzhang at mcs.anl.gov <mailto:hzhang at mcs.anl.gov>>
>>     Date:   Mon Aug 24 16:40:35 2015 -0500
>>
>>          add MatPtAPxxx_MPIAIJ_MPIAIJ_new
>>
>>     which changed the behaviour programmed in 0fc8cf34.  Is that expected?
>>
>>     Eric
>>
>>
>>
>>
>