[petsc-dev] many subdomains per process

Barry Smith bsmith at mcs.anl.gov
Sun Feb 7 17:09:37 CST 2010

    This might be a much easier way. The maximum number of  
nonzeros in any row of the matrix gives an upper bound on the  
overlap size:
(Number of entries in the IS)*(Maximum number of nonzeros in any row  
of the matrix). This is the worst-case situation, where each row of the IS  
introduces completely different overlap entries. Generally it would be  
much smaller.  This, at least, is trivial to try.
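
A minimal sketch of that bound in plain C, for illustration only.
overlap_upper_bound is a hypothetical helper, not a PETSc routine. It
assumes every row stores its diagonal entry (so a row's own index is
already counted among its nonzeros) and it clamps the result to Mbs,
since an IS can never hold more distinct entries than the matrix has
block rows. For several levels of overlap the bound would be applied
once per level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PETSc): worst-case number of entries in
   an index set after ONE level of overlap.
     is_len     - number of entries currently in the IS
     max_row_nz - maximum number of nonzeros in any (block) row
     mbs        - total number of (block) rows in the matrix
   Assumes each row stores its diagonal, so the row's own index is one of
   its nonzero columns. */
size_t overlap_upper_bound(size_t is_len, size_t max_row_nz, size_t mbs)
{
  assert(mbs > 0);
  /* If is_len * max_row_nz would exceed mbs, clamp to mbs. Testing with a
     division avoids overflowing the product. */
  if (max_row_nz != 0 && is_len > mbs / max_row_nz) return mbs;
  return is_len * max_row_nz;
}
```

With thousands of small subdomains the bound stays far below Mbs; once
is_len*max_row_nz reaches Mbs it is no better than the current
allocation.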


On Feb 7, 2010, at 5:01 PM, Barry Smith wrote:

>  It doesn't know in advance how many entries there will be in the IS  
> that defines the larger overlap; hence it allocates Mbs space (the  
> TOTAL number of block rows in the ENTIRE matrix) for EACH IS. It has  
> to allocate a PetscInt for each row/column collected plus a bit  
> array to cheaply check if one has already been collected. This is  
> true for AIJ and BAIJ matrices.
> When there are a lot of IS or a large Mbs this is troublesome.
>   If we go through the process twice we could count the number of  
> entries for each IS the first time and then allocate the correct  
> space. This would change the memory usage from 33 (or 65) bits *# IS  
> * Mbs to 1 bit * # IS *Mbs. We could replace the bit array with a  
> dynamic hash table to get rid of the dependence on Mbs.
>  But if you really want thousands of subdomains per process we may  
> want a completely different model because we don't really want to be  
> allocating thousands of tiny IS.
>  We could also do a small number of IS each time and loop over the  
> bunches of IS. Doesn't make it scalable but might be good enough for  
> basic tests.
>   Barry
> On Feb 7, 2010, at 10:38 AM, Jed Brown wrote:
>> On Sat, 6 Feb 2010 19:39:30 -0600, Barry Smith <bsmith at mcs.anl.gov>  
>> wrote:
>>>   You could try running with -malloc_log to see where all the memory
>>> is being malloced by PETSc.
>> baijov.c:182
>>   ierr = PetscMalloc((imax)*(sizeof(PetscBT) + sizeof(PetscInt*) + sizeof(PetscInt)) +
>>     (Mbs)*imax*sizeof(PetscInt) + (Mbs/PETSC_BITS_PER_BYTE+1)*imax*sizeof(char),&table);CHKERRQ(ierr);
>> This involves Mbs*imax which is the number of nodes per process times
>> the number of subdomains per process.  I haven't investigated how
>> difficult it would be to make this scalable.
>> Jed
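
To see how the allocation quoted above grows, here is a small
stand-alone sketch in plain C that evaluates the same byte count
without PETSc. The function names are hypothetical, BITS_PER_BYTE
stands in for PETSC_BITS_PER_BYTE, the two pointer-sized members are
assumed to be ordinary data pointers, and int_size is the assumed
sizeof(PetscInt) (4 or 8). The second function gives what would remain
if only the bit arrays were sized by Mbs.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_BYTE 8 /* stands in for PETSC_BITS_PER_BYTE */

/* Hypothetical helper: bytes requested by the quoted PetscMalloc call.
     mbs      - total number of block rows in the matrix
     imax     - number of index sets (subdomains) on this process
     int_size - assumed sizeof(PetscInt), 4 or 8 */
size_t table_bytes(size_t mbs, size_t imax, size_t int_size)
{
  assert(int_size == 4 || int_size == 8);
  /* per-IS bookkeeping: bit-table pointer, data pointer, one counter */
  size_t header = imax * (sizeof(char *) + sizeof(void *) + int_size);
  /* one integer slot per block row, for every IS */
  size_t data = mbs * imax * int_size;
  /* one bit per block row, for every IS */
  size_t bits = (mbs / BITS_PER_BYTE + 1) * imax;
  return header + data + bits;
}

/* Hypothetical helper: bytes left if only the bit arrays scale with mbs. */
size_t bit_only_bytes(size_t mbs, size_t imax)
{
  return (mbs / BITS_PER_BYTE + 1) * imax;
}
```

For example, Mbs = 1000000 and imax = 1000 with 4-byte integers puts
the Mbs*imax integer term alone at 4e9 bytes, while the bit arrays for
the same case take about 1.25e8 bytes.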
