[petsc-dev] many subdomains per process

Sun Feb 7 17:01:20 CST 2010

   It doesn't know in advance how many entries there will be in the IS
that defines the larger overlap; hence it allocates Mbs space (the
TOTAL number of block rows in the ENTIRE matrix) for EACH IS. It has
to allocate a PetscInt for each row/column collected plus a bit array
to cheaply check whether one has already been collected. This is true
for both AIJ and BAIJ matrices.

When there are a lot of IS or a large Mbs this is troublesome.

    If we go through the process twice we could count the number of
entries for each IS the first time and then allocate the correct space.
This would change the memory usage from 33 (or 65) bits * # IS * Mbs
(a 32- or 64-bit PetscInt plus 1 bit per block row) to
1 bit * # IS * Mbs. We could replace the bit array with a dynamic hash
table to get rid of the dependence on Mbs.
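
   To put rough numbers on the estimate above, here is a minimal sketch
of the arithmetic in plain C. It is not PETSc code: the function names
one_pass_bytes and two_pass_bytes are hypothetical, int_bytes stands in
for sizeof(PetscInt), and the small per-IS pointer overhead in the real
allocation is ignored. The two-pass figure also counts the index
entries actually collected, which the 1 bit * # IS * Mbs estimate
leaves out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Workspace for the one-pass scheme: every IS gets Mbs index slots
   plus a bit array of Mbs bits (rounded as Mbs/8 + 1 bytes). */
size_t one_pass_bytes(size_t n_is, size_t mbs, size_t int_bytes)
{
  assert(int_bytes == 4 || int_bytes == 8); /* 32- or 64-bit indices */
  size_t bit_bytes = mbs / CHAR_BIT + 1;
  return n_is * (mbs * int_bytes + bit_bytes);
}

/* Workspace for a hypothetical two-pass scheme: the bit arrays still
   scale with Mbs, but index storage matches the counted entries
   (total_entries is the sum over all IS). */
size_t two_pass_bytes(size_t n_is, size_t mbs, size_t total_entries,
                      size_t int_bytes)
{
  assert(int_bytes == 4 || int_bytes == 8);
  size_t bit_bytes = mbs / CHAR_BIT + 1;
  return n_is * bit_bytes + total_entries * int_bytes;
}
```

   With made-up sizes of 1000 IS, Mbs = 100000, 4-byte indices and 50
entries per IS, the first function gives about 412 MB and the second
about 12.7 MB, nearly all of it bit arrays.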

   But if you really want thousands of subdomains per process we may  
want a completely different model because we don't really want to be  
allocating thousands of tiny IS.

   We could also do a small number of IS each time and loop over the  
bunches of IS. Doesn't make it scalable but might be good enough for  
basic tests.
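
   The bunching idea can be sketched as a simple driver loop. This is
only an illustration in plain C, not the PETSc implementation:
for_each_batch, batch_fn and the batch size are hypothetical names, and
the callback stands in for whatever routine would increase the overlap
of one bunch of IS. The point is that the Mbs-sized workspace is then
needed for at most batch IS at a time instead of for all of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: process IS number first .. first+count-1. */
typedef void (*batch_fn)(size_t first, size_t count, void *ctx);

/* Walk n_is index sets in bunches of at most `batch`, calling fn on
   each bunch. Returns the number of bunches processed. */
size_t for_each_batch(size_t n_is, size_t batch, batch_fn fn, void *ctx)
{
  assert(batch > 0);
  size_t nbatches = 0;
  for (size_t first = 0; first < n_is; first += batch) {
    size_t remaining = n_is - first;
    size_t count = remaining < batch ? remaining : batch;
    if (fn) fn(first, count, ctx); /* workspace needed for count IS only */
    nbatches++;
  }
  return nbatches;
}
```

   Peak workspace then scales with batch * Mbs rather than # IS * Mbs,
at the cost of more passes; as noted, that is still not scalable in Mbs.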

    Barry

On Feb 7, 2010, at 10:38 AM, Jed Brown wrote:

> On Sat, 6 Feb 2010 19:39:30 -0600, Barry Smith <bsmith at mcs.anl.gov>  
> wrote:
>>
>>    You could try running with -malloc_log to see where all the memory
>> is being malloced by PETSc.
>
> baijov.c:182
>
>    ierr = PetscMalloc((imax)*(sizeof(PetscBT) + sizeof(PetscInt*) + sizeof(PetscInt)) +
>      (Mbs)*imax*sizeof(PetscInt) + (Mbs/PETSC_BITS_PER_BYTE+1)*imax*sizeof(char),&table);CHKERRQ(ierr);
>
> This involves Mbs*imax which is the number of nodes per process times
> the number of subdomains per process.  I haven't investigated how
> difficult it would be to make this scalable.
>
> Jed