[petsc-dev] many subdomains per process
Barry Smith
bsmith at mcs.anl.gov
Sun Feb 7 17:01:20 CST 2010
It doesn't know in advance how many entries there will be in the IS
that defines the larger overlap; hence it allocates Mbs space (the
TOTAL number of block rows in the ENTIRE matrix) for EACH IS. It has
to allocate a PetscInt for each row/column collected plus a bit array
to cheaply check if one has already been collected. This is true for
both AIJ and BAIJ matrices.
When there are a lot of IS or a large Mbs this is troublesome.
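To get a feel for the numbers, here is a rough sketch that just evaluates the size expression from the PetscMalloc call quoted below. It is plain C, not PETSc code: overlap_table_bytes and its int_bytes parameter (standing in for sizeof(PetscInt)) are made-up names, PetscBT is assumed to be a char pointer, and CHAR_BIT is assumed to match PETSC_BITS_PER_BYTE.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bytes requested for imax index sets when the matrix has mbs block rows.
 * int_bytes stands in for sizeof(PetscInt): 4 or 8 (assumption). */
size_t overlap_table_bytes(size_t imax, size_t mbs, size_t int_bytes)
{
    /* per-IS bookkeeping: one bit-array pointer, one data pointer, one count */
    size_t headers = imax * (sizeof(char *) + sizeof(void *) + int_bytes);
    /* Mbs integers reserved for EACH index set: the dominant term */
    size_t index_storage = mbs * imax * int_bytes;
    /* one bit per block row for EACH index set */
    size_t bit_arrays = (mbs / CHAR_BIT + 1) * imax;

    assert(int_bytes == 4 || int_bytes == 8);
    return headers + index_storage + bit_arrays;
}
```

With 4-byte PetscInt and 8-byte pointers, 1000 IS and Mbs = 100000 come to about 412 MB, nearly all of it from the Mbs*imax*sizeof(PetscInt) term.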
If we go through the process twice we could count the number of
entries for each IS the first time and then allocate the correct space.
This would change the memory usage from 33 (or 65) bits * # IS * Mbs
(a 32- or 64-bit PetscInt plus one bit per block row) to
1 bit * # IS * Mbs plus the entries actually collected. We could
replace the bit array with a dynamic hash table to get rid of the
dependence on Mbs.
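A minimal sketch of the count-then-fill idea for a single IS, in plain C rather than PETSc. Everything here is hypothetical: collect_unique and bt_lookup_set are illustration names, and the candidate rows are handed in as a flat array, whereas the real code would generate them by walking the matrix rows twice.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Set bit i; return 1 if it was already set, 0 otherwise. */
static int bt_lookup_set(unsigned char *bt, size_t i)
{
    unsigned char mask = (unsigned char)(1u << (i % CHAR_BIT));
    int was_set = (bt[i / CHAR_BIT] & mask) != 0;
    bt[i / CHAR_BIT] |= mask;
    return was_set;
}

/* Return the distinct entries of cand[] (each in [0, mbs)) in first-seen
 * order, in an array sized exactly to the count. Caller frees the result. */
int *collect_unique(const int *cand, size_t ncand, size_t mbs, size_t *nout)
{
    size_t nbytes = mbs / CHAR_BIT + 1, count = 0, k = 0, i;
    unsigned char *bt = calloc(nbytes, 1);
    int *out;

    if (!bt) return NULL;
    /* pass 1: only count, so the bit array is the only O(mbs) storage */
    for (i = 0; i < ncand; i++) {
        assert(cand[i] >= 0 && (size_t)cand[i] < mbs);
        if (!bt_lookup_set(bt, (size_t)cand[i])) count++;
    }
    out = malloc((count ? count : 1) * sizeof *out);
    if (!out) { free(bt); return NULL; }
    /* pass 2: reset the bits and fill the exactly sized array */
    memset(bt, 0, nbytes);
    for (i = 0; i < ncand; i++)
        if (!bt_lookup_set(bt, (size_t)cand[i])) out[k++] = cand[i];
    free(bt);
    *nout = count;
    return out;
}
```

The price is traversing the candidates twice; the bit array still scales with Mbs, which is what a hash table would remove.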
But if you really want thousands of subdomains per process we may
want a completely different model because we don't really want to be
allocating thousands of tiny IS.
We could also do a small number of IS each time and loop over the
bunches of IS. Doesn't make it scalable but might be good enough for
basic tests.
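The loop over bunches could look something like the sketch below. It is plain C with made-up names (for_each_batch, batch_fn, track_peak); in PETSc the callback would presumably run the overlap computation on count IS starting at index first, but that wiring is an assumption and is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Work to do on index sets [first, first + count). */
typedef void (*batch_fn)(size_t first, size_t count, void *ctx);

/* Visit imax index sets in bunches of at most batch; return the number
 * of bunches. Peak table size then scales with batch*Mbs, not imax*Mbs. */
size_t for_each_batch(size_t imax, size_t batch, batch_fn fn, void *ctx)
{
    size_t first, nbatches = 0;

    assert(batch > 0);
    for (first = 0; first < imax; first += batch) {
        size_t left = imax - first;
        fn(first, left < batch ? left : batch, ctx);
        nbatches++;
    }
    return nbatches;
}

/* Example callback: remember the largest bunch seen. */
void track_peak(size_t first, size_t count, void *ctx)
{
    size_t *peak = ctx;
    (void)first;
    if (count > *peak) *peak = count;
}
```

Total work is unchanged and each bunch still costs Mbs per IS, so this only caps the peak allocation.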
Barry
On Feb 7, 2010, at 10:38 AM, Jed Brown wrote:
> On Sat, 6 Feb 2010 19:39:30 -0600, Barry Smith <bsmith at mcs.anl.gov>
> wrote:
>>
>> You could try running with -malloc_log to see where all the memory
>> is being malloced by PETSc.
>
> baijov.c:182
>
> ierr = PetscMalloc((imax)*(sizeof(PetscBT) + sizeof(PetscInt*)+
> sizeof(PetscInt)) +
> (Mbs)*imax*sizeof(PetscInt) + (Mbs/PETSC_BITS_PER_BYTE
> +1)*imax*sizeof(char),&table);CHKERRQ(ierr);
>
> This involves Mbs*imax which is the number of nodes per process times
> the number of subdomains per process. I haven't investigated how
> difficult it would be to make this scalable.
>
> Jed