[petsc-dev] many subdomains per process
Barry Smith
bsmith at mcs.anl.gov
Sun Feb 7 17:09:37 CST 2010
This might be a much easier way: use the maximum number of nonzeros
in any row of the matrix to give an upper bound on the overlap size,
(Number of entries in the IS)*(Maximum number of nonzeros in any row
of the matrix). This is the worst-case situation, where each row of
the IS introduces completely different overlap entries. Generally it
would be much smaller. This, at least, is trivial to try.
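
A minimal sketch of that bound in plain C, for illustration only. The
name overlap_upper_bound is hypothetical and is not a PETSc routine. It
assumes each row's nonzero count includes the diagonal, so the entries
already in the IS are covered by the product, and it additionally caps
the result at Mbs, since an IS of distinct block rows can never hold
more than Mbs entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PETSc): worst-case number of entries
   in an IS after one level of overlap has been added.
     is_entries : number of entries currently in the IS
     max_row_nz : maximum number of nonzeros in any (block) row
     mbs        : total number of block rows in the matrix */
static size_t overlap_upper_bound(size_t is_entries, size_t max_row_nz, size_t mbs)
{
  /* Guard against overflow in the product below. */
  assert(max_row_nz == 0 || is_entries <= SIZE_MAX / max_row_nz);

  /* Worst case: every row brings in max_row_nz entirely new entries. */
  size_t bound = is_entries * max_row_nz;

  /* The IS holds distinct block rows, so it cannot exceed mbs entries. */
  return bound < mbs ? bound : mbs;
}
```

For example, an IS with 100 entries and at most 27 nonzeros per row
needs room for at most 2700 entries, however large Mbs is.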
Barry
On Feb 7, 2010, at 5:01 PM, Barry Smith wrote:
>
> It doesn't know in advance how many entries there will be in the IS
> that defines the larger overlap; hence it allocates Mbs space (the
> TOTAL number of block rows in the ENTIRE matrix) for EACH IS. It has
> to allocate a PetscInt for each row/column collected plus a bit
> array to cheaply check whether one has already been collected. This
> is true for AIJ and BAIJ matrices.
>
> When there are a lot of IS or a large Mbs this is troublesome.
>
> If we go through the process twice we could count the number of
> entries for each IS the first time and then allocate the correct
> space. This would change the memory usage from 33 (or 65) bits * # IS
> * Mbs to 1 bit * # IS * Mbs. We could replace the bit array with a
> dynamic hash table to get rid of the dependence on Mbs.
>
> But if you really want thousands of subdomains per process we may
> want a completely different model because we don't really want to be
> allocating thousands of tiny IS.
>
> We could also do a small number of IS each time and loop over the
> bunches of IS. Doesn't make it scalable but might be good enough for
> basic tests.
>
> Barry
>
> On Feb 7, 2010, at 10:38 AM, Jed Brown wrote:
>
>> On Sat, 6 Feb 2010 19:39:30 -0600, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>>>
>>> You could try running with -malloc_log to see where all the memory
>>> is being malloced by PETSc.
>>
>> baijov.c:182
>>
>> ierr = PetscMalloc((imax)*(sizeof(PetscBT) + sizeof(PetscInt*) + sizeof(PetscInt)) +
>>                    (Mbs)*imax*sizeof(PetscInt) +
>>                    (Mbs/PETSC_BITS_PER_BYTE+1)*imax*sizeof(char),&table);CHKERRQ(ierr);
>>
>> This involves Mbs*imax which is the number of nodes per process times
>> the number of subdomains per process. I haven't investigated how
>> difficult it would be to make this scalable.
>>
>> Jed
>
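
To get a feel for the size of the allocation quoted above from
baijov.c:182, the same expression can be evaluated outside PETSc. This
is only a sketch: table_bytes is a hypothetical name, the sizes of
PetscInt and of a pointer are passed in as parameters, PetscBT is
assumed to be pointer-sized, and PETSC_BITS_PER_BYTE is assumed to be 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of PETSc): bytes requested by the quoted
   PetscMalloc call.
     imax     : number of index sets (subdomains) on this process
     mbs      : total number of block rows in the matrix
     int_size : sizeof(PetscInt), 4 or 8
     ptr_size : size of a pointer (assumed to match sizeof(PetscBT)) */
static uint64_t table_bytes(uint64_t imax, uint64_t mbs,
                            uint64_t int_size, uint64_t ptr_size)
{
  assert(int_size == 4 || int_size == 8);

  /* Per-IS bookkeeping: one PetscBT, one PetscInt*, one PetscInt. */
  uint64_t header = ptr_size + ptr_size + int_size;

  /* Per-IS index storage: one PetscInt for every block row. */
  uint64_t index_space = mbs * int_size;

  /* Per-IS bit array: one bit per block row, rounded as in the quote. */
  uint64_t bit_space = mbs / 8 + 1;

  return imax * (header + index_space + bit_space);
}
```

With 1000 subdomains on a process, Mbs of 1,000,000 and a 4-byte
PetscInt, table_bytes(1000, 1000000, 4, 8) comes to 4,125,021,000
bytes, which matches the 33 bits * # IS * Mbs estimate.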