[petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly

Jed Brown jed at jedbrown.org
Tue Jul 2 14:17:56 CDT 2019


John Peterson <jwpeterson at gmail.com> writes:

> On Tue, Jul 2, 2019 at 1:44 PM Jed Brown <jed at jedbrown.org> wrote:
>
>> Fande Kong via petsc-dev <petsc-dev at mcs.anl.gov> writes:
>>
>> > Hi Developers,
>> >
>> > John just noticed that the matrix assembly was slow when there is a
>> > sufficient amount of off-diagonal entries. It was not an MPI issue,
>> > since I was able to reproduce the issue using two cores on my
>> > desktop, that is, "mpirun -n 2".
>> >
>> > I turned on profiling, and 99.99% of the time was spent in
>> > PetscSortIntWithArrayPair (calling itself recursively). It took THREE
>> > MINUTES to get the assembly done. I then switched to the option
>> > "-matstash_legacy" to restore the old assembly routine, and the same
>> > code took ONE SECOND to get the matrix assembly done.
>>
>> Uff.
>>
>> > Should we write a better sorting algorithm?
>>
>> It would be good to confirm (e.g., via debugger) that the problematic
>> array has some particular structure.  The naive quicksort in PETSc
>> degrades to quadratic for some ordered input data.  I cheaply "fixed"
>> that in 2010, but never propagated it to the WithArray variants.  I
>> think we should do something similar (better median estimation) or
>> perhaps move to radix sorts.
>>
>>
>> https://bitbucket.org/petsc/petsc/commits/ef8e358335c5882e7d377b87160557d47280dc77
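
For illustration, here is a minimal sketch of the median-of-three pivot
idea mentioned above (hypothetical standalone code, not PETSc's
implementation): picking the median of the first, middle, and last
elements as the pivot keeps the partition balanced on already-sorted
input, where a "last element" pivot makes quicksort recurse to O(n) depth.

#include <stdio.h>

static void Swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Move the median of v[lo], v[mid], v[hi] to v[hi] so a standard
   "pivot = last element" partition can be reused unchanged. */
static void MedianOfThreeToEnd(int v[], int lo, int hi)
{
  int mid = lo + (hi - lo) / 2;
  if (v[mid] < v[lo]) Swap(&v[mid], &v[lo]);
  if (v[hi]  < v[lo]) Swap(&v[hi],  &v[lo]);
  if (v[hi]  < v[mid]) Swap(&v[hi], &v[mid]); /* v[mid] now holds the median */
  Swap(&v[mid], &v[hi]);
}

static void QuickSortM3(int v[], int lo, int hi)
{
  if (hi - lo < 1) return;
  MedianOfThreeToEnd(v, lo, hi);
  int pivot = v[hi], i = lo;
  for (int j = lo; j < hi; j++)
    if (v[j] < pivot) Swap(&v[i++], &v[j]);
  Swap(&v[i], &v[hi]);
  QuickSortM3(v, lo, i - 1);
  QuickSortM3(v, i + 1, hi);
}

int main(void)
{
  int a[] = {0, 1, 2, 3, 4, 5, 6, 7}; /* sorted input: worst case for a naive pivot */
  QuickSortM3(a, 0, 7);
  for (int i = 0; i < 8; i++) printf("%d ", a[i]);
  printf("\n");
  return 0;
}

On sorted input the median-of-three pivot is the true median, so the
recursion depth stays O(log n) instead of O(n); the same change would have
to be carried into the WithArray/WithArrayPair variants.
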
>
>
> Hi Jed,
>
> I replied to Junchao on Fande's thread, but my email was held for
> moderation. Apologies if you have already seen that mail, but as I
> mentioned there, I think the issue is that, for some reason, we pass an
> already-sorted and very long array to PetscSortIntWithArrayPair (see the
> stack trace below).
>
>
> That's what I think is happening. We managed to get a stack trace from one
> of the slow-running cases, and it was over 100k frames deep in recursive
> PetscSortIntWithArrayPair_Private calls (see below). So perhaps we are
> seeing worst-case O(n) stack-frame depth for quicksort on an already-sorted
> array, but I am also not sure where the big number, n=35241426, comes from,
> as the problem did not have nearly that many DOFs.

Do you add values many times into the same location?  The array length
will be the number of misses to the local part of the matrix.  We could
(and maybe should) make the stash use a hash instead of building the
array with multiplicity and combining duplicates later.
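
As a rough illustration of the hash idea (a hypothetical standalone
sketch, not the MatStash data structure, with a fixed-size table and no
resizing): keying the stash by (row, col) and accumulating on insert means
repeated additions to the same off-process entry leave one combined value
instead of a long array that must be sorted and compressed later.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
  long long row, col; /* row == -1 marks an empty slot */
  double    val;
} StashEntry;

typedef struct {
  StashEntry *slots;
  size_t      capacity; /* power of two, assumed large enough here */
} Stash;

static size_t StashHash(long long row, long long col, size_t capacity)
{
  unsigned long long h = (unsigned long long)row * 2654435761ULL
                       ^ (unsigned long long)col * 40503ULL;
  return (size_t)(h & (capacity - 1));
}

/* Add (row, col, val); if the entry already exists, accumulate the value
   (ADD_VALUES semantics), so duplicates are combined at insertion time. */
static void StashAdd(Stash *s, long long row, long long col, double val)
{
  size_t i = StashHash(row, col, s->capacity);
  for (;;) {
    StashEntry *e = &s->slots[i];
    if (e->row == -1) { e->row = row; e->col = col; e->val = val; return; }
    if (e->row == row && e->col == col) { e->val += val; return; }
    i = (i + 1) & (s->capacity - 1); /* linear probing */
  }
}

int main(void)
{
  Stash s;
  s.capacity = 1u << 10;
  s.slots    = malloc(s.capacity * sizeof(StashEntry));
  for (size_t i = 0; i < s.capacity; i++) s.slots[i].row = -1;

  /* 1000 additions to the same remote entry remain a single stash entry. */
  for (int k = 0; k < 1000; k++) StashAdd(&s, 42, 7, 1.0);
  StashAdd(&s, 3, 5, 2.5);

  for (size_t i = 0; i < s.capacity; i++)
    if (s.slots[i].row != -1)
      printf("(%lld,%lld) = %g\n", s.slots[i].row, s.slots[i].col, s.slots[i].val);
  free(s.slots);
  return 0;
}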

The "legacy" variant sends all that redundant data and inserts one at a
time into the local data structures.
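
For reference, the workaround John used can be selected at run time with
the option mentioned above, e.g. (assuming an executable named ./app):

  mpirun -n 2 ./app -matstash_legacy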

> #104609 PetscSortIntWithArrayPair_Private (L=0x7ff6a413e650,
> J=0x7ff69bace650, K=0x7ff68277e650, right=33123131)
> #104610 PetscSortIntWithArrayPair_Private (L=0x7ff6a413e650,
> J=0x7ff69bace650, K=0x7ff68277e650, right=35241425)
> #104611 PetscSortIntWithArrayPair (n=35241426, L=0x7ff6a413e650,
> J=0x7ff69bace650, K=0x7ff68277e650)
> #104612 MatStashSortCompress_Private (stash=0x555f7603b628,
> insertmode=ADD_VALUES)
> #104613 MatStashScatterBegin_BTS (mat=0x555f7603aee0, stash=0x555f7603b628,
> owners=0x555f7623c9c0)
> #104614 MatStashScatterBegin_Private (mat=0x555f7603aee0,
> stash=0x555f7603b628, owners=0x555f7623c9c0)
> #104615 MatAssemblyBegin_MPIAIJ (mat=0x555f7603aee0,
> mode=MAT_FINAL_ASSEMBLY)
>
>
>
> -- 
> John

