[petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly

Fande Kong fdkong.jd at gmail.com
Thu Jul 18 12:07:12 CDT 2019


Thanks Junchao,

Sorry for the late reply.

time mpirun -n 2 ./matrix_sparsity-opt  -matstash_legacy
Close matrix for np = 2 ...
Matrix successfully closed

real 0m1.963s
user 0m3.256s
sys 0m0.554s


time mpirun -n 2 ./matrix_sparsity-opt
Close matrix for np = 2 ...
Matrix successfully closed

real 0m2.351s
user 0m4.052s
sys 0m0.519s


A bit slower, but it probably does not matter much.

Thanks,

Fande,

On Wed, Jul 10, 2019 at 3:51 PM Zhang, Junchao <jczhang at mcs.anl.gov> wrote:

> Fande,
>   I ran your code with two processes and found the poor performance
> of PetscSortIntWithArrayPair() was due to duplicates.  In particular,
> rank 0 has array length = 0 and rank 1 has array length = 4,180,070. On
> rank 1, each unique array value has ~95 duplicates, and the duplicates are
> already clustered together before sorting.
>   The old quicksort algorithm has O(n^2) complexity on these
> duplicates. I added a three-way-partition algorithm to split the array into
> <, =, > pivot parts, and got these numbers:
>
> Master:
> $ time mpirun -n 2 ./ex_petsc_only
> real 0m55.359s
> user 1m7.807s
> sys 0m42.651s
>
> Master:
> $ time mpirun -n 2 ./ex_petsc_only -matstash_legacy
> real 0m0.987s
> user 0m1.565s
> sys 0m0.285s
>
> Three way partition
> $ time mpirun -n 2 ./ex_petsc_only
> real 0m1.015s
> user 0m1.535s
> sys 0m0.392s
>
> We can see the new sort algorithm gives a 55x speedup and almost catches
> up with -matstash_legacy. So I think it is good to have, regardless of
> whether we do hashing in mat assembly.
> Please try branch jczhang/feature-better-quicksort-pivot on your side to
> see if it helps and whether the segfault persists.
> Thanks.
>
> --Junchao Zhang
>
>
> On Tue, Jul 9, 2019 at 10:18 AM Fande Kong <fdkong.jd at gmail.com> wrote:
>
>> Hi Junchao,
>>
>> If you are struggling to build the right package environment, here is
>> a native PETSc example.
>>
>>
>> Fande,
>>
>> On Mon, Jul 8, 2019 at 3:00 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>
>>> I guess John has a pure PETSc example,
>>>
>>> John, could you share the pure PETSc example with Junchao?
>>>
>>>
>>> Fande,
>>>
>>> On Mon, Jul 8, 2019 at 2:58 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>
>>>> Yes, here it is https://github.com/fdkong/matrixsparsity
>>>>
>>>>
>>>> You need to follow instructions here to install MOOSE
>>>> https://www.mooseframework.org/getting_started/installation/mac_os.html
>>>>
>>>>
>>>> Thanks for your help.
>>>>
>>>>
>>>> Fande
>>>>
>>>>
>>>>
>>>> On Mon, Jul 8, 2019 at 2:28 PM Zhang, Junchao <jczhang at mcs.anl.gov>
>>>> wrote:
>>>>
>>>>> Is the code public for me to test?
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Mon, Jul 8, 2019 at 3:06 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>>
>>>>>> Thanks Junchao,
>>>>>>
>>>>>> I tried your code. I did not hit the segfault this time, but the assembly
>>>>>> was still slow:
>>>>>>
>>>>>>
>>>>>> time mpirun -n 2 ./matrix_sparsity-opt -matstash_legacy
>>>>>> Close matrix for np = 2 ...
>>>>>> Matrix successfully closed
>>>>>>
>>>>>> real 0m2.009s
>>>>>> user 0m3.324s
>>>>>> sys 0m0.575s
>>>>>>
>>>>>> time mpirun -n 2 ./matrix_sparsity-opt
>>>>>> Close matrix for np = 2 ...
>>>>>> Matrix successfully closed
>>>>>>
>>>>>> real 3m39.235s
>>>>>> user 6m42.184s
>>>>>> sys 0m35.084s
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Fande,
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 8, 2019 at 8:47 AM Fande Kong <fdkong.jd at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Will let you know soon.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Fande,
>>>>>>>
>>>>>>> On Mon, Jul 8, 2019 at 8:41 AM Zhang, Junchao <jczhang at mcs.anl.gov>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Fande or John,
>>>>>>>>   Could either of you give it a try? Thanks.
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------- Forwarded message ---------
>>>>>>>> From: Junchao Zhang <jczhang at mcs.anl.gov>
>>>>>>>> Date: Thu, Jul 4, 2019 at 8:21 AM
>>>>>>>> Subject: Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in
>>>>>>>> MatAssembly
>>>>>>>> To: Fande Kong <fdkong.jd at gmail.com>
>>>>>>>>
>>>>>>>>
>>>>>>>> Fande,
>>>>>>>>   I wrote tests but could not reproduce the error. I pushed a
>>>>>>>> commit that changed the MEDIAN macro to a function to make it easier to
>>>>>>>> debug.  Could you run and debug it again? It should be easy to see what is
>>>>>>>> wrong in gdb.
>>>>>>>>   Thanks.
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 3, 2019 at 6:48 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Process 3915 resuming
>>>>>>>>> Process 3915 stopped
>>>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason =
>>>>>>>>> EXC_BAD_ACCESS (code=2, address=0x7ffee9b91fc8)
>>>>>>>>>     frame #0: 0x000000010cbaa031
>>>>>>>>> libpetsc.3.011.dylib`PetscSortIntWithArrayPair_Private(L=0x0000000119fc5480,
>>>>>>>>> J=0x000000011bfaa480, K=0x000000011ff74480, right=13291) at sorti.c:298
>>>>>>>>>    295      }
>>>>>>>>>    296      PetscFunctionReturn(0);
>>>>>>>>>    297    }
>>>>>>>>> -> 298    i    = MEDIAN(L,right);
>>>>>>>>>    299    SWAP3(L[0],L[i],J[0],J[i],K[0],K[i],tmp);
>>>>>>>>>    300    vl   = L[0];
>>>>>>>>>    301    last = 0;
>>>>>>>>> (lldb)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jul 3, 2019 at 4:32 PM Zhang, Junchao <jczhang at mcs.anl.gov>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Could you debug it or paste the stack trace? Since it is a
>>>>>>>>>> segfault, it should be easy.
>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 3, 2019 at 5:16 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Junchao,
>>>>>>>>>>>
>>>>>>>>>>> But there is still a segmentation fault. I guess you could use
>>>>>>>>>>> some arrays of consecutive integers to test your changes.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Fande
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 3, 2019 at 12:57 PM Zhang, Junchao <
>>>>>>>>>>> jczhang at mcs.anl.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Fande and John,
>>>>>>>>>>>>   Could you try jczhang/feature-better-quicksort-pivot? It
>>>>>>>>>>>> passed the Jenkins tests, and I cannot tell why it fails on your side.
>>>>>>>>>>>>   A hash table has its own cost. We'd better get quicksort right
>>>>>>>>>>>> and see how it performs before rewriting the code.
>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 2, 2019 at 2:37 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>>>>>>>>>>>>> fault: 11 (signal 11)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Segmentation fault :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> As Jed said, it might be a good idea to rewrite the code using
>>>>>>>>>>>>> the hashing table.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fande,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 2, 2019 at 1:27 PM Zhang, Junchao <
>>>>>>>>>>>>> jczhang at mcs.anl.gov> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Try this to see if it helps:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/src/sys/utils/sorti.c b/src/sys/utils/sorti.c
>>>>>>>>>>>>>> index 1b07205a..90779891 100644
>>>>>>>>>>>>>> --- a/src/sys/utils/sorti.c
>>>>>>>>>>>>>> +++ b/src/sys/utils/sorti.c
>>>>>>>>>>>>>> @@ -294,7 +294,8 @@ static PetscErrorCode
>>>>>>>>>>>>>> PetscSortIntWithArrayPair_Private(PetscInt *L,PetscInt *J,
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>      PetscFunctionReturn(0);
>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>> -  SWAP3(L[0],L[right/2],J[0],J[right/2],K[0],K[right/2],tmp);
>>>>>>>>>>>>>> +  i = MEDIAN(L,right);
>>>>>>>>>>>>>> +  SWAP3(L[0],L[i],J[0],J[i],K[0],K[i],tmp);
>>>>>>>>>>>>>>    vl   = L[0];
>>>>>>>>>>>>>>    last = 0;
>>>>>>>>>>>>>>    for (i=1; i<=right; i++) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 2, 2019 at 12:14 PM Fande Kong via petsc-dev <
>>>>>>>>>>>>>> petsc-dev at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> BTW,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> PetscSortIntWithArrayPair is used
>>>>>>>>>>>>>>> in MatStashSortCompress_Private.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there any way to avoid using PetscSortIntWithArrayPair in
>>>>>>>>>>>>>>> MatStashSortCompress_Private?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fande,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 2, 2019 at 11:09 AM Fande Kong <
>>>>>>>>>>>>>>> fdkong.jd at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Developers,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John just noticed that the matrix assembly was slow when
>>>>>>>>>>>>>>>> there was a sufficient number of off-diagonal entries. It was not an
>>>>>>>>>>>>>>>> MPI issue, since I was able to reproduce it using two cores on my
>>>>>>>>>>>>>>>> desktop, that is, "mpirun -n 2".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I turned on profiling, and 99.99% of the time was spent in
>>>>>>>>>>>>>>>> PetscSortIntWithArrayPair (calling itself recursively). It took THREE
>>>>>>>>>>>>>>>> MINUTES to get the assembly done. I then used the option
>>>>>>>>>>>>>>>> "-matstash_legacy" to restore the old assembly routine, and the
>>>>>>>>>>>>>>>> same code took ONE SECOND to get the matrix assembly done.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Should we write a better sorting algorithm?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Fande,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>