[petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly
Fande Kong
fdkong.jd at gmail.com
Thu Jul 18 12:07:12 CDT 2019
Thanks Junchao,
Sorry for the late reply.
time mpirun -n 2 ./matrix_sparsity-opt -matstash_legacy
Close matrix for np = 2 ...
Matrix successfully closed
real 0m1.963s
user 0m3.256s
sys 0m0.554s
time mpirun -n 2 ./matrix_sparsity-opt
Close matrix for np = 2 ...
Matrix successfully closed
real 0m2.351s
user 0m4.052s
sys 0m0.519s
A bit slower, but it might not matter much.
Thanks,
Fande,
On Wed, Jul 10, 2019 at 3:51 PM Zhang, Junchao <jczhang at mcs.anl.gov> wrote:
> Fande,
> I ran your code with two processes and found the poor performance
> of PetscSortIntWithArrayPair() was due to duplicates. In particular,
> rank 0 has array length = 0 and rank 1 has array length = 4,180,070. On
> rank 1, each unique array value has ~95 duplicates; the duplicates are
> already clustered together before sorting.
> The old quicksort algorithm has O(n^2) complexity on these
> duplicates. I added a three-way-partition algorithm that splits the array
> into <pivot, =pivot, and >pivot parts (sketched after the numbers below),
> and got these numbers:
>
> Master:
> $ time mpirun -n 2 ./ex_petsc_only
> real 0m55.359s
> user 1m7.807s
> sys 0m42.651s
>
> Master:
> $ time mpirun -n 2 ./ex_petsc_only -matstash_legacy
> real 0m0.987s
> user 0m1.565s
> sys 0m0.285s
>
> Three-way partition:
> $ time mpirun -n 2 ./ex_petsc_only
> real 0m1.015s
> user 0m1.535s
> sys 0m0.392s
>
> We can see the new sort algorithm gives a 55x speedup and almost catches
> up with -matstash_legacy. So I think it is good to have, regardless of
> whether we end up doing hashing in mat assembly.
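> The three-way partition in sketch form (Dijkstra's Dutch-national-flag
> scheme; this is illustrative rather than the exact code in sorti.c, and
> the real version also swaps the companion arrays J and K alongside L):
>
>   /* Partition a[lo..hi] around pivot a[lo] into <, ==, and > parts.
>      Only the < and > parts need further recursion, so long runs of
>      duplicates cost O(n) in total instead of O(n^2). */
>   static void ThreeWayPartition(PetscInt *a,PetscInt lo,PetscInt hi,
>                                 PetscInt *lt,PetscInt *gt)
>   {
>     PetscInt i = lo+1, pivot = a[lo], t;
>     *lt = lo; *gt = hi;
>     while (i <= *gt) {
>       if      (a[i] < pivot) {t = a[*lt]; a[*lt] = a[i]; a[i] = t; (*lt)++; i++;}
>       else if (a[i] > pivot) {t = a[i]; a[i] = a[*gt]; a[*gt] = t; (*gt)--;}
>       else                   i++;
>     }
>     /* now a[lo..*lt-1] < pivot, a[*lt..*gt] == pivot, a[*gt+1..hi] > pivot */
>   }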
> Please try the branch jczhang/feature-better-quicksort-pivot on your side
> to see if it helps and whether the segfault persists.
> Thanks.
>
> --Junchao Zhang
>
>
> On Tue, Jul 9, 2019 at 10:18 AM Fande Kong <fdkong.jd at gmail.com> wrote:
>
>> Hi Junchao,
>>
>> If you are struggling to build the right package environment, here is
>> a native PETSc example.
>>
>>
>> Fande,
>>
>> On Mon, Jul 8, 2019 at 3:00 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>
>>> I guess John has a pure PETSc example.
>>>
>>> John, could you share the pure PETSc example with Junchao?
>>>
>>>
>>> Fande,
>>>
>>> On Mon, Jul 8, 2019 at 2:58 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>
>>>> Yes, here it is https://github.com/fdkong/matrixsparsity
>>>>
>>>>
>>>> You need to follow the instructions here to install MOOSE:
>>>> https://www.mooseframework.org/getting_started/installation/mac_os.html
>>>>
>>>>
>>>> Thanks for your help.
>>>>
>>>>
>>>> Fande
>>>>
>>>>
>>>>
>>>> On Mon, Jul 8, 2019 at 2:28 PM Zhang, Junchao <jczhang at mcs.anl.gov>
>>>> wrote:
>>>>
>>>>> Is the code public for me to test?
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Mon, Jul 8, 2019 at 3:06 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>>
>>>>>> Thanks Junchao,
>>>>>>
>>>>>> I tried your code. I did not hit the segfault this time, but the
>>>>>> assembly was still slow:
>>>>>>
>>>>>>
>>>>>> time mpirun -n 2 ./matrix_sparsity-opt -matstash_legacy
>>>>>> Close matrix for np = 2 ...
>>>>>> Matrix successfully closed
>>>>>>
>>>>>> real 0m2.009s
>>>>>> user 0m3.324s
>>>>>> sys 0m0.575s
>>>>>>
>>>>>> time mpirun -n 2 ./matrix_sparsity-opt
>>>>>> Close matrix for np = 2 ...
>>>>>> Matrix successfully closed
>>>>>>
>>>>>> real 3m39.235s
>>>>>> user 6m42.184s
>>>>>> sys 0m35.084s
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Fande,
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 8, 2019 at 8:47 AM Fande Kong <fdkong.jd at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Will let you know soon.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Fande,
>>>>>>>
>>>>>>> On Mon, Jul 8, 2019 at 8:41 AM Zhang, Junchao <jczhang at mcs.anl.gov>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Fande or John,
>>>>>>>> Could either of you give it a try? Thanks.
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------- Forwarded message ---------
>>>>>>>> From: Junchao Zhang <jczhang at mcs.anl.gov>
>>>>>>>> Date: Thu, Jul 4, 2019 at 8:21 AM
>>>>>>>> Subject: Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in
>>>>>>>> MatAssembly
>>>>>>>> To: Fande Kong <fdkong.jd at gmail.com>
>>>>>>>>
>>>>>>>>
>>>>>>>> Fande,
>>>>>>>> I wrote tests but could not reproduce the error. I pushed a
>>>>>>>> commit that changed the MEDIAN macro to a function to make it easier to
>>>>>>>> debug. Could you run and debug it again? It should be easy to see what is
>>>>>>>> wrong in gdb.
>>>>>>>> Thanks.
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 3, 2019 at 6:48 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Process 3915 resuming
>>>>>>>>> Process 3915 stopped
>>>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason =
>>>>>>>>> EXC_BAD_ACCESS (code=2, address=0x7ffee9b91fc8)
>>>>>>>>> frame #0: 0x000000010cbaa031
>>>>>>>>> libpetsc.3.011.dylib`PetscSortIntWithArrayPair_Private(L=0x0000000119fc5480,
>>>>>>>>> J=0x000000011bfaa480, K=0x000000011ff74480, right=13291) at sorti.c:298
>>>>>>>>> 295 }
>>>>>>>>> 296 PetscFunctionReturn(0);
>>>>>>>>> 297 }
>>>>>>>>> -> 298 i = MEDIAN(L,right);
>>>>>>>>> 299 SWAP3(L[0],L[i],J[0],J[i],K[0],K[i],tmp);
>>>>>>>>> 300 vl = L[0];
>>>>>>>>> 301 last = 0;
>>>>>>>>> (lldb)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jul 3, 2019 at 4:32 PM Zhang, Junchao <jczhang at mcs.anl.gov>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Could you debug it or paste the stack trace? Since it is a
>>>>>>>>>> segfault, it should be easy.
>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 3, 2019 at 5:16 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Junchao,
>>>>>>>>>>>
>>>>>>>>>>> But there is still a segmentation fault. I guess you could write
>>>>>>>>>>> some consecutive integers to test your changes.
>>>>>>>>>>>
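>>>>>>>>>>> A minimal standalone test along those lines (just a sketch; the
>>>>>>>>>>> array size and duplicate pattern here are my guesses):
>>>>>>>>>>>
>>>>>>>>>>>   #include <petscsys.h>
>>>>>>>>>>>
>>>>>>>>>>>   int main(int argc,char **argv)
>>>>>>>>>>>   {
>>>>>>>>>>>     PetscInt       n = 1000000,*L,*J,*K,i;
>>>>>>>>>>>     PetscErrorCode ierr;
>>>>>>>>>>>
>>>>>>>>>>>     ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
>>>>>>>>>>>     ierr = PetscMalloc3(n,&L,n,&J,n,&K);CHKERRQ(ierr);
>>>>>>>>>>>     /* consecutive values, each repeated many times and clustered */
>>>>>>>>>>>     for (i=0; i<n; i++) {L[i] = i/100; J[i] = i; K[i] = i;}
>>>>>>>>>>>     ierr = PetscSortIntWithArrayPair(n,L,J,K);CHKERRQ(ierr);
>>>>>>>>>>>     ierr = PetscFree3(L,J,K);CHKERRQ(ierr);
>>>>>>>>>>>     ierr = PetscFinalize();
>>>>>>>>>>>     return ierr;
>>>>>>>>>>>   }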
>>>>>>>>>>>
>>>>>>>>>>> Fande
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 3, 2019 at 12:57 PM Zhang, Junchao <
>>>>>>>>>>> jczhang at mcs.anl.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Fande and John,
>>>>>>>>>>>> Could you try jczhang/feature-better-quicksort-pivot? It
>>>>>>>>>>>> passed Jenkins tests and I could not imagine why it failed on yours.
>>>>>>>>>>>> A hash table has its own cost. We'd better get quicksort right
>>>>>>>>>>>> and see how it performs before rewriting the code.
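>>>>>>>>>>>> For reference, the hashing idea would be roughly the following (a
>>>>>>>>>>>> toy open-addressing sketch with invented names, not PETSc's own
>>>>>>>>>>>> hash tables):
>>>>>>>>>>>>
>>>>>>>>>>>>   #include <petscsys.h>
>>>>>>>>>>>>
>>>>>>>>>>>>   typedef struct {PetscInt row,col; PetscScalar val; int used;} StashEntry;
>>>>>>>>>>>>
>>>>>>>>>>>>   /* Accumulate duplicate (row,col) insertions in place instead of
>>>>>>>>>>>>      sorting and compressing later; cap must exceed the entry count. */
>>>>>>>>>>>>   static void StashInsert(StashEntry *t,size_t cap,PetscInt row,
>>>>>>>>>>>>                           PetscInt col,PetscScalar v)
>>>>>>>>>>>>   {
>>>>>>>>>>>>     size_t h = ((size_t)row*2654435761UL ^ (size_t)col) % cap;
>>>>>>>>>>>>     while (t[h].used && (t[h].row != row || t[h].col != col)) h = (h+1)%cap;
>>>>>>>>>>>>     if (t[h].used) t[h].val += v;               /* ADD_VALUES semantics */
>>>>>>>>>>>>     else {t[h].used = 1; t[h].row = row; t[h].col = col; t[h].val = v;}
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> The cost is the extra memory plus a pass to extract and order the
>>>>>>>>>>>> surviving entries before sending, which is why I'd rather fix the
>>>>>>>>>>>> sort first.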
>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 2, 2019 at 2:37 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>>>>>>>>>>>>> fault: 11 (signal 11)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Segmentation fault :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> As Jed said, it might be a good idea to rewrite the code using
>>>>>>>>>>>>> a hash table.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fande,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 2, 2019 at 1:27 PM Zhang, Junchao <
>>>>>>>>>>>>> jczhang at mcs.anl.gov> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Try this to see if it helps:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/src/sys/utils/sorti.c b/src/sys/utils/sorti.c
>>>>>>>>>>>>>> index 1b07205a..90779891 100644
>>>>>>>>>>>>>> --- a/src/sys/utils/sorti.c
>>>>>>>>>>>>>> +++ b/src/sys/utils/sorti.c
>>>>>>>>>>>>>> @@ -294,7 +294,8 @@ static PetscErrorCode
>>>>>>>>>>>>>> PetscSortIntWithArrayPair_Private(PetscInt *L,PetscInt *J,
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> PetscFunctionReturn(0);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> - SWAP3(L[0],L[right/2],J[0],J[right/2],K[0],K[right/2],tmp);
>>>>>>>>>>>>>> + i = MEDIAN(L,right);
>>>>>>>>>>>>>> + SWAP3(L[0],L[i],J[0],J[i],K[0],K[i],tmp);
>>>>>>>>>>>>>> vl = L[0];
>>>>>>>>>>>>>> last = 0;
>>>>>>>>>>>>>> for (i=1; i<=right; i++) {
>>>>>>>>>>>>>>
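>>>>>>>>>>>>>> Here MEDIAN picks a pivot index; a median-of-three pick might
>>>>>>>>>>>>>> look like this (an illustrative sketch; the actual macro in
>>>>>>>>>>>>>> sorti.c may differ):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   /* index of the median of L[0], L[right/2], and L[right] */
>>>>>>>>>>>>>>   static PetscInt MedianOfThree(const PetscInt *L,PetscInt right)
>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>     PetscInt a = L[0], b = L[right/2], c = L[right];
>>>>>>>>>>>>>>     if (a < b) {
>>>>>>>>>>>>>>       if (b < c) return right/2;        /* a < b < c */
>>>>>>>>>>>>>>       return (a < c) ? right : 0;       /* median is c or a */
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>     if (a < c) return 0;                /* b <= a < c */
>>>>>>>>>>>>>>     return (b < c) ? right : right/2;   /* median is c or b */
>>>>>>>>>>>>>>   }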
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 2, 2019 at 12:14 PM Fande Kong via petsc-dev <
>>>>>>>>>>>>>> petsc-dev at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> BTW,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> PetscSortIntWithArrayPair is used
>>>>>>>>>>>>>>> in MatStashSortCompress_Private.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there any way to avoid using PetscSortIntWithArrayPair in
>>>>>>>>>>>>>>> MatStashSortCompress_Private?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fande,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 2, 2019 at 11:09 AM Fande Kong <
>>>>>>>>>>>>>>> fdkong.jd at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Developers,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John just noticed that the matrix assembly was slow when
>>>>>>>>>>>>>>>> there is a sufficient number of off-diagonal entries. It was not an
>>>>>>>>>>>>>>>> MPI issue, since I was able to reproduce it using two cores on my
>>>>>>>>>>>>>>>> desktop, that is, "mpirun -n 2".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I turned on profiling, and 99.99% of the time was spent in
>>>>>>>>>>>>>>>> PetscSortIntWithArrayPair (calling itself recursively). It took THREE
>>>>>>>>>>>>>>>> MINUTES to get the assembly done. I then switched to the option
>>>>>>>>>>>>>>>> "-matstash_legacy" to restore the old assembly routine, and the same
>>>>>>>>>>>>>>>> code took ONE SECOND to assemble the matrix.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Should we write a better sorting algorithm?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Fande,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>