[petsc-users] SLEPc EPSGD: too much time in single iteration

Runfeng Jin jsfaraway at gmail.com
Wed Jun 15 20:31:26 CDT 2022


Hi! Thank you for your reply.

I am a little confused about the suggestion that the machine is at fault.
Both matrices were solved on the same cluster, so if there were a problem
with the machine, why would the poor performance affect only matrix B?
Also, what does oversubscribing look like in practice? Could you give some
examples?
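
For illustration, oversubscribing usually means running more MPI processes
(or threads) on a node than it has cores, so the processes take turns on a
core and every collective operation waits for whichever one is currently
descheduled. A minimal Python sketch of that check follows. The function
names and the rank and core counts are hypothetical, not values from this
cluster.

```python
import os


def oversubscription_factor(ranks_on_node, cores_on_node=None):
    """Return ranks per core on one node; above 1.0 means oversubscribed."""
    if cores_on_node is None:
        # os.cpu_count() reports logical CPUs, which may include hyperthreads.
        cores_on_node = os.cpu_count() or 1
    if ranks_on_node < 0 or cores_on_node <= 0:
        raise ValueError("rank count must be >= 0 and core count must be > 0")
    return ranks_on_node / cores_on_node


def is_oversubscribed(ranks_on_node, cores_on_node=None):
    """True when more ranks are placed on the node than it has cores."""
    return oversubscription_factor(ranks_on_node, cores_on_node) > 1.0


if __name__ == "__main__":
    # Hypothetical node with 32 cores.
    print(is_oversubscribed(32, 32))  # one rank per core
    print(is_oversubscribed(64, 32))  # two ranks per core
```

Extra helper processes on each node, or a threaded library that starts
several threads per rank, can push the count past the core count in the
same way, even when the number of MPI ranks itself looks reasonable.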

Thank you!

Runfeng Jin
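
To put numbers on the comparison quoted below, here is a small Python
sketch that works out how much slower the communication figures are for
matrix B than for matrix A. The timings are copied from the two log
excerpts in this thread; the names `timings`, `slowdown` and `ratios` are
only illustrative.

```python
# Timings in seconds, copied from the two -log_view excerpts quoted in this
# thread. Each entry is (matrix A run, matrix B run).
timings = {
    "MPI_Barrier": (1.90986e-05, 0.0578456),
    "zero-size MPI_Send": (3.44587e-06, 0.00358668),
    "VecReduceComm": (2.1629e-01, 2.4972e+01),
}


def slowdown(time_a, time_b):
    """How many times slower run B is than run A for one measurement."""
    return time_b / time_a


ratios = {name: slowdown(a, b) for name, (a, b) in timings.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: B is about {ratio:.0f}x slower than A")
```

The barrier and zero-size send come out roughly 3000 and 1000 times slower
in the matrix B run, and the reductions about 115 times slower. As far as I
understand, the first two averages are measured by -log_view independently
of the matrix being solved, which would point at how the job was launched
or at the network rather than at the matrix itself.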

Matthew Knepley <knepley at gmail.com> wrote on Wed, Jun 15, 2022 at 19:22:

> On Wed, Jun 15, 2022 at 4:21 AM Runfeng Jin <jsfaraway at gmail.com> wrote:
>
>> Hi!
>> I use the same machine, the same nodes, and the same number of processors
>> per node. I have also run the test many times, so this does not seem to be
>> an accidental result. But your points did inspire me. I use the Global
>> Arrays communicator when solving matrix A, and just MPI_COMM_WORLD for B.
>> On every node, the Global Arrays communicator dedicates one processor to
>> managing communication; maybe this is the reason for the difference in
>> communication speed?
>>
>> I will give it a try and respond as soon as I get the result!
>>
>
> I would ask the sysadmin for that machine. That Barrier time is so high, I
> would think something is wrong with the switch. Or you are
> oversubscribing which is causing massive slowdown.
>
>   Thanks,
>
>      Matt
>
>
>> Runfeng Jin
>>
>>
>> Jose E. Roman <jroman at dsic.upv.es> wrote on Wed, Jun 15, 2022 at 16:09:
>>
>>> You are comparing two different codes on two different machines? Or is
>>> it the same machine? with different number of processes and different
>>> solver options...
>>>
>>> If it is the same machine, the performance seems very different:
>>>
>>> Matrix A:
>>> Average time for MPI_Barrier(): 1.90986e-05
>>> Average time for zero size MPI_Send(): 3.44587e-06
>>>
>>> Matrix B:
>>> Average time for MPI_Barrier(): 0.0578456
>>> Average time for zero size MPI_Send(): 0.00358668
>>>
>>> The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01,
>>> respectively. It's a two orders of magnitude difference.
>>>
>>> Jose
>>>
>>>
>>> > On 15 Jun 2022, at 8:58, Runfeng Jin <jsfaraway at gmail.com>
>>> wrote:
>>> >
>>> > Sorry, I forgot the attachment.
>>> >
>>> > Runfeng Jin
>>> >
>>> > Runfeng Jin <jsfaraway at gmail.com> wrote on Wed, Jun 15, 2022 at 14:56:
>>> > Hi! You are right! I tried SLEPc and PETSc builds without debugging,
>>> and the solve time for matrix B dropped to 99s. But it is still
>>> higher than for matrix A (8s). As before, the attachments are the log
>>> views, this time from the no-debug build:
>>> >    file 1: log of the matrix A solve. This is a larger
>>> matrix (900,000*900,000) but it is solved quickly (8s);
>>> >    file 2: log of the matrix B solve. This is a smaller
>>> matrix (2,547*2,547) but it is solved much more slowly (99s).
>>> >
>>> > Comparing these two files, the strange phenomenon still exists:
>>> > 1) Matrix A has more basis vectors (375) than B (189), but A spent less
>>> time in BVCreate (0.6s) than B (32s);
>>> > 2) Matrix A spent less time in EPSSetup (0.015s) than B (0.9s);
>>> > 3) In the debug version, matrix B's storage is distributed much more
>>> unevenly among processors (memory max/min 4365) than A's (memory max/min
>>> 1.113), but the other metrics seem more balanced. The no-debug version
>>> prints no memory information.
>>> >
>>> > The significant differences I can identify are: 1) B uses preallocation;
>>> 2) A's matrix elements are calculated on the CPU, while B's matrix elements
>>> are calculated on the GPU, then transferred to the CPU and solved by PETSc
>>> on the CPU.
>>> >
>>> > Is this a normal result? I mean, can a matrix with fewer nonzero
>>> elements and a smaller dimension take more EPSSolve time? Is this due to
>>> the structure of the matrix? If so, is there any way to speed up the solve?
>>> >
>>> > Or is this weird, and should it be fixed in some way?
>>> > Thank you!
>>> >
>>> > Runfeng Jin
>>> >
>>> >
>>> > Jose E. Roman <jroman at dsic.upv.es> wrote on Sun, Jun 12, 2022 at 16:08:
>>> > Please always respond to the list.
>>> >
>>> > Pay attention to the warnings in the log:
>>> >
>>> >       ##########################################################
>>> >       #                                                        #
>>> >       #                       WARNING!!!                       #
>>> >       #                                                        #
>>> >       #   This code was compiled with a debugging option.      #
>>> >       #   To get timing results run ./configure                #
>>> >       #   using --with-debugging=no, the performance will      #
>>> >       #   be generally two or three times faster.              #
>>> >       #                                                        #
>>> >       ##########################################################
>>> >
>>> > With the debugging option the times are not trustworthy, so I suggest
>>> repeating the analysis with an optimized build.
>>> >
>>> > Jose
>>> >
>>> >
>>> > > On 12 Jun 2022, at 5:41, Runfeng Jin <jsfaraway at gmail.com>
>>> wrote:
>>> > >
>>> > > Hello!
>>> > >  I compared the log views of the two matrix solves and found something
>>> strange. The attached files are the log views:
>>> > >    file 1: log of the matrix A solve. This is a larger
>>> matrix (900,000*900,000) but it is solved quickly (30s);
>>> > >    file 2: log of the matrix B solve. This is a smaller
>>> matrix (2,547*2,547; slightly different from the matrix B mentioned in the
>>> initial email, but it is also solved much more slowly, and I use it for
>>> quicker tests) and it is solved much more slowly (1244s).
>>> > >
>>> > > Comparing these two files, I found the following:
>>> > > 1) Matrix A has more basis vectors (375) than B (189), but A spent
>>> less time in BVCreate (0.349s) than B (296s);
>>> > > 2) Matrix A spent less time in EPSSetup (0.031s) than B (10.709s);
>>> > > 3) Matrix B's storage is distributed much more unevenly among
>>> processors (memory max/min 4365) than A's (memory max/min 1.113), but the
>>> other metrics seem more balanced.
>>> > >
>>> > > I do not preallocate A, and it is distributed across processors by
>>> PETSc. For B, when preallocating I use PetscSplitOwnership to decide which
>>> part belongs to the local processor, and B is also distributed by PETSc
>>> when the matrix values are computed.
>>> > >
>>> > > - Does this mean that, for matrix B, too many nonzero elements are
>>> stored on a single process, and that this is why solving the matrix and
>>> finding the eigenvalues takes so much more time? If so, are there better
>>> ways to distribute the matrix among processors?
>>> > > - Or are there any other reasons for this difference in time?
>>> > >
>>> > > I hope to receive your reply, thank you!
>>> > >
>>> > > Runfeng Jin
>>> > >
>>> > >
>>> > >
>>> > > Runfeng Jin <jsfaraway at gmail.com> wrote on Sat, Jun 11, 2022 at 20:33:
>>> > > Hello!
>>> > > I have tried using PETSC_DEFAULT for eps_ncv, but it still takes a long
>>> time. Is there anything else I can do? Attached is the log when using
>>> PETSC_DEFAULT for eps_ncv.
>>> > >
>>> > > Thank you !
>>> > >
>>> > > Runfeng Jin
>>> > >
>>> > > Jose E. Roman <jroman at dsic.upv.es> wrote on Fri, Jun 10, 2022 at 20:50:
>>> > > The value -eps_ncv 5000 is huge.
>>> > > Better let SLEPc use the default value.
>>> > >
>>> > > Jose
>>> > >
>>> > >
>>> > > > On 10 Jun 2022, at 14:24, Jin Runfeng <jsfaraway at gmail.com>
>>> wrote:
>>> > > >
>>> > > > Hello!
>>> > > >  I want to compute the 3 smallest eigenvalues, and the attachment is
>>> the log view output. I can see that EPSSolve really takes most of the time,
>>> but I cannot see why it takes so long. Can you see something from it?
>>> > > >
>>> > > > Thank you !
>>> > > >
>>> > > > Runfeng Jin
>>> > > >
>>> > > > On Jun 4 2022, at 1:37 am, Jose E. Roman <jroman at dsic.upv.es>
>>> wrote:
>>> > > > Convergence depends on the distribution of the eigenvalues you want
>>> to compute. On the other hand, the cost also depends on the time it takes
>>> to build the preconditioner. Use -log_view to see the cost of the different
>>> steps of the computation.
>>> > > >
>>> > > > Jose
>>> > > >
>>> > > >
>>> > > > > On 3 Jun 2022, at 18:50, jsfaraway <jsfaraway at gmail.com>
>>> wrote:
>>> > > > >
>>> > > > > Hello!
>>> > > > >
>>> > > > > I am trying to use EPSGD to compute a matrix's smallest
>>> eigenvalue, and I found a strange thing. There are two matrices,
>>> A (900000*900000) and B (90000*90000). Solving A takes 371 iterations and
>>> only 30.83s, while solving B takes 22 iterations and 38885s! What could be
>>> the reason for this? Or what can I do to find the reason?
>>> > > > >
>>> > > > > I use "-eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real".
>>> > > > > One difference I can identify is that matrix B has many
>>> small values, whose absolute value is less than 10^-6. Could this be the
>>> reason?
>>> > > > >
>>> > > > > Thank you!
>>> > > > >
>>> > > > > Runfeng Jin
>>> > > > <log_view.txt>
>>> > >
>>> > >
>>> <File2_lower-But-Smaller-Matrix.txt><File1_fatesr-But-Larger-MATRIX.txt>
>>> >
>>> > <file2_nodebug_MatrixB.txt><file1_nodebug_MatrixA.txt>
>>>
>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
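
The thread mentions using PetscSplitOwnership to choose the local part of
matrix B. As a rough sketch of what an even row split looks like, the Python
below mirrors the rule I understand PetscSplitOwnership to apply when the
local size is left to PETSc: N/size rows per rank, plus one extra row on the
first N % size ranks. Treat that rule as an assumption to verify against the
PETSc manual page. The function names and the 64-rank process count are
hypothetical.

```python
def split_ownership(global_size, num_ranks, rank):
    """Local row count for `rank`: even split, remainder on the lowest ranks."""
    if num_ranks <= 0 or not 0 <= rank < num_ranks:
        raise ValueError("rank must satisfy 0 <= rank < num_ranks")
    extra = 1 if global_size % num_ranks > rank else 0
    return global_size // num_ranks + extra


def local_sizes(global_size, num_ranks):
    """Row counts for every rank, in rank order."""
    return [split_ownership(global_size, num_ranks, r) for r in range(num_ranks)]


if __name__ == "__main__":
    # 2547 rows is the size of matrix B in the thread; 64 ranks is hypothetical.
    sizes = local_sizes(2547, 64)
    print(min(sizes), max(sizes), sum(sizes))
```

Under this rule the local row counts differ by at most one, so an even row
split on its own would not be expected to produce a memory max/min ratio in
the thousands. That suggests looking at the number of nonzeros per row or at
other allocations instead.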