[petsc-users] SLEPc EPSGD: too much time in single iteration

Matthew Knepley knepley at gmail.com
Wed Jun 15 06:22:30 CDT 2022


On Wed, Jun 15, 2022 at 4:21 AM Runfeng Jin <jsfaraway at gmail.com> wrote:

> Hi!
> I use the same machine, the same nodes, and the same number of processors
> per node. And I have tested many times, so this does not seem to be an
> accidental result. But your points do inspire me. I use Global Arrays'
> communicator when solving matrix A, and just MPI_COMM_WORLD for B. On every
> node, Global Arrays' communicator dedicates one processor to managing
> communication; maybe this is the reason for the difference in communication
> speed?
>
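> What I have in mind is roughly the following (a minimal sketch; how the
> Global Arrays communicator is obtained is only a placeholder here, and the
> setup of B is elided):
>
>   MPI_Comm comm = MPI_COMM_WORLD;  /* replace with the communicator that
>                                       Global Arrays actually uses */
>   Mat      B;
>   EPS      eps;
>   MatCreate(comm, &B);             /* build B on the same communicator as A */
>   /* ... set sizes, preallocate, and fill the values of B ... */
>   EPSCreate(comm, &eps);           /* solver lives on the same communicator */
>   EPSSetOperators(eps, B, NULL);
>   EPSSetFromOptions(eps);
>   EPSSolve(eps);
>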
> I will give it a try and respond as soon as I get the result!
>

I would ask the sysadmin for that machine. That Barrier time is so high that I
would think something is wrong with the switch. Or you are oversubscribing,
which is causing a massive slowdown.
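
If you want a quick check that is independent of PETSc, a small
barrier-timing loop like the following (an untested sketch) run with the
same number of ranks on the same nodes should roughly reproduce the
MPI_Barrier number that -log_view reports:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    int    rank, i, n = 1000;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);              /* warm up */
    t0 = MPI_Wtime();
    for (i = 0; i < n; i++) MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();
    if (!rank) printf("Average MPI_Barrier time: %g s\n", (t1 - t0) / n);
    MPI_Finalize();
    return 0;
  }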

  Thanks,

     Matt


> Runfeng Jin
>
>
> Jose E. Roman <jroman at dsic.upv.es> wrote on Wed, Jun 15, 2022 at 16:09:
>
>> You are comparing two different codes on two different machines? Or is it
>> the same machine, with different numbers of processes and different solver
>> options...
>>
>> If it is the same machine, the performance seems very different:
>>
>> Matrix A:
>> Average time for MPI_Barrier(): 1.90986e-05
>> Average time for zero size MPI_Send(): 3.44587e-06
>>
>> Matrix B:
>> Average time for MPI_Barrier(): 0.0578456
>> Average time for zero size MPI_Send(): 0.00358668
>>
>> The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01,
>> respectively. It's a two orders of magnitude difference.
>>
>> Jose
>>
>>
>> > On 15 Jun 2022, at 8:58, Runfeng Jin <jsfaraway at gmail.com> wrote:
>> >
>> > Sorry, I missed the attachment.
>> >
>> > Runfeng Jin
>> >
>> > Runfeng Jin <jsfaraway at gmail.com> wrote on Wed, Jun 15, 2022 at 14:56:
>> > Hi! You are right! I tried a SLEPc and PETSc version built without
>> > debugging, and matrix B's solve time dropped to 99s. But it is still
>> > higher than matrix A's (8s). As mentioned before, the attachments are the
>> > log views of the no-debug version:
>> >    file 1: log of the matrix A solve. This is a larger
>> > matrix (900,000 x 900,000) but is solved quickly (8s);
>> >    file 2: log of the matrix B solve. This is a smaller matrix
>> > (2,547 x 2,547) but is solved much more slowly (99s).
>> >
>> > By comparing these two files, I see that the strange phenomenon still
>> > exists:
>> > 1) Matrix A has more basis vectors (375) than B (189), but A spent less
>> > time in BVCreate (0.6s) than B (32s);
>> > 2) Matrix A spent less time in EPSSetup (0.015s) than B (0.9s);
>> > 3) In the debug version, matrix B's storage is distributed much more
>> > unevenly among processors (memory max/min 4365) than A's (memory max/min
>> > 1.113), but other metrics seem more balanced. And in the no-debug version
>> > there is no memory information in the output.
>> >
>> > The significant differences I can tell are: 1) B uses preallocation; 2)
>> > A's matrix elements are calculated on the CPU, while B's matrix elements
>> > are calculated on the GPU and then transferred to the CPU, where the
>> > matrix is solved by PETSc.
>> >
>> > Is this a normal result? I mean, can a matrix with fewer non-zero
>> > elements and a smaller dimension cost more EPSSolve time? Is this due to
>> > the structure of the matrix? If so, are there any ways to increase the
>> > solve speed?
>> >
>> > Or is this weird and should it be fixed somehow?
>> > Thank you!
>> >
>> > Runfeng Jin
>> >
>> >
>> > Jose E. Roman <jroman at dsic.upv.es> wrote on Sun, Jun 12, 2022 at 16:08:
>> > Please always respond to the list.
>> >
>> > Pay attention to the warnings in the log:
>> >
>> >       ##########################################################
>> >       #                                                        #
>> >       #                       WARNING!!!                       #
>> >       #                                                        #
>> >       #   This code was compiled with a debugging option.      #
>> >       #   To get timing results run ./configure                #
>> >       #   using --with-debugging=no, the performance will      #
>> >       #   be generally two or three times faster.              #
>> >       #                                                        #
>> >       ##########################################################
>> >
>> > With the debugging option the times are not trustworthy, so I suggest
>> repeating the analysis with an optimized build.
>> >
>> > Jose
>> >
>> >
>> > > On 12 Jun 2022, at 5:41, Runfeng Jin <jsfaraway at gmail.com> wrote:
>> > >
>> > > Hello!
>> > >  I compared the log views of these two matrix solves and found some
>> > > strange things. The attached files are the log views:
>> > >    file 1: log of the matrix A solve. This is a larger
>> > > matrix (900,000 x 900,000) but is solved quickly (30s);
>> > >    file 2: log of the matrix B solve. This is a smaller matrix
>> > > (2,547 x 2,547, a little different from the matrix B mentioned in the
>> > > initial email; I use this one for a quicker test) but is solved much
>> > > more slowly (1244s).
>> > >
>> > > By comparing these two files, I find some things:
>> > > 1) Matrix A has more basis vectors (375) than B (189), but A spent less
>> > > time in BVCreate (0.349s) than B (296s);
>> > > 2) Matrix A spent less time in EPSSetup (0.031s) than B (10.709s);
>> > > 3) Matrix B's storage is distributed much more unevenly among
>> > > processors (memory max/min 4365) than A's (memory max/min 1.113), but
>> > > other metrics seem more balanced.
>> > >
>> > > I don't do preallocation for A, and it is distributed across processors
>> > > by PETSc. For B, when preallocating I use PetscSplitOwnership to decide
>> > > which part belongs to the local processor, and B is also distributed by
>> > > PETSc when the matrix values are computed, roughly as in the sketch
>> > > below.
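>> > >
>> > > Roughly, the preallocation part looks like this (a simplified sketch;
>> > > the per-row nonzero estimates are placeholders for what I actually
>> > > compute):
>> > >
>> > >   Mat      B;
>> > >   PetscInt n = PETSC_DECIDE, N = 2547;        /* global size of B */
>> > >   PetscInt d_nz = 30, o_nz = 10;              /* placeholder estimates */
>> > >   PetscSplitOwnership(PETSC_COMM_WORLD, &n, &N);  /* local row count */
>> > >   MatCreate(PETSC_COMM_WORLD, &B);
>> > >   MatSetSizes(B, n, n, N, N);
>> > >   MatSetType(B, MATMPIAIJ);
>> > >   MatMPIAIJSetPreallocation(B, d_nz, NULL, o_nz, NULL);
>> > >   /* ... MatSetValues() for the locally owned rows ... */
>> > >   MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);
>> > >   MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);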
>> > >
>> > > - Does this mean that, for matrix B, too many nonzero elements are
>> > > stored in a single process, and that this is why it costs so much more
>> > > time to solve the matrix and find the eigenvalues? If so, are there
>> > > better ways to distribute the matrix among processors? (A small
>> > > per-process check is sketched below.)
>> > > - Or are there other reasons for this difference in cost?
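>> > >
>> > > A small check I could add to print the nonzeros stored on each process
>> > > (a sketch using MatGetInfo):
>> > >
>> > >   MatInfo     info;
>> > >   PetscMPIInt rank;
>> > >   MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>> > >   MatGetInfo(B, MAT_LOCAL, &info);
>> > >   PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] local nz_used %g\n",
>> > >                           rank, info.nz_used);
>> > >   PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);
>> > >
>> > > If one process reports far more nonzeros than the others, that would
>> > > confirm the imbalance.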
>> > >
>> > > Hope to receive your reply, thank you!
>> > >
>> > > Runfeng Jin
>> > >
>> > >
>> > >
>> > > Runfeng Jin <jsfaraway at gmail.com> wrote on Sat, Jun 11, 2022 at 20:33:
>> > > Hello!
>> > > I have tried using PETSC_DEFAULT for eps_ncv, but it still costs much
>> > > time. Is there anything else I can do? The attachment is the log when
>> > > PETSC_DEFAULT is used for eps_ncv.
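>> > > (If set in code, this is roughly, as a sketch,
>> > > EPSSetDimensions(eps, 3, PETSC_DEFAULT, PETSC_DEFAULT), i.e. nev = 3
>> > > with ncv and mpd left for SLEPc to choose.)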
>> > >
>> > > Thank you !
>> > >
>> > > Runfeng Jin
>> > >
>> > > Jose E. Roman <jroman at dsic.upv.es> wrote on Fri, Jun 10, 2022 at 20:50:
>> > > The value -eps_ncv 5000 is huge.
>> > > Better let SLEPc use the default value.
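>> > > For example, keep the other options as they are and simply drop
>> > > -eps_ncv:
>> > >
>> > >   -eps_type gd -eps_nev 3 -eps_smallest_real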
>> > >
>> > > Jose
>> > >
>> > >
>> > > > On 10 Jun 2022, at 14:24, Jin Runfeng <jsfaraway at gmail.com> wrote:
>> > > >
>> > > > Hello!
>> > > >  I want to acquire the 3 smallest eigenvalues, and the attachment is
>> > > > the log view output. I can see that EPSSolve really takes most of the
>> > > > time, but I cannot see why it costs so much time. Can you see
>> > > > something from it?
>> > > >
>> > > > Thank you !
>> > > >
>> > > > Runfeng Jin
>> > > >
>> > > > On Jun 4, 2022, at 1:37 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
>> > > > Convergence depends on the distribution of the eigenvalues you want to
>> > > > compute. On the other hand, the cost also depends on the time it takes
>> > > > to build the preconditioner. Use -log_view to see the cost of the
>> > > > different steps of the computation.
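>> > > > For example (the process count and program name are placeholders
>> > > > here):
>> > > >
>> > > >   mpiexec -n <nproc> ./your_program <your usual options> -log_view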
>> > > >
>> > > > Jose
>> > > >
>> > > >
>> > > > > On 3 Jun 2022, at 18:50, jsfaraway <jsfaraway at gmail.com> wrote:
>> > > > >
>> > > > > hello!
>> > > > >
>> > > > > I am trying to use EPSGD to compute a matrix's smallest eigenvalue,
>> > > > > and I have found a strange thing. There are two matrices,
>> > > > > A (900000 x 900000) and B (90000 x 90000). Solving A takes 371
>> > > > > iterations and only 30.83s, while solving B takes 22 iterations and
>> > > > > 38885s! What could be the reason for this? Or what can I do to find
>> > > > > the reason?
>> > > > >
>> > > > > I use "-eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real".
>> > > > > And one difference I can tell is that matrix B has many small
>> > > > > values, whose absolute value is less than 1e-6. Could this be the
>> > > > > reason?
>> > > > >
>> > > > > Thank you!
>> > > > >
>> > > > > Runfeng Jin
>> > > > <log_view.txt>
>> > >
>> > >
>> <File2_lower-But-Smaller-Matrix.txt><File1_fatesr-But-Larger-MATRIX.txt>
>> >
>> > <file2_nodebug_MatrixB.txt><file1_nodebug_MatrixA.txt>
>>
>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

