[petsc-users] SLEPc EPSGD: too much time in single iteration

Runfeng Jin jsfaraway at gmail.com
Wed Jun 15 03:20:45 CDT 2022


Hi!
I use the same machine, the same nodes, and the same number of processors
per node. I have tested many times, so this does not seem to be an
accidental result. But your points do inspire me: I use Global Arrays'
communicator when solving matrix A, and just MPI_COMM_WORLD for B. On
every node, Global Arrays' communicator dedicates one processor to
managing communication; maybe this is the reason for the difference in
communication speed?

I will give it a try and respond as soon as I get the result!
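
For reference, this is the kind of micro-benchmark I plan to run to compare
the two communicators (a minimal sketch; obtaining the Global Arrays
communicator is left as a placeholder, since that part depends on my setup):

  #include <mpi.h>
  #include <stdio.h>

  /* Average time per MPI_Barrier() on a given communicator */
  static double time_barrier(MPI_Comm comm, int reps)
  {
    MPI_Barrier(comm);                /* synchronize before timing */
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) MPI_Barrier(comm);
    return (MPI_Wtime() - t0) / reps;
  }

  int main(int argc, char **argv)
  {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t_world = time_barrier(MPI_COMM_WORLD, 100);
    /* MPI_Comm ga_comm = ...;  placeholder: the communicator used with
       Global Arrays when solving matrix A */
    /* double t_ga = time_barrier(ga_comm, 100); */

    if (rank == 0)
      printf("Average MPI_Barrier on MPI_COMM_WORLD: %g s\n", t_world);
    MPI_Finalize();
    return 0;
  }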

Runfeng Jin


Jose E. Roman <jroman at dsic.upv.es> wrote on Wed, Jun 15, 2022 at 16:09:

> You are comparing two different codes on two different machines? Or is it
> the same machine, with different numbers of processes and different solver
> options?
>
> If it is the same machine, the performance seems very different:
>
> Matrix A:
> Average time for MPI_Barrier(): 1.90986e-05
> Average time for zero size MPI_Send(): 3.44587e-06
>
> Matrix B:
> Average time for MPI_Barrier(): 0.0578456
> Average time for zero size MPI_Send(): 0.00358668
>
> The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01,
> respectively. It's a two orders of magnitude difference.
>
> Jose
>
>
> > On 15 Jun 2022, at 8:58, Runfeng Jin <jsfaraway at gmail.com> wrote:
> >
> > Sorry, I missed the attachment.
> >
> > Runfeng Jin
> >
> > Runfeng Jin <jsfaraway at gmail.com> wrote on Wed, Jun 15, 2022 at 14:56:
> > Hi! You are right! I tried a SLEPc and PETSc build without debugging,
> > and matrix B's solve time came down to 99s. But it is still much higher
> > than matrix A's (8s). As before, the attachments are the log views from
> > the no-debug build:
> >    file 1: log of the matrix A solve. This is a larger
> > matrix (900,000 x 900,000) but is solved quickly (8s);
> >    file 2: log of the matrix B solve. This is a smaller
> > matrix (2,547 x 2,547) but is solved much more slowly (99s).
> >
> > Comparing these two files, the strange phenomenon still exists:
> > 1) Matrix A has more basis vectors (375) than B (189), but A spent less
> > time in BVCreate (0.6s) than B (32s);
> > 2) Matrix A spent less time in EPSSetUp (0.015s) than B (0.9s);
> > 3) In the debug build, matrix B's storage is distributed much more
> > unevenly among processors (memory max/min 4365) than A's (memory max/min
> > 1.113), but other metrics seem more balanced. In the no-debug build
> > there is no memory information in the output.
> >
> > The significant differences I can tell are: 1) B uses preallocation;
> > 2) A's matrix elements are calculated on the CPU, while B's are
> > calculated on the GPU, transferred to the CPU, and then solved by PETSc
> > on the CPU.
> >
> > Is this a normal result? I mean, can a matrix with fewer non-zero
> > elements and a smaller dimension cost more EPSSolve time? Is this due to
> > the structure of the matrix? If so, are there any ways to increase the
> > solve speed?
> >
> > Or is this weird and something that should be fixed somehow?
> > Thank you!
> >
> > Runfeng Jin
> >
> >
> > Jose E. Roman <jroman at dsic.upv.es> wrote on Sun, Jun 12, 2022 at 16:08:
> > Please always respond to the list.
> >
> > Pay attention to the warnings in the log:
> >
> >       ##########################################################
> >       #                                                        #
> >       #                       WARNING!!!                       #
> >       #                                                        #
> >       #   This code was compiled with a debugging option.      #
> >       #   To get timing results run ./configure                #
> >       #   using --with-debugging=no, the performance will      #
> >       #   be generally two or three times faster.              #
> >       #                                                        #
> >       ##########################################################
> >
> > With the debugging option the times are not trustworthy, so I suggest
> repeating the analysis with an optimized build.
> >
> > Jose
> >
> >
> > > On 12 Jun 2022, at 5:41, Runfeng Jin <jsfaraway at gmail.com> wrote:
> > >
> > > Hello!
> > >  I compared the log views of these two matrix solves and found some
> > > strange things. The attached files are the log views:
> > >    file 1: log of the matrix A solve. This is a larger
> > > matrix (900,000 x 900,000) but is solved quickly (30s);
> > >    file 2: log of the matrix B solve. This is a smaller
> > > matrix (2,547 x 2,547; slightly different from the matrix B mentioned
> > > in the initial email, which I use for a quicker test) but is solved
> > > much more slowly (1244s).
> > >
> > > By comparing these two files, I found some things:
> > > 1) Matrix A has more basis vectors (375) than B (189), but A spent
> > > less time in BVCreate (0.349s) than B (296s);
> > > 2) Matrix A spent less time in EPSSetUp (0.031s) than B (10.709s);
> > > 3) Matrix B's storage is distributed much more unevenly among
> > > processors (memory max/min 4365) than A's (memory max/min 1.113), but
> > > other metrics seem more balanced.
> > >
> > > I don't do preallocation for A, and it is distributed across
> > > processors by PETSc. For B, when preallocating I use
> > > PetscSplitOwnership to decide which part belongs to the local
> > > processor, and B is also distributed by PETSc when computing the
> > > matrix values.
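> > >
> > > Roughly, my preallocation for B looks like this (a simplified sketch,
> > > with error checking omitted; d_nz and o_nz stand in for my actual
> > > per-row nonzero estimates):
> > >
> > >   PetscInt n = PETSC_DECIDE, N = 2547;   /* global size of B */
> > >   PetscInt d_nz = 20, o_nz = 10;         /* placeholder estimates */
> > >   PetscSplitOwnership(PETSC_COMM_WORLD, &n, &N); /* local row count */
> > >   Mat B;
> > >   MatCreate(PETSC_COMM_WORLD, &B);
> > >   MatSetSizes(B, n, n, N, N);
> > >   MatSetType(B, MATMPIAIJ);
> > >   /* per-row preallocation for diagonal and off-diagonal blocks */
> > >   MatMPIAIJSetPreallocation(B, d_nz, NULL, o_nz, NULL);
> > >   /* ... MatSetValues(...) calls, then assembly ... */
> > >   MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);
> > >   MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);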
> > >
> > > - Does this mean that, for matrix B, too many nonzero elements are
> > > stored in a single process, and that this is why it costs so much
> > > more time to solve the matrix and find the eigenvalues? If so, are
> > > there better ways to distribute the matrix among processors?
> > > - Or are there any other reasons for this difference in cost?
> > >
> > > Hope to receive your reply, thank you!
> > >
> > > Runfeng Jin
> > >
> > >
> > >
> > > Runfeng Jin <jsfaraway at gmail.com> wrote on Sat, Jun 11, 2022 at 20:33:
> > > Hello!
> > > I have tried using PETSC_DEFAULT for eps_ncv, but it still costs much
> > > time. Is there anything else I can do? The attachment is the log when
> > > using PETSC_DEFAULT for eps_ncv.
> > >
> > > Thank you !
> > >
> > > Runfeng Jin
> > >
> > > Jose E. Roman <jroman at dsic.upv.es> wrote on Fri, Jun 10, 2022 at 20:50:
> > > The value -eps_ncv 5000 is huge.
> > > Better let SLEPc use the default value.
> > >
> > > Jose
> > >
> > >
> > > > On 10 Jun 2022, at 14:24, Jin Runfeng <jsfaraway at gmail.com> wrote:
> > > >
> > > > Hello!
> > > >  I want to obtain the 3 smallest eigenvalues, and the attachment is
> > > > the log view output. I can see that EPSSolve really takes the
> > > > majority of the time, but I cannot see why it costs so much. Can you
> > > > see something from it?
> > > >
> > > > Thank you !
> > > >
> > > > Runfeng Jin
> > > >
> > > > On Jun 4, 2022, at 1:37 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
> > > > Convergence depends on the distribution of the eigenvalues you want
> > > > to compute. On the other hand, the cost also depends on the time it
> > > > takes to build the preconditioner. Use -log_view to see the cost of
> > > > the different steps of the computation.
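> > > >
> > > > For example (the executable name and process count here are just
> > > > placeholders for your actual run):
> > > >
> > > >   mpiexec -n 16 ./myapp -eps_type gd -eps_nev 3 -eps_smallest_real -log_view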
> > > >
> > > > Jose
> > > >
> > > >
> > > > > On 3 Jun 2022, at 18:50, jsfaraway <jsfaraway at gmail.com> wrote:
> > > > >
> > > > > Hello!
> > > > >
> > > > > I am trying to use EPSGD to compute a matrix's smallest
> > > > > eigenvalue, and I found a strange thing. There are two matrices,
> > > > > A (900000 x 900000) and B (90000 x 90000). Solving A takes 371
> > > > > iterations and only 30.83s, while solving B takes 22 iterations
> > > > > and 38885s! What could be the reason for this? Or what can I do to
> > > > > find the reason?
> > > > >
> > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ".
> > > > > And there is one difference I can tell is matrix B has many small
> value, whose absolute value is less than 10-6. Could this be the reason?
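> > > > >
> > > > > For reference, the programmatic equivalent of those options (a
> > > > > sketch assuming a standard Hermitian problem, with A the assembled
> > > > > matrix and error checking omitted):
> > > > >
> > > > >   EPS eps;
> > > > >   EPSCreate(PETSC_COMM_WORLD, &eps);
> > > > >   EPSSetOperators(eps, A, NULL);   /* standard problem A x = k x */
> > > > >   EPSSetProblemType(eps, EPS_HEP); /* assuming A is Hermitian */
> > > > >   EPSSetType(eps, EPSGD);          /* -eps_type gd */
> > > > >   EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL);
> > > > >   EPSSetDimensions(eps, 3, 300, PETSC_DEFAULT); /* nev=3, ncv=300 */
> > > > >   EPSSetFromOptions(eps);          /* pick up command-line options */
> > > > >   EPSSolve(eps);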
> > > > >
> > > > > Thank you!
> > > > >
> > > > > Runfeng Jin
> > > > <log_view.txt>
> > >
> > >
> <File2_lower-But-Smaller-Matrix.txt><File1_fatesr-But-Larger-MATRIX.txt>
> >
> > <file2_nodebug_MatrixB.txt><file1_nodebug_MatrixA.txt>
>
>