[petsc-users] SLEPc EPSGD: too much time in single iteration

Matthew Knepley knepley at gmail.com
Thu Jun 16 07:12:24 CDT 2022


On Wed, Jun 15, 2022 at 9:32 PM Runfeng Jin <jsfaraway at gmail.com> wrote:

> Hi! Thank you for your reply.
>
> I am a little confused about the machine being the problem. These two matrices
> were solved on the same cluster; if something is wrong with the machine,
> why does the low performance happen only for matrix B?
>

The performance problem is not related to matrix B itself. The MPI_Barrier
time in the second run is 1,000x slower. We only run MPI_Barrier() at
log output time to measure this; it is not part of the solve.
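
If you want to reproduce that number outside of PETSc, here is a minimal
sketch in plain MPI that times an average over repeated barriers, much like
the figure printed by -log_view (the file name and repetition count are
arbitrary choices):

/* barrier_time.c: average MPI_Barrier() time across MPI_COMM_WORLD */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  const int nreps = 100;
  int       rank;
  double    t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Barrier(MPI_COMM_WORLD);                      /* synchronize once first */
  t0 = MPI_Wtime();
  for (int i = 0; i < nreps; i++) MPI_Barrier(MPI_COMM_WORLD);
  t1 = MPI_Wtime();

  if (rank == 0) printf("Average time for MPI_Barrier(): %g\n", (t1 - t0) / nreps);
  MPI_Finalize();
  return 0;
}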

It could be that there is a part of the cluster that is broken and your
second job got scheduled there.


>  And what is this "oversubscribing" situation? Could you give some
> examples?
>

Some MPI implementations perform extremely poorly when the number of
processes exceeds the number of cores. This is called oversubscription.
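
A quick way to check is to count how many ranks land on each node and compare
with the cores available there. A rough sketch (plain MPI plus sysconf, so
POSIX-only, and it counts logical rather than physical cores):

/* oversub_check.c: warn if more ranks share a node than it has logical cores */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  MPI_Comm nodecomm;
  int      noderank, nodesize;
  long     ncores;

  MPI_Init(&argc, &argv);
  /* group the ranks that share the same node */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &noderank);
  MPI_Comm_size(nodecomm, &nodesize);
  ncores = sysconf(_SC_NPROCESSORS_ONLN);

  if (noderank == 0 && nodesize > ncores)
    printf("Oversubscribed: %d ranks on a node with %ld logical cores\n", nodesize, ncores);

  MPI_Comm_free(&nodecomm);
  MPI_Finalize();
  return 0;
}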

  Thanks,

     Matt


> Thank you!
>
> Runfeng Jin
>
> Matthew Knepley <knepley at gmail.com> wrote on Wed, Jun 15, 2022 at 19:22:
>
>> On Wed, Jun 15, 2022 at 4:21 AM Runfeng Jin <jsfaraway at gmail.com> wrote:
>>
>>> Hi!
>>> I use the same machine, same nodes, and same processors per node. And I
>>> have tested many times, so this does not seem to be an accidental result. But
>>> your points do inspire me. I use Global Arrays' communicator when solving
>>> matrix A, and just MPI_COMM_WORLD for B. On every node, Global Arrays'
>>> communicator dedicates one processor to managing communication; maybe this is
>>> the reason for the difference in communication speed?
>>>
>>> I will give it a try and respond as soon as I get the result!
>>>
>>
>> I would ask the sysadmin for that machine. That Barrier time is so high that
>> I would think something is wrong with the switch. Or you are
>> oversubscribing, which is causing a massive slowdown.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Runfeng Jin
>>>
>>>
>>> Jose E. Roman <jroman at dsic.upv.es> wrote on Wed, Jun 15, 2022 at 16:09:
>>>
>>>> Are you comparing two different codes on two different machines? Or is
>>>> it the same machine, with different numbers of processes and different
>>>> solver options...
>>>>
>>>> If it is the same machine, the performance seems very different:
>>>>
>>>> Matrix A:
>>>> Average time for MPI_Barrier(): 1.90986e-05
>>>> Average time for zero size MPI_Send(): 3.44587e-06
>>>>
>>>> Matrix B:
>>>> Average time for MPI_Barrier(): 0.0578456
>>>> Average time for zero size MPI_Send(): 0.00358668
>>>>
>>>> The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01,
>>>> respectively. It's a two orders of magnitude difference.
>>>>
>>>> Jose
>>>>
>>>>
>>>> > On 15 Jun 2022, at 8:58, Runfeng Jin <jsfaraway at gmail.com>
>>>> wrote:
>>>> >
>>>> > Sorry, I missed the attachment.
>>>> >
>>>> > Runfeng Jin
>>>> >
>>>> > Runfeng Jin <jsfaraway at gmail.com> wrote on Wed, Jun 15, 2022 at 14:56:
>>>> > Hi! You are right! I tried a SLEPc and PETSc version built with
>>>> no-debug, and matrix B's solve time became 99s. But it is still much
>>>> higher than matrix A (8s). Same as mentioned before, the attachments are
>>>> the log views of the no-debug version:
>>>> >    file 1: log of the matrix A solve. This is a larger
>>>> matrix (900,000*900,000) but solved quickly (8s);
>>>> >    file 2: log of the matrix B solve. This is a smaller
>>>> matrix (2,547*2,547) but solved much more slowly (99s).
>>>> >
>>>> > By comparing these two files, the strange phenomenon still exists:
>>>> > 1) Matrix A has more basis vectors (375) than B (189), but A spent less
>>>> time on BVCreate (0.6s) than B (32s);
>>>> > 2) Matrix A spent less time on EPSSetup (0.015s) than B (0.9s);
>>>> > 3) In the debug version, matrix B distributes storage much more unevenly
>>>> among processors (memory max/min 4365) than A (memory max/min 1.113),
>>>> but other metrics seem more balanced. And in the no-debug version there is
>>>> no memory information in the output.
>>>> >
>>>> > The significant differences I can tell are: 1) B uses preallocation; 2)
>>>> A's matrix elements are calculated on the CPU, while B's matrix elements are
>>>> calculated on the GPU, then transferred to the CPU and solved by PETSc on the CPU.
>>>> >
>>>> > Is this a normal result? I mean, can a matrix with fewer non-zero
>>>> elements and smaller dimension cost more EPSSolve time? Is this due to the
>>>> structure of the matrix? If so, are there any ways to increase the solve speed?
>>>> >
>>>> > Or is this weird and should be fixed in some way?
>>>> > Thank you!
>>>> >
>>>> > Runfeng Jin
>>>> >
>>>> >
>>>> > Jose E. Roman <jroman at dsic.upv.es> wrote on Sun, Jun 12, 2022 at 16:08:
>>>> > Please always respond to the list.
>>>> >
>>>> > Pay attention to the warnings in the log:
>>>> >
>>>> >       ##########################################################
>>>> >       #                                                        #
>>>> >       #                       WARNING!!!                       #
>>>> >       #                                                        #
>>>> >       #   This code was compiled with a debugging option.      #
>>>> >       #   To get timing results run ./configure                #
>>>> >       #   using --with-debugging=no, the performance will      #
>>>> >       #   be generally two or three times faster.              #
>>>> >       #                                                        #
>>>> >       ##########################################################
>>>> >
>>>> > With the debugging option the times are not trustworthy, so I suggest
>>>> repeating the analysis with an optimized build.
>>>> >
>>>> > Jose
>>>> >
>>>> >
>>>> > > On 12 Jun 2022, at 5:41, Runfeng Jin <jsfaraway at gmail.com>
>>>> wrote:
>>>> > >
>>>> > > Hello!
>>>> > >  I compared the log views of these two matrix solves and found some
>>>> strange things. The attached files are the log views:
>>>> > >    file 1: log of the matrix A solve. This is a larger
>>>> matrix (900,000*900,000) but solved quickly (30s);
>>>> > >    file 2: log of the matrix B solve. This is a smaller
>>>> matrix (2,547*2,547, a little different from the matrix B mentioned in the
>>>> initial email, but also solved much more slowly; I use it for a quicker
>>>> test) but solved much more slowly (1244s).
>>>> > >
>>>> > > By comparing these two files, I found a few things:
>>>> > > 1) Matrix A has more basis vectors (375) than B (189), but A spent
>>>> less time on BVCreate (0.349s) than B (296s);
>>>> > > 2) Matrix A spent less time on EPSSetup (0.031s) than B (10.709s);
>>>> > > 3) Matrix B distributes storage much more unevenly among
>>>> processors (memory max/min 4365) than A (memory max/min 1.113), but other
>>>> metrics seem more balanced.
>>>> > >
>>>> > > I don't do preallocation for A, and it is distributed across
>>>> processors by PETSc. For B, when preallocating I use PetscSplitOwnership
>>>> to decide which part belongs to the local processor, and B is also distributed
>>>> by PETSc when the matrix values are computed.
>>>> > >
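>>>> > > (For reference, here is a minimal sketch of this kind of
>>>> PetscSplitOwnership-based preallocation; the helper name, the global size,
>>>> and the per-row nonzero estimates are hypothetical, not taken from the
>>>> actual code:)
>>>> > >
>>>> > > #include <petscmat.h>
>>>> > >
>>>> > > /* Split N rows across the processes of 'comm', then preallocate an
>>>> > >    MPIAIJ matrix with rough per-row nonzero estimates. */
>>>> > > static PetscErrorCode create_preallocated(MPI_Comm comm, PetscInt N, Mat *B)
>>>> > > {
>>>> > >   PetscInt nlocal = PETSC_DECIDE;
>>>> > >
>>>> > >   PetscFunctionBeginUser;
>>>> > >   PetscCall(PetscSplitOwnership(comm, &nlocal, &N)); /* nlocal <- local row count */
>>>> > >   PetscCall(MatCreate(comm, B));
>>>> > >   PetscCall(MatSetSizes(*B, nlocal, nlocal, N, N));
>>>> > >   PetscCall(MatSetType(*B, MATMPIAIJ));
>>>> > >   /* guesses: 50 nonzeros/row in the diagonal block, 20 in the off-diagonal block */
>>>> > >   PetscCall(MatMPIAIJSetPreallocation(*B, 50, NULL, 20, NULL));
>>>> > >   PetscFunctionReturn(0);
>>>> > > }
>>>> > >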
>>>> > > - Does this mean that, for matrix B, too many nonzero elements are
>>>> stored in a single process, and this is why it costs so much more time to
>>>> solve the matrix and find the eigenvalues? If so, are there better ways
>>>> to distribute the matrix among processors?
>>>> > > - Or are there any other reasons for this difference in cost?
>>>> > >
>>>> > > Hope to receive your reply, thank you!
>>>> > >
>>>> > > Runfeng Jin
>>>> > >
>>>> > >
>>>> > >
>>>> > > Runfeng Jin <jsfaraway at gmail.com> wrote on Sat, Jun 11, 2022 at 20:33:
>>>> > > Hello!
>>>> > > I have tried using PETSC_DEFAULT for eps_ncv, but it still costs a lot of
>>>> time. Is there anything else I can do? The attachment is the log when using
>>>> PETSC_DEFAULT for eps_ncv.
>>>> > >
>>>> > > Thank you !
>>>> > >
>>>> > > Runfeng Jin
>>>> > >
>>>> > > Jose E. Roman <jroman at dsic.upv.es> wrote on Fri, Jun 10, 2022 at 20:50:
>>>> > > The value -eps_ncv 5000 is huge.
>>>> > > Better let SLEPc use the default value.
>>>> > >
>>>> > > Jose
>>>> > >
>>>> > >
>>>> > > > On 10 Jun 2022, at 14:24, Jin Runfeng <jsfaraway at gmail.com>
>>>> wrote:
>>>> > > >
>>>> > > > Hello!
>>>> > > >  I want to obtain the 3 smallest eigenvalues, and the attachment is
>>>> the log view output. I can see EPSSolve really takes most of the time. But I
>>>> cannot see why it costs so much time. Can you see something from it?
>>>> > > >
>>>> > > > Thank you !
>>>> > > >
>>>> > > > Runfeng Jin
>>>> > > >
>>>> > > > On Jun 4, 2022, at 1:37 AM, Jose E. Roman <jroman at dsic.upv.es>
>>>> wrote:
>>>> > > > Convergence depends on the distribution of the eigenvalues you want to
>>>> compute. On the other hand, the cost also depends on the time it takes to
>>>> build the preconditioner. Use -log_view to see the cost of the different
>>>> steps of the computation.
>>>> > > >
>>>> > > > Jose
>>>> > > >
>>>> > > >
>>>> > > > > On 3 Jun 2022, at 18:50, jsfaraway <jsfaraway at gmail.com>
>>>> wrote:
>>>> > > > >
>>>> > > > > Hello!
>>>> > > > >
>>>> > > > > I am trying to use EPSGD to compute a matrix's smallest
>>>> eigenvalue. And I found a strange thing. There are two matrices,
>>>> A (900000*900000) and B (90000*90000). While solving A takes 371 iterations and
>>>> only 30.83s, solving B takes 22 iterations and 38885s! What could be the reason
>>>> for this? Or what can I do to find the reason?
>>>> > > > >
>>>> > > > > I use "-eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real".
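>>>> > > > >
>>>> > > > > (For reference, the options above correspond roughly to the
>>>> following calls in code; this is only a sketch, with a made-up helper
>>>> name, assuming an already created EPS object named eps:)
>>>> > > > >
>>>> > > > > #include <slepceps.h>
>>>> > > > >
>>>> > > > > /* roughly -eps_type gd -eps_nev 3 -eps_ncv 300 -eps_smallest_real */
>>>> > > > > static PetscErrorCode set_gd_options(EPS eps)
>>>> > > > > {
>>>> > > > >   PetscFunctionBeginUser;
>>>> > > > >   PetscCall(EPSSetType(eps, EPSGD));
>>>> > > > >   PetscCall(EPSSetDimensions(eps, 3, 300, PETSC_DEFAULT)); /* nev, ncv, mpd */
>>>> > > > >   PetscCall(EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL));
>>>> > > > >   PetscFunctionReturn(0);
>>>> > > > > }
>>>> > > > >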
>>>> > > > > And one difference I can tell is that matrix B has many
>>>> small values, whose absolute value is less than 1e-6. Could this be the
>>>> reason?
>>>> > > > >
>>>> > > > > Thank you!
>>>> > > > >
>>>> > > > > Runfeng Jin
>>>> > > > <log_view.txt>
>>>> > >
>>>> > >
>>>> <File2_lower-But-Smaller-Matrix.txt><File1_fatesr-But-Larger-MATRIX.txt>
>>>> >
>>>> > <file2_nodebug_MatrixB.txt><file1_nodebug_MatrixA.txt>
>>>>
>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/