<div dir="ltr"><div dir="ltr">On Wed, Jun 15, 2022 at 9:32 PM Runfeng Jin <<a href="mailto:jsfaraway@gmail.com">jsfaraway@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Hi! Thank you for your reply.<div><br></div><div>I am a little confused about the problem of machine. These two matrices solved in the same cluster, if there are some problems about the machine, why the low performance just happen to the matrix B?</div></div></div></blockquote><div><br></div><div>The performance problem is not related to the matrix B. The MPI_Barrier time on the second run is 1,000x slower. We just run MPI_Barrier() at</div><div>log output time to get this. It is not part of a solve.</div><div><br></div><div>It could be that there is a part of the cluster that is broken and your second job got scheduled there.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div> And, what is the situation of oversubscribing? Could you give some examples? </div></div></div></blockquote><div><br></div><div>Some MPI implementations perform extremely poorly when the number of processes exceeds the number of cores. This is called oversubscription.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>Thank you!</div><div><br></div><div>Runfeng Jin</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> 于2022年6月15日周三 19:22写道:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Wed, Jun 15, 2022 at 4:21 AM Runfeng Jin <<a href="mailto:jsfaraway@gmail.com" target="_blank">jsfaraway@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi! <div>I use the same machine, same nodes and same processors per nodes. And I test many times, so this seems not an accidental result. But your points do inspire me. I use Global Array's communicator when solving matrix A, ang just MPI_COMM_WORLD in B. In every node, Global Array's communicator make one processor dedicated to manage communicate, maybe this is the reason for the difference in communicating speed?</div><div><br></div><div>I will have a try and respond as soon as I get the result!</div></div></blockquote><div><br></div><div>I would ask the sysadmin for that machine. That Barrier time is so high, I would think something is wrong with the switch. Or you are</div><div>oversubscribing which is causing massive slowdown.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Runfeng Jin</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Jose E. 
Roman <<a href="mailto:jroman@dsic.upv.es" target="_blank">jroman@dsic.upv.es</a>> 于2022年6月15日周三 16:09写道:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">You are comparing two different codes on two different machines? Or is it the same machine? with different number of processes and different solver options...<br>

If it is the same machine, the performance seems very different:

Matrix A:
Average time for MPI_Barrier(): 1.90986e-05
Average time for zero size MPI_Send(): 3.44587e-06

Matrix B:
Average time for MPI_Barrier(): 0.0578456
Average time for zero size MPI_Send(): 0.00358668

The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01, respectively. It's a two orders of magnitude difference.
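As a sanity check independent of PETSc, you can measure the same kind of barrier latency with a small standalone MPI program along these lines (just a sketch, not the code -log_view itself uses); running it with more MPI ranks than physical cores will also expose the oversubscription slowdown mentioned above.

/* barrier_timing.c -- minimal sketch for measuring MPI_Barrier latency,
   similar in spirit to the numbers reported above.
   Build: mpicc barrier_timing.c -o barrier_timing
   Run:   mpiexec -n <nproc> ./barrier_timing                        */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int    rank, size, i, n = 100;   /* number of barrier repetitions */
  double t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  MPI_Barrier(MPI_COMM_WORLD);     /* warm up and synchronize once */
  t0 = MPI_Wtime();
  for (i = 0; i < n; i++) MPI_Barrier(MPI_COMM_WORLD);
  t1 = MPI_Wtime();

  if (rank == 0)
    printf("Average time for MPI_Barrier() on %d ranks: %g s\n",
           size, (t1 - t0) / n);

  MPI_Finalize();
  return 0;
}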

Jose

> On 15 Jun 2022, at 8:58, Runfeng Jin <jsfaraway@gmail.com> wrote:
>
> Sorry, I missed the attachment.
>
> Runfeng Jin
>
> Runfeng Jin <jsfaraway@gmail.com> wrote on Wed, Jun 15, 2022 at 14:56:
> Hi! You are right! I tried a SLEPc and PETSc build without debugging, and matrix B's solve time became 99s. But it is still higher than matrix A's (8s). As before, the attachments are the log views from the no-debug version:
> file 1: log of the matrix A solve. This is the larger matrix (900,000 x 900,000) but it is solved quickly (8s);
> file 2: log of the matrix B solve. This is the smaller matrix (2,547 x 2,547) but it is solved much more slowly (99s).
>
> Comparing these two files, the strange phenomenon still exists:
> 1) Matrix A has more basis vectors (375) than B (189), but A spends less time in BVCreate (0.6s) than B (32s);
> 2) Matrix A spends less time in EPSSetup (0.015s) than B (0.9s);
> 3) In the debug version, matrix B's storage is distributed much more unevenly among processes (memory max/min 4365) than A's (memory max/min 1.113), but the other metrics seem more balanced. In the no-debug version there is no memory information in the output.
>
> The significant differences I can tell are: 1) B uses preallocation; 2) A's matrix elements are computed on the CPU, while B's elements are computed on the GPU, then transferred to the CPU and solved by PETSc on the CPU.
>
> Is this a normal result? I mean, can a matrix with fewer nonzero elements and a smaller dimension cost more EPSSolve time? Is this due to the structure of the matrix? If so, are there any ways to increase the solve speed?
>
> Or is this weird and should be fixed in some way?
> Thank you!
>
> Runfeng Jin
>
>
> Jose E. Roman <jroman@dsic.upv.es> wrote on Sun, Jun 12, 2022 at 16:08:
> Please always respond to the list.
>
> Pay attention to the warnings in the log:
>
>   ##########################################################
>   #                                                        #
>   #                       WARNING!!!                       #
>   #                                                        #
>   #   This code was compiled with a debugging option.      #
>   #   To get timing results run ./configure                #
>   #   using --with-debugging=no, the performance will      #
>   #   be generally two or three times faster.              #
>   #                                                        #
>   ##########################################################
>
> With the debugging option the times are not trustworthy, so I suggest repeating the analysis with an optimized build.
>
> Jose
>
>
> > On 12 Jun 2022, at 5:41, Runfeng Jin <jsfaraway@gmail.com> wrote:
> >
> > Hello!
> > I compared the log views of these two matrix solves and found some strange things. The attached files are the log views:
> > file 1: log of the matrix A solve. This is the larger matrix (900,000 x 900,000) but it is solved quickly (30s);
> > file 2: log of the matrix B solve. This is a smaller matrix (2,547 x 2,547, a little different from the matrix B mentioned in the initial email, but also much slower to solve; I use it for a quicker test), yet it is solved much more slowly (1244s).
> >
> > By comparing these two files, I found a few things:
> > 1) Matrix A has more basis vectors (375) than B (189), but A spends less time in BVCreate (0.349s) than B (296s);
> > 2) Matrix A spends less time in EPSSetup (0.031s) than B (10.709s);
> > 3) Matrix B's storage is distributed much more unevenly among processes (memory max/min 4365) than A's (memory max/min 1.113), but the other metrics seem more balanced.
> >
> > I do not do preallocation for A, and it is distributed across processes by PETSc. For B, when preallocating I use PetscSplitOwnership to decide which part belongs to the local process, and B is also distributed by PETSc when computing the matrix values.
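> > Schematically, the setup for B looks like the following simplified sketch (the per-row nonzero counts and the diagonal values are placeholders, only there to make the sketch self-contained):
> >
> > /* prealloc_sketch.c -- simplified sketch: split the row ownership with
> >    PetscSplitOwnership, then preallocate and assemble an MPIAIJ matrix.
> >    The d_nnz/o_nnz counts below are placeholders; in practice they come
> >    from the known sparsity pattern of each locally owned row. */
> > #include <petscmat.h>
> >
> > int main(int argc, char **argv)
> > {
> >   Mat         B;
> >   PetscInt    N = 2547, n = PETSC_DECIDE, rstart, rend, i;
> >   PetscInt   *d_nnz, *o_nnz;
> >   PetscScalar v = 1.0;
> >
> >   PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
> >   PetscCall(PetscSplitOwnership(PETSC_COMM_WORLD, &n, &N)); /* n = local rows */
> >
> >   PetscCall(PetscMalloc2(n, &d_nnz, n, &o_nnz));
> >   for (i = 0; i < n; i++) { d_nnz[i] = 1; o_nnz[i] = 0; }   /* placeholder counts */
> >
> >   PetscCall(MatCreate(PETSC_COMM_WORLD, &B));
> >   PetscCall(MatSetSizes(B, n, n, N, N));
> >   PetscCall(MatSetType(B, MATMPIAIJ));
> >   PetscCall(MatMPIAIJSetPreallocation(B, 0, d_nnz, 0, o_nnz));
> >
> >   PetscCall(MatGetOwnershipRange(B, &rstart, &rend));
> >   for (i = rstart; i < rend; i++)                           /* toy diagonal entries */
> >     PetscCall(MatSetValues(B, 1, &i, 1, &i, &v, INSERT_VALUES));
> >   PetscCall(MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY));
> >   PetscCall(MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY));
> >
> >   PetscCall(PetscFree2(d_nnz, o_nnz));
> >   PetscCall(MatDestroy(&B));
> >   PetscCall(PetscFinalize());
> >   return 0;
> > }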
> >
> > - Does this mean that, for matrix B, too many nonzero elements are stored in a single process, and that this is why it costs so much more time to solve the matrix and find the eigenvalues? If so, are there better ways to distribute the matrix among processes?
> > - Or are there other reasons for this difference in cost?
> >
> > Hope to receive your reply, thank you!
> >
> > Runfeng Jin
> >
> >
> >
> > Runfeng Jin <jsfaraway@gmail.com> wrote on Sat, Jun 11, 2022 at 20:33:
> > Hello!
> > I have tried using PETSC_DEFAULT for eps_ncv, but it still costs a lot of time. Is there anything else I can do? The attachment is the log when using PETSC_DEFAULT for eps_ncv.
> >
> > Thank you!
> >
> > Runfeng Jin
> >
> > Jose E. Roman <jroman@dsic.upv.es> wrote on Fri, Jun 10, 2022 at 20:50:
> > The value -eps_ncv 5000 is huge.
> > Better let SLEPc use the default value.
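> > In code, leaving ncv (and mpd) to SLEPc corresponds to something like the sketch below; it is a fragment that assumes PETSc/SLEPc initialization and an already assembled Mat A, and the EPS_HEP line assumes a symmetric problem:
> >
> >   #include <slepceps.h>
> >   /* ... inside a function, after A has been assembled ... */
> >   EPS eps;
> >   PetscCall(EPSCreate(PETSC_COMM_WORLD, &eps));
> >   PetscCall(EPSSetOperators(eps, A, NULL));
> >   PetscCall(EPSSetProblemType(eps, EPS_HEP));        /* assumption: symmetric A */
> >   PetscCall(EPSSetType(eps, EPSGD));
> >   PetscCall(EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL));
> >   PetscCall(EPSSetDimensions(eps, 3, PETSC_DEFAULT, PETSC_DEFAULT)); /* nev=3, default ncv/mpd */
> >   PetscCall(EPSSetFromOptions(eps));                 /* command-line options still apply */
> >   PetscCall(EPSSolve(eps));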
> >
> > Jose
> >
> >
> > > On 10 Jun 2022, at 14:24, Jin Runfeng <jsfaraway@gmail.com> wrote:
> > >
> > > Hello!
> > > I want to obtain the 3 smallest eigenvalues, and the attachment is the log view output. I can see that EPSSolve really takes most of the time, but I cannot see why it costs so much. Can you see something in it?
> > >
> > > Thank you!
> > >
> > > Runfeng Jin
> > >
> > > On Jun 4, 2022, at 1:37 AM, Jose E. Roman <jroman@dsic.upv.es> wrote:
> > > Convergence depends on the distribution of the eigenvalues you want to compute. On the other hand, the cost also depends on the time it takes to build the preconditioner. Use -log_view to see the cost of the different steps of the computation.
> > >
> > > Jose
> > >
> > >
> > > > On 3 Jun 2022, at 18:50, jsfaraway <jsfaraway@gmail.com> wrote:
> > > >
> > > > Hello!
> > > >
> > > > I am trying to use EPSGD to compute a matrix's smallest eigenvalue, and I found a strange thing. There are two matrices, A (900000 x 900000) and B (90000 x 90000). Solving A takes 371 iterations and only 30.83s, while solving B takes 22 iterations and 38885s! What could be the reason for this? Or what can I do to find the reason?
> > > >
> > > > I use "-eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real".
> > > > One difference I can tell is that matrix B has many small values, whose absolute value is less than 1e-6. Could this be the reason?
> > > >
> > > > Thank you!
> > > >
> > > > Runfeng Jin
> > > <log_view.txt>
> >
> > <File2_lower-But-Smaller-Matrix.txt><File1_fatesr-But-Larger-MATRIX.txt>
>
> <file2_nodebug_MatrixB.txt><file1_nodebug_MatrixA.txt>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/