[petsc-users] Very slow SVD with SLEPC

Jose E. Roman jroman at dsic.upv.es
Tue Nov 17 02:16:17 CST 2020


Your timing should be the same as the one in the log. SVDSolve logs the time of all relevant operations. I suggest stepping through the execution in a debugger to see where those 1000 seconds are spent.
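
For example, a minimal way to isolate the solver call is to bracket it with PetscTime(); you can also push a log stage around the rest of your code so that -log_view breaks the time down by phase. This is just a sketch (it assumes slepcsvd.h is included and the SVD object "svd" is already set up):

    PetscErrorCode ierr;
    PetscLogDouble t0, t1;
    PetscLogStage  stage;

    ierr = PetscTime(&t0);CHKERRQ(ierr);
    ierr = SVDSolve(svd);CHKERRQ(ierr);                 /* only the solver call */
    ierr = PetscTime(&t1);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD,"SVDSolve wall time: %g s\n",(double)(t1-t0));CHKERRQ(ierr);

    ierr = PetscLogStageRegister("Post-processing",&stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    /* ... code that extracts and stores the singular vectors ... */
    ierr = PetscLogStagePop();CHKERRQ(ierr);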
Jose


> El 17 nov 2020, a las 9:05, Rakesh Halder <rhalder at umich.edu> escribió:
> 
> When building the matrix, I use SVDGetSingularTriplet to get the left singular vector for each singular value I want, and VecGetArray to access the underlying data and insert the values into a preallocated matrix I created to store the results. I'm wondering if this is the best approach.
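> 
> For reference, the loop looks roughly like this (a sketch rather than my exact code; names are illustrative, and U is the preallocated dense matrix holding the results):
> 
>     /* Sketch: assumes svd has been solved on the snapshot matrix A, and that U is
>        a preallocated MATDENSE with the same row layout as A, at least nconv
>        columns, and the default column-major storage with lda == local rows */
>     PetscErrorCode    ierr;
>     Vec               u, v;
>     PetscInt          i, j, nconv, mlocal;
>     PetscReal         sigma;
>     PetscScalar       *Ua;
>     const PetscScalar *ua;
> 
>     ierr = MatCreateVecs(A,&v,&u);CHKERRQ(ierr);
>     ierr = MatGetLocalSize(A,&mlocal,NULL);CHKERRQ(ierr);
>     ierr = SVDGetConverged(svd,&nconv);CHKERRQ(ierr);
>     ierr = MatDenseGetArray(U,&Ua);CHKERRQ(ierr);
>     for (i=0; i<nconv; i++) {
>       ierr = SVDGetSingularTriplet(svd,i,&sigma,u,v);CHKERRQ(ierr);
>       ierr = VecGetArrayRead(u,&ua);CHKERRQ(ierr);
>       for (j=0; j<mlocal; j++) Ua[i*mlocal+j] = ua[j];  /* copy into column i of U */
>       ierr = VecRestoreArrayRead(u,&ua);CHKERRQ(ierr);
>     }
>     ierr = MatDenseRestoreArray(U,&Ua);CHKERRQ(ierr);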
> 
>  I also mentioned earlier that in my code I measured the time before and after calling SVDSolve and found that the elapsed time was around 1000 seconds, even though the log reported 75 seconds. Could there be an issue with creating some of the internal data structures within the SVD object?
> 
> On Tue, Nov 17, 2020 at 2:43 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
> What I meant is to send the output of -log_view without any XML formatting. Anyway, as you said, the call to the SVD solver takes 75 seconds. The rest of the time should be attributed to your code, I guess, or perhaps to the lack of preallocation if you are building the matrix in AIJ format.
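> 
> For a dense N x n snapshot matrix, something along these lines avoids the AIJ assembly cost altogether (just a sketch, with illustrative names and sizes):
> 
>     Mat A;
>     ierr = MatCreateDense(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,N,n,NULL,&A);CHKERRQ(ierr);
>     /* fill entries, e.g. with MatSetValues(), then assemble */
>     ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>     ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> 
> If you do stick with AIJ, calling MatSeqAIJSetPreallocation()/MatMPIAIJSetPreallocation() before the insertion loop makes a big difference.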
> 
> Jose
> 
> 
> > El 17 nov 2020, a las 8:31, Rakesh Halder <rhalder at umich.edu> escribió:
> > 
> > And this output is from the small matrix log: 
> > 
> > <?xml version="1.0" encoding="UTF-8"?>
> > <?xml-stylesheet type="text/xsl" href="performance_xml2html.xsl"?>
> > <root>
> > <!-- PETSc Performance Summary: -->
> >   <petscroot>
> >     <runspecification desc="Run Specification">
> >       <executable desc="Executable">simpleROMFoam</executable>
> >       <architecture desc="Architecture">real-opt</architecture>
> >       <hostname desc="Host">pmultigrid</hostname>
> >       <nprocesses desc="Number of processes">1</nprocesses>
> >       <user desc="Run by user">rhalder</user>
> >       <date desc="Started at">Mon Nov 16 20:40:01 2020</date>
> >       <petscrelease desc="Petsc Release">Petsc Release Version 3.14.1, Nov 03, 2020 </petscrelease>
> >     </runspecification>
> >     <globalperformance desc="Global performance">
> >       <time desc="Time (sec)">
> >         <max>2.030551e+02</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>1.000000</ratio>
> >         <average>2.030551e+02</average>
> >       </time>
> >       <objects desc="Objects">
> >         <max>5.300000e+01</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>1.000000</ratio>
> >         <average>5.300000e+01</average>
> >       </objects>
> >       <mflop desc="MFlop">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >         <average>0.000000e+00</average>
> >         <total>0.000000e+00</total>
> >       </mflop>
> >       <mflops desc="MFlop/sec">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >         <average>0.000000e+00</average>
> >         <total>0.000000e+00</total>
> >       </mflops>
> >       <messagetransfers desc="MPI Message Transfers">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >         <average>0.000000e+00</average>
> >         <total>0.000000e+00</total>
> >       </messagetransfers>
> >       <messagevolume desc="MPI Message Volume (MiB)">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >         <average>0.000000e+00</average>
> >         <total>0.000000e+00</total>
> >       </messagevolume>
> >       <reductions desc="MPI Reductions">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >       </reductions>
> >     </globalperformance>
> >     <timertree desc="Timings tree">
> >       <totaltime>203.055134</totaltime>
> >       <timethreshold>0.010000</timethreshold>
> >       <event>
> >         <name>MatConvert</name>
> >         <time>
> >           <value>0.0297699</value>
> >         </time>
> >         <events>
> >           <event>
> >             <name>self</name>
> >             <time>
> >               <value>0.029759</value>
> >             </time>
> >           </event>
> >         </events>
> >       </event>
> >       <event>
> >         <name>SVDSolve</name>
> >         <time>
> >           <value>0.0242731</value>
> >         </time>
> >         <events>
> >           <event>
> >             <name>self</name>
> >             <time>
> >               <value>0.0181869</value>
> >             </time>
> >           </event>
> >         </events>
> >       </event>
> >       <event>
> >         <name>MatView</name>
> >         <time>
> >           <value>0.0138235</value>
> >         </time>
> >       </event>
> >     </timertree>
> >     <selftimertable desc="Self-timings">
> >       <totaltime>203.055134</totaltime>
> >       <event>
> >         <name>MatConvert</name>
> >         <time>
> >           <value>0.0324545</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>SVDSolve</name>
> >         <time>
> >           <value>0.0181869</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>MatView</name>
> >         <time>
> >           <value>0.0138235</value>
> >         </time>
> >       </event>
> >     </selftimertable>
> >   </petscroot>
> > </root>
> > 
> > 
> > On Tue, Nov 17, 2020 at 2:30 AM Rakesh Halder <rhalder at umich.edu> wrote:
> > The following is from the large matrix log: 
> > 
> > <?xml version="1.0" encoding="UTF-8"?>
> > <?xml-stylesheet type="text/xsl" href="performance_xml2html.xsl"?>
> > <root>
> > <!-- PETSc Performance Summary: -->
> >   <petscroot>
> >     <runspecification desc="Run Specification">
> >       <executable desc="Executable">simpleROMFoam</executable>
> >       <architecture desc="Architecture">real-opt</architecture>
> >       <hostname desc="Host">pmultigrid</hostname>
> >       <nprocesses desc="Number of processes">1</nprocesses>
> >       <user desc="Run by user">rhalder</user>
> >       <date desc="Started at">Mon Nov 16 20:25:52 2020</date>
> >       <petscrelease desc="Petsc Release">Petsc Release Version 3.14.1, Nov 03, 2020 </petscrelease>
> >     </runspecification>
> >     <globalperformance desc="Global performance">
> >       <time desc="Time (sec)">
> >         <max>1.299397e+03</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>1.000000</ratio>
> >         <average>1.299397e+03</average>
> >       </time>
> >       <objects desc="Objects">
> >         <max>9.100000e+01</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>1.000000</ratio>
> >         <average>9.100000e+01</average>
> >       </objects>
> >       <mflop desc="MFlop">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >         <average>0.000000e+00</average>
> >         <total>0.000000e+00</total>
> >       </mflop>
> >       <mflops desc="MFlop/sec">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >         <average>0.000000e+00</average>
> >         <total>0.000000e+00</total>
> >       </mflops>
> >       <messagetransfers desc="MPI Message Transfers">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >         <average>0.000000e+00</average>
> >         <total>0.000000e+00</total>
> >       </messagetransfers>
> >       <messagevolume desc="MPI Message Volume (MiB)">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >         <average>0.000000e+00</average>
> >         <total>0.000000e+00</total>
> >       </messagevolume>
> >       <reductions desc="MPI Reductions">
> >         <max>0.000000e+00</max>
> >         <maxrank desc="rank at which max was found">0</maxrank>
> >         <ratio>0.000000</ratio>
> >       </reductions>
> >     </globalperformance>
> >     <timertree desc="Timings tree">
> >       <totaltime>1299.397478</totaltime>
> >       <timethreshold>0.010000</timethreshold>
> >       <event>
> >         <name>SVDSolve</name>
> >         <time>
> >           <value>75.5819</value>
> >         </time>
> >         <events>
> >           <event>
> >             <name>self</name>
> >             <time>
> >               <value>75.3134</value>
> >             </time>
> >           </event>
> >           <event>
> >             <name>MatConvert</name>
> >             <time>
> >               <value>0.165386</value>
> >             </time>
> >             <ncalls>
> >               <value>3.</value>
> >             </ncalls>
> >             <events>
> >               <event>
> >                 <name>self</name>
> >                 <time>
> >                   <value>0.165386</value>
> >                 </time>
> >               </event>
> >             </events>
> >           </event>
> >           <event>
> >             <name>SVDSetUp</name>
> >             <time>
> >               <value>0.102518</value>
> >             </time>
> >             <events>
> >               <event>
> >                 <name>self</name>
> >                 <time>
> >                   <value>0.0601394</value>
> >                 </time>
> >               </event>
> >               <event>
> >                 <name>VecSet</name>
> >                 <time>
> >                   <value>0.0423783</value>
> >                 </time>
> >                 <ncalls>
> >                   <value>4.</value>
> >                 </ncalls>
> >               </event>
> >             </events>
> >           </event>
> >         </events>
> >       </event>
> >       <event>
> >         <name>MatConvert</name>
> >         <time>
> >           <value>0.575872</value>
> >         </time>
> >         <events>
> >           <event>
> >             <name>self</name>
> >             <time>
> >               <value>0.575869</value>
> >             </time>
> >           </event>
> >         </events>
> >       </event>
> >       <event>
> >         <name>MatView</name>
> >         <time>
> >           <value>0.424561</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>BVCopy</name>
> >         <time>
> >           <value>0.0288127</value>
> >         </time>
> >         <ncalls>
> >           <value>2000.</value>
> >         </ncalls>
> >         <events>
> >           <event>
> >             <name>VecCopy</name>
> >             <time>
> >               <value>0.0284472</value>
> >             </time>
> >           </event>
> >         </events>
> >       </event>
> >       <event>
> >         <name>MatAssemblyEnd</name>
> >         <time>
> >           <value>0.0128941</value>
> >         </time>
> >       </event>
> >     </timertree>
> >     <selftimertable desc="Self-timings">
> >       <totaltime>1299.397478</totaltime>
> >       <event>
> >         <name>SVDSolve</name>
> >         <time>
> >           <value>75.3134</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>MatConvert</name>
> >         <time>
> >           <value>0.741256</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>MatView</name>
> >         <time>
> >           <value>0.424561</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>SVDSetUp</name>
> >         <time>
> >           <value>0.0601394</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>VecSet</name>
> >         <time>
> >           <value>0.0424012</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>VecCopy</name>
> >         <time>
> >           <value>0.0284472</value>
> >         </time>
> >       </event>
> >       <event>
> >         <name>MatAssemblyEnd</name>
> >         <time>
> >           <value>0.0128944</value>
> >         </time>
> >       </event>
> >     </selftimertable>
> >   </petscroot>
> > </root>
> > 
> > 
> > On Tue, Nov 17, 2020 at 2:28 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
> > I cannot visualize the XML files. Please send the information in plain text.
> > Jose
> > 
> > 
> > > El 17 nov 2020, a las 5:33, Rakesh Halder <rhalder at umich.edu> escribió:
> > > 
> > > Hi Jose,
> > > 
> > > I attached two XML logs of two different SVD calculations where N ~= 140,000: first a small N x 5 matrix, and then a large N x 1000 matrix. The global timing starts before the SVD calculations. The small matrix calculation finishes very quickly (in less than a second), while the larger one takes around 1,000 seconds. The "largeMat.xml" file shows that SVDSolve takes around 75 seconds, but when I time it myself by printing the time difference to the console, it takes around 1,000 seconds, and I'm not sure where this mismatch is coming from.
> > > 
> > > This is using the ScaLAPACK SVD solver on a single processor, and I call MatConvert to convert my matrix to the MATSCALAPACK format.
> > > 
> > > Thanks,
> > > 
> > > Rakesh
> > > 
> > > On Mon, Nov 16, 2020 at 2:45 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
> > > For Cross and TRLanczos, make sure that the matrix is stored in DENSE format, not in the default AIJ format. On the other hand, these solvers build the transpose matrix explicitly, which is bad for dense matrices in parallel. Try using SVDSetImplicitTranspose(); this will also save memory.
> > > 
> > > For ScaLAPACK, it is better if the matrix is passed in the MATSCALAPACK format already; otherwise the solver must convert it internally. Still, the matrix of singular vectors must be converted after the computation.
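> > > 
> > > For example, a rough sketch (assuming the SVD object svd and the dense matrix A already exist):
> > > 
> > >     Mat Asca;
> > >     ierr = MatConvert(A,MATSCALAPACK,MAT_INITIAL_MATRIX,&Asca);CHKERRQ(ierr);
> > >     ierr = SVDSetOperator(svd,Asca);CHKERRQ(ierr);                  /* SVDSetOperators() in later SLEPc versions */
> > >     ierr = SVDSetImplicitTranspose(svd,PETSC_TRUE);CHKERRQ(ierr);   /* relevant for cross/trlanczos */
> > >     ierr = SVDSetFromOptions(svd);CHKERRQ(ierr);
> > >     ierr = SVDSolve(svd);CHKERRQ(ierr);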
> > > 
> > > In any case, performance questions should include information from -log_view so that we have a better idea of what is going on.
> > > 
> > > Jose
> > > 
> > > 
> > > > El 16 nov 2020, a las 6:04, Rakesh Halder <rhalder at umich.edu> escribió:
> > > > 
> > > > Hi Jose,
> > > > 
> > > > I'm only interested in part of the singular triplets, so those algorithms work for me. I tried using ScaLAPACK and it gives similar performance to Lanczos and Cross, so it's still very slow. I'm still having memory issues with LAPACK, and Elemental is giving me an error message indicating that the operation isn't supported for rectangular matrices.
> > > > 
> > > > With regard to ScaLAPACK or any other solver, I'm wondering if there are settings to use with the SVD object to ensure optimal performance.
> > > > 
> > > > Thanks,
> > > > 
> > > > Rakesh
> > > > 
> > > > On Sun, Nov 15, 2020 at 2:59 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
> > > > Rakesh,
> > > > 
> > > > The solvers you mention are not intended for computing the full SVD, only part of the singular triplets. In the latest version (3.14) there are now solvers that wrap external packages for parallel dense computations: ScaLAPACK and Elemental.
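> > > > 
> > > > They can be selected either programmatically or at run time (assuming SLEPc was configured with the corresponding package), for instance:
> > > > 
> > > >     ierr = SVDSetType(svd,SVDSCALAPACK);CHKERRQ(ierr);   /* or SVDELEMENTAL */
> > > > 
> > > > or with the command-line option -svd_type scalapack.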
> > > > 
> > > > Jose
> > > > 
> > > > 
> > > > > El 15 nov 2020, a las 20:48, Matthew Knepley <knepley at gmail.com> escribió:
> > > > > 
> > > > > On Sun, Nov 15, 2020 at 2:18 PM Rakesh Halder <rhalder at umich.edu> wrote:
> > > > > Hi all,
> > > > > 
> > > > > A program I'm writing involves calculating the SVD of a large, dense N by n matrix (N ~= 150,000, n ~= 10,000). I've used the different SVD solvers available through SLEPc, including the cross product, Lanczos, and LAPACK-based methods. The cross product and Lanczos methods take a very long time to compute the SVD (around 7-8 hours on one processor), while the LAPACK solver runs out of memory. If I write this matrix to a file and compute the SVD using MATLAB or Python (NumPy), it takes around 10 minutes. I'm wondering if there's a much cheaper way to compute the SVD.
> > > > > 
> > > > > This seems suspicious, since I know NumPy just calls LAPACK, and I am fairly sure that MATLAB does as well. Do the machines that you are running on have different amounts of RAM?
> > > > > 
> > > > >   Thanks,
> > > > > 
> > > > >      Matt
> > > > >  
> > > > > Thanks,
> > > > > 
> > > > > Rakesh
> > > > > 
> > > > > 
> > > > > -- 
> > > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > > > > -- Norbert Wiener
> > > > > 
> > > > > https://www.cse.buffalo.edu/~knepley/
> > > > 
> > > 
> > > <largeMat.xml><smallMat.xml>
> > 
> 


