<div dir="ltr">Sorry, I overlooked your attachment, which gives '-log_summary':<div>1.txt:</div><div><div>MatSolve 2 1.0 9.7397e-02 1.0 0.00e+00 0.0 5.4e+02 5.5e+03 6.0e+00 0 0 34 10 10 0 0 34 10 11 0</div>
<div>MatLUFactorSym 1 1.0 1.2882e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 6 0 0 0 12 6 0 0 0 12 0</div><div>MatLUFactorNum 1 1.0 1.8813e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 90 0 0 0 2 90 0 0 0 2 0</div>
<div><br></div><div>2.txt:</div><div><div>MatSolve 2 1.0 1.0811e-01 1.0 0.00e+00 0.0 4.9e+02 6.1e+03 6.0e+00 0 0 31 10 10 0 0 31 10 11 0</div><div>MatLUFactorSym 1 1.0 1.8920e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 8 0 0 0 12 8 0 0 0 12 0</div>
<div>MatLUFactorNum 1 1.0 2.1836e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 89 0 0 0 2 89 0 0 0 2 0</div></div><div><br></div><div>Again, only 1st solve calls LU factorization, which dominates as expected.</div>
<div>MatLUFactorSym is ignorable, but matrix ordering makes noticable effect. I would stay with sequential MatLUFactorSym and experiment different matrix orderings using '<span style="font-family:arial,sans-serif;font-size:13px">-mat_mumps_icntl_7 <>'.</span><br>
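If it helps, here is a minimal, untested sketch of how the same ICNTL(7) experiment can be done from code instead of the command line, using the PETSc/MUMPS interface. The helper name SetMumpsOrdering and the ordering value passed in are only for illustration; see the MUMPS manual for the ICNTL(7) values your installation supports.

    #include <petscksp.h>

    /* Sketch: select MUMPS's ICNTL(7) ordering programmatically.
       Assumes ksp already has its operators set via KSPSetOperators(). */
    PetscErrorCode SetMumpsOrdering(KSP ksp, PetscInt ordering)
    {
      PC             pc;
      Mat            F;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
      ierr = PCFactorSetUpMatSolverPackage(pc);CHKERRQ(ierr);  /* create the MUMPS factor matrix early */
      ierr = PCFactorGetMatrix(pc, &F);CHKERRQ(ierr);
      ierr = MatMumpsSetIcntl(F, 7, ordering);CHKERRQ(ierr);   /* same effect as -mat_mumps_icntl_7 <ordering> */
      PetscFunctionReturn(0);
    }
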
Hong
<br><div class="gmail_quote">On Wed, Jan 29, 2014 at 11:33 AM, Hong Zhang <span dir="ltr"><<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Tabrez:<br><div class="gmail_extra"><div class="gmail_quote"><div class="im"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
I am getting the opposite result, i.e., MUMPS becomes slower when using ParMETIS for parallel ordering. What did I mess up? Is the problem too small?<br></div></blockquote><div> </div></div><div>I saw similar performance when adding parallel symbolic factorization into petsc interface, thus I did not set </div>
<div>parallel symbolic factorization as default for petsc/mumps interface.</div><div>How large is your matrix?</div><div><br></div><div>Can you send us output of '-log_summary' for these two runs?</div><span class="HOEnZb"><font color="#888888"><div>
Hong

Case 1 took 24.731s

$ rm -f *vtk; time mpiexec -n 16 ./defmod -f point.inp -pc_type lu -pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 1 -log_summary > 1.txt

Case 2 with "-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2" took 34.720s

$ rm -f *vtk; time mpiexec -n 16 ./defmod -f point.inp -pc_type lu -pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 1 -log_summary -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 > 2.txt

Both 1.txt and 2.txt are attached.

Regards,

Tabrez

On 01/29/2014 09:18 AM, Hong Zhang wrote:
<blockquote type="cite">
<div dir="ltr">MUMPS now supports parallel symbolic factorization. With petsc-3.4 interface, you can use runtime option
<div><br>
<div>
<div> -mat_mumps_icntl_28 <1>: ICNTL(28): use 1 for sequential analysis and ictnl(7) ordering, or 2 for parallel analysis and ictnl(29) ordering </div>
<div> -mat_mumps_icntl_29 <0>: ICNTL(29): parallel ordering 1 = ptscotch 2 = parmetis </div>
</div>
</div>
<div><br>
</div>
<div>e.g, '-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2' activates parallel symbolic factorization with pametis for matrix ordering. </div>
<div>Give it a try and let us know what you get.</div>
Hong
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Tue, Jan 28, 2014 at 5:48 PM, Smith, Barry F. <span dir="ltr">
<<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Jan 28, 2014, at 5:39 PM, Matthew Knepley <knepley@gmail.com> wrote:

> On Tue, Jan 28, 2014 at 5:25 PM, Tabrez Ali <stali@geology.wisc.edu> wrote:
> Hello
>
> This is my observation as well (with MUMPS). The first solve (after assembly, which is super fast) takes a few minutes (for ~1 million unknowns on 12/24 cores), but from then on each subsequent solve for each time step takes only a few seconds.
>
> Perhaps symbolic factorization in MUMPS is all serial?
>
> Yes, it is.
I missed this. I was just assuming a PETSc LU. Yes, I have no idea of the relative time of symbolic and numeric factorization for those other packages.
<span><font color="#888888"><br>
Barry<br>
</font></span>
<div>
>
> Matt
>
> Like the OP, I often do multiple runs on the same problem, but I don't know if MUMPS or any other direct solver can save the symbolic factorization info to a file that could be used in subsequent reruns to avoid the costly "first solves".
>
> Tabrez
>
>
> On 01/28/2014 04:04 PM, Barry Smith wrote:
> On Jan 28, 2014, at 1:36 PM, David Liu <daveliu@mit.edu> wrote:
>
> Hi, I'm writing an application that solves a sparse matrix many times using Pastix. I notice that the first solve takes a very long time,
> Is it the first “solve” or the first time you put values into that matrix that “takes a long time”? If you are not properly preallocating the matrix then the initial setting of values will be slow and waste memory. See http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
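
(A minimal sketch of the kind of preallocation Barry is referring to, assuming a standard AIJ matrix; the helper name and the 27/9 nonzeros-per-row figures are made up for illustration, and real codes compute per-row counts from the mesh or stencil:)

    #include <petscmat.h>

    /* Sketch: create and preallocate an AIJ matrix before setting any values,
       so the assembly does not repeatedly reallocate and copy. */
    PetscErrorCode CreatePreallocatedMatrix(MPI_Comm comm, PetscInt nlocal, Mat *A)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatCreate(comm, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
      ierr = MatSetType(*A, MATAIJ);CHKERRQ(ierr);
      ierr = MatSeqAIJSetPreallocation(*A, 27, NULL);CHKERRQ(ierr);          /* used when the Mat is SeqAIJ */
      ierr = MatMPIAIJSetPreallocation(*A, 27, NULL, 9, NULL);CHKERRQ(ierr); /* used when the Mat is MPIAIJ */
      PetscFunctionReturn(0);
    }
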
>
> The symbolic factorization is usually much faster than the numeric factorization, so that is not the cause of the slow “first solve”.
>
> Barry
>
>
>
> while the subsequent solves are very fast. I don't fully understand what's going on behind the curtains, but I'm guessing it's because the very first solve has to read in the non-zero structure for the LU factorization, while the subsequent solves are faster because the nonzero structure doesn't change.
>
> My question is, is there any way to save the information obtained from the very first solve, so that the next time I run the application, the very first solve can be fast too (provided that I still have the same nonzero structure)?
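
(Within a single run, PETSc already reuses the factorization across repeated KSPSolve calls as long as the matrix is unchanged, which is why only the first solve is slow; the thread does not say anything about saving it to a file between runs. A minimal sketch of the within-run pattern, with a hypothetical helper name, assuming only the right-hand side changes per time step:)

    #include <petscksp.h>

    /* Sketch: the symbolic + numeric factorization happens during the first
       KSPSolve (or in KSPSetUp); later solves with the same operator reuse it. */
    PetscErrorCode SolveTimeSteps(KSP ksp, Vec b, Vec x, PetscInt nsteps)
    {
      PetscErrorCode ierr;
      PetscInt       step;

      PetscFunctionBeginUser;
      for (step = 0; step < nsteps; step++) {
        /* ... update the entries of b for this time step ... */
        ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr); /* factorization is reused after the first step */
      }
      PetscFunctionReturn(0);
    }
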
>
>
> --
> No one trusts a model except the one who wrote it; everyone trusts an observation except the one who made it - Harlow Shapley
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
--
No one trusts a model except the one who wrote it; everyone trusts an observation except the one who made it - Harlow Shapley