<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:宋体;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:"\@宋体";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I don’t know. TSARKIMEX doesn’t work for me either. <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>“TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery!”<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Am I using it wrong?<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I simply replaced:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> ierr = TSCreate(PETSC_COMM_WORLD, &ts); CHKERRQ(ierr);</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> ierr = TSSetType(ts, TSTHETA); CHKERRQ(ierr);</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> ierr = TSThetaSetTheta(ts, 0.5); CHKERRQ(ierr);<o:p></o:p></span></p><p class=MsoNormal
style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>by:<o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>TSCreate(PETSC_COMM_WORLD,&ts);<o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> //TSSetType(ts,TSROSW);<o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> TSSetType(ts,TSARKIMEX);<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Shuangshuang<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> five9a2@gmail.com [mailto:five9a2@gmail.com] <b>On Behalf Of </b>Jed Brown<br><b>Sent:</b> Friday, August 30, 2013 4:39 PM<br><b>To:</b> Jin, Shuangshuang<br><b>Cc:</b> PETSc users list; Barry Smith; Shrirang Abhyankar<br><b>Subject:</b> RE: [petsc-users] Performance of PETSc TS solver<o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p><p>Do you have a time-dependent source term (non-autonomous)? I'm trying to determine why Rosenbrock did not converge for you. But since the residual and Jacobian have similar cost, it may not be faster.
How does TSARKIMEX work for you? It may be able to take larger time steps than THETA.<o:p></o:p></p><div><p class=MsoNormal>On Aug 30, 2013 4:23 PM, "Jin, Shuangshuang" <<a href="mailto:Shuangshuang.Jin@pnnl.gov">Shuangshuang.Jin@pnnl.gov</a>> wrote:<o:p></o:p></p><div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I’m using the trapezoidal method with the option “-ts_theta_endpoint”:</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> ierr = TSCreate(PETSC_COMM_WORLD, &ts); CHKERRQ(ierr);</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> ierr = TSSetType(ts, TSTHETA); CHKERRQ(ierr);</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> ierr = TSThetaSetTheta(ts, 0.5); CHKERRQ(ierr);</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I just did a quick try with Rosenbrock methods, and it diverged.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> 
</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I didn’t use VecSetValues. I only used MatSetValues multiple times inside IJacobian.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I tried the -info option. The output file is too large to send. I searched for “Stash” and found 118678 hits in the file. All of them look like this:</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[16] MatStashScatterBegin_Private(): No of messages: 0 </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[16] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[27] MatStashScatterBegin_Private(): No of messages: 0 </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[27] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[28] MatStashScatterBegin_Private(): No of messages: 0 </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[28] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[11] MatStashScatterBegin_Private(): No of messages: 0 </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Thanks,</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Shuangshuang</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> <a href="mailto:five9a2@gmail.com" target="_blank">five9a2@gmail.com</a> [mailto:<a 
href="mailto:five9a2@gmail.com" target="_blank">five9a2@gmail.com</a>] <b>On Behalf Of </b>Jed Brown<br><b>Sent:</b> Friday, August 30, 2013 3:52 PM<br><b>To:</b> Barry Smith<br><b>Cc:</b> PETSc users list; Shrirang Abhyankar; Jin, Shuangshuang<br><b>Subject:</b> Re: [petsc-users] Performance of PETSc TS solver</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p>Also, which TS method are you using? Rosenbrock methods will amortize a lot of assembly cost by reusing the matrix for several stages.<o:p></o:p></p><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>On Aug 30, 2013 3:48 PM, "Barry Smith" <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;margin-bottom:12.0pt'><br> I would next parallelize the function evaluation, since it is the single largest consumer of time and should presumably be faster in parallel. After that, look at the -log_summary output again to decide whether the Jacobian evaluation can be improved.<br><br> Barry<br><br>On Aug 30, 2013, at 5:28 PM, "Jin, Shuangshuang" <<a href="mailto:Shuangshuang.Jin@pnnl.gov" target="_blank">Shuangshuang.Jin@pnnl.gov</a>> wrote:<br><br>> Hello, here is an update on my status. I just managed to "_distribute_ the work of computing the Jacobian matrix" as you suggested, so each processor computes only part of the Jacobian entries instead of the whole global Jacobian matrix. I observed a reduction of the computation time from 351 seconds to 55 seconds, which is much better but still slower than I expected, given that the problem size is small.
(4n functions in IFunction and a 4n*4n Jacobian matrix in IJacobian, with n = 288).<br>><br>> I looked at the log profile again and saw that most of the computation time is still spent in Function Eval and Jacobian Eval:<br>><br>> TSStep 600 1.0 5.6103e+01 1.0 9.42e+08 25.6 3.0e+06 2.9e+02 7.0e+04 93 100 99 99 92 152 100 99 99 110 279<br>> TSFunctionEval 2996 1.0 2.9608e+01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+04 30 0 0 0 39 50 0 0 0 47 0<br>> TSJacobianEval 1796 1.0 2.3436e+01 1.0 0.00e+00 0.0 5.4e+02 3.8e+01 1.3e+04 39 0 0 0 16 64 0 0 0 20 0<br>> Warning -- total time of even greater than time of entire stage -- something is wrong with the timer<br>> SNESSolve 600 1.0 5.5692e+01 1.1 9.42e+08 25.7 3.0e+06 2.9e+02 6.4e+04 88 100 99 99 84 144 100 99 99 101 281<br>> SNESFunctionEval 2396 1.0 2.3715e+01 3.4 1.04e+06 1.0 0.0e+00 0.0e+00 2.4e+04 25 0 0 0 31 41 0 0 0 38 1<br>> SNESJacobianEval 1796 1.0 2.3447e+01 1.0 0.00e+00 0.0 5.4e+02 3.8e+01 1.3e+04 39 0 0 0 16 64 0 0 0 20 0<br>> SNESLineSearch 1796 1.0 1.8313e+01 1.0 1.54e+08 31.4 4.9e+05 2.9e+02 2.5e+04 30 16 16 16 33 50 16 16 16 39 139<br>> KSPGMRESOrthog 9090 1.0 1.1399e+00 4.1 1.60e+07 1.0 0.0e+00 0.0e+00 9.1e+03 1 3 0 0 12 2 3 0 0 14 450<br>> KSPSetUp 3592 1.0 2.8342e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 0 0 0 0 0 0 0<br>> KSPSolve 1796 1.0 2.3052e+00 1.0 7.87e+08 25.2 2.5e+06 2.9e+02 2.0e+04 4 84 83 83 26 6 84 83 83 31 5680<br>> PCSetUp 3592 1.0 9.1255e-02 1.7 6.47e+05 2.5 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 159<br>> PCSetUpOnBlocks 1796 1.0 6.6802e-02 2.3 6.47e+05 2.5 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 217<br>> PCApply 10886 1.0 2.6064e-01 1.3 4.70e+06 1.5 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 481<br>><br>> I was wondering why SNESFunctionEval and SNESJacobianEval took over 23 seconds each, whereas KSPSolve took only 2.3 seconds, about 10 times less. Is this normal?
Do you have any more suggestions on how to reduce the FunctionEval and JacobianEval time?<br>> (Currently, my f function in IFunction is computed sequentially, while the Jacobian matrix in IJacobian is computed in a distributed way.)<br>><br>> Thanks,<br>> Shuangshuang<br>><br>> -----Original Message-----<br>> From: Jed Brown [mailto:<a href="mailto:five9a2@gmail.com" target="_blank">five9a2@gmail.com</a>] On Behalf Of Jed Brown<br>> Sent: Friday, August 16, 2013 5:00 PM<br>> To: Jin, Shuangshuang; Barry Smith; Shri (<a href="mailto:abhyshr@mcs.anl.gov" target="_blank">abhyshr@mcs.anl.gov</a>)<br>> Cc: <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a><br>> Subject: RE: [petsc-users] Performance of PETSc TS solver<br>><br>> "Jin, Shuangshuang" <<a href="mailto:Shuangshuang.Jin@pnnl.gov" target="_blank">Shuangshuang.Jin@pnnl.gov</a>> writes:<br>><br>>> ////////////////////////////////////////////////////////////////////////////////////////<br>>> // This proves to be the most time-consuming block in the computation:<br>>> // Assign values to J matrix for the first 2*n rows (constant values)<br>>> ... (skipped)<br>>><br>>> // Assign values to J matrix for the following 2*n rows (depends on X values)<br>>> for (i = 0; i < n; i++) {<br>>> for (j = 0; j < n; j++) {<br>>> ...(skipped)<br>><br>> This is a dense iteration. Are the entries really mostly nonzero?
Why is your i loop over all rows instead of only over xstart to xstart+xlen?<br>><br>>> }<br>>><br>>> ////////////////////////////////////////////////////////////////////////////////////////<br>>><br>>> for (i = 0; i < 4*n; i++) {<br>>> rowcol[i] = i;<br>>> }<br>>><br>>> // Compute function over the locally owned part of the grid<br>>> for (i = xstart; i < xstart+xlen; i++) {<br>>> ierr = MatSetValues(*B, 1, &i, 4*n, rowcol, &J[i][0], INSERT_VALUES); CHKERRQ(ierr);<br>><br>> This seems to be creating a distributed dense matrix from a dense matrix J of the global dimension. Is that correct? You need to _distribute_ the work of computing the matrix entries if you want to see a speedup.<o:p></o:p></p></div></div></div></div></div></body></html>