<p dir="ltr">You can choose the number of rows per process so that each has about the same number of entries.  "Residual" meant IFunction and/or RHSFunction, when applicable.</p>
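<p dir="ltr">A minimal sketch of that idea, assuming you already know (or can estimate) the number of entries in each row. The helper name <code>balanced_row_counts</code> and the sample counts are hypothetical and not part of PETSc; the returned per-process row counts are what you would then use as the local sizes when creating the vectors and matrix (for example the local row argument of <code>MatSetSizes</code>).</p>
<pre>
```python
from bisect import bisect_left
from itertools import accumulate


def balanced_row_counts(nnz_per_row, nprocs):
    """Split rows into nprocs contiguous chunks with roughly equal entry counts.

    nnz_per_row[i] is the (estimated) number of entries in row i.
    Returns a list with the number of rows owned by each process.
    """
    n = len(nnz_per_row)
    prefix = list(accumulate(nnz_per_row))  # prefix[i] = entries in rows 0..i
    total = prefix[-1] if prefix else 0

    cuts = [0]  # cuts[r] is the first row owned by process r
    for rank in range(1, nprocs):
        target = total * rank / nprocs
        # First row whose cumulative entry count reaches the target;
        # rows up to and including it go to the earlier processes.
        cut = bisect_left(prefix, target) + 1
        # Keep the cuts monotone and inside the matrix.
        cut = min(max(cut, cuts[-1]), n)
        cuts.append(cut)
    cuts.append(n)

    return [cuts[r + 1] - cuts[r] for r in range(nprocs)]


if __name__ == "__main__":
    # Hypothetical example: one dense row followed by eight sparse rows.
    print(balanced_row_counts([8, 1, 1, 1, 1, 1, 1, 1, 1], 2))
```
</pre>
<p dir="ltr">With an equal-rows split the first process would hold most of the entries in this example; splitting by cumulative entry count gives it fewer rows instead.</p>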

<div class="gmail_quote">On Aug 31, 2013 3:53 PM, "Jin, Shuangshuang" <<a href="mailto:Shuangshuang.Jin@pnnl.gov">Shuangshuang.Jin@pnnl.gov</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>

<font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:11pt">Hi, Jed, I think you have a good point here. The load imbalance might be a big problem for us, since the Jacobian matrix is not symmetric, and the cost of computing each part of the Jacobian matrix elements on different processors can vary a lot. However, that’s what the matrix looks like. Do we have any control over that? And what do you mean by “distribute the work for residual evaluation better?” I think I can only distribute the IFunction and IJacobian computation, but have no control over residual evaluation. Isn’t it a black box inside TS?<br>


<br>

For the gprof run Barry suggested, I compiled with gcc -pg in sequential mode, but no gmon.out file was created after running the executable... <br>

<br>

Thanks,<br>

Shuangshuang<br>

<br>

<br>

On 8/30/13 4:57 PM, "Jed Brown" <<a href="http://jedbrown@mcs.anl.gov" target="_blank">jedbrown@mcs.anl.gov</a>> wrote:<br>

<br>

</span></font><blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:11pt">"Jin, Shuangshuang" <<a href="http://Shuangshuang.Jin@pnnl.gov" target="_blank">Shuangshuang.Jin@pnnl.gov</a>> writes:<br>


<br>

> Hello, I'm trying to update some of my status here. I just managed to "_distribute_ the work of computing the Jacobian matrix" as you suggested, so each processor only computes a part of the elements of the Jacobian matrix instead of the global Jacobian matrix. I observed a reduction of the computation time from 351 seconds to 55 seconds, which is much better but still slower than I expected given that the problem size is small. (4n functions in IFunction, and a 4n*4n Jacobian matrix in IJacobian, n = 288).<br>


><br>

> I looked at the log profile again, and saw that most of the computation time is still spent in Function Eval and Jacobian Eval:<br>

><br>

> TSStep               600 1.0 5.6103e+01 1.0 9.42e+0825.6 3.0e+06 2.9e+02 7.0e+04 93100 99 99 92 152100 99 99110   279<br>

> TSFunctionEval      2996 1.0 2.9608e+01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+04 30  0  0  0 39  50  0  0  0 47     0<br>

<br>

The load imbalance is pretty significant here, so maybe you can<br>

distribute the work for residual evaluation better?<br>

<br>

> TSJacobianEval      1796 1.0 2.3436e+01 1.0 0.00e+00 0.0 5.4e+02 3.8e+01 1.3e+04 39  0  0  0 16  64  0  0  0 20     0<br>

> Warning -- total time of even greater than time of entire stage -- something is wrong with the timer<br>

<br>

SNESSolve contains the Jacobian and residual evaluations, as well as<br>

KSPSolve.  Pretty much all the cost is in those three things.<br>

<br>

> SNESSolve            600 1.0 5.5692e+01 1.1 9.42e+0825.7 3.0e+06 2.9e+02 6.4e+04 88100 99 99 84 144100 99 99101   281<br>

> SNESFunctionEval    2396 1.0 2.3715e+01 3.4 1.04e+06 1.0 0.0e+00 0.0e+00 2.4e+04 25  0  0  0 31  41  0  0  0 38     1<br>

> SNESJacobianEval    1796 1.0 2.3447e+01 1.0 0.00e+00 0.0 5.4e+02 3.8e+01 1.3e+04 39  0  0  0 16  64  0  0  0 20     0<br>

> SNESLineSearch      1796 1.0 1.8313e+01 1.0 1.54e+0831.4 4.9e+05 2.9e+02 2.5e+04 30 16 16 16 33  50 16 16 16 39   139<br>

> KSPGMRESOrthog      9090 1.0 1.1399e+00 4.1 1.60e+07 1.0 0.0e+00 0.0e+00 9.1e+03  1  3  0  0 12   2  3  0  0 14   450<br>

> KSPSetUp            3592 1.0 2.8342e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01  0  0  0  0  0   0  0  0  0  0     0<br>

> KSPSolve            1796 1.0 2.3052e+00 1.0 7.87e+0825.2 2.5e+06 2.9e+02 2.0e+04  4 84 83 83 26   6 84 83 83 31  5680<br>

> PCSetUp             3592 1.0 9.1255e-02 1.7 6.47e+05 2.5 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0   159<br>

> PCSetUpOnBlocks     1796 1.0 6.6802e-02 2.3 6.47e+05 2.5 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0   217<br>

> PCApply            10886 1.0 2.6064e-01 1.3 4.70e+06 1.5 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  1  0  0  0   481<br>

><br>

> I was wondering why SNESFunctionEval and SNESJacobianEval took over 23<br>

> seconds each, while KSPSolve only took 2.3 seconds, which is 10<br>

> times faster. Is this normal? Do you have any more suggestions on how<br>

> to reduce the FunctionEval and JacobianEval time?<br>

<br>

It means that the linear systems are easy to solve (probably because<br>

they are small), but the IFunction and IJacobian are expensive.  As<br>

Barry says, you might be able to speed it up by sequential optimization.<br>

<br>

</span></font></blockquote>

</div>


</blockquote></div>