Hi Barry,<br><br>Thank you for your reply.<br>I don&#39;t think this problem comes from the matrix assembly. The result I showed you in my last email is from a two-level Newton method: I first solve a coarse problem and then use the coarse solution as the initial guess for the fine-level problem. With the one-level method alone there is no such problem: the memory usage in the -log_summary output is correct, and I think the time spent in SNESJacobianEval is also normal (see attached). The strange memory usage appears only in the two-level method. I claim the two-level computing time is wrong because I solve the same problem on the same number of processors, and the two-level method needs far fewer SNES and GMRES iterations than the one-level method, yet its compute time is much larger (the time spent on the coarse problem is only about 25 s). Comparing the -log_summary outputs of the two methods, I found that the matrix memory usage is totally different. So I think there must be a bug in my two-level code, but I have no idea how to debug it.<br>
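As a rough cross-check of where the two-level run spends its time, the sketch below takes a few rows from the quoted -log_summary event table and computes each event's share of the total run and its average cost per call. The numbers are copied from that table; the helper name `summarize` is only illustrative. The events are nested (assembly most likely happens inside the Jacobian evaluation), so the shares are not expected to add up to 100%.

```python
# Max times in seconds, copied from the quoted two-level -log_summary event table.
TOTAL_TIME = 4.0743e+03  # "Main Stage" time

# event name -> (call count, max time in seconds)
events = {
    "SNESJacobianEval": (17, 3.6786e+03),
    "KSPSolve": (18, 3.6141e+02),
    "SNESFunctionEval": (23, 2.9742e+01),
    "MatAssemblyBegin": (103, 7.6879e+02),
}

def summarize(events, total):
    """Return (name, share of total time, seconds per call), largest share first."""
    rows = []
    for name, (count, seconds) in events.items():
        rows.append((name, seconds / total, seconds / count))
    return sorted(rows, key=lambda r: r[1], reverse=True)

for name, share, per_call in summarize(events, TOTAL_TIME):
    print(f"{name:18s} {100 * share:5.1f}% of run, {per_call:8.2f} s per call")
```

On these figures the linear solves are a small part of the run, which fits the observation that the iteration counts are low while the total time is high.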

<br>Best,<br>Rongliang<br><br><div class="gmail_quote">On Fri, Oct 7, 2011 at 10:24 AM, Barry Smith <span dir="ltr">&lt;<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

<a href="http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#efficient-assembly" target="_blank">http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#efficient-assembly</a><br>

<br>

<br>

On Oct 7, 2011, at 11:22 AM, Rongliang Chen wrote:<br>

<br>

&gt; -------------------------------------------------<br>

&gt; Joab<br>

&gt;<br>

&gt; Shape Optimization solver<br>

&gt;  by Rongliang Chen<br>

&gt;  compiled on 15:54:32, Oct  3 2011<br>

&gt;  Running on: Wed Oct  5 10:24:10 2011<br>

&gt;<br>

&gt;  revision $Rev: 157 $<br>

&gt; -------------------------------------------------<br>

&gt; Command-line options: -coarse_ksp_rtol 1.0e-1 -coarsegrid /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E2000_N8241_D70170.fsi -computeinitialguess -f /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E32000_N128961_D1096650.fsi -geometric_asm -geometric_asm_overlap 8 -inletu 5.0 -ksp_atol 1e-8 -ksp_gmres_restart 600 -ksp_max_it 3000 -ksp_pc_side right -ksp_rtol 1.e-3 -ksp_type gmres -log_summary -mat_partitioning_type parmetis -nest_geometric_asm_overlap 4 -nest_ksp_atol 1e-8 -nest_ksp_gmres_restart 800 -nest_ksp_max_it 1000 -nest_ksp_pc_side right -nest_ksp_rtol 1.e-2 -nest_ksp_type gmres -nest_pc_asm_type basic -nest_pc_type asm -nest_snes_atol 1.e-10 -nest_snes_max_it 20 -nest_snes_rtol 1.e-4 -nest_sub_pc_factor_mat_ordering_type qmd -nest_sub_pc_factor_shift_amount 1e-8 -nest_sub_pc_factor_shift_type nonzero -nest_sub_pc_type lu -nested -noboundaryreduce -pc_asm_type basic -pc_type asm -shapebeta 10.0 -snes_atol 1.e-10 -snes_max_it 20 -snes_rtol 1.e-6 -sub_pc_factor_mat_ordering_type qmd -sub_pc_factor_shift_amount 1e-8 -sub_pc_factor_shift_type nonzero -sub_pc_type lu -viscosity 0.01<br>

&gt; -------------------------------------------------<br>

&gt;<br>

&gt; Starting to load grid...<br>

&gt; Nodes on moving boundary: coarse 199, fine 799, Gridratio 0.250000.<br>

&gt; Setupping Interpolation matrix......<br>

&gt; Interpolation matrix done......Time spent: 0.405431<br>

&gt; finished.<br>

&gt; Grid has 32000 elements, 1096658 degrees of freedom.<br>

&gt; Coarse grid has 2000 elements, 70170 degrees of freedom.<br>

&gt;  [0] has 35380 degrees of freedom (matrix), 35380 degrees of freedom (including shared points).<br>

&gt;  [0] coarse grid has 2194 degrees of freedom (matrix), 2194 degrees of freedom (including shared points).<br>

&gt;  [31] has 32466 degrees of freedom (matrix), 34428 degrees of freedom (including shared points).<br>

&gt;  [31] coarse grid has 2250 degrees of freedom (matrix), 2826 degrees of freedom (including shared points).<br>

&gt; Time spend on the load grid and create matrix etc.: 3.577862.<br>

&gt; Solving fixed mesh (steady-state problem)<br>

&gt; Solving coarse problem......<br>

&gt;  0 SNES norm 3.1224989992e+01, 0 KSP its last norm 0.0000000000e+00.<br>

&gt;  1 SNES norm 1.3987219837e+00, 25 KSP its last norm 2.4915963656e-01.<br>

&gt;  2 SNES norm 5.1898321541e-01, 59 KSP its last norm 1.3451744761e-02.<br>

&gt;  3 SNES norm 4.0024228221e-02, 56 KSP its last norm 4.9036146089e-03.<br>

&gt;  4 SNES norm 6.7641787439e-04, 59 KSP its last norm 3.6925683196e-04.<br>

&gt; Coarse solver done......<br>

&gt; Initial value of object function (Energy dissipation) (Coarse): 38.9341108701<br>

&gt;  0 SNES norm 7.4575110699e+00, 0 KSP its last norm 0.0000000000e+00.<br>

&gt;  1 SNES norm 6.4497565921e-02, 51 KSP its last norm 7.4277453141e-03.<br>

&gt;  2 SNES norm 9.2093642958e-04, 90 KSP its last norm 5.4331380112e-05.<br>

&gt;  3 SNES norm 8.1283574549e-07, 103 KSP its last norm 7.5974191049e-07.<br>

&gt; Initial value of object function (Energy dissipation) (Fine): 42.5134271399<br>

&gt; Solution time of 17.180358 sec.<br>

&gt; Fixed mesh (Steady-state) solver done.<br>

&gt; Total number of nonlinear iterations = 3<br>

&gt; Total number of linear iterations = 244<br>

&gt; Average number of linear iterations = 81.333336<br>

&gt; Time computing: 17.180358 sec, Time outputting: 0.000000 sec.<br>

&gt; Time spent in coarse nonlinear solve: 0.793436 sec, 0.046183 fraction of total compute time.<br>

&gt; Solving Shape Optimization problem (steady-state problem)<br>

&gt; Solving coarse problem......<br>

&gt;  0 SNES norm 4.1963166116e+01, 0 KSP its last norm 0.0000000000e+00.<br>

&gt;  1 SNES norm 3.2749386875e+01, 132 KSP its last norm 4.0966334477e-01.<br>

&gt;  2 SNES norm 2.2874504408e+01, 130 KSP its last norm 3.2526355310e-01.<br>

&gt;  3 SNES norm 1.4327187891e+01, 132 KSP its last norm 2.1213029400e-01.<br>

&gt;  4 SNES norm 1.7283643754e+00, 81 KSP its last norm 1.4233338128e-01.<br>

&gt;  5 SNES norm 3.6703566918e-01, 133 KSP its last norm 1.6069896349e-02.<br>

&gt;  6 SNES norm 3.6554528686e-03, 77 KSP its last norm 3.5379167356e-03.<br>

&gt; Coarse solver done......<br>

&gt; Optimized value of object function (Energy dissipation) (Coarse): 29.9743062939<br>

&gt; The reduction of the energy dissipation (Coarse): 23.012737%<br>

&gt; The optimized curve (Coarse):<br>

&gt; a = (4.500000, -0.042893, -0.002030, 0.043721, -0.018798, 0.001824)<br>

&gt; Solving  moving mesh equation......<br>

&gt; KSP norm 2.3040219081e-07, KSP its. 741. Time spent 8.481956<br>

&gt; Moving mesh solver done.<br>

&gt;  0 SNES norm 4.7843968670e+02, 0 KSP its last norm 0.0000000000e+00.<br>

&gt;  1 SNES norm 1.0148854085e+02, 49 KSP its last norm 4.7373180511e-01.<br>

&gt;  2 SNES norm 1.8312214030e+00, 46 KSP its last norm 1.0133332840e-01.<br>

&gt;  3 SNES norm 3.3101970861e-03, 212 KSP its last norm 1.7753271069e-03.<br>

&gt;  4 SNES norm 4.9552614008e-06, 249 KSP its last norm 3.2293284103e-06.<br>

&gt; Optimized value of object function (Energy dissipation) (Fine): 33.2754372645<br>

&gt; Solution time of 4053.227456 sec.<br>

&gt; Number of unknowns = 1096658<br>

&gt; Parameters: kinematic viscosity = 0.01<br>

&gt;            inlet velocity: u = 5,  v = 0<br>

&gt; Total number of nonlinear iterations = 4<br>

&gt; Total number of linear iterations = 556<br>

&gt; Average number of linear iterations = 139.000000<br>

&gt; Time computing: 4053.227456 sec, Time outputting: 0.000001 sec.<br>

&gt; Time spent in coarse nonlinear solve: 24.239526 sec, 0.005980 fraction of total compute time.<br>

&gt; The optimized curve (fine):<br>

&gt; a = (4.500000, -0.046468, -0.001963, 0.045736, -0.019141, 0.001789)<br>

&gt; The reduction of the energy dissipation (Fine): 21.729582%<br>

&gt; Time spend on fixed mesh solving: 17.296872<br>

&gt; Time spend on shape opt. solving: 4053.250126<br>

&gt; Latex command line:<br>

&gt;  np    Newton   GMRES   Time(Total)    Time(Coarse)   Ratio<br>

&gt; 32 &amp;   4   &amp;   139.00   &amp;   4053.23  &amp;    24.24   &amp;  0.6\%<br>

&gt;<br>

&gt; Running finished on: Wed Oct  5 11:32:04 2011<br>

&gt; Total running time: 4070.644329<br>

&gt; ************************************************************************************************************************<br>

&gt; ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use &#39;enscript -r -fCourier9&#39; to print this document            ***<br>

&gt; ************************************************************************************************************************<br>

&gt;<br>

&gt; ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------<br>

&gt;<br>

&gt; ./joab on a Janus-nod named node1751 with 32 processors, by ronglian Wed Oct  5 11:32:04 2011<br>

&gt; Using Petsc Release Version 3.2.0, Patch 1, Mon Sep 12 16:01:51 CDT 2011<br>

&gt;<br>

&gt;                         Max       Max/Min        Avg      Total<br>

&gt; Time (sec):           4.074e+03      1.00000   4.074e+03<br>

&gt; Objects:              1.011e+03      1.00000   1.011e+03<br>

&gt; Flops:                2.255e+11      2.27275   1.471e+11  4.706e+12<br>

&gt; Flops/sec:            5.535e+07      2.27275   3.609e+07  1.155e+09<br>

&gt; MPI Messages:         1.103e+05      5.41392   3.665e+04  1.173e+06<br>

&gt; MPI Message Lengths:  1.326e+09      2.60531   2.416e+04  2.833e+10<br>

&gt; MPI Reductions:       5.969e+03      1.00000<br>

&gt;<br>

&gt; Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)<br>

&gt;                            e.g., VecAXPY() for real vectors of length N --&gt; 2N flops<br>

&gt;                            and VecAXPY() for complex vectors of length N --&gt; 8N flops<br>

&gt;<br>

&gt; Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --<br>

&gt;                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total<br>

&gt; 0:      Main Stage: 4.0743e+03 100.0%  4.7058e+12 100.0%  1.173e+06 100.0%  2.416e+04      100.0%  5.968e+03 100.0%<br>

&gt;<br>

&gt; ------------------------------------------------------------------------------------------------------------------------<br>

&gt; See the &#39;Profiling&#39; chapter of the users&#39; manual for details on interpreting output.<br>

&gt; Phase summary info:<br>

&gt;   Count: number of times phase was executed<br>

&gt;   Time and Flops: Max - maximum over all processors<br>

&gt;                   Ratio - ratio of maximum to minimum over all processors<br>

&gt;   Mess: number of messages sent<br>

&gt;   Avg. len: average message length<br>

&gt;   Reduct: number of global reductions<br>

&gt;   Global: entire computation<br>

&gt;   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().<br>

&gt;      %T - percent time in this phase         %F - percent flops in this phase<br>

&gt;      %M - percent messages in this phase     %L - percent message lengths in this phase<br>

&gt;      %R - percent reductions in this phase<br>

&gt;   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)<br>

&gt; ------------------------------------------------------------------------------------------------------------------------<br>

&gt; Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total<br>

&gt;                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s<br>

&gt; ------------------------------------------------------------------------------------------------------------------------<br>

&gt;<br>

&gt; --- Event Stage 0: Main Stage<br>

&gt;<br>

&gt; MatMult             2493 1.0 1.2225e+0218.4 4.37e+09 1.1 3.9e+05 2.2e+03 0.0e+00  2  3 33  3  0   2  3 33  3  0  1084<br>

&gt; MatMultTranspose       6 1.0 3.3590e-02 2.2 7.38e+06 1.1 8.0e+02 1.5e+03 0.0e+00  0  0  0  0  0   0  0  0  0  0  6727<br>

&gt; MatSolve            2467 1.0 1.1270e+02 1.7 5.95e+10 1.7 0.0e+00 0.0e+00 0.0e+00  2 33  0  0  0   2 33  0  0  0 13775<br>

&gt; MatLUFactorSym         4 1.0 3.4774e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; MatLUFactorNum        18 1.0 2.0832e+02 3.7 1.55e+11 3.2 0.0e+00 0.0e+00 0.0e+00  2 56  0  0  0   2 56  0  0  0 12746<br>

&gt; MatILUFactorSym        1 1.0 8.3280e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; MatAssemblyBegin     103 1.0 7.6879e+0215.4 0.00e+00 0.0 1.6e+04 6.2e+04 1.7e+02  7  0  1  4  3   7  0  1  4  3     0<br>

&gt; MatAssemblyEnd       103 1.0 3.7818e+01 1.0 0.00e+00 0.0 3.0e+03 5.3e+02 1.6e+02  1  0  0  0  3   1  0  0  0  3     0<br>

&gt; MatGetRowIJ            5 1.0 4.8716e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; MatGetSubMatrice      18 1.0 4.3095e+00 2.5 0.00e+00 0.0 1.6e+04 3.5e+05 7.4e+01  0  0  1 20  1   0  0  1 20  1     0<br>

&gt; MatGetOrdering         5 1.0 1.4656e+00 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; MatPartitioning        1 1.0 1.4356e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; MatZeroEntries        42 1.0 2.0939e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; VecDot                17 1.0 1.2719e-02 6.8 5.47e+05 1.1 0.0e+00 0.0e+00 1.7e+01  0  0  0  0  0   0  0  0  0  0  1317<br>

&gt; VecMDot             2425 1.0 1.7196e+01 2.2 5.82e+09 1.1 0.0e+00 0.0e+00 2.4e+03  0  4  0  0 41   0  4  0  0 41 10353<br>

&gt; VecNorm             2503 1.0 2.7923e+00 3.4 1.18e+08 1.1 0.0e+00 0.0e+00 2.5e+03  0  0  0  0 42   0  0  0  0 42  1293<br>

&gt; VecScale            2467 1.0 7.3112e-02 1.7 5.84e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 24453<br>

&gt; VecCopy              153 1.0 1.1636e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; VecSet              5031 1.0 6.0423e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; VecAXPY              137 1.0 1.1462e-02 1.5 6.33e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 16902<br>

&gt; VecWAXPY              19 1.0 1.7784e-03 1.4 2.83e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4869<br>

&gt; VecMAXPY            2467 1.0 8.5820e+00 1.3 5.93e+09 1.1 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 21153<br>

&gt; VecAssemblyBegin      69 1.0 1.0341e+0018.2 0.00e+00 0.0 4.9e+03 5.4e+02 2.1e+02  0  0  0  0  3   0  0  0  0  3     0<br>

&gt; VecAssemblyEnd        69 1.0 2.4939e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; VecScatterBegin     7491 1.0 1.3734e+00 1.7 0.00e+00 0.0 1.1e+06 1.9e+04 0.0e+00  0  0 96 76  0   0  0 96 76  0     0<br>

&gt; VecScatterEnd       7491 1.0 2.0055e+02 8.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0<br>

&gt; VecReduceArith         8 1.0 1.4977e-03 2.0 3.05e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6232<br>

&gt; VecReduceComm          4 1.0 8.9908e-0412.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; VecNormalize        2467 1.0 2.8067e+00 3.4 1.75e+08 1.1 0.0e+00 0.0e+00 2.4e+03  0  0  0  0 41   0  0  0  0 41  1905<br>

&gt; SNESSolve              4 1.0 4.0619e+03 1.0 2.23e+11 2.3 9.4e+05 2.3e+04 4.1e+03100 98 80 77 68 100 98 80 77 68  1136<br>

&gt; SNESLineSearch        17 1.0 1.1423e+01 1.0 5.23e+07 1.1 1.8e+04 1.7e+04 3.3e+02  0  0  2  1  6   0  0  2  1  6   140<br>

&gt; SNESFunctionEval      23 1.0 2.9742e+01 1.0 2.60e+07 1.1 1.9e+04 1.9e+04 3.5e+02  1  0  2  1  6   1  0  2  1  6    27<br>

&gt; SNESJacobianEval      17 1.0 3.6786e+03 1.0 0.00e+00 0.0 9.8e+03 6.4e+04 1.4e+02 90  0  1  2  2  90  0  1  2  2     0<br>

&gt; KSPGMRESOrthog      2425 1.0 2.5150e+01 1.6 1.16e+10 1.1 0.0e+00 0.0e+00 2.4e+03  0  8  0  0 41   0  8  0  0 41 14157<br>

&gt; KSPSetup              36 1.0 2.5388e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

&gt; KSPSolve              18 1.0 3.6141e+02 1.0 2.25e+11 2.3 1.1e+06 2.4e+04 5.0e+03  9100 97 96 84   9100 97 96 84 13015<br>

&gt; PCSetUp               36 1.0 2.1635e+02 3.6 1.55e+11 3.2 1.8e+04 3.2e+05 1.5e+02  3 56  2 20  3   3 56  2 20  3 12274<br>

&gt; PCSetUpOnBlocks       18 1.0 2.1293e+02 3.7 1.55e+11 3.2 0.0e+00 0.0e+00 2.7e+01  2 56  0  0  0   2 56  0  0  0 12471<br>

&gt; PCApply             2467 1.0 2.5616e+02 2.5 5.95e+10 1.7 7.3e+05 2.8e+04 0.0e+00  4 33 62 73  0   4 33 62 73  0  6060<br>

&gt; ------------------------------------------------------------------------------------------------------------------------<br>

&gt;<br>

&gt; Memory usage is given in bytes:<br>

&gt;<br>

&gt; Object Type          Creations   Destructions     Memory  Descendants&#39; Mem.<br>

&gt; Reports information only for process 0.<br>

&gt;<br>

&gt; --- Event Stage 0: Main Stage<br>

&gt;<br>

&gt;              Matrix    39             39  18446744074642894848     0<br>

&gt; Matrix Partitioning     1              1          640     0<br>

&gt;           Index Set   184            184      2589512     0<br>

&gt;   IS L to G Mapping     2              2       301720     0<br>

&gt;              Vector   729            729    133662888     0<br>

&gt;      Vector Scatter    29             29        30508     0<br>

&gt;   Application Order     2              2      9335968     0<br>

&gt;                SNES     4              4         5088     0<br>

&gt;       Krylov Solver    10             10     32264320     0<br>

&gt;      Preconditioner    10             10         9088     0<br>

&gt;              Viewer     1              0            0     0<br>

&gt; ========================================================================================================================<br>

&gt; Average time to get PetscTime(): 1.19209e-07<br>

&gt; Average time for MPI_Barrier(): 1.20163e-05<br>

&gt; Average time for zero size MPI_Send(): 2.49594e-06<br>

&gt; #PETSc Option Table entries:<br>

&gt; -coarse_ksp_rtol 1.0e-1<br>

&gt; -coarsegrid /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E2000_N8241_D70170.fsi<br>

&gt; -computeinitialguess<br>

&gt; -f /scratch/stmp00/ronglian/input/Cannula/Cannula_Nest2_E32000_N128961_D1096650.fsi<br>

&gt; -geometric_asm<br>

&gt; -geometric_asm_overlap 8<br>

&gt; -inletu 5.0<br>

&gt; -ksp_atol 1e-8<br>

&gt; -ksp_gmres_restart 600<br>

&gt; -ksp_max_it 3000<br>

&gt; -ksp_pc_side right<br>

&gt; -ksp_rtol 1.e-3<br>

&gt; -ksp_type gmres<br>

&gt; -log_summary<br>

&gt; -mat_partitioning_type parmetis<br>

&gt; -nest_geometric_asm_overlap 4<br>

&gt; -nest_ksp_atol 1e-8<br>

&gt; -nest_ksp_gmres_restart 800<br>

&gt; -nest_ksp_max_it 1000<br>

&gt; -nest_ksp_pc_side right<br>

&gt; -nest_ksp_rtol 1.e-2<br>

&gt; -nest_ksp_type gmres<br>

&gt; -nest_pc_asm_type basic<br>

&gt; -nest_pc_type asm<br>

&gt; -nest_snes_atol 1.e-10<br>

&gt; -nest_snes_max_it 20<br>

&gt; -nest_snes_rtol 1.e-4<br>

&gt; -nest_sub_pc_factor_mat_ordering_type qmd<br>

&gt; -nest_sub_pc_factor_shift_amount 1e-8<br>

&gt; -nest_sub_pc_factor_shift_type nonzero<br>

&gt; -nest_sub_pc_type lu<br>

&gt; -nested<br>

&gt; -noboundaryreduce<br>

&gt; -pc_asm_type basic<br>

&gt; -pc_type asm<br>

&gt; -shapebeta 10.0<br>

&gt; -snes_atol 1.e-10<br>

&gt; -snes_max_it 20<br>

&gt; -snes_rtol 1.e-6<br>

&gt; -sub_pc_factor_mat_ordering_type qmd<br>

&gt; -sub_pc_factor_shift_amount 1e-8<br>

&gt; -sub_pc_factor_shift_type nonzero<br>

&gt; -sub_pc_type lu<br>

&gt; -viscosity 0.01<br>

&gt; #End of PETSc Option Table entries<br>

&gt; Compiled without FORTRAN kernels<br>

&gt; Compiled with full precision matrices (default)<br>

&gt; sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8<br>

&gt; Configure run at: Tue Sep 13 13:28:48 2011<br>

&gt; Configure options: --known-level1-dcache-size=32768 --known-level1-dcache-linesize=32 --known-level1-dcache-assoc=0 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=8 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-batch=1 --with-mpi-shared-libraries=1 --known-mpi-shared-libraries=0 --download-f-blas-lapack=1 --download-hypre=1 --download-superlu=1 --download-parmetis=1 --download-superlu_dist=1 --download-blacs=1 --download-scalapack=1 --download-mumps=1 --with-debugging=0<br>


&gt; -----------------------------------------<br>

&gt; Libraries compiled on Tue Sep 13 13:28:48 2011 on node1367<br>

&gt; Machine characteristics: Linux-2.6.18-238.12.1.el5-x86_64-with-redhat-5.6-Tikanga<br>

&gt; Using PETSc directory: /home/ronglian/soft/petsc-3.2-p1<br>

&gt; Using PETSc arch: Janus-nodebug<br>

&gt; -----------------------------------------<br>

&gt;<br>

&gt; Using C compiler: mpicc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}<br>

&gt; Using Fortran compiler: mpif90  -Wall -Wno-unused-variable -O   ${FOPTFLAGS} ${FFLAGS}<br>

&gt; -----------------------------------------<br>

&gt;<br>

&gt; Using include paths: -I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include -I/home/ronglian/soft/petsc-3.2-p1/include -I/home/ronglian/soft/petsc-3.2-p1/include -I/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/include -I/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/include<br>


&gt; -----------------------------------------<br>

&gt;<br>

&gt; Using C linker: mpicc<br>

&gt; Using Fortran linker: mpif90<br>

&gt; Using libraries: -Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lpetsc -lX11 -Wl,-rpath,/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -L/home/ronglian/soft/petsc-3.2-p1/Janus-nodebug/lib -lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -lmpi_cxx -lstdc++ -lscalapack -lblacs -lsuperlu_4.2 -lflapack -lfblas -L/curc/tools/free/redhat_5_x86_64/openmpi-1.4.3_ib/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl<br>


&gt; -----------------------------------------<br>

<div><div></div><div class="h5">&gt;<br>

&gt;&gt;<br>

&gt;&gt; Yes, it has no influence on performance. If you think it does, send<br>

&gt;&gt; -log_summary output to <a href="mailto:petsc-maint@mcs.anl.gov">petsc-maint@mcs.anl.gov</a><br>

&gt;&gt;<br>

&gt;&gt;  Matt<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt; Hi Matt,<br>

&gt;<br>

&gt; The -log_summary output is attached. I found that the SNESJacobianEval()<br>

&gt; takes 90% of the total time. I think this is abnormal because I use a hand<br>

&gt; coded Jacobian matrix. The reason, I think, for the 90% of the total time is<br>

&gt; that the matrix takes too much memory (over 1.8x10^19 bytes) which maybe<br>

&gt; have used the swap. But I do not know why 23 one million by one million<br>

&gt; matrices will use so much memory. Can you tell me how to debug this problem?<br>

&gt; Thank you.<br>

&gt;<br>

&gt; Best,<br>

&gt; Rongliang<br>
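The matrix memory figure in the quoted table sits very close to 2**64, which may help narrow down what kind of problem this is. The sketch below uses only the number printed in the "Matrix" row and compares it with 2**64. The last check rests on an assumption that the log does not state: that the counter is held as a double-precision value, in which case values in this range can only fall on multiples of 4096 bytes.

```python
# Value printed in the "Matrix" row of the quoted memory table (bytes).
REPORTED_BYTES = 18446744074642894848
TWO_64 = 2 ** 64

# How far above 2**64 the reported figure lies.
excess = REPORTED_BYTES - TWO_64
print("reported - 2**64 =", excess, "bytes")
print("excess in MiB    =", round(excess / 2 ** 20, 1))

# Assumption: if the counter is a double, values between 2**64 and 2**65
# are spaced 4096 apart, so the excess should be a multiple of 4096.
print("multiple of 4096 :", excess % 4096 == 0)
```

An excess of a few hundred MiB over 2**64 looks more like an accounting artifact than like memory that was really allocated, but this is only arithmetic on the reported value; it does not establish what corrupts the counter.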

&gt;<br>

&gt;<br>


<br>

</div></div></blockquote></div><br>