1) These MFlop rates are terrible. It seems like your problem is way too small.<br><br>2) The load balance is not good.<br><br> Matt<br><br><div><span class="gmail_quote">On 2/9/07, <b class="gmail_sendername">Ben Tay</b>
<<a href="mailto:zonexo@gmail.com">zonexo@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div>Ya, that's the mistake. I changed part of the code resulting in PetscFinalize not being called.
<div> </div>
<div>Here's the output:</div>
<div> </div>
<p>---------------------------------------------- PETSc Performance Summary: ----------------------------------------------</p>
<p>/home/enduser/g0306332/ns2d/a.out on a linux-mpi named <a href="http://atlas00.nus.edu.sg" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">atlas00.nus.edu.sg</a> with 4 processors, by g0306332 Sat Feb 10 08:32:08 2007
<br>Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007 HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80
<p> Max Max/Min Avg Total<br>Time (sec): 2.826e+02 2.08192 1.725e+02<br>Objects: 1.110e+02 1.00000 1.110e+02<br>Flops: 6.282e+08
1.00736 6.267e+08 2.507e+09<br>Flops/sec: 4.624e+06 2.08008 4.015e+06 1.606e+07<br>Memory: 1.411e+07 1.01142 5.610e+07<br>MPI Messages: 8.287e+03 1.90156
6.322e+03 2.529e+04<br>MPI Message Lengths: 6.707e+07 1.11755 1.005e+04 2.542e+08<br>MPI Reductions: 3.112e+03 1.00000</p>
<p>Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)<br> e.g., VecAXPY() for real vectors of length N --> 2N flops<br> and VecAXPY() for complex vectors of length N --> 8N flops
<p>Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --<br> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
<br> 0: Main Stage: 1.7247e+02 100.0% 2.5069e+09 100.0% 2.529e+04 100.0% 1.005e+04 100.0% 1.245e+04 100.0%</p>
<p>------------------------------------------------------------------------------------------------------------------------<br>See the 'Profiling' chapter of the users' manual for details on interpreting output.
<br>Phase summary info:<br> Count: number of times phase was executed<br> Time and Flops/sec: Max - maximum over all processors<br> Ratio - ratio of maximum to minimum over all processors<br> Mess: number of messages sent
<br> Avg. len: average message length<br> Reduct: number of global reductions<br> Global: entire computation<br> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().<br> %T - percent time in this phase %F - percent flops in this phase
<br> %M - percent messages in this phase %L - percent message lengths in this phase<br> %R - percent reductions in this phase<br> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
<p><br> ##########################################################<br> # #<br> # WARNING!!! #<br> # #
<br> # This code was compiled with a debugging option, #<br> # To get timing results run config/configure.py #<br> # using --with-debugging=no, the performance will #<br> # be generally two or three times faster. #
<br> # #<br> ##########################################################</p>
<p> </p>
<p><br> ##########################################################<br></p>
<p> </p>
<p> ##########################################################<br> # #<br> # WARNING!!! #<br> # #
<br> # This code was run without the PreLoadBegin() #<br> # macros. To get timing results we always recommend #<br> # preloading. otherwise timing numbers may be #<br> # meaningless. #
<br> ##########################################################</p>
<p><br>Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total<br> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
<p>--- Event Stage 0: Main Stage</p>
<p>MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 1.3e+03 0.0e+00 12 18 93 12 0 12 18 93 12 0 19<br>MatSolve 3967 1.0 2.5914e+00 1.9 7.99e+07 1.9 0.0e+00 0.0e+00 0.0e+00 1 17 0 0 0 1 17 0 0 0 168
<br>MatLUFactorNum 40 1.0 4.4779e-01 1.5 3.14e+07 1.5 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 85<br>MatILUFactorSym 2 1.0 3.1099e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
<br>MatScale 20 1.0 1.1487e-01 8.7 8.73e+07 8.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 39<br>MatAssemblyBegin 40 1.0 7.8844e+00 1.3 0.00e+00 0.0 7.6e+02 2.8e+05 8.0e+01 4 0 3 83 1 4 0 3 83 1 0
<br>MatAssemblyEnd 40 1.0 6.9408e+00 1.2 0.00e+00 0.0 1.2e+01 9.6e+02 6.4e+01 4 0 0 0 1 4 0 0 0 1 0<br>MatGetOrdering 2 1.0 8.0509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
<br>MatZeroEntries 21 1.0 1.4379e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecMDot 3792 1.0 4.7372e+01 1.4 5.20e+06 1.4 0.0e+00 0.0e+00 3.8e+03 24 29 0 0 30 24 29 0 0 30 15
<br>VecNorm 3967 1.0 3.9513e+01 1.2 4.11e+05 1.2 0.0e+00 0.0e+00 4.0e+03 21 2 0 0 32 21 2 0 0 32 1<br>VecScale 3947 1.0 3.4941e-02 1.2 2.18e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 738
<br>VecCopy 155 1.0 1.0029e-01 25.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecSet 4142 1.0 3.4638e-01 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
<br>VecAXPY 290 1.0 5.9618e-03 1.2 2.14e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 709<br>VecMAXPY 3947 1.0 1.5566e+00 1.3 1.64e+08 1.3 0.0e+00 0.0e+00 0.0e+00 1 31 0 0 0 1 31 0 0 0 498
<br>VecAssemblyBegin 80 1.0 4.1793e+00 1.1 0.00e+00 0.0 9.6e+02 1.4e+04 2.4e+02 2 0 4 5 2 2 0 4 5 2 0<br>VecAssemblyEnd 80 1.0 2.0682e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
<br>VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 1.3e+03 0.0e+00 0 0 93 12 0 0 0 93 12 0 0<br>VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 11 0 0 0 0 0
<br>VecNormalize 3947 1.0 3.9593e+01 1.2 6.11e+05 1.2 0.0e+00 0.0e+00 3.9e+03 21 3 0 0 32 21 3 0 0 32 2<br>KSPGMRESOrthog 3792 1.0 4.8670e+01 1.3 9.92e+06 1.3 0.0e+00 0.0e+00 3.8e+03 25 58 0 0 30 25 58 0 0 30 30
<br>KSPSetup 80 1.0 2.0014e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 0<br>KSPSolve 40 1.0 1.0660e+02 1.0 5.90e+06 1.0 2.4e+04 1.3e+03 1.2e+04 62 100 93 12 97 62 100 93 12 97 23
<br>PCSetUp 80 1.0 4.5669e-01 1.5 3.05e+07 1.5 0.0e+00 0.0e+00 1.4e+01 0 2 0 0 0 0 2 0 0 0 83<br>PCSetUpOnBlocks 40 1.0 4.5418e-01 1.5 3.07e+07 1.5 0.0e+00 0.0e+00 1.0e+01 0 2 0 0 0 0 2 0 0 0 84
<br>PCApply 3967 1.0 4.1737e+00 2.0 5.30e+07 2.0 0.0e+00 0.0e+00 4.0e+03 2 17 0 0 32 2 17 0 0 32 104<br>------------------------------------------------------------------------------------------------------------------------
<div> </div>
<p>Memory usage is given in bytes:</p>
<p>Object Type Creations Destructions Memory Descendants' Mem.</p>
<p>--- Event Stage 0: Main Stage</p>
<p> Matrix 8 8 21136 0<br> Index Set 12 12 74952 0<br> Vec 81 81 1447476 0<br> Vec Scatter 2 2 0 0
<br> Krylov Solver 4 4 33760 0<br> Preconditioner 4 4 392 0<br>========================================================================================================================
<br>Average time to get PetscTime(): 1.09673e-06<br>Average time for MPI_Barrier(): 3.90053e-05<br>Average time for zero size MPI_Send(): 1.65105e-05<br>OptionTable: -log_summary<br>Compiled without FORTRAN kernels<br>Compiled with full precision matrices (default)
<br>sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8<br>Configure run at: Thu Jan 18 12:23:31 2007<br>Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 --with-mpi-dir=/opt/mpich/myrinet/intel/
<br>-----------------------------------------<br>Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on <a href="http://atlas1.nus.edu.sg" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">atlas1.nus.edu.sg
</a><br>Machine characteristics: Linux <a href="http://atlas1.nus.edu.sg" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">
atlas1.nus.edu.sg</a> 2.4.21-20.ELsmp #1 SMP Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux<br>Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8<br>Using PETSc arch: linux-mpif90<br>-----------------------------------------
<br>Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g<br>Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w<br>-----------------------------------------<br>Using include paths: -I/nas/lsftmp/g0306332/petsc-
2.3.2-p8 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include -I/opt/mpich/myrinet/intel/include<br>------------------------------------------<br>Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
<br>Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w<br>Using libraries: -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32 -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\ -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl
<p> </p>
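<div>As a rough cross-check of the summary above, the legend's "Total Mflop/s" formula can be applied by hand to the KSPSolve row, and the Max/Min time ratio in the header can be turned into the run time of the fastest process. This is only a sketch: the helper name <code>mflops</code> is hypothetical, the numbers are copied from the log, and it assumes the factor printed as "10e-6" in the legend is meant as 1e-6 (flops to Mflops) and that KSPSolve accounts for essentially all flops, as its %F of 100 suggests.</div>

```python
# Values copied from the -log_summary output above.
total_flops = 2.507e9    # "Flops" Total column, summed over the 4 processes
max_time_all = 2.826e2   # "Time (sec)" Max column, in seconds
time_ratio = 2.08192     # "Time (sec)" Max/Min column
ksp_max_time = 1.0660e2  # KSPSolve max time in seconds (its %F column is 100)


def mflops(flops, seconds):
    """Aggregate rate in Mflop/s; assumes the legend's factor means 1e-6."""
    return 1e-6 * flops / seconds


# Assumption: KSPSolve performs essentially all of the counted flops.
ksp_rate = mflops(total_flops, ksp_max_time)

# Max/Min = 2.08 means the fastest process needed under half the time
# of the slowest one.
fastest_time = max_time_all / time_ratio

print(f"KSPSolve rate: about {ksp_rate:.1f} Mflop/s over 4 processes")
print(f"slowest process: {max_time_all:.0f} s, fastest: about {fastest_time:.0f} s")
```

<div>The computed rate comes out close to the 23 Mflop/s reported in the KSPSolve row, and the gap between the slowest and fastest process is what the Max/Min ratio of about 2.08 expresses.</div>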
<div>This is the result I get for running 20 steps. There are 2 matrices to be solved. I've only parallelized the solving of the linear equations and kept the rest of the code serial for this test. However, I found that it's much slower than the sequential version.
<div> </div>
<div>From the Ratio column, it seems that the ratios for MatScale and VecSet are very high. I've scaled the momentum equation by 0.5. Is that the reason for the slowness? That is all I can decipher.</div>
<div> </div>
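<div>One way to weigh the Ratio column against the actual cost is to list a few events by their maximum time. The sketch below copies the times and ratios from the event table above and expresses each as a share of the longest process's wall time (2.826e+02 s). That share is a rough measure chosen for illustration, not PETSc's own %T column, and the selection of events is arbitrary.</div>

```python
# (event, max time in seconds, max/min time ratio), copied from the event table.
events = [
    ("MatScale", 1.1487e-01, 8.7),
    ("VecSet", 3.4638e-01, 6.6),
    ("MatMult", 2.4071e+01, 1.3),
    ("VecMDot", 4.7372e+01, 1.4),
    ("VecNorm", 3.9513e+01, 1.2),
]
max_wall_time = 2.826e2  # "Time (sec)" Max column from the header

# Rough share of the longest process's wall time per event, largest first.
shares = sorted(
    ((name, 100.0 * seconds / max_wall_time, ratio) for name, seconds, ratio in events),
    key=lambda row: row[1],
    reverse=True,
)

for name, percent, ratio in shares:
    print(f"{name:10s} {percent:6.2f}% of wall time, max/min ratio {ratio}")
```

<div>Read this way, MatScale and VecSet have the largest ratios but take well under a second each, while VecMDot, VecNorm and MatMult, whose ratios are modest, take tens of seconds. VecMDot and VecNorm also carry most of the global reductions listed in the Reduct column.</div>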
<div>Thank you.</div><div><span class="e" id="q_110a930aefd60897_1">
<div> </div>
<div> </div>
<div><br><br> </div>
<div><span class="gmail_quote">On 2/10/07, <b class="gmail_sendername">Matthew Knepley</b> <<a href="mailto:knepley@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">knepley@gmail.com</a>
> wrote:</span>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"><span>On 2/9/07, <b class="gmail_sendername">Ben Tay</b> <<a href="mailto:zonexo@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">
zonexo@gmail.com</a>> wrote:</span>
<div><span><span class="gmail_quote"></span>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>Oops, it worked for ex2 and ex2f ;-)</div>
<div> </div>
<div>So what could be wrong? Is there some command or subroutine which I must call? By the way, I'm programming in Fortran.</div></blockquote></span>
<div><br>Yes, you must call PetscFinalize() in your code.<br><br> Matt <br> </div>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>thank you.<br><br> </div>
<div><span class="gmail_quote">On 2/9/07, <b class="gmail_sendername">Matthew Knepley</b> <<a href="mailto:knepley@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">knepley@gmail.com</a>
> wrote:</span>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;">Problems do not go away by ignoring them. Something is wrong here, and it may<br>affect the rest of your program. Please try to run an example:
<br><br> cd src/ksp/ksp/examples/tutorials<br> make ex2<br> ./ex2 -log_summary
<div><span><br><br> Matt<br><br>
<div><span class="gmail_quote">On 2/9/07, <b class="gmail_sendername">Ben Tay</b> <<a href="mailto:zonexo@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">zonexo@gmail.com</a>> wrote:
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>Well, I don't know what's wrong. I did the same thing for -info and it worked. Anyway, is there any other way?</div>
<div> </div>
<div>Like I can use -mat_view or call matview( ... ) to view a matrix. Is there a similar subroutine for me to call?</div>
<div> </div>
<div>Thank you.<br><br> </div>
<div><span class="gmail_quote">On 2/9/07, <b class="gmail_sendername">Matthew Knepley</b> <<a href="mailto:knepley@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">knepley@gmail.com</a>
> wrote:</span>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;">Impossible, please check the spelling, and make sure your<br>command line was not truncated.
<br> Matt
<div><span class="gmail_quote">On 2/9/07, <b class="gmail_sendername">Ben Tay</b> <<a href="mailto:zonexo@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)"> zonexo@gmail.com</a>> wrote:
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">ya, i did use -log_summary. but no output.....
<div><span class="gmail_quote">On 2/9/07, <b class="gmail_sendername">Barry Smith</b> <<a href="mailto:bsmith@mcs.anl.gov" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">bsmith@mcs.anl.gov</a>> wrote:
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"><br>-log_summary<br><br><br>On Fri, 9 Feb 2007, Ben Tay wrote:<br><br>> Hi,<br>><br>> I've tried to use log_summary but nothing came out? Did I miss out
<br>> something? It worked when I used -info...<br>><br>><br>> On 2/9/07, Lisandro Dalcin <<a href="mailto:dalcinl@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">dalcinl@gmail.com
</a>> wrote:<br>> ><br>> > On 2/8/07, Ben Tay < <a href="mailto:zonexo@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">zonexo@gmail.com</a>> wrote:<br>> > > i'm trying to solve my cfd code using PETSc in parallel. Besides the
<br>> > linear<br>> > > eqns for PETSc, other parts of the code have also been parallelized using <br>> > > MPI.<br>> ><br>> > Finite elements or finite differences, or what?<br>> >
<br>> > > however i find that the parallel version of the code running on 4<br>> > processors<br>> > > is even slower than the sequential version.<br>> ><br>> > Can you monitor the convergence and iteration count of momentum and
<br>> > poisson steps?<br>> ><br>> ><br>> > > in order to find out why, i've used the -info option to print out the <br>> > > details. there are 2 linear equations being solved - momentum and
<br>> > poisson.<br>> > > the momentum one is twice the size of the poisson. it is shown below:<br>> ><br>> > Can you use -log_summary command line option and send the output attached? <br>> >
<br>> > > i saw some statements stating "seq". am i running in sequential or<br>> > parallel<br>> > > mode? have i preallocated too much space?<br>> ><br>> > It seems you are running in parallel. The "Seq" are related to local,
<br>> > internal objects. In PETSc, parallel matrices have inner sequential<br>> > matrices.<br>> ><br>> > > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange and b_sta<br>> > and
<br>> > > b_end from VecGetOwnershipRange should always be the same value, right?<br>> ><br>> > It should. If not, you are likely to get a runtime error.<br>> ><br>> > Regards,<br>> >
<br>> > --<br>> > Lisandro Dalcín<br>> > ---------------<br>> > Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)<br>> > Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
<br>> > Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)<br>> > PTLC - Güemes 3450, (3000) Santa Fe, Argentina<br>> > Tel/Fax: +54-(0)342-451.1594<br>> ><br>> ><br>> </blockquote>
</div><br></span></div></blockquote></div><br><br clear="all"><br></span></div>-- <br>One trouble is that despite this system, anyone who reads journals widely<br>and critically is forced to realize that there are scarcely any bars to eventual
<br>publication. There seems to be no study too fragmented, no hypothesis too<br>trivial, no literature citation too biased or too egotistical, no design too<br>warped, no methodology too bungled, no presentation of results too
<br>inaccurate, too obscure, and too contradictory, no analysis too self-serving,<br>no argument too circular, no conclusions too trifling or too unjustified, and<br>no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie
</blockquote></div><br></span></div></blockquote></div>
<div><span>
</span></div></blockquote></div>