On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan <vijay.m@gmail.com> wrote:
> Matt,
>
> The --with-debugging=1 option is certainly not meant for performance
> studies, but I didn't expect it to yield the same CPU time as a single
> processor for snes/ex20; i.e., my runs with 1 and 2 processors take
> approximately the same amount of time to compute the solution. But
> I am currently configuring without debugging symbols and will let you
> know what that yields.
>
> On a similar note, is there something extra that needs to be done to
> make use of multi-core machines while using MPI? I am not sure whether
> this is even related to PETSc, but it could be an MPI configuration option
> that either I or the configure process is missing. All ideas are
> much appreciated.

Sparse MatVec (MatMult) is a memory-bandwidth-limited operation. On most
cheap multicore machines there is a single memory bus, so using more
cores gains you very little extra performance. I still suspect you are not actually
running in parallel, because you usually see a small speedup. That is why I
suggested looking at -log_summary, since it tells you how many processes were
run and breaks down the time.
   Matt

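As a quick way to check whether two processes are really being launched, one could run the example on 1 and 2 processes and compare the -log_summary headers. This is only a sketch, assuming the compiled binary is ./ex20 in the current directory:

  mpiexec -n 1 ./ex20 -log_summary
  mpiexec -n 2 ./ex20 -log_summary

If both runs report a single process, the mpiexec being picked up is probably not the one built by --download-mpich (that one normally lives under $PETSC_DIR/$PETSC_ARCH/bin).
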
> Vijay
>
> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley <knepley@gmail.com> wrote:
> > On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan <vijay.m@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I am trying to configure my PETSc install with an MPI installation to
> >> make use of a dual quad-core desktop system running Ubuntu. But
> >> even though the configure/make process went through without problems,
> >> the scalability of the programs doesn't seem to reflect what I expected.
> >> My configure options are
> >>
> >> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1
> >> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g
> >> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1
> >> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++
> >> --download-plapack=1 --download-mumps=1 --download-umfpack=yes
> >> --with-debugging=1 --with-errorchecking=yes
> >
> > 1) For performance studies, make a build using --with-debugging=0
> > 2) Look at -log_summary for a breakdown of performance
> > Matt
> >
> >>
> >> Is there something else that needs to be done as part of the configure
> >> process to enable decent scaling? I am only comparing programs with
> >> mpiexec (-n 1) and (-n 2), but they seem to be taking approximately the
> >> same time as noted from -log_summary. If it helps, I've been testing
> >> with snes/examples/tutorials/ex20.c for all purposes, with a custom
> >> -grid parameter from the command line to control the number of unknowns.
> >>
> >> If there is something you've witnessed before in this configuration, or
> >> if you need anything else to analyze the problem, do let me know.
> >>
> >> Thanks,
> >> Vijay
> >
> >
> > --
> > What most experimenters take for granted before they begin their experiments
> > is infinitely more interesting than any results to which their experiments
> > lead.
> > -- Norbert Wiener
> >
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
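
For reference, a rebuild of the configure line quoted above along the lines of point (1) in the earlier reply might look like the following sketch; the only changes are --with-debugging=0 and an assumed -O3 optimization flag in place of -g, with every other option left as originally given:

  ./configure --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1 \
    --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-O3 \
    --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 \
    --download-blacs=1 --download-scalapack=1 --with-clanguage=C++ \
    --download-plapack=1 --download-mumps=1 --download-umfpack=yes \
    --with-debugging=0 --with-errorchecking=yes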