[petsc-users] Configuring petsc with MPI on ubuntu quad-core

Barry Smith bsmith at mcs.anl.gov
Wed Feb 2 18:06:29 CST 2011


  We need all the information from -log_summary to see what is going on.
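
  Even from the overview lines you sent, a quick check suggests the -n 2 run
  did use both processes (the MPI message counts are nonzero):

     flops per process:  4.670e9 / 5.045e9  ->  ~93% of the serial work each
     wall time:          7.851 s / 8.452 s  ->  ~93% of the serial time

  so with each process doing ~93% of the serial flops at essentially the same
  per-process rate, a ~7% reduction in time is about all those numbers allow;
  the per-event breakdown is what shows where the time actually goes.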

  Not sure what -grid 20 means, but don't expect good parallel performance with fewer than roughly 10,000 unknowns per process.
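
  If -grid 20 means a 20 x 20 x 20 mesh, a rough count is

     20^3 =   8,000 unknowns total  ->   ~4,000 per process on 2 processes
     50^3 = 125,000 unknowns total  ->  ~62,500 per process on 2 processes

  so something like -grid 50 or larger would be a more meaningful scaling test.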

   Barry

On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote:

> Here are the performance statistics for the 1 and 2 processor runs.
> 
> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary
> 
>                         Max       Max/Min        Avg      Total
> Time (sec):           8.452e+00      1.00000   8.452e+00
> Objects:              1.470e+02      1.00000   1.470e+02
> Flops:                5.045e+09      1.00000   5.045e+09  5.045e+09
> Flops/sec:            5.969e+08      1.00000   5.969e+08  5.969e+08
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       4.440e+02      1.00000
> 
> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary
> 
>                         Max       Max/Min        Avg      Total
> Time (sec):           7.851e+00      1.00000   7.851e+00
> Objects:              2.000e+02      1.00000   2.000e+02
> Flops:                4.670e+09      1.00580   4.657e+09  9.313e+09
> Flops/sec:            5.948e+08      1.00580   5.931e+08  1.186e+09
> MPI Messages:         7.965e+02      1.00000   7.965e+02  1.593e+03
> MPI Message Lengths:  1.412e+07      1.00000   1.773e+04  2.824e+07
> MPI Reductions:       1.046e+03      1.00000
> 
> I am not entirely sure I can make sense of those statistics, but if
> there is something more you need, please feel free to let me know.
> 
> Vijay
> 
> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley <knepley at gmail.com> wrote:
>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan <vijay.m at gmail.com>
>> wrote:
>>> 
>>> Matt,
>>> 
>>> The --with-debugging=1 option is certainly not meant for performance
>>> studies, but I didn't expect it to yield the same CPU time as a single
>>> processor for snes/ex20, i.e., my runs with 1 and 2 processors take
>>> approximately the same amount of time to compute the solution. I am
>>> currently reconfiguring without debugging symbols and shall let you
>>> know what that yields.
>>> 
>>> On a similar note, is there something extra that needs to be done to
>>> make use of multi-core machines while using MPI? I am not sure if
>>> this is even related to PETSc, but there could be an MPI configuration
>>> option that either I or the configure process is missing. All ideas
>>> are much appreciated.
>> 
>> Sparse MatVec (MatMult) is a memory-bandwidth-limited operation. On most
>> cheap multicore machines there is a single memory bus, so using more cores
>> gains you very little extra performance. I still suspect you are not actually
>> running in parallel, because even then you would usually see a small speedup.
>> That is why I suggested looking at -log_summary: it tells you how many
>> processes were run and breaks down the time.
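>>
>> As a rough back-of-the-envelope (assuming AIJ storage, so roughly an 8-byte
>> value plus a 4-byte column index streamed per nonzero, and ignoring the
>> vector traffic):
>>
>>    2 flops per nonzero / ~12 bytes per nonzero  ~=  1/6 flop per byte
>>    at, say, 5 GB/s of memory bandwidth          ->  ~0.8 GFlop/s for MatMult
>>
>> and that ceiling is shared by the whole memory bus, however many cores sit
>> behind it.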
>>    Matt
>> 
>>> 
>>> Vijay
>>> 
>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan <vijay.m at gmail.com>
>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am trying to configure my petsc install with an MPI installation to
>>>>> make use of a dual quad-core desktop system running Ubuntu. But even
>>>>> though the configure/make process went through without problems, the
>>>>> scalability of the programs doesn't seem to reflect what I expected.
>>>>> My configure options are
>>>>> 
>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1
>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g
>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1
>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++
>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes
>>>>> --with-debugging=1 --with-errorchecking=yes
>>>> 
>>>> 1) For performance studies, make a build using --with-debugging=0
>>>> 2) Look at -log_summary for a breakdown of performance
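>>>>
>>>> For 1), something along these lines should give an optimized build
>>>> (picking a single MPI, here --download-mpich and dropping --with-mpi-dir,
>>>> using -O3 instead of -g, and trimming the package list to taste):
>>>>
>>>>   ./configure PETSC_ARCH=linux-gnu-cxx-opt --with-debugging=0 \
>>>>     --download-mpich=1 --download-f-blas-lapack=1 --with-clanguage=C++ \
>>>>     --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 \
>>>>     COPTFLAGS=-O3 CXXOPTFLAGS=-O3
>>>>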
>>>>    Matt
>>>> 
>>>>> 
>>>>> Is there something else that needs to be done as part of the configure
>>>>> process to enable decent scaling? I am only comparing programs with
>>>>> mpiexec (-n 1) and (-n 2), but they seem to be taking approximately the
>>>>> same time as noted from -log_summary. If it helps, I've been testing
>>>>> with snes/examples/tutorials/ex20.c for all purposes, with a custom
>>>>> -grid parameter from the command line to control the number of unknowns.
>>>>> 
>>>>> If there is something you've witnessed before in this configuration or
>>>>> if you need anything else to analyze the problem, do let me know.
>>>>> 
>>>>> Thanks,
>>>>> Vijay
>>>> 
>>>> 
>>>> 
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments
>>>> is infinitely more interesting than any results to which their
>>>> experiments
>>>> lead.
>>>> -- Norbert Wiener
>>>> 
>> 
>> 
>> 
>> --
>> What most experimenters take for granted before they begin their experiments
>> is infinitely more interesting than any results to which their experiments
>> lead.
>> -- Norbert Wiener
>> 


