[petsc-users] Configuring petsc with MPI on ubuntu quad-core
Barry Smith
bsmith at mcs.anl.gov
Wed Feb 2 18:06:29 CST 2011
We need all the information from -log_summary to see what is going on.
Not sure what -grid 20 means, but don't expect any good parallel performance with fewer than roughly 10,000 unknowns per process.
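If, say, the custom -grid n option sets an n x n x n mesh, then -grid 20 is only about 8,000 unknowns in total, and a comparison sized for two processes would look something like

  /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 40 -log_summary

since a 40^3 mesh gives roughly 64,000 unknowns, i.e. about 32,000 per process.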
Barry
On Feb 2, 2011, at 5:38 PM, Vijay S. Mahadevan wrote:
> Here's the performance statistic on 1 and 2 processor runs.
>
> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 1 ./ex20 -grid 20 -log_summary
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           8.452e+00      1.00000   8.452e+00
> Objects:              1.470e+02      1.00000   1.470e+02
> Flops:                5.045e+09      1.00000   5.045e+09  5.045e+09
> Flops/sec:            5.969e+08      1.00000   5.969e+08  5.969e+08
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       4.440e+02      1.00000
>
> /usr/lib/petsc/linux-gnu-cxx-opt/bin/mpiexec -n 2 ./ex20 -grid 20 -log_summary
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           7.851e+00      1.00000   7.851e+00
> Objects:              2.000e+02      1.00000   2.000e+02
> Flops:                4.670e+09      1.00580   4.657e+09  9.313e+09
> Flops/sec:            5.948e+08      1.00580   5.931e+08  1.186e+09
> MPI Messages:         7.965e+02      1.00000   7.965e+02  1.593e+03
> MPI Message Lengths:  1.412e+07      1.00000   1.773e+04  2.824e+07
> MPI Reductions:       1.046e+03      1.00000
>
> I am not entirely sure I can make sense of those statistics, but
> if there is something more you need, please feel free to let me know.
>
> Vijay
>
> On Wed, Feb 2, 2011 at 5:15 PM, Matthew Knepley <knepley at gmail.com> wrote:
>> On Wed, Feb 2, 2011 at 5:04 PM, Vijay S. Mahadevan <vijay.m at gmail.com>
>> wrote:
>>>
>>> Matt,
>>>
>>> The --with-debugging=1 option is certainly not meant for performance
>>> studies, but I didn't expect it to yield the same CPU time on two
>>> processors as on one for snes/ex20; i.e., my runs with 1 and 2 processors
>>> take approximately the same amount of time to compute the solution. I am
>>> currently reconfiguring without debugging symbols and will let you know
>>> what that yields.
>>>
>>> On a similar note, is there something extra that needs to be done to
>>> make use of multi-core machines while using MPI? I am not sure if this
>>> is even related to PETSc; it could be an MPI configuration option that
>>> either I or the configure process is missing. All ideas are much
>>> appreciated.
>>
>> Sparse MatVec (MatMult) is a memory-bandwidth-limited operation. On most
>> cheap multicore machines there is a single memory bus, so using more cores
>> gains you very little extra performance. Still, I suspect you are not
>> actually running in parallel, because even in the bandwidth-limited case
>> you usually see a small speedup. That is why I suggested looking at
>> -log_summary: it tells you how many processes were run and breaks down
>> the time.
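>>
>> As a back-of-the-envelope sketch (assuming a CSR-style matrix with one
>> 8-byte value and one 4-byte column index per nonzero, and 2 flops per
>> nonzero, i.e. about 6 bytes of memory traffic per flop), a bus delivering
>> roughly 4 GB/s caps MatMult near 4e9 / 6 = 7e8 flops/s no matter how many
>> cores share it; that is the same ballpark as the ~6e8 flops/s in your
>> summaries.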
>> Matt
>>
>>>
>>> Vijay
>>>
>>> On Wed, Feb 2, 2011 at 4:53 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>> On Wed, Feb 2, 2011 at 4:46 PM, Vijay S. Mahadevan <vijay.m at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to configure my PETSc install with an MPI installation to
>>>>> make use of a dual quad-core desktop system running Ubuntu. Even though
>>>>> the configure/make process went through without problems, the
>>>>> scalability of the programs doesn't seem to reflect what I expected.
>>>>> My configure options are
>>>>>
>>>>> --download-f-blas-lapack=1 --with-mpi-dir=/usr/lib/ --download-mpich=1
>>>>> --with-mpi-shared=0 --with-shared=0 --COPTFLAGS=-g
>>>>> --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1
>>>>> --download-blacs=1 --download-scalapack=1 --with-clanguage=C++
>>>>> --download-plapack=1 --download-mumps=1 --download-umfpack=yes
>>>>> --with-debugging=1 --with-errorchecking=yes
>>>>
>>>> 1) For performance studies, make a build using --with-debugging=0
>>>> 2) Look at -log_summary for a breakdown of performance
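>>>> For (1), an optimized build could look something like this (a sketch;
>>>> keep whichever --download-* packages you actually need):
>>>>
>>>>   ./configure --download-f-blas-lapack=1 --download-mpich=1 \
>>>>     --with-clanguage=C++ --with-debugging=0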
>>>> Matt
>>>>
>>>>>
>>>>> Is there something else that needs to be done as part of the configure
>>>>> process to enable decent scaling? I am only comparing runs with
>>>>> mpiexec -n 1 and -n 2, but they seem to take approximately the same
>>>>> time, as noted from -log_summary. If it helps, I've been testing with
>>>>> snes/examples/tutorials/ex20.c throughout, with a custom -grid
>>>>> command-line parameter to control the number of unknowns.
>>>>>
>>>>> If this is something you've seen before with this configuration, or if
>>>>> you need anything else to analyze the problem, do let me know.
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments
>>>> is infinitely more interesting than any results to which their experiments
>>>> lead.
>>>> -- Norbert Wiener
>>>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments
>> is infinitely more interesting than any results to which their experiments
>> lead.
>> -- Norbert Wiener
>>