[petsc-users] petsc example with known scalability
Matthew Knepley
knepley at gmail.com
Fri May 18 18:47:17 CDT 2012
On Fri, May 18, 2012 at 7:43 PM, Mohammad Mirzadeh <mirzadeh at gmail.com> wrote:
> I see; well, that's a fair point. So I have my timing results obtained via
> -log_summary; what should I be looking at for MatMult? Should I be
> looking at wall timings? Or do I need to look at MFlops/s? I'm sorry, but
> I'm not sure which measure I should use to determine scalability.
Time in isolation is only meaningful if I know how big your matrix is, but
you can always take the ratio of times across runs to see how it is scaling.
I am assuming you are looking at weak scalability, so the time should remain
constant as you add processes. MFlops/s tells you how the routine is
performing independent of problem size, and is thus an easy way to see what
is happening. It should scale like P, the number of processes, and when that
drops off you have insufficient bandwidth. VecMDot is a good way to look at
the latency of reductions (assuming you use GMRES). There is indeed no good
guide to this. Barry should write one.
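
As a rough illustration of the two ratios described above, here is a minimal
Python sketch. It assumes you have copied the MatMult time and MFlops/s by
hand from the -log_summary output of each run into two dictionaries keyed by
process count. The numbers below are made up for illustration, and the
function names are hypothetical; none of this is part of PETSc.

```python
# Sketch only: all numbers are invented, not measured results.

def weak_scaling_efficiency(times):
    """Ratio T(P0)/T(P) for each process count P.

    times maps process count -> MatMult wall time in seconds, with the
    work per process held fixed (weak scaling). Ideal value is 1.0.
    """
    base_p = min(times)
    return {p: times[base_p] / t for p, t in sorted(times.items())}


def flop_rate_efficiency(rates):
    """Measured speedup in flop rate divided by the ideal speedup P/P0.

    rates maps process count -> aggregate MatMult MFlops/s.
    Ideal value is 1.0, i.e. the rate grows like P.
    """
    base_p = min(rates)
    return {
        p: (r / rates[base_p]) / (p / base_p)
        for p, r in sorted(rates.items())
    }


# Hypothetical values read off the MatMult line of each run.
matmult_time = {1: 2.0, 2: 2.0, 4: 2.5, 8: 4.0}
matmult_mflops = {1: 500.0, 2: 1000.0, 4: 1600.0, 8: 2000.0}

time_eff = weak_scaling_efficiency(matmult_time)
rate_eff = flop_rate_efficiency(matmult_mflops)

for p in sorted(matmult_time):
    print(f"P={p:3d}  time efficiency={time_eff[p]:.2f}  "
          f"flop-rate efficiency={rate_eff[p]:.2f}")
```

In this invented data set both ratios stay near 1.0 up to 2 processes and
then fall, which is the kind of drop-off that would point to the bandwidth
limit mentioned above.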
Matt
> Also, is there any general, meaningful advice one could give in terms of
> using the resources, compiler flags (beyond -O3), etc.?
> Thanks,
> Mohammad
> On Fri, May 18, 2012 at 4:18 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Fri, May 18, 2012 at 7:06 PM, Mohammad Mirzadeh <mirzadeh at gmail.com> wrote:
>>
>>> Hi guys,
>>> I'm trying to generate scalability plots for my code and do profiling
>>> and fine tuning. In doing so I have noticed that some of the factors
>>> affecting my results are sort of subtle. For example, I found the other
>>> day that using all of the cores on a single node is somewhat (50-60%)
>>> slower than using only half of the cores, which I suspect is due
>>> to memory bandwidth and/or other hardware-related issues.
>>> So I thought I would ask whether there is any example in PETSc that has
>>> been tested for scalability and has been documented. Basically I want to
>>> use this test example as a benchmark to compare my results with. My own
>>> test code is currently a linear Poisson solver on an adaptive quadtree grid
>>> and involves non-trivial geometry (well basically a circle for the boundary
>>> but still not a simple box).
>>>
>>
>> Unfortunately, I do not even know what that means. We can't guarantee a
>> certain level of performance because it depends not
>> only on the hardware, but also on how you use it (as is evident in your
>> case). In a perfect world, we would have an abstract
>> model of the computation (available for MatMult) and your machine (not
>> available anywhere), and we would automatically
>> work out the consequences and tell you what to expect. Instead, today we
>> tell you to look at a few key indicators, like the
>> MatMult event, to see what is going on. When MatMult stops scaling, you
>> have run out of bandwidth.
>>
>> Matt
>>
>>> Thanks,
>>> Mohammad
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
