[Nek5000-users] Performance problem
nek5000-users at lists.mcs.anl.gov
Fri Aug 6 06:09:13 CDT 2010
If you set p94 & p95 to 5 (say), I strongly recommend setting p12 (dt) to
-3e-5 in your case. The reason is that the projection scheme is much more
stable for a fixed dt.
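
As an illustration only (not taken from Mani's case file), the relevant lines
in the .rea parameter section would look roughly like the excerpt below; the
descriptive text after each pNNN varies between case templates, and, if I
recall correctly, a negative p12 makes Nek5000 use |p12| as a fixed timestep
instead of adjusting dt toward a target CFL:

  -3.00000E-05   p012 DT      (negative => fixed timestep of |DT|)
   5.00000       p094         (residual projection, as suggested above)
   5.00000       p095         (residual projection, as suggested above)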
On the whole, however, Stefan's earlier suggestion of using, say, lx1=8
and fewer elements is a better strategy. It's also possible that, at this
scale, we should switch your coarse-grid solve to AMG.
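
For reference, a minimal sketch of what the corresponding SIZE changes might
look like (placeholder values only, assuming the usual PN/PN-2 setup where
lx2 = lx1-2; lelt/lelg would of course have to be reduced to match the
coarser element count):

      parameter (lx1=8)             ! 8 GLL points per direction, i.e. N=7
      parameter (ly1=lx1,lz1=lx1)
      parameter (lx2=lx1-2)         ! pressure mesh for PN/PN-2
      parameter (ly2=lx2,lz2=lx2)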
Paul
On Fri, 6 Aug 2010, nek5000-users at lists.mcs.anl.gov wrote:
>
>
>> - set param(102) and param(103) to 5 (this will turn on the residual
>> projection)
>
> Should this be param 94 & 95 ?
>
> - Paul
>
>
>
>
> On Fri, 6 Aug 2010, nek5000-users at lists.mcs.anl.gov wrote:
>
>> Ok here are some suggestions to improve the performance:
>>
>> - set timestep, param(12), to -3e-5
>> - set param(102) and param(103) to 5 (this will turn on the residual
>> projection)
>> - increase lgmres (in SIZE) to 40
>> - you may want to tune the Helmholtz (velocity) and pressure tolerances
>> (e.g. 1e-8 and 1e-5)
>>
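
A rough sketch of how the lgmres and tolerance suggestions above map onto the
input files (my reading of the standard parameters is that p21 is the pressure
tolerance and p22 the velocity/Helmholtz tolerance; please double-check
against your .rea template):

In SIZE:
      parameter (lgmres=40)         ! GMRES Krylov-space (restart) size

In the .rea parameter section:
   1.00000E-05   p021              (pressure tolerance)
   1.00000E-08   p022              (velocity / Helmholtz tolerance)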
>> btw: what's the Reynolds number of this flow?
>>
>>
>> Stefan
>>
>>
>> On Aug 6, 2010, at 11:13 AM, <nek5000-users at lists.mcs.anl.gov>
>> <nek5000-users at lists.mcs.anl.gov> wrote:
>>
>>> Dear Mani,
>>>
>>> I haven't checked your logfile yet, but here are my first thoughts:
>>>
>>> N=4 is low
>>> Your polynomial order (N=4) is low, and the tensor-product formulation
>>> won't buy you much. The performance of all matrix-matrix multiplies (MxM)
>>> will be limited by memory access times. This is a problem in particular
>>> on multi-core and multi-socket machines; we have seen that the performance
>>> drop can be significant.
>>> On top of that you carry around a large number of duplicate DOFs, and your
>>> surface-to-volume ratio is high (more communication).
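
(For a rough sense of the numbers: at N=4 each element carries 5^3 = 125
gridpoints but only 3^3 = 27 of them are interior, so roughly 78% of the
points sit on element boundaries and, away from the domain boundary, are
duplicated between neighbouring elements; at N=8 that share drops to
1 - (7/9)^3, about 53%.)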
>>>
>>>
>>> Parallel Performance
>>> Your gridpoint count per core (~4700) is quite small!
>>> On Blue Gene (BG) systems we can scale well (e.g. 70-80% parallel
>>> efficiency) with around 10k gridpoints per core. On other systems (e.g.
>>> Cray XT5) you need many more gridpoints per core (say 80k) because the
>>> network has higher latency (NEK is sensitive to latency, not bandwidth)
>>> and the processors are much faster.
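
(For reference, the ~4700 figure presumably counts unique gridpoints:
37632 elements x 4^3 = ~2.4 million points, spread over 512 cores, gives
about 4700 points per core.)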
>>>
>>> Cheers,
>>> Stefan
>>>
>>> On Aug 6, 2010, at 10:51 AM, <nek5000-users at lists.mcs.anl.gov> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm solving for Rayleigh-Benard convection in a 3D box of 37632 fourth-
>>>> order elements. I fired the job on 512 processors on a machine with a
>>>> quad-core, quad-socket configuration (32 nodes with 16 cores each) and
>>>> a 20 Gbps InfiniBand interconnect. In 12 hours it has run 163 time steps.
>>>> Is this normal or is there maybe some way to improve performance?
>>>> Attached is the SIZE file.
>>>>
>>>> Regards,
>>>> Mani chandra
>>>>
>>>> <SIZE.txt>

_______________________________________________
Nek5000-users mailing list
Nek5000-users at lists.mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users