[Nek5000-users] Re: Performance problem

nek5000-users at lists.mcs.anl.gov
Fri Aug 6 14:20:06 CDT 2010


Hi,

   I will check with the sysadmin about rogue processes and get back to
you. But this run showed the same performance problem on a
completely different cluster (which also has an InfiniBand
interconnect).

Mani

>
> 2D is generally fast, but more attention is required in 3D.
>
> Regardless of the parameter settings I would concur w/ Stefan's
> analysis that there is something happening on the machine.
>
> Are others occupying the resource during the run ?  (Sometimes
> there are rogue processes on a cluster from a prior run, etc.)
>
> Paul
>
> On Fri, 6 Aug 2010, nek5000-users at lists.mcs.anl.gov wrote:
>
>> This is weird. I have a 2D equivalent of this system, with the same
>> parameters and mesh structure, and it is quite fast.
>> On 08/06/2010 06:49 PM, nek5000-users at lists.mcs.anl.gov wrote:
>>> I guess your system has a problem. The TEMP solve takes ~0.14 sec for
>>> step 6 and 22.2 sec for step 7. Overall, step 6 takes 8.4 sec and
>>> step 7 takes 51.1 sec, although the iteration counts are very similar!
>>>
>>> Stefan
>>>
>>>
>>>
>>> Check this:
>>>
>>> 0: Step      6, t= 2.0307248E-01, DT= 3.0347043E-05, C=  2.043  4.8383E+01  3.0220E+00
>>> 0:              Solving for heat
>>> 0:              Solving for fluid
>>> 0:   0.000000000000000E+000  p22            6           2
>>> 0:           6    Hmholtz TEMP:     64   7.0899E-08   7.6377E+01   7.7888E-08
>>> 0:           6   2.0307E-01  1.3816E-01 Heat done
>>> 0:   0.000000000000000E+000  p22            6           1
>>> 0:           6    Hmholtz VELX:     57   1.0070E-05   5.8954E+04   1.2278E-05
>>> 0:   0.000000000000000E+000  p22            6           1
>>> 0:           6    Hmholtz VELY:     56   1.1755E-05   5.8020E+04   1.2278E-05
>>> 0:   0.000000000000000E+000  p22            6           1
>>> 0:           6    Hmholtz VELZ:     57   1.0011E-05   7.7873E+04   1.2278E-05
>>> 0:           6    U-Pres gmres:       48   1.6044E-09   2.3009E-09   2.3009E+00   4.5887E+00   6.8322E+00
>>> 0:            6  DNORM, DIVEX  1.604430720383758E-009  1.604426975107925E-009
>>> 0:           6   2.0307E-01  7.4035E+00 Fluid done
>>> 0: Step      7, t= 2.0310283E-01, DT= 3.0347043E-05, C=  2.052  5.6850E+01  8.4673E+00
>>> 0:              Solving for heat
>>> 0:              Solving for fluid
>>> 0:   0.000000000000000E+000  p22            7           2
>>> 0:           7    Hmholtz TEMP:     64   6.9851E-08   7.6526E+01   7.8021E-08
>>> 0:           7   2.0310E-01  2.2240E+01 Heat done
>>> 0:   0.000000000000000E+000  p22            7           1
>>> 0:           7    Hmholtz VELX:     57   1.0101E-05   5.8874E+04   1.2295E-05
>>> 0:   0.000000000000000E+000  p22            7           1
>>> 0:           7    Hmholtz VELY:     56   1.1723E-05   5.7947E+04   1.2295E-05
>>> 0:   0.000000000000000E+000  p22            7           1
>>> 0:           7    Hmholtz VELZ:     57   9.9682E-06   7.7881E+04   1.2295E-05
>>> 0:           7    U-Pres gmres:       48   1.6001E-09   2.2892E-09   2.2892E+00   1.9881E+00   3.3138E+00
>>> 0:            7  DNORM, DIVEX  1.600110264916237E-009  1.600109913837109E-009
>>> 0:           7   2.0310E-01  1.9966E+01 Fluid done
>>> 0: Step      8, t= 2.0313318E-01, DT= 3.0347043E-05, C=  2.060  1.0837E+02  5.1516E+01
>>>
>>> On Aug 6, 2010, at 3:08 PM, <nek5000-users at lists.mcs.anl.gov> wrote:
>>>
>>>> Mani,
>>>>
>>>> I think there must be something else wrong ... I'm seeing about
>>>> 5 sec/step on a 64 proc. linux cluster.
>>>>
>>>> If you'd like to send me a gzip'd file w/ the essentials, contact
>>>> me off-list (fischer at mcs.anl.gov) and I can take a closer look.
>>>>
>>>> Paul
>>>>
>>>>
>>>> On Fri, 6 Aug 2010, nek5000-users at lists.mcs.anl.gov wrote:
>>>>
>>>>> If you set p94 & p95 to 5 (say), I strongly recommend setting p12 (dt)
>>>>> to
>>>>>
>>>>>   -3e-5
>>>>>
>>>>> in your case.  The reason for this is that the projection scheme is
>>>>> much more stable for fixed dt.
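>>>>>
>>>>> Roughly, something like the following in usrdat (in the .usr file)
>>>>> should do it; the .rea parameter list is the usual place for these,
>>>>> so treat this only as a sketch of the settings above:
>>>>>
>>>>>       subroutine usrdat
>>>>>       include 'SIZE'
>>>>>       include 'TOTAL'
>>>>>
>>>>>       param(12) = -3.e-5   ! fixed dt (negative => dt is not adjusted)
>>>>>       param(94) = 5        ! residual projection (as suggested above)
>>>>>       param(95) = 5        ! residual projection (as suggested above)
>>>>>
>>>>>       return
>>>>>       end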
>>>>>
>>>>> On the whole, however, Stefan's earlier suggestion of using, say, lx1=8
>>>>> and fewer elements is a better strategy.  It's also possible that we
>>>>> should switch your coarse-grid solve at this scale to AMG.
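>>>>>
>>>>> (For reference, that change goes in SIZE and needs a recompile,
>>>>> roughly along these lines, with the element limits reduced to match
>>>>> the coarser mesh:)
>>>>>
>>>>>       parameter (ldim=3)                  ! 3D run
>>>>>       parameter (lx1=8,ly1=lx1,lz1=lx1)   ! 7th-order elements (N = lx1-1)
>>>>> c     lelt/lelg would also shrink to whatever the new mesh needs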
>>>>>
>>>>> Paul
>>>>>
>>>>>
>>>>> On Fri, 6 Aug 2010, nek5000-users at lists.mcs.anl.gov wrote:
>>>>>
>>>>>>> - set param(102) and param(103) to 5 (this will turn on the residual
>>>>>>> projection)
>>>>>>    Should this be param 94 & 95 ?
>>>>>>
>>>>>>    - Paul
>>>>>> On Fri, 6 Aug 2010, nek5000-users at lists.mcs.anl.gov wrote:
>>>>>>> Ok, here are some suggestions to improve the performance:
>>>>>>> - set the timestep, param(12), to -3e-5
>>>>>>> - set param(102) and param(103) to 5 (this will turn on the residual
>>>>>>> projection)
>>>>>>> - increase lgmres (in SIZE) to 40
>>>>>>> - you may want to tune the Helmholtz (velocity) and pressure
>>>>>>> tolerances (e.g. 1e-8 and 1e-5); a sketch follows below
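>>>>>>>
>>>>>>> Roughly, and assuming the usual parameter mapping (p21 = pressure /
>>>>>>> DIVERGENCE tolerance, p22 = velocity HELMHOLTZ tolerance), the
>>>>>>> changes would look like this; treat it as a sketch only:
>>>>>>>
>>>>>>> c     in SIZE (needs a recompile):
>>>>>>>       parameter (lgmres=40)             ! GMRES Krylov-space dimension
>>>>>>>
>>>>>>> c     in the .rea parameter list (or via param() in usrdat):
>>>>>>> c       p21 = 1.e-5                     ! pressure tolerance
>>>>>>> c       p22 = 1.e-8                     ! velocity (Helmholtz) tolerance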
>>>>>>> btw: what's the Reynolds number of this flow?
>>>>>>> Stefan
>>>>>>> On Aug 6, 2010, at 11:13 AM, <nek5000-users at lists.mcs.anl.gov> wrote:
>>>>>>>> Dear Mani,
>>>>>>>> I haven't checked your logfile yet, but here are my first
>>>>>>>> thoughts:
>>>>>>>> N=4 is low
>>>>>>>> Your polynomial order (N=4) is low, so the tensor-product
>>>>>>>> formulation won't buy you much. The performance of all matrix-matrix
>>>>>>>> multiplies (MxM) will be limited by memory access times. This is a
>>>>>>>> problem in particular on multi-core and multi-socket machines, where
>>>>>>>> we have seen that the performance drop can be significant.
>>>>>>>> On top of that, you carry around a large number of duplicate DOFs,
>>>>>>>> and your surface-to-volume ratio is high (more communication).
>>>>>>>> Parallel Performance
>>>>>>>> Your gridpoint count per core (~4700) is quite small!
>>>>>>>> On Blue Gene (BG) systems we can scale well (e.g. 70-80% parallel
>>>>>>>> efficiency) with around 10k gridpoints per core. On other systems
>>>>>>>> (e.g. Cray XT5) you need many more gridpoints per core (say 80k)
>>>>>>>> because the network has a higher latency (NEK is sensitive to
>>>>>>>> latency, not bandwidth) and the processors are much faster.
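>>>>>>>>
>>>>>>>> (To put numbers on that, using the figures from your mail: 37632
>>>>>>>> elements at N=4 give roughly 37632 * 4^3 ~ 2.4M unique gridpoints,
>>>>>>>> i.e. about 2.4M / 512 ~ 4700 per core, while the stored count,
>>>>>>>> 37632 * 5^3 ~ 4.7M points, is where the duplicate-DOF overhead
>>>>>>>> shows up.)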
>>>>>>>> Cheers,
>>>>>>>> Stefan
>>>>>>>> On Aug 6, 2010, at 10:51 AM, <nek5000-users at lists.mcs.anl.gov> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>   I'm solving for Rayleigh-Benard convection in a 3D box of
>>>>>>>>> 37632 4th-order elements. I fired the job on 512 processors of a
>>>>>>>>> machine with a quad-core, quad-socket configuration (32 nodes with
>>>>>>>>> 16 cores each) and a 20 Gbps InfiniBand interconnect. In 12 hours
>>>>>>>>> it has run 163 time steps. Is this normal, or is there maybe some
>>>>>>>>> way to improve the performance? Attached is the SIZE file.
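>>>>>>>>> (That works out to roughly 12 * 3600 / 163, i.e. about 265 seconds
>>>>>>>>> per step.)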
>>>>>>>>> Regards,
>>>>>>>>> Mani chandra
>>>>>>>>> <SIZE.txt>




More information about the Nek5000-users mailing list