[Nek5000-users] NEK gets stuck

nek5000-users at lists.mcs.anl.gov
Tue Oct 11 15:23:18 CDT 2011


Thanks Mani. Let me check them and I'll get back to you.
We may need to do some heavy debugging sessions. Is there any chance
I could get access to that machine? Please contact me off-list to
arrange the details.

-Stefan

On 10/11/11, nek5000-users at lists.mcs.anl.gov
<nek5000-users at lists.mcs.anl.gov> wrote:
> Hi Stefan,
>
> I've uploaded the log files on the following links:
>
> 64 processors with SEMG disabled: https://gist.github.com/1279232
> 128 processors with SEMG disabled: https://gist.github.com/1279256
> 256 processors with SEMG disabled: https://gist.github.com/1279237
> 216 processors with SEMG enabled: https://gist.github.com/1279239
>
> It works with 64 processors but gets stuck from 128 onwards.
>
> Mani
>
> On Tue, Oct 11, 2011 at 7:06 PM, <nek5000-users at lists.mcs.anl.gov> wrote:
>
>> Doesn't sound like a memory problem given 4GB of memory per core and a
>> total static data size of ~350MB (according to the output of size).
>> The size of the executable doesn't matter in this case.
>>
>> - Can you post your logfile again (for the case where SEMG was
>> disabled)?
>> - What's the lowest number of processors at which you can reproduce
>> the problem? (Try with lx1=4.)
>>
>> -Stefan
>>
>> On 10/11/11, nek5000-users at lists.mcs.anl.gov
>> <nek5000-users at lists.mcs.anl.gov> wrote:
>> > Hi Stefan,
>> >
>> > Each node has 64 GB of RAM. There are 16 cores in each node. Each core
>> > has 4096 KB of cache. The size of the executable 'nek5000' is 5.6 MB. I
>> > tried running with p43=1 and it still gets stuck. The full
>> > specifications of each core are given below:
>> >
>> > vendor_id       : GenuineIntel
>> > cpu family      : 6
>> > model           : 15
>> > model name      : Intel(R) Xeon(R) CPU           X7350  @ 2.93GHz
>> > stepping        : 11
>> > cpu MHz         : 2933.445
>> > cache size      : 4096 KB
>> > physical id     : 6
>> > siblings        : 4
>> > core id         : 3
>> > cpu cores       : 4
>> > fpu             : yes
>> > fpu_exception   : yes
>> > cpuid level     : 10
>> > wp              : yes
>> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>> > bogomips        : 5866.92
>> > clflush size    : 64
>> > cache_alignment : 64
>> > address sizes   : 40 bits physical, 48 bits virtual
>> >
>> > Thanks,
>> > Mani
>> >
>> >
>> > On Mon, Oct 10, 2011 at 11:56 PM, <nek5000-users at lists.mcs.anl.gov> wrote:
>> >
>> >> What's the memory size per core?
>> >>
>> >> Sure, p43=0 is correct if you want to use the multilevel Schwarz
>> >> solver. Just as a cross-check, set p43=1 and try again.
>> >>
>> >> On 10/10/11, nek5000-users at lists.mcs.anl.gov
>> >> <nek5000-users at lists.mcs.anl.gov> wrote:
>> >> > Hi Stefan,
>> >> >
>> >> > The following is the output of 'size nek5000'
>> >> >
>> >> >      text    data       bss       dec      hex filename
>> >> >   5163006   59896 333337824 338560726 142e06d6 nek5000
>> >> >
>> >> >
>> >> > In the .rea file, p43 has been set to 0.
>> >> >
>> >> > Mani
>> >> >
>> >> _______________________________________________
>> >> Nek5000-users mailing list
>> >> Nek5000-users at lists.mcs.anl.gov
>> >> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
>> >>
>> >
>>
>
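A minimal sketch of the memory arithmetic used in the thread: it sums the text, data and bss columns of the quoted `size nek5000` output and compares the total with the per-core share of node memory (64 GB per node, 16 cores per node). It assumes the Berkeley-style column layout that `size` printed here and treats "GB" as GiB; the constant and function names are illustrative only.

```python
# Sketch only: assumes the Berkeley-format `size` output quoted in this thread
# (columns: text data bss dec hex filename) and treats "64 GB" as 64 GiB.
SIZE_OUTPUT = """\
     text    data       bss       dec      hex filename
  5163006   59896 333337824 338560726 142e06d6 nek5000
"""

NODE_RAM_GIB = 64     # RAM per node, as reported in the thread
CORES_PER_NODE = 16   # cores per node, as reported in the thread


def static_size_bytes(size_output: str) -> int:
    """Return text + data + bss from a Berkeley-format `size` listing."""
    fields = size_output.strip().splitlines()[1].split()
    text, data, bss = (int(f) for f in fields[:3])
    total = text + data + bss
    # `size` also prints the sum in decimal ('dec') and hex ('hex'); cross-check.
    assert total == int(fields[3]) == int(fields[4], 16)
    return total


static = static_size_bytes(SIZE_OUTPUT)
per_core_bytes = NODE_RAM_GIB * 2**30 // CORES_PER_NODE

print(f"static image: {static / 1e6:.1f} MB")
print(f"memory per core: {per_core_bytes / 2**30:.1f} GiB")
print(f"fraction used by static data: {static / per_core_bytes:.1%}")
```

With the numbers from the thread this gives a static image of about 338.6 MB against 4.0 GiB per core, i.e. under 8%. It only covers statically allocated arrays; heap allocations such as MPI communication buffers are not included, so it bounds only part of the actual footprint.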