[Nek5000-users] NEK gets stuck
nek5000-users at lists.mcs.anl.gov
Tue Oct 11 15:23:18 CDT 2011
Thanks Mani. Let me check them and I'll come back to you.
We may need to do some heavy debugging sessions. Any chance to get
access to that machine? Please contact me off-list to arrange the
details.
-Stefan
On 10/11/11, nek5000-users at lists.mcs.anl.gov
<nek5000-users at lists.mcs.anl.gov> wrote:
> Hi Stefan,
>
> I've uploaded the log files on the following links:
>
> 64 processors with SEMG disabled: https://gist.github.com/1279232
> 128 processors with SEMG disabled: https://gist.github.com/1279256
> 256 processors with SEMG disabled: https://gist.github.com/1279237
> 216 processors with SEMG enabled: https://gist.github.com/1279239
>
> It works with 64 processors and then gets stuck from 128 onwards.
>
> Mani
>
> On Tue, Oct 11, 2011 at 7:06 PM, <nek5000-users at lists.mcs.anl.gov> wrote:
>
>> Doesn't sound like a memory problem given 4GB of memory per core and a
>> total static data size of ~350MB (according to the output of size).
>> The size of the executable doesn't matter in this case.
>>
>> - Can you post your logfile again (for the case where the SEMG was
>> disabled)?
>> - What's the lowest number of processors at which you can reproduce the
>> problem (try with lx1=4)?
>>
>> -Stefan
>>
>> On 10/11/11, nek5000-users at lists.mcs.anl.gov
>> <nek5000-users at lists.mcs.anl.gov> wrote:
>> > Hi Stefan,
>> >
>> > Each node has 64 GB of RAM. There are 16 cores in each node. Each core
>> > has 4096 KB of cache. The size of the executable 'nek5000' is 5.6 MB. I
>> > tried running with p43=1 and it still gets stuck. The full
>> > specifications of each core are given below:
>> >
>> > vendor_id : GenuineIntel
>> > cpu family : 6
>> > model : 15
>> > model name : Intel(R) Xeon(R) CPU X7350 @ 2.93GHz
>> > stepping : 11
>> > cpu MHz : 2933.445
>> > cache size : 4096 KB
>> > physical id : 6
>> > siblings : 4
>> > core id : 3
>> > cpu cores : 4
>> > fpu : yes
>> > fpu_exception : yes
>> > cpuid level : 10
>> > wp : yes
>> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm
>> > constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>> > bogomips : 5866.92
>> > clflush size : 64
>> > cache_alignment : 64
>> > address sizes : 40 bits physical, 48 bits virtual
>> >
>> > Thanks,
>> > Mani
>> >
>> >
>> > On Mon, Oct 10, 2011 at 11:56 PM, <nek5000-users at lists.mcs.anl.gov> wrote:
>> >
>> >> What's the memory size per core?
>> >>
>> >> Sure, p43=0 is correct if you want to use the multilevel Schwarz
>> >> solver. Just as a cross-check: set p43=1 and try again.
>> >>
>> >> On 10/10/11, nek5000-users at lists.mcs.anl.gov
>> >> <nek5000-users at lists.mcs.anl.gov> wrote:
>> >> > Hi Stefan,
>> >> >
>> >> > The following is the output of 'size nek5000'
>> >> >
>> >> >    text    data       bss       dec      hex filename
>> >> > 5163006   59896 333337824 338560726 142e06d6 nek5000
>> >> >
>> >> >
>> >> > In the .rea file, p43 has been set to 0.
>> >> >
>> >> > Mani
>> >> >
>> >> _______________________________________________
>> >> Nek5000-users mailing list
>> >> Nek5000-users at lists.mcs.anl.gov
>> >> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
>> >>
>> >
>>
>
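The memory estimate discussed in this thread can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the `size nek5000` output and the node specification quoted above, sums the text, data, and bss segments, and compares the result with the memory available per core. It assumes one MPI rank per core, treats 64 GB as 64 GiB, and ignores any memory allocated at run time. The function name `parse_size_output` is hypothetical and not part of Nek5000.

```python
def parse_size_output(text):
    """Parse the two-line Berkeley-format output of the `size` command."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, values = rows[0], rows[1]
    fields = dict(zip(header, values))
    # Keep only the decimal columns; "hex" and "filename" are not needed here.
    return {key: int(fields[key]) for key in ("text", "data", "bss", "dec")}


# Output of `size nek5000` as posted in the thread.
SIZE_OUTPUT = """
   text    data       bss       dec      hex filename
5163006   59896 333337824 338560726 142e06d6 nek5000
"""

sizes = parse_size_output(SIZE_OUTPUT)

# Static footprint of one process: code + initialized data + zero-initialized data.
static_bytes = sizes["text"] + sizes["data"] + sizes["bss"]
assert static_bytes == sizes["dec"]  # `dec` is the sum reported by `size`

# Node specification from the thread: 64 GB of RAM shared by 16 cores.
# Assumption: GB is treated as GiB and one MPI rank runs per core.
node_ram_bytes = 64 * 1024**3
cores_per_node = 16
ram_per_core_bytes = node_ram_bytes // cores_per_node

static_mb = static_bytes / 1e6
fraction = static_bytes / ram_per_core_bytes

print(f"static footprint per process: {static_mb:.1f} MB")
print(f"memory per core: {ram_per_core_bytes / 1024**3:.0f} GiB")
print(f"fraction of per-core memory used: {fraction:.1%}")
```

Under these assumptions the static footprint is well under a tenth of the memory available to each core, which is consistent with the conclusion that the hang is unlikely to be a memory problem.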