[Nek5000-users] startup time nek5000

nek5000-users at lists.mcs.anl.gov
Mon Sep 10 15:54:26 CDT 2012


Dear all,
We have just compiled and run nek5000 successfully on another cluster, using
Intel MPI and the corresponding wrappers mpiifort and mpiicc. The code runs
without problems, but on 4096 cores it pauses for about 10 minutes during
startup, at the following output:
....
gs_setup: 559948 unique labels shared
    pairwise times (avg, min, max): 0.000220039 0.000176096 0.000265098
    crystal router                : 0.000166412 0.000162292 0.000180507
    used all_to_all method: crystal router


Attaching gdb during that time shows the following location:

(gdb) where
#0  0x00002adafd4b51db in MPIDI_CH3I_Progress () from /pdc/vol/intelmpi/4.0.3/lib64/libmpi.so.4
#1  0x00002adafd625fe6 in PMPI_Recv () from /pdc/vol/intelmpi/4.0.3/lib64/libmpi.so.4
#2  0x000000000083041c in orthogonalize ()
#3  0x000000000082ed23 in jl_crs_setup ()
#4  0x0000000000831d69 in crs_setup_ ()
#5  0x0000000000632760 in set_up_h1_crs_ ()
#6  0x000000000061feba in set_overlap_ ()
#7  0x000000000040b7c1 in nek_init_ ()
#8  0x000000000040a824 in MAIN__ ()
#9  0x000000000040472c in main ()

As I said, the code runs fine and very fast, so this is not a problem. I just
wanted to ask whether these 10 minutes of startup are to be expected, or
whether we could try to bring that time down a bit. We restart roughly every
24 hours, so it is not a big issue. I should add that our problem size is very
close to the memory available per core.
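
For scale, here is a rough estimate of what the startup costs per restart. It
is only a sketch: it assumes the 10 minutes count as part of each 24-hour job,
and the function name startup_overhead is made up for illustration.

```python
def startup_overhead(startup_min, run_hours, cores):
    """Return (share of wall time spent in startup, core-hours spent in startup).

    Assumption: the startup time is included in the run_hours of each job.
    """
    fraction = startup_min / (run_hours * 60.0)   # minutes / minutes
    core_hours = cores * startup_min / 60.0       # cores * hours
    return fraction, core_hours


# Numbers from this message: 10 min startup, restart every 24 h, 4096 cores.
fraction, core_hours = startup_overhead(startup_min=10, run_hours=24, cores=4096)
print(f"startup share of wall time: {fraction:.2%}")
print(f"core-hours per restart:     {core_hours:.0f}")
```

With these numbers the startup is well under 1% of the wall time, although it
still amounts to several hundred core-hours per restart.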

Thanks,
Philipp



