[Nek5000-users] startup time nek5000

nek5000-users at lists.mcs.anl.gov
Tue Sep 11 07:29:02 CDT 2012


Dear Paul,
the case that I am talking about is our largest pipe DNS, with about 1.2
million elements at polynomial order 11. The reason I was asking is that we
have not observed such a "long" startup time before, not even for the same
case run on other architectures (for instance a Cray XE6 with even more
processes). I was therefore suspecting either Intel MPI or the
InfiniBand network...

Best regards, Philipp

-----Original Message-----
From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: 10 September 2012 23:09
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] startup time nek5000


Dear Philipp,

This is generally expected for the direct, XX^T-based coarse-grid solve.
How many elements are in your problem?

The only alternative is to switch to AMG, but at this point that is less
automatic than XXT.  (AMG is faster for some problems, but I don't think it's
faster for your class of problems.  By "faster" here I mean the execution
phase rather than the setup cost.)

Best regards,

Paul



On Mon, 10 Sep 2012, nek5000-users at lists.mcs.anl.gov wrote:

> Dear all,
> We have just successfully compiled and run nek5000 on another cluster, 
> using Intel MPI and the corresponding wrappers mpiifort and mpiicc. 
> The code runs fine, without problems, but it stalls for about 10 minutes 
> (on 4096 cores) during startup, printing the following output:
> ....
> gs_setup: 559948 unique labels shared
>    pairwise times (avg, min, max): 0.000220039 0.000176096 0.000265098
>    crystal router                : 0.000166412 0.000162292 0.000180507
>    used all_to_all method: crystal router
>
>
> Attaching gdb tells me the following location:
>
> (gdb) where
> #0  0x00002adafd4b51db in MPIDI_CH3I_Progress () from
> /pdc/vol/intelmpi/4.0.3/lib64/libmpi.so.4
> #1  0x00002adafd625fe6 in PMPI_Recv () from
> /pdc/vol/intelmpi/4.0.3/lib64/libmpi.so.4
> #2  0x000000000083041c in orthogonalize ()
> #3  0x000000000082ed23 in jl_crs_setup ()
> #4  0x0000000000831d69 in crs_setup_ ()
> #5  0x0000000000632760 in set_up_h1_crs_ ()
> #6  0x000000000061feba in set_overlap_ ()
> #7  0x000000000040b7c1 in nek_init_ ()
> #8  0x000000000040a824 in MAIN__ ()
> #9  0x000000000040472c in main ()
>
> As I said, the code runs fine, and very fast, so there is no problem. I just 
> wanted to ask whether these 10 minutes of startup are to be expected, or 
> whether we could try to bring that time down a bit. We restart every 24 
> hours or so, so it's not a big issue. I should add that our problem size 
> is very close to the memory available per core.
>
> Thanks,
> Philipp
>
> _______________________________________________
> Nek5000-users mailing list
> Nek5000-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
>



